Preface


This is a series of lecture notes and problems on “Essential Graduate Physics”, consisting of the 
following four parts: 
CM: Classical Mechanics (for a 1-semester course), 
EM: Classical Electrodynamics (2 semesters), 
QM: Quantum Mechanics (2 semesters), and 
SM: Statistical Mechanics (1 semester). 


The parts share a teaching style, structure, and (with a few exceptions) notation, and are interlinked by 
extensive cross-referencing. I believe that due to this unity, the notes may be used for teaching these 
courses not only in the (preferred) sequence shown above but in almost any order — or in parallel. 


Each part is a two-component package consisting of: 


(i) Lecture Notes chapter texts,² with a list of exercise problems at the end of each chapter, and
(ii) Exercise and Test Problems with Model Solutions files.


The series also includes this front matter, two brief reference appendices, MA: Selected Mathematical
Formulas (16 pp.) and CA: Selected Physical Constants (2 pp.), and a list of references.


The series is a by-product of the so-called core physics courses I taught at Stony Brook 
University from 1991 to 2013. Reportedly, most physics departments require their graduate students to 
either take a set of similar courses or pass comprehensive exams based on an approximately similar 
body of knowledge. This is why I hope that my notes may be useful for both instructors and students of 
such courses, as well as for individual learners. 


The motivation for composing the lecture notes (which had to be typeset because of my horrible 
handwriting) and their distribution to Stony Brook students was my desperation to find textbooks I 
could actually use for teaching. First of all, the textbooks I could find, including the most influential 
Theoretical Physics series by Landau and Lifshitz, did not match my class audiences, which included 
experiment-oriented students, some PhD candidates from other departments, some college graduates 
with substandard undergraduate background, and a few advanced undergraduates. Second, for the rigid 
time restrictions imposed on the core physics courses, most available textbooks are way too long, and 
using them would mean hopping from one topic to another, picking up a chapter here and a section 
there, at a high risk of losing the necessary background material and logical connections between course 
components — and students’ interest with them. On the other hand, many textbooks lack even brief 
discussions of several traditional and modern topics that I believe are necessary parts of every 
professional physicist’s education.³,⁴


2 The texts are saved as separate .pdf files of each chapter, optimized for two-page viewing and double-side 
printing; merged files for each part and the series as a whole, convenient for search purposes, are also provided. 


3 To list just a few: statics and dynamics of elastic and fluid continua, basic notions of physical kinetics, 
turbulence and deterministic chaos, physics of reversible and quantum computation, energy relaxation and 
dephasing of open quantum systems, the van der Pol method (a.k.a. the Rotating-Wave Approximation, RWA) in 
classical and quantum mechanics, physics of electrons and holes in semiconductors, the weak-potential and
tight-binding approximations in the energy band theory, optical fiber electrodynamics, macroscopic quantum effects in
The main goal of my courses was to make students familiar with the basic notions and ideas of
physics (hence the series’ title), and my main effort was to organize the material in a logical sequence
the students could readily follow and enjoy, at each new step understanding why exactly they need to
swallow the next knowledge pill. As a by-product of such a minimalistic goal, I believe that my texts may
be used by advanced undergraduate physics students as well. Moreover, I hope that selected parts of the
series may be useful for graduate students of other disciplines, including astronomy, chemistry,
mechanical engineering, electrical, computer and electronic engineering, and materials science.


At least since Confucius and Sophocles, i.e. for the past 2,500 years, teachers have known that 
students can master a new concept or method only if they have seen its application to at least a few 
particular situations. This is why in my notes, the range of theoretical physics methods is limited to the 
approaches that are indeed necessary for the solution of the problems I had time to discuss, and the 
introduction of every new technique is always accompanied by an application example or two. 
Additional exercise problems are listed at the end of each chapter of the lecture notes; they may be used 
for homework assignments. Individual readers are strongly encouraged to solve as many of these 
problems as possible.5 


Detailed model solutions of the exercise problems (some with additional expansion of the lecture 
material), and several shorter problems suitable for tests (also with model solutions), are gathered in six 
separate files — one per semester. These files are available for both university instructors and individual 
readers — free of charge, but in return for a signed commitment to avoid unlimited distribution of the 
solutions — see p. vii below. For instructors, these files are available not only in the Adobe Systems’ 
Portable Document Format (*.pdf) but also in the Microsoft Office 1997-2003 format (*.doc) free of 
macros, so that the problem assignments and solutions may be readily grouped, edited, etc., before their 
distribution to students, using either virtually any version of Microsoft Word or independent software 
tools — e.g., the public-domain OpenOffice.org. 


I know that my texts are far from perfect. In particular, some sacrifices made in the topic
selection, always very subjective, were extremely painful. (Most regretfully, I could not find time for
even a brief introduction to general relativity.⁶) Moreover, it is almost certain that despite all my effort
and the great help from SBU students and teaching assistants, not all typos/errors have been weeded out. 
This is why all remarks (however candid) and suggestions by the readers would be highly appreciated; 
they may be sent to klikharev@gmail.com. All significant contributions will be gratefully acknowledged 
in future editions of the series. 


Bose-Einstein condensates, Bloch oscillations and Landau-Zener tunneling, cavity QED, and the Density 
Functional Theory (DFT). All these topics are discussed, if only concisely, in these notes. 


4 Recently, several high-quality graduate-level teaching materials have become available online, including M. Fowler’s
Graduate Quantum Mechanics Lectures (http://galileo.phys.virginia.edu/classes/751.mf1i.fall02/home.html), R.
Fitzpatrick’s text on Classical Electromagnetism (farside.ph.utexas.edu/teaching/jk1/Electromagnetism.pdf), B.
Simons’ lecture notes on Advanced Quantum Mechanics (www.tcm.phy.cam.ac.uk/~bds10/agp.html), and D.
Tong’s lecture notes on several topics (www.damtp.cam.ac.uk/user/tong/teaching.html).


5 The problems that require either longer calculations or more creative approaches (or both) are marked by 
asterisks. 


6 For an introduction to that subject, I can recommend either its review by S. Carroll, Spacetime and Geometry, 
Addison-Wesley, 2003, or a longer text by A. Zee, Einstein Gravity in a Nutshell, Princeton U. Press, 2013. 


Front Matter iv 


Essential Graduate Physics K. Likharev 


Disclaimer 


Since these materials are available free of charge, it is hard to imagine somebody blaming their 
author for deceiving “customers” for his commercial gain. Still, I would like to go a little bit beyond the 
usual litigation-avoiding claims,⁷ and offer a word of caution to potential readers, to preempt their
possible later disappointment. 


This is NOT a course of theoretical physics — at least in the contemporary sense of the term 


Though much of the included material is similar to that in textbooks on “theoretical physics” 
(most notably in the famous series by L. Landau and E. Lifshitz), this lecture note series differs
from them in its focus on the basic concepts and ideas of physics, their relation to experimental data,
and most important applications — rather than on sophisticated theoretical techniques. Indeed, the set of 
theoretical methods discussed in the notes is limited to the minimum necessary for quantitative 
understanding of the key notions of physics and for solving a few (or rather about a thousand :-) core 
problems. Moreover, because of the notes’ shortness, I have not been able to cover some key fields of 
theoretical physics, most notably general relativity and quantum field theory — beyond some
introductory elements of quantum electrodynamics in QM Chapter 9. If you want to work in modern 
theoretical physics, you need to know much more than what this series teaches! 


Moreover, this is NOT a textbook — at least not the usual one 


A usual textbook tries (though most commonly fails) to cover virtually all aspects of the 
addressed field. As a result, it is typically way too long for being fully read and understood by students 
during the time allocated for the corresponding course, so that instructors are forced to pick
selected chapters and sections, frequently losing the narrative’s logical thread. In contrast, these notes are
much shorter (about 200 pages per semester), enabling their thorough reading — perhaps with just a few 
later sections dropped, depending on the reader’s interests. I have tried to mitigate the losses due to this 
minimalistic approach by providing extensive further reading recommendations on the topics I had no 
time to cover. The reader is highly encouraged to use these sources (and/or the corresponding chapters 
of more detailed textbooks) on any topics of their special interest. 


Then, what these notes ARE and why you may like to use them — I think 


By tradition, graduate physics education consists of two main components: research experience 
and advanced physics courses. Unfortunately, the latter component is currently under pressure in many 
physics departments, apparently for two reasons. On one hand, the average knowledge level of
the students entering graduate school is not improving, so that bringing them up to the level of 
contemporary research becomes increasingly difficult. On the other hand, the research itself is becoming 
more fragmented, so that the students frequently do not feel an immediate need for a broad physics 


7 Yes Virginia, these notes represent only my personal opinions, not necessarily those of the Department of 
Physics and Astronomy of Stony Brook University, the SBU at large, the SUNY system as a whole, the Empire 
State of New York, the federal agencies and private companies that funded my group’s research, etc. No dear, I 
cannot be held responsible for any harm, either bodily or mental, their reading may (?) cause. 
knowledge base for their PhD project success. Some thesis advisors, trying to maximize the time they
could use their students as a cheap laboratory workforce, do not help. 


I believe that this trend toward the reduction of broad physics education in graduate school is 
irresponsible. Experience shows that during their future research career, a typical current student will 
change their research fields several times. Starting from scratch in a new field is hard — terribly hard in 
advanced age (believe me :-). However, physics is fortunate to have a stable hard core of knowledge, 
which many other sciences lack. With this knowledge, students will always feel at home in physics,
while without it, they may not be able even to understand research literature in the new field, and would 
risk being reduced to auxiliary work roles — if any at all. 


I have seen the main objective of my Stony Brook courses as giving an introduction to this core of
physics, at the same time trying to convey my own enchantment with the unparalleled beauty of the
concepts and ideas of this science, and the remarkable logic of their fusion into a wonderful single 
construct. Let me hope that these notes relay not only the knowledge as such but also at least a part of 
this enchantment. 
Problem Solution Request Templates


Requests should be sent to either klikharev@gmail.com or konstantin.likharev@stonybrook.edu


in either of the following forms: 
- an email from a valid university address, 


- a scanned copy of a signed letter — as an email attachment.
Approximate contents: 


A. Request from a Prospective Instructor 
Dear Dr. Likharev, 


I plan to use your lecture notes and problems of the Essential Graduate Physics series, part(s) 
<select: CM, EM, QM, SM>, in my course <title> during <semester, year> in the <department, 
university>. I would appreciate your sending me the file(s) Exercise and Test Problems with Model Solutions
of that part(s) of the series in the <select: .pdf, both .doc and .pdf> format(s). 


I will avoid unlimited distribution of the solutions, in particular their posting on externally 
searchable websites. If I distribute the solutions among my students, I will ask them to adhere to the 
same restraint. 


I will let you know of any significant typos/deficiencies I may find. 


Sincerely, <signature, full name, university position, work phone number> 


B. Request from an Individual Learner 
Dear Dr. Likharev, 


I plan to use your lecture notes and problems of the Essential Graduate Physics series, part(s) 
<select: CM, EM, QM, SM>, for my personal education. I would appreciate your sending me the file(s)
Exercise and Test Problems with Model Solutions of that part(s) of the series. 


I will not share the material with anyone, and will not use it for passing courses that are officially 
based on your series. 


I will let you know of any significant typos/deficiencies I may find. 


Sincerely, <signature, full name, present home address (in English), active phone number>
Notation

Abbreviations
Eq. any formula (e.g., equality)
Fig. figure
Sec. section
c.c. complex conjugate
h.c. Hermitian conjugate

Fonts
F, ℱ scalar variables⁹
F, ℱ (bold) vector variables
F̂, ℱ̂ scalar operators
F̂, ℱ̂ (bold) vector operators
F matrix
F_jj′ matrix element

Symbols
˙ (overdot) time differentiation operator (d/dt)
∇ spatial differentiation vector (del)
≈ approximately equal to
~ of the same order as
∝ proportional to
≡ equal to by definition (or evidently)
· scalar (“dot-”) product
× vector (“cross-”) product¹⁰
‾ (overbar) time averaging
⟨ ⟩ statistical averaging
[ , ] commutator
{ , } anticommutator
n unit vector

Parts of the series
CM: Classical Mechanics
EM: Classical Electrodynamics
QM: Quantum Mechanics
SM: Statistical Mechanics

Appendices
MA: Selected Mathematical Formulas
CA: Selected Physical Constants


Prime signs 

The prime signs (′, ″, etc.) are used to distinguish similar variables or indices (such as j and j′ in
the matrix element above), rather than to denote derivatives.
Formulas 

The most general and/or important formulas are highlighted with blue frames and short titles on 
the margins. 
Numbering 


Chapter numbers are dropped in all references to formulas, figures, footnotes, and problems 
within the same chapter. 


9 The same letter, typeset in different fonts, typically denotes different variables.
10 On a few occasions, the cross sign is used to emphasize the usual multiplication of scalars.
Chapter 1. Review of Fundamentals
After a brief discussion of the title and contents of the course, this introductory chapter reviews the
basic notions and facts of non-relativistic classical mechanics, which are supposed to be known to the
reader from their undergraduate studies.¹ For this reason, the discussion is very short.


1.0. Terminology: Mechanics and dynamics 


A fairer title for this course would be Classical Mechanics and Dynamics, because the
notions of mechanics and dynamics, though much intertwined, are still somewhat different. The term 
mechanics, in its narrow sense, means the derivation of equations of motion of point-like particles and 
their systems (including solids and fluids), solution of these equations, and interpretation of the results. 
Dynamics is a more ambiguous term; it may mean, in particular: 


(i) the part of physics that deals with motion (in contrast to statics); 

(ii) the part of physics that deals with reasons for motion (in contrast to kinematics); 

(iii) the part of mechanics that focuses on its last two tasks, i.e. the solution of the equations of
motion and discussion of the results.²


Because of this ambiguity, after some hesitation, I have opted to use the traditional name 
Classical Mechanics, with the word Mechanics in its broad sense that includes (similarly to Quantum 
Mechanics and Statistical Mechanics) studies of dynamics of some non-mechanical systems as well. 


1.1. Kinematics: Basic notions 


The basic notions of kinematics may be defined in various ways, and some mathematicians pay 
much attention to alternative systems of axioms and the relations between them. In physics, we typically 
stick to less rigorous ways (in order to proceed faster to solving particular problems) and stop debating
any definition as soon as “everybody in the room” agrees that we are all speaking about the same thing —
at least in the context in which it is being discussed. Let me hope that the following notions used in classical
mechanics do satisfy this criterion in our “room”: 


1 The reader is advised to perform (perhaps after reading this chapter as a reminder) a self-check by solving a few
of the problems listed in Sec. 1.6. If the results are not satisfactory, it may make sense to start with some
remedial reading. For that, I could recommend, e.g., J. Marion and S. Thornton, Classical Dynamics of Particles
and Systems, 5th ed., Saunders, 2003; and D. Morin, Introduction to Classical Mechanics, Cambridge U., 2008.

2 The reader may have noticed that the last definition of dynamics is suspiciously close to the part of mathematics 
devoted to differential equation analysis; what is the difference? An important bit of philosophy: physics may be 
defined as an art (and a bit of science :-) of describing Mother Nature by mathematical means; hence in many 
cases the approaches of a mathematician and a physicist to a problem are very similar. The main difference 
between them is that physicists try to express the results of their analyses in terms of the properties of the systems 
under study, rather than the functions describing them, and as a result develop a sort of intuition (“gut feeling”)
about how other similar systems may behave, even if their exact equations of motion are somewhat different — or 
not known at all. The intuition so developed has an enormous heuristic power, and most discoveries in physics 
have been made through gut-feeling-based insights rather than by plugging one formula into another one. 
(i) All the Euclidean geometry notions, including the point, the straight line, the plane, etc.


(ii) Reference frames: platforms for observation and mathematical description of physical
phenomena. A reference frame includes a coordinate system used for measuring the point’s position
(namely, its radius vector r that connects the coordinate origin to the point — see Fig. 1) and a clock that
measures time t. A coordinate system may be understood as a certain method of expressing the radius
vector r of a point as a set of its scalar coordinates. The most important of such systems (but by no
means the only one) are the Cartesian (orthogonal, linear) coordinates⁴ r_j of a point, in which its radius
vector may be represented as the following sum:

$$\mathbf{r} = \sum_{j=1}^{3} \mathbf{n}_j r_j , \tag{1.1}$$

where n_j are the unit vectors along the coordinate axes.


Fig. 1.1. Cartesian coordinates of a point. 


(iii) The absolute (“Newtonian”) space/time,⁶ which does not depend on the matter distribution.
The space is assumed to have the Euclidean metric, which may be expressed as the following relation
between the length r of any radius vector r and its Cartesian coordinates:

$$r^2 = \sum_{j=1}^{3} r_j^2 , \tag{1.2}$$

while time t is assumed to run similarly in all reference frames. These assumptions are critically revised
in the relativity theory (which, in this series, is discussed only starting from EM Chapter 9).


3 All these notions are of course abstractions: simplified models of the real objects existing in Nature. But please 
always remember that any quantitative statement made in physics (e.g., a formula) may be strictly valid only for 
an approximate model of a physical system. (The reader should not be disheartened too much by this fact: 
experiments show that many models make extremely precise predictions of the behavior of the real systems.) 

4 In this series, the Cartesian coordinates (introduced in 1637 by René Descartes, a.k.a. Cartesius) are denoted
either as {r₁, r₂, r₃} or {x, y, z}, depending on convenience in each particular case. Note that axis numbering
is important for operations like the vector (“cross”) product; the “correct” (meaning generally accepted)
numbering order is such that the rotation n₁ → n₂ → n₃ → n₁ → ... looks counterclockwise if watched from a point
with all r_j > 0 — like the one shown in Fig. 1.

5 Note that the representation (1) is also possible for locally-orthogonal but curvilinear (for example, 
polar/cylindrical and spherical) coordinates, which will be extensively used in this series. However, such 
coordinates are not Cartesian, and for them some of the relations given below are invalid — see, e.g., MA Sec. 10. 
6 These notions were formally introduced by Sir Isaac Newton in his main work, the three-volume Philosophiae 
Naturalis Principia Mathematica published in 1686-1687, but are rooted in earlier ideas by Galileo Galilei, 
published in 1632. 
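(iv) The (instant) velocity of the point,

$$\mathbf{v}(t) \equiv \frac{d\mathbf{r}}{dt} \equiv \dot{\mathbf{r}}, \tag{1.3}$$

and its acceleration:

$$\mathbf{a}(t) \equiv \frac{d\mathbf{v}}{dt} = \frac{d^2\mathbf{r}}{dt^2} \equiv \ddot{\mathbf{r}}. \tag{1.4}$$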


(v) Transfer between reference frames. The above definitions of vectors r, v, and a depend on
the chosen reference frame (are “reference-frame-specific”), and we frequently need to relate those
vectors as observed in different frames. Within Euclidean geometry, the relation between the radius
vectors in two frames with the corresponding axes parallel at the moment of interest (Fig. 2) is very
simple:

$$\mathbf{r}\big|_{\text{in } 0'} = \mathbf{r}\big|_{\text{in } 0} + \mathbf{r}_0\big|_{\text{in } 0'} . \tag{1.5}$$

Fig. 1.2. Transfer between two reference frames.


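If the frames move relative to each other by translation only (no mutual rotation!), similar relations
are valid for the velocities and accelerations as well:

$$\mathbf{v}\big|_{\text{in } 0'} = \mathbf{v}\big|_{\text{in } 0} + \mathbf{v}_0\big|_{\text{in } 0'} , \tag{1.6}$$

$$\mathbf{a}\big|_{\text{in } 0'} = \mathbf{a}\big|_{\text{in } 0} + \mathbf{a}_0\big|_{\text{in } 0'} . \tag{1.7}$$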

Note that in the case of mutual rotation of the reference frames, the transfer laws for velocities
and accelerations are more complex than those given by Eqs. (6) and (7). Indeed, in this case,
notions like $\mathbf{v}_0|_{\text{in } 0'}$ are not well defined: different points of an imaginary rigid body connected to frame
0 may have different velocities when observed in frame 0’. It will be more natural for me to discuss
these more general relations at the end of Chapter 4 devoted to rigid body motion.


(vi) A particle (or “point particle”): a localized physical object whose size is negligible and whose
shape is irrelevant to the given problem. Note that the last qualification is extremely important. For
example, the size and shape of a spaceship are not too important for the discussion of its orbital motion
but are paramount when its landing procedures are being developed. Since classical mechanics neglects
the quantum mechanical uncertainties,⁷ in it, the position of a particle at any particular instant t may be
identified with a single geometrical point, i.e. with a single radius vector r(t). The formal final goal of
classical mechanics is finding the laws of motion r(t) of all particles participating in the given problem.


7 This approximation is legitimate when the product of the coordinate and momentum scales of the particle
motion is much larger than Planck’s constant ħ ~ 10⁻³⁴ J·s. More detailed conditions of the classical mechanics’
applicability depend on a particular system — see, e.g., the QM part of this series.
1.2. Dynamics: Newton’s laws


Generally, classical dynamics is fully described (in addition to the kinematic relations
discussed above) by Newton’s three laws. In contrast to the impression some textbooks on theoretical
physics try to create, these laws are experimental in nature, and cannot be derived from purely 
theoretical arguments. 


I am confident that the reader of these notes is already familiar with Newton’s laws,⁸ in one
formulation or another. Let me note only that in some formulations, the 1st Newton’s law looks like just a
particular case of the 2nd law — when the net force acting on a particle equals zero. To avoid this
duplication, the 1st law may be formulated as the following postulate:


There exists at least one reference frame, called inertial, in which any free particle (i.e. a 
particle fully isolated from the rest of the Universe) moves with v = const, i.e. with a = 0. 


Note that according to Eq. (7), this postulate immediately means that there is also an infinite number of
inertial reference frames — because all frames 0’ moving without rotation or acceleration relative to the
postulated inertial frame 0 (i.e. having $\mathbf{a}_0|_{\text{in } 0'} = 0$) are also inertial.
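The transfer rules (5)-(7) and the resulting multiplicity of inertial frames may be checked numerically. The Python sketch below is only an illustration under stated assumptions: the free-particle trajectory r_in_0 in frame 0 and the two translations r0_uniform and r0_accelerated (positions of origin 0 as seen from frame 0’) are hypothetical choices, and time derivatives are estimated by central finite differences. The acceleration of the free particle should vanish in the uniformly moving frame, but not in the accelerated one.

```python
import numpy as np

def velocity(f, t, h=1e-3):
    """Central finite-difference estimate of dr/dt, Eq. (1.3)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def acceleration(f, t, h=1e-3):
    """Central finite-difference estimate of d^2r/dt^2, Eq. (1.4)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h**2

def r_in_0(t):
    """Hypothetical free particle in the inertial frame 0: uniform motion."""
    return np.array([1.0, -2.0, 0.5]) + np.array([3.0, 0.0, 4.0]) * t

def r0_uniform(t):
    """Origin 0 as seen from a frame 0' translating with constant velocity."""
    return np.array([0.0, 5.0, 0.0]) * t

def r0_accelerated(t):
    """Origin 0 as seen from a frame 0' translating with constant acceleration."""
    return 0.5 * np.array([0.0, 0.0, 9.8]) * t**2

def in_frame_0prime(r0_in_0prime):
    """Eq. (1.5): r|in 0' = r|in 0 + r0|in 0' (pure translation, no rotation)."""
    return lambda t: r_in_0(t) + r0_in_0prime(t)

t = 0.7
results = {}
for name, r0 in [("uniform", r0_uniform), ("accelerated", r0_accelerated)]:
    r_prime = in_frame_0prime(r0)
    results[name] = (velocity(r_prime, t), acceleration(r_prime, t))
    print(name, "v' =", results[name][0], "a' =", results[name][1])
```

For the uniformly moving frame, v’ is just v shifted by a constant, as Eq. (6) suggests, and a’ is zero within round-off, so that frame is inertial as well.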


On the other hand, the 2nd and 3rd Newton’s laws may be postulated together in the following
elegant way. Each particle, say number k, may be characterized by a scalar constant (called mass m_k),
such that at any interaction of N particles (isolated from the rest of the Universe), in any inertial system,

$$\mathbf{P} \equiv \sum_{k=1}^{N} \mathbf{p}_k \equiv \sum_{k=1}^{N} m_k \mathbf{v}_k = \text{const}. \tag{1.8}$$

(Each component of this sum,

$$\mathbf{p}_k \equiv m_k \mathbf{v}_k , \tag{1.9}$$

is called the mechanical momentum⁹ of the corresponding particle, while the sum P is called the total momentum
of the system.)


Let us apply this postulate to just two interacting particles. Differentiating Eq. (8) written for this
case over time, we get

$$\dot{\mathbf{p}}_1 = -\dot{\mathbf{p}}_2 . \tag{1.10}$$

Let us give the derivative $\dot{\mathbf{p}}_1$ (which is a vector) the name of the force F exerted on particle 1. In our
current case, when the only possible source of the force is particle 2, it may be denoted as $\mathbf{F}_{12}$: $\dot{\mathbf{p}}_1 = \mathbf{F}_{12}$.
Similarly, $\mathbf{F}_{21} = \dot{\mathbf{p}}_2$, so that Eq. (10) becomes the 3rd Newton’s law

$$\mathbf{F}_{12} = -\mathbf{F}_{21} . \tag{1.11}$$


Plugging Eq. (1.9) into these force definitions, and differentiating the products $m_k\mathbf{v}_k$, taking into account
that particle masses are constants,¹⁰ we get that for k and k′ taking any of the values 1, 2,


8 Due to the genius of Sir Isaac, these laws were formulated in the same Principia (1687), well ahead of the 
physics of his time. 

9 The more extended term linear momentum is typically used only in cases when there is a chance of confusion
with the angular momentum of the same particle/system — see below. The present-day definition of the linear 
momentum and the term itself belong to John Wallis (1670), but the concept may be traced back to more vague 
notions of several previous scientists — all the way back to at least a 570 AD work by John Philoponus. 
$$m_k \dot{\mathbf{v}}_k \equiv m_k \mathbf{a}_k = \mathbf{F}_{kk'}, \quad \text{where } k' \neq k. \tag{1.12}$$


Now, returning to the general case of several interacting particles, and making an additional (but
very natural) assumption that all partial forces $\mathbf{F}_{kk'}$ acting on particle k add up as vectors, we may
generalize Eq. (12) into the 2nd Newton’s law

$$m_k \mathbf{a}_k = \sum_{k' \neq k} \mathbf{F}_{kk'} \equiv \mathbf{F}_k , \tag{1.13}$$

which allows a clear interpretation of the mass as a measure of a particle’s inertia.
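Here is a minimal Python sketch of how Eq. (13) may be integrated for several particles, and of how the postulate (8) shows up in the result. The spring-like pair forces F_kk′ = −κ(r_k − r_k′), the masses, and the random initial conditions are all assumptions chosen for illustration; the only property that matters is that these forces obey the 3rd law (11), so the total momentum P should stay constant up to round-off errors. The velocity-Verlet scheme used here is one common choice, not the only one.

```python
import numpy as np

def net_forces(r, kappa=1.0):
    """F_k = sum over k' != k of F_kk', Eq. (1.13), for assumed spring-like pair
    forces F_kk' = -kappa (r_k - r_k'), which satisfy F_kk' = -F_k'k, Eq. (1.11)."""
    F = np.zeros_like(r)
    for k in range(len(r)):
        for kp in range(len(r)):
            if kp != k:
                F[k] -= kappa * (r[k] - r[kp])
    return F

def simulate(m, r, v, dt=1e-3, steps=5000):
    """Integrate m_k a_k = F_k with the velocity-Verlet scheme."""
    a = net_forces(r) / m[:, None]
    for _ in range(steps):
        r = r + v * dt + 0.5 * a * dt**2
        a_new = net_forces(r) / m[:, None]
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return r, v

def total_momentum(m, v):
    """Eq. (1.8): P = sum_k m_k v_k."""
    return (m[:, None] * v).sum(axis=0)

rng = np.random.default_rng(seed=1)
m = np.array([1.0, 2.0, 0.5])        # masses m_k (arbitrary choice)
r_init = rng.normal(size=(3, 3))     # initial positions, one row per particle
v_init = rng.normal(size=(3, 3))     # initial velocities

P_before = total_momentum(m, v_init)
r_final, v_final = simulate(m, r_init, v_init)
P_after = total_momentum(m, v_final)
print("P before:", P_before)
print("P after: ", P_after)
```

The individual velocities change substantially during the run, while their mass-weighted sum does not; replacing the pair force with one that violates F_kk′ = −F_k′k would break this balance.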


As a matter of principle, if the dependence of all pair forces $\mathbf{F}_{kk'}$ on particle positions (and
generally on time as well) is known, Eq. (13) augmented with the kinematic relations (3) and (4) allows
calculation of the laws of motion r_k(t) of all particles of the system. For example, for one particle the 2nd
law (13) gives an ordinary differential equation of the second order:

$$m\ddot{\mathbf{r}} = \mathbf{F}(\mathbf{r}, t), \tag{1.14}$$

which may be integrated — either analytically or numerically.


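In certain cases, this is very simple. As an elementary example, for local motions with Δr ≪ r,
Newton’s gravity force¹¹

$$\mathbf{F} = -G\frac{mm'}{R^3}\mathbf{R} \tag{1.15}$$

(where R ≡ r − r′ is the distance vector between particles of masses m and m′)¹² may be approximated as

$$\mathbf{F} = m\mathbf{g}, \tag{1.16}$$

with the vector g = −(Gm′/R³)R being constant.¹³ As a result, the mass m in Eq. (13) cancels, the equation is reduced to just
$\ddot{\mathbf{r}} = \mathbf{g} = \text{const}$, and it may be easily integrated twice:

$$\dot{\mathbf{r}}(t) \equiv \mathbf{v}(t) = \int_0^t \mathbf{g}\,dt' + \mathbf{v}(0) = \mathbf{g}t + \mathbf{v}(0), \qquad \mathbf{r}(t) = \int_0^t \mathbf{v}(t')\,dt' + \mathbf{r}(0) = \frac{\mathbf{g}t^2}{2} + \mathbf{v}(0)t + \mathbf{r}(0), \tag{1.17}$$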


thus giving the generic solution to all those undergraduate problems on the projectile motion, which 
should be so familiar to the reader. 
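For the uniform field (16), a numerical integration of Eq. (14) may be compared directly with the analytical solution (17). In the hypothetical Python example below, the initial conditions and the value g = 9.81 m/s² are assumptions; for a constant force, the velocity-Verlet step reproduces Eq. (17) exactly, so any difference between the two results comes from round-off only. The helper t_land is an illustrative addition that finds the time at which the projectile returns to the plane z = 0.

```python
import numpy as np

g = np.array([0.0, 0.0, -9.81])    # assumed uniform field of Eq. (1.16), m/s^2
r_0 = np.array([0.0, 0.0, 1.5])    # hypothetical r(0), m
v_0 = np.array([12.0, 0.0, 8.0])   # hypothetical v(0), m/s

def r_exact(t):
    """Eq. (1.17): r(t) = g t^2/2 + v(0) t + r(0)."""
    return 0.5 * g * t**2 + v_0 * t + r_0

def r_numeric(t_end, n_steps=1000):
    """Integrate Eq. (1.14) with F = m g, i.e. r'' = g, by velocity Verlet."""
    dt = t_end / n_steps
    r, v = r_0.copy(), v_0.copy()
    for _ in range(n_steps):
        r = r + v * dt + 0.5 * g * dt**2
        v = v + g * dt
    return r

def t_land():
    """Time at which z(t) from Eq. (1.17) returns to zero (the larger root)."""
    a, b, c = 0.5 * g[2], v_0[2], r_0[2]
    return (-b - np.sqrt(b**2 - 4 * a * c)) / (2 * a)

t_star = t_land()
print("landing time:", t_star)
print("exact:  ", r_exact(t_star))
print("numeric:", r_numeric(t_star))
```

For a force that depends on r or t, the same loop would no longer be exact, and its accuracy would have to be checked by decreasing the time step.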


10 Note that this may not be true for composite bodies of varying total mass M (e.g., rockets emitting jets, see
Problem 11); in these cases, the momentum’s derivative may differ from Ma.

11 Introduced in the same famous Principia!

12 The fact that the masses participating in Eqs. (14) and (16) are equal, the so-called weak equivalence principle, 
is actually highly nontrivial, but has been repeatedly verified experimentally with gradually improved relative 
accuracy, currently reaching ~10⁻¹⁴ — see P. Touboul et al., Phys. Rev. Lett. 119, 231101 (2017).

13 Of course, the most important particular case of Eq. (16) is the gravity field near the Earth’s surface. In this
case, using the fact that Eq. (15) remains valid for the gravity field created by a spherically-uniform sphere, we
get $g = GM_E/R_E^2$, where M_E and R_E are the Earth’s mass and radius. Plugging in their values, M_E ≈ 5.97×10²⁴ kg
and R_E ≈ 6.37×10⁶ m, we get g ≈ 9.82 m/s². The experimental value of g varies from 9.78 to 9.83 m/s² at various
locations on the surface (due to the deviations of Earth’s shape from a sphere, and the location-dependent effect of
the centrifugal “inertial force” — see Sec. 4.5 below), with an average value of approximately 9.807 m/s².
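The arithmetic of footnote 13 is easy to reproduce. In this short Python check, the value of the gravitational constant G is an assumption (it is not quoted in the text); the Earth’s mass and radius are the values given in the footnote.

```python
G = 6.674e-11     # gravitational constant, m^3/(kg s^2) (assumed value)
M_E = 5.97e24     # Earth's mass, kg (footnote 13)
R_E = 6.37e6      # Earth's radius, m (footnote 13)

g = G * M_E / R_E**2          # g = G M_E / R_E^2
print(f"g = {g:.2f} m/s^2")   # prints: g = 9.82 m/s^2
```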
All this looks (and indeed is) very simple, but in most other cases, Eq. (13) leads to more
complex calculations. As an example, let us think about how we would use it to solve another simple
problem: a bead of mass m sliding, without friction, along a round ring of radius R in a gravity field
obeying Eq. (16) — see Fig. 3. (This system is equivalent to the usual point pendulum, i.e. a point mass
suspended from point 0 on a light rod or string, and constrained to move in one vertical plane.)


Fig. 1.3. A bead sliding along a vertical ring (figure labels: initial position, v = 0; intermediate position; final position, v = ?; gravity force mg).


Suppose we are only interested in the bead’s velocity v at the lowest point, after it has been
released from rest at the rightmost position. If we want to solve this problem using only Newton’s
laws, we have to take the following steps:


(i) consider the bead in an arbitrary intermediate position on the ring, described, for example, by the angle θ shown in Fig. 3;

(ii) draw all the forces acting on the particle — in our current case, the gravity force mg and the reaction force N exerted by the ring — see Fig. 3 above;

(iii) write the Cartesian components of the 2nd Newton's law (14) for the bead's acceleration: $ma_x = N_x$, $ma_y = N_y - mg$;

(iv) recognize that in the absence of friction, the force N should be normal to the ring, so that we can use two additional equations, $N_x = -N\sin\theta$ and $N_y = N\cos\theta$;

(v) eliminate the unknown variables N, $N_x$, and $N_y$ from the resulting system of four equations, thus getting a single second-order differential equation for one variable, for example, θ:

$$ mR\ddot{\theta} = -mg\sin\theta; \qquad (1.18) $$

(vi) use the mathematical identity $\ddot{\theta} = d(\dot{\theta}^2/2)/d\theta$ to integrate this equation over θ once, to get an expression relating the angular velocity $\dot{\theta}$ and the angle θ; and, finally,

(vii) using our specific initial condition ($\dot{\theta} = 0$ at $\theta = \pi/2$), find the final velocity as $v = R\dot{\theta}$ at $\theta = 0$.
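As a cross-check of steps (v) to (vii), the sketch below integrates Eq. (1.18) numerically instead of analytically. It is a minimal illustration, assuming Python, a fixed-step fourth-order Runge-Kutta integrator (our choice, not part of the procedure above), and the illustrative values g = 9.81 m/s² and R = 1 m; the helper names `rhs` and `speed_at_bottom` are hypothetical.

```python
import math


def rhs(theta, omega, g, R):
    """Eq. (1.18) divided by m*R, written as a first-order system:
    d(theta)/dt = omega, d(omega)/dt = -(g/R) * sin(theta)."""
    return omega, -(g / R) * math.sin(theta)


def speed_at_bottom(g=9.81, R=1.0, dt=1e-4):
    """Release the bead from rest at theta = pi/2 (the rightmost point) and
    integrate with classical RK4 until theta first crosses zero."""
    theta, omega = math.pi / 2, 0.0
    while True:
        k1t, k1w = rhs(theta, omega, g, R)
        k2t, k2w = rhs(theta + 0.5 * dt * k1t, omega + 0.5 * dt * k1w, g, R)
        k3t, k3w = rhs(theta + 0.5 * dt * k2t, omega + 0.5 * dt * k2w, g, R)
        k4t, k4w = rhs(theta + dt * k3t, omega + dt * k3w, g, R)
        new_theta = theta + dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6
        new_omega = omega + dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
        if new_theta <= 0.0:
            # linear interpolation of omega to the crossing point theta = 0
            frac = theta / (theta - new_theta)
            omega += frac * (new_omega - omega)
            return R * abs(omega)          # v = R * |d(theta)/dt| at theta = 0
        theta, omega = new_theta, new_omega


print(speed_at_bottom())
```

The returned speed should agree, to within the integration error, with $(2gR)^{1/2} \approx 4.43$ m/s, the value that follows from the energy conservation discussed in the next section. The mass m cancels from Eq. (1.18), so it does not enter the code.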


All this is very much doable, but please agree that the procedure is too cumbersome for such a simple problem. Moreover, in many other cases even writing the equations of motion along the relevant coordinates is very complex, and any help the general theory may provide is highly valuable. In many cases, such help is given by conservation laws; let us review the most general of them.
1.3. Conservation laws


(i) Energy conservation is arguably the most general law of physics, but in mechanics, it takes a more humble form of mechanical energy conservation, which has limited applicability. To derive it, we first have to define the kinetic energy of a particle as¹⁴


$$ T \equiv \frac{mv^2}{2}, \qquad (1.19) $$


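and then recast its differential as¹⁵

$$ dT = d\left(\frac{mv^2}{2}\right) = d\left(\frac{m}{2}\,\mathbf{v}\cdot\mathbf{v}\right) = m\mathbf{v}\cdot d\mathbf{v} = m\frac{d\mathbf{r}}{dt}\cdot d\mathbf{v} = d\mathbf{r}\cdot\frac{d(m\mathbf{v})}{dt} = d\mathbf{r}\cdot\frac{d\mathbf{p}}{dt}. \qquad (1.20) $$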


Now plugging in the momentum's derivative from the 2nd Newton's law, dp/dt = F, where F is the full force acting on the particle, we get dT = F·dr. The integration of this equality along the particle's trajectory connecting some points A and B gives the formula that is sometimes called the work-energy principle:


$$ \Delta T \equiv T(\mathbf{r}_B) - T(\mathbf{r}_A) = \int_A^B \mathbf{F}\cdot d\mathbf{r}, \qquad (1.21) $$


where the integral on the right-hand side is called the work of the force F on the path from A to B. 


The next step may be made only for a potential (also called “conservative”) force that may be represented as the (minus) gradient of some scalar function U(r), called the potential energy.¹⁶ The vector operator ∇ (called either del or nabla) of spatial differentiation¹⁷ allows a very compact expression of this fact:


$$ \mathbf{F} = -\nabla U. \qquad (1.22) $$
For example, for the uniform gravity field (16), 


$$ U = mgh + \text{const}, \qquad (1.23) $$
where h is the vertical coordinate directed “up” — opposite to the direction of the vector g.


Integrating the tangential component $F_\tau$ of the vector F given by Eq. (22) along an arbitrary path connecting the points A and B, we get

$$ \int_A^B F_\tau\, dr = \int_A^B \mathbf{F}\cdot d\mathbf{r} = U(\mathbf{r}_A) - U(\mathbf{r}_B), \qquad (1.24) $$


14 In such quantitative form, the kinetic energy was introduced (under the name “living force”) by Gottfried Leibniz and Johann Bernoulli (circa 1700), though its main properties (21) and (27) had not been clearly revealed until an 1829 work by Gaspard-Gustave de Coriolis. The modern term “kinetic energy” was coined only in 1849-1851 by Lord Kelvin (born William Thomson).

15 In these notes, a·b denotes the scalar (or “dot-”) product of vectors a and b — see, e.g., MA Eq. (7.1).

16 Note that because of its definition via the gradient, the potential energy is only defined up to an arbitrary additive constant. This notion had been used already by G. Leibniz, though the term we are using for it nowadays was introduced much later (in the mid-19th century) by William Rankine.

17 Its basic properties are listed in MA Sec. 8.
i.e. the work of potential forces may be represented as the difference of the values of the function U(r) at the initial and final points of the path. (Note that according to Eq. (24), the work of a potential force on any closed path, with $\mathbf{r}_A = \mathbf{r}_B$, is zero.)
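Eq. (24) can be illustrated with a short numerical line integral. The sketch below (Python; the mass, the value of g, the end points A and B, and the two trial paths are arbitrary illustrative choices, and all helper names are hypothetical) computes the work of the uniform gravity force along a straight path and along a curved path with the same end points; both results should coincide with $U(\mathbf{r}_A) - U(\mathbf{r}_B)$ computed from Eq. (23).

```python
import math


def work(force, path, n=1000):
    """Midpoint-rule approximation of the line integral of force . dr
    along a 2D path r(s) = (x, h), with s running from 0 to 1."""
    total = 0.0
    for i in range(n):
        x0, h0 = path(i / n)
        x1, h1 = path((i + 1) / n)
        fx, fh = force(0.5 * (x0 + x1), 0.5 * (h0 + h1))
        total += fx * (x1 - x0) + fh * (h1 - h0)
    return total


M, G_ACC = 2.0, 9.81        # illustrative mass (kg) and g (m/s^2)


def gravity(x, h):
    """Uniform gravity force; the coordinate h is directed 'up'."""
    return 0.0, -M * G_ACC


def potential(x, h):
    """Eq. (1.23) with the additive constant set to zero."""
    return M * G_ACC * h


A, B = (0.0, 3.0), (4.0, 0.0)


def straight(s):
    return A[0] + s * (B[0] - A[0]), A[1] + s * (B[1] - A[1])


def wiggly(s):
    # same end points as `straight`, but bulging upward in between
    x, h = straight(s)
    return x, h + 2.0 * math.sin(math.pi * s)


print(work(gravity, straight), work(gravity, wiggly),
      potential(*A) - potential(*B))
```

For a uniform force the midpoint rule is exact, so any difference between the printed numbers is pure round-off; for a position-dependent potential force, the two path integrals would agree only to within the discretization error.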


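Now returning to Eq. (21) and comparing it with Eq. (24), we see that

$$ T(\mathbf{r}_B) - T(\mathbf{r}_A) = U(\mathbf{r}_A) - U(\mathbf{r}_B), \quad \text{i.e.} \quad T(\mathbf{r}_B) + U(\mathbf{r}_B) = T(\mathbf{r}_A) + U(\mathbf{r}_A), \qquad (1.25) $$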


so that the total mechanical energy E, defined as

$$ E \equiv T + U, \qquad (1.26) $$

is indeed conserved:

$$ E(\mathbf{r}_A) = E(\mathbf{r}_B), \qquad (1.27) $$

but for conservative forces only. (Non-conservative forces may change E either by transferring energy from its mechanical form to another form, e.g., to heat in the case of friction, or by pumping energy into the system under consideration from another, “external” system.)


The mechanical energy conservation allows us to return for just a second to the problem shown in Fig. 3 and solve it in one shot by writing Eq. (27) for the initial and final points:¹⁸

$$ 0 + mgR = \frac{mv^2}{2} + 0. \qquad (1.28) $$

The (elementary) solution of Eq. (28) for v immediately gives us the desired answer. Let me hope that the reader agrees that this way of solving the problem is much simpler, and that I have earned their attention to discuss other conservation laws — which may be equally effective.
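For completeness, here is the elementary solution of Eq. (28) as a tiny Python function; the function name is hypothetical, and the numerical values of g and R in the usage line are illustrative assumptions.

```python
import math


def final_speed(g, R):
    """Solve Eq. (1.28), 0 + m*g*R = m*v**2/2 + 0, for v.
    The mass m cancels out, so it is not needed as an argument."""
    return math.sqrt(2.0 * g * R)


v = final_speed(9.81, 1.0)     # g in m/s^2, R in m (illustrative values)
print(f"v = {v:.3f} m/s")
```

Compared with steps (i) to (vii) of Sec. 1.2, the reaction force N never appears: it does no work, because it is always perpendicular to the bead's velocity.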


(ii) Linear momentum. The conservation of the full linear momentum of any system of particles isolated from the rest of the world was already discussed in the previous section, and may serve as the basic postulate of classical dynamics — see Eq. (8). In the case of one free particle, the law is reduced to the trivial result p = const, i.e. v = const. If a system of N particles is affected by external forces $\mathbf{F}_k^{(\text{ext})}$, we may write

$$ \mathbf{F}_k = \mathbf{F}_k^{(\text{ext})} + \sum_{k'=1}^{N} \mathbf{F}_{kk'}. \qquad (1.29) $$
If we sum up the resulting Eqs. (13) for all particles of the system then, due to the 3rd Newton's law (11), valid for any indices k ≠ k', the contributions of all internal forces $\mathbf{F}_{kk'}$ to the resulting double sum on the right-hand side cancel, and we get the following equation:


$$ \dot{\mathbf{P}} = \mathbf{F}^{(\text{ext})}, \quad \text{where} \quad \mathbf{F}^{(\text{ext})} \equiv \sum_{k=1}^{N} \mathbf{F}_k^{(\text{ext})}. \qquad (1.30) $$


It tells us that the translational motion of the system as a whole is similar to that of a single particle under the effect of the net external force $\mathbf{F}^{(\text{ext})}$. As a simple sanity check, if the external forces have a zero sum, we return to the postulate (8). Just one reminder: Eq. (30), like its precursor Eq. (13), is only valid in an inertial reference frame.
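A quick numerical illustration of Eq. (30): in the Python sketch below (all numbers are arbitrary illustrative choices, and the function name is hypothetical), two particles on a line interact through an internal spring force obeying the 3rd Newton's law and are also subject to constant external forces. Whatever the internal motion, the total momentum should change exactly as $F^{(\text{ext})}t$.

```python
def momentum_change_demo(steps=10_000, dt=1e-3):
    """Two particles on a line, coupled by an internal spring force and pushed
    by constant external forces. Returns (P_initial, P_final, elapsed time)."""
    m = [1.0, 3.0]                  # masses
    x = [0.0, 1.5]                  # positions
    v = [0.4, -0.1]                 # velocities
    k, length0 = 5.0, 1.0           # spring constant and natural length
    f_ext = [0.2, -0.5]             # constant external forces on particles 1 and 2

    def total_momentum():
        return sum(mi * vi for mi, vi in zip(m, v))

    p_initial = total_momentum()
    for _ in range(steps):
        f12 = k * ((x[1] - x[0]) - length0)          # internal force on particle 1
        forces = [f_ext[0] + f12, f_ext[1] - f12]    # 3rd Newton's law: F_21 = -F_12
        for i in range(2):
            v[i] += forces[i] / m[i] * dt
        for i in range(2):
            x[i] += v[i] * dt
    return p_initial, total_momentum(), steps * dt


p0, p1, t = momentum_change_demo()
print(p1 - p0)    # close to (0.2 - 0.5) * 10 = -3.0
```

The internal spring force drives a complicated relative oscillation, but because it enters the two particles' equations with opposite signs, it drops out of the total momentum at every step.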


18 Here the arbitrary constant in Eq. (23) is chosen so that the potential energy is zero at the final point. 
I hope that the reader knows numerous examples of the application of the linear momentum's conservation law, including all those undergraduate problems on car collisions, where the large collision forces are typically not known, so that the direct application of Eq. (13) to each car is impracticable.


(iii) The angular momentum of a particle¹⁹ is defined as the following vector:²⁰


$$ \mathbf{L} \equiv \mathbf{r}\times\mathbf{p}, \qquad (1.31) $$


where a×b means the vector (or “cross-”) product of the vector operands.²¹ Differentiating Eq. (31) over time, we get

$$ \dot{\mathbf{L}} = \dot{\mathbf{r}}\times\mathbf{p} + \mathbf{r}\times\dot{\mathbf{p}}. \qquad (1.32) $$


In the first product, $\dot{\mathbf{r}}$ is just the velocity vector v, parallel to the particle momentum p = mv, so that this term vanishes since the vector product of any two parallel vectors equals zero. In the second product, $\dot{\mathbf{p}}$ is equal to the full force F acting on the particle, so that Eq. (32) is reduced to

$$ \dot{\mathbf{L}} = \boldsymbol{\tau}, \qquad (1.33) $$

where the vector

$$ \boldsymbol{\tau} \equiv \mathbf{r}\times\mathbf{F} \qquad (1.34) $$

is called the torque exerted by force F.²² (Note that the torque is reference-frame specific — and again, the frame has to be inertial for Eq. (33) to be valid, because we have used Eq. (13) for its derivation.) For an important particular case of a central force F that is directed along the radius vector r of a particle, the torque vanishes, so that (in that particular reference frame only!) the angular momentum is conserved:

$$ \mathbf{L} = \text{const}. \qquad (1.35) $$

For a system of N particles, the total angular momentum is naturally defined as

$$ \mathbf{L} \equiv \sum_{k=1}^{N} \mathbf{L}_k. \qquad (1.36) $$

Differentiating this equation over time, using Eq. (33) for each $\dot{\mathbf{L}}_k$, and again partitioning each force per Eq. (29), we get

$$ \dot{\mathbf{L}} = \sum_{\substack{k,k'=1\\ k'\neq k}}^{N} \mathbf{r}_k\times\mathbf{F}_{kk'} + \boldsymbol{\tau}^{(\text{ext})}, \quad \text{where} \quad \boldsymbol{\tau}^{(\text{ext})} \equiv \sum_{k=1}^{N} \mathbf{r}_k\times\mathbf{F}_k^{(\text{ext})}. \qquad (1.37) $$


The first (double) sum may always be divided into pairs of the type $(\mathbf{r}_k\times\mathbf{F}_{kk'} + \mathbf{r}_{k'}\times\mathbf{F}_{k'k})$. With a natural assumption of the central forces, $\mathbf{F}_{kk'} \parallel (\mathbf{r}_k - \mathbf{r}_{k'})$, each of these pairs equals zero. Indeed, in this case,

19 Here we imply that the internal motions of the particle, including its rotation about its axis, are negligible. 
(Otherwise, it could not be represented by a point, as was postulated in Sec. 1.) 

20 This explicit definition of the angular momentum (in different mathematical forms, and under the name of “moment of rotational motion”) appeared in scientific publications only in the 1740s, though the fact of its conservation (35) in the field of central forces, in the form of the 2nd Kepler law (see Fig. 3.4 below), had been proved already by I. Newton in his Principia.

21 See, e.g., MA Eq. (7.3). 

22 Alternatively, especially in mechanical engineering, torque is called the force moment. This notion may be traced all the way back to Archimedes' theory of levers developed in the 3rd century BC.
each component of the pair is a vector perpendicular to the plane containing the positions of both particles and the reference frame origin, i.e. to the plane of the drawing in Fig. 4.


Fig. 1.4. Internal and external forces, and the internal torque cancellation in a system of two particles.


Also, due to the 3rd Newton's law (11), these two forces are equal and opposite, and the magnitude of each term in the sum may be represented as $|\mathbf{F}_{kk'}|\,h_{kk'}$, with equal “lever arms” $h_{kk'} = h_{k'k}$. As a result, each sum $(\mathbf{r}_k\times\mathbf{F}_{kk'} + \mathbf{r}_{k'}\times\mathbf{F}_{k'k})$, and hence the whole double sum in Eq. (37), vanishes, and Eq. (37) is reduced to a very simple result,

$$ \dot{\mathbf{L}} = \boldsymbol{\tau}^{(\text{ext})}, \qquad (1.38) $$

which is similar to Eq. (33) for a single particle, and is the angular analog of Eq. (30).


In particular, Eq. (38) shows that if the full external torque $\boldsymbol{\tau}^{(\text{ext})}$ vanishes for some reason (e.g., if the system of particles is isolated from the rest of the Universe), the conservation law (35) is valid for the full angular momentum L even if its individual components $\mathbf{L}_k$ are not conserved due to inter-particle interactions.
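The statement above can be checked numerically. The Python sketch below (masses, spring parameters, initial conditions, and step size are arbitrary illustrative choices; the function name is hypothetical) follows two particles in a plane, coupled only by a central internal spring force, with no external torque. It uses a simple symplectic-Euler (kick, then drift) update, which for central pair forces preserves the total angular momentum up to round-off, and it records the largest drift of the total $L_z$ together with the largest change of the first particle's own $L_z$.

```python
def angular_momentum_check(steps=20_000, dt=5e-4):
    """Two particles in the x-y plane coupled by a central spring force.
    Returns (max drift of total L_z, max change of particle 1's own L_z)."""
    m = [1.0, 2.0]
    r = [[0.0, 0.0], [1.0, 0.0]]
    v = [[0.0, 0.5], [0.3, -0.2]]
    k, length0 = 4.0, 0.8               # spring constant and natural length

    def lz(i):
        # z-component of r_i x p_i
        return m[i] * (r[i][0] * v[i][1] - r[i][1] * v[i][0])

    total0, own0 = lz(0) + lz(1), lz(0)
    max_total_drift = max_own_change = 0.0
    for _ in range(steps):
        dx, dy = r[1][0] - r[0][0], r[1][1] - r[0][1]
        d = (dx * dx + dy * dy) ** 0.5
        s = k * (d - length0) / d
        f = (s * dx, s * dy)            # force on particle 1, directed along r2 - r1
        for c in range(2):              # "kick": F_21 = -F_12 (3rd Newton's law)
            v[0][c] += f[c] / m[0] * dt
            v[1][c] -= f[c] / m[1] * dt
        for i in range(2):              # "drift"
            for c in range(2):
                r[i][c] += v[i][c] * dt
        max_total_drift = max(max_total_drift, abs(lz(0) + lz(1) - total0))
        max_own_change = max(max_own_change, abs(lz(0) - own0))
    return max_total_drift, max_own_change


drift, own_change = angular_momentum_check()
print(drift, own_change)
```

The first printed number should stay at round-off level, while the second one should not, in accordance with the remark that the individual $\mathbf{L}_k$ need not be conserved.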


Please note again that since the conservation laws may be derived from Newton’s laws (as was 
done above), they do not introduce anything new to the dynamics of any system. Indeed, from the 
mathematical point of view, the conservation laws discussed above are just the first integrals of the 
second-order differential equations of motion following from Newton’s laws. However, for a physicist, 
thinking about particular systems in terms of the conserved (or potentially conserved) quantities frequently provides decisive clues to their dynamics.


1.4. Potential energy and equilibrium 


Another important role of the potential energy U, especially for dissipative systems whose total 
mechanical energy E is not conserved because it may be drained to the environment, is finding the 
positions of equilibrium (sometimes called the fixed points) of the system and analyzing their stability 
with respect to small perturbations. For a single particle, this is very simple: the force (22) vanishes at each extremum (either minimum or maximum) of the potential energy.²³ (Of those fixed points, only the minima of U(r) are stable — see Sec. 3.2 below for a discussion of this point.)
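In one dimension, the search for fixed points and their classification can be sketched in a few lines of Python. In the minimal example below, the double-well potential U(x) = x⁴/4 − x²/2 is a hypothetical choice made only for illustration, and the scan-and-bisect root search is just one simple way to find the zeros of the force F = −dU/dx; each zero is then classified by the sign of d²U/dx², so that minima are reported as stable and maxima as unstable.

```python
def find_equilibria(dU, d2U, x_min, x_max, n=1000, tol=1e-12):
    """Locate the zeros of the force F = -dU/dx on [x_min, x_max] by scanning
    for sign changes of dU/dx and bisecting, then classify each fixed point
    by the sign of the curvature d2U/dx2."""
    xs = [x_min + (x_max - x_min) * i / n for i in range(n + 1)]
    points = []
    for a, b in zip(xs, xs[1:]):
        fa, fb = dU(a), dU(b)
        if fa == 0.0:
            root = a
        elif fa * fb < 0.0:
            lo, hi = a, b
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if dU(lo) * dU(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            root = 0.5 * (lo + hi)
        else:
            continue
        curvature = d2U(root)
        if curvature > 0:
            kind = "stable (minimum)"
        elif curvature < 0:
            kind = "unstable (maximum)"
        else:
            kind = "marginal (higher derivatives needed)"
        points.append((root, kind))
    return points


def dU(x):
    """dU/dx for the hypothetical double well U(x) = x**4/4 - x**2/2."""
    return x**3 - x


def d2U(x):
    return 3 * x**2 - 1


for x0, kind in find_equilibria(dU, d2U, -2.0, 2.0):
    print(f"x = {x0:+.6f}: {kind}")
```

The sketch assumes simple (non-degenerate) roots; a double root of dU/dx, where the force does not change sign, would be missed by the sign-change scan.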


A slightly more subtle case is a particle with an internal potential energy U(r), subjected to an additional external force $\mathbf{F}^{(\text{ext})}(\mathbf{r})$. In this case, the stable equilibrium is reached at the minimum of not the function U(r), but of what is sometimes called the Gibbs potential energy


23 Assuming that the additional, non-conservative forces (such as viscosity) responsible for the mechanical energy drain vanish at equilibrium — as they typically do. (Static friction is one counter-example.)
$$ U_G(\mathbf{r}) \equiv U(\mathbf{r}) - \int^{\mathbf{r}} \mathbf{F}^{(\text{ext})}(\mathbf{r}')\cdot d\mathbf{r}', \qquad (1.39) $$


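which is defined, just as U(r) is, only up to an arbitrary additive constant.²⁴ The proof of Eq. (39) is very simple: at an extremum of this function, the total force acting on the particle,

$$ \mathbf{F}_\Sigma = \mathbf{F} + \mathbf{F}^{(\text{ext})} = -\nabla U + \nabla\int^{\mathbf{r}} \mathbf{F}^{(\text{ext})}(\mathbf{r}')\cdot d\mathbf{r}' \equiv -\nabla U_G, \qquad (1.40) $$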


vanishes, as it is necessary for equilibrium. 


Physically, the difference $U_G - U$ specified by Eq. (39) is the r-dependent part of the potential energy $U^{(\text{ext})}$ of the external system responsible for the force $\mathbf{F}^{(\text{ext})}$, so that $U_G$ is just the total potential energy $U + U^{(\text{ext})}$, excluding its part that does not depend on r and hence is irrelevant for the analysis. According to the 3rd Newton's law, the force exerted by the particle on the external system equals $(-\mathbf{F}^{(\text{ext})})$, so that its work (and hence the change of $U^{(\text{ext})}$ due to the change of r) is given by the second term on the right-hand side of Eq. (39). Thus the condition of equilibrium, $\nabla U_G = 0$, is just the condition of an extremum of the total potential energy, $U + U^{(\text{ext})} + \text{const}$, of the two interacting systems.


For the simplest (and very frequent) case when the applied force is independent of the particle's position, the Gibbs potential energy (39) is just²⁵

$$ U_G(\mathbf{r}) = U(\mathbf{r}) - \mathbf{F}^{(\text{ext})}\cdot\mathbf{r} + \text{const}. \qquad (1.41) $$


As the simplest example, consider a 1D deformation of the usual elastic spring providing the restoring force (−κx), where x is the deviation from its equilibrium. As follows from Eq. (22), its potential energy is U = κx²/2 + const, so that its minimum corresponds to x = 0. Now let us apply an additional external force F, say independent of x. Then the equilibrium deformation of the spring, x₀ = F/κ, corresponds to the minimum of not U, but rather of the Gibbs potential energy (41), in our particular case taking the form

$$ U_G \equiv U - Fx = \frac{\kappa x^2}{2} - Fx. \qquad (1.42) $$
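The statement that the equilibrium sits at the minimum of $U_G$ rather than of U is easy to verify numerically. The Python sketch below uses golden-section search (our choice of a simple derivative-free minimizer, not anything prescribed by the text) together with the illustrative values κ = 40 N/m and F = 3 N; the minimum of Eq. (42) should land at x₀ = F/κ = 0.075 m, while the minimum of U alone stays at x = 0.

```python
def golden_section_min(f, a, b, tol=1e-10):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2            # 1/phi, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):                     # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                               # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


kappa, F = 40.0, 3.0      # illustrative spring constant (N/m) and external force (N)


def U(x):
    """Spring energy, with the additive constant dropped."""
    return kappa * x**2 / 2


def U_G(x):
    """Gibbs potential energy, Eq. (1.42)."""
    return U(x) - F * x


x_eq = golden_section_min(U_G, -1.0, 1.0)
print(x_eq, F / kappa)    # both close to 0.075
```

Minimizing U instead of U_G in the same call would return x ≈ 0, i.e. it would simply ignore the applied force.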


1.5. OK, we’ve got it — can we go home now? 


Sorry, not yet. In many cases, the conservation laws discussed above provide little help, even in systems without dissipation. As a simple example, consider a generalization of the bead-on-the-ring problem shown in Fig. 3, in which the ring is rotated by external forces, with a constant angular velocity ω, about its vertical diameter.²⁶ In this problem (to which I will repeatedly return below, using it as an


24 Unfortunately, in most textbooks, the association of the (unavoidably used) notion of $U_G$ with the glorious name of Josiah Willard Gibbs is postponed until a course of statistical mechanics and/or thermodynamics, where $U_G$ is a part of the Gibbs free energy, in contrast to U, which is a part of the Helmholtz free energy — see, e.g., SM Sec. 1.4. I use this notion throughout my series, because the difference between $U_G$ and U, and hence that between the Gibbs and Helmholtz free energies, has nothing to do with statistics or thermal motion, and belongs to the whole of physics, including not only mechanics but also electrodynamics and quantum mechanics.

25 Eq. (41) is a particular case of what mathematicians call the Legendre transformations. 

26 This is essentially a simplified model of the mechanical control device called the centrifugal (or “flyball”, or “centrifugal flyball”) governor — see, e.g., http://en.wikipedia.org/wiki/Centrifugal_governor. (Sometimes the
analytical mechanics “testbed”), none of the three conservation laws listed in the last section holds. In particular, the bead's energy,


$$ E = \frac{mv^2}{2} + mgh, \qquad (1.43) $$


is not constant, because the external forces rotating the ring may change it. Of course, we still can solve the problem using Newton's laws, but this is even more complex than for the above case of the ring at rest, in particular because the force N exerted on the bead by the ring now may have three rather than two Cartesian components, which are not simply related. On the other hand, it is clear that the bead still has just one degree of freedom (say, the angle θ), so its dynamics should not be too complicated.


This case gives us a clue to how situations like this one can be simplified: if we could only exclude in advance the so-called reaction forces, such as N, that account for the external constraints imposed on the particle's motion, that should help a lot. Such a constraint exclusion may be provided by analytical mechanics, in particular its Lagrangian formulation, to which we will now proceed.


Of course, the value of the Lagrangian approach goes far beyond simple systems such as the bead on a rotating ring. Indeed, this system has just two externally imposed constraints: the fixed distance of the bead from the center of the ring, and the instant angle of rotation of the ring about its vertical diameter. Now let us consider the motion of a rigid body. It is essentially a system of a very large number, N >> 1, of particles (~10²³ of them if we think about atoms in a 1-cm-scale body). If the only way to analyze its motion were to write Newton's laws for each of the particles, the situation would be completely hopeless. Fortunately, the number of constraints imposed on its motion is almost similarly huge. (At negligible deformations of the body, the distances between each pair of its particles should be constant.) As a result, the number of actual degrees of freedom of such a body is small (at negligible deformations, just six — see Sec. 4.1), so that with the kind help from analytical mechanics, the motion of the body may be, in many important cases, analyzed even without numerical calculations.


One more important motivation for analytical mechanics is given by the dynamics of “non- 
mechanical” systems, for example, of the electromagnetic field — possibly interacting with charged 
particles, conducting bodies, etc. In many such systems, the easiest (and sometimes the only practicable) 
way to find the equations of motion is to derive them from either the Lagrangian or Hamiltonian 
function of the system. Moreover, the Hamiltonian formulation of analytical mechanics (to be reviewed in Chapter 10 below) offers a direct pathway to deriving the quantum-mechanical Hamiltonian operators of various systems, which are necessary for the analysis of their quantum properties.


1.6. Self-test problems 


1.1. A bicycle, ridden with velocity v on wet pavement, has no mudguards on its wheels. How 
far behind should the following biker ride to avoid being splashed over? Neglect the air resistance 
effects. 


device is called the “Watt's governor”, after the famous James Watt who used it in 1788 in one of his first steam engines, though it had been used in European windmills at least since the early 1600s.) Just as a curiosity: the now-ubiquitous term cybernetics was coined by Norbert Wiener in 1948 from the word “governor” (or rather from its Ancient-Greek original κυβερνήτης) exactly in this meaning, because the centrifugal governor had been the first well-studied control device.
1.2. Two round disks of radius R are firmly connected with a coaxial cylinder of a smaller radius r, and a thread is wound on the resulting spool. The spool is placed on a horizontal surface, and the thread's end is being pulled out at angle φ — see the figure on the right. Assuming that the spool does not slip on the surface, in what direction would it roll?


1.3.* Calculate the equilibrium shape of a flexible heavy rope of length l, with a constant mass μ per unit length, if it is hung in a uniform gravity field between two points separated by a horizontal distance d — see the figure on the right.


1.4. A uniform, long, thin bar is placed horizontally on two similar round cylinders rotating toward each other with the same angular velocity ω and displaced by distance d — see the figure on the right. Calculate the laws of relatively slow horizontal motion of the bar within the plane of the drawing, for both possible directions of cylinder rotation, assuming that the friction force between the slipping surfaces of the bar and each cylinder obeys the simple Coulomb approximation²⁷ |F| = μN, where N is the normal pressure force between them, and μ is a constant (velocity-independent) coefficient. Formulate the condition of validity of your result.


1.5. A small block slides, without friction, down a smooth slide that ends with a round loop of radius R — see the figure on the right. What smallest initial height h allows the block to make its way around the loop without dropping from the slide if it is launched with negligible initial velocity?


1.6. A satellite of mass m is being launched from height H over the surface of a spherical planet with radius R and mass M >> m — see the figure on the right. Find the range of initial velocities v₀ (normal to the radius) providing closed orbits above the planet's surface.


1.7. Prove that the thin-uniform-disk model of a galaxy describes small sinusoidal (“harmonic”) oscillations of stars inside it, along the direction normal to the disk, and calculate the frequency of these oscillations in terms of Newton's gravitational constant G and the density ρ of the disk's matter.


27 It was suggested in 1785 by the same Charles-Augustin de Coulomb who discovered the famous Coulomb law of electrostatics, and hence pioneered the whole quantitative science of electricity — see EM Ch. 1.
1.8. Derive differential equations of motion for small oscillations of two similar pendula coupled with a spring (see the figure on the right), within their common vertical plane. Assume that at the vertical position of both pendula, the spring is not stretched (ΔL = 0).


1.9. One of the popular futuristic concepts of travel is digging a straight railway tunnel through 
the Earth and letting a train go through it, without initial velocity — driven only by gravity. Calculate the 
train's travel time through such a tunnel, assuming that the Earth's density ρ is constant, and neglecting
the friction and planet-rotation effects. 


1.10. A small bead of mass m may slide, without friction, along a light string stretched with a force much larger than mg between two points separated by a horizontal distance 2d — see the figure on the right. Calculate the frequency of horizontal oscillations of the bead about its equilibrium position.


1.11. For a rocket accelerating due to its working jet motor (and hence spending the jet fuel), 
calculate the relation between its velocity and the remaining mass. 


Hint: For the sake of simplicity, consider the 1D motion. 
1.12. Prove the following virial theorem:²⁸ for a set of N particles performing a periodic motion,

$$ \overline{T} = -\frac{1}{2}\sum_{k=1}^{N}\overline{\mathbf{F}_k\cdot\mathbf{r}_k}, $$

where the top bar means averaging over time — in this case over the motion period. What does the virial theorem say about:

(i) a 1D motion of a particle in the confining potential²⁹ $U(x) = ax^{2s}$, with a > 0 and s > 0, and
(ii) an orbital motion of a particle in the central potential U(r) = −C/r?


Hint: Explore the time derivative of the following scalar function of time: $G(t) \equiv \sum_{k=1}^{N}\mathbf{p}_k\cdot\mathbf{r}_k$.


1.13. As will be discussed in Chapter 8, if a body moves through a fluid with a sufficiently high velocity v, the fluid's drag force is approximately proportional to v². Use this approximation (introduced
by Sir Isaac Newton himself) to find the velocity as a function of time during the body’s vertical fall in 
the air near the Earth’s surface. 


1.14. A particle of mass m, moving with velocity u, collides head-on with a particle of mass M, 
initially at rest. Calculate the velocities of both particles after the collision, if the initial energy of the 
system is barely sufficient for an increase of its internal energy by ΔE.


28 It was first stated by Rudolf Clausius in 1870. 

29 Here and below I am following the (regretful) custom of using the single word “potential” for the potential 
energy of the particle — just for brevity. This custom is also common in quantum mechanics, but in 
electrodynamics, these two notions should be clearly distinguished — as they are in the EM part of this series. 
Chapter 2. Lagrangian Analytical Mechanics


The goal of this chapter is to describe the Lagrangian formalism of analytical mechanics, which is 
extremely useful for obtaining the differential equations of motion (and sometimes their first integrals) 
not only for mechanical systems with holonomic constraints but also for some other dynamic systems. 


2.1. Lagrange equations 


In many cases, the constraints imposed on the 3D motion of a system of N particles may be described by N vector (i.e. 3N scalar) algebraic equations

$$ \mathbf{r}_k = \mathbf{f}_k(q_1, q_2, \ldots, q_j, \ldots, q_J, t), \quad \text{with } 1 \le k \le N, \qquad (2.1) $$

where $q_j$ are certain generalized coordinates that (together with constraints) completely define the system position. Their number J ≤ 3N is called the number of the actual degrees of freedom of the system. The constraints that allow such a description are called holonomic.¹


For example, for the problem already mentioned in Section 1.5, namely the bead sliding along a rotating ring (Fig. 1), J = 1, because with the constraints imposed by the ring, the bead's position is uniquely determined by just one generalized coordinate — for example, its polar angle θ.


Fig. 2.1. A bead on a rotating ring as an example of a system with just one degree of freedom (J = 1).


Indeed, selecting the reference frame as shown in Fig. 1 and using the well-known formulas for the spherical coordinates,² we see that in this case, Eq. (1) has the form

$$ \mathbf{r} \equiv \{x, y, z\} = \{R\sin\theta\cos\varphi,\ R\sin\theta\sin\varphi,\ R\cos\theta\}, \quad \text{where } \varphi = \omega t + \text{const}, \qquad (2.2) $$


with the last constant depending on the exact selection of the axes x and y and the time origin. Since the angle φ, in this case, is a fixed function of time, and R is a fixed constant, the particle's position in space


1 Possibly, the simplest counter-example of a non-holonomic constraint is a set of inequalities describing the hard walls confining the motion of particles in a closed volume. Non-holonomic constraints are better dealt with by other methods, e.g., by imposing proper boundary conditions on the (otherwise unconstrained) motion.

2 See, e.g., MA Eq. (10.7). 
at any instant t is completely determined by the value of its only generalized coordinate θ. (Note that its dimensionality is different from that of Cartesian coordinates!)


Now returning to the general case of J degrees of freedom, let us consider a set of small variations (alternatively called “virtual displacements”) $\delta q_j$ allowed by the constraints. Virtual displacements differ from the actual small displacements (described by differentials $dq_j$ proportional to the time variation dt) in that $\delta q_j$ describes not the system's motion as such, but rather its possible variation — see Fig. 2.


Fig. 2.2. Actual displacement $dq_j$ vs. the virtual one (i.e. variation) $\delta q_j$; the two curves show the actual motion and a possible motion.


Generally, operations with variations are the subject of a special field of mathematics, the calculus of variations.³ However, the only math background necessary for our current purposes is the understanding that operations with variations are similar to those with the usual differentials, though we need to watch carefully what each variable is a function of. For example, if we consider the variation of the radius vectors (1), at a fixed time t, as functions of independent variations $\delta q_j$, we may use the usual formula for the differentiation of a function of several arguments:⁴

$$ \delta\mathbf{r}_k = \sum_j \frac{\partial\mathbf{r}_k}{\partial q_j}\,\delta q_j. \qquad (2.3) $$
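For the bead on the rotating ring, Eq. (3) can be checked directly against Eq. (2). In the Python sketch below, the values of R and ω, the choice const = 0 in φ = ωt + const, and the sample values of θ, t, and δθ are illustrative assumptions, and the helper names are hypothetical. The exact change of r produced by a small variation δθ at fixed t differs from the linear expression $(\partial\mathbf{r}/\partial\theta)\,\delta\theta$ only by terms of the order of δθ².

```python
import math

R, OMEGA = 1.0, 2.0        # illustrative ring radius and angular velocity


def position(theta, t):
    """Eq. (2.2), with the constant in phi = omega*t + const set to zero."""
    phi = OMEGA * t
    return (R * math.sin(theta) * math.cos(phi),
            R * math.sin(theta) * math.sin(phi),
            R * math.cos(theta))


def dr_dtheta(theta, t):
    """Analytic partial derivative of Eq. (2.2) over theta, taken at fixed t."""
    phi = OMEGA * t
    return (R * math.cos(theta) * math.cos(phi),
            R * math.cos(theta) * math.sin(phi),
            -R * math.sin(theta))


theta, t, d_theta = 0.7, 0.3, 1e-6
exact = [b - a for a, b in zip(position(theta, t), position(theta + d_theta, t))]
linear = [c * d_theta for c in dr_dtheta(theta, t)]     # Eq. (2.3) with J = 1
err = max(abs(e - l) for e, l in zip(exact, linear))
print(err)     # of the order of d_theta**2
```

Note that the time t is held fixed in both evaluations of `position`: this is exactly what distinguishes the variation δr from the actual displacement dr of the bead, which would also include the ring's rotation.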


Now let us break the force acting upon the kth particle into two parts: the frictionless, constraining part $\mathbf{N}_k$ of the reaction force and the remaining part $\mathbf{F}_k$ — including the forces from other sources and possibly the frictional part of the reaction force. Then the 2nd Newton's law for the kth particle of the system may be rewritten as

$$ m_k\dot{\mathbf{v}}_k - \mathbf{F}_k = \mathbf{N}_k. \qquad (2.4) $$


Since any variation of the motion has to be allowed by the constraints, its 3N-dimensional vector with N 3D-vector components $\delta\mathbf{r}_k$ has to be perpendicular to the 3N-dimensional vector of the constraining forces, also having N 3D-vector components $\mathbf{N}_k$. (For example, for the problem shown in Fig. 1, the virtual displacement vector $\delta\mathbf{r}_k$ may be directed only along the ring, while the constraining force N exerted by the ring has to be perpendicular to that direction.) This condition may be expressed as


3 For a concise introduction to the field see, e.g., either I. Gelfand and S. Fomin, Calculus of Variations, Dover, 
2000, or L. Elsgolc, Calculus of Variations, Dover, 2007. An even shorter review may be found in Chapter 17 of 
Arfken and Weber — see MA Sec. 16. For a more detailed discussion, using many examples from physics, see R. 
Weinstock, Calculus of Variations, Dover, 2007. 

4 See, e.g., MA Eq. (4.2). Also, in all formulas of this section, summations over j are from 1 to J, while those over the particle number k are from 1 to N, so that for the sake of brevity, these limits are not explicitly specified.
$$ \sum_k \mathbf{N}_k\cdot\delta\mathbf{r}_k = 0, \qquad (2.5) $$


where the scalar product of 3N-dimensional vectors is defined exactly like that of 3D vectors, i.e. as the sum of the products of the corresponding components of the operands. The substitution of Eq. (4) into Eq. (5) results in the so-called d'Alembert principle:⁵


$$ \sum_k (m_k\dot{\mathbf{v}}_k - \mathbf{F}_k)\cdot\delta\mathbf{r}_k = 0. \qquad (2.6) $$


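Plugging Eq. (3) into Eq. (6), we get

$$ \sum_j \left\{ \sum_k m_k\dot{\mathbf{v}}_k\cdot\frac{\partial\mathbf{r}_k}{\partial q_j} - \mathcal{F}_j \right\}\delta q_j = 0, \qquad (2.7) $$

where the scalars $\mathcal{F}_j$, called the generalized forces, are defined as follows:⁶

$$ \mathcal{F}_j \equiv \sum_k \mathbf{F}_k\cdot\frac{\partial\mathbf{r}_k}{\partial q_j}. \qquad (2.8) $$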


Now we may use the standard argument of the calculus of variations: for the left-hand side of Eq. (7) to be zero for an arbitrary selection of independent variations $\delta q_j$, the expression in the curly brackets, for every j, should equal zero. This gives us the desired set of J ≤ 3N equations

$$ \sum_k m_k\dot{\mathbf{v}}_k\cdot\frac{\partial\mathbf{r}_k}{\partial q_j} - \mathcal{F}_j = 0; \qquad (2.9) $$

what remains is just to recast them in a more convenient form.
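First, using the differentiation by parts to calculate the following time derivative:

$$ \frac{d}{dt}\left(\mathbf{v}_k\cdot\frac{\partial\mathbf{r}_k}{\partial q_j}\right) = \dot{\mathbf{v}}_k\cdot\frac{\partial\mathbf{r}_k}{\partial q_j} + \mathbf{v}_k\cdot\frac{d}{dt}\left(\frac{\partial\mathbf{r}_k}{\partial q_j}\right), \qquad (2.10) $$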


we may notice that the first term on the right-hand side is exactly the scalar product in the first term of 
Eq. (9). 


Second, let us use another key fact of the calculus of variations (which is, essentially, evident from Fig. 3): the differentiations of a variable over time and over the generalized coordinate variation (at a fixed time) are interchangeable operations. As a result, in the second term on the right-hand side of Eq. (10), we may write

$$ \frac{d}{dt}\left(\frac{\partial\mathbf{r}_k}{\partial q_j}\right) = \frac{\partial}{\partial q_j}\left(\frac{d\mathbf{r}_k}{dt}\right) \equiv \frac{\partial\mathbf{v}_k}{\partial q_j}. \qquad (2.11) $$


5 It was spelled out in a 1743 work by Jean le Rond d'Alembert, though the core of this result has been traced to an earlier work by Johann (Jean) Bernoulli (1667-1748) — not to be confused with his son Daniel Bernoulli (1700-1782), who is credited, in particular, for the Bernoulli equation for ideal fluids, to be discussed in Sec. 8.4 below.

6 Note that since the dimensionality of generalized coordinates may be arbitrary, that of generalized forces may 
also differ from the newton. 
Fig. 2.3. The variation of the differential (of any smooth function f) is equal to the differential of its variation: δ(df) = d(δf).


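Finally, let us differentiate Eq. (1) over time:

$$ \mathbf{v}_k \equiv \frac{d\mathbf{r}_k}{dt} = \sum_j \frac{\partial\mathbf{r}_k}{\partial q_j}\dot{q}_j + \frac{\partial\mathbf{r}_k}{\partial t}. \qquad (2.12) $$

This equation shows that the particle velocities $\mathbf{v}_k$ may be considered to be linear functions of the generalized velocities $\dot{q}_j$, considered as independent variables, with proportionality coefficients

$$ \frac{\partial\mathbf{v}_k}{\partial\dot{q}_j} = \frac{\partial\mathbf{r}_k}{\partial q_j}. \qquad (2.13) $$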
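With the account of Eqs. (10), (11), and (13), Eq. (9) turns into

$$ \frac{d}{dt}\sum_k m_k\mathbf{v}_k\cdot\frac{\partial\mathbf{v}_k}{\partial\dot{q}_j} - \sum_k m_k\mathbf{v}_k\cdot\frac{\partial\mathbf{v}_k}{\partial q_j} - \mathcal{F}_j = 0. \qquad (2.14) $$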
This result may be further simplified by making, for the total kinetic energy of the system,

$$ T \equiv \sum_k \frac{m_k}{2}v_k^2 = \frac{1}{2}\sum_k m_k\mathbf{v}_k\cdot\mathbf{v}_k, \qquad (2.15) $$


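the same commitment as for $\mathbf{v}_k$, i.e. considering T a function of not only the generalized coordinates $q_j$ and time t but also of the generalized velocities $\dot{q}_j$ — as variables independent of $q_j$ and t. Then we may calculate the partial derivatives of T as

$$ \frac{\partial T}{\partial q_j} = \sum_k m_k\mathbf{v}_k\cdot\frac{\partial\mathbf{v}_k}{\partial q_j}, \qquad \frac{\partial T}{\partial\dot{q}_j} = \sum_k m_k\mathbf{v}_k\cdot\frac{\partial\mathbf{v}_k}{\partial\dot{q}_j}, \qquad (2.16) $$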


and notice that they are exactly the two sums participating in Eq. (14). As a result, we get a system of J Lagrange equations,⁷

$$ \frac{d}{dt}\frac{\partial T}{\partial\dot{q}_j} - \frac{\partial T}{\partial q_j} - \mathcal{F}_j = 0, \quad \text{for } j = 1, 2, \ldots, J. \qquad (2.17) $$


Their big advantage over the initial Newton’s-law equations (4) is that the Lagrange equations do not include the constraining forces $\mathbf{N}_k$, and thus there are only $J$ of them, typically much fewer than $3N$.


7 They were derived in 1788 by Joseph-Louis Lagrange, who pioneered the whole field of analytical mechanics, not to mention his key contributions to number theory and celestial mechanics.
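This is as far as we can go for arbitrary forces. However, if all the forces may be expressed in a form similar to, but somewhat more general than Eq. (1.22), $\mathbf{F}_k = -\nabla_k U(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N, t)$, where $U$ is the effective potential energy of the system,8 and $\nabla_k$ denotes the spatial differentiation over coordinates of the $k$-th particle, we may recast Eq. (8) into a simpler form:

$$\mathscr{F}_j \equiv \sum_k \mathbf{F}_k \cdot \frac{\partial \mathbf{r}_k}{\partial q_j} = -\sum_k \left( \frac{\partial U}{\partial x_k}\frac{\partial x_k}{\partial q_j} + \frac{\partial U}{\partial y_k}\frac{\partial y_k}{\partial q_j} + \frac{\partial U}{\partial z_k}\frac{\partial z_k}{\partial q_j} \right) \equiv -\frac{\partial U}{\partial q_j}. \qquad (2.18)$$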
Since we assume that $U$ depends only on particle coordinates (and possibly time), but not on velocities, $\partial U/\partial \dot{q}_j = 0$, with the substitution of Eq. (18) the Lagrange equation (17) may be represented in the so-called canonical form:

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_j} - \frac{\partial L}{\partial q_j} = 0, \qquad (2.19a)$$

where $L$ is the Lagrangian function (sometimes called just the “Lagrangian”), defined as

$$L \equiv T - U. \qquad (2.19b)$$

(It is crucial to distinguish this function from the mechanical energy (1.26), $E = T + U$.)

Note also that according to Eq. (2.18), for a system under the effect of an additional generalized external force $\mathscr{F}_j(t)$, we have to use, in all these relations, not the internal potential energy $U^{(\text{int})}$ of the system, but its Gibbs potential energy $U \equiv U^{(\text{int})} - \mathscr{F}_j q_j$; see the discussion in Sec. 1.4.


Using the Lagrangian approach in practice, the reader should always remember, first, that each system has only one Lagrange function (19b), but is described by $J \geq 1$ Lagrange equations (19a), with $j$ taking values $1, 2, \ldots, J$, and second, that when differentiating the function $L$, we have to consider the generalized velocities as its independent arguments, ignoring the fact that they are actually the time derivatives of the generalized coordinates.
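The last remark can be made concrete with a short numerical sketch (an illustration added here, not part of the original derivation). For a single degree of freedom, expanding the total time derivative in Eq. (19a) gives $(\partial^2 L/\partial\dot q^2)\,\ddot q + (\partial^2 L/\partial\dot q\,\partial q)\,\dot q + \partial^2 L/\partial\dot q\,\partial t = \partial L/\partial q$, so $\ddot q$ follows from partial derivatives of $L$ taken with $q$, $\dot q$, and $t$ held independent. The function name `lagrange_accel`, the finite-difference step `h`, and all parameter values are assumptions of this sketch, which also requires $\partial^2 L/\partial\dot q^2 \neq 0$.

```python
import math


def lagrange_accel(L, q, qd, t=0.0, h=1e-4):
    """Generalized acceleration from Eq. (19a) for a 1-DOF Lagrangian L(q, qd, t).

    q, qd and t are treated as independent arguments of L; every partial
    derivative is approximated by a central finite difference with step h.
    """
    dL_dq = (L(q + h, qd, t) - L(q - h, qd, t)) / (2 * h)
    d2L_dqd2 = (L(q, qd + h, t) - 2 * L(q, qd, t) + L(q, qd - h, t)) / h**2
    d2L_dqd_dq = (L(q + h, qd + h, t) - L(q + h, qd - h, t)
                  - L(q - h, qd + h, t) + L(q - h, qd - h, t)) / (4 * h**2)
    d2L_dqd_dt = (L(q, qd + h, t + h) - L(q, qd - h, t + h)
                  - L(q, qd + h, t - h) + L(q, qd - h, t - h)) / (4 * h**2)
    # Solve d2L_dqd2*qdd + d2L_dqd_dq*qd + d2L_dqd_dt = dL_dq for qdd
    return (dL_dq - d2L_dqd_dq * qd - d2L_dqd_dt) / d2L_dqd2


# Example 1: 1D particle, Eq. (2.21), with an assumed potential U = kappa*x**2/2
m, kappa = 2.0, 8.0


def L_particle(x, xd, t):
    return 0.5 * m * xd**2 - 0.5 * kappa * x**2


print(lagrange_accel(L_particle, 0.3, 1.0))  # close to -kappa*x/m = -1.2

# Example 2: bead on the rotating ring, Eq. (2.23); parameter values are assumed
mb, R, g, w = 1.0, 0.5, 9.81, 6.0


def L_bead(th, thd, t):
    return 0.5 * mb * R**2 * (thd**2 + w**2 * math.sin(th)**2) + mb * g * R * math.cos(th)


th = 0.7
print(lagrange_accel(L_bead, th, 0.4), (w**2 * math.cos(th) - g / R) * math.sin(th))
```

The two examples anticipate the next section: the first reproduces Newton’s law for a 1D particle, and the second should agree with the bead’s equation of motion, Eq. (25) below divided by $mR^2$.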


2.2. Three simple examples 
As the first, simplest example, consider a particle constrained to move along one axis (say, $x$):

$$T = \frac{m}{2}\dot{x}^2, \qquad U = U(x, t). \qquad (2.20)$$

In this case, it is natural to consider $x$ as the (only) generalized coordinate, and $\dot{x}$ as the generalized velocity, so that

$$L \equiv T - U = \frac{m}{2}\dot{x}^2 - U(x, t). \qquad (2.21)$$


Considering $x$ and $\dot{x}$ as independent variables, we get $\partial L/\partial \dot{x} = m\dot{x}$ and $\partial L/\partial x = -\partial U/\partial x$, so that Eq. (19) (the only Lagrange equation in this case of the single degree of freedom!) yields

$$\frac{d}{dt}(m\dot{x}) + \frac{\partial U}{\partial x} = 0, \qquad (2.22)$$

evidently the same result as the $x$-component of the 2nd Newton’s law with $F_x = -\partial U/\partial x$. This example is a good sanity check, but it also shows that the Lagrange formalism does not provide too much advantage in this particular case.

8 Note that due to the possible time dependence of $U$, Eq. (17) does not mean that the forces $\mathbf{F}_k$ have to be conservative; see the next section for more discussion. With this understanding, I will still use for function $U$ the convenient name of “potential energy”.


Such an advantage is, however, evident in our testbed problem; see Fig. 1. Indeed, taking the polar angle $\theta$ for the (only) generalized coordinate, we see that in this case, the kinetic energy depends not only on the generalized velocity but also on the generalized coordinate:9

$$T = \frac{m}{2}R^2\left(\dot{\theta}^2 + \omega^2\sin^2\theta\right), \qquad U = -mgz + \text{const} = -mgR\cos\theta + \text{const},$$
$$L \equiv T - U = \frac{m}{2}R^2\left(\dot{\theta}^2 + \omega^2\sin^2\theta\right) + mgR\cos\theta + \text{const}. \qquad (2.23)$$


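Here it is especially important to remember that when deriving the Lagrange equation, $\theta$ and $\dot{\theta}$ have to be treated as independent arguments of $L$, so that

$$\frac{\partial L}{\partial \dot{\theta}} = mR^2\dot{\theta}, \qquad \frac{\partial L}{\partial \theta} = mR^2\omega^2\sin\theta\cos\theta - mgR\sin\theta, \qquad (2.24)$$

giving us the following equation of motion:

$$\frac{d}{dt}\left(mR^2\dot{\theta}\right) - \left(mR^2\omega^2\sin\theta\cos\theta - mgR\sin\theta\right) = 0. \qquad (2.25)$$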


As a sanity check, at $\omega = 0$, Eq. (25) is reduced to the equation (1.18) of the usual pendulum:

$$\ddot{\theta} + \Omega^2\sin\theta = 0, \quad \text{where } \Omega \equiv \left(\frac{g}{R}\right)^{1/2}. \qquad (2.26)$$


We will explore Eq. (25) in more detail later, but please note how simple its derivation was — in 
comparison with writing the 3D Newton’s law and then excluding the reaction force. 


Next, though the Lagrangian formalism was derived from Newton’s law for mechanical systems, the resulting equations (19) are applicable to other dynamic systems, especially those for which the kinetic and potential energies may be readily expressed via some generalized coordinates. As the simplest example, consider the well-known connection of a capacitor with capacitance $C$ to an inductive coil with self-inductance $\mathscr{L}$.10 (Electrical engineers frequently call it the LC tank circuit.)




Fig. 2.4. LC tank circuit. 


9 The above expression for $T = (m/2)(\dot{x}^2 + \dot{y}^2 + \dot{z}^2)$ may be readily obtained either by the formal differentiation of Eq. (2) over time, or just by noticing that the velocity vector has two perpendicular components: one (of magnitude $R\dot{\theta}$) along the ring, and another one (of magnitude $\omega\rho = \omega R\sin\theta$) normal to the ring’s plane.

10 A fancy font is used here to avoid any chance of confusion between the inductance and the Lagrange function. 
As the reader (hopefully :-) knows from their undergraduate studies, at relatively low frequencies we may use the so-called lumped-circuit approximation, in which the total energy of this system is the sum of two components, the electric energy $E_e$ localized inside the capacitor, and the magnetic energy $E_m$ localized inside the inductance coil:

$$E_e = \frac{Q^2}{2C}, \qquad E_m = \frac{\mathscr{L}I^2}{2}. \qquad (2.27)$$
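Since the electric current $I$ through the coil and the electric charge $Q$ on the capacitor are related by the charge continuity equation $dQ/dt = I$ (evident from Fig. 4), it is natural to declare $Q$ the generalized coordinate of the system, and the current, its generalized velocity. With this choice, the electrostatic energy $E_e(Q)$ may be treated as the potential energy $U$ of the system, and the magnetic energy $E_m(I)$, as its kinetic energy $T$. With this attribution, we get

$$\frac{\partial T}{\partial \dot{q}} \equiv \frac{\partial E_m}{\partial I} = \mathscr{L}I \equiv \mathscr{L}\dot{Q}, \qquad \frac{\partial T}{\partial q} \equiv \frac{\partial E_m}{\partial Q} = 0, \qquad \frac{\partial U}{\partial q} \equiv \frac{\partial E_e}{\partial Q} = \frac{Q}{C}, \qquad (2.28)$$

so that the Lagrange equation (19) becomes

$$\frac{d}{dt}\left(\mathscr{L}\dot{Q}\right) - \left(-\frac{Q}{C}\right) = 0, \quad \text{i.e.} \quad \mathscr{L}\ddot{Q} + \frac{1}{C}Q = 0. \qquad (2.29)$$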


Note, however, that the above choice of the generalized coordinate and velocity is not unique. Instead, one can use, as the generalized coordinate, the magnetic flux $\Phi$ through the inductive coil, related to the common voltage $V$ across the circuit (Fig. 4) by Faraday’s induction law $V = -d\Phi/dt$. With this choice, $(-V)$ becomes the generalized velocity, $E_m = \Phi^2/2\mathscr{L}$ should be understood as the potential energy, and $E_e = CV^2/2$ treated as the kinetic energy. For this choice, the resulting Lagrange equation of motion is equivalent to Eq. (29). If both parameters of the circuit, $\mathscr{L}$ and $C$, are constant in time, Eq. (29) describes sinusoidal oscillations with the frequency

$$\omega_0 = \frac{1}{(\mathscr{L}C)^{1/2}}. \qquad (2.30)$$

This is of course a well-known result, which may be derived in a more standard way, by equating the voltage drops across the capacitor ($V = Q/C$) and the inductor ($V = -\mathscr{L}\,dI/dt \equiv -\mathscr{L}\,d^2Q/dt^2$). However, the Lagrangian approach is much more convenient for more complex systems, for example, for the general description of the electromagnetic field and its interaction with charged particles.11
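As a numerical cross-check of Eq. (30), the sketch below (added for illustration) integrates Eq. (29), rewritten as the pair $dQ/dt = I$, $dI/dt = -Q/\mathscr{L}C$, with the classical fourth-order Runge-Kutta method over one period $2\pi/\omega_0$. The circuit values, the initial charge, the step count, and the variable name `Lind` (used for $\mathscr{L}$ to keep it apart from the Lagrangian, in the spirit of footnote 10) are all assumptions.

```python
import math


def lc_rhs(state, Lind, C):
    """Eq. (2.29) as a first-order system: state = (Q, I), returns (dQ/dt, dI/dt)."""
    Q, I = state
    return I, -Q / (Lind * C)


def rk4_step(f, state, dt, *args):
    """One classical fourth-order Runge-Kutta step for a tuple-valued state."""
    k1 = f(state, *args)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), *args)
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), *args)
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), *args)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


Lind, C = 1e-3, 1e-6              # assumed: a 1 mH coil and a 1 uF capacitor
omega0 = 1 / math.sqrt(Lind * C)  # Eq. (2.30)
period = 2 * math.pi / omega0
n_steps = 2000
state = (1e-6, 0.0)               # assumed: initial charge 1 uC, zero initial current
for _ in range(n_steps):
    state = rk4_step(lc_rhs, state, period / n_steps, Lind, C)

print(omega0, state)  # after one period, Q is back near 1e-6 and I near zero
```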


2.3. Hamiltonian function and energy 


The canonical form (19) of the Lagrange equation has been derived using Eq. (18), which is formally similar to Eq. (1.22) for a potential force. Does this mean that the system described by Eq. (19) always conserves energy? Not necessarily, because the “potential energy” $U$ that participates in Eq. (18) may depend not only on the generalized coordinates but on time as well. Let us start the analysis of this issue with the introduction of two new (and very important!) notions: the generalized momentum corresponding to each generalized coordinate $q_j$,

$$p_j \equiv \frac{\partial L}{\partial \dot{q}_j}, \qquad (2.31)$$

and the Hamiltonian function12

$$H \equiv \sum_j p_j \dot{q}_j - L. \qquad (2.32)$$

11 See, e.g., EM Secs. 9.7 and 9.8.


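To see whether the Hamiltonian function is conserved during the motion, let us differentiate both sides of its definition (32) over time:

$$\frac{dH}{dt} = \sum_j \left[ \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_j}\right)\dot{q}_j + \frac{\partial L}{\partial \dot{q}_j}\ddot{q}_j \right] - \frac{dL}{dt}. \qquad (2.33)$$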
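If we want to make use of the Lagrange equation (19), the last derivative has to be calculated considering $L$ as a function of independent arguments $q_j$, $\dot{q}_j$, and $t$, so that

$$\frac{dL}{dt} = \sum_j \left( \frac{\partial L}{\partial q_j}\dot{q}_j + \frac{\partial L}{\partial \dot{q}_j}\ddot{q}_j \right) + \frac{\partial L}{\partial t}, \qquad (2.34)$$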


where the last term is the derivative of $L$ as an explicit function of time. We see that the last term in the square brackets of Eq. (33) immediately cancels with the last term in the parentheses of Eq. (34). Moreover, using the Lagrange equation (19a) for the first term in the square brackets of Eq. (33), we see that it cancels with the first term in the parentheses of Eq. (34). As a result, we arrive at a very simple and important result:

$$\frac{dH}{dt} = -\frac{\partial L}{\partial t}. \qquad (2.35)$$


The most important corollary of this formula is that if the Lagrangian function does not depend on time explicitly ($\partial L/\partial t = 0$), the Hamiltonian function is an integral of motion:

$$H = \text{const}. \qquad (2.36)$$


Let us see how this works, using the first two examples discussed in the previous section. For a 1D particle, the definition (31) of the generalized momentum yields

$$p_x \equiv \frac{\partial L}{\partial \dot{x}} = m\dot{x}, \qquad (2.37)$$


so that it coincides with the usual linear momentum, or rather with its $x$-component. According to Eq. (32), the Hamiltonian function for this case (with just one degree of freedom) is

$$H = p_x\dot{x} - L = p_x\frac{p_x}{m} - \left(\frac{m}{2}\dot{x}^2 - U\right) = \frac{p_x^2}{2m} + U, \qquad (2.38)$$


12 It is named after Sir William Rowan Hamilton, who developed his approach to analytical mechanics in 1833, on the basis of the Lagrangian mechanics. This function is sometimes called just the “Hamiltonian”, but it is advisable to use the full term “Hamiltonian function” in classical mechanics, to distinguish it from the Hamiltonian operator used in quantum mechanics, whose abbreviation to Hamiltonian is extremely common. (The relation of these two notions will be discussed in Sec. 10.1 below.)
i.e. coincides with the particle’s mechanical energy $E = T + U$. If $U$, and hence the Lagrangian, does not depend on time explicitly, both $H$ and $E$ are conserved.


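However, it is not always that simple! Indeed, let us return again to our testbed problem (Fig. 1). In this case, the generalized momentum corresponding to the generalized coordinate $\theta$ is

$$p_\theta \equiv \frac{\partial L}{\partial \dot{\theta}} = mR^2\dot{\theta}, \qquad (2.39)$$

and Eq. (32) yields:

$$H = p_\theta\dot{\theta} - L = mR^2\dot{\theta}^2 - \left[\frac{m}{2}R^2\left(\dot{\theta}^2 + \omega^2\sin^2\theta\right) + mgR\cos\theta\right] + \text{const}$$
$$= \frac{m}{2}R^2\left(\dot{\theta}^2 - \omega^2\sin^2\theta\right) - mgR\cos\theta + \text{const}. \qquad (2.40)$$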


This means that (as soon as $\omega \neq 0$) the Hamiltonian function differs from the mechanical energy

$$E \equiv T + U = \frac{m}{2}R^2\left(\dot{\theta}^2 + \omega^2\sin^2\theta\right) - mgR\cos\theta + \text{const}. \qquad (2.41)$$

The difference, $E - H = mR^2\omega^2\sin^2\theta$ (besides an inconsequential constant), may change during the bead’s motion along the ring, so that although $H$ is an integral of motion (since $\partial L/\partial t = 0$), the energy is generally not conserved.
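The distinction between $H$ and $E$ is easy to see numerically. The sketch below (an illustration, not from the original text) integrates Eq. (25), in the form $\ddot\theta = (\omega^2\cos\theta - \Omega^2)\sin\theta$, with a fourth-order Runge-Kutta step, and records $H$ from Eq. (40) and $E$ from Eq. (41), both without their constants. The parameter values, initial conditions, and time step are assumed; with $\omega > \Omega$ the bead swings far from the bottom of the ring, so $\sin^2\theta$, and with it $E - H$, changes substantially.

```python
import math

m, R, g, w = 1.0, 1.0, 9.81, 5.0   # assumed parameters (here w > Omega = (g/R)**0.5)


def accel(th):
    """Eq. (2.25) divided by m*R**2."""
    return (w**2 * math.cos(th) - g / R) * math.sin(th)


def hamiltonian(th, thd):
    """Eq. (2.40) without the constant."""
    return 0.5 * m * R**2 * (thd**2 - w**2 * math.sin(th)**2) - m * g * R * math.cos(th)


def energy(th, thd):
    """Eq. (2.41) without the constant."""
    return 0.5 * m * R**2 * (thd**2 + w**2 * math.sin(th)**2) - m * g * R * math.cos(th)


def rk4_step(th, thd, dt):
    """One fourth-order Runge-Kutta step for the pair (theta, theta_dot)."""
    k1 = (thd, accel(th))
    k2 = (thd + 0.5 * dt * k1[1], accel(th + 0.5 * dt * k1[0]))
    k3 = (thd + 0.5 * dt * k2[1], accel(th + 0.5 * dt * k2[0]))
    k4 = (thd + dt * k3[1], accel(th + dt * k3[0]))
    th += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    thd += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return th, thd


th, thd, dt = 0.1, 0.0, 1e-3        # assumed initial conditions and time step
H_values, E_values = [], []
for _ in range(10_000):             # 10 time units of motion
    th, thd = rk4_step(th, thd, dt)
    H_values.append(hamiltonian(th, thd))
    E_values.append(energy(th, thd))

print("spread of H:", max(H_values) - min(H_values))  # tiny, limited only by the integrator
print("spread of E:", max(E_values) - min(E_values))  # large: E is not conserved
```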


In this context, let us find out when these two functions, $E$ and $H$, do coincide. In mathematics, there is a notion of a homogeneous function $f(x_1, x_2, \ldots)$ of degree $\lambda$, defined in the following way: for an arbitrary constant $a$,

$$f(ax_1, ax_2, \ldots) = a^\lambda f(x_1, x_2, \ldots). \qquad (2.42)$$


Such functions obey the following Euler theorem:13

$$\sum_j x_j\frac{\partial f}{\partial x_j} = \lambda f, \qquad (2.43)$$

which may be simply proved by differentiating both parts of Eq. (42) over $a$ and then setting this parameter to the particular value $a = 1$. Now, consider the case when the kinetic energy is a quadratic form of all generalized velocities $\dot{q}_j$:

$$T = \sum_{j,j'=1}^{J} m_{jj'}\,\dot{q}_j\,\dot{q}_{j'}, \qquad (2.44)$$
eu 


with no other terms. It is evident that such $T$ satisfies the definition (42) of a homogeneous function of the velocities with $\lambda = 2$,14 so that the Euler theorem (43) gives

$$\sum_j \frac{\partial T}{\partial \dot{q}_j}\dot{q}_j = 2T. \qquad (2.45)$$


13 This is just one of many theorems bearing the name of their author, the genius mathematician Leonhard Euler (1707-1783).
14 Such functions are called quadratic-homogeneous.
But since $U$ is independent of the generalized velocities, $\partial L/\partial \dot{q}_j = \partial T/\partial \dot{q}_j$, and the left-hand side of Eq. (45) is exactly the first term in the definition (32) of the Hamiltonian function, so that in this case

$$H = 2T - L = 2T - (T - U) = T + U = E. \qquad (2.46)$$
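Euler’s theorem (43), and its failure for the rotating-ring kinetic energy, can be checked numerically. In the sketch below (an illustration added here), the left-hand side of Eq. (45) is computed by central differences for an arbitrary quadratic form of the type (44) and compared with $2T$; then the same sum is formed for the bead’s $T$ from Eq. (23), where the mismatch should equal $mR^2\omega^2\sin^2\theta$, the difference $E - H$ found above. The random coefficients, the chosen values $m = R = 1$, $\omega = 3$, $\theta = 0.5$, and the helper names are assumptions.

```python
import math
import random


def quadratic_T(qd, M):
    """Kinetic energy of the form (2.44): sum over j, j' of M[j][j'] * qd[j] * qd[j']."""
    n = len(qd)
    return sum(M[j][k] * qd[j] * qd[k] for j in range(n) for k in range(n))


def euler_sum(T, qd, h=1e-6):
    """Left-hand side of Eq. (2.45): sum_j qd_j * dT/dqd_j, via central differences."""
    total = 0.0
    for j in range(len(qd)):
        up, down = list(qd), list(qd)
        up[j] += h
        down[j] -= h
        total += qd[j] * (T(up) - T(down)) / (2 * h)
    return total


random.seed(1)
J = 3
M = [[random.uniform(-1.0, 1.0) for _ in range(J)] for _ in range(J)]  # assumed coefficients
qd = [random.uniform(-2.0, 2.0) for _ in range(J)]                      # assumed velocities


def T_quad(v):
    return quadratic_T(v, M)


print(euler_sum(T_quad, qd), 2 * T_quad(qd))  # these agree, as Eq. (2.45) states


# Counterexample: the bead's T from Eq. (2.23), with assumed m = R = 1, w = 3, theta = 0.5
def T_bead(v):
    return 0.5 * (v[0]**2 + 9.0 * math.sin(0.5)**2)


print(2 * T_bead([1.2]) - euler_sum(T_bead, [1.2]))  # equals w**2 * sin(theta)**2, not zero
```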


So, for a system with a kinetic energy of the type (44), for example, a free particle with $T$ considered as a function of its Cartesian velocities,

$$T = \frac{m}{2}\left(\dot{x}^2 + \dot{y}^2 + \dot{z}^2\right), \qquad (2.47)$$

the notions of the Hamiltonian function and mechanical energy are identical. Indeed, some textbooks, very regrettably, do not distinguish these notions at all! However, as we have seen from our bead-on-the-rotating-ring example, these variables do not always coincide. For that problem, the kinetic energy, in addition to the term proportional to $\dot{\theta}^2$, has another, velocity-independent term (see the first of Eqs. (23)), and hence is not a quadratic-homogeneous function of the angular velocity, giving $E \neq H$.

Thus, Eq. (36) expresses a new conservation law, generally different from that of mechanical 
energy conservation. 


2.4. Other conservation laws 


Looking at the Lagrange equation (19), we immediately see that if $L \equiv T - U$ is independent of some generalized coordinate $q_j$, $\partial L/\partial q_j = 0$,15 then the corresponding generalized momentum is an integral of motion:16

$$p_j \equiv \frac{\partial L}{\partial \dot{q}_j} = \text{const}. \qquad (2.48)$$


For example, for a 1D particle with the Lagrangian (21), the momentum $p_x$ is conserved if the potential energy is constant (and hence the $x$-component of force is zero), of course. As a less obvious example, let us consider a 2D motion of a particle in the field of central forces. If we use polar coordinates $r$ and $\varphi$ in the role of generalized coordinates, then the Lagrangian function17

$$L \equiv T - U = \frac{m}{2}\left(\dot{r}^2 + r^2\dot{\varphi}^2\right) - U(r) \qquad (2.49)$$

is independent of $\varphi$, and hence the corresponding generalized momentum,

$$p_\varphi \equiv \frac{\partial L}{\partial \dot{\varphi}} = mr^2\dot{\varphi}, \qquad (2.50)$$


15 Such coordinates are frequently called cyclic, because in some cases (like $\varphi$ in Eq. (49) below) they represent periodic coordinates such as angles. However, this terminology is somewhat misleading, because some “cyclic” coordinates (e.g., $x$ in our first example) have nothing to do with rotation.

16 This fact may be considered a particular case of a more general mathematical statement called the Noether theorem, named after its author, Emmy Noether, sometimes called the “greatest woman mathematician who ever lived”. Unfortunately, because of time/space restrictions, for its discussion I have to refer the interested reader elsewhere, for example to Sec. 13.7 in H. Goldstein et al., Classical Mechanics, 3rd ed., Addison Wesley, 2002.

17 Note that here $\dot{r}^2$ is the square of the scalar derivative $\dot{r}$, rather than the square of the vector $\dot{\mathbf{r}} = \mathbf{v}$.
is conserved. This is just a particular (2D) case of the angular momentum conservation; see Eq. (1.24). Indeed, for the 2D motion within the $[x, y]$ plane, the angular momentum vector,

$$\mathbf{L} \equiv \mathbf{r} \times \mathbf{p} = \begin{vmatrix} \mathbf{n}_x & \mathbf{n}_y & \mathbf{n}_z \\ x & y & z \\ m\dot{x} & m\dot{y} & m\dot{z} \end{vmatrix}, \qquad (2.51)$$

has only one component different from zero, namely the component normal to the motion plane:

$$L_z = x(m\dot{y}) - y(m\dot{x}). \qquad (2.52)$$

Differentiating the well-known relations between the polar and Cartesian coordinates,

$$x = r\cos\varphi, \qquad y = r\sin\varphi, \qquad (2.53)$$

over time, and plugging the result into Eq. (52), we see that

$$L_z = mr^2\dot{\varphi} \equiv p_\varphi. \qquad (2.54)$$
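Both statements, the conservation of $p_\varphi$ and its identity with $L_z$, can be checked on a numerically integrated orbit. The sketch below (added for illustration) integrates the Cartesian equations of motion in an attractive inverse-square central force, chosen here only as a convenient example of $U(r)$, evaluates $L_z$ from Eq. (52) along the trajectory, and compares it with $mr^2\dot\varphi$, where $\varphi = \operatorname{atan2}(y, x)$ is differentiated numerically rather than through Eq. (53). The force law, its strength `k`, the initial conditions, and the time step are assumptions.

```python
import math

m, k = 1.0, 1.0   # assumed mass and strength of the attractive force -k*r_vec/r**3


def rhs(s):
    """Time derivatives of s = (x, y, vx, vy) for 2D motion in the central force."""
    x, y, vx, vy = s
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -k * x / (m * r3), -k * y / (m * r3))


def rk4_step(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(s)
    k2 = rhs(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)))
    k3 = rhs(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)))
    k4 = rhs(tuple(a + dt * b for a, b in zip(s, k3)))
    return tuple(a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))


def L_z(s):
    """Eq. (2.52): L_z = x*(m*vy) - y*(m*vx)."""
    x, y, vx, vy = s
    return x * m * vy - y * m * vx


dt = 1e-3
states = [(1.0, 0.0, 0.0, 0.8)]   # assumed initial conditions (a bound, elliptic orbit)
for _ in range(5000):
    states.append(rk4_step(states[-1], dt))

Lz_values = [L_z(s) for s in states]
print("spread of L_z:", max(Lz_values) - min(Lz_values))

# m r^2 phi_dot, with phi = atan2(y, x) differentiated numerically along the orbit
i = 2500
dphi = (math.atan2(states[i + 1][1], states[i + 1][0])
        - math.atan2(states[i - 1][1], states[i - 1][0]))
dphi = (dphi + math.pi) % (2 * math.pi) - math.pi   # remove a possible 2*pi jump
x, y = states[i][0], states[i][1]
p_phi = m * (x * x + y * y) * dphi / (2 * dt)
print(p_phi, Lz_values[i])   # Eq. (2.54): the two should agree
```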


Thus the Lagrangian formalism provides a powerful way of searching for non-evident integrals of motion. On the other hand, if such a conserved quantity is obvious or known a priori, it is helpful for the selection of the most appropriate generalized coordinates, giving the simplest Lagrange equations. For example, in the last problem, if we knew in advance that $p_\varphi$ had to be conserved, this could provide sufficient motivation for using the angle $\varphi$ as one of the generalized coordinates.


2.5. Exercise problems 


In each of Problems 2.1-2.11, for the given system:

(i) introduce a convenient set of generalized coordinates $q_j$,

(ii) write down the Lagrangian $L$ as a function of $q_j$, $\dot{q}_j$, and (if appropriate) time,

(iii) write down the Lagrange equation(s) of motion,

(iv) calculate the Hamiltonian function $H$; find out whether it is conserved,

(v) calculate the mechanical energy $E$; is $E = H$? Is the energy conserved?

(vi) any other evident integrals of motion?


2.1. A double pendulum — see the figure on the right. Consider only the motion 
within the vertical plane containing the suspension point. 


2.2. A stretchable pendulum (i.e. a massive particle hung on an elastic cord that exerts force $F = -\kappa(l - l_0)$, where $\kappa$ and $l_0$ are positive constants), also confined to the vertical plane.


2.3. A fixed-length pendulum hanging from a horizontal support whose motion law $x_0(t)$ is fixed. (No vertical plane constraint here.)


2.4. A pendulum of mass m, hung on another point mass m’ that may slide, without 
friction, along a straight horizontal rail — see the figure on the right. The motion is confined 
to the vertical plane that contains the rail. 


2.5. A point-mass pendulum of length $l$, attached to the rim of a disk of radius $R$, that is rotated in a vertical plane with a constant angular velocity $\omega$; see the figure on the right. (Consider only the motion within the disk’s plane.)


2.6. A bead of mass $m$, sliding without friction along a light string stretched by a fixed force $\mathscr{T}$ between two horizontally displaced points; see the figure on the right. Here, in contrast to the similar Problem 1.10, the string’s tension $\mathscr{T}$ may be comparable with the bead’s weight $mg$, and the motion is not restricted to the vertical plane.


2.7. A bead of mass $m$, sliding without friction along a light string of a fixed length $2l$, that is hung between two points displaced horizontally by distance $2d < 2l$; see the figure on the right. As in the previous problem, the motion is not restricted to the vertical plane.


2.8. A block of mass $m$ that can slide, without friction, along the inclined plane surface of a heavy wedge with mass $m'$. The wedge is free to move, also without friction, along a horizontal surface; see the figure on the right. (Both motions are within the vertical plane containing the steepest slope line.)


2.9. The two-pendula system that was the subject of Problem 1.8; see the figure on the right.
2.10. A system of two similar, inductively-coupled LC circuits; see the figure on the right.


2.11.* A small Josephson junction, the system consisting of two superconductors (S) weakly coupled by Cooper-pair tunneling through a thin insulating layer (I) that separates them; see the figure on the right.


Hints:


(i) At not very high frequencies (whose quantum $\hbar\omega$ is lower than the binding energy $2\Delta$ of the Cooper pairs), the Josephson effect in a sufficiently small junction may be described by the following coupling energy:

$$U(\varphi) = -E_J\cos\varphi + \text{const},$$

where the constant $E_J$ describes the coupling strength, while the variable $\varphi$ (called the Josephson phase difference) is connected to the voltage $V$ across the junction by the famous frequency-to-voltage relation

$$\frac{d\varphi}{dt} = \frac{2e}{\hbar}V,$$

where $e \approx 1.602\times10^{-19}$ C is the fundamental electric charge and $\hbar \approx 1.054\times10^{-34}$ J·s is the Planck constant.18


(ii) The junction (as any system of two close conductors) has a substantial electric capacitance $C$.


18 More discussion of the Josephson effect and the physical sense of the variable $\varphi$ may be found, for example, in EM Sec. 6.5 and QM Secs. 1.6 and 2.8 of this series, but the given problem may be solved without that additional information.
Chapter 3. A Few Simple Problems


The objective of this chapter is to solve a few simple but very important particle dynamics problems that may be reduced to 1D motion. They notably include the famous “planetary” problem of two particles interacting via a spherically-symmetric potential, and the classical particle scattering problem. In the process of solution, several methods that will be essential for the analysis of more complex systems are also discussed.


3.1. One-dimensional and 1D-reducible systems 


If a particle is confined to motion along a straight line (say, axis $x$), its position is completely determined by this coordinate. In this case, as we already know, the particle’s Lagrangian function is given by Eq. (2.21):

$$L = T(\dot{x}) - U(x, t), \qquad T(\dot{x}) = \frac{m}{2}\dot{x}^2, \qquad (3.1)$$

so that the Lagrange equation of motion given by Eq. (2.22),

$$m\ddot{x} = -\frac{\partial U(x, t)}{\partial x}, \qquad (3.2)$$

is just the $x$-component of the 2nd Newton’s law.


It is convenient to discuss the dynamics of such really-1D systems as a part of a more general class of effectively-1D systems. This is a system whose position, due to holonomic constraints and/or conservation laws, is also fully determined by one generalized coordinate $q$, and whose Lagrangian may be represented in a form similar to Eq. (1):

$$L = T_{\text{ef}}(\dot{q}) - U_{\text{ef}}(q, t), \qquad T_{\text{ef}} = \frac{m_{\text{ef}}}{2}\dot{q}^2, \qquad (3.3)$$

where $m_{\text{ef}}$ is some constant which may be considered as the effective mass of the system, and the function $U_{\text{ef}}$, its effective potential energy. In this case, the Lagrange equation (2.19), describing the system’s dynamics, has a form similar to Eq. (2):

$$m_{\text{ef}}\ddot{q} = -\frac{\partial U_{\text{ef}}(q, t)}{\partial q}. \qquad (3.4)$$


As an example, let us return to our testbed system shown in Fig. 2.1. We have already seen that for this system, having one degree of freedom, the genuine kinetic energy $T$, expressed by the first of Eqs. (2.23), is not a quadratic-homogeneous function of the generalized velocity. However, the system’s Lagrangian function (2.23) still may be represented in the form (3),

$$L = \frac{m}{2}R^2\dot{\theta}^2 + \frac{m}{2}R^2\omega^2\sin^2\theta + mgR\cos\theta + \text{const} \equiv T_{\text{ef}} - U_{\text{ef}}, \qquad (3.5)$$

provided that we take

$$T_{\text{ef}} = \frac{m}{2}R^2\dot{\theta}^2, \qquad U_{\text{ef}} = -\frac{m}{2}R^2\omega^2\sin^2\theta - mgR\cos\theta + \text{const}. \qquad (3.6)$$


In this new partitioning of the function $L$, which is legitimate because $U_{\text{ef}}$ depends only on the generalized coordinate $\theta$, but not on the corresponding generalized velocity, $T_{\text{ef}}$ includes only a part of the genuine kinetic energy $T$ of the bead, while $U_{\text{ef}}$ includes not only its real potential energy $U$ in the gravity field but also an additional term related to ring rotation. (As we will see in Sec. 4.6, this term may be interpreted as the effective potential energy due to the inertial centrifugal “force” arising when the problem is solved in the non-inertial reference frame rotating with the ring.)


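Returning to the general case of effectively-1D systems with Lagrangians of the type (3), let us calculate their Hamiltonian function, using its definition (2.32):

$$H = \frac{\partial L}{\partial \dot{q}}\dot{q} - L = m_{\text{ef}}\dot{q}^2 - \left(T_{\text{ef}} - U_{\text{ef}}\right) = T_{\text{ef}} + U_{\text{ef}}. \qquad (3.7)$$

So, $H$ is expressed via $T_{\text{ef}}$ and $U_{\text{ef}}$ exactly as the energy $E$ is expressed via genuine $T$ and $U$.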


3.2. Equilibrium and stability 


Autonomous systems are defined as dynamic systems whose equations of motion do not depend on time explicitly. For the effectively-1D (and in particular the really-1D) systems obeying Eq. (4), this means that their function $U_{\text{ef}}$, and hence the Lagrangian function (3), should not depend on time explicitly. According to Eq. (2.35), in such systems, the Hamiltonian function (7), i.e. the sum $T_{\text{ef}} + U_{\text{ef}}$, is an integral of motion. However, be careful! Generally, this conclusion is not valid for the genuine mechanical energy $E$ of such a system; for example, as we already know from Sec. 2.3, for our testbed problem, with the generalized coordinate $q = \theta$ (Fig. 2.1), $E$ is not conserved.


According to Eq. (4), an autonomous system, at appropriate initial conditions, may stay in equilibrium at one or several stationary (alternatively called fixed) points $q_n$, corresponding to either a minimum or a maximum of the effective potential energy (see Fig. 1):

$$\frac{dU_{\text{ef}}}{dq}(q_n) = 0. \qquad (3.8)$$

Fig. 3.1. An example of the effective potential energy profile near stable ($q_0$, $q_2$) and unstable ($q_1$) fixed points, and its quadratic approximation (10) near point $q_0$.


In order to explore the stability of such fixed points, let us analyze the dynamics of small deviations

$$\tilde{q} \equiv q(t) - q_n \qquad (3.9)$$

from one of such points. For that, let us expand the function $U_{\text{ef}}(q)$ in the Taylor series at $q_n$:

$$U_{\text{ef}}(q) = U_{\text{ef}}(q_n) + \frac{dU_{\text{ef}}}{dq}(q_n)\,\tilde{q} + \frac{1}{2}\frac{d^2U_{\text{ef}}}{dq^2}(q_n)\,\tilde{q}^2 + \ldots \qquad (3.10)$$


The first term on the right-hand side, $U_{\text{ef}}(q_n)$, is an arbitrary constant and does not affect motion. The next term, linear in the deviation $\tilde{q}$, equals zero; see the fixed point’s definition (8). Hence the fixed point’s stability is determined by the next term, quadratic in $\tilde{q}$, more exactly by its coefficient,

$$\kappa_{\text{ef}} \equiv \frac{d^2U_{\text{ef}}}{dq^2}(q_n), \qquad (3.11)$$

which is frequently called the effective spring constant. Indeed, neglecting the higher terms of the Taylor expansion (10),1 we see that Eq. (4) takes the familiar form:

$$m_{\text{ef}}\ddot{\tilde{q}} + \kappa_{\text{ef}}\tilde{q} = 0. \qquad (3.12)$$


I am confident that the reader of these notes knows everything about this equation, but since we will soon run into similar but more complex equations, let us review the formal procedure of its solution. From the mathematical standpoint, Eq. (12) is an ordinary linear differential equation of the second order, with constant coefficients. The general theory of such equations tells us that its general solution (for any initial conditions) may be represented as

$$\tilde{q}(t) = c_+e^{\lambda_+t} + c_-e^{\lambda_-t}, \qquad (3.13)$$

where the constants $c_\pm$ are determined by initial conditions, while the so-called characteristic exponents $\lambda_\pm$ are completely defined by the equation itself. To calculate these exponents, it is sufficient to plug just one partial solution, $e^{\lambda t}$, into the equation. In our simple case (12), this yields the following characteristic equation:

$$m_{\text{ef}}\lambda^2 + \kappa_{\text{ef}} = 0. \qquad (3.14)$$


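If the ratio $\kappa_{\text{ef}}/m_{\text{ef}}$ is positive, i.e. the fixed point corresponds to a minimum of potential energy (e.g., see points $q_0$ and $q_2$ in Fig. 1), the characteristic equation yields

$$\lambda_\pm = \pm i\omega_0, \quad \text{with } \omega_0 \equiv \left(\frac{\kappa_{\text{ef}}}{m_{\text{ef}}}\right)^{1/2} \qquad (3.15)$$

(where $i$ is the imaginary unit, $i^2 = -1$), so that Eq. (13) describes harmonic (sinusoidal) oscillations of the system,2

$$\tilde{q}(t) = c_+e^{+i\omega_0t} + c_-e^{-i\omega_0t} = c_c\cos\omega_0t + c_s\sin\omega_0t, \qquad (3.16)$$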


1 Those terms may be important only in the very special cases when $\kappa_{\text{ef}}$ is exactly zero, i.e. when a fixed point is also an inflection point of the function $U_{\text{ef}}(q)$.

2 The reader should not be scared of the first form of (16), i.e. of the representation of a real variable (the deviation from equilibrium) via a sum of two complex functions. Indeed, any real initial conditions give $c_-^* = c_+$, so that the sum is real for any $t$. An even simpler way to deal with such complex representations of real functions will be discussed in the beginning of Chapter 5, and then used throughout this series.
with the frequency $\omega_0$ about the fixed point, which is thereby stable.3 On the other hand, at the potential energy maximum ($\kappa_{\text{ef}} < 0$, e.g., at point $q_1$ in Fig. 1), we get

$$\lambda_\pm = \pm\lambda, \quad \text{where } \lambda \equiv \left(\frac{|\kappa_{\text{ef}}|}{m_{\text{ef}}}\right)^{1/2}, \quad \text{so that } \tilde{q}(t) = c_+e^{+\lambda t} + c_-e^{-\lambda t}. \qquad (3.17)$$

Since the solution has an exponentially growing part,4 the fixed point is unstable.
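The classification above reduces to the sign of $\kappa_{\rm ef}$, which a few lines of code can make explicit. The sketch below (added for illustration; the function name and labels are assumptions) solves the characteristic equation (14) with complex arithmetic, assuming $m_{\rm ef} > 0$, and flags the marginal case $\kappa_{\rm ef} = 0$ in which, per footnote 1, the higher Taylor terms of Eq. (10) decide.

```python
import cmath


def characteristic_exponents(m_ef, kappa_ef):
    """Roots lambda_+- of Eq. (3.14), m_ef*lambda**2 + kappa_ef = 0, with a stability label."""
    lam = cmath.sqrt(-kappa_ef / m_ef)
    if kappa_ef > 0:
        label = "orbitally stable: oscillations, Eq. (3.16)"
    elif kappa_ef < 0:
        label = "unstable: exponential growth, Eq. (3.17)"
    else:
        label = "undecided at this order: higher Taylor terms of Eq. (3.10) needed"
    return (lam, -lam), label


print(characteristic_exponents(1.0, 4.0))   # lambda = +-2i (a potential minimum)
print(characteristic_exponents(1.0, -4.0))  # lambda = +-2 (a potential maximum)
```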


Note that the quadratic expansion of function $U_{\text{ef}}(q)$, given by the truncation of Eq. (10) to the three displayed terms, is equivalent to a linear Taylor expansion of the effective force:

$$F_{\text{ef}} \equiv -\frac{dU_{\text{ef}}}{dq}\bigg|_{q = q_n + \tilde{q}} \approx -\kappa_{\text{ef}}\tilde{q}, \qquad (3.18)$$

immediately resulting in the linear equation (12). Hence, to analyze the stability of a fixed point $q_n$, it is sufficient to linearize the equation of motion with respect to small deviations from the point, and study possible solutions of the resulting linear equation. This linearization procedure is typically simpler to carry out than the quadratic expansion (10).


As an example, let us return to our testbed problem (Fig. 2.1) whose function $U_{\text{ef}}$ we already know; see the second of Eqs. (6). With it, the equation of motion (4) becomes

$$mR^2\ddot{\theta} = -\frac{dU_{\text{ef}}}{d\theta} = mR^2\left(\omega^2\cos\theta - \Omega^2\right)\sin\theta, \quad \text{i.e.} \quad \ddot{\theta} = \left(\omega^2\cos\theta - \Omega^2\right)\sin\theta, \qquad (3.19)$$

where $\Omega \equiv (g/R)^{1/2}$ is the frequency of small oscillations of the system at $\omega = 0$; see Eq. (2.26).5 From Eq. (8), we see that on any $2\pi$-long segment of the angle $\theta$,6 the system may have four fixed points; for example, on the half-open segment $(-\pi, +\pi]$ these points are

$$\theta_0 = 0, \qquad \theta_1 = \pi, \qquad \theta_{2,3} = \pm\cos^{-1}\frac{\Omega^2}{\omega^2}. \qquad (3.20)$$

The last two fixed points, corresponding to the bead shifted to either side of the rotating ring, exist only if the angular velocity $\omega$ of the rotation exceeds $\Omega$. (In the limit of very fast rotation, $\omega \gg \Omega$, Eq. (20) yields $\theta_{2,3} \to \pm\pi/2$, i.e. the stationary positions approach the horizontal diameter of the ring, in accordance with our physical intuition.)


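To analyze the fixed point stability, we may again use Eq. (9), in the form $\theta = \theta_n + \tilde{\theta}$, plug it into Eq. (19), and Taylor-expand both trigonometric functions of $\theta$ up to the term linear in $\tilde{\theta}$:

$$\ddot{\tilde{\theta}} = \left[\omega^2\left(\cos\theta_n - \sin\theta_n\,\tilde{\theta}\right) - \Omega^2\right]\left(\sin\theta_n + \cos\theta_n\,\tilde{\theta}\right). \qquad (3.21)$$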


3 This particular type of stability, when the deviation from the equilibrium oscillates with a constant amplitude, neither growing nor decreasing in time, is called either the orbital, or “neutral”, or “indifferent” stability.

4 Mathematically, the growing part vanishes at some special (exact) initial conditions which give $c_+ = 0$. However, the futility of this argument for real physical systems should be obvious to anybody who has ever tried to balance a pencil on its sharp point.

5 Note that Eq. (19) coincides with Eq. (2.25). This is a good sanity check illustrating that the procedure (5)-(6) of moving a term from the kinetic to the potential energy within the Lagrangian function is indeed legitimate.

6 For this particular problem, the values of $\theta$ that differ by a multiple of $2\pi$ are physically equivalent.
Generally, this equation may be linearized further by purging its right-hand side of the term proportional to $\tilde{\theta}^2$; however, in this simple case, Eq. (21) is already convenient for analysis. In particular, for the fixed point $\theta_0 = 0$ (corresponding to the bead’s position at the bottom of the ring), we have $\cos\theta_0 = 1$ and $\sin\theta_0 = 0$, so that Eq. (21) is reduced to a linear differential equation

$$\ddot{\tilde{\theta}} = \left(\omega^2 - \Omega^2\right)\tilde{\theta}, \qquad (3.22)$$

whose characteristic equation is similar to Eq. (14) and yields

$$\lambda^2 = \omega^2 - \Omega^2, \quad \text{for } \theta \approx \theta_0. \qquad (3.23a)$$


This result shows that if $\omega^2 < \Omega^2$, both roots $\lambda$ are imaginary, so that this fixed point is orbitally stable. However, if the rotation speed is increased so that $\Omega^2 < \omega^2$, the roots become real, $\lambda_\pm = \pm(\omega^2 - \Omega^2)^{1/2}$, with one of them positive, so that the fixed point becomes unstable beyond this threshold, i.e. as soon as fixed points $\theta_{2,3}$ exist. Similar calculations for other fixed points yield

$$\lambda^2 = \begin{cases} \Omega^2 + \omega^2 > 0, & \text{for } \theta \approx \theta_1, \\[4pt] \dfrac{\Omega^4}{\omega^2} - \omega^2, & \text{for } \theta \approx \theta_{2,3}. \end{cases} \qquad (3.23b)$$

These results show that the fixed point $\theta_1$ (the bead on the top of the ring) is always unstable, just as we could foresee, while the side fixed points $\theta_{2,3}$ are orbitally stable as soon as they exist, at $\Omega^2 < \omega^2$.


Thus, our fixed-point analysis may be summarized very simply: an increase of the ring rotation speed $\omega$ beyond a certain threshold value, equal to $\Omega$ given by Eq. (2.26), causes the bead to move to one of the ring sides, oscillating about one of the fixed points $\theta_{2,3}$. Together with the rotation about the vertical axis, this motion yields quite a complex (generally, open) spatial trajectory as observed from a lab frame, so it is fascinating that we could analyze it quantitatively in such a simple way.
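The fixed-point analysis lends itself to a quick numerical check. In the sketch below (added for illustration), the fixed points come from Eq. (20), and $\lambda^2$ at each of them is the derivative of the right-hand side of Eq. (19) over $\theta$, which is exactly the coefficient of $\tilde\theta$ in the linearized Eq. (21); a central finite difference stands in for the analytic differentiation. The values of $\Omega$ and $\omega$ are assumed. For $\omega = 2\Omega$, Eqs. (23a) and (23b) predict $\lambda^2 = 3\Omega^2$, $5\Omega^2$, and $-3.75\,\Omega^2$ at $\theta_0$, $\theta_1$, and $\theta_{2,3}$, respectively.

```python
import math


def fixed_points(w, Omega):
    """Fixed points (3.20) of the bead on the rotating ring, on the segment (-pi, pi]."""
    points = [0.0, math.pi]
    if w > Omega:                     # the side points exist only above the threshold
        th = math.acos(Omega**2 / w**2)
        points += [th, -th]
    return points


def lambda_squared(th_n, w, Omega, h=1e-6):
    """lambda**2 at th_n: the coefficient of the linear term in Eq. (3.21), taken here
    as a central-difference derivative of the right-hand side of Eq. (3.19)."""
    def rhs(th):
        return (w**2 * math.cos(th) - Omega**2) * math.sin(th)
    return (rhs(th_n + h) - rhs(th_n - h)) / (2 * h)


Omega = 1.0
for w in (0.5, 2.0):                  # assumed: below and above the threshold w = Omega
    for th_n in fixed_points(w, Omega):
        lam2 = lambda_squared(th_n, w, Omega)
        kind = "orbitally stable" if lam2 < 0 else "unstable"
        print(f"w = {w}: theta_n = {th_n:+.4f}, lambda^2 = {lam2:+.4f} ({kind})")
```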


Later in this course, we will repeatedly use the linearization of the equations of motion for the 
analysis of the stability of more complex dynamic systems, including those with energy dissipation. 


3.3. Hamiltonian 1D systems 


Autonomous systems that are described by time-independent Lagrangians are frequently called Hamiltonian ones because their Hamiltonian function $H$ (again, not necessarily equal to the genuine mechanical energy $E$!) is conserved. In our current 1D case, described by Eq. (3),

$$H = \frac{m_{ef}}{2}\dot q^2 + U_{ef}(q) = \text{const}. \tag{3.24}$$


From the mathematical standpoint, this conservation law is just the first integral of motion. Solving Eq. (24) for $\dot q$, we get the first-order differential equation,

$$\frac{dq}{dt} = \pm\left\{\frac{2}{m_{ef}}\left[H - U_{ef}(q)\right]\right\}^{1/2}, \quad \text{i.e.} \quad \pm\left(\frac{m_{ef}}{2}\right)^{1/2}\frac{dq}{\left[H - U_{ef}(q)\right]^{1/2}} = dt, \tag{3.25}$$

which may be readily integrated:

$$\pm\left(\frac{m_{ef}}{2}\right)^{1/2}\int_{q(t_0)}^{q(t)}\frac{dq'}{\left[H - U_{ef}(q')\right]^{1/2}} = t - t_0. \tag{3.26}$$


Since the constant $H$ (as well as the proper sign before the integral, see below) is fixed by initial conditions, Eq. (26) gives the reciprocal form, $t = t(q)$, of the desired law of system motion, $q(t)$. Of course, for any particular problem the integral in Eq. (26) still has to be worked out, either analytically or numerically. Even the numerical route is typically much easier than the numerical integration of the initial, second-order differential equation of motion, because at the addition of many values (to which any numerical integration is reduced⁷) the rounding errors are effectively averaged out.
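To illustrate Eq. (26), here is a minimal Python sketch that evaluates the integral with the composite Simpson rule. It takes the "+" sign and assumes an interval on which $H > U_{ef}(q)$ strictly, i.e. one that does not reach a turning point. The check uses an assumed harmonic example, $U_{ef} = \kappa_{ef}q^2/2$, for which the exact answer is $t = \omega_0^{-1}\sin^{-1}(q/A)$. The function name is hypothetical.

```python
import math

def time_of_flight(q_start, q_end, H, U_ef, m_ef, n=2000):
    """t(q_end) - t(q_start) from Eq. (26) with the '+' sign (composite Simpson rule).
    Assumes H > U_ef(q) everywhere on [q_start, q_end]."""
    if n % 2:
        n += 1                      # Simpson's rule needs an even number of sub-intervals
    h = (q_end - q_start) / n
    f = lambda q: 1.0 / math.sqrt(H - U_ef(q))
    s = f(q_start) + f(q_end)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(q_start + k * h)
    return math.sqrt(m_ef / 2.0) * s * h / 3.0

# Harmonic well: m_ef = 1, kappa_ef = 4 (omega_0 = 2), amplitude A = 1, hence H = 2.
t = time_of_flight(0.0, 0.5, H=2.0, U_ef=lambda q: 2.0 * q**2, m_ef=1.0)
print(t, math.asin(0.5) / 2.0)  # both should be close to pi/12
```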


Moreover, Eq. (25) also allows a general classification of 1D system motion. Indeed: 


(i) If $H > U_{ef}(q)$ in the whole range of our interest, the effective kinetic energy $T_{ef}$ (3) is always positive. Hence the derivative $dq/dt$ cannot change its sign, so this effective velocity retains the sign it had initially. This is an unbound motion in one direction (Fig. 2a).


Fig. 3.2. Graphical representations of Eq. (25) for three different cases: (a) an unbound motion, with the velocity sign conserved, (b) a reflection from a “classical turning point”, accompanied by the velocity sign change, and (c) bound, periodic motion between two turning points (schematically). (d) The effective potential energy (6) of the bead on the rotating ring (Fig. 2.1), plotted versus $\theta/\pi$, for a particular case with $\Omega^2 < \omega^2$.


(ii) Now let the particle approach a classical turning point A where $H = U_{ef}(q)$; see Fig. 2b.⁸ According to Eq. (25), at that point the particle velocity vanishes, while its acceleration, according to Eq. (4), is still finite. This means that the particle’s velocity changes its sign at this point, i.e. the particle is reflected from it.


7 See, e.g., MA Eqs. (5.2) and (5.3). 
8 This terminology comes from quantum mechanics, which shows that a particle (or rather its wavefunction) actually can, to a certain extent, penetrate “classically forbidden” regions where $H < U_{ef}(q)$.




(iii) If, after the reflection from some point A, the particle runs into another classical turning point B (Fig. 2c), the reflection process is repeated again and again. The particle is then bound to a periodic motion between the two turning points.


The last case of periodic oscillations is of great conceptual and practical interest, and the whole Chapter 5 will be devoted to a detailed analysis of this phenomenon and numerous associated effects. Here I will only note that for an autonomous Hamiltonian system described by Eq. (4), Eq. (26) immediately enables the calculation of the oscillation period:

$$\mathcal{T} = 2\left(\frac{m_{ef}}{2}\right)^{1/2}\int_A^B\frac{dq}{\left[H - U_{ef}(q)\right]^{1/2}}, \tag{3.27}$$

where the additional front factor 2 accounts for the two time intervals, of the motion from A to B and back; see Fig. 2c. Indeed, according to Eq. (25), at each classically allowed point $q$ the velocity’s magnitude is the same for both directions of motion, so these time intervals are equal to each other.
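In numerical practice, the integrand of Eq. (27) diverges (integrably) at the turning points. One common remedy, sketched below in Python, is the substitution $q = A\sin\xi$, which makes the integrand finite. The sketch assumes a mirror-symmetric well with its minimum at $q = 0$. The check uses an assumed pendulum-like potential, $U_{ef}(q) = \kappa_{ef}(1 - \cos q)$. Its exact period is $4K(\sin(A/2))/\omega_0$, where $K$ is the complete elliptic integral of the first kind, computed here via the arithmetic-geometric mean. The function names are illustrative.

```python
import math

def period_symmetric_well(U_ef, m_ef, A, n=4000):
    """Eq. (27) for a well with U_ef(-q) = U_ef(q), minimum at q = 0, turning points at +-A.
    The substitution q = A*sin(xi) removes the 1/sqrt divergence at q = A; the midpoint
    rule never evaluates the integrand exactly at xi = pi/2."""
    H = U_ef(A)
    h = (math.pi / 2.0) / n
    total = 0.0
    for k in range(n):
        xi = (k + 0.5) * h
        total += A * math.cos(xi) / math.sqrt(H - U_ef(A * math.sin(xi)))
    # 4 = 2 (there and back, as in Eq. 27) * 2 (the two mirror halves of the well)
    return 4.0 * math.sqrt(m_ef / 2.0) * total * h

def complete_elliptic_K(k):
    """K(k) = pi / (2 * AGM(1, sqrt(1 - k^2)))."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(40):
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)

kappa, m = 1.0, 1.0                                  # omega_0 = sqrt(kappa/m) = 1
U = lambda q: 2.0 * kappa * math.sin(q / 2.0)**2     # = kappa*(1 - cos q), less round-off
for A in (1e-3, 1.0, 2.5):
    print(A, period_symmetric_well(U, m, A), 4.0 * complete_elliptic_K(math.sin(A / 2.0)))
```

At small amplitudes the result approaches $2\pi/\omega_0$. At larger ones it grows, because this potential is not quadratic.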


(Note that the dependence of points A and B on $H$ is not necessarily continuous. Consider, for example, our testbed problem, whose effective potential energy is plotted in Fig. 2d for a particular value of $\omega > \Omega$. There, a gradual increase of $H$ leads to a sudden jump of the point B to a new position B′ at $H = U_{ef}(\theta_0)$. This jump corresponds to a sudden switch from oscillations about one fixed point $\theta_{2,3}$ to oscillations about two adjacent fixed points, before the beginning of a persistent rotation around the ring at $H > U_{ef}(\theta_1)$.)


Now let us consider a particular, but very important, limit of Eq. (27). As Fig. 2c shows, if $H$ is reduced to approach $U_{min}$, the periodic oscillations take place at the very bottom of this potential well, about a stable fixed point $q_0$. Hence, if the potential energy profile is smooth enough, we may limit the Taylor expansion (10) to the displayed quadratic term. Plugging it into Eq. (27), and using the mirror symmetry of this particular problem about the fixed point $q_0$, we get

$$\mathcal{T} = 4\left(\frac{m_{ef}}{2}\right)^{1/2}\int_0^A\frac{d\tilde q}{\left(H - U_{min} - \kappa_{ef}\tilde q^2/2\right)^{1/2}} = \frac{4}{\omega_0}I, \quad \text{with } I \equiv \int_0^1\frac{d\zeta}{\left(1 - \zeta^2\right)^{1/2}}, \tag{3.28}$$

where $\zeta \equiv \tilde q/A$, with $A = (2/\kappa_{ef})^{1/2}(H - U_{min})^{1/2}$ being the classical turning point, i.e. the oscillation amplitude, and $\omega_0$ the frequency given by Eq. (15). Taking into account that the elementary integral $I$ in that equation equals $\pi/2$,⁹ we finally get

$$\mathcal{T} = \frac{2\pi}{\omega_0}, \tag{3.29}$$

as it should be for the harmonic oscillations (16). Note that the oscillation period does not depend on the oscillation amplitude $A$, i.e. on the difference $(H - U_{min})$, as long as it is sufficiently small.


3.4. Planetary problems 


Leaving a more detailed study of oscillations for Chapter 5, let us now discuss the so-called planetary systems.¹⁰ Somewhat surprisingly, their description may also be reduced to an effectively 1D problem. Indeed, consider two particles that interact via a conservative central force $\mathbf F_{21} = -\mathbf F_{12} = \mathbf n_r F(r)$. Here $r$ and $\mathbf n_r$ are, respectively, the magnitude and the direction of the distance vector $\mathbf r = \mathbf r_1 - \mathbf r_2$ connecting the two particles (Fig. 3).

9 Indeed, introducing a new variable $\xi$ as $\zeta = \sin\xi$, we get $d\zeta = \cos\xi\,d\xi = (1 - \zeta^2)^{1/2}d\xi$, so that the function under the integral is just $d\xi$, and its limits are $\xi = 0$ and $\xi = \pi/2$.


Fig. 3.3. Vectors in the planetary problem. 


Generally, two particles moving without constraints in 3D space have 3 + 3 = 6 degrees of freedom, which may be described, e.g., by their Cartesian coordinates $\{x_1, y_1, z_1, x_2, y_2, z_2\}$. However, for this particular form of interaction, the following series of tricks allows the number of essential degrees of freedom to be reduced to just one.


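First, the conservative force of particle interaction may be described by a time-independent potential energy $U(r)$, such that $F(r) = -\partial U(r)/\partial r$.¹¹ Hence the Lagrangian function of the system is

$$L = T - U(r) = \frac{m_1}{2}\dot{\mathbf r}_1^2 + \frac{m_2}{2}\dot{\mathbf r}_2^2 - U(r). \tag{3.30}$$

Let us perform the transfer from the initial six scalar coordinates of the particles to the following six generalized coordinates: three Cartesian components of the distance vector

$$\mathbf r \equiv \mathbf r_1 - \mathbf r_2, \tag{3.31}$$

and three scalar components of the following vector:

$$\mathbf R \equiv \frac{m_1\mathbf r_1 + m_2\mathbf r_2}{M}, \quad \text{with } M \equiv m_1 + m_2, \tag{3.32}$$

which defines the position of the center of mass of the system, with the total mass $M$. Solving the system of two linear equations (31) and (32) for $\mathbf r_1$ and $\mathbf r_2$, we get

$$\mathbf r_1 = \mathbf R + \frac{m_2}{M}\mathbf r, \qquad \mathbf r_2 = \mathbf R - \frac{m_1}{M}\mathbf r. \tag{3.33}$$

Plugging these relations into Eq. (30), we see that it is reduced to

$$L = \frac{M}{2}\dot{\mathbf R}^2 + \frac{m}{2}\dot{\mathbf r}^2 - U(r), \tag{3.34}$$

where $m$ is the so-called reduced mass:

$$m \equiv \frac{m_1 m_2}{M}, \quad \text{so that} \quad \frac{1}{m} = \frac{1}{m_1} + \frac{1}{m_2}. \tag{3.35}$$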


10 This name is rather conventional, because this group of problems includes, for example, charged particle scattering (see Sec. 3.5 below).
11 See, e.g., MA Eq. (10.8) with $\partial/\partial\theta = \partial/\partial\varphi = 0$.
Note that according to Eq. (35), the reduced mass is lower than that of the lighter component of the two-body system. If one of $m_{1,2}$ is much less than its counterpart (as it is in most star-planet or planet-satellite systems), then with good precision $m \approx \min[m_1, m_2]$.
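The algebra leading from Eq. (30) to Eqs. (34)-(35) can be spot-checked numerically. The Python sketch below uses hypothetical helper names. It computes $\dot{\mathbf R}$, $\dot{\mathbf r}$, and $m$ from given particle velocities and compares the kinetic energy written both ways. It also shows how close $m$ is to the lighter mass for a Sun-Earth pair; the masses used are approximate values assumed for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def to_com_frame(m1, m2, v1, v2):
    """Return M, dR/dt, reduced mass m, and dr/dt, following Eqs. (31), (32), and (35)."""
    M = m1 + m2
    V = tuple((m1 * a + m2 * b) / M for a, b in zip(v1, v2))   # dR/dt
    v = tuple(a - b for a, b in zip(v1, v2))                    # dr/dt
    m = m1 * m2 / M
    return M, V, m, v

def kinetic_energy_both_ways(m1, m2, v1, v2):
    M, V, m, v = to_com_frame(m1, m2, v1, v2)
    direct = 0.5 * m1 * dot(v1, v1) + 0.5 * m2 * dot(v2, v2)   # kinetic part of Eq. (30)
    split = 0.5 * M * dot(V, V) + 0.5 * m * dot(v, v)          # kinetic part of Eq. (34)
    return direct, split

print(kinetic_energy_both_ways(2.0, 3.0, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # both 2.5 up to round-off
m_sun, m_earth = 1.989e30, 5.972e24                           # kg, approximate
print(to_com_frame(m_sun, m_earth, (0, 0, 0), (0, 0, 0))[2] / m_earth)  # slightly below 1
```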


Since the Lagrangian function (34) depends only on $\dot{\mathbf R}$ rather than $\mathbf R$ itself, according to our discussion in Sec. 2.4, all Cartesian components of $\mathbf R$ are cyclic coordinates, and the corresponding generalized momenta are conserved:

$$P_j \equiv \frac{\partial L}{\partial \dot R_j} = M\dot R_j = \text{const}, \qquad j = 1, 2, 3. \tag{3.36}$$


Physically, this is just the conservation law for the full momentum $\mathbf P = M\dot{\mathbf R}$ of our system, due to the absence of external forces. In the axiomatics used in Sec. 1.3 this law is actually postulated (see Eq. (1.10)). Now, however, we may attribute the momentum $\mathbf P$ to a certain geometric point, with the center-of-mass radius vector $\mathbf R$. In particular, since according to Eq. (36) the center moves with a constant velocity in the inertial reference frame used to write Eq. (30), we may consider a new inertial frame with the origin at point $\mathbf R$. In this new frame, $\mathbf R \equiv 0$. The vector $\mathbf r$ (and hence the scalar $r$) remains the same as in the old frame, because the frame transfer vector adds equally to $\mathbf r_1$ and $\mathbf r_2$ and cancels in $\mathbf r = \mathbf r_1 - \mathbf r_2$. The Lagrangian (34) is now reduced to

$$L = \frac{m}{2}\dot{\mathbf r}^2 - U(r). \tag{3.37}$$


Thus our initial problem has been reduced to just three degrees of freedom — three scalar 
components of the vector r. In other words, Eq. (37) shows that the dynamics of the vector r of our 
initial, two-particle system is identical to that of the radius vector of a single particle with the effective 
mass m, moving in the central potential field U(r). 


Two more degrees of freedom may be excluded from the planetary problem by noticing that according to Eq. (1.35), the angular momentum $\mathbf L = \mathbf r \times \mathbf p$ of our effective single particle of mass $m$ is also conserved, both in magnitude and direction. The direction of $\mathbf L$ is, by its definition, perpendicular to both $\mathbf r$ and $\mathbf v = \mathbf p/m$. The particle’s motion is therefore confined to the plane whose orientation is determined by the initial directions of the vectors $\mathbf r$ and $\mathbf v$. Hence we can completely describe the particle’s position by just two coordinates in that plane, for example by the distance $r$ to the origin, and the polar angle $\varphi$. In these coordinates, Eq. (37) takes the form identical to Eq. (2.49):

$$L = \frac{m}{2}\left(\dot r^2 + r^2\dot\varphi^2\right) - U(r). \tag{3.38}$$


Moreover, the latter coordinate, polar angle $\varphi$, may be also eliminated by using the conservation of the angular momentum’s magnitude, in the form of Eq. (2.50):¹²

$$L_z = mr^2\dot\varphi = \text{const}. \tag{3.39}$$


A direct corollary of this conservation is the so-called 2nd Kepler’s law:¹³ the radius vector $\mathbf r$ sweeps equal areas $A$ in equal time periods. Indeed, in the linear approximation in $dA \ll A$, the area differential $dA$ is equal to the area of a narrow right triangle with the base being the arc differential $r\,d\varphi$, and the height equal to $r$; see Fig. 4. As a result, according to Eq. (39), the time derivative of the area,

$$\frac{dA}{dt} = \frac{r \cdot r\,d\varphi/2}{dt} = \frac{1}{2}r^2\dot\varphi = \frac{L_z}{2m}, \tag{3.40}$$

remains constant. Integration of this equation over an arbitrary (not necessarily small!) time interval $\Delta t$ proves the 2nd Kepler’s law: $\Delta A \propto \Delta t$.

12 Here index $z$ stands for the coordinate perpendicular to the motion plane. Since other components of the angular momentum equal zero, this index is not really necessary, but I will still use it, just to make a clear distinction between the angular momentum $L_z$ and the Lagrangian function $L$.


Fig. 3.4. The area differential $dA$ in the polar coordinates.
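Eq. (40) relies only on the conservation of $L_z$, so the 2nd Kepler’s law should hold for any central field. The Python sketch below checks this by integrating the equations of motion with the classical fourth-order Runge-Kutta scheme. It uses an assumed, deliberately non-Coulomb potential $U(r) = -\alpha/r + \beta r^2$ (with $m = 1$) and records the areal velocity $(x\dot y - y\dot x)/2$ along the way. The potential, parameters, and initial conditions are illustrative choices only. RK4 does not conserve angular momentum by construction, so the tiny spread it reports reflects the physics up to integration error.

```python
import math

ALPHA, BETA = 1.0, 0.05          # assumed illustrative potential U(r) = -ALPHA/r + BETA*r**2, m = 1

def derivatives(state):
    """Right-hand side of the planar equations of motion for a particle of unit mass."""
    x, y, vx, vy = state
    r = math.hypot(x, y)
    dU_dr = ALPHA / r**2 + 2.0 * BETA * r
    return (vx, vy, -dU_dr * x / r, -dU_dr * y / r)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = derivatives(state)
    k2 = derivatives(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = derivatives(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = derivatives(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def areal_velocity_history(state=(1.0, 0.0, 0.0, 1.2), h=1e-3, steps=20_000):
    """Record dA/dt = (x*vy - y*vx)/2 = L_z/2m and r along the trajectory."""
    rates, radii = [], []
    for _ in range(steps):
        x, y, vx, vy = state
        rates.append(0.5 * (x * vy - y * vx))
        radii.append(math.hypot(x, y))
        state = rk4_step(state, h)
    return rates, radii

rates, radii = areal_velocity_history()
mean_rate = sum(rates) / len(rates)
print("r varies from", min(radii), "to", max(radii))
print("relative spread of dA/dt:", (max(rates) - min(rates)) / mean_rate)
```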


Now note that since $\partial L/\partial t = 0$, the Hamiltonian function $H$ is also conserved. Moreover, according to Eq. (38), the kinetic energy of the system is a quadratic-homogeneous function of the generalized velocities $\dot r$ and $\dot\varphi$, so $H = E$, and the system’s energy $E$,

$$E = \frac{m}{2}\dot r^2 + \frac{m}{2}r^2\dot\varphi^2 + U(r), \tag{3.41}$$

is also a first integral of motion.¹⁴ But according to Eq. (39), the second term on the right-hand side of Eq. (41) may be represented as

$$\frac{m}{2}r^2\dot\varphi^2 = \frac{L_z^2}{2mr^2}, \tag{3.42}$$

so that the energy (41) may be expressed as that of a 1D particle moving along the axis $r$,

$$E = \frac{m}{2}\dot r^2 + U_{ef}(r), \tag{3.43}$$

in the following effective potential:

$$U_{ef}(r) \equiv U(r) + \frac{L_z^2}{2mr^2}. \tag{3.44}$$

So the planetary motion problem has been reduced to the study of an effectively 1D system.¹⁵


13 This is one of the three laws deduced by Johannes Kepler in the early 17th century from the extremely detailed astronomical data collected by Tycho Brahe (1546-1601). In turn, Kepler’s three laws became the main basis for Newton’s discovery, a few decades later, of the gravity law (1.15). That relentless march of physics...

14 One may argue that this fact should have been evident earlier because the effective particle of mass $m$ moves in a potential field $U(r)$, which conserves energy.

15 Note that this reduction has been done in a way different from that used for our testbed problem (Fig. 2.1) in Sec. 2 above. (The reader is encouraged to analyze this difference.) To emphasize this fact, I will keep writing $E$ instead of $H$ here, though for the planetary problem we are discussing now, these two notions coincide.
Now we may proceed just as we did in Sec. 3, with due respect to the very specific effective potential (44). In particular, this potential diverges at $r \to 0$, except in the very special case of an exactly radial motion, $L_z = 0$. Solving Eq. (43) for $dr/dt$, we get

$$dt = \pm\left(\frac{m}{2}\right)^{1/2}\frac{dr}{\left[E - U_{ef}(r)\right]^{1/2}}. \tag{3.45}$$

This equation allows us not only to get a direct relation between time $t$ and distance $r$, similarly to Eq. (26),

$$t = \pm\left(\frac{m}{2}\right)^{1/2}\int\frac{dr}{\left[E - U_{ef}(r)\right]^{1/2}} = \pm\left(\frac{m}{2}\right)^{1/2}\int\frac{dr}{\left[E - U(r) - L_z^2/2mr^2\right]^{1/2}}, \tag{3.46}$$

but also to do a similar calculation of the angle $\varphi$ of the effective particle. Indeed, integrating Eq. (39),

$$\varphi = \int\dot\varphi\,dt = \frac{L_z}{m}\int\frac{dt}{r^2}, \tag{3.47}$$

and plugging $dt$ from Eq. (45), we get an explicit expression for the particle’s trajectory $\varphi(r)$:

$$\varphi = \pm\frac{L_z}{(2m)^{1/2}}\int\frac{dr}{r^2\left[E - U_{ef}(r)\right]^{1/2}} = \pm\frac{L_z}{(2m)^{1/2}}\int\frac{dr}{r^2\left[E - U(r) - L_z^2/2mr^2\right]^{1/2}}. \tag{3.48}$$


Note that according to Eq. (39), the derivative $d\varphi/dt$ does not change sign at the reflection from any classical turning point $r \neq 0$. Hence, in contrast to Eq. (46), the sign on the right-hand side of Eq. (48) is uniquely determined by the initial conditions and cannot change during the motion.


Let us use these results, valid for any interaction law $U(r)$, for the planetary motion’s classification. (Following a good tradition, in what follows I will select the arbitrary constant in the potential energy so as to provide $U \to 0$, and hence $U_{ef} \to 0$, at $r \to \infty$.) The following cases should be distinguished.

Suppose first that $U(r) < 0$, i.e. the particle interaction is attractive (as it always is in the case of gravity), and that the attractive potential diverges at $r \to 0$ faster than $1/r^2$. Then $U_{ef}(r) \to -\infty$ at $r \to 0$, and at appropriate initial conditions the particle may drop on the center even if $L_z \neq 0$: the event called the capture.¹⁶ On the other hand, if $U(r)$ either converges or diverges slower than $1/r^2$ at $r \to 0$, the effective energy profile $U_{ef}(r)$ has the shape shown schematically in Fig. 5. This is true, in particular, for the very important case

$$U(r) = -\frac{\alpha}{r}, \quad \text{with } \alpha > 0, \tag{3.49}$$

which describes, in particular, the Coulomb (electrostatic) interaction of two particles with electric charges of opposite signs, and Newton’s gravity law (1.15). This particular case will be analyzed in detail below. For now, let us return to an arbitrary attractive potential $U(r) < 0$ that leads to the effective potential shown in Fig. 5, i.e. one for which the angular-momentum term in Eq. (44) dominates at small distances $r$.


16 In order to analyze what exactly happens at the capture, i.e. at $r \to 0$, we would need a model more specific than Eq. (30).
Fig. 3.5. Effective potential profile $U_{ef}(r)$ of an attractive central field, and two types of motion in it.


According to the analysis in Sec. 3, such a potential profile, with a minimum at some distance $r_0$, may sustain two types of motion, depending on the energy $E$ (determined by initial conditions):

(i) If $E > 0$, there is only one classical turning point where $E = U_{ef}$. The distance $r$ then either grows with time from the very beginning, or (if the initial value of $\dot r$ was negative) first decreases and then, after the reflection from the increasing potential $U_{ef}$, starts to grow indefinitely. The latter case, of course, describes the scattering of the effective particle by the attractive center.¹⁷

(ii) Conversely, if the energy is within the range

$$U_{ef}(r_0) \le E < 0, \tag{3.50}$$

the system moves periodically between two classical turning points $r_{min}$ and $r_{max}$; see Fig. 5. These oscillations of the distance $r$ correspond to the bound orbital motion of our effective particle about the attracting center.


Let us start with the discussion of the bound motion, with the energy within the range (50). If the energy has its minimal possible value,

$$E = U_{ef}(r_0) \equiv \min\left[U_{ef}(r)\right], \tag{3.51}$$

the distance cannot change, $r = r_0 = \text{const}$, so that the particle’s orbit is circular, with the radius $r_0$ satisfying the condition $dU_{ef}/dr = 0$. Using Eq. (44), we see that the condition for $r_0$ may be written as

$$\frac{L_z^2}{mr_0^3} = \left.\frac{dU}{dr}\right|_{r = r_0}. \tag{3.52}$$


At circular motion, the velocity $\mathbf v$ is perpendicular to the radius vector $\mathbf r$, so $L_z$ is just $mr_0v$. Hence the left-hand side of Eq. (52) equals $mv^2/r_0$, while its right-hand side is just the magnitude of the attractive force, so that this equality expresses the well-known 2nd Newton’s law for the circular motion. Plugging this result into Eq. (47), we get a linear law of angle change, $\varphi = \omega t + \text{const}$, with the angular velocity

$$\omega = \frac{L_z}{mr_0^2} = \frac{v}{r_0}, \tag{3.53}$$

and hence the rotation period $\mathcal{T}_\varphi \equiv 2\pi/\omega$ obeys the elementary relation

$$\mathcal{T}_\varphi = \frac{2\pi r_0}{v}. \tag{3.54}$$

17 In the opposite case when the interaction is repulsive, $U(r) > 0$, the addition of the positive angular energy term only increases the trend, and the scattering scenario is the only one possible.
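The circular-orbit condition can also be found numerically, by minimizing the effective potential (44) directly. The Python sketch below does this with a golden-section search and then evaluates Eqs. (53) and (54). It assumes that $U_{ef}$ has a single minimum within the search interval; the function names and parameter values are illustrative. For the Coulomb potential (49), it should reproduce $r_0 = L_z^2/m\alpha$, and the resulting period should coincide with $2\pi r_0^{3/2}(m/\alpha)^{1/2}$.

```python
import math

def U_ef(r, U, L_z, m):
    """Effective potential, Eq. (44)."""
    return U(r) + L_z**2 / (2.0 * m * r**2)

def circular_orbit_radius(U, L_z, m, r_lo, r_hi, tol=1e-12):
    """Golden-section search for the minimum of U_ef on [r_lo, r_hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = r_lo, r_hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if U_ef(c, U, L_z, m) < U_ef(d, U, L_z, m):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

alpha, m, L_z = 1.0, 1.0, 1.5
U = lambda r: -alpha / r                      # Coulomb-type attraction, Eq. (49)
r0 = circular_orbit_radius(U, L_z, m, 0.1, 20.0)
v = L_z / (m * r0)                            # since L_z = m*r0*v
omega = L_z / (m * r0**2)                     # Eq. (53)
T_phi = 2.0 * math.pi * r0 / v                # Eq. (54)
print(r0, L_z**2 / (m * alpha), T_phi, 2.0 * math.pi / omega)
```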


Now let the energy be above its minimum value (but still negative). Using Eq. (46) just as in Sec. 3, we see that the distance $r$ oscillates with the period

$$\mathcal{T}_r = 2\left(\frac{m}{2}\right)^{1/2}\int_{r_{min}}^{r_{max}}\frac{dr}{\left[E - U(r) - L_z^2/2mr^2\right]^{1/2}}. \tag{3.55}$$

This period is not necessarily equal to another period, $\mathcal{T}_\varphi$, that corresponds to the $2\pi$-change of the angle. Indeed, according to Eq. (48), the change of the angle $\varphi$ between two sequential points of the nearest approach,

$$\Delta\varphi = 2\frac{L_z}{(2m)^{1/2}}\int_{r_{min}}^{r_{max}}\frac{dr}{r^2\left[E - U(r) - L_z^2/2mr^2\right]^{1/2}}, \tag{3.56}$$

is generally different from $2\pi$. Hence, the general trajectory of the bound motion has a spiral shape; see, e.g., an illustration in Fig. 6.


Fig. 3.6. A typical open orbit of a particle 
moving in a non-Coulomb central field. 
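Eq. (56) is straightforward to evaluate numerically, which makes the special status of the Coulomb potential easy to see. In the Python sketch below, the turning points are found by bisection. The caller must supply a bracketing (an assumption of the sketch): one point inside the classically allowed region and one on each side outside it. The substitution $r = c - d\cos\chi$, with $c = (r_{max} + r_{min})/2$ and $d = (r_{max} - r_{min})/2$, removes the inverse-square-root divergences at both turning points. For the assumed perturbed potential $U = -\alpha/r + \beta/r^2$, the exact answer is $\Delta\varphi = 2\pi L_z/(L_z^2 + 2m\beta)^{1/2}$, because the extra term merely renormalizes $L_z^2$ in $U_{ef}$.

```python
import math

def bisect(f, a, b, iters=200):
    """Root of f on [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    for _ in range(iters):
        c = 0.5 * (a + b)
        fc = f(c)
        if (fc > 0) == (fa > 0):
            a, fa = c, fc
        else:
            b = c
    return 0.5 * (a + b)

def apsidal_angle(U, E, L_z, m, r_lo, r_in, r_hi, n=4000):
    """Delta phi of Eq. (56). Requires E > U_ef(r_in), and E < U_ef at r_lo and r_hi."""
    K = lambda r: E - U(r) - L_z**2 / (2.0 * m * r**2)     # E - U_ef(r)
    r_min, r_max = bisect(K, r_lo, r_in), bisect(K, r_in, r_hi)
    c, d = 0.5 * (r_max + r_min), 0.5 * (r_max - r_min)
    h = math.pi / n
    total = 0.0
    for k in range(n):                                      # midpoint rule in chi
        chi = (k + 0.5) * h
        r = c - d * math.cos(chi)
        total += d * math.sin(chi) / (r**2 * math.sqrt(K(r)))
    return 2.0 * L_z / math.sqrt(2.0 * m) * total * h

m, L_z, E, alpha, beta = 1.0, 1.0, -0.3, 1.0, 0.1
coulomb = apsidal_angle(lambda r: -alpha / r, E, L_z, m, 0.1, 1.0, 100.0)
perturbed = apsidal_angle(lambda r: -alpha / r + beta / r**2, E, L_z, m, 0.1, 1.0, 100.0)
print(coulomb / (2 * math.pi), perturbed, 2 * math.pi * L_z / math.sqrt(L_z**2 + 2 * m * beta))
```

The Coulomb result equals $2\pi$, so the orbit closes. The perturbed one falls short of $2\pi$, so the orbit precesses, as in Fig. 6.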


The situation is special, however, for a very important particular case, namely that of the Coulomb potential described by Eq. (49).¹⁸ Indeed, plugging this potential into Eq. (48), we get

$$\varphi = \pm\frac{L_z}{(2m)^{1/2}}\int\frac{dr}{r^2\left(E + \alpha/r - L_z^2/2mr^2\right)^{1/2}}. \tag{3.57}$$

This is a table integral,¹⁹ giving

$$\varphi = \pm\cos^{-1}\frac{L_z^2/m\alpha r - 1}{\left(1 + 2EL_z^2/m\alpha^2\right)^{1/2}} + \text{const}. \tag{3.58}$$


18 For the power-law interaction, $U \propto r^\nu$, the orbits are closed curves only if $\nu = -1$ (the Coulomb potential) or if $\nu = +2$ (the 3D harmonic oscillator): the so-called Bertrand theorem, proved by J. Bertrand only in 1873.
19 See, e.g., MA Eq. (6.3a). 
Hence the reciprocal function, $r(\varphi)$, is $2\pi$-periodic:

$$r = \frac{p}{1 + e\cos(\varphi + \text{const})}, \tag{3.59}$$

so that at $E < 0$, the orbit is a closed line characterized by the following parameters:²⁰

$$p \equiv \frac{L_z^2}{m\alpha}, \qquad e \equiv \left(1 + \frac{2EL_z^2}{m\alpha^2}\right)^{1/2}. \tag{3.60}$$


The physical meaning of these parameters is very simple. Indeed, in the Coulomb potential, for which $dU/dr = \alpha/r^2$, the general Eq. (52) shows that $p$ is just the circular orbit radius²¹ for the given $L_z$: $r_0 = L_z^2/m\alpha \equiv p$, so that

$$\min\left[U_{ef}(r)\right] \equiv U_{ef}(r_0) = -\frac{\alpha^2 m}{2L_z^2}. \tag{3.61}$$

Using this equality together with the second of Eqs. (60), we see that the parameter $e$ (called the eccentricity) may be represented just as

$$e = \left\{1 - \frac{E}{\min\left[U_{ef}(r)\right]}\right\}^{1/2}. \tag{3.62}$$


Analytical geometry tells us that Eq. (59), with $e < 1$, is one of the canonical representations of an ellipse, with one of its two foci located at the origin. The fact that planets have such trajectories is known as the 1st Kepler’s law. Figure 7 shows the relations between the dimensions of the ellipse and the parameters $p$ and $e$.²²

Fig. 3.7. Ellipse, and its special points and dimensions. (Axes: $x = r\cos\varphi$, $y = r\sin\varphi$; one of the two foci is at the origin; the perihelion is marked.)


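In particular, the major semi-axis $a$ and the minor semi-axis $b$ are simply related to $p$ and $e$ and hence, via Eqs. (60), to the motion integrals $E$ and $L_z$:

$$a = \frac{p}{1 - e^2} = \frac{\alpha}{2|E|}, \qquad b = \frac{p}{\left(1 - e^2\right)^{1/2}} = \frac{L_z}{(2m|E|)^{1/2}}. \tag{3.63}$$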
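A short Python sketch makes Eqs. (60)-(63) concrete (the function name is illustrative). From $E$ and $L_z$ it computes $p$, $e$, $a$, $b$, and the turning points $r_{min,max} = p/(1 \pm e)$. One can then verify the standard ellipse relations $a = (r_{min} + r_{max})/2$ and $b = (r_{min}r_{max})^{1/2}$, as well as Eq. (62).

```python
import math

def coulomb_orbit(E, L_z, m, alpha):
    """Bound-orbit parameters in U = -alpha/r, for min[U_ef] <= E < 0 (Eqs. 60 and 63)."""
    p = L_z**2 / (m * alpha)
    e = math.sqrt(max(0.0, 1.0 + 2.0 * E * L_z**2 / (m * alpha**2)))  # guard round-off at e = 0
    return {
        "p": p, "e": e,
        "a": p / (1.0 - e**2), "b": p / math.sqrt(1.0 - e**2),
        "r_min": p / (1.0 + e), "r_max": p / (1.0 - e),
    }

orbit = coulomb_orbit(E=-0.3, L_z=1.0, m=1.0, alpha=1.0)
print(orbit)
# Eq. (63): a = alpha/(2|E|) and b = L_z/(2m|E|)^(1/2)
print(orbit["a"], 1.0 / 0.6, orbit["b"], 1.0 / math.sqrt(0.6))
```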


20 Let me hope that the difference between the parameter p and the particle momentum’s magnitude is absolutely 
clear from the context, so that using the same (traditional) notation for both notions cannot lead to confusion. 

21 Mathematicians prefer a more solemn terminology: the parameter $2p$ is called the latus rectum of the ellipse.

22 In this figure, the constant participating in Eqs. (58)-(59) is assumed to be zero. A different choice of the constant corresponds just to a different origin of $\varphi$, i.e. a constant turn of the ellipse about the origin.
As was mentioned above, at $E \to \min[U_{ef}(r)]$ the orbit is almost circular, with $r(\varphi) \approx r_0 \equiv p$. On the contrary, as $E$ is increased to approach zero (its maximum value for the closed orbit), $e \to 1$, so that the aphelion point $r_{max} = p/(1 - e)$ tends to infinity, i.e. the orbit becomes extremely extended; see the magenta lines in Fig. 8.

Fig. 3.8. (a) Zoom-in and (b) zoom-out on the Coulomb-field trajectories corresponding to the same parameter $p$ (i.e., the same $L_z$), but different values of the eccentricity parameter $e$, i.e. of the energy $E$ (see Eq. (60)): ellipses ($e < 1$, red lines), a parabola ($e = 1$, magenta line), and hyperbolas ($e > 1$, blue lines). Note that the transition from closed to open trajectories at $e = 1$ is dramatic only at very large distances, $r \gg p$.


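The above relations enable, in particular, a ready calculation of the rotation period $\mathcal{T} \equiv \mathcal{T}_r = \mathcal{T}_\varphi$. (In the case of a closed trajectory, $\mathcal{T}_r$ and $\mathcal{T}_\varphi$ coincide.) Indeed, it is well known that the ellipse’s area $A = \pi ab$. But according to the 2nd Kepler’s law (40), $dA/dt = L_z/2m = \text{const}$. Hence

$$\mathcal{T} = \frac{A}{dA/dt} = \frac{\pi ab}{L_z/2m}. \tag{3.64a}$$

Using Eqs. (60) and (63), this important result may be represented in several other forms:

$$\mathcal{T} = \frac{\pi p^2}{\left(1 - e^2\right)^{3/2}\left(L_z/2m\right)} = \pi\alpha\left(\frac{m}{2|E|^3}\right)^{1/2} = 2\pi a^{3/2}\left(\frac{m}{\alpha}\right)^{1/2}. \tag{3.64b}$$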


For the Newtonian gravity (1.15), $\alpha = Gm_1m_2 = GmM$, so at $m_1 \ll m_2$ (i.e. $m \ll M$) this constant is proportional to $m$. The last form of Eq. (64b) then yields the 3rd Kepler’s law: periods of motion of different planets in the same central field, say that of our Sun, scale as $\mathcal{T} \propto a^{3/2}$. Note that in contrast to the 2nd Kepler’s law (which is valid for any central field), the 1st and the 3rd Kepler’s laws are potential-specific.
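The three forms of Eq. (64b) can be cross-checked numerically. The last one, with $m/\alpha = 1/GM_\odot$ for $m \ll M$, also gives a quick estimate of the Earth’s orbital period. The sketch below assumes the standard value $GM_\odot \approx 1.327\times10^{20}\ \text{m}^3/\text{s}^2$ and takes the Earth’s semi-major axis as 1 astronomical unit, $1.496\times10^{11}$ m. The result should be close to the sidereal year, about 365.26 days. The function name is illustrative.

```python
import math

def period_three_ways(E, L_z, m, alpha):
    """Orbital period for U = -alpha/r and E < 0, via Eq. (64a) and the two other forms of Eq. (64b)."""
    a = alpha / (2.0 * abs(E))                      # Eq. (63)
    b = L_z / math.sqrt(2.0 * m * abs(E))           # Eq. (63)
    T_area = math.pi * a * b / (L_z / (2.0 * m))    # area / areal velocity, Eq. (64a)
    T_energy = math.pi * alpha * math.sqrt(m / (2.0 * abs(E)**3))
    T_axis = 2.0 * math.pi * a**1.5 * math.sqrt(m / alpha)
    return T_area, T_energy, T_axis

print(period_three_ways(E=-0.3, L_z=1.0, m=1.0, alpha=1.0))

GM_SUN = 1.32712440018e20   # m^3/s^2 (standard heliocentric gravitational parameter)
AU = 1.495978707e11         # m
T_earth = 2.0 * math.pi * AU**1.5 / math.sqrt(GM_SUN)   # last form of Eq. (64b), m << M
print(T_earth / 86400.0, "days")
```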


Reviewing the above derivation of Eqs. (59)-(60), we see that they are also valid in the case of $E = 0$ (see the top horizontal line in Fig. 5 and its discussion above), if we limit the results to the physically meaningful range $r \ge 0$. This means that if the energy is exactly zero, Eq. (59) (with $e = 1$) is still valid for all values of $\varphi$, except for one special point $\varphi = \pi$ where $r$ becomes infinite. It then describes a parabolic (i.e. open) trajectory; see the magenta lines in Fig. 8.


Moreover, if $E > 0$, Eq. (59) is still valid within a certain sector of angles $\varphi$,

$$\Delta\varphi = 2\cos^{-1}\left(-\frac{1}{e}\right) = 2\cos^{-1}\left[-\left(1 + \frac{2EL_z^2}{m\alpha^2}\right)^{-1/2}\right] < 2\pi, \quad \text{for } E > 0, \tag{3.65}$$

and describes an open, hyperbolic trajectory (see the blue lines in Fig. 8). As was mentioned earlier, such trajectories are typical, in particular, for particle scattering.


3.5. Elastic scattering 


If $E > 0$, the motion is unbound for any realistic interaction potential. In this case, the two most important parameters of the particle trajectory are the impact parameter $b$ and the scattering angle $\theta$ (Fig. 9), and the main task of the theory is to find the relation between them in the given potential $U(r)$.

Fig. 3.9. Main geometric parameters of the scattering problem, including the scattering angle $\theta = \pi - 2\varphi_0$.


For that, it is convenient to note that $b$ is related to the two conserved quantities, the particle’s energy²³ $E$ and its angular momentum $L_z$, in a simple way. Indeed, at $r \gg b$, the definition $\mathbf L = \mathbf r \times (m\mathbf v)$ yields $L_z = bmv_\infty$, where $v_\infty = (2E/m)^{1/2}$ is the initial (and hence the final) speed of the particle, so that

$$L_z = b(2mE)^{1/2}. \tag{3.66}$$

Hence the angular contribution to the effective potential (44) may be represented as

$$\frac{L_z^2}{2mr^2} = E\frac{b^2}{r^2}. \tag{3.67}$$

Next, according to Eq. (48), the trajectory sections going from infinity to the nearest approach point ($r = r_{min}$) and from that point to infinity have to be similar, and hence correspond to equal angle changes $\varphi_0$; see Fig. 9. Hence we may apply the general Eq. (48) to just one of the sections, say $[r_{min}, \infty]$, to find the scattering angle:


23 The energy conservation law is frequently emphasized by calling such a process elastic scattering.




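$$\theta = \pi - 2\varphi_0 = \pi - \frac{2L_z}{(2m)^{1/2}}\int_{r_{min}}^{\infty}\frac{dr}{r^2\left[E - U(r) - L_z^2/2mr^2\right]^{1/2}} = \pi - 2\int_{r_{min}}^{\infty}\frac{b\,dr}{r^2\left[1 - U(r)/E - b^2/r^2\right]^{1/2}}. \tag{3.68}$$

In particular, for the Coulomb potential (49), now with an arbitrary sign of $\alpha$, we can use the same table integral as in the previous section to get²⁴

$$\cos\frac{\theta}{2} = \frac{1}{\left[1 + (\alpha/2Eb)^2\right]^{1/2}}. \tag{3.69a}$$

This result may be more conveniently rewritten as

$$\tan\frac{|\theta|}{2} = \frac{|\alpha|}{2Eb}. \tag{3.69b}$$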


Very clearly, the scattering angle’s magnitude increases with the potential strength $|\alpha|$, and decreases as either the particle energy or the impact parameter (or both) are increased.
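The general formula (68) can be checked numerically against the Coulomb result (69). The Python sketch below rewrites Eq. (68) in terms of $u = 1/r$, locates the turning point $u_{max} = 1/r_{min}$ by bisection, and tames the integrable divergence there with the substitution $u = u_{max}\sin\chi$. It assumes $E > 0$, $U(r) \to 0$ at $r \to \infty$, and a single turning point. Its sign convention follows Eq. (68): it gives $\theta < 0$ for attraction ($\alpha > 0$) and $\theta > 0$ for repulsion, with $|\theta|$ given by Eq. (69b). The function name is illustrative.

```python
import math

def scattering_angle(U, E, b, n=4000):
    """theta from Eq. (68), with u = 1/r. Assumes E > 0, U -> 0 at infinity, one turning point."""
    def F(u):                        # 1 - U(r)/E - b^2/r^2, written with u = 1/r
        return 1.0 - U(1.0 / u) / E - (b * u)**2
    lo, hi = 0.0, 1.0 / b
    while F(hi) > 0:                 # extend the bracket until F changes sign
        hi *= 2.0
    for _ in range(200):             # bisection for u_max = 1/r_min
        mid = 0.5 * (lo + hi)
        if F(mid) > 0:
            lo = mid
        else:
            hi = mid
    u_max = 0.5 * (lo + hi)
    h = (math.pi / 2.0) / n
    integral = 0.0
    for k in range(n):               # midpoint rule after u = u_max*sin(chi)
        chi = (k + 0.5) * h
        integral += b * u_max * math.cos(chi) / math.sqrt(F(u_max * math.sin(chi)))
    return math.pi - 2.0 * integral * h

E, b = 1.0, 0.7
for alpha in (1.0, -1.0):            # attraction, then repulsion
    theta = scattering_angle(lambda r: -alpha / r, E, b)
    print(alpha, theta, -2.0 * math.atan(alpha / (2.0 * E * b)))
```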


The general result (68) and the Coulomb-specific relations (69) represent a formally complete solution of the scattering problem. However, in a typical experiment on elementary particle scattering, the impact parameter $b$ of a single particle is unknown. In this case, our results may be used to obtain the statistics of the scattering angle $\theta$, in particular, the so-called differential cross-section²⁵

$$\frac{d\sigma}{d\Omega} \equiv \frac{1}{n}\frac{dN}{d\Omega}, \tag{3.70}$$

where $n$ is the average number of the incident particles per unit area, and $dN$ is the average number of the particles scattered into a small solid angle interval $d\Omega$. For a uniform beam of initial particles, $d\sigma/d\Omega$ may be calculated by counting the average number of incident particles that have the impact parameters within a small range $db$:

$$dN = n\,2\pi b\,db. \tag{3.71}$$

A spherically symmetric center provides an axially symmetric scattering pattern, so these particles are scattered into the corresponding small solid angle interval $d\Omega = 2\pi\left|\sin\theta\,d\theta\right|$. Plugging these two equalities into Eq. (70), we get the following general geometric relation:

$$\frac{d\sigma}{d\Omega} = b\left|\frac{db}{\sin\theta\,d\theta}\right|. \tag{3.72}$$


In particular, for the Coulomb potential (49), a straightforward differentiation of Eq. (69) yields the so-called Rutherford scattering formula (reportedly derived by R. H. Fowler):

$$\frac{d\sigma}{d\Omega} = \left(\frac{\alpha}{4E}\right)^2\frac{1}{\sin^4(\theta/2)}. \tag{3.73}$$


24 Alternatively, this result may be recovered directly from the first form of Eq. (65), with the eccentricity $e$ expressed via the same dimensionless parameter $(2Eb/\alpha)$: $e = \left[1 + (2Eb/\alpha)^2\right]^{1/2} > 1$.

25 This terminology stems from the fact that an integral (74) of $d\sigma/d\Omega$ over the full solid angle, called the total cross-section $\sigma$, has the dimension of area: $\sigma = N/n$, where $N$ is the total number of scattered particles.
This result shows very strong scattering to small angles, so strong that the integral expressing the total cross-section,

$$\sigma \equiv \oint_{4\pi}\frac{d\sigma}{d\Omega}\,d\Omega, \tag{3.74}$$

diverges at $\theta \to 0$.²⁶ It also shows very weak backscattering (to angles $\theta \approx \pi$). This result was historically extremely significant: in the early 1910s, its good agreement with α-particle scattering experiments carried out by Ernest Rutherford’s group gave a strong justification for his introduction of the planetary model of atoms, with electrons moving around very small nuclei, just as planets move around stars.


Note that elementary particle scattering is frequently accompanied by electromagnetic radiation 
and/or other processes leading to the loss of the initial mechanical energy of the system. Such inelastic 
scattering may give significantly different results. (In particular, the capture of an incoming particle 
becomes possible even for a Coulomb attracting center.) Also, quantum-mechanical effects may be 
important at the scattering of light particles with relatively low energies,²⁷ so that the above results
should be used with caution. 
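The chain of Eqs. (70)-(73) can be illustrated with a simple Monte Carlo sketch in Python. Particles of a uniform beam, assumed here to fill a disk of radius $b_{max}$, are assigned random impact parameters, and each is deflected according to Eq. (69b). The counts in an angular bin are then converted into $d\sigma/d\Omega$ by Eq. (70) and compared with the Rutherford formula (73) averaged over the same bin. The bin edges and $b_{max}$ are assumed values, chosen so that every particle deflected into the bin has $b < b_{max}$. The function names are illustrative.

```python
import math
import random

def rutherford(theta, alpha, E):
    """Eq. (73)."""
    return (alpha / (4.0 * E))**2 / math.sin(theta / 2.0)**4

def monte_carlo_dsigma(alpha, E, b_max, theta1, theta2, N=200_000, seed=1):
    """Estimate d(sigma)/d(Omega) averaged over theta1 < |theta| < theta2, via Eqs. (69b)-(71)."""
    rng = random.Random(seed)
    n = N / (math.pi * b_max**2)                       # incident particles per unit area
    hits = 0
    for _ in range(N):
        b = b_max * math.sqrt(1.0 - rng.random())      # uniform over the disk; b > 0
        theta = 2.0 * math.atan(abs(alpha) / (2.0 * E * b))   # Eq. (69b)
        if theta1 < theta < theta2:
            hits += 1
    d_omega = 2.0 * math.pi * (math.cos(theta1) - math.cos(theta2))
    return hits / (n * d_omega)                        # Eq. (70)

def rutherford_bin_average(alpha, E, theta1, theta2, steps=10_000):
    """Average of Eq. (73) over the same bin, by the midpoint rule in theta."""
    h = (theta2 - theta1) / steps
    total = 0.0
    for k in range(steps):
        th = theta1 + (k + 0.5) * h
        total += rutherford(th, alpha, E) * 2.0 * math.pi * math.sin(th) * h
    return total / (2.0 * math.pi * (math.cos(theta1) - math.cos(theta2)))

alpha, E, b_max, th1, th2 = 1.0, 1.0, 5.0, 0.5, 1.0   # theta(b_max) ~ 0.2 rad < th1
print(monte_carlo_dsigma(alpha, E, b_max, th1, th2), rutherford_bin_average(alpha, E, th1, th2))
```

With 200,000 particles, about 24,000 land in the bin, so the two numbers should agree to within a few tenths of a percent of statistical noise.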


3.6. Exercise problems 


3.1. For the system considered in Problem 2.6 (a bead sliding along a string with fixed tension $\mathscr{T}$, see the figure on the right), analyze small oscillations of the bead near the equilibrium.

3.2. For the system considered in Problem 2.7 (a bead sliding along a string of a fixed length $2l$, see the figure on the right), analyze small oscillations near the equilibrium.


3.3. A bead is allowed to slide, without friction, along an 
inverted cycloid in a vertical plane — see the figure on the right. 
Calculate the frequency of its free oscillations as a function of 
their amplitude. 


3.4. For a 1D particle of mass $m$, placed into potential $U(q) = \alpha q^{2n}$ (where $\alpha > 0$, and $n$ is a positive integer), calculate the functional dependence of the particle oscillation period $\mathcal{T}$ on its energy $E$. Explore the limit $n \to \infty$.


3.5. Two small masses $m_1$ and $m_2 < m_1$ may slide, without friction, over a horizontal surface. They are connected with a spring with equilibrium length $l$ and elastic constant $\kappa$, and at $t < 0$ are at rest. At $t = 0$, the first of these masses receives a very short kick with impulse $P = \int F(t)\,dt$ in the direction normal to the spring line. Calculate the largest and smallest magnitudes of its velocity at $t > 0$.

26 This divergence, which persists in the quantum-mechanical treatment of the problem (see, e.g., QM Chapter 3), is due to particles with very large values of $b$. It disappears when one accounts, for example, for any nonzero concentration of the scattering centers.

27 Their discussion may be found in QM Secs. 3.3 and 3.8.


3.6. Explain why the term $mr^2\dot\varphi^2/2$, recast in accordance with Eq. (42), cannot be merged with $U(r)$ in Eq. (38) to form an effective 1D potential energy $U(r) - L_z^2/2mr^2$, with the second term’s sign opposite to that given by Eq. (44). We have done an apparently similar thing for our testbed bead-on-rotating-ring problem at the very end of Sec. 1 (see Eq. (6)); why cannot the same trick work for the planetary problem? Besides a formal explanation, discuss the physics behind this difference.


3.7. A system consisting of two equal masses $m$ on a light rod of length $l$ (frequently called a dumbbell) can slide without friction along a vertical ring of radius $R$, rotated about its vertical diameter with a constant angular velocity $\omega$; see the figure on the right. Derive the condition of stability of the lower horizontal position of the dumbbell.


3.8. Analyze the dynamics of the so-called spherical pendulum: a point mass hung, in a uniform gravity field $g$, on a light cord of length $l$, with no motion’s confinement to a vertical plane. In particular:

(i) find the integrals of motion and reduce the problem to a 1D one,
(ii) calculate the time period of the possible circular motion around the vertical axis, and
(iii) explore small deviations from the circular motion. (Are the pendulum orbits closed?)²⁸


3.9. If our planet Earth were suddenly stopped on its orbit around the Sun, how long would it take it to fall on our star? Solve this problem using two different approaches, while neglecting the Earth’s orbit eccentricity and the Sun’s size.


3.10. The orbits of Mars and Earth around the Sun may be well approximated as circles,²⁹ with a radii ratio of 3/2. Use this fact, and the Earth’s year duration (which you should know :-), to calculate the time of travel to Mars along the trajectory requiring the least energy for the spacecraft’s launch. Neglect the planets’ sizes and the effects of their gravitational fields.


3.11. Derive first-order and second-order differential equations for the reciprocal distance u = 1/r as a function of φ, describing the trajectory of the particle’s motion in a central potential U(r). Spell out the latter equation for the particular case of the Coulomb potential (49) and discuss the result.


3.12. For the motion of a particle in the Coulomb attractive field (U(r) = −α/r, with α > 0), calculate and sketch the so-called hodograph³⁰: the trajectory followed by the head of the velocity vector v, provided that its tail is kept at the origin.


28 Solving this problem is very good preparation for the analysis of the symmetric top’s rotation in Sec. 4.5. 

29 Indeed, their eccentricities are close to, respectively, 0.093 and 0.0167. 

30 The use of this notion for the characterization of motion may be traced back at least to an 1846 treatise by W. 
Hamilton. Nowadays, it is most often used in applied fluid mechanics, in particular meteorology. 
3.13. Prove that for arbitrary motion of a particle of mass m in the Coulomb field U = −α/r, the vector A ≡ p × L − mα n_r is conserved.³¹ After that:


(i) spell out the scalar product r·A and use the result for an alternative derivation of Eq. (59), and for a geometric interpretation of the vector A;

(ii) spell out (A − p × L)², and use the result for an alternative derivation of the hodograph diagram discussed in the previous problem.


3.14. For motion in the following central potential:

$$U(r) = -\frac{\alpha}{r} + \frac{\beta}{r^2},$$

(i) find the orbit r(φ), for positive α and β, and all possible ranges of energy E;
(ii) prove that in the limit β → 0, and for energy E < 0, the orbit may be represented as a slowly rotating ellipse;
(iii) express the angular velocity of this slow orbit rotation via the parameters α and β, the particle’s mass m, its energy E, and the angular momentum L_z.


3.15. A star system contains a much smaller planet and an even smaller amount of dust. Assuming that the attractive gravitational potential of the dust is spherically symmetric and proportional to the square of the distance from the star,³² calculate the slow precession it gives to a basically circular orbit of the planet.

3.16. A particle is moving in the field of an attractive central force, with potential

$$U(r) = -\frac{\alpha}{r^n}, \quad \text{where } \alpha n > 0.$$

For what values of n are the circular orbits stable?


3.17. Determine the condition for a particle of mass m, moving under the effect of a central attractive force

$$\mathbf{F} = -\alpha\frac{\mathbf{r}}{r^3}\exp\left\{-\frac{r}{R}\right\},$$

where α and R are positive constants, to have a stable circular orbit.


3.18. A particle of mass m, with angular momentum L_z, moves in the field of an attractive central force with a distance-independent magnitude F. If the particle’s energy E is slightly higher than the value E_min corresponding to the circular orbit of the particle, what is the time period of its radial oscillations? Compare the period with that of the circular orbit at E = E_min.


3.19. A particle may move without friction, in the uniform gravity field g = −g n_z, over an axially-symmetric surface that is described, in cylindrical coordinates {ρ, φ, z}, by a smooth function


31 This fact, first proved in 1710 by Jacob Hermann, was repeatedly rediscovered during the next two centuries. As a result, the most common name of A is, rather unfairly, the Runge-Lenz vector.

32 As may be readily shown from the gravitation version of the Gauss law (see, e.g., the model solution of 
Problem 1.7), this approximation is exact if the dust density is constant between the star and the planet. 
Z(ρ) (see the figure on the right). Derive the condition of stability of circular orbits of the particle around the symmetry axis z with respect to small perturbations. For the cases when the condition is fulfilled, find out whether the weakly perturbed orbits are open or closed. Spell out your results for the following particular cases:

(i) a conical surface with Z = αρ,
(ii) a paraboloid with Z = κρ²/2, and
(iii) a spherical surface with Z² + ρ² = R², for ρ < R.


3.20. The gravitational potential (i.e. the gravitational energy of a unit probe mass) of our Milky Way galaxy, averaged over interstellar distances, is reasonably well approximated by the following axially-symmetric function:

$$\phi(r, z) = \frac{V^2}{2}\ln\left(r^2 + a z^2\right),$$

where r is the distance from the galaxy’s symmetry axis and z is the distance from its central plane, while V and a > 0 are constants.³³ Prove that circular orbits of stars in this gravity field are stable, and calculate the frequencies of their small oscillations near such orbits, in the r- and z-directions.


3.21. For particle scattering by a repulsive Coulomb field, calculate the minimum approach distance r_min and the velocity v_min at that point, and analyze their dependence on the impact parameter b (see Fig. 9) and the initial velocity v_∞ of the particle.


3.22. A particle is launched from afar, with impact parameter b, toward an attracting center with

$$U(r) = -\frac{\alpha}{r^n}, \quad \text{with } n > 2, \ \alpha > 0.$$

(i) Express the minimum distance between the particle and the center via b, if the initial kinetic energy E of the particle is barely sufficient for escaping its capture by the attracting center.
(ii) Calculate the capture’s total cross-section; explore the limit n → 2.


3.23. A meteorite with initial velocity v_∞ approaches an atmosphere-free planet of mass M and radius R.

(i) Find the condition on the impact parameter b for the meteorite to hit the planet’s surface.
(ii) If the meteorite barely avoids the collision, what is its scattering angle?


3.24. Calculate the differential and total cross-sections of the classical elastic scattering of small 
particles by a hard sphere of radius R. 


3.25. The most famous³⁴ confirmation of Einstein’s general relativity theory has come from the observation, by A. Eddington and his associates, of star light’s deflection by the Sun, during the May

33 Just for the reader’s reference, these constants are close to, respectively, 2.2×10⁵ m/s and 6.
1919 solar eclipse. Considering light photons as classical particles propagating with the speed of light, v₀ = c ≈ 3.00×10⁸ m/s, and the astronomic data for the Sun’s mass, M_S ≈ 1.99×10³⁰ kg, and radius, R_S ≈ 6.96×10⁸ m, calculate the non-relativistic mechanics’ prediction for the angular deflection of the light rays grazing the Sun’s surface.


3.26. Generalize the expression for the small angle of scattering, obtained in the solution of the previous problem, to a spherically-symmetric but otherwise arbitrary potential U(r). Use the result to calculate the differential cross-section of small-angle scattering by the potential U = C/rⁿ, with integer n > 0.

Hint: You may like to use the following table integral:

$$\int_0^{+\infty}\frac{d\xi}{(1+\xi^2)^{n/2+1}} = \frac{\sqrt{\pi}}{n}\,\frac{\Gamma(n/2+1/2)}{\Gamma(n/2)}.$$


34 It was not the first confirmation, though. The first one came four years earlier from Albert Einstein himself, who showed that his theory may qualitatively explain the difference between the rate of precession of Mercury’s orbit, known from earlier observations, and its value predicted by the non-relativistic theory of that effect.
Chapter 4. Rigid Body Motion
This chapter discusses the motion of rigid bodies, with a heavy focus on its most nontrivial part: rotation. Some byproducts of this analysis enable a discussion, at the end of the chapter, of the motion of point particles as observed from non-inertial reference frames.


4.1. Translation and rotation 


It is natural to start a discussion of many-particle systems from a (relatively :-) simple limit when the changes of distances r_{kk’} ≡ |r_k − r_{k’}| between the particles are negligibly small. Such an abstraction is called the (absolutely) rigid body; it is a reasonable approximation in many practical problems, including the motion of solid samples. In other words, this model neglects deformations, which will be the subject of the next chapters. The rigid-body approximation reduces the number of degrees of freedom of the system of N particles from 3N to just six: for example, three Cartesian coordinates of one point (say, 0), and three angles of the system’s rotation about three mutually perpendicular axes passing through this point (see Fig. 1).¹


Fig. 4.1. Deriving Eq. (8). 


As follows from the discussion in Secs. 1.1-1.3, any purely translational motion of a rigid body, at which the velocity vectors v of all points are equal, is not more complex than that of a point particle. Indeed, according to Eqs. (1.8) and (1.30), in an inertial reference frame such a body moves exactly as a point particle under the effect of the net external force F^(ext). However, rotation is a bit more tricky.


Let us start by showing that an arbitrary elementary displacement of a rigid body may always be considered as a sum of a translational motion and what is called a pure rotation. For that, consider a “moving” reference frame {n₁, n₂, n₃}, firmly bound to the body, and an arbitrary vector A (Fig. 1). The vector may be represented by its Cartesian components A_j in that moving frame:


$$\mathbf{A} = \sum_{j=1}^{3} A_j\,\mathbf{n}_j. \tag{4.1}$$


1 An alternative way to arrive at the same number six is to consider three points of the body, which uniquely define its position. If movable independently, the points would have nine degrees of freedom, but since the three distances r_{kk’} between them are now fixed, the resulting three constraints reduce the number of degrees of freedom to six.
Let us calculate the time derivative of this vector as observed from a different (“lab”) frame, taking into account that if the body rotates relative to this frame, the directions of the unit vectors n_j, as seen from the lab frame, change in time. Hence, in each product contributing to the sum (1), we have to differentiate both operands:

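$$\left.\frac{d\mathbf{A}}{dt}\right|_{\text{in lab}} = \sum_{j=1}^{3}\frac{dA_j}{dt}\,\mathbf{n}_j + \sum_{j=1}^{3}A_j\,\frac{d\mathbf{n}_j}{dt}. \tag{4.2}$$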

On the right-hand side of this equality, the first sum obviously describes the change of vector A as observed from the moving frame. In the second sum, each of the infinitesimal vectors dn_j may be represented by its Cartesian components:


$$d\mathbf{n}_j = \sum_{j'=1}^{3} d\varphi_{jj'}\,\mathbf{n}_{j'}, \tag{4.3}$$


where dφ_{jj’} are some dimensionless scalar coefficients. To find out more about them, let us scalar-multiply each side of Eq. (3) by an arbitrary unit vector n_{j’’}, and take into account the obvious orthonormality condition:

$$\mathbf{n}_{j'}\cdot\mathbf{n}_{j''} = \delta_{j'j''}, \tag{4.4}$$


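where δ_{j’j’’} is the Kronecker delta symbol.² As a result, we get

$$d\mathbf{n}_j\cdot\mathbf{n}_{j''} = d\varphi_{jj''}. \tag{4.5}$$

Now let us use Eq. (5) to calculate the first differential of Eq. (4):

$$d\mathbf{n}_{j'}\cdot\mathbf{n}_{j''} + \mathbf{n}_{j'}\cdot d\mathbf{n}_{j''} \equiv d\varphi_{j'j''} + d\varphi_{j''j'} = 0; \quad \text{in particular, } 2\,d\mathbf{n}_j\cdot\mathbf{n}_j = 2\,d\varphi_{jj} = 0. \tag{4.6}$$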


These relations, valid for any choice of indices j, j’, and j’’ of the set {1, 2, 3}, show that the matrix with elements dφ_{jj’} is antisymmetric with respect to the swap of its indices; this means that there are not nine but just three nonzero independent coefficients dφ_{jj’}, all with j ≠ j’. Hence it is natural to renumber them in a simpler way: dφ_{jj’} = −dφ_{j’j} ≡ dφ_{j’’}, where the indices j, j’, and j’’ follow in the “correct” order: either {1,2,3}, or {2,3,1}, or {3,1,2}. It is straightforward to verify (either just by a component-by-component comparison or by using the Levi-Civita permutation symbol³) that in this new notation, Eq. (3) may be represented just as a vector product:

$$d\mathbf{n}_j = d\boldsymbol{\varphi}\times\mathbf{n}_j, \tag{4.7}$$


where dφ is the infinitesimal vector defined by its Cartesian components dφ_j in the rotating reference frame {n₁, n₂, n₃}.


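This relation is the basis of all rotation kinematics. Using it, Eq. (2) may be rewritten as

$$\left.\frac{d\mathbf{A}}{dt}\right|_{\text{in lab}} = \left.\frac{d\mathbf{A}}{dt}\right|_{\text{in mov}} + \boldsymbol{\omega}\times\mathbf{A}, \quad \text{where } \boldsymbol{\omega}\equiv\frac{d\boldsymbol{\varphi}}{dt}. \tag{4.8}$$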


To reveal the physical meaning of the vector ω, let us apply Eq. (8) to the particular case when A is the radius vector r of a point of the body, and the lab frame is selected in a special way: its origin has the

2 See, e.g., MA Eq. (13.1). 
same position and moves with the same velocity as that of the moving frame, at the particular instant under consideration. In this case, the first term on the right-hand side of Eq. (8) is zero, and we get

$$\left.\frac{d\mathbf{r}}{dt}\right|_{\text{in special lab frame}} = \boldsymbol{\omega}\times\mathbf{r}, \tag{4.9}$$


where the vector r itself is the same in both frames. According to the vector product definition, the particle velocity described by this formula is directed perpendicular to the vectors ω and r (Fig. 2), and has the magnitude ωr sin θ. As Fig. 2 shows, the last expression may be rewritten as ωρ, where ρ = r sin θ is the distance from the line that is parallel to the vector ω and passes through point 0. This is of course just the pure rotation about that line (called the instantaneous axis of rotation), with the angular velocity ω. According to Eqs. (3) and (8), the angular velocity vector ω is defined by the time evolution of the moving frame alone, so it is the same for all points r, i.e. for the rigid body as a whole. Note that nothing in our calculations forbids either the magnitude or the direction of the vector ω (and hence the instantaneous axis of rotation) from changing in time; hence the name.


Fig. 4.2. The instantaneous axis and the angular velocity of rotation.
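The kinematic relations (4.7)-(4.9) lend themselves to a quick numerical check. The Python sketch below assumes the simplest special case: a body rotating steadily about a fixed axis through the origin, so that the lab frame is automatically the “special” one of Eq. (9). The axis, angular speed, time step, and body point are hypothetical values chosen for illustration, not taken from the text. The rotation matrix is built by Rodrigues’ formula; the sketch compares the finite-difference velocity of a body point with ω × r, and reads off ω from the antisymmetric matrix (dR/dt)Rᵀ, a lab-frame analog of the antisymmetric coefficients dφ_{jj’}/dt.

```python
import math


def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` (rad) about the unit vector `axis`."""
    ux, uy, uz = axis
    c, s = math.cos(angle), math.sin(angle)
    k = 1.0 - c
    return [
        [c + ux * ux * k, ux * uy * k - uz * s, ux * uz * k + uy * s],
        [uy * ux * k + uz * s, c + uy * uy * k, uy * uz * k - ux * s],
        [uz * ux * k - uy * s, uz * uy * k + ux * s, c + uz * uz * k],
    ]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


# Hypothetical parameters: steady rotation about a fixed, tilted axis through the origin.
norm = math.sqrt(1.0 + 4.0 + 9.0)
axis = [1.0 / norm, 2.0 / norm, 3.0 / norm]
omega_mag = 0.7            # angular speed, rad/s
t, h = 1.3, 1e-5           # time instant and finite-difference step
r_body = [0.4, -1.1, 2.0]  # a point fixed in the body (body-frame coordinates)


def rot(time):
    return rotation_matrix(axis, omega_mag * time)


# Central-difference velocity of the body point, as seen in the lab frame.
r_plus, r_minus = matvec(rot(t + h), r_body), matvec(rot(t - h), r_body)
v_numeric = [(p - q) / (2 * h) for p, q in zip(r_plus, r_minus)]

# Eq. (4.9): v = omega x r, with omega directed along the rotation axis.
omega = [omega_mag * u for u in axis]
v_formula = cross(omega, matvec(rot(t), r_body))

# (dR/dt) R^T is antisymmetric; its three independent elements are the components of omega.
dR = [[(a - b) / (2 * h) for a, b in zip(row_p, row_m)]
      for row_p, row_m in zip(rot(t + h), rot(t - h))]
A = matmul(dR, transpose(rot(t)))
omega_extracted = [A[2][1], A[0][2], A[1][0]]

print(v_numeric, v_formula)
print(omega_extracted, omega)
```

A time-dependent axis would require a more general construction of R(t); the fixed axis is used here only to keep the sketch short.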


Now let us generalize our result a step further, considering two reference frames that do not rotate relative to each other: one (“lab”) frame arbitrary, and another one selected in the special way described above, so that Eq. (9) is valid in it. Since the relative motion of these two reference frames is purely translational, we can use the simple velocity addition rule given by Eq. (1.6) to write

$$\mathbf{v}\big|_{\text{in lab}} = \mathbf{v}_0\big|_{\text{in lab}} + \mathbf{v}\big|_{\text{in special lab frame}} = \mathbf{v}_0\big|_{\text{in lab}} + \boldsymbol{\omega}\times\mathbf{r}, \tag{4.10}$$

where r is the radius vector of a point, measured in the body-bound (“moving”) frame 0.


4.2. Inertia tensor 


Since the dynamics of each point of a rigid body is strongly constrained by the conditions r_{kk’} = const, this is one of the most important fields of application of the Lagrangian formalism discussed in Chapter 2. To use this approach, the first thing we need to calculate is the kinetic energy of the body in an inertial reference frame. Since it is just the sum of the kinetic energies (1.19) of all its points, we can use Eq. (10) to write:⁴


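$$T = \sum\frac{m}{2}v^2 = \sum\frac{m}{2}\left(\mathbf{v}_0 + \boldsymbol{\omega}\times\mathbf{r}\right)^2 = \sum\frac{m}{2}v_0^2 + \sum m\,\mathbf{v}_0\cdot(\boldsymbol{\omega}\times\mathbf{r}) + \sum\frac{m}{2}(\boldsymbol{\omega}\times\mathbf{r})^2. \tag{4.11}$$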


4 Actually, all symbols for particle masses, coordinates, and velocities should carry the particle’s index, over which the summation is carried out. However, in this section, for notational simplicity, this index is just implied.
Let us apply to the right-hand side of Eq. (11) two general vector analysis formulas listed in the Math Appendix: the so-called operand rotation rule MA Eq. (7.6) to the second term, and MA Eq. (7.7b) to the third term. The result is


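$$T = \sum\frac{m}{2}v_0^2 + \sum m\,\mathbf{r}\cdot(\mathbf{v}_0\times\boldsymbol{\omega}) + \sum\frac{m}{2}\left[\omega^2 r^2 - (\boldsymbol{\omega}\cdot\mathbf{r})^2\right]. \tag{4.12}$$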


This expression may be further simplified by making a specific choice of the point 0 (from which the 
radius vectors r of all particles are measured), namely by using for this point the center of mass of the 
body. As was already mentioned in Sec. 3.4 for the two-point case, the radius vector R of this point is 
defined as 

$$M\mathbf{R} = \sum m\,\mathbf{r}, \quad \text{with } M \equiv \sum m, \tag{4.13}$$


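so that M is the total mass of the body. In the reference frame centered at this point, we have R = 0, so that the second sum in Eq. (12) vanishes, and the kinetic energy is a sum of just two terms:

$$T = T_{\text{tran}} + T_{\text{rot}}, \quad T_{\text{tran}} \equiv \sum\frac{m}{2}V^2 = \frac{M}{2}V^2, \quad T_{\text{rot}} \equiv \sum\frac{m}{2}\left[\omega^2 r^2 - (\boldsymbol{\omega}\cdot\mathbf{r})^2\right], \tag{4.14}$$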
where V = dR/dt is the center-of-mass velocity in our inertial reference frame, and all particle positions r are measured in the center-of-mass frame. Since the angular velocity vector ω is common for all points of a rigid body, it is more convenient to rewrite the rotational part of the energy in a form in which the summation over the components of this vector is separated from the summation over the points of the body:

$$T_{\text{rot}} = \frac{1}{2}\sum_{j,j'=1}^{3} I_{jj'}\,\omega_j\,\omega_{j'}, \tag{4.15}$$

where the 3×3 matrix with elements

$$I_{jj'} \equiv \sum m\left(r^2\delta_{jj'} - r_j r_{j'}\right) \tag{4.16}$$

represents, in the selected reference frame, the inertia tensor of the body.⁵
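The decomposition (4.14)-(4.16) may be verified numerically for an arbitrary collection of point masses. The following Python sketch assumes a hypothetical body of six point masses at seeded random positions, with arbitrary values of V and ω. It computes the kinetic energy once by direct summation of (m/2)v², with v = V + ω × r as in Eq. (10), and once as (M/2)V² plus the rotational part (15) built from the inertia tensor (16).

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def inertia_tensor(masses, positions):
    """Eq. (4.16): I_jj' = sum m (r^2 delta_jj' - r_j r_j'), positions from the center of mass."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = dot(r, r)
        for j in range(3):
            for k in range(3):
                tensor[j][k] += m * ((r2 if j == k else 0.0) - r[j] * r[k])
    return tensor


# Hypothetical rigid body: six point masses at arbitrary (seeded) positions.
random.seed(1)
masses = [random.uniform(0.5, 2.0) for _ in range(6)]
points = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(6)]

M = sum(masses)
R_cm = [sum(m * p[i] for m, p in zip(masses, points)) / M for i in range(3)]  # Eq. (4.13)
rel = [[p[i] - R_cm[i] for i in range(3)] for p in points]  # r measured from the center of mass

V = [0.3, -0.2, 0.5]      # center-of-mass velocity (arbitrary)
omega = [1.1, 0.4, -0.7]  # angular velocity (arbitrary)

# Direct sum of (m/2) v^2 with v = V + omega x r, Eqs. (4.10)-(4.11).
T_direct = 0.0
for m, r in zip(masses, rel):
    v = [Vi + wi for Vi, wi in zip(V, cross(omega, r))]
    T_direct += 0.5 * m * dot(v, v)

# Eqs. (4.14)-(4.15): translational part plus rotational part.
I = inertia_tensor(masses, rel)
T_tran = 0.5 * M * dot(V, V)
T_rot = 0.5 * sum(I[j][k] * omega[j] * omega[k] for j in range(3) for k in range(3))

print(T_direct, T_tran + T_rot)
```

The cross term of Eq. (12) drops out only because the positions are measured from the center of mass; shifting the origin of `rel` away from it would make the two printed numbers differ.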


Actually, the term “tensor” for the construct described by this matrix has to be justified, because in physics it implies a certain reference-frame-independent notion, whose matrix elements have to obey certain rules at the transfer between reference frames. To show that the matrix (16) indeed describes such a notion, let us calculate another key quantity, the total angular momentum L of the same body.⁶ Summing up the angular momenta of each particle, defined by Eq. (1.31), and then using Eq. (10) again, in our inertial reference frame we get


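$$\mathbf{L} = \sum\mathbf{r}\times\mathbf{p} = \sum m\,\mathbf{r}\times\mathbf{v} = \sum m\,\mathbf{r}\times(\mathbf{v}_0 + \boldsymbol{\omega}\times\mathbf{r}) = \sum m\,\mathbf{r}\times\mathbf{v}_0 + \sum m\,\mathbf{r}\times(\boldsymbol{\omega}\times\mathbf{r}). \tag{4.17}$$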


We see that the angular momentum may be represented as a sum of two terms. The first one,


5 While the ABCs of the rotational dynamics were developed by Leonhard Euler in 1765, an introduction of the 
inertia tensor’s formalism had to wait very long — until the invention of the tensor analysis by Tullio Levi-Civita 
and Gregorio Ricci-Curbastro in 1900 — soon popularized by its use in Einstein’s general relativity. 

6 Hopefully, there is very little chance of confusing the angular momentum L (a vector) and its Cartesian 
components L; (scalars with an index) on one hand, and the Lagrangian function L (a scalar without an index) on 
the other hand. 
$$\mathbf{L}_0 = \sum m\,\mathbf{r}\times\mathbf{v}_0 = M\mathbf{R}\times\mathbf{v}_0, \tag{4.18}$$


describes the possible rotation of the center of mass about the inertial frame’s origin. This term vanishes 
if the moving reference frame’s origin 0 is positioned at the center of mass (where R = 0). In this case, 
we are left with only the second term, which describes a pure rotation of the body about its center of 
mass: 


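$$\mathbf{L} = \mathbf{L}_{\text{rot}} = \sum m\,\mathbf{r}\times(\boldsymbol{\omega}\times\mathbf{r}). \tag{4.19}$$

Using one more vector algebra formula, the “bac minus cab” rule,⁷ we may rewrite this expression as

$$\mathbf{L} = \sum m\left[\boldsymbol{\omega}\,r^2 - \mathbf{r}(\mathbf{r}\cdot\boldsymbol{\omega})\right]. \tag{4.20}$$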


Let us spell out an arbitrary Cartesian component of this vector: 


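$$L_j = \sum m\left[\omega_j r^2 - r_j\sum_{j'=1}^{3} r_{j'}\omega_{j'}\right] = \sum m\sum_{j'=1}^{3}\omega_{j'}\left(r^2\delta_{jj'} - r_j r_{j'}\right). \tag{4.21}$$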


By changing the summation order and comparing the result with Eq. (16), the angular momentum may be conveniently expressed via the same matrix elements I_{jj’} as the rotational kinetic energy:

$$L_j = \sum_{j'=1}^{3} I_{jj'}\,\omega_{j'}. \tag{4.22}$$
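A similar numerical sanity check applies to Eq. (22). The Python sketch below, again for a hypothetical body of seeded random point masses and an arbitrary ω, compares the direct sum (19) with the tensor form (22). It also prints the cosine of the angle between L and ω, which for a generic body differs from 1: the angular momentum is generally not parallel to the angular velocity.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def inertia_tensor(masses, positions):
    """Eq. (4.16), with positions measured from the center of mass."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = dot(r, r)
        for j in range(3):
            for k in range(3):
                tensor[j][k] += m * ((r2 if j == k else 0.0) - r[j] * r[k])
    return tensor


# Hypothetical rigid body: five point masses at seeded random positions.
random.seed(2)
masses = [random.uniform(0.5, 2.0) for _ in range(5)]
points = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]
M = sum(masses)
R_cm = [sum(m * p[i] for m, p in zip(masses, points)) / M for i in range(3)]
rel = [[p[i] - R_cm[i] for i in range(3)] for p in points]

omega = [0.8, -1.3, 0.25]  # arbitrary angular velocity

# Eq. (4.19): direct sum of m r x (omega x r).
L_direct = [0.0, 0.0, 0.0]
for m, r in zip(masses, rel):
    term = cross(r, cross(omega, r))
    L_direct = [Lj + m * tj for Lj, tj in zip(L_direct, term)]

# Eq. (4.22): L_j = sum_j' I_jj' omega_j'.
I = inertia_tensor(masses, rel)
L_tensor = [sum(I[j][k] * omega[k] for k in range(3)) for j in range(3)]

# For a generic body, L is not parallel to omega.
cos_angle = dot(L_tensor, omega) / math.sqrt(dot(L_tensor, L_tensor) * dot(omega, omega))
print(L_direct, L_tensor, cos_angle)
```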


Since L and ω are both legitimate vectors (meaning that they describe physical vectors independent of the reference frame choice), the matrix of elements I_{jj’} that relates them is a legitimate tensor. This fact, and the symmetry of the tensor (I_{jj’} = I_{j’j}), evident from its definition (16), allow the tensor to be further simplified. In particular, mathematics tells us that by a certain choice of the coordinate axes’ orientations, any symmetric tensor may be reduced to a diagonal form

$$I_{jj'} = I_j\,\delta_{jj'}, \tag{4.23}$$


where in our case

$$I_j \equiv I_{jj} = \sum m\left(r^2 - r_j^2\right) = \sum m\left(r_{j'}^2 + r_{j''}^2\right) = \sum m\,\rho_j^2, \tag{4.24}$$


ρ_j being the distance of the particle from the j-th axis, i.e. the length of the perpendicular dropped from the point to that axis. The axes of such a special coordinate system are called the principal axes, while the diagonal elements I_j given by Eq. (24) are called the principal moments of inertia of the body. In such a special reference frame, Eqs. (15) and (22) are reduced to very simple forms:


$$T_{\text{rot}} = \sum_{j=1}^{3}\frac{I_j}{2}\,\omega_j^2, \tag{4.25}$$

$$L_j = I_j\,\omega_j. \tag{4.26}$$


Both these results resemble the corresponding relations for the translational motion, T_tran = MV²/2 and P = MV, with the angular velocity ω replacing the linear velocity V, and the tensor of inertia playing the role of the scalar mass M. However, let me emphasize that even in the specially selected reference frame, with

7 See, e.g., MA Eq. (7.5). 
its axes pointing in the principal directions, the analogy is incomplete, and rotation is generally more complex than translation, because the measures of inertia, I_j, are generally different for each principal axis.
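The statement that a proper choice of axes diagonalizes any symmetric tensor, Eq. (23), can be illustrated numerically. The sketch below uses the cyclic Jacobi rotation method, one standard technique among several (a swapped-in choice, not one prescribed by the text), implemented in plain Python. The test body is hypothetical: unit masses at (±1, 0, 0) and (0, ±2, 0), whose principal moments are 8, 2, and 10 by Eq. (24). The body is tilted by two arbitrary rotations, so its inertia tensor in the original coordinate axes is not diagonal; the Jacobi procedure should recover the same three principal moments.

```python
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def jacobi_eigenvalues(sym, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations.

    Each rotation J in the (p, q) plane is chosen so that J^T a J has a zero (p, q) element.
    """
    n = len(sym)
    a = [list(map(float, row)) for row in sym]
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
                a = matmul(transpose(J), matmul(a, J))
    return sorted(a[i][i] for i in range(n))


def inertia_tensor(masses, positions):
    """Eq. (4.16), with positions measured from the center of mass."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(x * x for x in r)
        for j in range(3):
            for k in range(3):
                tensor[j][k] += m * ((r2 if j == k else 0.0) - r[j] * r[k])
    return tensor


def rot_z(p, angle):
    x, y, z = p
    return [x * math.cos(angle) - y * math.sin(angle), x * math.sin(angle) + y * math.cos(angle), z]


def rot_x(p, angle):
    x, y, z = p
    return [x, y * math.cos(angle) - z * math.sin(angle), y * math.sin(angle) + z * math.cos(angle)]


# Hypothetical body with principal moments 8, 2, 10, tilted by arbitrary angles.
base = [[1, 0, 0], [-1, 0, 0], [0, 2, 0], [0, -2, 0]]
masses = [1.0] * 4
tilted = [rot_x(rot_z(p, math.radians(30)), math.radians(50)) for p in base]

I = inertia_tensor(masses, tilted)  # non-diagonal in these axes
principal = jacobi_eigenvalues(I)   # should be close to [2, 8, 10]
print(I)
print(principal)
```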


Let me illustrate this difference between the principal moments on a simple but instructive system of three identical point masses fixed at the vertices of an equilateral triangle (Fig. 3).


Fig. 4.3. Principal moments of inertia: a simple case study.


Due to the symmetry of the configuration, one of the principal axes has to pass through the center of mass 0 and be normal to the plane of the triangle. For the corresponding principal moment of inertia, Eq. (24) readily yields I₃ = 3mρ². If we want to express this result in terms of the triangle’s side a, we may notice that due to the system’s symmetry, the angle marked in Fig. 3 equals π/6, and from the shaded right triangle, a/2 = ρ cos(π/6) = ρ√3/2, giving ρ = a/√3, so that, finally, I₃ = ma².


Let me use this simple case to illustrate the following general axis shift theorem, which may be rather useful, especially for more complex systems. For that, let us relate the inertia tensor elements I_{jj’} and I’_{jj’}, calculated in two reference frames: one with its origin at the center of mass 0, and another one (0’) translated by a certain vector d (Fig. 4a), so that for an arbitrary point, r’ = r + d. Plugging this relation into Eq. (16), we get


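$$I'_{jj'} = \sum m\left[(\mathbf{r}+\mathbf{d})^2\delta_{jj'} - (r_j+d_j)(r_{j'}+d_{j'})\right] = \sum m\left[\left(r^2 + 2\mathbf{r}\cdot\mathbf{d} + d^2\right)\delta_{jj'} - \left(r_j r_{j'} + r_j d_{j'} + r_{j'} d_j + d_j d_{j'}\right)\right]. \tag{4.27}$$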
Since in the center-of-mass frame all sums Σ m r_j equal zero, we may use Eq. (16) to finally obtain


$$I'_{jj'} = I_{jj'} + M\left(\delta_{jj'}d^2 - d_j d_{j'}\right). \tag{4.28}$$


In particular, this equation shows that if the shift vector d is perpendicular to one (say, the j-th) of the principal axes (Fig. 4b), i.e. d_j = 0, then Eq. (28) is reduced to a very simple formula:


$$I'_j = I_j + Md^2. \tag{4.29}$$


Fig. 4.4. (a) A general coordinate 
frame’s shift from the center of 
mass, and (b) a shift perpendicular 
to one of the principal axes. 
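The axis shift theorem (28) is easy to confirm numerically. In the Python sketch below, the masses, their positions, and the shift vector d are hypothetical seeded or arbitrary values. The inertia tensor in the shifted frame 0’ is computed twice: directly from Eq. (16) with r’ = r + d, and from the center-of-mass tensor via Eq. (28).

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def inertia_tensor(masses, positions):
    """Eq. (4.16), with positions measured from whatever origin they are given in."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = dot(r, r)
        for j in range(3):
            for k in range(3):
                tensor[j][k] += m * ((r2 if j == k else 0.0) - r[j] * r[k])
    return tensor


# Hypothetical body: seven point masses at seeded random positions.
random.seed(3)
masses = [random.uniform(0.2, 1.5) for _ in range(7)]
raw = [[random.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(7)]
M = sum(masses)
R_cm = [sum(m * p[i] for m, p in zip(masses, raw)) / M for i in range(3)]
rel = [[p[i] - R_cm[i] for i in range(3)] for p in raw]  # r, measured from the center of mass 0

d = [0.7, -1.2, 0.4]  # hypothetical shift vector
shifted = [[ri + di for ri, di in zip(r, d)] for r in rel]  # r' = r + d

I_cm = inertia_tensor(masses, rel)
I_shift_direct = inertia_tensor(masses, shifted)  # Eq. (4.16) applied directly in frame 0'
d2 = dot(d, d)
I_shift_theorem = [[I_cm[j][k] + M * ((d2 if j == k else 0.0) - d[j] * d[k])
                    for k in range(3)] for j in range(3)]  # Eq. (4.28)

print(I_shift_direct)
print(I_shift_theorem)
```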
Now returning to the particular system shown in Fig. 3, let us perform such a shift to the new (“primed”) axis passing through the location of one of the particles, still perpendicular to their common plane. Then the contribution of that particular mass to the primed moment of inertia vanishes, and I’₃ = 2ma². Now, returning to the center of mass and applying Eq. (29), we get I₃ = I’₃ − Mρ² = 2ma² − (3m)(a/√3)² = ma², i.e. the same result as above.


The symmetry situation inside the triangle’s plane is somewhat less obvious, so let us start by calculating the moments of inertia for the axes shown vertical and horizontal in Fig. 3. From Eq. (24), we readily get:

$$I_1 = 2mh^2 + m\rho^2 = m\left[2\left(\frac{a}{2\sqrt{3}}\right)^2 + \left(\frac{a}{\sqrt{3}}\right)^2\right] = \frac{ma^2}{2}, \qquad I_2 = 2m\left(\frac{a}{2}\right)^2 = \frac{ma^2}{2}, \tag{4.30}$$


where h is the distance from the center of mass to any side of the triangle: h = ρ sin(π/6) = ρ/2 = a/2√3. We see that I₁ = I₂, and mathematics tells us that in this case, any in-plane axis (passing through the center of mass 0) may be considered as principal, and has the same moment of inertia. A rigid body with this property, I₁ = I₂ ≠ I₃, is called the symmetric top. (The last direction is called the main principal axis of the system.)
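The triangle results, I₃ = ma² and I₁ = I₂ = ma²/2, together with the claim that every in-plane axis through 0 has the same moment, may be reproduced by direct summation. In the Python sketch below, the values m = a = 1 are hypothetical, and the vertices are assumed to lie in the xy-plane at polar angles 90°, 210°, and 330° around the center of mass, at the distance ρ = a/√3. The moment about an axis along a unit vector n is computed as the quadratic form n·I·n.

```python
import math


def inertia_tensor(masses, positions):
    """Eq. (4.16), with positions measured from the center of mass."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(x * x for x in r)
        for j in range(3):
            for k in range(3):
                tensor[j][k] += m * ((r2 if j == k else 0.0) - r[j] * r[k])
    return tensor


def moment_about(I, n):
    """Moment of inertia about the axis along the unit vector n: n . I . n"""
    return sum(n[j] * I[j][k] * n[k] for j in range(3) for k in range(3))


m, a = 1.0, 1.0          # hypothetical mass and side length
rho = a / math.sqrt(3)   # distance from the center of mass to each vertex
vertices = [[rho * math.cos(math.radians(phi)), rho * math.sin(math.radians(phi)), 0.0]
            for phi in (90, 210, 330)]
I = inertia_tensor([m] * 3, vertices)

# Moments about several in-plane axes n = (cos b, sin b, 0): all should equal m a^2 / 2.
in_plane = [moment_about(I, [math.cos(b), math.sin(b), 0.0]) for b in (0.0, 0.3, 1.0, 2.2)]
print(I[0][0], I[1][1], I[2][2], in_plane)
```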


Despite the symmetric top’s name, the situation may be even more symmetric in the so-called spherical tops, i.e. highly symmetric systems whose principal moments of inertia are all equal,

$$I_1 = I_2 = I_3 \equiv I. \tag{4.31}$$

Mathematics says that in this case, the moment of inertia for rotation about any axis (but still passing through the center of mass) is equal to the same I. Hence Eqs. (25) and (26) are further simplified for any direction of the vector ω:

$$T_{\text{rot}} = \frac{I}{2}\,\omega^2, \qquad \mathbf{L} = I\boldsymbol{\omega}, \tag{4.32}$$

thus making the analogy of rotation and translation complete. (As will be discussed in the next section, this analogy is also complete if the rotation axis is fixed by external constraints.)


Evident examples of a spherical top are a uniform sphere and a uniform spherical shell; a less obvious example is a uniform cube, with masses either concentrated at the vertices, or uniformly spread over the faces, or uniformly distributed over the volume. Again, in this case any axis passing through the center of mass is a principal one and has the same principal moment of inertia. For a sphere, this is natural; for a cube, it is rather surprising, but it may be confirmed by a direct calculation.
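Such a direct calculation for the cube is sketched below in Python, under hypothetical values a = m = M = 1. Two of the three mass distributions mentioned above are covered: (1) equal point masses at the eight vertices, for which Eq. (16) gives I_{jj’} = 4ma²δ_{jj’}, and (2) a uniform solid cube, approximated here by a 12×12×12 grid of cell-center masses, for which the continuum limit is Ma²/6 (the grid gives a slightly smaller value). In both cases, the moment n·I·n comes out the same for several arbitrarily chosen axis directions n.

```python
import itertools
import math


def inertia_tensor(masses, positions):
    """Eq. (4.16), with positions measured from the center of mass."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(x * x for x in r)
        for j in range(3):
            for k in range(3):
                tensor[j][k] += m * ((r2 if j == k else 0.0) - r[j] * r[k])
    return tensor


def moment_about(I, axis):
    """Moment of inertia n . I . n about the direction of `axis` (normalized here)."""
    norm = math.sqrt(sum(x * x for x in axis))
    n = [x / norm for x in axis]
    return sum(n[j] * I[j][k] * n[k] for j in range(3) for k in range(3))


a, m, M = 1.0, 1.0, 1.0  # hypothetical side, vertex mass, and total mass of the solid cube

# (1) Equal point masses at the eight vertices.
vertices = [[sx * a / 2, sy * a / 2, sz * a / 2] for sx, sy, sz in itertools.product((-1, 1), repeat=3)]
I_vertices = inertia_tensor([m] * 8, vertices)

# (2) Uniform solid cube, approximated by an n_grid^3 lattice of cell centers.
n_grid = 12
coords = [-a / 2 + (i + 0.5) * a / n_grid for i in range(n_grid)]
cells = [[x, y, z] for x in coords for y in coords for z in coords]
I_solid = inertia_tensor([M / len(cells)] * len(cells), cells)

axes = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [0.3, -0.8, 0.5]]
vertex_moments = [moment_about(I_vertices, ax) for ax in axes]  # all equal to 4 m a^2
solid_moments = [moment_about(I_solid, ax) for ax in axes]      # all close to M a^2 / 6
print(vertex_moments)
print(solid_moments)
```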


4.3. Fixed-axis rotation 


Now we are well equipped for a discussion of the rigid body’s rotational dynamics. The general equation of this dynamics is given by Eq. (1.38), which is valid for the dynamics of any system of particles, either rigidly connected or not:

$$\dot{\mathbf{L}} = \boldsymbol{\tau}, \tag{4.33}$$


where τ is the net torque of external forces. Let us start exploring this equation from the simplest case when the axis of rotation, i.e. the direction of the vector ω, is fixed by some external constraints. Directing the z-axis along this vector, we have ω_x = ω_y = 0. According to Eq. (22), in this case the z-component of the angular momentum is

$$L_z = I_{zz}\,\omega_z, \tag{4.34}$$

where I_zz, though not necessarily one of the principal moments of inertia, still may be calculated using Eq. (24):


$$I_{zz} = \sum m\,\rho_z^2 = \sum m\left(x^2 + y^2\right), \tag{4.35}$$


with ρ_z being the distance of each particle from the rotation axis z. According to Eq. (15), in this case the rotational kinetic energy is just

$$T_{\text{rot}} = \frac{I_{zz}}{2}\,\omega_z^2. \tag{4.36}$$


Moreover, it is straightforward to show that if the rotation axis is fixed, Eqs. (34)-(36) are valid even if the axis does not pass through the center of mass, provided that the distances ρ_z are now measured from that axis. (The proof is left for the reader’s exercise.)


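As a result, we need not care about the other components of the vector L⁸ and may use just one component of Eq. (33),

$$\dot{L}_z = \tau_z, \tag{4.37}$$

because it, when combined with Eq. (34), completely determines the dynamics of rotation:

$$I_{zz}\,\dot{\omega}_z = \tau_z, \quad \text{i.e.} \quad I_{zz}\,\ddot{\varphi}_z = \tau_z, \tag{4.38}$$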


where φ_z is the angle of rotation about the axis, so that φ̇_z = ω_z. The scalar relations (34), (36), and (38), describing rotation about a fixed axis, are completely similar to the corresponding formulas of 1D motion of a single particle, with ω_z corresponding to the usual (“linear”) velocity, the angular momentum component L_z to the linear momentum, and I_zz to the particle’s mass.


The resulting motion about the axis is also frequently similar to that of a single particle. As a simple example, let us consider what is called the physical (or “compound”) pendulum (Fig. 5): a rigid body free to rotate about a fixed horizontal axis that does not pass through the center of mass 0, in a uniform gravity field g.


Fig. 4.5. Physical pendulum: a rigid 
body with a fixed (horizontal) rotation 
axis 0’ that does not pass through the 
center of mass 0. (The plane of 
drawing is normal to that axis.) 


8 Note that according to Eq. (22), other Cartesian components of the angular momentum, L_x and L_y, may be different from zero, and even evolve in time. The corresponding torques τ_x and τ_y, which obey Eq. (33), are automatically provided by the external forces that keep the rotation axis fixed.
Let us drop the perpendicular from point 0 to the rotation axis, and call the oppositely directed vector l (see the dashed arrow in Fig. 5). Then the torque (relative to the rotation axis 0’) of the forces keeping the axis fixed is zero, and the only contribution to the net torque is due to gravity alone:

$$\boldsymbol{\tau} \equiv \sum\mathbf{r}'\times\mathbf{F} = \sum(\mathbf{l}+\mathbf{r})\times m\mathbf{g} = \sum m\,(\mathbf{l}\times\mathbf{g}) + \left(\sum m\,\mathbf{r}\right)\times\mathbf{g} = M\,\mathbf{l}\times\mathbf{g}. \tag{4.39}$$

(The last step used the facts that point 0 is the center of mass, so that the second term on the right-hand side equals zero, and that the vectors l and g are the same for all particles of the body.)


This result shows that the torque is directed along the rotation axis, and its (only) component τ_z is equal to −Mgl sin θ, where θ is the angle between the vectors l and g, i.e. the angular deviation of the pendulum from the position of equilibrium (see Fig. 5 again). As a result, Eq. (38) takes the form

$$I'\ddot{\theta} = -Mgl\sin\theta, \tag{4.40}$$


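where I’ is the moment of inertia for rotation about the axis 0’ rather than about the center of mass. This equation is identical to Eq. (1.18) for the point-mass (sometimes called “mathematical”) pendulum, with the small-oscillation frequency

$$\Omega = \left(\frac{Mgl}{I'}\right)^{1/2} \equiv \left(\frac{g}{l_{\text{ef}}}\right)^{1/2}, \quad \text{with } l_{\text{ef}} \equiv \frac{I'}{Ml}. \tag{4.41}$$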


As a sanity check, in the simplest case when the linear size of the body is much smaller than the suspension length l, Eq. (35) yields I’ = Ml², i.e. l_ef = l, and Eq. (41) reduces to the well-familiar formula Ω = (g/l)^{1/2} for the point-mass pendulum.
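Combining Eq. (41) with the axis shift formula (29), I’ = I + Ml², gives a compact recipe for any physical pendulum. The Python sketch below applies it, as an assumed illustration, to a uniform thin rod of length L pivoted at one end (so that l = L/2), taking I = ML²/12, the value quoted later in this section for a uniform ladder. It also repeats the sanity check above (a negligibly small body, I → 0, gives l_ef = l). The value g = 9.81 m/s² and the function name `pendulum_frequency` are choices made for this sketch only.

```python
import math


def pendulum_frequency(M, I_cm, l, g=9.81):
    """Small-oscillation frequency of a physical pendulum, Eq. (4.41).

    M: body mass; I_cm: moment of inertia about the parallel axis through the center
    of mass; l: distance from the rotation axis 0' to the center of mass;
    g: free-fall acceleration (assumed 9.81 m/s^2).
    Returns (Omega, l_ef).
    """
    I_prime = I_cm + M * l ** 2  # axis shift, Eq. (4.29)
    l_ef = I_prime / (M * l)     # effective length in Eq. (4.41)
    return math.sqrt(g / l_ef), l_ef


# Hypothetical example: uniform thin rod of length L = 1 m, pivoted at one end.
M, L = 1.0, 1.0
Omega_rod, l_ef_rod = pendulum_frequency(M, M * L ** 2 / 12, l=L / 2)

# Sanity check from the text: a body much smaller than l (I_cm -> 0) gives l_ef = l.
Omega_point, l_ef_point = pendulum_frequency(M, 0.0, l=0.8)

print(l_ef_rod, Omega_rod)      # l_ef = 2L/3 for the rod
print(l_ef_point, Omega_point)  # l_ef = l, Omega = (g/l)^(1/2)
```

The rod thus swings like a point-mass pendulum of length 2L/3, somewhat faster than one of length L/2 hung at its center of mass.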


Now let us discuss the situations when a rigid body not only rotates but also moves as a whole. As was mentioned in the introductory chapter, the total linear momentum of the body,

$$\mathbf{P} \equiv \sum m\,\mathbf{v} = \sum m\,\dot{\mathbf{r}} = \frac{d}{dt}\sum m\,\mathbf{r}, \tag{4.42}$$

satisfies the 2nd Newton’s law in the form (1.30). Using the definition (13) of the center of mass, the momentum may be represented as

$$\mathbf{P} = M\dot{\mathbf{R}} = M\mathbf{V}, \tag{4.43}$$

so Eq. (1.30) may be rewritten as

$$M\dot{\mathbf{V}} = \mathbf{F}, \tag{4.44}$$


where F is the vector sum of all external forces. This equation shows that the center of mass of the body 
moves exactly like a point particle of mass M, under the effect of the net force F. In many cases, this 
fact makes the translational dynamics of a rigid body absolutely similar to that of a point particle. 


The situation becomes more complex if some of the forces contributing to the vector sum F depend on the rotation of the same body, i.e. if its rotational and translational motions are coupled. Analysis of such coupled motion is rather straightforward if the direction of the rotation axis does not change in time, and hence Eqs. (34)-(36) are still valid. Possibly the simplest example is a round cylinder (say, a wheel) rolling on a surface without slippage (Fig. 6). Here the no-slippage condition may be represented as the requirement that the net velocity of the particular point A of the wheel that touches the surface equals zero in the reference frame bound to the surface. For the simplest case of a plane surface (Fig. 6a), this condition may be spelled out using Eq. (10), giving the following relation between the angular velocity ω of the wheel and the linear velocity V of its center:


$$V + r\omega = 0. \tag{4.45}$$




Fig. 4.6. Round cylinder 
rolling over (a) a plane 
surface and (b) a concave 
surface. 


Such kinematic relations are essentially holonomic constraints, which reduce the number of degrees of freedom of the system. For example, without the no-slippage condition (45), the wheel on a plane surface has to be considered as a system with two degrees of freedom, making its total kinetic energy (14) a function of two independent generalized velocities, say V and ω:


$$T = T_{\text{tran}} + T_{\text{rot}} = \frac{M}{2}V^2 + \frac{I}{2}\,\omega^2. \tag{4.46}$$

Using Eq. (45) we may eliminate, for example, the linear velocity and reduce Eq. (46) to

$$T = \frac{M}{2}(\omega r)^2 + \frac{I}{2}\,\omega^2 \equiv \frac{I_A}{2}\,\omega^2, \quad \text{where } I_A \equiv I + Mr^2. \tag{4.47}$$

This result may be interpreted as the kinetic energy of pure rotation of the wheel about the instantaneous rotation axis A, with I_A being the moment of inertia about that axis, satisfying Eq. (29).
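Eq. (47) can be cross-checked by brute force. The Python sketch below assumes a hypothetical uniform disk (radius r_w = 0.5, mass M = 2, ω = 3, all in arbitrary units) replaced by equal point masses on a square grid. Each point gets the velocity V + ω × r of Eq. (10), with ω directed along +z and V fixed by the no-slippage condition (45). The sketch then compares Σ(m/2)v² with (I_A/2)ω², confirms that the contact point A is at rest, and checks that the discretized I is close to the continuum value Mr_w²/2.

```python
# Hypothetical uniform disk (wheel) of radius r_w and mass M, replaced by equal point
# masses on a square grid; it rolls along the x-axis with omega directed along +z.
r_w, M, omega = 0.5, 2.0, 3.0
n_grid = 200
h = 2 * r_w / n_grid
coords = [-r_w + (i + 0.5) * h for i in range(n_grid)]
points = [(x, y) for x in coords for y in coords if x * x + y * y <= r_w * r_w]
m = M / len(points)

V = -r_w * omega  # no-slippage condition, Eq. (4.45)


def velocity(x, y):
    """Eq. (4.10) for a point at (x, y) relative to the wheel's center:
    V x_hat + omega z_hat x r = (V - omega y, omega x)."""
    return V - omega * y, omega * x


T_direct = sum(0.5 * m * (vx * vx + vy * vy) for vx, vy in (velocity(x, y) for x, y in points))

I = sum(m * (x * x + y * y) for x, y in points)  # Eq. (4.35) for the discretized disk
I_A = I + M * r_w ** 2                           # Eq. (4.47)
T_formula = 0.5 * I_A * omega ** 2

v_contact = velocity(0.0, -r_w)  # point A, where the wheel touches the plane
print(T_direct, T_formula, v_contact)
```

Because the grid is symmetric about the wheel’s center, the center of mass stays at that point, so the direct sum and Eq. (47) agree to rounding error even though the grid only approximates a disk.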


Kinematic relations are not always as simple as Eq. (45). For example, if a wheel is rolling on a 
concave surface (Fig. 6b), we need to relate the angular velocity of the wheel’s rotation about its axis
0’ (say, ω) and that of its axis’ rotation about the center 0 of curvature of the surface (say, Ω). A popular
error here is to write Ω = −(r/R)ω [WRONG!]. A prudent way to derive the correct relation is to note
that Eq. (45) holds for this situation as well, and on the other hand, the same linear velocity of the
wheel’s center may be expressed as V = (R − r)Ω. Combining these formulas, we get the correct relation


$$\Omega = -\frac{r}{R - r}\,\omega. \qquad (4.48)$$
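Relation (48) can also be checked straight from kinematics. Place the wheel's center on a circle of radius R − r around point 0, compute the velocity of the wheel's material point that touches the surface, and see which choice of Ω makes it vanish. In this sketch, the angle χ (the angular position of the wheel's center, with dχ/dt = Ω) and the function name are introduced only for illustration.

```python
import math

def contact_point_velocity(R, r, omega, Omega, chi):
    """Lab-frame velocity of the wheel's material point touching the concave surface.

    chi: angular position of the wheel's center about the center 0 of curvature,
    with Omega = d(chi)/dt; omega: angular velocity of the wheel about its own axis.
    """
    # the center moves along a circle of radius R - r around point 0
    Vx = -(R - r) * Omega * math.sin(chi)
    Vy = (R - r) * Omega * math.cos(chi)
    # the touching point lies a distance r further out along the same radius
    rho_x, rho_y = r * math.cos(chi), r * math.sin(chi)
    # v = V + omega n_z x rho
    return Vx - omega * rho_y, Vy + omega * rho_x

R, r, omega, chi = 1.0, 0.2, 3.0, 0.7
correct = contact_point_velocity(R, r, omega, -r * omega / (R - r), chi)  # Eq. (4.48)
naive = contact_point_velocity(R, r, omega, -r * omega / R, chi)          # the wrong guess
print(correct, naive)
```

With the correct Ω the contact point is at rest (to rounding error), while the naive guess leaves a slipping speed of ωr²/R.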


Another famous example of the relation between translational and rotational motion is given by 
the “sliding-ladder” problem (Fig. 7). Let us analyze it for the simplest case of negligible friction, and 
the ladder’s thickness being small in comparison with its length l.


Fig. 4.7. The sliding-ladder problem. 
To use the Lagrangian formalism, we may write the kinetic energy of the ladder as the sum (14)
of its translational and rotational parts: 


$$T = \frac{M}{2}\left(\dot X^2 + \dot Y^2\right) + \frac{I}{2}\dot\alpha^2, \qquad (4.49)$$
where X and Y are the Cartesian coordinates of its center of mass in an inertial reference frame, and I is
the moment of inertia for rotation about the z-axis passing through the center of mass. (For the
uniformly distributed mass, an elementary integration of Eq. (35) yields I = Ml²/12.) In the reference
frame with the center in the corner 0, both X and Y may be simply expressed via the angle α:


$$X = \frac{l}{2}\cos\alpha, \qquad Y = \frac{l}{2}\sin\alpha. \qquad (4.50)$$


(The easiest way to obtain these relations is to notice that the dashed line in Fig. 7 has length l/2, and the
same slope α as the ladder.) Plugging these expressions into Eq. (49), we get


$$T = \frac{I'}{2}\dot\alpha^2, \quad \text{where } I' = I + M\left(\frac{l}{2}\right)^2 = \frac{Ml^2}{3}. \qquad (4.51)$$

Since the potential energy of the ladder in the gravity field may also be expressed via the same angle,
$$U = MgY = Mg\frac{l}{2}\sin\alpha, \qquad (4.52)$$


α may be conveniently used as the (only) generalized coordinate of the system. Even without writing
the Lagrange equation of motion for that coordinate, we may notice that since the Lagrangian function
L = T − U does not depend on time explicitly, and the kinetic energy (51) is a quadratic-homogeneous
function of the generalized velocity dα/dt, the full mechanical energy,


$$E = T + U = \frac{I'}{2}\dot\alpha^2 + \frac{Mgl}{2}\sin\alpha = \frac{Mgl}{2}\left(\frac{l}{3g}\dot\alpha^2 + \sin\alpha\right), \qquad (4.53)$$
is conserved, giving us the first integral of motion. Moreover, Eq. (53) shows that the system’s energy
(and hence dynamics) is identical to that of a physical pendulum with an unstable fixed point α = π/2, a
stable fixed point at α = −π/2, and frequency


$$\Omega = \left(\frac{3g}{2l}\right)^{1/2} \qquad (4.54)$$


of small oscillations near the latter point. (Of course, this fixed point cannot be reached in the simple 
geometry shown in Fig. 7, where the ladder’s fall on the floor would change its equations of motion. 
Moreover, even before that, the left end of the ladder may detach from the wall. The analysis of this 
issue is left for the reader’s exercise.) 
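A quick numerical experiment makes the pendulum analogy concrete. The sketch below integrates the Lagrange equation that follows from Eqs. (51) and (52), I'd²α/dt² = −(Mgl/2)cos α, i.e. d²α/dt² = −(3g/2l)cos α, with a hand-written fourth-order Runge-Kutta step. It checks that the energy (53) stays constant and that small oscillations near α = −π/2 have the period 2π/(3g/2l)^{1/2} implied by Eq. (54). Like the analysis above, it deliberately ignores the floor and a possible detachment from the wall; the values of g and l are arbitrary.

```python
import math

def ladder_rhs(alpha, alpha_dot, g, l):
    """Equation of motion from L = (M l^2/6) alpha_dot^2 - (M g l/2) sin(alpha)."""
    return alpha_dot, -1.5 * g / l * math.cos(alpha)

def rk4_step(state, dt, g, l):
    """One classical 4th-order Runge-Kutta step for the state [alpha, alpha_dot]."""
    def f(s):
        return ladder_rhs(s[0], s[1], g, l)
    k1 = f(state)
    k2 = f([x + 0.5 * dt * k for x, k in zip(state, k1)])
    k3 = f([x + 0.5 * dt * k for x, k in zip(state, k2)])
    k4 = f([x + dt * k for x, k in zip(state, k3)])
    return [x + dt * (a + 2 * b + 2 * c + d) / 6
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)]

def integrate(alpha0, alpha_dot0, t_end, n_steps, g, l):
    state = [alpha0, alpha_dot0]
    for _ in range(n_steps):
        state = rk4_step(state, t_end / n_steps, g, l)
    return state

def energy_ratio(alpha, alpha_dot, g, l):
    """E / (M g l / 2), from Eq. (4.53)."""
    return l / (3 * g) * alpha_dot**2 + math.sin(alpha)

g, l = 9.81, 2.0
Omega = math.sqrt(3 * g / (2 * l))           # Eq. (4.54)

# small oscillations about the stable point alpha = -pi/2: one predicted period
alpha0 = -math.pi / 2 + 1e-3
alpha_T, _ = integrate(alpha0, 0.0, 2 * math.pi / Omega, 2000, g, l)

# a large-amplitude motion starting near the unstable point alpha = pi/2
a1, ad1 = integrate(math.pi / 2 - 0.1, 0.0, 1.0, 10_000, g, l)
E_drift = energy_ratio(a1, ad1, g, l) - energy_ratio(math.pi / 2 - 0.1, 0.0, g, l)
print(alpha_T - alpha0, E_drift)
```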


4.4. Free rotation 


Now let us proceed to more complex situations when the rotation axis is not fixed. A good
illustration of the complexity arising in this case is given by a rigid body left alone, i.e. not
subjected to external forces and hence with its potential energy U constant. Since in this case, according
to Eq. (44), the center of mass (as observed from any inertial reference frame) moves with a constant
velocity, we can always use a convenient inertial reference frame with the origin at that point. From the
point of view of such a frame, the body’s motion is a pure rotation, and T_tran = 0. Hence, the system’s
Lagrangian function is just its rotational energy (15), which is, first, a quadratic-homogeneous function
of the components ω_j (which may be taken for generalized velocities), and, second, does not depend on
time explicitly. As we know from Chapter 2, in this case the mechanical energy, here equal to T_rot alone,
is conserved. According to Eq. (15), for the principal-axes components of the vector ω, this means


$$T_{\rm rot} = \frac{1}{2}\sum_{j=1}^{3} I_j\omega_j^2 = \text{const}. \qquad (4.55)$$


Next, as Eq. (33) shows, in the absence of external forces, the angular momentum L of the body is 
conserved as well. However, though we can certainly use Eq. (26) to represent this fact as 


$$\mathbf{L} = \sum_{j=1}^{3} I_j\omega_j\mathbf{n}_j = \text{const}, \qquad (4.56)$$

where n_j are the principal axes, this does not mean that all components ω_j are constant, because the
principal axes are fixed relative to the rigid body, and hence may rotate with it.


Before exploring these complications, let us briefly mention two conceptually easy, but 
practically very important cases. The first is a spherical top (I1 = I2 = I3 = I). In this case, Eqs. (55) and
(56) imply that the vector ω = L/I is conserved, i.e. both the magnitude and the direction of the
angular velocity stay constant, for any initial spin. In other words, the body conserves its rotation speed
and axis direction, as measured in an inertial frame. The most obvious example is a spherical planet. For
example, our Mother Earth, rotating about its axis with angular velocity ω = 2π/(1 day) ≈ 7.3×10⁻⁵ s⁻¹,
keeps its axis at a nearly constant angle of 23°27’ to the ecliptic pole, i.e. to the axis normal to the plane 
of its motion around the Sun. (In Sec. 6 below, we will discuss some very slow motions of this axis, due 
to gravity effects.) 


Spherical tops are also used in the most accurate gyroscopes, usually with gas-jet or magnetic 
suspension in vacuum. If done carefully, such systems may have spectacular stability. For example, the 
gyroscope system of the Gravity Probe B satellite experiment, flown in 2004-2005, was based on quartz 
spheres — round with a precision of about 10 nm and covered with superconducting thin films (which 
enabled their magnetic suspension and monitoring). The whole system was stable enough to measure the 
so-called geodetic effect in general relativity (essentially, the space curving by the Earth’s mass), 
resulting in the axis’ precession by only 6.6 arc seconds per year, i.e. with an angular velocity of just
~10⁻¹² s⁻¹, with experimental results agreeing with theory with a record ~0.3% accuracy.⁹
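The angular velocities quoted in the last two paragraphs are easy to reproduce. The short sketch below converts arc seconds per year into s⁻¹, assuming a Julian year of 365.25 days (an assumption of this sketch), and evaluates the Earth's ω = 2π/(1 day) with an 86,400-s day, as in the text; using the sidereal day instead would change only the third significant digit.

```python
import math

DAY = 86_400.0                       # s, the "1 day" used in the text
YEAR = 365.25 * DAY                  # s, Julian year (assumed)
ARCSEC = math.pi / (180.0 * 3600.0)  # one arc second, in radians

def arcsec_per_year_to_per_second(rate):
    """Convert a precession rate from arc seconds per year to rad/s."""
    return rate * ARCSEC / YEAR

omega_earth = 2 * math.pi / DAY                        # Earth's spin
geodetic = arcsec_per_year_to_per_second(6.6)          # measured by Gravity Probe B
frame_dragging = arcsec_per_year_to_per_second(0.04)   # prediction cited in footnote 9
print(f"{omega_earth:.2e} {geodetic:.1e} {frame_dragging:.1e}")
```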


The second simple case is that of the symmetric top (I1 = I2 ≠ I3) with the initial vector L aligned
with the main principal axis. In this case, ω = L/I3 = const, so that the rotation axis is conserved.¹⁰ Such
tops, typically in the shape of a flywheel (heavy, flat rotor), and supported by gimbal systems (also 
called the “Cardan suspensions”) that allow for virtually torque-free rotation about three mutually 


9 Still, the main goal of this rather expensive (~$750M) project, an accurate measurement of a more subtle 
relativistic effect, the so-called frame-dragging drift (also called “the Schiff precession”), predicted to be about 
0.04 arc seconds per year, has not been achieved. 

10 This is also true for an asymmetric top, i.e. an arbitrary body (with, say, I1 < I2 < I3), but in this case the
alignment of the vector L with the axis n2 corresponding to the intermediate moment of inertia is unstable.
perpendicular axes,¹¹ are broadly used in more common gyroscopes. Invented by Léon Foucault in the
1850s and made practical later by H. Anschütz-Kaempfe, such gyroscopes have become core parts of
automatic guidance systems, for example, in ships, airplanes, missiles, etc. Even if its support wobbles
and/or drifts, the suspended gyroscope sustains its direction relative to an inertial reference frame.¹²


However, in the general case with no such special initial alignment, the dynamics of symmetric 
tops is more complicated. In this case, the vector L is still conserved, including its direction, but the
vector ω is not. Indeed, let us direct the n2-axis normally to the common plane of the vector L and the
current instantaneous direction n3 of the main principal axis (in Fig. 8 below, the plane of the drawing);
then, in that particular instant, L2 = 0. Now let us recall that in a symmetric top, the axis n2 is a principal
one. According to Eq. (26) with j = 2, the corresponding component ω2 has to be equal to L2/I2, so it is
equal to zero. This means that in the particular instant we are considering, the vector ω lies in this plane
(the common plane of vectors L and n3) as well (see Fig. 8a).


Fig. 4.8. Free rotation of a symmetric top: 
(a) the general configuration of vectors, 
and (b) calculating the free precession 
frequency. 


Now consider some point located on the main principal axis n3, and hence on the plane [n3, L].
Since ω is the instantaneous axis of rotation, according to Eq. (9), the instantaneous velocity v = ω×r of
the point is directed normally to that plane. This is true for each point of the main axis (besides only one,
with r = 0, i.e. the center of mass, which does not move), so the axis as a whole has to move normally to
the common plane of the vectors L, ω, and n3, while still passing through point 0. Since this conclusion
is valid for any moment of time, it means that the vectors ω and n3 rotate about the space-fixed vector L
together, with some angular velocity ω_pre, at each moment staying within one plane. This effect is called
the free (or “torque-free”, or “regular”) precession, and has to be clearly distinguished from the
completely different effect of the torque-induced precession, which will be discussed in the next section.


To calculate ω_pre, let us represent the instant vector ω as a sum not of its Cartesian components
(as in Fig. 8a), but rather of two non-orthogonal vectors directed along n3 and L (Fig. 8b):

$$\boldsymbol{\omega} = \omega_{\rm rot}\mathbf{n}_3 + \omega_{\rm pre}\mathbf{n}_L, \qquad \mathbf{n}_L \equiv \frac{\mathbf{L}}{L}. \qquad (4.57)$$


11 See, for example, a nice animation available online at http://en.wikipedia.org/wiki/Gimbal. 

12 Currently, optical gyroscopes are becoming more popular for all but the most precise applications. Much more 
compact but also much less accurate gyroscopes used, for example, in smartphones and tablet computers, are 
based on the effect of rotation on 2D mechanical oscillators (whose analysis is left for the reader’s exercise), and 
are implemented as micro-electro-mechanical systems (MEMS) — see, e.g., Chapter 22 in V. Kaajakari, Practical 
MEMS, Small Gear Publishing, 2009. 
Fig. 8b shows that ω_rot has the meaning of the angular velocity of rotation of the body about its main
principal axis, while ω_pre is the angular velocity of rotation of that axis about the constant direction of
the vector L, i.e. is exactly the frequency of precession that we are trying to find. Now ω_pre may be
readily calculated from the comparison of two panels of Fig. 8, by noticing that the same angle θ
between the vectors L and n3 participates in two relations:


$$\sin\theta = \frac{\omega_1}{\omega_{\rm pre}} = \frac{L_1}{L}. \qquad (4.58)$$


Since the n1-axis is a principal one, we may use Eq. (26) for j = 1, i.e. L1 = I1ω1, to eliminate ω1 from
Eq. (58), and get a very simple formula

$$\omega_{\rm pre} = \frac{L}{I_1}. \qquad (4.59)$$


This result shows that the precession frequency is constant and independent of the alignment of the
vector L with the main principal axis n3, while its amplitude (characterized by the angle θ) does depend
on the initial alignment, and vanishes if L is parallel to n3.¹³ Note also that if all principal moments of
inertia are of the same order, ω_pre is of the same order as the total angular speed ω = |ω| of the rotation.
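Eq. (59) can be cross-checked without the geometric construction of Fig. 8b. The sketch below (plain-Python vector helpers, illustrative moments of inertia) takes an arbitrary vector L in the principal-axes frame of a symmetric top and finds ω from Eq. (26). It then computes the velocity ω×n3 of the tip of the unit vector n3, as in Eq. (9), and divides it by sin θ to get the rate at which n3 circles around L. For any orientation of L the result should equal L/I1. The sketch also confirms that this velocity is normal to L, i.e. that the axis moves normally to the plane [n3, L].

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def free_precession(I1, I3, L):
    """Rate at which the symmetry axis n3 of a symmetric top (I2 = I1) circles the
    fixed vector L; all components are taken in the principal-axes frame."""
    omega = (L[0] / I1, L[1] / I1, L[2] / I3)   # Eq. (4.26), component by component
    n3 = (0.0, 0.0, 1.0)
    n3_dot = cross(omega, n3)                   # velocity of the tip of n3, Eq. (4.9)
    sin_theta = norm(cross(L, n3)) / norm(L)    # theta is the angle between L and n3
    return norm(n3_dot) / sin_theta, dot(n3_dot, L)

I1, I3 = 2.0, 1.0
L = (0.3, 0.4, 1.2)                             # |L| = 1.3
rate, n3_dot_along_L = free_precession(I1, I3, L)
print(rate, norm(L) / I1, n3_dot_along_L)      # Eq. (4.59) predicts rate = |L| / I1
```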


Now let us briefly discuss the free precession in the general case of an “asymmetric top”, i.e. a
body with arbitrary I1 ≠ I2 ≠ I3. In this case, the effect is more complex because here not only the
direction but also the magnitude of the instantaneous angular velocity ω may evolve in time. If we are
only interested in the relation between the instantaneous values of ω_j and L_j, i.e. the “trajectories” of the
vectors ω and L as observed from the reference frame {n1, n2, n3} of the principal axes of the body,
rather than in the explicit law of their time evolution, they may be found directly from the conservation
laws. (Let me emphasize again that the vector L, being constant in an inertial reference frame, generally
evolves in the frame rotating with the body.) Indeed, Eq. (55) may be understood as the equation of an
ellipsoid in the Cartesian coordinates {ω1, ω2, ω3}, so that for a free body, the vector ω has to stay on
the surface of that ellipsoid.¹⁴ On the other hand, since the reference frame’s rotation preserves the
length of any vector, the magnitude (but not the direction!) of the vector L is also an integral of motion
in the moving frame, and we can write


$$L^2 = \sum_{j=1}^{3} L_j^2 = \sum_{j=1}^{3} I_j^2\omega_j^2 = \text{const}. \qquad (4.60)$$


Hence the trajectory of the vector ω follows the closed curve formed by the intersection of two
ellipsoids, (55) and (60), the so-called Poinsot construction. It is evident that this trajectory is generally
“taco-edge-shaped”, i.e. more complex than a planar circle, but never very complex either.¹⁵


The same argument may be repeated for the vector L, for which the first form of Eq. (60)
describes a sphere, and Eq. (55), another ellipsoid:


13 For our Earth, free precession’s amplitude is so small (corresponding to sub-10-m linear deviations of the 
symmetry axis from the vector L at the surface) that this effect is of the same order as other, more irregular 
motions of the axis, resulting from turbulent fluid flow effects in the planet’s interior and its atmosphere. 

14 It is frequently called Poinsot’s ellipsoid, named after Louis Poinsot (1777-1859), who made several
important contributions to rigid body mechanics.

15 Curiously, the “wobbling” motion along such trajectories was observed not only for macroscopic rigid bodies
but also for heavy atomic nuclei — see, e.g., N. Sensharma et al., Phys. Rev. Lett. 124, 052501 (2020). 
$$T_{\rm rot} = \sum_{j=1}^{3}\frac{L_j^2}{2I_j} = \text{const}. \qquad (4.61)$$
On the other hand, if we are interested in the trajectory of the vector ω as observed from an
inertial frame (in which the vector L stays still), we may note that the general relation (15) for the same
rotational energy T_rot may also be rewritten as


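$$T_{\rm rot} = \frac{1}{2}\sum_{j=1}^{3}\omega_j\sum_{j'=1}^{3}I_{jj'}\omega_{j'}. \qquad (4.62)$$
But according to Eq. (22), the second sum on the right-hand side is nothing more than L_j, so that
$$T_{\rm rot} = \frac{1}{2}\sum_{j=1}^{3}\omega_j L_j \equiv \frac{1}{2}\boldsymbol{\omega}\cdot\mathbf{L}. \qquad (4.63)$$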
2 2 


This equation shows that for a free body (T_rot = const, L = const), even if the vector ω changes in time,
its endpoint should stay on a plane normal to the angular momentum L. Earlier, we have seen this for
the particular case of the symmetric top (see Fig. 8b), where the endpoint moves along a circle; for an
asymmetric top, however, the trajectory of the endpoint may not be circular.


If we are interested not only in the trajectory of the vector ω but also in the law of its evolution
in time, it may be calculated using the general Eq. (33) expressed in the principal components ω_j. For
that, we have to recall that Eq. (33) is only valid in an inertial reference frame, while the frame
{n1, n2, n3} may rotate with the body and hence is generally not inertial. We may handle this problem by
applying, to the vector L, the general kinematic relation (8):


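$$\left.\frac{d\mathbf{L}}{dt}\right|_{\rm in\ lab} = \left.\frac{d\mathbf{L}}{dt}\right|_{\rm in\ mov} + \boldsymbol{\omega}\times\mathbf{L}. \qquad (4.64)$$
Combining it with Eq. (33), in the moving frame we get
$$\left.\frac{d\mathbf{L}}{dt}\right|_{\rm in\ mov} + \boldsymbol{\omega}\times\mathbf{L} = \boldsymbol{\tau}, \qquad (4.65)$$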


where τ is the external torque. In particular, for the principal-axis components L_j, related to the
components ω_j by Eq. (26), the vector equation (65) is reduced to a set of three scalar Euler equations

$$I_j\dot\omega_j + \left(I_{j''} - I_{j'}\right)\omega_{j'}\omega_{j''} = \tau_j, \qquad (4.66)$$


where the set of indices {j, j’, j”} has to follow the usual “right” order, e.g., {1, 2, 3}, {2, 3, 1}, etc.¹⁶
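Because Eqs. (66) are nonlinear, it is instructive to integrate them numerically. The sketch below, a hand-rolled fourth-order Runge-Kutta integrator with hypothetical moments of inertia, solves the torque-free Euler equations for an asymmetric top. It monitors the two integrals of motion used in the Poinsot construction: the rotational energy (55) and the squared length of L, Eq. (60). Both should stay constant to within the integration error, even though each component ω_j changes substantially.

```python
def euler_rhs(w, I):
    """Torque-free Euler equations (4.66), solved for d(omega_j)/dt."""
    (w1, w2, w3), (I1, I2, I3) = w, I
    return ((I2 - I3) * w2 * w3 / I1,
            (I3 - I1) * w3 * w1 / I2,
            (I1 - I2) * w1 * w2 / I3)

def rk4(w, I, dt, n_steps):
    """Classical 4th-order Runge-Kutta; returns the final state and the whole trajectory."""
    path = [w]
    for _ in range(n_steps):
        k1 = euler_rhs(w, I)
        k2 = euler_rhs([x + 0.5 * dt * k for x, k in zip(w, k1)], I)
        k3 = euler_rhs([x + 0.5 * dt * k for x, k in zip(w, k2)], I)
        k4 = euler_rhs([x + dt * k for x, k in zip(w, k3)], I)
        w = [x + dt * (a + 2 * b + 2 * c + d) / 6 for x, a, b, c, d in zip(w, k1, k2, k3, k4)]
        path.append(w)
    return w, path

def T_rot(w, I):
    return 0.5 * sum(Ij * wj**2 for Ij, wj in zip(I, w))    # Eq. (4.55)

def L_squared(w, I):
    return sum((Ij * wj)**2 for Ij, wj in zip(I, w))       # Eq. (4.60)

I = (1.0, 2.0, 3.0)                 # hypothetical principal moments, I1 < I2 < I3
w0 = [0.3, 1.0, 0.5]
w_end, path = rk4(w0, I, 1e-3, 20_000)
dT = max(abs(T_rot(w, I) - T_rot(w0, I)) for w in path)
dL2 = max(abs(L_squared(w, I) - L_squared(w0, I)) for w in path)
spread = max(w[0] for w in path) - min(w[0] for w in path)   # how much omega_1 varies
print(dT, dL2, spread)
```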

To get a feeling for how the Euler equations work, let us return to the particular case of a
free symmetric top (τ1 = τ2 = τ3 = 0, I1 = I2). In this case, I1 − I2 = 0, so that Eq. (66) with j = 3 yields
ω3 = const, while the equations for j = 1 and j = 2 take the following simple form:


$$\dot\omega_1 = -\Omega_{\rm pre}\,\omega_2, \qquad \dot\omega_2 = \Omega_{\rm pre}\,\omega_1, \qquad (4.67)$$


where Ω_pre is a constant determined by both the system parameters and the initial conditions:


16 These equations are of course valid in the simplest case of the fixed rotation axis as well. For example, if ω =
n3ω, i.e. ω1 = ω2 = 0, Eq. (66) is reduced to Eq. (38).
$$\Omega_{\rm pre} \equiv \omega_3\,\frac{I_3 - I_1}{I_1}. \qquad (4.68)$$


The system of two equations (67) has a sinusoidal solution with frequency Ω_pre, and describes a
uniform rotation of the vector ω, with that frequency, about the main axis n3. This is just another
representation of the free precession analyzed above, but this time as observed from the rotating body.
Evidently, Ω_pre is substantially different from the frequency ω_pre (59) of the precession as observed from
the lab frame; for example, Ω_pre vanishes for the spherical top (with I1 = I2 = I3), while ω_pre, in this case,
is equal to the rotation frequency.¹⁷
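For the symmetric top, a direct integration of the Euler equations (66) can be compared with the analytical solution of Eqs. (67) and (68). Starting from ω = (A, 0, ω3), the transverse components should follow ω1 = A cos Ω_pre t and ω2 = A sin Ω_pre t, while ω3 stays constant. The sketch below repeats a minimal Runge-Kutta integrator so as to stay self-contained; the moments of inertia and initial values are illustrative only. Its last lines evaluate 2π/Ω_pre for the Earth-like ratio (I3 − I1)/I1 ≈ 1/300 quoted in footnote 17, assuming ω3 = 2π per day.

```python
import math

def euler_rhs(w, I):
    """Torque-free Euler equations (4.66), solved for d(omega_j)/dt."""
    (w1, w2, w3), (I1, I2, I3) = w, I
    return ((I2 - I3) * w2 * w3 / I1,
            (I3 - I1) * w3 * w1 / I2,
            (I1 - I2) * w1 * w2 / I3)

def rk4(w, I, dt, n_steps):
    """Classical 4th-order Runge-Kutta integration of the Euler equations."""
    for _ in range(n_steps):
        k1 = euler_rhs(w, I)
        k2 = euler_rhs([x + 0.5 * dt * k for x, k in zip(w, k1)], I)
        k3 = euler_rhs([x + 0.5 * dt * k for x, k in zip(w, k2)], I)
        k4 = euler_rhs([x + dt * k for x, k in zip(w, k3)], I)
        w = [x + dt * (a + 2 * b + 2 * c + d) / 6 for x, a, b, c, d in zip(w, k1, k2, k3, k4)]
    return w

I1, I3 = 2.0, 3.0                    # hypothetical symmetric top, I2 = I1
A, w3, t = 0.1, 5.0, 1.3
Omega_pre = w3 * (I3 - I1) / I1      # Eq. (4.68)
w_num = rk4([A, 0.0, w3], (I1, I1, I3), t / 2000, 2000)
w_exact = [A * math.cos(Omega_pre * t), A * math.sin(Omega_pre * t), w3]
print(w_num, w_exact)

# Earth-like estimate (footnote 17): omega_3 = 2*pi per day, (I3 - I1)/I1 = 1/300
Omega_earth = (2 * math.pi) / 300                # rad per day
period_days = 2 * math.pi / Omega_earth          # about 300 days, i.e. roughly 10 months
print(period_days, period_days / 30.44)
```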


Unfortunately, for the rotation of an asymmetric top (i.e., an arbitrary rigid body) the Euler 
equations (66) are substantially nonlinear even in the absence of external torque, and may be solved 
analytically only in a few cases. One of them is a proof of the already mentioned fact: the free top’s
rotation about one of its principal axes is stable if the corresponding principal moment of inertia is either 
the largest or the smallest one of the three. (This proof is easy, and is left for the reader’s exercise.) 
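While the proof is left for the exercise, the statement is easy to observe numerically. The sketch below uses the same kind of hand-rolled Runge-Kutta integrator, with illustrative moments of inertia I1 < I2 < I3. It starts a torque-free top spinning about each principal axis in turn, adds a small perturbation of 10⁻³ to the two other components of ω, and records the largest transverse deviation over a finite time interval. Only the rotation about the intermediate axis n2 should show a deviation that grows to order one. This is a numerical illustration, not a proof.

```python
import math

def euler_rhs(w, I):
    """Torque-free Euler equations (4.66), solved for d(omega_j)/dt."""
    (w1, w2, w3), (I1, I2, I3) = w, I
    return ((I2 - I3) * w2 * w3 / I1,
            (I3 - I1) * w3 * w1 / I2,
            (I1 - I2) * w1 * w2 / I3)

def max_deviation(axis, I=(1.0, 2.0, 3.0), eps=1e-3, t_end=30.0, dt=2e-3):
    """Largest transverse part of omega for a spin started about `axis` (0, 1 or 2)."""
    w = [eps, eps, eps]
    w[axis] = 1.0
    worst = 0.0
    for _ in range(int(t_end / dt)):
        k1 = euler_rhs(w, I)
        k2 = euler_rhs([x + 0.5 * dt * k for x, k in zip(w, k1)], I)
        k3 = euler_rhs([x + 0.5 * dt * k for x, k in zip(w, k2)], I)
        k4 = euler_rhs([x + dt * k for x, k in zip(w, k3)], I)
        w = [x + dt * (a + 2 * b + 2 * c + d) / 6 for x, a, b, c, d in zip(w, k1, k2, k3, k4)]
        transverse = math.sqrt(sum(wj**2 for j, wj in enumerate(w) if j != axis))
        worst = max(worst, transverse)
    return worst

deviations = [max_deviation(axis) for axis in range(3)]
print(deviations)    # small, order-one, small
```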


4.5. Torque-induced precession 


The dynamics of rotation becomes even more complex in the presence of external forces. Let us 
consider the most counter-intuitive effect of torque-induced precession, for the simplest case of an 
axially-symmetric body (which is a particular case of the symmetric top, I1 = I2 ≠ I3), supported at some
point A of its symmetry axis that does not coincide with the center of mass 0 (see Fig. 9).


Fig. 4.9. Symmetric top in the gravity field: (a) a side view of the system and (b) the top view of the
evolution of the horizontal component of the angular momentum vector.


The uniform gravity field g creates bulk-distributed forces that, as we know from the analysis of 
the physical pendulum in Sec. 3, are equivalent to a single force Mg applied in the center of mass — in 
Fig. 9, point 0. The torque of this force relative to the support point A is 


$$\boldsymbol{\tau} = \mathbf{r}_0\big|_{\rm in\ A}\times M\mathbf{g} = Ml\,\mathbf{n}_3\times\mathbf{g}. \qquad (4.69)$$


Hence the general equation (33) of the angular momentum evolution (valid in any inertial frame, for 
example the one with its origin at point A) becomes 


17 For our Earth with its equatorial bulge (see Sec. 6 below), the ratio (I3 − I1)/I1 is ~1/300, so that 2π/Ω_pre is about
10 months. However, due to the fluid flow effects mentioned above, the observed precession is not very regular.
$$\dot{\mathbf{L}} = Ml\,\mathbf{n}_3\times\mathbf{g}. \qquad (4.70)$$


Despite the apparent simplicity of this (exact!) equation, its analysis is straightforward only in the limit
when the top is spinning about its symmetry axis n3 with a very high angular velocity ω_rot. In this case,
we may neglect the contribution to L due to a relatively small precession velocity ω_pre (still to be
calculated), and use Eq. (26) to write

$$\mathbf{L} = I_3\boldsymbol{\omega} = I_3\omega_{\rm rot}\mathbf{n}_3. \qquad (4.71)$$


Then Eq. (70) shows that the vector dL/dt is perpendicular to both n3 (and hence L) and g, i.e. lies within a
horizontal plane and is perpendicular to the horizontal component L_xy of the vector L (see Fig. 9b).
Since, according to Eq. (70), the magnitude of this vector is constant, |dL/dt| = Mgl sin θ, the vector L (and
hence the body’s main axis) rotates about the vertical axis with the following angular velocity:

$$\omega_{\rm pre} = \frac{|\dot{\mathbf{L}}|}{L_{xy}} = \frac{Mgl\sin\theta}{L\sin\theta} = \frac{Mgl}{L} = \frac{Mgl}{I_3\omega_{\rm rot}}. \qquad (4.72)$$


Thus, rather counter-intuitively, the fast-rotating top does not follow the external, vertical force
and, in addition to fast spinning about the symmetry axis n3, performs a revolution, called the
torque-induced precession, about the vertical axis.¹⁸ Note that, similarly to the free-precession frequency (59),
the torque-induced precession frequency (72) does not depend on the initial (and sustained) angle θ.
However, the torque-induced precession frequency is inversely (rather than directly) proportional to ω_rot.
This fact makes the above simple theory valid in many practical cases. Indeed, Eq. (71) is quantitatively
valid if the contribution of the precession to L is relatively small: Iω_pre << I3ω_rot, where I is a certain
effective moment of inertia for the precession, to be calculated below. Using Eq. (72), this condition
may be rewritten as
$$\omega_{\rm rot} \gg \left(\frac{MglI}{I_3^2}\right)^{1/2}. \qquad (4.73)$$
According to Eq. (16), for a body of not too extreme proportions, i.e. with all linear dimensions of the
same length scale l, all inertia moments are of the order of Ml², so that the right-hand side of Eq. (73) is
of the order of (g/l)^{1/2}, i.e. comparable with the frequency of small oscillations of the same body as a
physical pendulum in the absence of its fast rotation.
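To get a feeling for the numbers, the sketch below evaluates Eqs. (72) and (73) for a small toy top. All parameters are hypothetical, chosen only for illustration. For the unspecified effective moment I in Eq. (73), it uses I ≈ I_A = I1 + Ml², the moment of inertia about a horizontal axis through the support point (the identification the text makes later in this section). The last quantity, Iω_pre/(I3ω_rot), is the relative share of the precession in L, which must be small for Eq. (71) to hold.

```python
import math

g = 9.81             # m/s^2
# Hypothetical toy-top parameters (illustrative only, not from the text)
M = 0.1              # kg
l = 0.04             # m, distance from the support point A to the center of mass
I1 = 1.5e-5          # kg m^2, about a transverse axis through the center of mass
I3 = 2.0e-5          # kg m^2, about the symmetry axis n3
I_A = I1 + M * l**2  # about a horizontal axis through A, Eq. (4.29)

def precession_rate(omega_rot):
    """Torque-induced precession frequency, Eq. (4.72)."""
    return M * g * l / (I3 * omega_rot)

# right-hand side of Eq. (4.73), with I = I_A taken as the effective moment
spin_scale = math.sqrt(M * g * l * I_A) / I3

omega_rot = 1000.0                           # rad/s
omega_pre = precession_rate(omega_rot)
share = I_A * omega_pre / (I3 * omega_rot)   # equals (spin_scale / omega_rot)**2
print(omega_pre, spin_scale, share)
```

Here ω_rot is only about eight times the scale (73), yet the precession contributes under 2% of L, because that share scales as the square of the ratio.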


To develop a quantitative theory that would be valid beyond such an approximate treatment, the
Euler equations (66) may be used, but are not very convenient. A better approach, suggested by the
same L. Euler, is to introduce a set of three independent angles between the principal axes {n1, n2, n3}
bound to the rigid body, and the axes {n_x, n_y, n_z} of an inertial reference frame (Fig. 10), and then
express the basic equation (33) of rotation via these angles. There are several possible options for the
definition of such angles; Fig. 10 shows the set of Euler angles most convenient for analyses of fast
rotation.¹⁹ As one can see, the first Euler angle, θ, is the usual polar angle measured from the n_z-axis to
the n3-axis. The second one is the azimuthal angle φ, measured from the n_x-axis to the so-called line of
nodes formed by the intersection of planes [n_x, n_y] and [n1, n2]. The last Euler angle, ψ, is measured


18 A semi-quantitative interpretation of this effect is a very useful exercise, highly recommended to the reader. 
19 Of the several choices more convenient in the absence of fast rotation, the most common is the set of so-called
Tait-Bryan angles (called the yaw, pitch, and roll), which are broadly used for aircraft and maritime navigation.
within the plane [n1, n2], from the line of nodes to the n1-axis. For example, in the simple picture of
slow force-induced precession of a symmetric top that was discussed above, the angle θ is constant, the
angle ψ changes rapidly, with the rotation velocity ω_rot, while the angle φ evolves with the precession
frequency ω_pre (72).


Fig. 4.10. Definition of the Euler angles.


Now we can express the principal-axes components of the instantaneous angular velocity vector,
ω1, ω2, and ω3, as measured in the lab reference frame, in terms of the Euler angles. This may be readily
done by calculating, from Fig. 10, the contributions of the Euler angles’ evolution to the rotation about
each principal axis, and then adding them up:


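$$\begin{aligned}
\omega_1 &= \dot\varphi\sin\theta\sin\psi + \dot\theta\cos\psi,\\
\omega_2 &= \dot\varphi\sin\theta\cos\psi - \dot\theta\sin\psi,\\
\omega_3 &= \dot\varphi\cos\theta + \dot\psi.
\end{aligned} \qquad (4.74)$$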
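Eqs. (74) can be verified numerically, assuming the usual z-x-z realization of the construction in Fig. 10. In that realization, the body axes n1, n2, n3 are the columns of the matrix R = R_z(φ)R_x(θ)R_z(ψ), so that the line of nodes is R_z(φ)n_x; this convention and the positive directions of the angles are assumptions of the sketch. The matrix R^T dR/dt is then antisymmetric, and its off-diagonal elements give the principal-axes components of ω. The plain-Python code below evaluates dR/dt by central differences, for arbitrary angles and rates, and compares the result with Eqs. (74).

```python
import math

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def attitude(phi, theta, psi):
    """Columns are n1, n2, n3 in lab components (assumed z-x-z convention)."""
    return matmul(matmul(rot_z(phi), rot_x(theta)), rot_z(psi))

def omega_numeric(angles, rates, h=1e-6):
    """Principal-axes components of omega from the antisymmetric matrix R^T dR/dt."""
    R_plus = attitude(*[a + h * r for a, r in zip(angles, rates)])
    R_minus = attitude(*[a - h * r for a, r in zip(angles, rates)])
    R_dot = [[(R_plus[i][j] - R_minus[i][j]) / (2 * h) for j in range(3)] for i in range(3)]
    S = matmul(transpose(attitude(*angles)), R_dot)
    return S[2][1], S[0][2], S[1][0]

def omega_eq_474(angles, rates):
    phi, theta, psi = angles
    phi_dot, theta_dot, psi_dot = rates
    return (phi_dot * math.sin(theta) * math.sin(psi) + theta_dot * math.cos(psi),
            phi_dot * math.sin(theta) * math.cos(psi) - theta_dot * math.sin(psi),
            phi_dot * math.cos(theta) + psi_dot)

angles = (0.3, 1.1, -0.7)      # (phi, theta, psi), arbitrary
rates = (0.5, -1.2, 2.0)       # their time derivatives, arbitrary
print(omega_numeric(angles, rates), omega_eq_474(angles, rates))
```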


These relations enable the expression of the kinetic energy of rotation (25) and the angular
momentum components (26) via the generalized coordinates θ, φ, and ψ and their time derivatives (i.e.
the corresponding generalized velocities), and then using the powerful Lagrangian formalism to derive
their equations of motion. This is especially simple to do in the case of symmetric tops (with I1 = I2),
because plugging Eqs. (74) into Eq. (25) we get an expression,


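$$T_{\rm rot} = \frac{I_1}{2}\left(\dot\theta^2 + \dot\varphi^2\sin^2\theta\right) + \frac{I_3}{2}\left(\dot\varphi\cos\theta + \dot\psi\right)^2, \qquad (4.75)$$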


which does not include explicitly either φ or ψ. (This reflects the fact that for a symmetric top we can
always select the n1-axis to coincide with the line of nodes, and hence take ψ = 0 at the considered
moment of time. Note that this trick does not mean we can take dψ/dt = 0, because the n1-axis, as observed
from an inertial reference frame, moves!) Now we should not forget that at the torque-induced
precession, the center of mass moves as well (see, e.g., Fig. 9), so that according to Eq. (14), the total
kinetic energy of the body is the sum of two terms,


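$$T = T_{\rm rot} + T_{\rm tran}, \qquad T_{\rm tran} = \frac{M}{2}V^2 = \frac{Ml^2}{2}\left(\dot\theta^2 + \dot\varphi^2\sin^2\theta\right), \qquad (4.76)$$
while its potential energy is just
$$U = Mgl\cos\theta + \text{const}. \qquad (4.77)$$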
Now we could readily use Eqs. (2.19) to write the Lagrange equations of motion for the Euler
angles, but it is simpler to immediately notice that according to Eqs. (75)-(77), the Lagrangian function,
T − U, does not depend explicitly on the “cyclic” coordinates φ and ψ, so that the corresponding
generalized momenta (2.31) are conserved:


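$$p_\varphi = \frac{\partial T}{\partial\dot\varphi} = I_A\dot\varphi\sin^2\theta + I_3\left(\dot\varphi\cos\theta + \dot\psi\right)\cos\theta = \text{const}, \qquad (4.78)$$
$$p_\psi = \frac{\partial T}{\partial\dot\psi} = I_3\left(\dot\varphi\cos\theta + \dot\psi\right) = \text{const}, \qquad (4.79)$$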


where I_A = I1 + Ml². (According to Eq. (29), I_A is just the body’s moment of inertia for rotation about a
horizontal axis passing through the support point A.) According to the last of Eqs. (74), p_ψ is just L3, i.e.
the angular momentum’s component along the precessing axis n3. On the other hand, by its very
definition (78), p_φ is L_z, i.e. the same vector L’s component along the stationary axis z. (Actually, we
could foresee in advance the conservation of both these components of L for our system, because the
vector (69) of the external torque is perpendicular to both n3 and n_z.) Using this notation, and solving
the simple system of two linear equations (78)-(79) for the angle derivatives, we get

$$\dot\varphi = \frac{L_z - L_3\cos\theta}{I_A\sin^2\theta}, \qquad \dot\psi = \frac{L_3}{I_3} - \frac{L_z - L_3\cos\theta}{I_A\sin^2\theta}\cos\theta. \qquad (4.80)$$


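One more conserved quantity in this problem is the full mechanical energy²⁰
$$E = T + U = \frac{I_A}{2}\left(\dot\theta^2 + \dot\varphi^2\sin^2\theta\right) + \frac{I_3}{2}\left(\dot\varphi\cos\theta + \dot\psi\right)^2 + Mgl\cos\theta. \qquad (4.81)$$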


Plugging Eqs. (80) into Eq. (81), we get a first-order differential equation for the angle θ which may be
represented in the following physically transparent form:
$$\frac{I_A}{2}\dot\theta^2 + U_{\rm eff}(\theta) = E, \qquad U_{\rm eff}(\theta) \equiv \frac{(L_z - L_3\cos\theta)^2}{2I_A\sin^2\theta} + \frac{L_3^2}{2I_3} + Mgl\cos\theta + \text{const}. \qquad (4.82)$$


Thus, similarly to the planetary problems considered in Sec. 3.4, the torque-induced precession
of a symmetric top has been reduced (without any approximations!) to a 1D problem of the motion of
just one of its degrees of freedom, the polar angle θ, in the effective potential U_eff(θ). According to Eq.
(82), very similar to Eq. (3.44) for the planetary problem, this potential is the sum of the actual potential
energy U given by Eq. (77), and a contribution from the kinetic energy of motion along two other
angles. In the absence of rotation about the axes n_z and n3 (i.e., L_z = L3 = 0), Eq. (82) is reduced to the
first integral of the equation (40) of motion of a physical pendulum, with I’ = I_A. If the rotation is
present, then (besides the case of very special initial conditions when θ(0) = 0 and L_z = L3),²¹ the first
contribution to U_eff(θ) diverges at θ → 0 and π, so that the effective potential energy has a minimum at
some non-zero value θ0 of the polar angle θ (see Fig. 11).


20 Indeed, since the Lagrangian does not depend on time explicitly, H = const, and since the full kinetic energy T 
(75)-(76) is a quadratic-homogeneous function of the generalized velocities, we have E = H. 

21 In that simple case, the body continues to rotate about the vertical symmetry axis: θ(t) = 0. Note, however, that
such motion is stable only if the spinning speed is sufficiently high (see Eq. (85) below).
Fig. 4.11. The effective potential energy U_eff of the symmetric top, given by Eq. (82), as a function of the
polar angle θ (horizontal axis: θ/π), for a particular value (0.95) of the ratio r = L_z/L3 (so that at
ω_rot >> ω_th, θ0 = cos⁻¹ r ≈ 0.1011π), and several values of the ratio ω_rot/ω_th (see Eq. (85)).


If the initial angle θ(0) is equal to this value θ0, i.e. if the initial effective energy is equal to its
minimum value U_eff(θ0), the polar angle remains constant through the motion: θ(t) = θ0. This corresponds
to the pure torque-induced precession whose angular velocity is given by the first of Eqs. (80):


$$\omega_{\rm pre} = \dot\varphi = \frac{L_z - L_3\cos\theta_0}{I_A\sin^2\theta_0}. \qquad (4.83)$$


The condition for finding θ0, dU_eff/dθ = 0, is a transcendental algebraic equation that cannot be solved
analytically for arbitrary parameters. However, in the high spinning speed limit (73), this is possible.
Indeed, in this limit the Mgl-proportional contribution to U_eff is small, and we may analyze its effect by
successive approximations. In the 0th approximation, i.e. at Mgl = 0, the minimum of U_eff is evidently
achieved at cos θ = L_z/L3, turning the precession frequency (83) to zero.²² In the next, 1st approximation,
we may require that at θ = θ0, the derivative of the first term of Eq. (82) for U_eff over cos θ, equal to
−L3(L_z − L3 cos θ)/(I_A sin²θ), is canceled by that of the gravity-induced term, equal to Mgl. This
immediately yields ω_pre = (L_z − L3 cos θ0)/(I_A sin²θ0) = Mgl/L3, so that by identifying ω_rot with ω3 = L3/I3
(see Fig. 8), we recover the simple expression (72).
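The successive-approximation result can be compared with a direct numerical solution of the condition dU_eff/dθ = 0. Since dU_eff/dθ = −sin θ · dU_eff/d(cos θ), for 0 < θ < π it is enough to find the zero of the derivative over c = cos θ. The sketch below does that by bisection, for hypothetical parameters in arbitrary units with the ratio L_z/L3 = 0.95 of Fig. 11. It then evaluates ω_pre from Eq. (83); the result should be close to Mgl/L3, and θ0 close to cos⁻¹(L_z/L3).

```python
import math

# Hypothetical parameters in arbitrary consistent units (not from the text)
I_A, I3, Mgl = 1.0, 0.5, 1.0
omega_rot = 100.0
L3 = I3 * omega_rot
Lz = 0.95 * L3                     # the ratio L_z/L_3 used in Fig. 4.11

def phi_dot(c):
    """Precession rate from the first of Eqs. (4.80), with c = cos(theta)."""
    return (Lz - L3 * c) / (I_A * (1.0 - c * c))

def dU_dc(c):
    """d U_eff / d(cos theta) for U_eff of Eq. (4.82)."""
    p = phi_dot(c)
    return -L3 * p + I_A * c * p * p + Mgl

def solve_theta0(lo=0.0, hi=Lz / L3, n_iter=200):
    """Bisection for dU_eff/dc = 0; for these parameters dU_dc < 0 at lo and > 0 at hi."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if dU_dc(mid) < 0:
            lo = mid
        else:
            hi = mid
    return math.acos(0.5 * (lo + hi))

theta0 = solve_theta0()
omega_pre = phi_dot(math.cos(theta0))   # Eq. (4.83)
print(theta0, math.acos(Lz / L3), omega_pre, Mgl / L3)
```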


The second important result that may be readily obtained from Eq. (82) is the exact expression
for the threshold value of the spinning speed for a vertically rotating top (θ0 = 0, L_z = L3). Indeed, in the
limit θ → 0 this expression may be readily simplified:


$$U_{\rm eff}(\theta) \approx \text{const} + \left(\frac{L_3^2}{8I_A} - \frac{Mgl}{2}\right)\theta^2. \qquad (4.84)$$


This formula shows that if ω_rot = L3/I3 is higher than the following threshold value,

$$\omega_{\rm th} \equiv \frac{2\left(MglI_A\right)^{1/2}}{I_3}, \qquad (4.85)$$


22 Indeed, the derivative of the fraction 1/(2I_A sin²θ), taken at the point cos θ = L_z/L3, is multiplied by the numerator,
(L_z − L3 cos θ)², which turns to zero at this point.
then the coefficient at θ² in Eq. (84) is positive, so that U_eff has a stable minimum at θ = 0. On the other
hand, if ω_rot is decreased below ω_th, the fixed point becomes unstable, so that the top falls. As the plots in
Fig. 11 show, Eq. (85) for the threshold frequency works very well even for non-zero but small values of
the precession angle θ0. Note that if we take I = I_A in the condition (73) of the approximate treatment, it
acquires a very simple sense: ω_rot >> ω_th.
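The small-angle expansion (84) and the threshold (85) are easy to check against the full effective potential (82). The sketch below, with hypothetical parameters in arbitrary units, sets L_z = L3 and drops the θ-independent term L3²/2I3. It estimates the coefficient at θ² from U_eff(θ) − U_eff(0) at a small angle (U_eff(0) → Mgl in this case) and compares it with Eq. (84) for spinning speeds below and above ω_th. The coefficient, and hence the stability of the vertical spin, should change sign exactly at ω_rot = ω_th.

```python
import math

# Hypothetical parameters in arbitrary consistent units (not from the text)
I_A, I3, Mgl = 1.0, 0.5, 1.0
omega_th = 2.0 * math.sqrt(Mgl * I_A) / I3     # Eq. (4.85)

def u_eff_vertical(theta, omega_rot):
    """U_eff of Eq. (4.82) for L_z = L_3, without the theta-independent terms."""
    L3 = I3 * omega_rot
    return (L3 - L3 * math.cos(theta))**2 / (2 * I_A * math.sin(theta)**2) + Mgl * math.cos(theta)

def numeric_coefficient(omega_rot, theta=1e-2):
    """Estimate of the coefficient at theta^2 in U_eff(theta) - U_eff(0), with U_eff(0) = Mgl."""
    return (u_eff_vertical(theta, omega_rot) - Mgl) / theta**2

def eq_484_coefficient(omega_rot):
    L3 = I3 * omega_rot
    return L3**2 / (8 * I_A) - Mgl / 2

for factor in (0.5, 0.9, 1.1, 3.0):
    w = factor * omega_th
    print(factor, numeric_coefficient(w), eq_484_coefficient(w))
```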


Finally, Eqs. (82) give a natural description of one more phenomenon. If the initial energy is
larger than U_eff(θ0), the angle θ oscillates between two classical turning points on both sides of the fixed
point θ0 (see Fig. 11 again). The law and frequency of these oscillations may be found exactly as in Sec.
3.3 (see Eqs. (3.27) and (3.28)). At ω_rot >> ω_th, this motion is a fast rotation of the symmetry axis n3 of
the body about its average position performing the slow torque-induced precession. Historically, these
oscillations are called nutations, but their physics is similar to that of the free precession that was
analyzed in the previous section, and the order of magnitude of their frequency is given by Eq. (59).


It may be proved that small friction (not taken into account in the above analysis) leads first to
decay of these nutations, then to a slower drift of the precession angle θ0 to zero, and finally, to a
gradual decay of the spinning speed ω_rot until it reaches the threshold (85) and the top falls.


4.6. Non-inertial reference frames 


Now let us use the results of our analysis of the rotation kinematics in Sec. 1 to complete the 
discussion of the transfer between two reference frames, which was started in the introductory Chapter 
1. As Fig. 12 (which reproduces Fig. 1.2 in a more convenient notation) shows, even if the “moving” 
frame 0 rotates relative to the “lab” frame 0’, the radius vectors observed from these two frames are still 
related, at any moment of time, by the simple Eq. (1.5). In our new notation: 


$$\mathbf{r}' = \mathbf{r}_0 + \mathbf{r}. \qquad (4.86)$$

Fig. 4.12. The general case of transfer between two reference frames.


However, as was mentioned in Sec. 1, the general addition rule for velocities is already more 
complex. To find it, let us differentiate Eq. (86) over time: 


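$$\frac{d}{dt}\left(\mathbf r|_{\rm in\ lab}\right) = \frac{d}{dt}\left(\mathbf r_0|_{\rm in\ lab}\right) + \frac{d\mathbf r}{dt}\Big|_{\rm in\ lab}. \qquad (4.87)$$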
The left-hand side of this relation is evidently the particle’s velocity as measured in the lab frame, and
the first term on the right-hand side is the velocity v_0 of point 0, as measured in the same lab frame. The
last term is more complex: due to the possible mutual rotation of the frames 0 and 0’, that term may not
vanish even if the particle does not move relative to the rotating frame 0 (see Fig. 12).
Fortunately, we have already derived the general Eq. (8) to analyze situations exactly like this
one. Taking A = r in it, we may apply the result to the last term of Eq. (87), to get 


$$\mathbf v|_{\rm in\ lab} = \mathbf v_0|_{\rm in\ lab} + (\mathbf v + \boldsymbol\omega\times\mathbf r), \qquad (4.88)$$


where ω is the instantaneous angular velocity of an imaginary rigid body connected to the moving
reference frame (or we may say, of this frame as such), as measured in the lab frame 0’, while v is dr/dt
as measured in the moving frame 0. The relation (88), on one hand, is a natural generalization of Eq.
(10) for v ≠ 0; on the other hand, if ω = 0, it reduces to the simple Eq. (1.8) for the translational motion of
the frame 0.
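The velocity-addition rule (88) lends itself to a quick numerical check. The Python sketch below is only an illustration under assumptions that are not in the text: the moving frame turns about the common z-axis with a hypothetical constant angular velocity OMEGA, and the motions r0(t) (of point 0) and r_mov(t) (of the particle, relative to the moving frame) are arbitrary made-up functions. It differentiates the lab-frame position (86) by central finite differences and compares the result with the right-hand side of Eq. (88), after converting all vectors to lab-basis components.

```python
import math

OMEGA = 0.7  # hypothetical constant angular velocity of the moving frame about z


def cross(a, b):
    """Cartesian cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def to_lab(angle, vec):
    """Lab-basis components of a vector with moving-basis components `vec`,
    the moving basis being turned by `angle` about the shared z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * vec[0] - s * vec[1], s * vec[0] + c * vec[1], vec[2])


def r0(t):
    """Hypothetical lab-frame position of the moving frame's origin 0."""
    return (1.0 + 0.3 * t, -0.5 * t, 0.2)


def r_mov(t):
    """Hypothetical particle position, measured in the moving frame."""
    return (2.0 + t * t, math.sin(t), 0.5 * t)


def r_lab(t):
    """Eq. (86): lab position of 0 plus the vector r, rotated into the lab basis."""
    return tuple(a + b for a, b in zip(r0(t), to_lab(OMEGA * t, r_mov(t))))


def deriv(f, t, h=1e-5):
    """Central finite-difference time derivative of a vector-valued function f(t)."""
    fp, fm = f(t + h), f(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(fp, fm))


def v_lab_eq88(t):
    """Right-hand side of Eq. (88), v0 + (v + omega x r), in lab-basis components."""
    w = (0.0, 0.0, OMEGA)      # rotation about z: same components in both bases
    v = deriv(r_mov, t)        # dr/dt as measured in the moving frame
    inner = tuple(a + b for a, b in zip(v, cross(w, r_mov(t))))
    return tuple(a + b for a, b in zip(deriv(r0, t), to_lab(OMEGA * t, inner)))


t = 1.3
direct = deriv(r_lab, t)       # velocity obtained directly in the lab frame
formula = v_lab_eq88(t)
error = max(abs(a - b) for a, b in zip(direct, formula))
print(f"max component mismatch: {error:.2e}")
```

Dropping the ω×r term from `v_lab_eq88` leaves a mismatch of the order of |ω×r|, which is exactly the contribution supplied by Eq. (8).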


To calculate the particle’s acceleration, we may just repeat the same trick: differentiate Eq. (88)
over time, and then use Eq. (8) again, now for the vector A = v + ω×r. The result is


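$$\mathbf a|_{\rm in\ lab} = \mathbf a_0|_{\rm in\ lab} + \frac{d}{dt}\left(\mathbf v + \boldsymbol\omega\times\mathbf r\right)\Big|_{\rm in\ mov} + \boldsymbol\omega\times\left(\mathbf v + \boldsymbol\omega\times\mathbf r\right). \qquad (4.89)$$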


Carrying out the differentiation in the second term on the right-hand side, we finally get the target relation,


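$$\mathbf a|_{\rm in\ lab} = \mathbf a_0|_{\rm in\ lab} + \mathbf a + \dot{\boldsymbol\omega}\times\mathbf r + 2\boldsymbol\omega\times\mathbf v + \boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r), \qquad (4.90)$$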


where a is the particle’s acceleration as measured in the moving frame. This result is a natural
generalization of the simple Eq. (1.9) to the case of a rotating frame.
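The same kind of check may be applied to Eq. (90), now with a time-dependent angular velocity, so that the ω̇×r term matters. Again, this is a sketch under stated assumptions: the frame turns about the z-axis by a hypothetical angle phi(t) = 0.7t + 0.2t² (so that ω = 0.7 + 0.4t and ω̇ = 0.4), and the motions r0(t) and r_mov(t) are made up, with their time derivatives written out analytically. The lab-frame acceleration is obtained by a second central finite difference of Eq. (86).

```python
import math


def cross(a, b):
    """Cartesian cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def add(*vecs):
    """Component-wise sum of any number of 3-vectors."""
    return tuple(sum(c) for c in zip(*vecs))


def scale(k, vec):
    """Multiply a 3-vector by a scalar."""
    return tuple(k * c for c in vec)


def to_lab(angle, vec):
    """Lab-basis components of a moving-basis vector; the bases differ by a turn about z."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * vec[0] - s * vec[1], s * vec[0] + c * vec[1], vec[2])


# Hypothetical frame rotation about z: angle, angular velocity, angular acceleration
def phi(t):
    return 0.7 * t + 0.2 * t * t


def omega(t):
    return (0.0, 0.0, 0.7 + 0.4 * t)


def omega_dot(t):
    return (0.0, 0.0, 0.4)


# Hypothetical motions: origin 0 in the lab frame, particle in the moving frame
def r0(t):
    return (1.0 + 0.3 * t, -0.5 * t * t, 0.2)


def a0(t):
    return (0.0, -1.0, 0.0)


def r_mov(t):
    return (2.0 + t * t, math.sin(t), 0.5 * t)


def v_mov(t):
    return (2.0 * t, math.cos(t), 0.5)


def a_mov(t):
    return (2.0, -math.sin(t), 0.0)


def r_lab(t):
    """Eq. (86), with the moving-frame vector rotated into the lab basis."""
    return add(r0(t), to_lab(phi(t), r_mov(t)))


def a_lab_direct(t, h=1e-3):
    """Second central finite difference of the lab-frame position."""
    rp, rc, rm = r_lab(t + h), r_lab(t), r_lab(t - h)
    return tuple((p - 2 * c + q) / h**2 for p, c, q in zip(rp, rc, rm))


def a_lab_eq90(t):
    """Right-hand side of Eq. (90), expressed in the lab basis."""
    w, r, v = omega(t), r_mov(t), v_mov(t)
    in_moving = add(a_mov(t),
                    cross(omega_dot(t), r),      # omega-dot x r
                    scale(2.0, cross(w, v)),     # 2 omega x v
                    cross(w, cross(w, r)))       # omega x (omega x r)
    return add(a0(t), to_lab(phi(t), in_moving))


t = 0.8
mismatch = max(abs(x - y) for x, y in zip(a_lab_direct(t), a_lab_eq90(t)))
print(f"max component mismatch: {mismatch:.2e}")
```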


Now let the lab frame 0’ be inertial; then the 2nd Newton’s law for a particle of mass m is


$$m\mathbf a|_{\rm in\ lab} = \mathbf F, \qquad (4.91)$$


where F is the vector sum of all forces exerted on the particle. This is simple and clear; however, in 
many cases it is much more convenient to work in a non-inertial reference frame. For example, when 
describing most phenomena on the Earth’s surface, it is rather inconvenient to use a reference frame 
bound to the Sun (or to the galactic center, etc.). In order to understand what we should pay for the 
convenience of using a moving frame, we may combine Eqs. (90) and (91) to write 


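$$m\mathbf a = \mathbf F - m\mathbf a_0|_{\rm in\ lab} - m\boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r) - 2m\boldsymbol\omega\times\mathbf v - m\dot{\boldsymbol\omega}\times\mathbf r. \qquad (4.92)$$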


This result means that if we want to use an analog of the 2nd Newton’s law in a non-inertial reference
frame, we have to add, to the actual net force F exerted on a particle, four pseudo-force terms, called
inertial forces, all proportional to the particle’s mass. Let us analyze them one by one, always
remembering that these are just mathematical terms, not actual physical forces. (In particular, it would
be futile to seek a 3rd-Newton’s-law counterpart for any inertial force.)


The first term, −m a_0|in lab, is the only one not related to rotation and is well known from
undergraduate mechanics. (Let me hope the reader remembers all these weight-in-the-accelerating-elevator
problems.) However, despite its simplicity, this term has more subtle consequences. As an
example, let us consider, semi-qualitatively, the motion of a planet, such as our Earth, orbiting a star and
also rotating about its own axis (see Fig. 13). The bulk-distributed gravity forces, acting on a planet
from its star, are not quite uniform, because they obey the 1/r² gravity law (1.15), and hence are
equivalent to a single force applied to a point A slightly offset from the planet’s center of mass 0, toward
the star. For a spherically-symmetric planet, the direction from 0 to A would be exactly aligned with the
direction toward the star. However, real planets are not absolutely rigid, so due to the centrifugal “force”
(to be discussed momentarily), the rotation about their own axis makes them slightly ellipsoidal (see
Fig. 13). (For our Earth, this equatorial bulge is about 10 km.) As a result, the net gravity force is slightly
offset from the direction toward the center of mass 0. On the other hand, repeating all the arguments of
this section for a body (rather than a point), we may see that, in the reference frame moving with the
planet, the inertial force −M a_0 (with the magnitude of the total gravity force, but directed away from the star)
is applied exactly to the center of mass. As a result, this pair of forces creates a torque τ perpendicular to
both the direction toward the star and the vector 0A. (In Fig. 13, the torque vector is perpendicular to the
plane of the drawing.) If the angle θ between the planet’s “polar” axis of rotation and the direction
towards the star were fixed, then, as we have seen in the previous section, this torque would induce a
slow axis precession about that direction.


Fig. 4.13. The axial precession
of a planet (with the equatorial
bulge and the 0A-offset
strongly exaggerated).


[Figure labels: polar axis in “summer”; polar axis in “winter”.]


However, as a result of the orbital motion, the angle θ oscillates in time much faster (once a
year) between values (π/2 + ε) and (π/2 − ε), where ε is the axis tilt, i.e. the angle between the polar axis
(the direction of vectors L and ω_rot) and the normal to the ecliptic plane of the planet’s orbit. (For the
Earth, ε ≈ 23.4°.) A straightforward averaging over these fast oscillations²³ shows that the torque leads
to the polar axis’ precession about the axis perpendicular to the ecliptic plane, keeping the angle ε
constant (see Fig. 13). For the Earth, the period T_pre = 2π/ω_pre of this precession of the equinoxes,
corrected for a substantial effect of the Moon’s gravity, is close to 26,000 years.²⁴


Returning to Eq. (92), the direction of the second term of its right-hand side, 


$$\mathbf F_{\rm cf} = -m\boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r), \qquad (4.93)$$


called the centrifugal force, is always perpendicular to, and directed away from, the instantaneous rotation
axis (see Fig. 14). Indeed, the vector ω×r is perpendicular to both ω and r (in Fig. 14, normal to the
drawing plane and directed away from the reader) and has the magnitude ωr sinθ = ωρ, where ρ is the distance
of the particle from the rotation axis. Hence the outer vector product, with the account of the minus sign,
is normal to the rotation axis ω, directed away from this axis, and has the magnitude ω²r sinθ = ω²ρ. The centrifugal
“force” is of course just the result of the fact that the centripetal acceleration ω²ρ, explicit in the inertial
reference frame, disappears in the rotating frame. For a typical location on the Earth (ρ ~ R_E ~ 6×10⁶ m),


23 Details of this calculation may be found, e.g., in Sec. 5.8 of the textbook by H. Goldstein et al., Classical
Mechanics, 3rd ed., Addison Wesley, 2002.
24 This effect has been known since antiquity; it was apparently discovered by Hipparchus of Rhodes (190-120 BC).
with its angular velocity ω_E ~ 10⁻⁴ s⁻¹, the acceleration is rather considerable, of the order of 3 cm/s², i.e.
~0.003 g, and is responsible, in particular, for the largest part of the equatorial bulge mentioned above.
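The estimate above is simple arithmetic, and the short Python sketch below redoes it. The sidereal-day period of 86,164 s and g = 9.81 m/s² are assumed standard values, not taken from the text; ρ ~ 6×10⁶ m is the text's own estimate.

```python
import math

SIDEREAL_DAY_S = 86164.0   # assumed: Earth's rotation period relative to the stars, s
G_ACCEL = 9.81             # m/s^2, assumed free-fall acceleration
RHO_M = 6e6                # m, the estimate rho ~ R_E ~ 6e6 m used in the text

omega_E = 2 * math.pi / SIDEREAL_DAY_S   # about 7.3e-5 1/s, i.e. ~1e-4 1/s in order of magnitude
a_cf = omega_E**2 * RHO_M                # magnitude omega^2 rho of the centrifugal acceleration, m/s^2

print(f"omega_E = {omega_E:.2e} 1/s")
print(f"a_cf = {100 * a_cf:.1f} cm/s^2, i.e. {a_cf / G_ACCEL:.4f} g")
```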


[Figure label: −mω×(ω×r).]


Fig. 4.14. The centrifugal force.


As an example of using the centrifugal force concept, let us return again to our “testbed”
problem on the bead sliding along a rotating ring (see Fig. 2.1). In the non-inertial reference frame
attached to the ring, we have to add, to the actual forces mg and N exerted on the bead, the horizontal
centrifugal force²⁵ directed away from the rotation axis, with the magnitude mω²ρ. Its component tangential to
the ring equals (mω²ρ)cosθ = mω²R sinθ cosθ, and hence the component of Eq. (92) along this direction
is ma = −mg sinθ + mω²R sinθ cosθ. With a = Rθ̈, this gives us an equation of motion equivalent to Eq.
(2.25), which had been derived in Sec. 2.2 (in the inertial frame) using the Lagrangian formalism.
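The projection of forces described above can be spelled out in a few lines of Python. This is a sketch under assumptions of our own: a coordinate system in the ring's (rotating) plane, θ measured from the ring's bottom, and hypothetical parameter values. It projects gravity and the centrifugal force on the ring's tangent and compares the result, divided by mR, with θ̈ = sinθ(ω²cosθ − g/R), which follows from ma = −mg sinθ + mω²R sinθ cosθ with a = Rθ̈.

```python
import math


def tangential_force(theta, m, R, g, omega):
    """Component, along the ring, of gravity plus the centrifugal "force" on the bead.

    Assumed coordinates: in the ring's rotating plane, x is horizontal (away from
    the vertical rotation axis) and z is vertical (up); the bead sits at
    (R sin(theta), -R cos(theta)), theta being measured from the ring's bottom.
    The normal force N is perpendicular to the wire, and the Coriolis force is
    normal to the ring's plane, so neither contributes to this component.
    """
    rho = R * math.sin(theta)                       # distance from the rotation axis
    gravity = (0.0, -m * g)
    centrifugal = (m * omega**2 * rho, 0.0)         # horizontal, directed away from the axis
    tangent = (math.cos(theta), math.sin(theta))    # unit vector along increasing theta
    fx = gravity[0] + centrifugal[0]
    fz = gravity[1] + centrifugal[1]
    return fx * tangent[0] + fz * tangent[1]


def theta_ddot(theta, R, g, omega):
    """From m R theta'' = -m g sin(theta) + m omega^2 R sin(theta) cos(theta)."""
    return math.sin(theta) * (omega**2 * math.cos(theta) - g / R)


m, R, g, omega = 0.1, 0.5, 9.81, 6.0   # hypothetical parameter values
for th in (0.3, 1.0, 2.0, 2.8):
    print(th, tangential_force(th, m, R, g, omega) / (m * R), theta_ddot(th, R, g, omega))
```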


The third term on the right-hand side of Eq. (92),


$$\mathbf F_{\rm C} = -2m\boldsymbol\omega\times\mathbf v, \qquad (4.94)$$


is the so-called Coriolis force,²⁶ which is different from zero only if the particle moves in a rotating
reference frame. Its physical sense may be understood by considering a projectile fired horizontally, say
from the North Pole (see Fig. 15).


Fig. 4.15. The trajectory of a projectile fired 
horizontally from the North Pole, from the 
point of view of an Earth-bound observer 
looking down. The circles show parallels, 
while the straight lines mark meridians. 


From the point of view of an Earth-based observer, the projectile will be affected by an
additional Coriolis force (94), directed westward, with the magnitude 2mω_E v, where v is the main,
southward component of the velocity. This force would cause the westward acceleration a = 2ω_E v, and
hence the westward deviation growing with time as d = at²/2 = ω_E vt². (This formula is valid only if d is
much smaller than the distance r = vt passed by the projectile.) On the other hand, from the point of


25 For this problem, all other inertial “forces”, besides the Coriolis force (see below), vanish, while the latter force
is directed normally to the ring and does not affect the bead’s motion along it.

26 Named after G.-G. de Coriolis (already reverently mentioned in Chapter 1) who described its theory and
applications in detail in 1835, though the first semi-quantitative analyses of this effect were given by Giovanni
Battista Riccioli and Claude François Dechales already in the mid-1600s, and all basic components of the Coriolis
theory may be traced to a 1749 work by Leonhard Euler.
view of an inertial-frame observer, the projectile’s trajectory in the horizontal plane is a straight line.
However, during the flight time t, the Earth’s surface slips eastward from under the trajectory by the
distance d = rφ = (vt)(ω_E t) = ω_E vt², where φ = ω_E t is the azimuthal angle of the Earth’s rotation during
the flight. Thus, both approaches give the same result, as they should.
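The equivalence of the two descriptions can also be checked numerically. The sketch below (with hypothetical values of the projectile speed and flight time, and an assumed ω_E ≈ 7.29×10⁻⁵ s⁻¹) integrates the horizontal motion in the Earth-bound frame with a classic fourth-order Runge-Kutta scheme, keeping both the Coriolis and the (tiny) centrifugal terms of Eq. (92). It then compares the result with the inertial-frame straight line viewed from the rotating axes, and with the estimate d = ω_E vt². Gravity and the vertical motion are ignored.

```python
import math

OMEGA_E = 7.29e-5     # 1/s, Earth's angular velocity (assumed value)
V = 500.0             # m/s, hypothetical projectile speed
T_FLIGHT = 20.0       # s, hypothetical flight time


def rhs(state):
    """Horizontal motion at the North Pole, seen in the Earth-bound frame.

    With omega along the local vertical, the Coriolis term -2 omega x v and the
    centrifugal term -omega x (omega x r) give (per unit mass):
        x'' =  2 w y' + w^2 x,    y'' = -2 w x' + w^2 y,
    where x points along the initial (southward) velocity and y points east.
    """
    x, y, vx, vy = state
    w = OMEGA_E
    return (vx, vy, 2 * w * vy + w * w * x, -2 * w * vx + w * w * y)


def rk4(state, dt, steps):
    """Classic fourth-order Runge-Kutta integration of rhs()."""
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


steps = 2000
x, y, vx, vy = rk4((0.0, 0.0, V, 0.0), T_FLIGHT / steps, steps)

# Inertial-frame view: straight line (V t, 0), seen from axes turned by OMEGA_E * t
y_inertial = -V * T_FLIGHT * math.sin(OMEGA_E * T_FLIGHT)
d_estimate = OMEGA_E * V * T_FLIGHT**2          # the text's d = omega_E v t^2

print(f"rotating-frame integration: y = {y:.4f} m (negative = westward)")
print(f"inertial-frame picture:     y = {y_inertial:.4f} m")
print(f"estimate omega_E v t^2:     d = {d_estimate:.4f} m")
```

Keeping the centrifugal term makes the rotating-frame solution coincide with the inertial straight line; dropping it should change y only at a relative level of order (ω_E t)², negligible here.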


Hence, the Coriolis “force” is just a fancy (but frequently very convenient) way of describing
a purely geometric effect pertinent to the rotation, from the point of view of the observer participating in
it. This force is responsible, in particular, for the higher right banks of rivers in the Northern
hemisphere, regardless of the direction of their flow (see Fig. 16). Despite the smallness of the Coriolis
force (for a typical velocity of the water in a river, v ~ 1 m/s, it is equivalent to acceleration a_C ~ 10⁻²
cm/s² ~ 10⁻⁵ g), its multi-century effects may be rather prominent.²⁷
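The river estimate may be reproduced the same way. The sketch assumes the same rotation period as for the centrifugal estimate and a hypothetical latitude of 45°; for horizontal motion, only the vertical component ω_E sin(latitude) of the Earth's angular velocity produces a horizontal deflection, so the order-of-magnitude value 2ω_E v corresponds to the pole (its maximum).

```python
import math

SIDEREAL_DAY_S = 86164.0   # assumed rotation period of the Earth, s
G_ACCEL = 9.81             # m/s^2, assumed
omega_E = 2 * math.pi / SIDEREAL_DAY_S

v_water = 1.0              # m/s, the typical river speed quoted in the text
latitude_deg = 45.0        # hypothetical mid-latitude location

# For horizontal motion, the horizontal part of -2m omega x v has the magnitude
# 2 m omega_E v sin(latitude); the order-of-magnitude estimate drops sin(latitude).
a_max = 2 * omega_E * v_water                       # polar value, m/s^2
a_lat = a_max * math.sin(math.radians(latitude_deg))

print(f"a_C at the pole: {100 * a_max:.1e} cm/s^2 = {a_max / G_ACCEL:.1e} g")
print(f"a_C at {latitude_deg:.0f} deg: {100 * a_lat:.1e} cm/s^2")
```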


Fig. 4.16. Coriolis forces due to the 
Earth’s rotation, in the Northern 
hemisphere. 


Finally, the last, fourth term of Eq. (92), −mω̇×r, exists only when the angular velocity ω
changes in time, and may be interpreted as a local-position-specific addition to the first term.


The key relation (92), derived above from Newton’s equation (91), may be alternatively obtained 
from the Lagrangian approach. Indeed, let us use Eq. (88) to represent the kinetic energy of the particle 
in an inertial “lab” frame in terms of v and r measured in a rotating frame: 


$$T = \frac{m}{2}\left(\mathbf v_0|_{\rm in\ lab} + \mathbf v + \boldsymbol\omega\times\mathbf r\right)^2, \qquad (4.95)$$


and use this expression to calculate the Lagrangian function. For the relatively simple case of a
particle’s motion in the field of potential forces, measured from a reference frame that performs a pure
rotation (so that v_0|in lab = 0)²⁸ with a constant angular velocity ω, we get


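$$L = T - U = \frac{m}{2}v^2 + m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) + \frac{m}{2}(\boldsymbol\omega\times\mathbf r)^2 - U \equiv \frac{m}{2}v^2 + m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) - U_{\rm ef}, \qquad (4.96a)$$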


where the effective potential energy,²⁹


27 The same force causes the counterclockwise circulation in the “Nor’easter” storms on the US East Coast, with
the radial component of the air velocity directed toward the cyclone’s center, due to lower pressure in its middle.
28 A similar analysis of the cases with v_0|in lab ≠ 0, for example, of a translational relative motion of the reference
frames, is left for the reader’s exercise.
$$U_{\rm ef} = U + U_{\rm cf}, \quad\text{with}\quad U_{\rm cf} \equiv -\frac{m}{2}(\boldsymbol\omega\times\mathbf r)^2, \qquad (4.96b)$$


is just the sum of the actual potential energy U of the particle and the so-called centrifugal potential 
energy, associated with the centrifugal “force” (93): 


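$$\mathbf F_{\rm cf} = -\nabla U_{\rm cf} = -\nabla\left[-\frac{m}{2}(\boldsymbol\omega\times\mathbf r)^2\right] = -m\boldsymbol\omega\times(\boldsymbol\omega\times\mathbf r). \qquad (4.97)$$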


It is straightforward to verify that the Lagrange equations (2.19), derived from Eqs. (96) considering the
Cartesian components of r and v as generalized coordinates and velocities, coincide with Eq. (92) (with
a_0|in lab = 0, ω̇ = 0, and F = −∇U).
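Eq. (97) can be verified numerically as well. The sketch below compares minus the finite-difference gradient of U_cf = −(m/2)(ω×r)² with −mω×(ω×r); the mass, angular velocity, and position are hypothetical values. Since U_cf is quadratic in r, central differences are exact up to rounding errors.

```python
def cross(a, b):
    """Cartesian cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def u_cf(r, w, m):
    """Centrifugal potential energy of Eq. (96b): U_cf = -(m/2)(omega x r)^2."""
    c = cross(w, r)
    return -0.5 * m * dot(c, c)


def f_cf(r, w, m):
    """Centrifugal force of Eq. (93): -m omega x (omega x r)."""
    return tuple(-m * c for c in cross(w, cross(w, r)))


def minus_gradient(f, r, h=1e-5):
    """-grad f at the point r, by central finite differences."""
    out = []
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        out.append(-(f(tuple(rp)) - f(tuple(rm))) / (2 * h))
    return tuple(out)


m = 2.0                      # hypothetical mass
w = (0.3, -0.4, 1.1)         # hypothetical angular velocity of the frame
r = (1.5, 0.7, -2.0)         # hypothetical particle position
numeric = minus_gradient(lambda x: u_cf(x, w, m), r)
exact = f_cf(r, w, m)
print("-grad U_cf:", numeric)
print("F_cf      :", exact)
```

As expected from Eq. (93), the resulting force is also perpendicular to ω.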

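Now it is very informative to have a look at a by-product of this calculation, the generalized
momentum (2.31) corresponding to the particle’s coordinate r as measured in the rotating reference
frame,³⁰


$$\tilde{\mathbf p} \equiv \frac{\partial L}{\partial\mathbf v} = m\mathbf v + m\boldsymbol\omega\times\mathbf r \equiv m(\mathbf v + \boldsymbol\omega\times\mathbf r). \qquad (4.98)$$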


According to Eq. (88) with v_0|in lab = 0, the expression in the parentheses is just v|in lab. However, from
the point of view of the moving frame, i.e. not knowing about the simple physical sense of the vector p̃,
we would have a reason to speak about two different linear momenta of the same particle, the so-called
kinetic momentum p = mv and the canonical momentum p̃ = p + mω×r.³¹ Let us calculate the
Hamiltonian function H defined by Eq. (2.32), and the energy E as functions of the same moving-frame
variables:


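$$H = \sum_j \frac{\partial L}{\partial v_j}v_j - L = \tilde{\mathbf p}\cdot\mathbf v - L = m(\mathbf v + \boldsymbol\omega\times\mathbf r)\cdot\mathbf v - \left[\frac{m}{2}v^2 + m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) - U_{\rm ef}\right] \equiv \frac{m}{2}v^2 + U_{\rm ef}, \qquad (4.99)$$


$$E = T + U = \frac{m}{2}v^2 + m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) + \frac{m}{2}(\boldsymbol\omega\times\mathbf r)^2 + U \equiv \frac{m}{2}v^2 + U_{\rm ef} + m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) + m(\boldsymbol\omega\times\mathbf r)^2. \qquad (4.100)$$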


These expressions clearly show that E and H are not equal.³² In hindsight, this is not surprising, because
the kinetic energy (95), expressed in the moving-frame variables, includes a term linear in v, and hence

29 For the attentive reader who has noticed the difference between the negative sign in the expression for U_ef and
the positive sign before the similar second term in Eq. (3.44): as was already discussed in Chapter 3, it is due to
the difference of assumptions. In the planetary problem, even though the angular momentum L and hence its
component L_z are fixed, the corresponding angular velocity φ̇ is not. In contrast, in our current discussion,
the angular velocity ω of the reference frame is assumed to be fixed, i.e. independent of r and v.

30 Here ∂L/∂v is just a shorthand for a vector with Cartesian components ∂L/∂v_j. In a more formal language, this is
the gradient of the scalar function L in the velocity space.

31 A very similar situation arises in the motion of a particle with electric charge q in a magnetic field B. In that case,
the role of the additional term p̃ − p = mω×r is played by the product qA, where A is the vector potential of the
field B = ∇×A; see, e.g., EM Sec. 9.7, and in particular Eqs. (9.183) and (9.192).

32 Please note the last form of Eq. (99), which shows the physical sense of the Hamiltonian function of a particle 
in the rotating frame very clearly, as the sum of its kinetic energy (as measured in the moving frame), and the 
effective potential energy (96b), including that of the centrifugal “force”. 
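is not a quadratic-homogeneous function of this generalized velocity. The difference between these
functions may be represented as


$$E - H = m\mathbf v\cdot(\boldsymbol\omega\times\mathbf r) + m(\boldsymbol\omega\times\mathbf r)^2 \equiv m(\mathbf v + \boldsymbol\omega\times\mathbf r)\cdot(\boldsymbol\omega\times\mathbf r) \equiv m\mathbf v|_{\rm in\ lab}\cdot(\boldsymbol\omega\times\mathbf r). \qquad (4.101)$$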


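Now using the operand rotation rule again, we may transform this expression into a simpler form:³³


$$E - H = \boldsymbol\omega\cdot(\mathbf r\times m\mathbf v|_{\rm in\ lab}) \equiv \boldsymbol\omega\cdot\mathbf L|_{\rm in\ lab}. \qquad (4.102)$$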


As a sanity check, let us apply this general expression to the particular case of our testbed
problem (see Fig. 2.1). In this case, the vector ω is aligned with the z-axis, so that of all Cartesian
components of the vector L, only the component L_z is important for the scalar product in Eq. (102). This
component evidently equals L_z = mωρ² = mω(R sinθ)², so that


$$E - H = \omega L_z = m\omega^2R^2\sin^2\theta, \qquad (4.103)$$


i.e. the same result that follows from the subtraction of Eqs. (2.40) and (2.41).
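The chain (99)-(103) may be spot-checked with a few lines of Python. This sketch assumes a pure rotation with a constant ω (v_0|in lab = 0) and uses hypothetical vectors and parameter values; the function name `energies` is ours. It computes E from Eq. (100), H from the last form of Eq. (99), and ω·L|in lab, and then repeats the calculation for the bead of the testbed problem, where the result should reduce to Eq. (103).

```python
import math


def cross(a, b):
    """Cartesian cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def energies(m, r, v, w, U):
    """Return E (Eq. 100), H (Eq. 99), and omega . L_lab for a frame rotating with a
    constant angular velocity w about a fixed origin (v0 = 0).

    r and v are measured in the rotating frame; U is the value of the actual potential energy.
    """
    wr = cross(w, r)
    v_lab = tuple(a + b for a, b in zip(v, wr))        # Eq. (88) with v0 = 0
    E = 0.5 * m * dot(v_lab, v_lab) + U                # T of Eq. (95) plus U
    U_ef = U - 0.5 * m * dot(wr, wr)                   # Eq. (96b)
    H = 0.5 * m * dot(v, v) + U_ef                     # last form of Eq. (99)
    L_lab = tuple(m * c for c in cross(r, v_lab))      # angular momentum in the lab frame
    return E, H, dot(w, L_lab)


# General check of Eq. (102) with hypothetical vectors
E, H, wL = energies(1.7, (0.4, -1.2, 0.9), (0.3, 0.8, -0.5), (0.2, 0.1, 1.3), U=0.6)
print(f"E - H = {E - H:.12f}, omega.L = {wL:.12f}")

# Testbed problem (Fig. 2.1): bead on a ring of radius R spinning about the vertical z-axis
mb, R, om, th, th_dot = 0.1, 0.5, 6.0, 1.1, 2.3                       # hypothetical values
r_bead = (R * math.sin(th), 0.0, -R * math.cos(th))
v_bead = (R * th_dot * math.cos(th), 0.0, R * th_dot * math.sin(th))  # d(r_bead)/dt in the ring frame
E_b, H_b, wL_b = energies(mb, r_bead, v_bead, (0.0, 0.0, om), U=0.0)
eq103 = mb * om**2 * R**2 * math.sin(th)**2
print(f"E - H = {E_b - H_b:.12f}, Eq. (103): {eq103:.12f}")
```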


4.7. Exercise problems 


4.1. Calculate the principal moments of inertia for the following uniform rigid bodies: 


(i) a thin, planar, round hoop, (ii) a flat round disk, (iii) a thin spherical shell, and (iv) a solid sphere.


Compare the results, assuming that all the bodies have the same radius R and mass M, and give 
an interpretation of their difference. 


4.2. Calculate the principal moments of inertia for the rigid bodies shown in the figure below: 


(i) an equilateral triangle made of thin rods with a uniform linear mass density μ,
(ii) a thin plate in the shape of an equilateral triangle, with a uniform areal mass density σ, and


33 Note that by the definition (1.36), the angular momenta L of particles merely add up. As a result, the final form 
of Eq. (102) is valid for an arbitrary system of particles. 
(iii) a tetrahedron with a uniform bulk mass density ρ.


Assuming that the total mass of the three bodies is the same, compare the results and give an 
interpretation of their difference. 


4.3. Calculate the principal moments of inertia of a thin uniform plate cut in the form of a right
triangle with two π/4 angles.


4.4. Prove that Eqs. (34)-(36) are valid for rotation of a rigid body about a fixed axis z, even if it 
does not pass through its center of mass. 


4.5. Calculate the kinetic energy of a right circular cone with height H, 
base radius R, and a constant mass density ρ, that rolls over a horizontal
surface without slippage, making f turns per second about the vertical axis — 
see the figure on the right. 


4.6. External forces exerted on a rigid body, rotating with an angular velocity ω, have zero vector
sum but a non-vanishing net torque τ about its center of mass. Calculate the work of the forces on
the body per unit time, i.e. their instantaneous power. Prove that the same result is valid for a
body rotating about a fixed axis and the torque’s component along this axis. Use the last result to
prove that at negligible friction, the differential gear assembly (see the figure on the right)³⁴
distributes the external torque, applied to its satellite-carrier axis to rotate it about the common
axis of two axle shafts, equally to both shafts, even if they rotate with different angular velocities.


[Figure labels: left and right axle shafts; left and right wheel planetaries; satellites; satellite-carrier axis.]
4.7. The end of a uniform thin rod of length 2l and mass m, initially at rest,
is hit by a bullet of mass m', flying with velocity v_0 (see the figure on the right),
which gets stuck in the rod. Use two different approaches to calculate the velocity
of the opposite end of the rod right after the collision.


4.8. A ball of radius R, initially at rest on a horizontal table, is hit 
with a billiard cue in the horizontal direction, at height h above the table — 
see the figure on the right. Using the same Coulomb approximation for the 
friction force between the ball and the table as in Problem 1.4 (|F|_max = μN),
calculate the final linear velocity of the rolling ball as a function of h. 
Would it matter if the hit point is shifted horizontally (normally to the plane 
of the drawing)? 


34 Figure from G. Antoni, Sci. World J., 2014, 523281 (2014), adapted with permission. Both satellite gears may 
rotate freely about their carrier axis. 
Hint: As in most solid-body collision problems, during the short time of the cue hit, all other
forces exerted on the ball may be considered negligibly small.


4.9. A round cylinder of radius R and mass M may roll, without slippage, over a horizontal
surface. The mass density distribution inside the cylinder is not uniform, so that its center of mass is at
some distance l ≠ 0 from its geometrical axis, and the moment of inertia I (for rotation about the axis
parallel to the symmetry axis but passing through the center of mass) is different from MR²/2. Derive the
equation of motion of the cylinder under the effect of the uniform vertical gravity field, and use it to
calculate the frequency of its small oscillations near its stable equilibrium position.


4.10. A body may rotate about a fixed horizontal axis A (see Fig. 5). Find the frequency of its
small oscillations, in a uniform gravity field, as a function of the distance l of the axis from the body’s
center of mass 0, and analyze the result.


4.11. Calculate the frequency, and sketch the mode of oscillations of
a round uniform cylinder of radius R and mass M that may roll, without
slipping, on a horizontal surface of a block of mass M’. The block, in turn,
may move in the same direction, without friction, on a horizontal surface,
being connected to it with an elastic spring (see the figure on the right).


4.12. A thin uniform bar of mass M and length l is hung on a light thread of length
l’ (like a “chime” bell; see the figure on the right). Derive the equations of motion of the
system within the plane of the drawing.


4.13. A solid, uniform, round cylinder of mass M can roll,
without slipping, over a concave, round cylindrical surface of a block
of mass M’, in a uniform gravity field (see the figure on the right).
The block can slide without friction on a horizontal surface. Using the
Lagrangian formalism,


(i) find the frequency of small oscillations of the system near the equilibrium, and
(ii) sketch the oscillation mode for the particular case M’ = M, R’ = 2R.


4.14. A uniform solid hemisphere of radius R is placed on a 
horizontal plane — see the figure on the right. Find the frequency of its 
small oscillations within a vertical plane, for two ultimate cases: 


(i) there is no friction between the hemisphere and plane surfaces;
(ii) the static friction is so strong that there is no slippage between 
these surfaces. 


4.15. For the “sliding ladder” problem started in Sec. 3 (see Fig. 7), find the critical value of
the ladder’s angle at which it loses its contact with the vertical wall, assuming that it starts sliding from
the vertical position, with a negligible initial velocity.
4.16. Six similar, uniform rods of length l and mass m are connected by
light joints so that they may rotate, without friction, relative to each other, forming
a planar polygon. Initially, the polygon was at rest and had the shape of a regular hexagon
(see the figure on the right). Suddenly, an external force F is applied to
the middle of one rod, in the direction of the hexagon’s symmetry center.
Calculate the accelerations of the rod to which the force is applied (a) and of
the opposite rod (a’), immediately after the application of the force.


4.17. A rectangular cuboid (parallelepiped) with sides a_1, a_2, and a_3,
made of a material with a constant mass density ρ, is rotated, with a constant
angular velocity ω, about one of its space diagonals (see the figure on the
right). Calculate the torque τ necessary to sustain such rotation.


4.18. A uniform round ball moves, without slippage, over a “turntable”: a horizontal plane 
rotated about a vertical axis with a time-independent angular velocity Ω. Derive a self-consistent
equation of motion of the ball’s center of mass, and discuss its solutions. 


4.19. Calculate the free precession frequency of a flat round disk rotating with an angular
velocity ω about a direction very close to its symmetry axis, from the point of view of:


(i) an observer rotating with the disk, and
(ii) a lab-based observer.


4.20. Use the Euler equations to prove the fact mentioned in Sec. 4: free rotation of an arbitrary
body (“asymmetric top”) about its principal axes with the smallest and largest moments of inertia is
stable with respect to small variations of initial conditions, while that about the axis with the
intermediate moment of inertia is not. Illustrate the same fact using the Poinsot construction.


4.21. Give an interpretation of the torque-induced precession, explaining its direction, using any 
simple system exhibiting this effect, as a model. 


4.22. One end of a light shaft of length l is firmly
attached to the center of a thin uniform solid disk of radius R <<
l and mass M, whose plane is perpendicular to the shaft. The other
end of the shaft is attached to a vertical axis (see the figure on
the right) so that the shaft may rotate about the axis without
friction. The disk rolls, without slippage, over a horizontal
surface, so that the whole system rotates about the vertical axis with a constant angular velocity ω.
Calculate the (vertical) supporting force N exerted on the disk by the surface.


4.23. A coin of radius r is rolled over a horizontal surface, without
slippage. Due to its tilt θ, it rolls around a circle of radius R (see the figure on
the right). Modeling the coin as a very thin round disk, calculate the time period
of its motion around the circle.
4.24. A symmetric top on a point support (see, e.g., Fig. 9), rotating around its symmetry
axis with a high angular velocity ω_rot, is subjected not only to its weight Mg, but also to an additional force,
also applied to the top’s center of mass, but directed normally to g, with its vector rotating in the
horizontal plane with a constant angular velocity ω << ω_rot. Derive the system of equations describing
the top’s motion. Analyze their solution for the simplest case when ω is exactly equal to the frequency
(72) of the torque-induced precession in the gravity field alone.


4.25. Analyze the effect of small friction on the fast rotation of a symmetric top around its
symmetry axis, using a simple model in which the lower end of the body is a right cylinder of radius R.


4.26. An air-filled balloon is placed inside a water-filled container, which moves by inertia in
free space, at negligible gravity. Suddenly, a force F is applied to the container, pointing in a certain
direction. In what direction does the balloon move relative to the container?


4.27. Two planets are on circular orbits around their common center of mass. Calculate the
effective potential energy of a much lighter body (say, a spacecraft) rotating with the same angular
velocity, on the line connecting the planets. Sketch the plot of the radial dependence of U_ef and find
the number of so-called Lagrange points at which the potential energy has local maxima. Calculate their
positions explicitly in the limit when one of the planets is much more massive than the other one.


4.28. Besides the three Lagrange points L_1, L_2, and L_3 discussed in the previous problem, which
are located on the line connecting two planets on circular orbits about their mutual center of mass, there
are two off-line points L_4 and L_5, both within the plane of the planets’ rotation. Calculate their
positions.


4.29. The following simple problem may give additional clarity to the physics of the Coriolis 
“force”. Consider a bead of mass m, which may slide, without friction, along a straight rod that is 
rotated within a horizontal plane with a constant angular velocity ω (see the figure on the right).
Calculate the bead’s linear acceleration and the force N exerted on it by the rod, in: 


(i) an inertial (“lab”) reference frame, and 
(ii) the non-inertial reference frame rotating with the rod (but not moving with the bead),
and compare the results. 


4.30. Analyze the dynamics of the famous Foucault pendulum, used for spectacular 
demonstrations of the Earth’s rotation. In particular, calculate the angular velocity of the rotation of its 
oscillation plane relative to the Earth’s surface, at the location with the polar angle (“colatitude”) Θ.
Assume that the pendulum oscillation amplitude is small enough to neglect nonlinear effects, and that its 
oscillation period is much shorter than 24 hours. 


4.31. A small body is dropped down to the surface of Earth from height h << R_E, without initial
velocity. Calculate the magnitude and direction of its deviation from the vertical, due to the Earth’s
rotation. Estimate the effect’s magnitude for a body dropped from the Empire State Building.


4.32. Calculate the height of solar tides on a large ocean, using the following simplifying
assumptions: the tide period (½ of Earth’s day) is much longer than the period of all ocean waves, the
Earth (of mass M_E) is a sphere of radius R_E, and its distance r_S from the Sun (of mass M_S) is constant and
much larger than R_E.


4.33. A satellite is on a circular orbit of radius R, around the Earth. 


(i) Write the equations of motion of a small body as observed from the satellite, and simplify 
them for the case when the motion is limited to the satellite’s close vicinity. 

(ii) Use the equations to prove that the body may be placed on an elliptical trajectory around the 
satellite’s center of mass, within its plane of rotation about Earth. Calculate the ellipse’s orientation and 
eccentricity. 


4.34. A non-spherical shape of an artificial satellite may ensure its stable angular orientation 
relative to Earth’s surface, advantageous for many practical goals. Modeling the satellite as a strongly 
elongated, axially-symmetric body, moving around the Earth on a circular orbit of radius R, find its 
stable orientation. 


4.35.* A rigid, straight, uniform rod of length l, with the lower end on a pivot, falls
in a uniform gravity field (see the figure on the right). Neglecting friction, calculate the
distribution of the bending torque τ along its length, and analyze the result.


Hint: The bending torque is the net torque of the force F acting between two parts
of the rod, mentally separated by its cross-section, about a certain “neutral axis”.³⁵ As will
be discussed in detail in Sec. 7.5, at the proper definition of this axis, the bending torque’s
gradient along the rod’s length is equal to (−F), where F is the rod-normal (“shear”)
component of the force exerted by the top part of the rod on its lower part.


4.36. Let r be the radius vector of a particle, as measured in a possibly non-inertial but certainly
non-rotating reference frame. Taking its Cartesian components for the generalized coordinates, calculate
the corresponding generalized momentum p̃ of the particle, and its Hamiltonian function H. Compare p̃
with mv, and H with the particle’s energy E. Derive the Lagrangian equation of motion in this approach,
and compare it with Eq. (92).


35 Inadequate definitions of this torque are the main reason for numerous wrong solutions of this problem, posted 
online — readers beware! 
Chapter 5. Oscillations


In this course, oscillations and waves are discussed in detail, because of their importance for
fundamental and applied physics. This chapter starts with a discussion of the harmonic oscillator,
whose differential equation of motion is linear and hence allows the full analytical solution, and then
proceeds to so-called “nonlinear” and “parametric” systems whose dynamics may be explored only by
either approximate analytical or numerical methods.


5.1. Free and forced oscillations 


In Sec. 3.2 we briefly discussed oscillations in a keystone Hamiltonian system, a 1D harmonic
oscillator described by a very simple Lagrangian¹


$$L \equiv T(\dot q) - U(q) = \frac{m}{2}\dot q^2 - \frac{\kappa}{2}q^2, \qquad (5.1)$$


whose Lagrange equation of motion,²


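$$m\ddot q + \kappa q = 0, \quad\text{i.e.}\quad \ddot q + \omega_0^2 q = 0, \quad\text{with}\quad \omega_0^2 \equiv \frac{\kappa}{m}, \qquad (5.2)$$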


is a linear homogeneous differential equation. Its general solution is given by Eq. (3.16), which is 
frequently recast into another, amplitude-phase form: 


$$q(t) = u\cos\omega_0 t + v\sin\omega_0 t = A\cos(\omega_0 t - \varphi), \qquad (5.3a)$$


where A is the amplitude and φ the phase of the oscillations, which are determined by the initial
conditions. Mathematically, it is frequently easier to work with sinusoidal functions as complex
exponentials, by rewriting the last form of Eq. (3a) in one more form:³


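$$q(t) = \mathrm{Re}\left[A e^{-i(\omega_0 t - \varphi)}\right] = \mathrm{Re}\left[a e^{-i\omega_0 t}\right], \qquad (5.3b)$$
where $a$ is the complex amplitude of the oscillations:

$$a \equiv A e^{i\varphi}, \quad |a| = A, \quad \mathrm{Re}\,a = A\cos\varphi = u, \quad \mathrm{Im}\,a = A\sin\varphi = v. \qquad (5.4)$$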


For an autonomous, Hamiltonian oscillator, Eqs. (3) give the full classical description of its dynamics. 
However, it is important to understand that this free-oscillation solution, with a constant amplitude A, 


1 For notational brevity, in this chapter I will drop the indices “ef” in the energy components $T$ and $U$, and in
parameters like $m$, $\kappa$, etc. However, the reader should still remember that $T$ and $U$ do not necessarily coincide with
the actual kinetic and potential energies (even if those energies may be uniquely identified); see Sec. 3.1.

2 $\omega_0$ is usually called the own frequency of the oscillator. In quantum mechanics, the Germanized version of the
same term, eigenfrequency, is used more often. In this series, I will use either of the terms, depending on the context.

3 Note that this is the so-called physics convention. Most engineering texts use the opposite sign in the imaginary
exponent, $\exp\{-i\omega t\} \to \exp\{i\omega t\}$, with the corresponding sign implications for intermediate formulas, but (of
course) similar final results for real variables.
means the conservation of the energy $E = T + U = \kappa A^2/2$ of the oscillator. If its energy changes for any
reason, the description needs to be generalized.


First of all, if the energy leaks out of the oscillator to its environment (the effect usually called
energy dissipation), the free oscillations decay with time. The simplest model of this effect is
represented by an additional linear drag (or “kinematic friction”) force, proportional to the generalized
velocity and directed opposite to it:


$$F_v = -\eta\dot q, \qquad (5.5)$$


where the constant $\eta$ is called the drag coefficient.4 The inclusion of this force modifies the equation of
motion (2) to become


$$m\ddot q + \eta\dot q + \kappa q = 0. \qquad (5.6a)$$


This equation is frequently rewritten in the form 


$$\ddot q + 2\delta\dot q + \omega_0^2 q = 0, \quad \text{with} \quad \delta \equiv \frac{\eta}{2m}, \qquad (5.6b)$$


where the parameter $\delta$ is called the damping coefficient (or just “damping”). Note that Eq. (6) is still a
linear homogeneous second-order differential equation, and its general solution still has the form of the
sum (3.13) of two exponents of the type $\exp\{\lambda t\}$, with arbitrary pre-exponential coefficients. Plugging
such an exponent into Eq. (6), we get the following algebraic characteristic equation for $\lambda$:


$$\lambda^2 + 2\delta\lambda + \omega_0^2 = 0. \qquad (5.7)$$


Solving this quadratic equation, we get
$$\lambda_\pm = -\delta \pm i\omega_0', \quad \text{where} \quad \omega_0' \equiv \left(\omega_0^2 - \delta^2\right)^{1/2}, \qquad (5.8)$$


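so that for not very high damping ($\delta < \omega_0$) we get the following generalization of Eq. (3):5

$$q_{\text{free}}(t) = c_+ e^{\lambda_+ t} + c_- e^{\lambda_- t} = \left(u_0\cos\omega_0' t + v_0\sin\omega_0' t\right)e^{-\delta t} = A_0 e^{-\delta t}\cos(\omega_0' t - \varphi_0). \qquad (5.9)$$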
The result shows that, besides a certain correction to the free oscillation frequency (which is very small
in the most interesting low-damping limit, $\delta \ll \omega_0$), the energy dissipation leads to an exponential decay
of the oscillation amplitude with the time constant $\tau = 1/\delta$:


4 Here Eq. (5) is treated as a phenomenological model, but in statistical mechanics, such a dissipative term may be
derived as an average force exerted upon a system by its environment, under very general assumptions. As will be
discussed in detail later in this series (QM Chapter 7 and SM Chapter 5), due to the numerous degrees of freedom
of a typical environment (think about the molecules of air surrounding a macroscopic pendulum), its force also
has a random component; as a result, dissipation is fundamentally related to fluctuations. The latter effect may
be neglected (as it is in this course) only if the oscillator’s energy $E$ is much higher than the energy scale of its
random fluctuations: in thermal equilibrium at temperature $T$, the larger of $k_BT$ and $\hbar\omega_0/2$.

5 Systems with high damping ($\delta > \omega_0$) can hardly be called oscillators, and though they are used in engineering
and physics experiments (e.g., for shock and sound isolation), due to the lack of time/space, for their detailed
discussion I have to refer the interested reader to special literature; see, e.g., C. Harris and A. Piersol, Shock and
Vibration Handbook, 5th ed., McGraw Hill, 2002. Let me only note that according to Eq. (8), the dynamics of
systems with very high damping ($\delta \gg \omega_0$) has two very different time scales: a relatively short “momentum
relaxation time” $1/|\lambda_-| \approx 1/2\delta = m/\eta$, and a much longer “coordinate relaxation time” $1/|\lambda_+| \approx 2\delta/\omega_0^2 = \eta/\kappa$.
$$A = A_0 e^{-t/\tau}, \quad \text{with} \quad \tau \equiv \frac{1}{\delta} = \frac{2m}{\eta}. \qquad (5.10)$$
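To make Eqs. (7)-(10) concrete, here is a minimal Python sketch using only the standard library. The function names and the parameter values $\delta = 0.1$, $\omega_0 = 1$ are illustrative assumptions, not part of the text. The sketch compares the numerically computed roots of the characteristic equation (7) with Eq. (8), and checks that over one period $2\pi/\omega_0'$ the free oscillation (9) shrinks exactly by the envelope factor $\exp\{-2\pi\delta/\omega_0'\}$.

```python
import cmath
import math


def characteristic_roots(delta, omega0):
    """Roots of lambda**2 + 2*delta*lambda + omega0**2 = 0, Eq. (5.7)."""
    disc = cmath.sqrt(delta**2 - omega0**2)  # purely imaginary for delta < omega0
    return -delta + disc, -delta - disc


def omega0_prime(delta, omega0):
    """omega0' = (omega0**2 - delta**2)**(1/2), Eq. (5.8); assumes delta < omega0."""
    return math.sqrt(omega0**2 - delta**2)


def q_free(t, A0, phi0, delta, omega0):
    """Decaying free oscillation, last form of Eq. (5.9)."""
    return A0 * math.exp(-delta * t) * math.cos(omega0_prime(delta, omega0) * t - phi0)


delta, omega0 = 0.1, 1.0  # illustrative low-damping values
lam_plus, lam_minus = characteristic_roots(delta, omega0)
print(lam_plus, lam_minus)  # -delta +/- i*omega0'

# After one period of the cosine factor, only the exponential envelope has changed:
period = 2 * math.pi / omega0_prime(delta, omega0)
ratio = q_free(0.3 + period, 1.0, 0.0, delta, omega0) / q_free(0.3, 1.0, 0.0, delta, omega0)
print(ratio, math.exp(-delta * period))
```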


A very popular dimensionless measure of damping is the so-called quality factor $Q$ (or just the
Q-factor), which is defined as $\omega_0/2\delta$, and may be rewritten in several other useful forms:

$$Q \equiv \frac{\omega_0}{2\delta} = \frac{m\omega_0}{\eta} = \frac{(m\kappa)^{1/2}}{\eta} = \pi\frac{\tau}{\mathcal{T}} = \frac{\omega_0\tau}{2}, \qquad (5.11)$$

where $\mathcal{T} = 2\pi/\omega_0$ is the oscillation period in the absence of damping; see Eq. (3.29). Since the
oscillation energy $E$ is proportional to $A^2$, i.e. decays as $\exp\{-2t/\tau\}$, i.e. with the time constant $\tau/2$, the
last form of Eq. (11) may be used to rewrite the Q-factor in one more form:

$$Q = \omega_0\frac{E}{-\dot E} \equiv \omega_0\frac{E}{\mathcal{P}}, \qquad (5.12)$$


where $\mathcal{P}$ is the energy dissipation rate. (Other practical ways to measure $Q$ will be discussed below.)
The range of Q-factors of important oscillators is very broad, all the way from $Q \sim 10$ for a human leg
(with relaxed muscles), to $Q \sim 10^4$ for the quartz crystals used in electronic clocks and watches, all the
way up to $Q \sim 10^{12}$ for carefully designed microwave cavities with superconducting walls.
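The equivalence of the forms in Eqs. (11) and (12) is easy to confirm numerically. The sketch below is only an illustration: the helper names and the values $m = 2$, $\kappa = 8$, $\eta = 0.04$ (giving $\omega_0 = 2$, $\delta = 0.01$, $Q = 100$) are assumptions, and for Eq. (12) it takes $E$ as the period-averaged energy $\kappa A^2(t)/2$ of the decaying envelope, as in the text.

```python
import math


def q_factor_forms(m, kappa, eta):
    """Evaluate the equivalent expressions of Eq. (5.11) for given m, kappa, eta."""
    omega0 = math.sqrt(kappa / m)
    delta = eta / (2 * m)          # Eq. (5.6b)
    tau = 1 / delta                # Eq. (5.10)
    period = 2 * math.pi / omega0  # oscillation period without damping
    return [
        omega0 / (2 * delta),
        m * omega0 / eta,
        math.sqrt(m * kappa) / eta,
        math.pi * tau / period,
        omega0 * tau / 2,
    ]


def q_from_energy(m, kappa, eta, t, A0=1.0):
    """Eq. (5.12), Q = omega0 * E / P, with the envelope energy
    E(t) = kappa * A(t)**2 / 2 and A(t) = A0 * exp(-delta * t)."""
    omega0 = math.sqrt(kappa / m)
    delta = eta / (2 * m)
    energy = 0.5 * kappa * (A0 * math.exp(-delta * t))**2
    power = 2 * delta * energy     # P = -dE/dt, since E is proportional to exp(-2 delta t)
    return omega0 * energy / power


forms = q_factor_forms(m=2.0, kappa=8.0, eta=0.04)
print(forms)                                   # five equal values, 100 up to rounding
print(q_from_energy(2.0, 8.0, 0.04, t=37.0))   # independent of t
```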


In contrast to the decaying free oscillations, forced oscillations induced by an external force $F(t)$
may maintain their amplitude (and hence energy) indefinitely, even at non-zero damping. This process
may be described using a still linear but now inhomogeneous differential equation


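$$m\ddot q + \eta\dot q + \kappa q = F(t), \qquad (5.13a)$$
or, more usually, the following generalization of Eq. (6b):

$$\ddot q + 2\delta\dot q + \omega_0^2 q = f(t), \quad \text{where} \quad f(t) \equiv \frac{F(t)}{m}. \qquad (5.13b)$$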


For a mechanical linear, dissipative 1D oscillator (6), under the effect of an additional external force
$F(t)$, Eq. (13a) is just an expression of Newton’s 2nd law. However, according to Eq. (1.41), Eq. (13) is
valid for any dissipative, linear6 1D system whose Gibbs potential energy (1.39) has the form
$U_G(q, t) = \kappa q^2/2 - F(t)q$.


The forced-oscillation solutions to Eq. (13) may be analyzed by two mathematically equivalent
methods whose relative convenience depends on the character of the function $f(t)$.


(i) Frequency domain. Representing the function $f(t)$ as a Fourier sum of sinusoidal harmonics:7

$$f(t) = \sum_\omega f_\omega e^{-i\omega t}, \qquad (5.14)$$


and using the linearity of Eq. (13), we may represent its general solution as a sum of the decaying free
oscillations (9) with the frequency $\omega_0'$, which are independent of the function $f(t)$, and forced oscillations
due to each of the Fourier components of the force:8


6 This is very unfortunate, but common jargon, meaning “the system described by linear equations of motion”.
7 Here, in contrast to Eq. (3b), we may drop the operator Re, assuming that $f_{-\omega} = f_\omega^*$, so that the imaginary
components of the sum compensate each other.
$$q(t) = q_{\text{free}}(t) + q_{\text{forced}}(t), \qquad q_{\text{forced}}(t) = \sum_\omega a_\omega e^{-i\omega t}. \qquad (5.15)$$


Plugging Eq. (15) into Eq. (13), and requiring the factors before each $e^{-i\omega t}$ on both sides to be equal, we
get
$$a_\omega = f_\omega\,\chi(\omega), \qquad (5.16)$$


where the complex function $\chi(\omega)$, in our particular case equal to

$$\chi(\omega) = \frac{1}{\omega_0^2 - \omega^2 - 2i\omega\delta}, \qquad (5.17)$$


is called either the response function or (especially for non-mechanical oscillators) the generalized
susceptibility. From here, and Eq. (4), the amplitude of the oscillations under the effect of a sinusoidal
force is

$$A_\omega \equiv |a_\omega| = |f_\omega|\,|\chi(\omega)|, \quad \text{with} \quad |\chi(\omega)| = \frac{1}{\left[(\omega_0^2 - \omega^2)^2 + (2\omega\delta)^2\right]^{1/2}}. \qquad (5.18)$$


This formula describes, in particular, an increase of the oscillation amplitude $A_\omega$ at $\omega \to \omega_0$; see
the left panel of Fig. 1. In particular, at the exact equality of these two frequencies,

$$|\chi(\omega)|_{\omega=\omega_0} = \frac{1}{2\omega_0\delta}, \qquad (5.19)$$
so that, according to Eq. (11), the ratio of the response magnitudes at $\omega = \omega_0$ and $\omega = 0$ ($|\chi(\omega)|_{\omega=0} =
1/\omega_0^2$) is exactly equal to the Q-factor of the oscillator. Thus, the response increase is especially strong
in the low-damping limit ($\delta \ll \omega_0$, i.e. $Q \gg 1$); moreover, at $Q \to \infty$ and $\omega \to \omega_0$, the response
diverges. (This mathematical fact is very useful for the methods to be discussed later in this section.)
This is the classical description of the famous phenomenon of resonance, so ubiquitous in physics.
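A direct evaluation of the response function (17) illustrates this statement. In the sketch below (illustrative helper names; $\omega_0 = 1$, $\delta = 0.05$, i.e. $Q = 10$, are assumed values), the ratio $|\chi(\omega_0)|/|\chi(0)|$ reproduces $Q$, and $|\chi|$ computed from the complex expression (17) agrees with the closed form (18).

```python
import math


def chi(omega, omega0, delta):
    """Response function of Eq. (5.17)."""
    return 1 / (omega0**2 - omega**2 - 2j * omega * delta)


def chi_abs(omega, omega0, delta):
    """Magnitude from the second of Eqs. (5.18)."""
    return 1 / math.sqrt((omega0**2 - omega**2)**2 + (2 * omega * delta)**2)


omega0, delta = 1.0, 0.05          # illustrative values, Q = 10
Q = omega0 / (2 * delta)
peak_ratio = abs(chi(omega0, omega0, delta)) / abs(chi(0.0, omega0, delta))
print(Q, peak_ratio)               # Eq. (5.19) divided by 1/omega0**2 gives Q

# |chi| from Eq. (5.17) agrees with the closed form (5.18) at an arbitrary frequency:
print(abs(chi(0.7, omega0, delta)), chi_abs(0.7, omega0, delta))
```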


Fig. 5.1. Resonance in the linear 
oscillator, for several values of Q. 


8 In physics, this mathematical property of linear equations is frequently called the linear superposition principle.
Simultaneously with the increase of the resonance peak’s height, its width decreases, in inverse proportion to $Q$.
Quantitatively, in the most interesting low-damping limit, i.e. at $Q \gg 1$, the reciprocal Q-factor gives
the normalized value of the so-called full width at half-maximum (FWHM) of the resonance curve:9

$$\frac{\Delta\omega}{\omega_0} = \frac{1}{Q}. \qquad (5.20)$$
Indeed, this $\Delta\omega$ is defined as the difference $(\omega_+ - \omega_-)$ between the two values of $\omega$ at which the square of
the oscillator’s response function, $|\chi(\omega)|^2$ (which is proportional to the oscillation energy), equals half
of its resonance value. In the low-damping limit, these points are very close to $\omega_0$, so that in the
linear approximation in $|\omega - \omega_0| \ll \omega_0$, we may write $(\omega_0^2 - \omega^2) = -(\omega + \omega_0)(\omega - \omega_0) \approx -2\omega\xi \approx -2\omega_0\xi$,
where


$$\xi \equiv \omega - \omega_0 \qquad (5.21)$$


is a very convenient parameter called detuning, which will be repeatedly used later in this chapter, and
beyond it. In this approximation, the second of Eqs. (18) is reduced to10

$$|\chi(\omega)|^2 = \frac{1}{4\omega_0^2\left(\xi^2 + \delta^2\right)}. \qquad (5.22)$$


As a result, the points $\omega_\pm$ correspond to $\xi^2 = \delta^2$, i.e. $\omega_\pm = \omega_0 \pm \delta = \omega_0(1 \pm 1/2Q)$, so that
$\Delta\omega \equiv \omega_+ - \omega_- = \omega_0/Q$, thus proving Eq. (20).
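The low-damping estimate (20) can be compared with the exact half-power points of the full expression (18). The following sketch (helper names and the values $\omega_0 = 1$, $\delta = 0.005$, i.e. $Q = 100$, are illustrative assumptions) locates $\omega_\pm$ by simple bisection, and also checks the claim of footnote 9 that the phase $\arg\chi$ changes by about $\pi/2$ between them.

```python
import math


def chi_abs2(omega, omega0, delta):
    """|chi(omega)|**2, from the second of Eqs. (5.18)."""
    return 1 / ((omega0**2 - omega**2)**2 + (2 * omega * delta)**2)


def half_power_point(omega0, delta, lo, hi, iters=200):
    """Bisection in [lo, hi] for the frequency where |chi|**2 drops to half its value at omega0."""
    target = 0.5 * chi_abs2(omega0, omega0, delta)

    def g(w):
        return chi_abs2(w, omega0, delta) - target

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def phase(omega, omega0, delta):
    """arg chi(omega) for chi = 1/(omega0**2 - omega**2 - 2i*omega*delta)."""
    return math.atan2(2 * omega * delta, omega0**2 - omega**2)


omega0, delta = 1.0, 0.005         # illustrative values, Q = 100
Q = omega0 / (2 * delta)
w_minus = half_power_point(omega0, delta, 0.9, omega0)
w_plus = half_power_point(omega0, delta, omega0, 1.1)
print((w_plus - w_minus) / omega0, 1 / Q)                        # Eq. (5.20)
print(phase(w_plus, omega0, delta) - phase(w_minus, omega0, delta), math.pi / 2)
```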


(ii) Time domain. Returning to an arbitrary external force $f(t)$, one may argue that Eqs. (9) and (15)-(17)
provide a full solution of the forced oscillation problem even in this general case. This is formally
correct, but this solution may be very inconvenient if the external force is far from a sinusoidal function
of time, especially if it is not periodic at all. In this case, we should first calculate the complex
amplitudes $f_\omega$ participating in the Fourier sum (14). In the general case of a non-periodic $f(t)$, this is
actually the Fourier integral,11

$$f(t) = \int_{-\infty}^{+\infty} f_\omega e^{-i\omega t}\,d\omega, \qquad (5.23)$$

so that $f_\omega$ should be calculated using the reciprocal Fourier transform,

$$f_\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty} f(t')\,e^{i\omega t'}\,dt'. \qquad (5.24)$$


Now we may use Eq. (16) for each Fourier component of the resulting forced oscillations, and rewrite 
the last of Eqs. (15) as 


9 Note that the phase shift $\varphi = \arg[\chi(\omega)]$ between the oscillations and the external force (see the right panel in Fig.
1) makes its steepest change, by $\pi/2$, within the same frequency interval $\Delta\omega$.

10 Such a function of frequency may be met in many branches of science, frequently under special names, including
the “Cauchy distribution”, “the Lorentz function” (or “Lorentzian line”, or “Lorentzian distribution”),
“the Breit-Wigner function” (or “the Breit-Wigner distribution”), etc.

11 Let me hope that the reader knows that Eq. (23) may be used for periodic functions as well; in such a case, $f_\omega$ is
a set of equidistant delta functions. (A reminder of the basic properties of the Dirac $\delta$-function may be found, for
example, in MA Sec. 14.)
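$$q_{\text{forced}}(t) = \int_{-\infty}^{+\infty} a_\omega e^{-i\omega t}\,d\omega = \int_{-\infty}^{+\infty} \chi(\omega) f_\omega e^{-i\omega t}\,d\omega = \int_{-\infty}^{+\infty} d\omega\,\chi(\omega)\,e^{-i\omega t}\,\frac{1}{2\pi}\int_{-\infty}^{+\infty} f(t')\,e^{i\omega t'}\,dt'$$
$$\equiv \int_{-\infty}^{+\infty} dt'\,f(t')\left[\frac{1}{2\pi}\int_{-\infty}^{+\infty} d\omega\,\chi(\omega)\,e^{-i\omega(t - t')}\right], \qquad (5.25)$$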


with the response function $\chi(\omega)$ given, in our case, by Eq. (17). Besides requiring two integrations, Eq.
(25) is conceptually troubling: it seems to indicate that the oscillator’s coordinate at time $t$ depends
not only on the external force exerted at earlier times $t' < t$, but also at future times. This would
contradict one of the most fundamental principles of physics (and indeed, science as a whole), causality:
no effect may precede its cause.


Fortunately, a straightforward calculation (left for the reader’s exercise) shows that the response
function (17) satisfies the following rule:12

$$\int_{-\infty}^{+\infty}\chi(\omega)\,e^{-i\omega\tau}\,d\omega = 0, \quad \text{for } \tau < 0. \qquad (5.26)$$


This fact allows the last form of Eq. (25) to be rewritten in either of the following equivalent forms:

$$q_{\text{forced}}(t) = \int_{-\infty}^{t} f(t')\,G(t - t')\,dt' \equiv \int_{0}^{\infty} f(t - \tau)\,G(\tau)\,d\tau, \qquad (5.27)$$


where $G(\tau)$, defined as the Fourier transform of the response function,

$$G(\tau) \equiv \frac{1}{2\pi}\int_{-\infty}^{+\infty}\chi(\omega)\,e^{-i\omega\tau}\,d\omega, \qquad (5.28)$$

is called the (temporal) Green’s function of the system. According to Eq. (26), $G(\tau) = 0$ for all $\tau < 0$.
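The causality rule (26) may also be checked by brute force, evaluating the Fourier integral (28) numerically. The sketch below is an illustration under stated assumptions: the integral is truncated to $|\omega| \le W = 200$ and evaluated by the trapezoidal rule with step $h = 0.01$, and $\omega_0 = 1$, $\delta = 0.1$ are illustrative values. For $\tau < 0$ the result vanishes to within the discretization error, while for $\tau > 0$ it matches the closed form $G(\tau) = e^{-\delta\tau}\sin(\omega_0'\tau)/\omega_0'$ obtained later in this section as Eq. (34).

```python
import cmath
import math


def chi(omega, omega0, delta):
    """Response function, Eq. (5.17)."""
    return 1 / (omega0**2 - omega**2 - 2j * omega * delta)


def green_from_chi(tau, omega0, delta, W=200.0, h=0.01):
    """Approximate Eq. (5.28) by the trapezoidal rule on the truncated interval [-W, W]."""
    n = int(round(2 * W / h))
    total = 0j
    for k in range(n + 1):
        w = -W + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * chi(w, omega0, delta) * cmath.exp(-1j * w * tau)
    return (total * h / (2 * math.pi)).real  # imaginary part cancels since chi(-w) = chi(w)*


def green_closed_form(tau, omega0, delta):
    """exp(-delta*tau) * sin(omega0'*tau) / omega0' for tau > 0, and zero otherwise."""
    if tau <= 0:
        return 0.0
    wp = math.sqrt(omega0**2 - delta**2)
    return math.exp(-delta * tau) * math.sin(wp * tau) / wp


omega0, delta = 1.0, 0.1
for tau in (-2.0, -0.5, 0.5, 2.0):
    print(tau, green_from_chi(tau, omega0, delta), green_closed_form(tau, omega0, delta))
```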


While the second form of Eq. (27) is frequently more convenient for calculations, its first form is 
more suitable for physical interpretation of the Green’s function. Indeed, let us consider the particular 
case when the force is a delta function 


$$f(t) = \delta(t - t'), \quad \text{with } t' < t, \text{ i.e. } \tau \equiv t - t' > 0, \qquad (5.29)$$

representing an ultimately short pulse at the moment $t'$, with unit “area” $\int f(t)\,dt$. Substituting Eq. (29)
into Eq. (27),13 we get
$$q(t) = G(t - t'). \qquad (5.30)$$


Thus the Green’s function $G(t - t')$ is just the oscillator’s response, as measured at time $t$, to a short
force pulse of unit “area”, exerted at time $t'$. Hence Eq. (27) expresses the linear superposition principle
in the time domain: the full effect of the force $f(t)$ on a linear system is a sum of the effects of short
pulses of duration $dt'$ and magnitude $f(t')$, each with its own “weight” $G(t - t')$; see Fig. 2.


12 Eq. (26) is true for any linear physical system in which f(t) represents a cause, and q(t) its effect. Following 
tradition, I discuss the frequency-domain expression of this causality relation (called the Kramers-Kronig 
relations) in the Classical Electrodynamics part of this lecture series — see EM Sec. 7.2. 

13 Technically, for this integration, $t'$ in Eq. (27) should be temporarily replaced with another letter, say $t''$.
Fig. 5.2. A schematic, finite-interval representation of a force $f(t)$ as a sum of short pulses at all times $t' < t$, and their contributions to the linear system’s response $q(t)$, as given by Eq. (27).


This picture may be used for the calculation of the Green’s function for our particular system.
Indeed, Eqs. (29)-(30) mean that $G(\tau)$ is just the solution of the differential equation of motion of the
system, in our case, Eq. (13), with the replacement $t \to \tau$, and a $\delta$-functional right-hand side:

$$\frac{d^2G(\tau)}{d\tau^2} + 2\delta\frac{dG(\tau)}{d\tau} + \omega_0^2 G(\tau) = \delta(\tau). \qquad (5.31)$$

Since Eq. (27) describes only the second term in Eq. (15), i.e. only the forced, rather than free,
oscillations, we have to exclude the latter by solving Eq. (31) with zero initial conditions:

$$G(-0) = \frac{dG}{d\tau}(-0) = 0, \qquad (5.32)$$

where $\tau = -0$ means the instant immediately preceding $\tau = 0$.


This problem may be simplified even further. Let us integrate both sides of Eq. (31) over an
infinitesimal interval including the origin, e.g. $[-d\tau/2, +d\tau/2]$, and then take the limit $d\tau \to 0$. Since
the Green’s function has to be continuous because of its physical sense as the (generalized) coordinate,
all terms on the left-hand side but the first one vanish, while the first term yields $dG/d\tau|_{\tau=+0} - dG/d\tau|_{\tau=-0}$.
Due to the second of Eqs. (32), the last of these two derivatives has to equal zero, while the right-hand
side of Eq. (31) yields 1 upon the integration. Thus, the function $G(\tau)$ may be calculated for $\tau > 0$ (i.e.
for all times when it is different from zero) by solving the homogeneous version of the system’s
equation of motion for $\tau > 0$, with the following special initial conditions:

$$G(0) = 0, \qquad \frac{dG}{d\tau}(0) = 1. \qquad (5.33)$$


This approach gives us a convenient way to calculate the Green’s functions of linear
systems. In particular, for the oscillator with not very high damping ($\delta < \omega_0$, i.e. $Q > 1/2$), imposing the
initial conditions (33) on the homogeneous equation’s solution (9), we immediately get

$$G(\tau) = \frac{1}{\omega_0'}\,e^{-\delta\tau}\sin\omega_0'\tau. \qquad (5.34)$$


(The same result may be obtained directly from Eq. (28) with the response function $\chi(\omega)$ given by Eq.
(17). This way is, however, a little more cumbersome, and is left for the reader’s exercise.)
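The recipe (33) is also convenient numerically: one simply integrates the homogeneous equation of motion from the “kicked” initial state. The sketch below does this with a hand-written classical 4th-order Runge-Kutta step (the helper names and the values $\omega_0 = 1$, $\delta = 0.1$, $\tau = 5$ are illustrative assumptions) and compares the result with the closed form (34).

```python
import math


def rk4_green(omega0, delta, t_end, n_steps):
    """Integrate G'' + 2*delta*G' + omega0**2*G = 0 for tau > 0,
    starting from G(0) = 0, G'(0) = 1 (Eq. 5.33), with classical RK4."""
    h = t_end / n_steps
    g, v = 0.0, 1.0

    def rhs(g, v):
        return v, -2 * delta * v - omega0**2 * g

    for _ in range(n_steps):
        k1g, k1v = rhs(g, v)
        k2g, k2v = rhs(g + 0.5 * h * k1g, v + 0.5 * h * k1v)
        k3g, k3v = rhs(g + 0.5 * h * k2g, v + 0.5 * h * k2v)
        k4g, k4v = rhs(g + h * k3g, v + h * k3v)
        g += h * (k1g + 2 * k2g + 2 * k3g + k4g) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return g


def green_closed_form(tau, omega0, delta):
    """Eq. (5.34), valid for tau > 0 and delta < omega0."""
    wp = math.sqrt(omega0**2 - delta**2)
    return math.exp(-delta * tau) * math.sin(wp * tau) / wp


omega0, delta, tau = 1.0, 0.1, 5.0
print(rk4_green(omega0, delta, tau, 5000), green_closed_form(tau, omega0, delta))
```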


Relations (27) and (34) provide a very convenient recipe for solving many forced-oscillation
problems. As a very simple example, let us calculate the transient process in an oscillator under the
effect of a constant force being turned on at $t = 0$, i.e. proportional to the theta-function (the Heaviside step function) of time:

$$f(t) = f_0\,\theta(t) \equiv \begin{cases} 0, & \text{for } t < 0, \\ f_0, & \text{for } t > 0, \end{cases} \qquad (5.35)$$

provided that at $t < 0$ the oscillator was at rest, so that in Eq. (15), $q_{\text{free}}(t) = 0$. Then the second form of
Eq. (27), together with Eq. (34), yields


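$$q(t) = \int_0^\infty f(t - \tau)\,G(\tau)\,d\tau = f_0\int_0^t \frac{1}{\omega_0'}\,e^{-\delta\tau}\sin\omega_0'\tau\,d\tau. \qquad (5.36)$$

The simplest way to work out such integrals is to represent the sine function under the integral as the imaginary
part of $\exp\{i\omega_0'\tau\}$, and merge the two exponents, getting

$$q(t) = \frac{f_0}{\omega_0'}\,\mathrm{Im}\left[\frac{e^{(-\delta + i\omega_0')\tau}}{-\delta + i\omega_0'}\right]_0^t = \frac{f_0}{\omega_0^2}\left[1 - e^{-\delta t}\left(\cos\omega_0' t + \frac{\delta}{\omega_0'}\sin\omega_0' t\right)\right]. \qquad (5.37)$$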

This result, plotted in Fig. 3, is rather natural: it describes nothing more than the transient from
the initial position $q = 0$ to the new equilibrium position $q_0 = f_0/\omega_0^2 = F_0/\kappa$, accompanied by decaying
oscillations. For this particular simple function (35), the same result might also be obtained by
introducing a new variable $\tilde q(t) \equiv q(t) - q_0$ and solving the resulting homogeneous equation for $\tilde q$ (with
the appropriate initial conditions $\tilde q(0) = -q_0$, $d\tilde q/dt(0) = 0$). However, for more complicated functions $f(t)$, the Green’s
function approach is irreplaceable.
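As a cross-check of the integration leading from Eq. (36) to Eq. (37), one can evaluate the integral (36) numerically. In the sketch below (illustrative helper names; Simpson’s rule with 2000 subintervals is an assumed choice), the parameters $\delta/\omega_0 = 0.1$, i.e. $Q = 5$, match those of Fig. 3.

```python
import math


def green(tau, omega0, delta):
    """Oscillator's Green's function, Eq. (5.34), for tau >= 0."""
    wp = math.sqrt(omega0**2 - delta**2)
    return math.exp(-delta * tau) * math.sin(wp * tau) / wp


def step_response_numeric(t, f0, omega0, delta, n=2000):
    """Eq. (5.36), q(t) = f0 * integral_0^t G(tau) dtau, by Simpson's rule (n must be even)."""
    h = t / n
    s = green(0.0, omega0, delta) + green(t, omega0, delta)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * green(k * h, omega0, delta)
    return f0 * s * h / 3


def step_response_closed(t, f0, omega0, delta):
    """Closed form, Eq. (5.37)."""
    wp = math.sqrt(omega0**2 - delta**2)
    envelope = math.exp(-delta * t) * (math.cos(wp * t) + delta / wp * math.sin(wp * t))
    return (f0 / omega0**2) * (1 - envelope)


f0, omega0, delta = 1.0, 1.0, 0.1   # delta/omega0 = 0.1, i.e. Q = 5, as in Fig. 5.3
for t in (1.0, 5.0, 20.0):
    print(t, step_response_numeric(t, f0, omega0, delta), step_response_closed(t, f0, omega0, delta))
```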


Fig. 5.3. The transient process in a linear oscillator, induced by a step-like force $f(t)$, for the particular case $\delta/\omega_0 = 0.1$ (i.e., $Q = 5$).


Note that for any particular linear system, its Green’s function should be calculated only once,
and then may be repeatedly used in Eq. (27) to calculate the system’s response to various external forces,
either analytically or numerically. This property makes the Green’s function approach very popular in
many other fields of physics, with the corresponding generalization or re-definition of the function.14
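The “compute once, reuse many times” property may be illustrated as follows. This is only a sketch under stated assumptions: the helper names are hypothetical, $G(\tau)$ is tabulated once on a uniform grid with step $h = 0.005$, the second form of Eq. (27) is evaluated by the trapezoidal rule, and the values $\omega_0 = 1$, $\delta = 0.2$, $\omega = 0.8$ are illustrative. The same table then gives both the step response (approaching $f_0/\omega_0^2$) and the response to a suddenly switched-on sinusoidal force, which after the transient decays coincides with $\mathrm{Re}[\chi(\omega) f_0 e^{-i\omega t}]$, as Eqs. (15)-(16) predict.

```python
import cmath
import math


def make_green_table(omega0, delta, h, n):
    """Tabulate Eq. (5.34) once: G(k*h) for k = 0..n."""
    wp = math.sqrt(omega0**2 - delta**2)
    return [math.exp(-delta * k * h) * math.sin(wp * k * h) / wp for k in range(n + 1)]


def response(force, t, table, h):
    """Second form of Eq. (5.27) by the trapezoidal rule over tau in [0, t];
    assumes force(s) = 0 for s < 0 and t <= (len(table) - 1) * h."""
    n = int(round(t / h))
    total = 0.0
    for k in range(n + 1):
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * force(t - k * h) * table[k]
    return total * h


omega0, delta, f0, omega = 1.0, 0.2, 1.0, 0.8   # illustrative values
h, t = 0.005, 60.0                              # t >> 1/delta, so the transients have died out
table = make_green_table(omega0, delta, h, int(round(t / h)))


def step(s):
    return f0 if s >= 0 else 0.0


def drive(s):
    return f0 * math.cos(omega * s) if s >= 0 else 0.0


q_step = response(step, t, table, h)
q_drive = response(drive, t, table, h)
chi_w = 1 / (omega0**2 - omega**2 - 2j * omega * delta)        # Eq. (5.17)
print(q_step, f0 / omega0**2)                                   # new equilibrium, cf. Eq. (5.37)
print(q_drive, (chi_w * f0 * cmath.exp(-1j * omega * t)).real)  # steady state, Eqs. (5.15)-(5.16)
```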


5.2. Weakly nonlinear oscillations 


In comparison with systems discussed in the last section, which are described by linear 
differential equations with constant coefficients and thus allow a complete and exact analytical solution, 
oscillations in nonlinear systems (very unfortunately but commonly called nonlinear oscillations) 
present a complex and, generally, analytically intractable problem. However, much insight into possible


14 See, e.g., Sec. 6.6, and also EM Sec. 2.7 and QM Sec. 2.2. 
processes in such systems may be gained from a discussion of an important case of weakly nonlinear
systems, which may be explored analytically. An important example of such systems is given by an 
anharmonic oscillator — a 1D system whose higher terms in the potential’s expansion (3.10) cannot be 
neglected, but are small and may be accounted for approximately. If, in addition, damping is low (or 
negligible), and the external harmonic force exerted on the system is not too large, the equation of 
motion is a slightly modified version of Eq. (13): 


$$\ddot q + \omega^2 q = f(t, q, \dot q, \ldots), \qquad (5.38)$$


where $\omega \approx \omega_0$ is the anticipated frequency of oscillations (whose choice may be to a certain extent
arbitrary; see below), and the right-hand side $f$ is small (say, scales as some small dimensionless
parameter $\varepsilon \ll 1$), and may be considered as a small perturbation.


Since at $\varepsilon = 0$ this equation has the sinusoidal solution given by Eq. (3), one might naively think
that at a nonzero but small $\varepsilon$ the approximate solution to Eq. (38) should be sought in the form


$$q(t) = q^{(0)} + q^{(1)} + q^{(2)} + \ldots, \quad \text{where} \quad q^{(n)} \propto \varepsilon^n, \qquad (5.39)$$

with $q^{(0)} = A\cos(\omega t - \varphi) \propto \varepsilon^0$. This is a good example of apparently impeccable mathematical
reasoning that would lead to a very inefficient procedure. Indeed, let us apply it to a problem whose exact
solution we already know, namely free oscillations in a linear but damped oscillator, for this
occasion assuming the damping to be very low, $\delta/\omega_0 \sim \varepsilon \ll 1$. The corresponding equation of motion,


Eq. (6), may be represented in the form (38) if we take $\omega = \omega_0$ and

$$f = -2\delta\dot q, \quad \text{with} \quad \delta \propto \varepsilon. \qquad (5.40)$$


The naive perturbation theory based on Eq. (39) would allow us to find small corrections, of the order of
$\delta$, to the free, non-decaying oscillations $A\cos(\omega_0 t - \varphi)$. However, we already know from Eq. (9) that the
main effect of damping is a gradual decrease of the free oscillation amplitude to zero, i.e. a very large
change of the amplitude, though at low damping, $\delta \ll \omega_0$, this decay takes a long time $t \sim \tau \gg 1/\omega_0$.
Hence, if we want our approximate method to be productive (i.e. to work at all time scales, in particular
for forced oscillations with stationary amplitude and phase), we need to account for the fact that even a
small right-hand side of Eq. (38) may eventually lead to large changes of the oscillation’s amplitude $A$ (and
sometimes, as we will see below, also of its phase $\varphi$) at large times, because of the slowly
accumulating effects of the small perturbation.15


This goal may be achieved16 by accounting for these slow changes already in the “0th
approximation”, i.e. the basic part of the solution in the expansion (39):


15 The same flexible approach is necessary for approximations used in quantum mechanics. The method
discussed here is closer in spirit (though not completely identical) to the WKB approximation (see, e.g., QM Sec.
2.4) than to most perturbative approaches (QM Ch. 6).

16 This approach has a long history and, unfortunately, does not have a commonly accepted name. It had been 
gradually developed in celestial mechanics, but its application to 1D systems (on which I am focusing) was 
clearly spelled out only in 1926 by Balthasar van der Pol. So, I will follow several authors who call it the van der 
Pol method. Note, however, that in optics and quantum mechanics, this method is commonly called the Rotating 
Wave Approximation (RWA). In math-oriented texts, this approach, and especially its extensions to higher 
approximations, is usually called either the small parameter method or the asymptotic method. The list of other 
$$q^{(0)} = A(t)\cos[\omega t - \varphi(t)], \quad \text{with} \quad \dot A, \dot\varphi \to 0 \ \text{at} \ \varepsilon \to 0. \qquad (5.41)$$


(It is evident that Eq. (9) is a particular case of this form.) Let me discuss this approach using a simple 
but representative example of a dissipative (but high-Q) pendulum driven by a weak sinusoidal external 
force with a nearly-resonant frequency: 


$$\ddot q + 2\delta\dot q + \omega_0^2\sin q = f_0\cos\omega t, \qquad (5.42)$$


with $|\omega - \omega_0|, \delta \ll \omega_0$, and the force amplitude $f_0$ so small that $|q| \ll 1$ at all times. From what we
know about the forced oscillations from Sec. 1, in this case it is natural to identify $\omega$ on the left-hand
side of Eq. (38) with the force’s frequency. Expanding $\sin q$ into the Taylor series in small $q$, keeping
only the first two terms of this expansion, and moving all small terms to the right-hand side, we can
rewrite Eq. (42) in the following popular form (38):17


$$\ddot q + \omega^2 q = -2\delta\dot q + 2\omega\xi q + \alpha q^3 + f_0\cos\omega t \equiv f(t, q, \dot q). \qquad (5.43)$$


Here $\alpha = \omega_0^2/6$ in the case of the pendulum (though the calculations below will be valid for any $\alpha$), and
the second term on the right-hand side was obtained using the approximation already employed in Sec.
1: $(\omega^2 - \omega_0^2)q \approx 2\omega(\omega - \omega_0)q = 2\omega\xi q$, where $\xi \equiv \omega - \omega_0$ is the detuning parameter that was already
used earlier; see Eq. (21).


Now, following the general recipe expressed by Eqs. (39) and (41), in the 1st approximation in
$f \propto \varepsilon$ we may look for the solution to Eq. (43) in the following form:


$$q(t) = A\cos\Psi + q^{(1)}(t), \quad \text{where} \quad \Psi \equiv \omega t - \varphi, \quad q^{(1)} \sim \varepsilon. \qquad (5.44)$$


Let us plug this solution into both parts of Eq. (43), keeping only the terms of the first order in $\varepsilon$. Thanks
to our (smart :-) choice of $\omega$ on the left-hand side of that equation, the two zero-order terms in that part
cancel each other. Moreover, since each term on the right-hand side of Eq. (43) is already of the order of
$\varepsilon$, we may drop $q^{(1)} \propto \varepsilon$ from the substitution into that part altogether, because this would give us only terms
$O(\varepsilon^2)$ or higher. As a result, we get the following approximate equation:


$$\ddot q^{(1)} + \omega^2 q^{(1)} = f^{(0)} \equiv -2\delta\frac{d}{dt}(A\cos\Psi) + 2\omega\xi A\cos\Psi + \alpha(A\cos\Psi)^3 + f_0\cos\omega t. \qquad (5.45)$$


According to Eq. (41), generally, $A$ and $\varphi$ should be considered (slow) functions of time.
However, let us leave the analysis of the transient process and the system’s stability until the next section,
and use Eq. (45) to find the stationary oscillations in the system, which are established after an initial transient
process. For that limited task, we may take $A = \text{const}$, $\varphi = \text{const}$, so that $q^{(0)}$ represents sinusoidal
oscillations of frequency $\omega$. Sorting the terms on the right-hand side according to their time
dependence,18 we see that it has terms with frequencies $\omega$ and $3\omega$:


scientists credited for the development of this method, its variations, and extensions includes, most notably, N.
Krylov, N. Bogolyubov, and Yu. Mitropolsky.

'7 This equation is frequently called the Duffing equation (or the equation of the Duffing oscillator), after Georg 
Duffing who carried out its first (rather incomplete) analysis in 1918. 

18 Using the second of Eqs. (44), $\cos\omega t$ may be rewritten as $\cos(\Psi + \varphi) = \cos\Psi\cos\varphi - \sin\Psi\sin\varphi$. Then using
the identity given, for example, by MA Eq. (3.4): $\cos^3\Psi = (3/4)\cos\Psi + (1/4)\cos 3\Psi$, we get Eq. (46).
$$f^{(0)} = \left(2\xi\omega A + \frac{3}{4}\alpha A^3 + f_0\cos\varphi\right)\cos\Psi + \left(2\delta\omega A - f_0\sin\varphi\right)\sin\Psi + \frac{1}{4}\alpha A^3\cos 3\Psi. \qquad (5.46)$$


Now comes the main punch of the van der Pol approach: mathematically, Eq. (45) may be
viewed as the equation of oscillations in a linear, dissipation-free harmonic oscillator of frequency $\omega$
(not $\omega_0$!) under the action of an external force $f^{(0)}(t)$. In our particular case, this force is given by Eq.
(46) and has three terms: two “quadrature” components at that very frequency $\omega$, and the third one of
frequency $3\omega$. As we know from our analysis of this problem in Sec. 1, if either of the first two
components is not equal to zero, $q^{(1)}$ grows to infinity; see Eq. (19) with $\delta = 0$. At the same time, by the
very structure of the van der Pol approximation, $q^{(1)}$ has to be finite, and moreover small! The only way to
avoid these infinitely growing (so-called secular) terms is to require that the amplitudes of both
quadrature components of $f^{(0)}$ with frequency $\omega$ are equal to zero:


$$2\xi\omega A + \frac{3}{4}\alpha A^3 + f_0\cos\varphi = 0, \qquad 2\delta\omega A - f_0\sin\varphi = 0. \qquad (5.47)$$


These two harmonic balance equations enable us to find both parameters of the forced
oscillations: their amplitude $A$ and phase $\varphi$. The phase may be readily eliminated from this system (most
easily, by expressing $\sin\varphi$ and $\cos\varphi$ from Eqs. (47), and then requiring the sum $\sin^2\varphi + \cos^2\varphi$ to equal
1), and the solution for $A$ recast in the following implicit but convenient form:


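$$A^2 = \frac{f_0^2}{4\omega^2}\,\frac{1}{\xi^2(A) + \delta^2}, \quad \text{where} \quad \xi(A) \equiv \xi + \frac{3}{8}\frac{\alpha A^2}{\omega} = \omega - \left(\omega_0 - \frac{3}{8}\frac{\alpha A^2}{\omega}\right). \qquad (5.48)$$

This expression differs from Eq. (22) for the linear resonance in the low-damping limit only by the
replacement of the detuning $\xi$ with its effective amplitude-dependent value $\xi(A)$, or, equivalently, the
replacement of the frequency $\omega_0$ of the oscillator with its effective, amplitude-dependent value

$$\omega_0(A) = \omega_0 - \frac{3}{8}\frac{\alpha A^2}{\omega}. \qquad (5.49)$$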


The physical meaning of $\omega_0(A)$ is simple: this is just the frequency of free oscillations of amplitude $A$ in
a similar nonlinear system, but with zero damping.19 Indeed, for $\delta = 0$ and $f_0 = 0$ we could repeat our
calculations, assuming that $\omega$ is an amplitude-dependent eigenfrequency $\omega_0(A)$. Then the second of Eqs.
(47) is trivially satisfied, while the first of them gives Eq. (49). The implicit relation (48) enables us
to draw the curves of this nonlinear resonance just by bending the linear resonance plots (Fig. 1)
according to the so-called skeleton curve expressed by Eq. (49). Figure 4 shows the result of this
procedure. Note that at small amplitude, $\omega_0(A) \to \omega_0$, i.e. we return to the usual, “linear” resonance (22).
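The implicit relation (48) is easy to solve numerically, and doing so shows where the bent resonance curve becomes multivalued. With $u \equiv A^2$, Eq. (48) is equivalent to the cubic $\beta^2u^3 + 2\xi\beta u^2 + (\xi^2 + \delta^2)u - c = 0$, where $\beta \equiv 3\alpha/8\omega$ and $c \equiv f_0^2/4\omega^2$. The sketch below brackets its positive roots by a sign scan and refines them by bisection; the helper name, the scan range, and the use of the Fig. 4 parameters with the strongest drive ($\alpha = \omega_0^2/6$, $\delta = 0.01$, $f_0 = 0.03$, $\omega_0 = 1$) are illustrative assumptions.

```python
import math


def amplitude_branches(omega, omega0, delta, alpha, f0, u_max=10.0, n_scan=20000):
    """Return all amplitudes A > 0 satisfying Eq. (5.48), found as positive roots u = A**2 of
    beta**2 u**3 + 2 xi beta u**2 + (xi**2 + delta**2) u - c = 0."""
    xi = omega - omega0
    beta = 3 * alpha / (8 * omega)
    c = f0**2 / (4 * omega**2)

    def g(u):
        return beta**2 * u**3 + 2 * xi * beta * u**2 + (xi**2 + delta**2) * u - c

    roots = []
    step = u_max / n_scan
    lo = 0.0
    for k in range(1, n_scan + 1):
        hi = k * step
        if g(lo) * g(hi) < 0:          # a sign change brackets one root
            a, b = lo, hi
            for _ in range(100):       # bisection refinement
                m = 0.5 * (a + b)
                if g(a) * g(m) <= 0:
                    b = m
                else:
                    a = m
            roots.append(math.sqrt(0.5 * (a + b)))
        lo = hi
    return roots


omega0, delta, alpha, f0 = 1.0, 0.01, 1.0 / 6, 0.03
for omega in (0.9, 1.05):
    print(omega, amplitude_branches(omega, omega0, delta, alpha, f0))
```

At $\omega = 0.9$ (below resonance, where the skeleton curve of a softening, $\alpha > 0$, system bends) the sketch finds three amplitudes, while at $\omega = 1.05$ it finds only one, in line with Fig. 4.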


To bring our solution to its logical completion, we should still find the first perturbation $q^{(1)}(t)$
from what is left of Eq. (45). Since the structure of this equation is similar to Eq. (13) with the force of
frequency $3\omega$ and zero damping, we may use Eqs. (16)-(17) to obtain


'9 The effect of the pendulum’s frequency dependence on its oscillation amplitude was observed as early as 1673 
by Christiaan Huygens — who by the way had invented the pendulum clock, increasing the timekeeping accuracy 
by about three orders of magnitude. (He also discovered the largest of Saturn’s moons, Titan). 
$$q^{(1)}(t) = -\frac{\alpha A^3}{32\omega^2}\cos 3(\omega t - \varphi). \qquad (5.50)$$


Adding this perturbation (note the negative sign!) to the sinusoidal oscillation (41), we see that as the 
amplitude A of oscillations in a system with a > 0 (e.g., a pendulum) grows, their waveform becomes a 
bit more “blunt” near the largest deviations from the equilibrium. 
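A quick way to confirm Eq. (50) is to substitute it back into what is left of Eq. (45), $\ddot q^{(1)} + \omega^2 q^{(1)} = (\alpha A^3/4)\cos 3\Psi$. The sketch below does so with a central finite difference for $\ddot q^{(1)}$; the helper names and the values $A = 1$, $\varphi = 0.3$, $\omega = 1$, $\alpha = 1/6$ are illustrative assumptions.

```python
import math


def q1(t, A, phi, omega, alpha):
    """First-order correction, Eq. (5.50)."""
    return -alpha * A**3 / (32 * omega**2) * math.cos(3 * (omega * t - phi))


def residual(t, A, phi, omega, alpha, h=1e-3):
    """q1'' + omega**2 q1 - (alpha A**3 / 4) cos 3Psi, with q1'' from a central difference."""
    d2 = (q1(t + h, A, phi, omega, alpha) - 2 * q1(t, A, phi, omega, alpha)
          + q1(t - h, A, phi, omega, alpha)) / h**2
    lhs = d2 + omega**2 * q1(t, A, phi, omega, alpha)
    return lhs - 0.25 * alpha * A**3 * math.cos(3 * (omega * t - phi))


for t in (0.0, 1.3, 4.7):
    print(t, residual(t, 1.0, 0.3, 1.0, 1.0 / 6))   # zero up to finite-difference error
```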


Fig. 5.4. The nonlinear resonance in the Duffing oscillator, as described by Eq. (48), for the particular case $\alpha = \omega_0^2/6$, $\delta/\omega_0 = 0.01$ (i.e. $Q = 50$), and several values of the parameter $f_0/\omega_0^2$, increased by equal steps of 0.005 from 0 to 0.03. (Horizontal axis: $\omega/\omega_0$.)


The same Eq. (50) also enables an estimate of the range of validity of our first approximation:
since it has been based on the assumption $|q^{(1)}| \ll |q^{(0)}| \leq A$, for this particular problem we have to
require $\alpha A^2/32\omega^2 \ll 1$. For a pendulum (i.e. for $\alpha = \omega_0^2/6$), this condition becomes $A^2 \ll 192$. Though
numerical coefficients in such strong inequalities should be taken with a grain of salt, the large
magnitude of this particular coefficient gives a good hint that the method may give very accurate results
even for relatively large oscillations with $A \sim 1$. In Sec. 7 below, we will see that this is indeed the case.
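For the undamped pendulum, this accuracy can already be gauged against an exact result. It is a standard fact that the period of a free pendulum with amplitude $A$ is $4K(k)/\omega_0$, with $k = \sin(A/2)$ and $K$ the complete elliptic integral of the first kind, and that $K(k) = \pi/[2\,\mathrm{AGM}(1, \sqrt{1 - k^2})]$, so that the exact frequency ratio is $\omega/\omega_0 = \mathrm{AGM}(1, \cos(A/2))$. The sketch below (helper names are illustrative) compares it with the skeleton curve (49) for $\alpha = \omega_0^2/6$, taking $\omega \approx \omega_0$ in its small correction term, which gives $\omega_0(A)/\omega_0 \approx 1 - A^2/16$.

```python
import math


def agm(a, b, iters=40):
    """Arithmetic-geometric mean of a and b (converges quadratically)."""
    for _ in range(iters):
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a


def exact_pendulum_ratio(A):
    """Exact omega(A)/omega0 of a free undamped pendulum: pi / (2 K(sin(A/2))) = agm(1, cos(A/2))."""
    return agm(1.0, math.cos(A / 2))


def skeleton_ratio(A):
    """Eq. (5.49) with alpha = omega0**2/6 and omega ~ omega0 in the correction term."""
    return 1 - A**2 / 16


for A in (0.3, 0.5, 1.0, 1.5):
    print(A, exact_pendulum_ratio(A), skeleton_ratio(A))
```

Even at $A = 1$ (about 57°) the two frequencies differ only by about $3\times10^{-4}$ of $\omega_0$.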


From the mathematical viewpoint, the next step would be to write the next approximation as
$$q(t) = A\cos\Psi + q^{(1)}(t) + q^{(2)}(t), \qquad q^{(2)} \sim \varepsilon^2, \qquad (5.51)$$

and plug it into the Duffing equation (43), which (thanks to our special choice of $q^{(0)}$ and $q^{(1)}$) would
retain only the sum $\ddot q^{(2)} + \omega^2 q^{(2)}$ on its left-hand side. Again, requiring the amplitudes of the two quadrature
components of frequency $\omega$ on the right-hand side to vanish, we may get second-order corrections to
$A$ and $\varphi$. Then we may use the remaining part of the equation to calculate $q^{(2)}$, and then go after the
third-order terms, etc.20 However, for most purposes, the sum $q^{(0)} + q^{(1)}$, and sometimes even just the
crudest approximation $q^{(0)}$ alone, are completely sufficient. For example, according to Eq. (50), for a
simple pendulum swinging as much as between the opposite horizontal positions ($A = \pi/2$), the 1st-order
correction $q^{(1)}$ is of the order of 0.5%. (Soon beyond this value, completely new dynamic phenomena


20 For a mathematically rigorous treatment of higher approximations, see, e.g., Yu. Mitropolsky and N. Dao,
Applied Asymptotic Methods in Nonlinear Oscillations, Springer, 2004. A more layman-oriented (and, by today’s
standards, somewhat verbose) discussion of various oscillatory phenomena may be found in the classical text A.
Andronov, A. Vitt, and S. Khaikin, Theory of Oscillators, first published in the 1960s and still available online as
Dover’s republication in 2011.
start (see Sec. 7 below), but they cannot be described by these successive approximations at all.) For
such reasons, higher approximations are rarely pursued for particular systems.


5.3. Reduced equations 


A much more important issue is the stability of the solutions described by Eq. (48). Indeed, Fig. 
4 shows that within a certain range of parameters, these equations give three different values for the 
oscillation amplitude (and phase), and it is important to understand which of them are stable. Since these 
solutions are not the fixed points in the sense discussed in Sec. 3.2 (each point in Fig. 4 represents a 
nearly-sinusoidal oscillation), their stability analysis needs a more general approach that would be valid 
for oscillations with amplitude and phase slowly evolving in time. This approach will also enable the 
analysis of non-stationary (especially the initial transient) processes, which are of importance for some 
dynamic systems. 


First of all, let us formalize the way the harmonic balance equations, such as Eqs. (47), should be obtained for the general case (38), rather than for the particular Eq. (43) considered in the last section. After plugging the 0th approximation (41) into the right-hand side of Eq. (38), we have to require the amplitudes of both quadrature components of frequency ω to vanish. From standard Fourier analysis, we know that these requirements may be represented as

$$\overline{f^{(0)}\sin\Psi} = 0, \qquad \overline{f^{(0)}\cos\Psi} = 0, \tag{5.52}$$

where the top bar means time averaging (in our current case, over the period 2π/ω), with the arguments of the function f calculated in the 0th approximation:

$$f^{(0)} \equiv f\left(t, q^{(0)}, \dot q^{(0)}, \ldots\right) = f(t, A\cos\Psi, -A\omega\sin\Psi, \ldots), \qquad \text{with } \Psi \equiv \omega t - \varphi. \tag{5.53}$$


Now, for a transient process, the contribution of $q^{(0)}$ to the left-hand side of Eq. (38) is not zero any longer, because its amplitude and phase may both be slow functions of time; see Eq. (41). Let us calculate this contribution. The exact result is

$$\ddot q^{(0)} + \omega^2 q^{(0)} = \left(\frac{d^2}{dt^2} + \omega^2\right)\left[A\cos(\omega t - \varphi)\right] = \left(\ddot A + 2A\omega\dot\varphi - A\dot\varphi^2\right)\cos(\omega t - \varphi) - \left(2\dot A\omega - 2\dot A\dot\varphi - A\ddot\varphi\right)\sin(\omega t - \varphi). \tag{5.54}$$

However, in the first approximation in ε, we may neglect the second derivatives of A and φ, and also the squares and products of their first derivatives (which are all of the second order in ε), so that Eq. (54) is reduced to

$$\ddot q^{(0)} + \omega^2 q^{(0)} = 2A\omega\dot\varphi\cos(\omega t - \varphi) - 2\dot A\omega\sin(\omega t - \varphi). \tag{5.55}$$

In the arguments of the function f in Eq. (53), we can neglect the time derivatives of the amplitude and phase altogether, because f is already proportional to the small parameter. Hence, in the first order in ε, Eq. (38) becomes

$$\ddot q^{(1)} + \omega^2 q^{(1)} = \tilde f \equiv f^{(0)} - \left(2A\dot\varphi\omega\cos\Psi - 2\dot A\omega\sin\Psi\right). \tag{5.56}$$
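Now, applying Eqs. (52) to the function $\tilde f$, and taking into account that the time averages of $\sin^2\Psi$ and $\cos^2\Psi$ are both equal to ½, while the time average of the product $\sin\Psi\cos\Psi$ vanishes, we get a pair of so-called reduced equations (alternatively called either “truncated”, or “RWA”, or “van der Pol” equations) for the time evolution of the amplitude and phase:

$$\dot A = -\frac{1}{\omega}\overline{f^{(0)}\sin\Psi}, \qquad A\dot\varphi = \frac{1}{\omega}\overline{f^{(0)}\cos\Psi}. \tag{5.57a}$$

Extending the definition (4) of the complex amplitude of oscillations to their slow evolution in time, $a(t) \equiv A(t)\exp\{i\varphi(t)\}$, and differentiating this relation, we may also rewrite the two equations (57a) either as one equation for a:

$$\dot a = \frac{i}{\omega}\overline{f^{(0)}\exp\{i\omega t\}}, \tag{5.57b}$$

or as two equations for its real and imaginary parts, $u = A\cos\varphi$ and $v = A\sin\varphi$:

$$\dot u = -\frac{1}{\omega}\overline{f^{(0)}\sin\omega t}, \qquad \dot v = \frac{1}{\omega}\overline{f^{(0)}\cos\omega t}. \tag{5.57c}$$

The first-order harmonic balance equations (52) are evidently just the particular case of the reduced equations (57) for stationary oscillations ($\dot A = \dot\varphi = 0$).²¹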
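When f is too complicated for a pencil-and-paper calculation, the averages in Eqs. (57a) can be evaluated numerically. The Python sketch below is only an illustration: the function name `reduced_rhs` and the grid size `n` are arbitrary choices, not part of the text. It samples $f^{(0)}$ on a uniform grid over one period, a rule that is very accurate for smooth periodic integrands.

```python
import math

def reduced_rhs(f, A, phi, omega, n=2000):
    """Return (dA/dt, dphi/dt) given by Eqs. (57a).

    f(t, q, qdot) is the small right-hand side of Eq. (38); it is evaluated in the
    0th approximation: q = A*cos(Psi), qdot = -A*omega*sin(Psi), Psi = omega*t - phi.
    """
    period = 2 * math.pi / omega
    sum_sin = 0.0  # accumulates f0 * sin(Psi)
    sum_cos = 0.0  # accumulates f0 * cos(Psi)
    for k in range(n):
        t = k * period / n
        psi = omega * t - phi
        f0 = f(t, A * math.cos(psi), -A * omega * math.sin(psi))
        sum_sin += f0 * math.sin(psi)
        sum_cos += f0 * math.cos(psi)
    dA = -(sum_sin / n) / omega          # first of Eqs. (57a)
    dphi = (sum_cos / n) / (omega * A)   # second of Eqs. (57a), divided by A
    return dA, dphi


# Linear damped oscillator, cf. Eqs. (58): f = -2*delta*qdot + 2*xi*omega*q
delta, xi, omega = 0.01, 0.003, 1.0
print(reduced_rhs(lambda t, q, qd: -2 * delta * qd + 2 * xi * omega * q, 0.5, 0.2, omega))
```

For the linear damped oscillator this reproduces, up to rounding errors, the results $\dot A = -\delta A$ and $\dot\varphi = \xi$ derived analytically below.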


Superficially, the system (57a) of two coupled, first-order differential equations may look more complex than the initial, second-order differential equation (38), but actually, it is usually much simpler. For example, let us spell them out for the easy case of free oscillations of a linear oscillator with damping. For that, we may reuse the ready Eq. (46) by taking α = f₀ = 0, and thus turning Eqs. (57a) into

$$\dot A = -\frac{1}{\omega}\overline{f^{(0)}\sin\Psi} = -\frac{1}{\omega}\overline{\left(2\xi\omega A\cos\Psi + 2\delta\omega A\sin\Psi\right)\sin\Psi} = -\delta A, \tag{5.58a}$$

$$\dot\varphi = \frac{1}{\omega A}\overline{f^{(0)}\cos\Psi} = \frac{1}{\omega A}\overline{\left(2\xi\omega A\cos\Psi + 2\delta\omega A\sin\Psi\right)\cos\Psi} = \xi. \tag{5.58b}$$

The solution of Eq. (58a) gives us the same “envelope” law $A(t) = A(0)e^{-\delta t}$ as the exact solution (10) of the initial differential equation, while the elementary integration of Eq. (58b) yields $\varphi(t) = \xi t + \varphi(0) \equiv \omega t - \omega_0 t + \varphi(0)$. This means that our approximate solution,

$$q^{(0)}(t) = A(t)\cos[\omega t - \varphi(t)] = A(0)e^{-\delta t}\cos[\omega_0 t - \varphi(0)], \tag{5.59}$$

agrees with the exact Eq. (9), and misses only the correction (8) of the oscillation frequency. (This correction is of the second order in δ, i.e. of the order of ε², and hence is beyond the accuracy of our first approximation.) It is remarkable how nicely the reduced equations recover the proper frequency of free oscillations in this autonomous system, in which the very notion of ω is ambiguous.
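The size of the neglected frequency correction can be checked directly. The Python sketch below assumes the initial conditions q(0) = 1, $\dot q(0) = -\delta$, for which Eq. (59) (with A(0) = 1, φ(0) = 0) and the exact solution $e^{-\delta t}\cos\omega_0' t$, with the damping-shifted frequency $\omega_0' = (\omega_0^2 - \delta^2)^{1/2}$ (cf. Eq. (8)), start from the same state. It returns the largest difference between the two over the amplitude decay time 1/δ; the helper name and the grid are illustrative.

```python
import math

def max_deviation(delta, omega0=1.0, n=20000):
    """Largest |q_exact - q_59| for 0 <= t <= 1/delta.

    Both solutions start from q(0) = 1, dq/dt(0) = -delta.
    """
    omega0_damped = math.sqrt(omega0**2 - delta**2)  # damping-shifted frequency
    t_max = 1.0 / delta
    worst = 0.0
    for k in range(n + 1):
        t = k * t_max / n
        envelope = math.exp(-delta * t)
        q_exact = envelope * math.cos(omega0_damped * t)
        q_59 = envelope * math.cos(omega0 * t)  # Eq. (59) with A(0) = 1, phi(0) = 0
        worst = max(worst, abs(q_exact - q_59))
    return worst

for delta in (0.04, 0.02, 0.01):
    dev = max_deviation(delta)
    print(delta, dev, dev / delta)
```

The last printed column stays roughly constant, i.e. the deviation scales as δ: the frequency error, of order δ²/ω₀, accumulates into a phase error of order δ/ω₀ over the decay time, which is indeed beyond the first approximation.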


21 One may ask why we cannot stick to just one, most compact, complex-amplitude form (57b) of the reduced equations. The main reason is that when the function $f(q, \dot q, t)$ is nonlinear, we cannot replace its real arguments, such as $q = A\cos(\omega t - \varphi)$, with their complex-function representations like $a\exp\{-i\omega t\}$ (as could be done in the linear problems considered in Sec. 5.1), and need to use real variables, such as either {A, φ} or {u, v}, anyway.
The result is different for forced oscillations. For example, for the (generally, nonlinear) Duffing oscillator described by Eq. (43) with f₀ ≠ 0, Eqs. (57a) yield the reduced equations

$$\dot A = -\delta A + \frac{f_0}{2\omega}\sin\varphi, \qquad A\dot\varphi = \xi(A)A + \frac{f_0}{2\omega}\cos\varphi, \tag{5.60}$$

which are valid for an arbitrary function ξ(A), provided that this nonlinear detuning remains much smaller than the oscillation frequency. Here (after a transient), the amplitude and phase tend to the stationary states described by Eqs. (47). This means that φ becomes a constant, so that $q^{(0)} \to A\cos(\omega t - \text{const})$, i.e. the reduced equations again automatically recover the correct frequency of the solution, in this case equal to the external force's frequency.
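A minimal numerical sketch of this relaxation integrates Eqs. (60) with the classical fourth-order Runge-Kutta scheme. It assumes, for illustration only, the nonlinear detuning in the form ξ(A) = ξ − κA² with a hypothetical coefficient κ, and uses the name `drive` for $f_0/2\omega$; all numbers are arbitrary. Setting $\dot A = \dot\varphi = 0$ in Eqs. (60) and eliminating φ gives $A^2[\xi^2(A) + \delta^2] = (f_0/2\omega)^2$, which the final state should satisfy.

```python
import math

def rhs(A, phi, delta, xi, kappa, drive):
    """Eqs. (60), with the assumed form xi(A) = xi - kappa*A**2 and drive = f0/(2*omega)."""
    xi_A = xi - kappa * A**2
    dA = -delta * A + drive * math.sin(phi)
    dphi = xi_A + drive * math.cos(phi) / A
    return dA, dphi

def relax(delta=0.05, xi=0.02, kappa=0.1, drive=0.02, A=0.1, phi=0.0, dt=0.05, t_end=600.0):
    """Integrate Eqs. (60) by the classical RK4 method and return the final (A, phi)."""
    p = (delta, xi, kappa, drive)
    for _ in range(int(t_end / dt)):
        k1 = rhs(A, phi, *p)
        k2 = rhs(A + dt / 2 * k1[0], phi + dt / 2 * k1[1], *p)
        k3 = rhs(A + dt / 2 * k2[0], phi + dt / 2 * k2[1], *p)
        k4 = rhs(A + dt * k3[0], phi + dt * k3[1], *p)
        A += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        phi += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return A, phi

A, phi = relax()
xi_A = 0.02 - 0.1 * A**2
print(A, A**2 * (xi_A**2 + 0.05**2), 0.02**2)  # the last two numbers should coincide
```

For these parameters the stationary equation has a single root, so the transient always ends at the same fixed point; near the bistability region of Fig. 4 the end state would depend on the initial conditions.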


Note that each stationary oscillation regime, with certain amplitude and phase, corresponds to a 
fixed point of the reduced equations, so that the stability of those fixed points determines that of the 
oscillations. In the next three sections, we will carry out such analyses for several simple systems of key 
importance for physics and engineering. 


5.4. Self-oscillations and phase locking 


B. van der Pol's motivation for developing his method was the analysis of one more type of oscillatory motion: self-oscillations. Several systems, e.g., electronic rf amplifiers with positive feedback and optical media with quantum level population inversion, provide convenient means for the compensation and even over-compensation of the intrinsic energy losses in oscillators. Phenomenologically, this effect may be described as the change of sign of the damping coefficient δ from positive to negative. Since for small oscillations the equation of motion is still linear, we may use Eq. (9) to describe its general solution. This equation shows that at δ < 0, even infinitesimal deviations from equilibrium (say, due to unavoidable fluctuations) lead to oscillations with exponentially growing amplitude. Of course, in any real system such growth cannot persist indefinitely, and is limited by this or that effect: e.g., in the above examples, respectively, by the amplifier's saturation and by the exhaustion of the quantum level population inversion.


In many cases, the amplitude limitation may be described reasonably well by making the following replacement:

$$2\delta\dot q \to 2\delta\dot q + \beta\dot q^3, \tag{5.61}$$

with β > 0. Let us analyze the effects of such nonlinear damping, applying van der Pol's approach²² to the corresponding differential equation:

$$\ddot q + 2\delta\dot q + \beta\dot q^3 + \omega_0^2 q = 0. \tag{5.62}$$

Moving the dissipative and detuning terms to the right-hand side, and taking them for f in the canonical Eq. (38), we can easily calculate the right-hand sides of the reduced equations (57a), getting²³

$$\dot A = -\delta(A)A, \qquad \text{where } \delta(A) \equiv \delta + \frac{3}{8}\beta\omega^2 A^2, \tag{5.63a}$$

$$A\dot\varphi = \xi A. \tag{5.63b}$$

The last of these equations has exactly the same form as Eq. (58b) for the case of decaying oscillations, and hence shows that the self-oscillations (if they happen, i.e. if A ≠ 0) have the oscillator's own frequency ω₀; cf. Eq. (59). However, Eq. (63a) is more substantive. If the initial damping δ is positive, it has only the trivial fixed point, A₀ = 0 (which describes the oscillator at rest), but if δ is negative, there is also another fixed point,

$$A_1 = \left(\frac{8|\delta|}{3\beta\omega^2}\right)^{1/2}, \qquad \text{for } \delta < 0, \tag{5.64}$$

which describes steady self-oscillations with a non-zero amplitude A₁.

22 In his original work, B. van der Pol considered a very similar equation (frequently called the van der Pol oscillator) that differs from Eq. (62) only by the nonlinear term, $\beta\dot q^3 \to \beta q^2\dot q$, and has very similar properties.

23 For that, one needs to use the trigonometric identity $\sin^3\Psi = \frac{3}{4}\sin\Psi - \frac{1}{4}\sin 3\Psi$; see, e.g., MA Eq. (3.4).


To understand which of these points is stable, let us apply the general approach discussed in Sec. 3.2, the linearization of equations of motion, to Eq. (63a). For the trivial fixed point A₀ = 0, its linearization is reduced to discarding the nonlinear term in the definition of the amplitude-dependent damping δ(A). The resulting linear equation evidently shows that the system's equilibrium point, A = A₀ = 0, is stable at δ > 0 and unstable at δ < 0. (This self-excitation condition was already discussed above.) On the other hand, the linearization near the non-trivial fixed point A₁ requires a bit more math: in the first order in $\tilde A \equiv A - A_1 \to 0$, we get

$$\dot{\tilde A} \equiv \dot A = -\delta\,(A_1 + \tilde A) - \frac{3}{8}\beta\omega^2(A_1 + \tilde A)^3 \approx -\delta\tilde A - \frac{9}{8}\beta\omega^2A_1^2\tilde A = (-\delta + 3\delta)\tilde A = 2\delta\tilde A, \tag{5.65}$$

where Eq. (64) has been used to eliminate A₁. We see that the fixed point A₁ (and hence the self-oscillation process) is stable as soon as it exists (δ < 0), similarly to the situation in our “testbed problem” (Fig. 2.1), except that in our current, dissipative system, the stability is “actual” rather than “orbital”; see Sec. 6 for more on this issue.
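The prediction (64) for the steady amplitude, and its stability, can be checked by integrating the full equation (62) numerically. In the Python sketch below (the function name and all parameter values are illustrative, with |δ| ≪ ω₀ so that the first approximation applies), a small initial deviation grows, saturates, and settles at an amplitude close to $A_1 = (8|\delta|/3\beta\omega_0^2)^{1/2}$.

```python
import math

def steady_amplitude(delta=-0.01, beta=1.0, omega0=1.0, q0=0.01, dt=0.05, t_end=1500.0):
    """Integrate Eq. (62) by RK4 from a small kick; return max |q| over the last five periods."""
    def accel(q, qdot):
        # Eq. (62) solved for the second derivative of q
        return -2 * delta * qdot - beta * qdot**3 - omega0**2 * q

    q, qdot = q0, 0.0
    n_steps = int(t_end / dt)
    tail = int(5 * 2 * math.pi / (omega0 * dt))  # number of steps in five periods
    peak = 0.0
    for step in range(n_steps):
        k1q, k1v = qdot, accel(q, qdot)
        k2q, k2v = qdot + dt / 2 * k1v, accel(q + dt / 2 * k1q, qdot + dt / 2 * k1v)
        k3q, k3v = qdot + dt / 2 * k2v, accel(q + dt / 2 * k2q, qdot + dt / 2 * k2v)
        k4q, k4v = qdot + dt * k3v, accel(q + dt * k3q, qdot + dt * k3v)
        q += dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
        qdot += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        if step >= n_steps - tail:
            peak = max(peak, abs(q))
    return peak

delta, beta, omega0 = -0.01, 1.0, 1.0
A1 = math.sqrt(8 * abs(delta) / (3 * beta * omega0**2))  # Eq. (64)
print(steady_amplitude(delta, beta, omega0), A1)
```

The two printed numbers should be close. With δ > 0 instead, the same run ends with the oscillator practically at rest, in agreement with the stability analysis of the trivial fixed point.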


Now let us consider another important problem: the effect of an external oscillating force on a self-excited oscillator. If the force is sufficiently small, its effects on the self-excitation condition and the oscillation amplitude are negligible. However, if the frequency ω of such a weak force is close to the own frequency ω₀ of the oscillator, it may lead to phase locking²⁴, also called “synchronization”, though the latter term also has a much broader meaning. In this effect, the oscillation frequency deviates from ω₀, and becomes exactly equal to the external force's frequency ω, within a certain range

$$-\Delta \le \omega - \omega_0 \le +\Delta. \tag{5.66}$$

To prove this fact, and also to calculate the phase-locking range width 2Δ, we may repeat the calculation of the right-hand sides of the reduced equations (57a), adding the term $f_0\cos\omega t$ to the right-hand side of Eq. (62); cf. Eqs. (42)-(43). This addition modifies Eqs. (63) as follows:²⁵

$$\dot A = -\delta(A)A + \frac{f_0}{2\omega}\sin\varphi, \tag{5.67a}$$

$$A\dot\varphi = \xi A + \frac{f_0}{2\omega}\cos\varphi. \tag{5.67b}$$


24 Apparently, the phase locking was first noticed by the same C. Huygens for pendulum clocks. 
25 Actually, this result should be evident, even without calculations, from the comparison of Eqs. (60) and (63). 
If the system is self-excited, and the external force is weak, its effect on the oscillation amplitude is small, and in the first approximation in f₀ we can take A to be constant and equal to the value A₁ given by Eq. (64). Plugging this approximation into Eq. (67b), we get a very simple phase-locking equation:²⁶

$$\dot\varphi = \xi + \Delta\cos\varphi, \tag{5.68}$$

where in our current case

$$\Delta = \frac{f_0}{2\omega A_1}. \tag{5.69}$$

Within the range −|Δ| < ξ < +|Δ|, Eq. (68) has two fixed points on each 2π-segment of the variable φ:

$$\varphi_\pm = \pm\cos^{-1}\left(-\frac{\xi}{\Delta}\right) + 2\pi n. \tag{5.70}$$

It is easy to linearize Eq. (68) near each point to analyze their stability in our usual way; however, let me use this case to demonstrate another convenient way to do this in 1D systems, using the so-called phase plane [φ, $\dot\varphi$]; see Fig. 5, where the red line shows the right-hand side of Eq. (68).


Fig. 5.5. The phase plane of a phase-locked oscillator, for the particular case ξ = Δ/2, f₀ > 0.


Since, according to Eq. (68), positive values of the plotted function correspond to the growth of φ in time and vice versa, we may draw the arrows showing the direction of phase evolution. From this graph, it is clear that one of these fixed points (for f₀ > 0, φ₊) is stable, while its counterpart (in this case, φ₋) is unstable. Hence the magnitude of Δ given by Eq. (69) is indeed the phase-locking range (or rather its half) that we wanted to find. Note that the range is proportional to the amplitude of the phase-locking signal, perhaps the most important quantitative feature of this effect.
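The phase-plane argument can be complemented by a direct integration of Eq. (68). In the Python sketch below (the values of ξ and Δ are arbitrary), the phase settles at the stable fixed point (70) inside the locking range, φ₊ for Δ > 0 and φ₋ for Δ < 0, while outside of the range it keeps drifting, so that the oscillation frequency $\omega - \dot\varphi$ differs from ω.

```python
import math

def final_phase(xi, Delta, phi0=0.0, dt=0.05, t_end=500.0):
    """Integrate Eq. (68), dphi/dt = xi + Delta*cos(phi), by RK4; return phi(t_end)."""
    rate = lambda p: xi + Delta * math.cos(p)
    phi = phi0
    for _ in range(int(t_end / dt)):
        k1 = rate(phi)
        k2 = rate(phi + dt / 2 * k1)
        k3 = rate(phi + dt / 2 * k2)
        k4 = rate(phi + dt * k3)
        phi += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return phi

print(final_phase(0.05, 0.1), math.acos(-0.05 / 0.1))      # locked, Delta > 0: phi -> phi_plus
print(final_phase(0.05, -0.1), -math.acos(-0.05 / -0.1))   # locked, Delta < 0: phi -> phi_minus
print(final_phase(0.15, 0.1))                              # |xi| > |Delta|: phi keeps growing
```

Starting from φ = 0, the phase runs toward the nearest stable point in each locked case; in the unlocked case $\dot\varphi$ never changes sign, so φ grows without bound.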


To complete our simple analysis, based on the assumption of fixed oscillation amplitude, we need to find the condition of its validity. For that, we may linearize Eq. (67a), for the stationary case, near the value A₁, just as we have done in Eq. (65) for the transient process. The stationary result,

$$\tilde A \equiv A - A_1 = \frac{A_1\Delta}{2|\delta|}\sin\varphi_+ \sim A_1\frac{\Delta}{2|\delta|}, \tag{5.71}$$

shows that our assumption, $|\tilde A| \ll A_1$, and hence the final result (69), are valid if the calculated phase-locking range 2Δ is much smaller than 4|δ|.

26 This equation is ubiquitous in phase-locking system descriptions, including even some digital electronic circuits used for that purpose, at the proper re-definition of the phase difference φ.
5.5. Parametric excitation


In both problems solved in the last section, the stability analysis was easy because it could be carried out for just one slow variable, either amplitude or phase. More generally, such an analysis of the reduced equations involves both of these variables. A classical example of such a situation is provided by one important physical effect: the parametric excitation of oscillations. A simple example of such excitation is given by a pendulum with a variable parameter, for example, the suspension length l(t); see Fig. 6. Experiments²⁷ and numerical simulations show that if the length is changed periodically (modulated) with some frequency 2ω that is close to 2ω₀, and with a sufficiently large depth Δl, the equilibrium position of the pendulum becomes unstable, and it starts oscillating with frequency ω equal exactly to half of the modulation frequency, and hence only approximately equal to the average frequency ω₀ of the oscillator.


Fig. 5.6. Parametric excitation of a pendulum.


For an elementary analysis of this effect, we may consider the simplest case when the oscillations are small. At the lowest point (θ = 0), where the pendulum moves with the highest velocity $v_{\max}$, the suspension string's tension $\mathcal{T}$ is higher than mg by the centripetal force: $\mathcal{T}_{\max} = mg + mv_{\max}^2/l$. On the contrary, at the maximum deviation of the pendulum from the equilibrium, the force is lower than mg, because of the string's tilt: $\mathcal{T}_{\min} = mg\cos\theta_{\max}$. Using energy conservation, $E = mv_{\max}^2/2 = mgl(1 - \cos\theta_{\max})$, we may express these values as $\mathcal{T}_{\max} = mg + 2E/l$ and $\mathcal{T}_{\min} = mg - E/l$. Now, if during each oscillation period the string is pulled up slightly by Δl (with |Δl| ≪ l) at each of its two passages through the lowest point, and is let down by the same amount at each of the two points of maximum deviation, the net work of the external force per period is positive:

$$\mathcal{W} = 2\left(\mathcal{T}_{\max} - \mathcal{T}_{\min}\right)\Delta l = 6E\frac{\Delta l}{l}, \tag{5.72}$$

and hence increases the oscillator's energy. If the parameter modulation depth Δl is sufficient, this increase may overcompensate the energy drained out by damping during the same period. Quantitatively, Eq. (10) shows that low damping (δ ≪ ω₀) leads to the following energy decrease,

$$\Delta E = -\frac{4\pi\delta}{\omega_0}E, \tag{5.73}$$

per oscillation period. Comparing Eqs. (72) and (73), we see that the net energy flow into the oscillations is positive, $\mathcal{W} + \Delta E > 0$, i.e. the oscillation amplitude has to grow, if²⁸


$$\frac{\Delta l}{l} > \frac{2\pi}{3}\frac{\delta}{\omega_0} \equiv \frac{\pi}{3Q}. \tag{5.74}$$

Since this result is independent of the oscillation energy E, the growth of energy and amplitude is exponential (until E becomes so large that some of our assumptions fail), so that Eq. (74) is the condition of parametric excitation in this simple model.

27 The simplest experiments of this kind may be done with the usual playground swings, where moving your body up and down moves the system's c.o.m. position, and hence the effective length $l_{\rm ef}$ of the support; see Eq. (4.41).

28 Modulation of the pendulum's mass (say, by periodically pumping water in and out of a suspended bottle) gives a qualitatively similar result. Note, however, that parametric oscillations cannot be excited by modulating just any of the oscillator's parameters: for example, not the oscillator's damping coefficient (at least if it stays positive at all times), because this does not change the system's energy, just the energy drain rate.

However, this result does not account for a possible difference between the oscillation frequency ω and the eigenfrequency ω₀, and also does not clarify whether the best phase shift between the oscillations and parameter modulation, assumed in the above calculation, may be sustained automatically. To address these issues, we may apply the van der Pol approach to a simple but reasonable model:

$$\ddot q + 2\delta\dot q + \omega_0^2(1 + \mu\cos 2\omega t)q = 0, \tag{5.75}$$

describing the parametric excitation in a linear oscillator with a sinusoidal modulation of the parameter $\omega_0^2$. Rewriting this equation in the canonical form (38),

$$\ddot q + \omega^2 q = f(t, q, \dot q) \approx -2\delta\dot q + 2\xi\omega q - \mu\omega_0^2 q\cos 2\omega t, \tag{5.76}$$

and assuming that the dimensionless ratios δ/ω and |ξ|/ω, and the modulation depth μ are all much less than 1, we may use the general Eqs. (57a) to get the following reduced equations:

$$\dot A = -\delta A - \frac{\mu\omega}{4}A\sin 2\varphi, \qquad A\dot\varphi = \xi A - \frac{\mu\omega}{4}A\cos 2\varphi. \tag{5.77}$$

These equations evidently have a fixed point with A₀ = 0, but its stability analysis (though possible) is not absolutely straightforward, because the phase φ of oscillations is undetermined at that point. In order to avoid this (technical rather than conceptual) difficulty, we may use, instead of the real amplitude and phase of oscillations, either their complex amplitude $a = A\exp\{i\varphi\}$, or its components u and v; see Eqs. (4). Indeed, for our function f, Eq. (57b) gives

$$\dot a = (-\delta + i\xi)a - i\frac{\mu\omega}{4}a^*, \tag{5.78}$$

while Eqs. (57c) yield

$$\dot u = -\delta u - \left(\xi + \frac{\mu\omega}{4}\right)v, \qquad \dot v = -\delta v + \left(\xi - \frac{\mu\omega}{4}\right)u. \tag{5.79}$$

We see that, in contrast to Eqs. (77), in the “Cartesian coordinates” {u, v} the trivial fixed point A₀ = 0 (i.e. u₀ = v₀ = 0) is absolutely regular. Moreover, equations (78)-(79) are already linear, so they do not require any additional linearization. Thus we may use the same approach as was already used in Secs. 3.2 and 5.1, i.e. look for the solution of Eqs. (79) in the exponential form exp{λt}. However, now we are dealing with two variables and should allow them to have, for each value of λ, a certain ratio u/v. For that, we may take the partial solution in the form


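$$u = c_u e^{\lambda t}, \qquad v = c_v e^{\lambda t}, \tag{5.80}$$

where the constants c_u and c_v are frequently called the distribution coefficients. Plugging this solution into Eqs. (79), we get from them the following system of two linear algebraic equations:

$$(-\delta - \lambda)c_u + \left(-\xi - \frac{\mu\omega}{4}\right)c_v = 0, \qquad \left(\xi - \frac{\mu\omega}{4}\right)c_u + (-\delta - \lambda)c_v = 0. \tag{5.81}$$

The characteristic equation of this system, i.e. the condition of compatibility of Eqs. (81),

$$\begin{vmatrix} -\delta - \lambda & -\xi - \mu\omega/4 \\ \xi - \mu\omega/4 & -\delta - \lambda \end{vmatrix} \equiv \lambda^2 + 2\delta\lambda + \delta^2 + \xi^2 - \left(\frac{\mu\omega}{4}\right)^2 = 0, \tag{5.82}$$

has two roots:

$$\lambda_\pm = -\delta \pm \left[\left(\frac{\mu\omega}{4}\right)^2 - \xi^2\right]^{1/2}. \tag{5.83}$$

Requiring the fixed point to be unstable, Re λ₊ > 0, we get the parametric excitation condition

$$\frac{\mu\omega}{4} > \left(\delta^2 + \xi^2\right)^{1/2}. \tag{5.84}$$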


Thus the parametric excitation may indeed happen without any external phase control: the arising 
oscillations self-adjust their phase to pick up energy from the external source responsible for the 
periodic parameter variation. 
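The threshold (84) and the exponent (83) can be tested against a direct numerical integration of the initial equation (75). The Python sketch below assumes exact tuning (ξ = 0) and arbitrary small values of δ and μ, and estimates the exponent of the oscillation envelope, approximated by $(q^2 + \dot q^2/\omega^2)^{1/2}$, between two moments of time. For δ = 0.01, Eq. (83) gives λ₊ = μω/4 − δ = 0.01 > 0 at μ = 0.08, and λ₊ = −0.005 < 0 at μ = 0.02.

```python
import math

def envelope_exponent(mu, delta=0.01, omega0=1.0, omega=1.0, dt=0.02, t1=200.0, t2=500.0):
    """Integrate Eq. (75) by RK4 from q = 1, dq/dt = 0; return ln(env(t2)/env(t1))/(t2 - t1)."""
    def accel(t, q, qdot):
        return -2 * delta * qdot - omega0**2 * (1 + mu * math.cos(2 * omega * t)) * q

    n1, n2 = int(round(t1 / dt)), int(round(t2 / dt))
    t, q, qdot = 0.0, 1.0, 0.0
    env1 = None
    for step in range(1, n2 + 1):
        k1q, k1v = qdot, accel(t, q, qdot)
        k2q, k2v = qdot + dt / 2 * k1v, accel(t + dt / 2, q + dt / 2 * k1q, qdot + dt / 2 * k1v)
        k3q, k3v = qdot + dt / 2 * k2v, accel(t + dt / 2, q + dt / 2 * k2q, qdot + dt / 2 * k2v)
        k4q, k4v = qdot + dt * k3v, accel(t + dt, q + dt * k3q, qdot + dt * k3v)
        q += dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
        qdot += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t = step * dt
        if step == n1:
            env1 = math.hypot(q, qdot / omega)  # slow amplitude, (u**2 + v**2)**0.5
    env2 = math.hypot(q, qdot / omega)
    return math.log(env2 / env1) / (t2 - t1)

print(envelope_exponent(0.08), envelope_exponent(0.02))
```

The first estimate should be close to 0.01 and the second one negative. No phase is imposed on the initial condition: the growing solution picks its phase by itself, as stated above.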


Our key result (84) may be compared with two other calculations. First, in the case of negligible damping (δ = 0), Eq. (84) turns into the condition μω/4 > |ξ|. This result may be compared with the well-developed theory of the so-called Mathieu equation, whose canonical form is

$$\frac{d^2y}{dv^2} + (a - 2b\cos 2v)y = 0. \tag{5.85}$$

With the substitutions y → q, v → ωt, a → (ω₀/ω)², and b/a → −μ/2, this equation is just a particular case of Eq. (75) for δ = 0. In terms of Eq. (85), our result (84) may be rewritten just as b > |a − 1|, and is supposed to be valid for b ≪ 1. The boundaries given by this condition are shown with dashed lines in Fig. 7 together with the numerically calculated²⁹ stability boundaries for the Mathieu equation. One can see that the van der Pol approximation works just fine within its applicability limit (and a bit beyond :-), though it fails to predict some other important features of the Mathieu equation, such as the existence of higher, narrower regions of parametric excitation (at a ≈ n², i.e. ω ≈ ω₀/n, for all integer n), and some spill-over of the stability region into the lower half-plane a < 0.³⁰ The reason for these failures is the fact that, as can be seen in Fig. 7, these phenomena do not appear in the first approximation in the parameter modulation amplitude μ ∝ b, which is the realm of the reduced equations (79).

29 Such calculations are substantially simplified by the use of the so-called Floquet theorem, which is also the mathematical basis for the discussion of wave propagation in periodic media; see the next chapter.


Fig. 5.7. Stability boundaries of the Mathieu 
equation (85), as calculated: numerically (solid 
curves) and using the reduced equations (79) 
(dashed straight lines). In the regions numbered 
by various n, the trivial solution y = 0 of the 
equation is unstable, i.e. its general solution y(v) 
includes an exponentially growing term. 
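A chart like Fig. 7 can be computed with the Floquet approach mentioned in footnote 29. A minimal Python sketch: integrate two independent solutions of Eq. (85) over one period, π, of its coefficient, and form the trace of the resulting 2×2 transfer (monodromy) matrix. Since the Wronskian is conserved, the determinant of this matrix equals 1, so the trivial solution is unstable if the magnitude of the trace exceeds 2. The sample points are arbitrary; they probe the first excitation region, the stable area just outside the dashed line b = |a − 1|, and the spill-over into a < 0 described in footnote 30.

```python
import math

def monodromy_trace(a, b, n=4000):
    """Trace of the transfer matrix of Eq. (85), y'' + (a - 2*b*cos(2*v))*y = 0, over one period pi."""
    h = math.pi / n

    def accel(v, y):
        return -(a - 2 * b * math.cos(2 * v)) * y

    def propagate(y, yp):
        v = 0.0
        for _ in range(n):  # classical RK4 steps
            k1y, k1p = yp, accel(v, y)
            k2y, k2p = yp + h / 2 * k1p, accel(v + h / 2, y + h / 2 * k1y)
            k3y, k3p = yp + h / 2 * k2p, accel(v + h / 2, y + h / 2 * k2y)
            k4y, k4p = yp + h * k3p, accel(v + h, y + h * k3y)
            y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
            yp += h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
            v += h
        return y, yp

    y1, _ = propagate(1.0, 0.0)   # solution with y(0) = 1, y'(0) = 0
    _, yp2 = propagate(0.0, 1.0)  # solution with y(0) = 0, y'(0) = 1
    return y1 + yp2

def unstable(a, b):
    return abs(monodromy_trace(a, b)) > 2.0

for a, b in [(1.0, 0.1), (1.2, 0.1), (-0.002, 0.1), (-0.01, 0.1)]:
    print(a, b, unstable(a, b))
```

The expected pattern: unstable at a = 1 (inside the first excitation region), stable at a = 1.2 (where b < |a − 1|), stable at a = −0.002 (inside the spill-over region −b²/2 < a < 0), and unstable at a = −0.01.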


In the opposite case of non-zero damping but exact tuning (ξ = 0, ω = ω₀), Eq. (84) becomes

$$\mu > \frac{4\delta}{\omega}. \tag{5.86}$$

This condition may be compared with Eq. (74) by taking Δl/l = 2μ. The comparison shows that while the structure of these conditions is similar, the numerical coefficients are different by a factor close to 2. The first reason for this difference is that the instant parameter change at optimal moments of time is more efficient than the smooth, sinusoidal variation described by Eq. (75). Even more significantly, the change of the pendulum's length modulates not only its frequency ω₀ = (g/l)^{1/2}, as Eq. (75) implies, but also its mechanical impedance Z, the notion to be discussed in detail in the next chapter. (The analysis of the general case of the simultaneous modulation of ω₀ and Z is left for the reader's exercise.)


To conclude this section, let me summarize the important differences between the excitation of the parametric and forced oscillations:

(i) Parametric oscillations completely disappear outside of their excitation range, while the forced oscillations have a non-zero amplitude for any frequency and amplitude of the external force; see Eq. (18).

(ii) While the parametric excitation may be described by linear equations such as Eq. (75), such equations cannot predict a finite oscillation amplitude within the excitation range, even at finite damping. In order to describe stationary parametric oscillations, some nonlinear effects have to be taken into account. (I am leaving analyses of such effects for the reader's exercise.)

30 This region (for b ≪ 1, −b²/2 < a < 0) describes, in particular, the counter-intuitive stability of the so-called Kapitza pendulum, an inverted pendulum with the suspension point oscillated fast in the vertical direction; the effect was first observed by Andrew Stephenson in 1908.


One more important feature of parametric oscillations will be discussed in the next section. 


5.6. Fixed point classification 


The reduced equations (79) give us a good pretext for a brief discussion of an important general topic of dynamics: classification and stability of the fixed points of a system described by two time-independent, first-order differential equations with time-independent coefficients.³¹ After their linearization near a fixed point, the equations for deviations can always be expressed in a form similar to Eq. (79):

$$\dot{\tilde q}_1 = M_{11}\tilde q_1 + M_{12}\tilde q_2, \qquad \dot{\tilde q}_2 = M_{21}\tilde q_1 + M_{22}\tilde q_2, \tag{5.87}$$

where $M_{jj'}$ (with j, j' = 1, 2) are some real scalars, which may be viewed as the elements of a 2×2 matrix M. Looking for an exponential solution of the type (80),

$$\tilde q_1 = c_1 e^{\lambda t}, \qquad \tilde q_2 = c_2 e^{\lambda t}, \tag{5.88}$$

we get a general system of two linear equations for the distribution coefficients $c_j$:

$$(M_{11} - \lambda)c_1 + M_{12}c_2 = 0, \qquad M_{21}c_1 + (M_{22} - \lambda)c_2 = 0. \tag{5.89}$$

These equations are consistent if

$$\begin{vmatrix} M_{11} - \lambda & M_{12} \\ M_{21} & M_{22} - \lambda \end{vmatrix} = 0, \tag{5.90}$$

giving us a quadratic characteristic equation:

$$\lambda^2 - \lambda(M_{11} + M_{22}) + (M_{11}M_{22} - M_{12}M_{21}) = 0. \tag{5.91}$$

Its solution,³²

$$\lambda_\pm = \frac{1}{2}(M_{11} + M_{22}) \pm \frac{1}{2}\left[(M_{11} - M_{22})^2 + 4M_{12}M_{21}\right]^{1/2}, \tag{5.92}$$

shows that the following situations are possible:

A. The expression under the square root, $(M_{11} - M_{22})^2 + 4M_{12}M_{21}$, is positive. In this case, both characteristic exponents λ± are real, and we can distinguish three sub-cases:


31 Autonomous systems described by a single, second-order homogeneous differential equation, say $F(q, \dot q, \ddot q) = 0$, also belong to this class, because we may always treat the generalized velocity $\dot q \equiv v$ as a new variable, and use this definition as one first-order differential equation, and the initial equation, in the form $F(q, v, \dot v) = 0$, as the second first-order equation.

32 In the language of linear algebra, λ± are the eigenvalues, and the corresponding sets of the distribution coefficients [c₁, c₂]± are the eigenvectors of the matrix M with elements $M_{jj'}$.
(i) Both λ₊ and λ₋ are negative. As Eqs. (88) show, in this case the deviations $\tilde q_j$ tend to zero at t → ∞, i.e. the fixed point is stable. Because of generally different magnitudes of the exponents λ±, the process represented on the phase plane $[\tilde q_1, \tilde q_2]$ (see Fig. 8a, with the solid arrows, for an example) may be seen as consisting of two stages: first, a faster (with the rate |λ₋| > |λ₊|) relaxation to a linear asymptote,³³ and then a slower decline, with the rate |λ₊|, along this line, i.e. at a virtually fixed ratio of the variables. Such a fixed point is called the stable node.


Fig. 5.8. Typical trajectories on the phase plane $[\tilde q_1, \tilde q_2]$ near fixed points of different types: (a) node, (b) saddle, (c) focus, and (d) center. The particular matrices M used for the first three panels correspond to Eqs. (81) for the parametric excitation, with ξ = δ and three different values of the ratio μω/4δ: (a) 1.25, (b) 1.6, and (c) 0.


33 The asymptote direction may be found by plugging the value λ₊ back into Eq. (89) and finding the corresponding ratio c₁/c₂. Note that the separation of the system's evolution into the two stages is conditional, being most vivid in the case of a large difference between the exponents λ₊ and λ₋.
(ii) Both λ₊ and λ₋ are positive. This case of an unstable node differs from the previous one only by the direction of motion along the phase plane trajectories; see the dashed arrows in Fig. 8a. Here the ratio of the variables also soon approaches a constant, now the one corresponding to λ₊ > λ₋.

(iii) Finally, in the case of a saddle (λ₊ > 0, λ₋ < 0), the system's dynamics is different (Fig. 8b): after the relaxation, with the rate |λ₋|, to an asymptote, the perturbation starts to grow, with the rate λ₊, along one of two opposite directions. (The direction is determined by the side of another straight line, called the separatrix, on which the system has been initially.) So the saddle³⁴ is an unstable fixed point.


B. The expression under the square root in Eq. (92), $(M_{11} - M_{22})^2 + 4M_{12}M_{21}$, is negative. In this case, the square root is imaginary, making the real parts of both roots equal, Re λ± = (M₁₁ + M₂₂)/2, and their imaginary parts equal but opposite. As a result, here there can be just two types of fixed points:

(i) Stable focus, at (M₁₁ + M₂₂) < 0. The phase plane trajectories are spirals going to the origin (i.e. toward the fixed point); see Fig. 8c with the solid arrow.

(ii) Unstable focus, taking place at (M₁₁ + M₂₂) > 0, differs from the stable one only by the direction of motion along the phase trajectories; see the dashed arrow in the same Fig. 8c.

C. Frequently, the border case, M₁₁ + M₂₂ = 0, corresponding to the orbital (“indifferent”) stability already discussed in Sec. 3.2, is also distinguished, and the corresponding fixed point is referred to as the center (Fig. 8d). Considering centers as a separate category makes sense because such fixed points are typical for Hamiltonian systems, whose first integral of motion may frequently be represented as the distance of the representing point from a certain center. For example, introducing new variables $\tilde q_1 = q$ and $\tilde q_2 = m\dot q$, we may rewrite Eq. (3.12) of a harmonic oscillator without dissipation (again, with indices “ef” dropped for brevity) as a system of two first-order differential equations:

$$\dot{\tilde q}_1 = \frac{1}{m}\tilde q_2, \qquad \dot{\tilde q}_2 = -\kappa\tilde q_1, \tag{5.93}$$

i.e. as a particular case of Eq. (87), with M₁₁ = M₂₂ = 0 and M₁₂M₂₁ = −κ/m ≡ −ω₀² < 0, and hence $(M_{11} - M_{22})^2 + 4M_{12}M_{21} = -4\omega_0^2 < 0$, and M₁₁ + M₂₂ = 0. On the symmetrized phase plane $[\tilde q_1, \tilde q_2/Z]$, where the parameter $Z \equiv (\kappa m)^{1/2} = m\omega_0$ is the oscillator's impedance, the sinusoidal oscillations of amplitude A are represented by a circle of radius A about the center-type fixed point A = 0. In the case when $\tilde q_1 = q$ is the linear coordinate q of an actual mechanical oscillator, so that $\tilde q_2 = m\dot q$ is its linear momentum $p = m\dot q$, such a circular trajectory corresponds to the conservation of the oscillator's energy

$$E = T + U = \frac{p^2}{2m} + \frac{\kappa q^2}{2} = \frac{\kappa}{2}\left[\tilde q_1^2 + \left(\frac{\tilde q_2}{Z}\right)^2\right] = \text{const}. \tag{5.94}$$
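The case analysis A through C above can be condensed into a short Python function. It is only a sketch: the function names are hypothetical, and the degenerate border cases (a zero exponent, or a vanishing expression under the square root of Eq. (92)), which the text does not discuss, are simply reported as such. The usage lines rebuild the three parametric-excitation matrices of Fig. 8a-c from Eqs. (81) with ξ = δ = 1, and the harmonic-oscillator matrix of Eq. (93) with m = κ = 1.

```python
import math

def classify_fixed_point(m11, m12, m21, m22):
    """Type of the fixed point of Eqs. (87), following Eq. (92) and cases A-C of the text."""
    trace = m11 + m22
    disc = (m11 - m22) ** 2 + 4 * m12 * m21  # expression under the square root in Eq. (92)
    if disc > 0:  # case A: both exponents are real
        root = math.sqrt(disc)
        lam_plus, lam_minus = (trace + root) / 2, (trace - root) / 2
        if lam_plus < 0:
            return "stable node"
        if lam_minus > 0:
            return "unstable node"
        if lam_minus < 0 < lam_plus:
            return "saddle"
        return "degenerate (zero exponent)"
    if disc < 0:  # cases B and C: complex-conjugate exponents
        if trace < 0:
            return "stable focus"
        if trace > 0:
            return "unstable focus"
        return "center"
    return "degenerate (equal exponents)"

def parametric_matrix(mu_omega_over_4, delta=1.0, xi=1.0):
    """Matrix of Eqs. (79), i.e. the coefficients of Eqs. (81) without lambda."""
    return (-delta, -xi - mu_omega_over_4, xi - mu_omega_over_4, -delta)

for ratio in (1.25, 1.6, 0.0):  # values of mu*omega/(4*delta) used in Fig. 8a-c
    print(ratio, classify_fixed_point(*parametric_matrix(ratio)))
print(classify_fixed_point(0.0, 1.0, -1.0, 0.0))  # Eq. (93) with m = kappa = 1
```

The printed types should be the stable node, saddle, and stable focus of Fig. 8a-c, followed by the center of Fig. 8d.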


This is a convenient moment for a brief discussion of the so-called Poincaré (or “slow-variable”, or “stroboscopic”) plane.³⁵ From the point of view of the basic Eq. (41), the sinusoidal oscillations $q(t) = A\cos(\omega t - \varphi)$, described by a circular trajectory on the actual (symmetrized) phase plane, correspond to a fixed point {A, φ}, which may be conveniently represented by a stationary geometric point on the plane with these polar coordinates; see Fig. 9a. (As follows from Eq. (4), the Cartesian coordinates of the point on that plane are just the variables u = A cos φ and v = A sin φ that were used, in particular, in the last section.) The quasi-sinusoidal process (41), with slowly changing A and φ, may be represented by a slow motion of that point on this Poincaré plane.

34 The term “saddle” is due to the fact that in this case, the system's dynamics is qualitatively similar to that of a heavily damped motion in a 2D potential $U(\tilde q_1, \tilde q_2)$ having the shape of a horse saddle (or a mountain pass).

35 Named after Jules Henri Poincaré (1854-1912), who is credited, among many other achievements in physics and mathematics, with contributions to special relativity (see, e.g., EM Chapter 9), and with the basic idea of unstable trajectories responsible for deterministic chaos, to be discussed in Chapter 9 of this course.


Fig. 5.9. (a) Representation of a 
sinusoidal oscillation (point) and a 
slow transient process (line) on the 
Poincaré plane, and (b) the relation 
between the “fast” phase plane and the 
“slow” (Poincaré) plane. 


Figure 9b shows a convenient way to visualize the relation between the actual phase plane of an oscillator, with the “fast” symmetrized coordinates q and p/mω, and the Poincaré plane with the “slow” coordinates u and v: the latter plane rotates relative to the former one, about the origin, clockwise, with the angular velocity ω.³⁶ Another, “stroboscopic” way to generate the Poincaré plane pattern is to take a fast glance at the “real” phase plane just once during each oscillation period T = 2π/ω.
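The stroboscopic construction can be checked with a few lines of Python. This is a minimal sketch, assuming strictly sinusoidal oscillations q(t) = A cos(ωt − φ) with constant A and φ (the numerical values are arbitrary illustrations): sampling the symmetrized point (q, p/mω) = (q, q̇/ω) at t = kT returns the same point (A cos φ, A sin φ) every time, i.e. the Poincaré-plane coordinates (u, v).

```python
import math

def strobe_samples(amplitude, phase, omega, n_periods=5):
    """Sample the symmetrized phase-plane point (q, p/(m*omega)) at t = k*T.

    Assumes q(t) = A cos(omega*t - phase) with constant A and phase,
    so that p/(m*omega) = (dq/dt)/omega = -A sin(omega*t - phase).
    """
    period = 2 * math.pi / omega
    samples = []
    for k in range(n_periods):
        t = k * period                        # one "glance" per period T = 2*pi/omega
        q = amplitude * math.cos(omega * t - phase)
        p_scaled = -amplitude * math.sin(omega * t - phase)
        samples.append((q, p_scaled))
    return samples

A, phi, omega = 1.3, 0.4, 2.0                 # illustrative values only
u, v = A * math.cos(phi), A * math.sin(phi)   # Poincare-plane coordinates
for q, p in strobe_samples(A, phi, omega):
    print(f"q = {q:+.6f} (u = {u:+.6f}),  p/m*omega = {p:+.6f} (v = {v:+.6f})")
```

With slowly varying A and φ, the same sampling would instead trace a slow trajectory of that point, which is the picture used in the rest of this section.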


In many cases, the representation on the Poincaré plane is more convenient than that on the “real” phase plane. In particular, we have already seen that the reduced equations for such important phenomena as phase locking and parametric oscillations, whose original differential equations include time explicitly, are time-independent; cf., e.g., Eqs. (75) and (79) describing the latter effect. This simplification brings the equations into the category considered earlier in this section, and enables an easy classification of their fixed points, which may shed additional light on their dynamic properties.


In particular, Fig. 10 shows the classification of the only (trivial) fixed point, A = 0, on the Poincaré plane of the parametric oscillator, which follows from Eq. (83). As the parameter modulation depth μ is increased, the type of this fixed point changes from a stable focus (pertinent to a simple oscillator with damping) to a stable node, and then to a saddle describing the parametric excitation. In the last case, the two directions of the perturbation growth, so prominently featured in Fig. 8b, correspond to the two possible values of the oscillation phase φ, with the phase choice determined by initial conditions.
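The fixed-point types named above can be read off the trace and determinant of the 2×2 matrix J of the linearized equations, whose eigenvalues are tr J/2 ± (tr² J/4 − det J)^{1/2}. Below is a minimal Python sketch of that classification. The helper `parametric_jacobian` is hypothetical: it is our own van der Pol linearization, assuming Eq. (75) has the form q̈ + 2δq̇ + ω₀²(1 + μ cos 2ωt)q = 0, with q = u cos ωt + v sin ωt and ξ ≡ ω − ω₀. It is not a transcription of Eq. (83), and its sign conventions may differ. For this matrix tr J = −2δ and det J = δ² + ξ² − (μω/4)², so raising μ at fixed ξ ≠ 0 should reproduce the focus → node → saddle sequence of Fig. 10.

```python
def classify_fixed_point(jac):
    """Type of the fixed point of a 2D linear(ized) system dX/dt = J X.

    The eigenvalues are tr/2 +- sqrt(tr**2/4 - det), so the trace and
    the determinant of J decide the type.
    """
    (a, b), (c, d) = jac
    tr, det = a + d, a * d - b * c
    if det < 0:
        return "saddle"                      # real eigenvalues of opposite signs
    if det == 0:
        return "degenerate"                  # a zero eigenvalue: linear analysis is not enough
    kind = "node" if tr * tr / 4 - det >= 0 else "focus"
    if tr < 0:
        return "stable " + kind
    if tr > 0:
        return "unstable " + kind
    return "center"                          # purely imaginary eigenvalues

def parametric_jacobian(delta, xi, mu, omega):
    """Hypothetical linearization of Eq. (75) at u = v = 0 (our own derivation)."""
    m = mu * omega / 4
    return [[-delta, -(xi + m)], [xi - m, -delta]]

for mu in (0.04, 0.084, 0.12):               # increasing modulation depth
    jac = parametric_jacobian(delta=0.01, xi=0.02, mu=mu, omega=1.0)
    print(f"mu = {mu:.3f}: {classify_fixed_point(jac)}")
```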


This double degeneracy of the parametric oscillation’s phase could already be noticed from Eqs. (77), because they are evidently invariant with respect to the replacement φ → φ + π. Moreover, the degeneracy is not an artifact of the van der Pol approximation, because the initial equation (75) is already invariant with respect to the corresponding replacement q(t) → q(t − π/ω), since cos[2ω(t − π/ω)] = cos(2ωt − 2π) = cos 2ωt. This invariance means that all other characteristics (including the amplitude) of the parametric oscillations excited with either of the two phases are exactly the same. At the dawn of the computer age (in the late 1950s and early 1960s), there were substantial attempts, especially in Japan, to use this property for storing and processing digital information coded in the binary-phase form. Though these attempts have not survived the competition with simpler approaches based on binary-voltage coding, some current trends in the development of prospective reversible and quantum computers may be traced back to that idea.

36 This notion of phase plane rotation is the origin of the term “Rotating Wave Approximation”, mentioned above. (The word “wave” is an artifact of this method’s wide application in classical and quantum optics.)


Fig. 5.10. Types of the trivial fixed point of a parametric oscillator. (Regions shown: stable focuses, stable nodes, and saddles.)
5.7. Numerical approaches


If the amplitude of oscillations, for whatever reason, becomes so large that nonlinear terms in the 
equation describing an oscillator become comparable with its linear terms, numerical methods are 
virtually the only avenue available for their theoretical studies. In Hamiltonian 1D systems, such 
methods may be applied directly to Eq. (3.26), but dissipative and/or parametric systems typically lack 
such first integrals of motion, so that the initial differential equation has to be solved. 


Let us discuss the general idea of such methods using the example of what mathematicians call the Cauchy problem (finding the solution for all moments of time, starting from the known initial conditions) for the first-order differential equation

$$\dot q = f(t, q). \qquad (5.95)$$


(The generalization to a system of several such equations is straightforward.) Breaking the time axis into small equal steps h (Fig. 11), we can reduce the equation integration problem to finding the function’s value at the next time point, q_{n+1} ≡ q(t_{n+1}) ≡ q(t_n + h), from the previously found value q_n ≡ q(t_n) and, if necessary, the values of q at other previous time steps.

Fig. 5.11. The basic notions used in the numerical integration of ordinary differential equations.


In the simplest approach (called the Euler method), q_{n+1} is found using the following formula:

$$q_{n+1} = q_n + k, \qquad k = h f(t_n, q_n). \qquad (5.96)$$
This approximation is equivalent to the replacement of the genuine function q(t), on the segment [t_n, t_{n+1}], with the first two terms of its Taylor expansion at point t_n:

$$q(t_n + h) \approx q(t_n) + \dot q(t_n)\, h \equiv q(t_n) + h f(t_n, q_n). \qquad (5.97)$$


This approximation has an error proportional to h² at each step. One could argue that by making the step h sufficiently small, the Euler method’s error might be made arbitrarily small, but even with all the number-crunching power of modern computer platforms, the CPU time necessary to reach sufficient accuracy may be too large for big problems.³⁷ Besides that, the increase of the number of time steps, which is necessary at h → 0 at a fixed total time interval, increases the total rounding errors, and eventually may cause an increase, rather than a reduction, of the overall error of the computed result.
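A minimal Python sketch of the Euler method (5.96) follows. The test equation q̇ = −q with q(0) = 1 (exact solution e^{−t}) is our own choice, not taken from the text. Since each step contributes an error of order h², and covering a fixed interval takes ~1/h steps, the accumulated (global) error should scale as h, so it should roughly halve each time h is halved.

```python
import math

def euler(f, t0, q0, h, n_steps):
    """Integrate dq/dt = f(t, q) by the Euler method, Eq. (96)."""
    t, q = t0, q0
    for _ in range(n_steps):
        k = h * f(t, q)          # k = h f(t_n, q_n)
        q, t = q + k, t + h      # q_{n+1} = q_n + k
    return q

def f(t, q):
    return -q                    # test equation (our choice): exact q(t) = exp(-t)

errors = []
for n in (10, 20, 40, 80):       # number of steps over the interval [0, 1]
    err = abs(euler(f, 0.0, 1.0, 1.0 / n, n) - math.exp(-1.0))
    errors.append(err)
    print(f"h = {1.0 / n:.4f}   global error = {err:.3e}")
```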


A more efficient way is to modify Eq. (96) to include the terms of the second order in h. There are several ways to do this, for example using the 2nd-order Runge-Kutta method:

$$q_{n+1} = q_n + k_2, \qquad k_2 = h f\!\left(t_n + \frac{h}{2},\; q_n + \frac{k_1}{2}\right), \qquad k_1 = h f(t_n, q_n). \qquad (5.98)$$
One can readily check that this method gives the exact result if the function q(t) is a quadratic polynomial, and hence in the general case its errors are of the order of h³. We see that the main idea here is to first break the segment [t_n, t_{n+1}] in half (see Fig. 11 again), evaluate the right-hand side of the differential equation (95) at the point intermediate (in both t and q) between the points number n and (n + 1), and then use this information to evaluate q_{n+1}.
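Both claims about the 2nd-order Runge-Kutta scheme (5.98) can be checked numerically. In this minimal sketch (test equations are our own choices), q̇ = 2t + 1, whose solution q = t² + t is a quadratic polynomial, is integrated exactly up to rounding, while for q̇ = −q the global error should fall by a factor of about 4 per halving of h, consistent with an O(h³) error per step.

```python
import math

def rk2_midpoint(f, t0, q0, h, n_steps):
    """2nd-order Runge-Kutta (midpoint) scheme in the form of Eq. (98)."""
    t, q = t0, q0
    for _ in range(n_steps):
        k1 = h * f(t, q)                       # slope estimate at the segment start
        k2 = h * f(t + h / 2, q + k1 / 2)      # slope at the estimated midpoint
        q, t = q + k2, t + h
    return q

# 1) Exactness for a quadratic q(t): dq/dt = 2t + 1 gives q(t) = t**2 + t
q_quad = rk2_midpoint(lambda t, q: 2 * t + 1, 0.0, 0.0, 0.1, 10)
print(f"q(1) = {q_quad:.15f}  (exact value: 2)")

# 2) Global error on dq/dt = -q over [0, 1] for decreasing h
errors = [abs(rk2_midpoint(lambda t, q: -q, 0.0, 1.0, 1.0 / n, n) - math.exp(-1.0))
          for n in (10, 20, 40)]
print(["%.3e" % e for e in errors])
```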


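The advantage of the Runge-Kutta approach over other second-order methods is that it may be readily extended to the 4th order, without an additional breakup of the interval [t_n, t_{n+1}]:

$$q_{n+1} = q_n + \frac{1}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right), \qquad (5.99)$$

$$k_4 = h f(t_n + h,\; q_n + k_3), \quad k_3 = h f\!\left(t_n + \frac{h}{2},\; q_n + \frac{k_2}{2}\right), \quad k_2 = h f\!\left(t_n + \frac{h}{2},\; q_n + \frac{k_1}{2}\right), \quad k_1 = h f(t_n, q_n).$$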
This method has a much lower error, O(h⁵), without being too cumbersome. These features have made the 4th-order Runge-Kutta the default method in most numerical libraries. Its extension to higher orders is possible, but requires more complex formulas, and is justified only for some special cases, e.g., very abrupt functions q(t).³⁸ The most frequent enhancement of the method is an automatic adjustment of the step h to reach the pre-specified accuracy without making more calculations than necessary.
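Here is a minimal Python sketch of one fixed-step implementation of Eq. (99), again using the test equation q̇ = −q (our own choice). An O(h⁵) error per step accumulates to an O(h⁴) global error, so halving h should reduce the error by a factor of about 2⁴ = 16. Automatic step adjustment is not included in this sketch.

```python
import math

def rk4(f, t0, q0, h, n_steps):
    """Classical 4th-order Runge-Kutta scheme, Eq. (99), for dq/dt = f(t, q)."""
    t, q = t0, q0
    for _ in range(n_steps):
        k1 = h * f(t, q)
        k2 = h * f(t + h / 2, q + k1 / 2)
        k3 = h * f(t + h / 2, q + k2 / 2)
        k4 = h * f(t + h, q + k3)
        q += (k1 + 2 * k2 + 2 * k3 + k4) / 6   # weighted average of the four slopes
        t += h
    return q

steps = (10, 20, 40)
errors = [abs(rk4(lambda t, q: -q, 0.0, 1.0, 1.0 / n, n) - math.exp(-1.0)) for n in steps]
for n, err in zip(steps, errors):
    print(f"h = {1.0 / n:.4f}   global error = {err:.3e}")
```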


Figure 12 shows a typical example of an application of that method to the very simple problem of a damped linear oscillator, for two values of the fixed time step h (expressed in terms of the number N of such steps per oscillation period). The black straight lines connect the adjacent points obtained by the 4th-order Runge-Kutta method, while the points connected with the green straight lines represent the exact analytical solution (22). The plots show that errors of a few percent start to appear only at as few as ~10 time steps per period, so that the method is indeed very efficient.

37 In addition, the Euler method is not time-reversible, a handicap that may be essential for the integration of Hamiltonian systems described by systems of second-order differential equations. However, this drawback may be partly overcome by the so-called leapfrogging: the mutual shift, by h/2, of the time steps for a generalized coordinate and for the corresponding generalized velocity.

38 The most popular approaches in such cases are the Richardson extrapolation, the Bulirsch-Stoer algorithm, and a set of so-called prediction-correction techniques, e.g. the Adams-Bashforth-Moulton method; see the literature recommended in MA Sec. 16(iii).
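The comparison shown in Fig. 12 can be reproduced in spirit with the following minimal sketch. It assumes that Eq. (6) is the damped linear oscillator equation q̈ + 2δq̇ + ω₀²q = 0, and it uses initial conditions q(0) = 1, q̇(0) = 0 (our choice), for which the standard underdamped solution is q(t) = e^{−δt}[cos ω′t + (δ/ω′) sin ω′t] with ω′ = (ω₀² − δ²)^{1/2}. The sketch reports the largest deviation over five periods for several numbers N of steps per period; it does not attempt to reproduce the figure's exact values.

```python
import math

def rk4_step(f, t, y, h):
    """One 4th-order Runge-Kutta step, Eq. (99), for a state vector y (a list)."""
    k1 = [h * d for d in f(t, y)]
    k2 = [h * d for d in f(t + h / 2, [a + b / 2 for a, b in zip(y, k1)])]
    k3 = [h * d for d in f(t + h / 2, [a + b / 2 for a, b in zip(y, k2)])]
    k4 = [h * d for d in f(t + h, [a + b for a, b in zip(y, k3)])]
    return [a + (b1 + 2 * b2 + 2 * b3 + b4) / 6
            for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]

def max_error(steps_per_period, delta=0.03, omega0=1.0, n_periods=5):
    """Max |q_RK4 - q_exact| for the damped oscillator with q(0) = 1, dq/dt(0) = 0."""
    def rhs(t, y):                                  # y = [q, dq/dt]
        return [y[1], -2 * delta * y[1] - omega0 ** 2 * y[0]]
    w = math.sqrt(omega0 ** 2 - delta ** 2)         # frequency of damped oscillations
    def exact(t):
        return math.exp(-delta * t) * (math.cos(w * t) + delta / w * math.sin(w * t))
    h = 2 * math.pi / omega0 / steps_per_period
    t, y, worst = 0.0, [1.0, 0.0], 0.0
    for _ in range(steps_per_period * n_periods):
        y = rk4_step(rhs, t, y, h)
        t += h
        worst = max(worst, abs(y[0] - exact(t)))
    return worst

for n in (30, 10, 6):
    print(f"N = {n:2d} steps per period: max error = {max_error(n):.2e}")
```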


Let me hope that the discussion in the next section will make the conveniences and the handicaps 
of the numerical approach to problems of nonlinear dynamics very clear. 


Fig. 5.12. Results of the Runge-Kutta solution of Eq. (6) (with δ/ω₀ = 0.03) for: (a) 30 and (b) 6 points per oscillation period. The results are shown by points; the black and green lines are only guides for the eye.


5.8. Higher-harmonic and subharmonic oscillations 


Figure 13 shows the numerically calculated³⁹ transient process and stationary oscillations in a linear oscillator and a very representative nonlinear system, the pendulum described by Eq. (42), both with the same ω₀. Both systems are driven by a sinusoidal external force of the same amplitude and frequency, in this illustration equal to the own small-oscillation frequency ω₀ of both systems. The plots show that despite a very substantial amplitude of the pendulum oscillations (an angle amplitude of about one radian), their waveform remains almost exactly sinusoidal.⁴⁰ On the other hand, the nonlinearity affects the oscillation amplitude very substantially. These results imply that the corresponding reduced equations (60), which are based on the assumption (41), may work very well far beyond their formal restriction |q| ≪ 1.


Still, the waveform of oscillations in a nonlinear system always differs from that of the applied force, in our case from the sine function of frequency ω. This fact is frequently formulated as the generation, by the system, of higher harmonics. Indeed, the Fourier theorem tells us that any non-sinusoidal periodic function of time may be represented as a sum of its basic harmonic of frequency ω plus higher harmonics with frequencies nω, with integer n > 1.


Note that an effective generation of higher harmonics is only possible with adequate nonlinearity of the system. For example, consider the nonlinear term αq³ used in the equations explored in Secs. 2 and 3. If the waveform q(t) is sinusoidal, such a term will have only the basic (1st) and the 3rd harmonics; see, e.g., Eq. (50). As another example, the “pendulum nonlinearity” sin q cannot produce, without a time-independent component (“bias”) in q(t), any even harmonic, including the 2nd one. The most efficient generation of harmonics may be achieved using systems with the sharpest nonlinearities, e.g., semiconductor diodes whose current may follow an exponential dependence on the applied voltage through several orders of magnitude.⁴¹

39 All numerical results shown in this section have been obtained by the 4th-order Runge-Kutta method with an automatic step adjustment that guarantees a relative error much smaller than the pixel size in the shown plots.

40 In this particular case, the higher harmonic content is about 0.5%, dominated by the 3rd harmonic, whose amplitude and phase are in a very good agreement with Eq. (50).
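The two harmonic-content statements above can be illustrated by a short Fourier analysis over one period. In this minimal sketch, the specific test functions are our own choices: cos³t, which should contain only the 1st and 3rd harmonics (cos³t = (3/4) cos t + (1/4) cos 3t), and sin q with q = cos t, either without or with a bias of 0.3. Without bias, the function changes sign after half a period, so all even harmonics vanish. With bias, the even harmonics reappear.

```python
import math

def harmonic_amplitudes(func, n_max=6, samples=256):
    """Amplitudes of harmonics n = 0..n_max of a 2*pi-periodic function.

    Uses the rectangle rule over one period, which is exact (up to rounding)
    for trigonometric polynomials of degree well below `samples`.
    """
    ts = [2 * math.pi * k / samples for k in range(samples)]
    vals = [func(t) for t in ts]
    amps = []
    for n in range(n_max + 1):
        a = 2 / samples * sum(v * math.cos(n * t) for v, t in zip(vals, ts))
        b = 2 / samples * sum(v * math.sin(n * t) for v, t in zip(vals, ts))
        amps.append(math.hypot(a, b) / 2 if n == 0 else math.hypot(a, b))
    return amps

cases = {
    "cos^3":            lambda t: math.cos(t) ** 3,              # cubic term, sinusoidal q
    "sin(q), no bias":  lambda t: math.sin(1.0 * math.cos(t)),
    "sin(q), bias 0.3": lambda t: math.sin(0.3 + 1.0 * math.cos(t)),
}
results = {name: harmonic_amplitudes(fn) for name, fn in cases.items()}
for name, amps in results.items():
    print(f"{name:18s}", " ".join(f"{x:.4f}" for x in amps))
```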


Fig. 5.13. The oscillations induced by a similar sinusoidal external force (turned on at t = 0) in two systems with the same small-oscillation frequency ω₀ and low damping: a linear oscillator (two top panels) and a pendulum (two bottom panels). In all cases, δ/ω₀ = 0.03, f₀ = 0.1, and ω = ω₀.


Another way to increase the content of the nth higher harmonic in a nonlinear oscillator is to reduce the excitation frequency ω to ~ω₀/n, so that the oscillator resonates at the frequency nω ≈ ω₀ of the desired harmonic. For example, Fig. 14a shows the oscillations in a pendulum described by the same Eq. (42), but driven at frequency ω = ω₀/3. One can see that the 3rd harmonic amplitude may be comparable with that of the basic harmonic, especially if the external frequency is additionally lowered (Fig. 14b) to accommodate the deviation of the effective frequency ω₀(A) of own oscillations from its small-oscillation value ω₀; see Eq. (49), Fig. 4, and their discussion in Sec. 2 above.


However, numerical modeling of nonlinear oscillators, as well as experiments with their physical implementations, bring more surprises. For example, the bottom panels of Fig. 15 show oscillations in a pendulum under the effect of a strong sinusoidal force with a frequency ω close to 3ω₀. One can see that at some parameter values and initial conditions, the system’s oscillation spectrum is heavily contributed (almost dominated) by the 3rd subharmonic, i.e. the Fourier component of frequency ω/3 ≈ ω₀.


41 This method is used in practice, for example, for the generation of electromagnetic waves with frequencies in the terahertz range (~10¹² Hz), which still lacks efficient electronic self-oscillators.

This counter-intuitive phenomenon of subharmonic generation may be explained as follows. Let us assume that subharmonic oscillations of frequency ω/3 ≈ ω₀ have somehow appeared, and coexist with the forced oscillations of frequency ω ≈ 3ω₀:
$$q(t) \approx A\cos\Psi + A_{\rm sub}\cos\Psi_{\rm sub}, \quad \text{where } \Psi = \omega t - \varphi, \quad \Psi_{\rm sub} = \frac{\omega t}{3} - \varphi_{\rm sub}. \qquad (5.100)$$

Then the leading nonlinear term, αq³, of the Taylor expansion of the pendulum’s nonlinearity sin q, is proportional to

$$q^3 = \left(A\cos\Psi + A_{\rm sub}\cos\Psi_{\rm sub}\right)^3 = A^3\cos^3\Psi + 3A^2A_{\rm sub}\cos^2\Psi\cos\Psi_{\rm sub} + 3AA_{\rm sub}^2\cos\Psi\cos^2\Psi_{\rm sub} + A_{\rm sub}^3\cos^3\Psi_{\rm sub}. \qquad (5.101)$$
Fig. 5.14. The oscillations induced in a pendulum, with damping δ/ω₀ = 0.03, by a sinusoidal external force of amplitude f₀ = 0.75, and frequencies ω₀/3 (top panel) and 0.8 × ω₀/3 (bottom panel).

Fig. 5.15. The oscillations of a pendulum with δ/ω₀ = 0.03, driven by a sinusoidal external force of amplitude f₀ = 3 and frequency 0.8 × 3ω₀, at initial conditions q(0) = 0 (the top panels) and q(0) = 1 (the bottom panels), with dq/dt(0) = 0 in both cases.
While the first and the last terms of the last expression depend only on the amplitudes of the individual components of oscillations, the two middle terms are more interesting, because they produce so-called combinational frequencies of the two components. For our case, the third term,

$$3AA_{\rm sub}^2\cos\Psi\cos^2\Psi_{\rm sub} = \frac{3}{4}AA_{\rm sub}^2\cos\left(\Psi - 2\Psi_{\rm sub}\right) + \ldots, \qquad (5.102)$$

is of special importance, because it produces (besides other combinational frequencies) the subharmonic component with the total phase

$$\Psi - 2\Psi_{\rm sub} = \frac{\omega t}{3} - \varphi + 2\varphi_{\rm sub}. \qquad (5.103)$$
Thus, this nonlinear contribution is synchronous with the subharmonic oscillations, and describes an interaction that can, within a certain range of the mutual phase shift between the Fourier components, deliver to them energy from the external force, so that the oscillations may be sustained. Note, however, that the amplitude of the term describing this energy exchange is proportional to the square of A_sub, and vanishes at the linearization of the equations of motion near the trivial fixed point. This means that the point is always stable, i.e., the 3rd subharmonic cannot be self-excited and always needs an initial “kick-off”; compare the two panels of Fig. 15. The same is true for higher-order subharmonics.


Only the second subharmonic is a special case. Indeed, let us make a calculation similar to Eq. (102), by replacing Eq. (100) with

$$q(t) \approx A\cos\Psi + A_{\rm sub}\cos\Psi_{\rm sub}, \quad \text{where } \Psi = \omega t - \varphi, \quad \Psi_{\rm sub} = \frac{\omega t}{2} - \varphi_{\rm sub}, \qquad (5.104)$$

for a nonlinear term proportional to q²:

$$q^2 = \left(A\cos\Psi + A_{\rm sub}\cos\Psi_{\rm sub}\right)^2 = A^2\cos^2\Psi + 2AA_{\rm sub}\cos\Psi\cos\Psi_{\rm sub} + A_{\rm sub}^2\cos^2\Psi_{\rm sub}. \qquad (5.105)$$

Here the combinational-frequency term capable of supporting the 2nd subharmonic,

$$2AA_{\rm sub}\cos\Psi\cos\Psi_{\rm sub} = AA_{\rm sub}\cos\left(\Psi - \Psi_{\rm sub}\right) + \ldots = AA_{\rm sub}\cos\left(\frac{\omega t}{2} - \varphi + \varphi_{\rm sub}\right) + \ldots, \qquad (5.106)$$

is linear in the subharmonic’s amplitude, i.e. survives the linearization near the trivial fixed point. This means that the second subharmonic may arise spontaneously, from infinitesimal fluctuations.
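The two trigonometric steps behind Eqs. (102)-(103) and (106) are cos Ψ cos²Ψ_sub = (1/2) cos Ψ + (1/4)[cos(Ψ − 2Ψ_sub) + cos(Ψ + 2Ψ_sub)] and 2 cos Ψ cos Ψ_sub = cos(Ψ − Ψ_sub) + cos(Ψ + Ψ_sub). The minimal sketch below checks them numerically, together with the resulting combinational phases ωt/3 − φ + 2φ_sub and ωt/2 − φ + φ_sub. All numerical values of the amplitudes, frequency, and phases are arbitrary illustrations.

```python
import math

# Illustrative values (our choice): amplitudes, drive frequency, phases
A, A_sub, omega, phi, phi_sub = 1.0, 0.3, 3.0, 0.2, -0.7

def max_mismatch(lhs, rhs, t_max=20.0, n=2000):
    """Largest |lhs(t) - rhs(t)| on a uniform grid over [0, t_max]."""
    return max(abs(lhs(t) - rhs(t)) for t in (t_max * k / n for k in range(n + 1)))

def psi(t):                       # phase of the forced oscillations
    return omega * t - phi

# Cubic term and the 3rd subharmonic, Eqs. (100)-(103)
def psi3(t):
    return omega * t / 3 - phi_sub

def term3(t):                     # third term of Eq. (101)
    return 3 * A * A_sub**2 * math.cos(psi(t)) * math.cos(psi3(t))**2

def term3_expanded(t):            # cos x cos^2 y = cos x / 2 + [cos(x - 2y) + cos(x + 2y)] / 4
    c = 3 * A * A_sub**2
    return c * (math.cos(psi(t)) / 2
                + (math.cos(psi(t) - 2 * psi3(t)) + math.cos(psi(t) + 2 * psi3(t))) / 4)

# Quadratic term and the 2nd subharmonic, Eqs. (104)-(106)
def psi2(t):
    return omega * t / 2 - phi_sub

def term2(t):
    return 2 * A * A_sub * math.cos(psi(t)) * math.cos(psi2(t))

def term2_expanded(t):            # 2 cos x cos y = cos(x - y) + cos(x + y)
    return A * A_sub * (math.cos(psi(t) - psi2(t)) + math.cos(psi(t) + psi2(t)))

checks = {
    "Eq. (102)": max_mismatch(term3, term3_expanded),
    "Eq. (103)": max_mismatch(lambda t: psi(t) - 2 * psi3(t),
                              lambda t: omega * t / 3 - phi + 2 * phi_sub),
    "Eq. (106)": max_mismatch(term2, term2_expanded),
    "phase in (106)": max_mismatch(lambda t: psi(t) - psi2(t),
                                   lambda t: omega * t / 2 - phi + phi_sub),
}
for name, err in checks.items():
    print(f"{name:15s} max mismatch = {err:.1e}")
```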


Moreover, such excitation of the second subharmonic is very similar to the parametric excitation that was discussed in detail in Sec. 5, and this similarity is not coincidental. Indeed, let us redo the expansion (106) making a somewhat different assumption: that the oscillations are a sum of the forced oscillations at the external force’s frequency ω and an arbitrary but weak perturbation q̃(t):

$$q(t) = A\cos(\omega t - \varphi) + \tilde q(t), \quad \text{with } |\tilde q| \ll A. \qquad (5.107)$$

Then, neglecting the small term proportional to q̃², we get

$$q^2 \approx A^2\cos^2(\omega t - \varphi) + 2\tilde q(t)\,A\cos(\omega t - \varphi). \qquad (5.108)$$

Besides the inconsequential phase shift φ, the second term in the last formula is exactly similar to the term describing the parametric effects in Eq. (75). This fact means that for a weak perturbation, a system with a quadratic nonlinearity in the presence of a strong “pumping” signal of frequency ω is equivalent to a system with parameters changing in time with frequency ω. This fact is broadly used for parametric excitation at high (e.g., optical) frequencies, where the mechanical means of parameter modulation (see, e.g., Fig. 5) are not practicable. The necessary quadratic nonlinearity at optical frequencies may be provided by a non-centrosymmetric nonlinear crystal, e.g., the β-phase barium borate (BaB₂O₄).


Before finishing this section, let me elaborate a bit on a general topic: the relation between the numerical and analytical approaches to problems of dynamics, and to physics as a whole. We have just seen that sometimes numerical solutions, like those shown in Fig. 15b, may give vital clues to previously unanticipated phenomena such as the excitation of subharmonics. (The phenomenon of deterministic chaos, which will be discussed in Chapter 9 below, presents another example of such “numerical discoveries”.) One might also argue that for problems without exact analytical solutions, numerical simulation may be an equally productive theoretical tool. These hopes are, however, muted by the general problem that is frequently called the “curse of dimensionality”,⁴² in which the last word refers to the number of parameters of the problem to be solved.⁴³


Indeed, let us have one more look at Fig. 15. OK, we have been lucky to find a new phenomenon, the 3rd subharmonic generation, for a particular set of parameters, in that case five of them: δ/ω₀ = 0.03, ω/ω₀ = 2.4, f₀ = 3, q(0) = 1, and dq/dt(0) = 0. Could we tell anything about how common this effect is? Are subharmonics with different n possible in this system? The only way to address these questions computationally is to carry out similar numerical simulations at many points of the d-dimensional (in this case, d = 5) space of parameters. Say, we have decided that breaking the reasonable range of each parameter into N = 100 points is sufficient. (For many problems, even more points are necessary; see, e.g., Sec. 9.1.) Then the total number of numerical experiments to carry out is N^d = (10²)⁵ = 10¹⁰, not a simple task even for powerful modern computing facilities. (Besides the pure number of required CPU cycles, consider the storage and analysis of the results.) For many important problems of nonlinear dynamics, e.g., turbulence, the parameter dimensionality d is substantially larger, and the computer resources necessary even for one numerical experiment are much greater.
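The arithmetic of this estimate takes a couple of lines. In this minimal sketch, N = 100 and d = 5 are taken from the text, while the cost of one second per simulation is a purely hypothetical assumption for illustration. At that cost, the full scan would need roughly three centuries of CPU time.

```python
# Brute-force parameter scan from the text: N = 100 points per parameter, d = 5
N, d = 100, 5
n_runs = N ** d
print(f"number of simulations: {n_runs:.1e}")

# Hypothetical cost of one simulation (an assumption, for illustration only)
seconds_per_run = 1.0
cpu_years = n_runs * seconds_per_run / (3600 * 24 * 365)
print(f"about {cpu_years:.0f} CPU-years at {seconds_per_run} s per run")
```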


In view of the curse of dimensionality, approximate analytical considerations, like those outlined above for the subharmonic excitation, are invaluable. More generally, physics used to stand on two legs: experiment and analytical theory. The enormous progress of computer performance during the last few decades has provided it with one more support point (a tail? :-), numerical simulation. This does not mean we can afford to discard any of the legs we are standing on.


5.9. Relaxation oscillations 


Such synthesis of the analytical and numerical approaches is also beneficial for the discussion of the last subject of this chapter: nonlinear oscillators with high damping. Perhaps the most interesting effect in such systems is the so-called relaxation oscillations, a type of self-oscillations with highly non-sinusoidal waveforms. Let me demonstrate them using our old friend, Eq. (62) with δ < 0, whose properties at |δ| ≪ ω₀ were discussed in Sec. 4, because it will enable us to follow the crossover from the harmonic oscillations to the relaxation ones.

42 This term was coined in 1957 by Richard Bellman in the context of optimal control theory (where the dimensionality means the number of parameters affecting the system under control), but has gradually spread over all quantitative sciences using numerical methods.

43 In EM Sec. 1.2, I discuss the implications of this “curse” for a different case, when both analytical and numerical solutions to the same problem are possible.


Figure 16 shows the results of the numerical solution of this equation for three characteristic values of its only substantial parameter⁴⁴

$$\lambda \equiv \frac{2|\delta|}{\omega_0} > 0. \qquad (5.109)$$

(Indeed, if we introduce the natural dimensionless time τ ≡ ω₀t, displacement x ≡ q/q₀, where q₀ is the scale defined in Eq. (64), and velocity y ≡ dx/dτ, then the second-order differential equation (62) may be rewritten as the following system of two first-order equations:⁴⁵

$$\frac{dx}{d\tau} = y, \qquad \frac{dy}{d\tau} = -x + \lambda\left(1 - y^2\right)y, \qquad (5.110)$$

with λ being its only parameter.) The left panels show phase planes [x, dx/dτ] of the oscillator, with their axes swapped⁴⁶ for comparison with the right panels showing the displacement x as a function of time.


If the damping is low (top two panels), the system, launched from any initial state, gradually approaches the “limit cycle” of nearly-harmonic oscillations. Note that even for this, not extremely small value λ = 0.2, the deviations of the waveform x(τ) from a purely sinusoidal function of time are very small, its period (in the normalized time τ) is very close to the small-oscillation value 2π, and its amplitude is also very close to the value 2/√3 ≈ 1.15 predicted by the van der Pol method; see Eq. (64).

Fig. 5.16. The phase plane and time evolution of the self-oscillator described by Eqs. (62) and (110), for three values of the normalized damping (109). The red and blue lines show the system’s dynamics for two representative initial conditions, while the black lines show its asymptotic behavior (the “limit cycles”).

44 As Eq. (11) shows, for positive damping, this parameter is just the reciprocal Q-factor.

45 A somewhat different equation, used in 1926 by B. van der Pol to trace the harmonic-to-relaxation-oscillation crossover for the first time, may also be reduced to Eq. (110), using the so-called Liénard transformation.

46 Note that while on the usual phase plane, the free-oscillation process corresponds to a clockwise rotation of the representation point (see, e.g., Fig. 9), the axes’ swap in Fig. 16 makes the rotation counterclockwise.

Chapter 5

Page 33 of 38

Essential Graduate Physics CM: Classical Mechanics


As the damping is increased to λ = 2 (middle panels), the limit cycle’s deviations from the circle, and hence the deviations of the waveform x(τ) from a sinusoidal function, become obvious. Note also that while the oscillation period becomes somewhat longer than its small-oscillation value, the transient processes of approaching the limit cycle become faster.


The trend of these changes becomes evident on the bottom panels, showing the case λ = 20. (The further increase of the damping does not change the results noticeably, only rescaling the displacements as x ∝ λ; note the vertical scale of the bottom panels of Fig. 16.) It shows that the oscillation period is dominated by two similar parts, of equal duration. During these two intervals of relatively slow evolution, the limit cycle closely follows the declining branches of the function

$$x = \lambda\left(1 - y^2\right)y, \qquad (5.111)$$

corresponding to the mutual compensation of the two nominally largest terms (each of the order of λ) on the right-hand side of the second equation of the system (110); see the dashed line on the left bottom panel. During these intervals, the displacement x evolves in accordance with the first of these equations, with its right-hand side virtually equal to the y₀ corresponding to Eq. (111). Even without solving the resulting differential equation exactly,⁴⁷ we see that on these branches, with y₀ ~ ±1, x(τ) changes with a speed of the order of 1, and hence the path between the initial and final points of each branch, of length Δx ~ λ, takes a time interval Δτ of the order of λ, just as the right panel shows.


As soon as the system reaches the branch’s endpoint x = ±(2/3√3)λ ≈ ±0.385λ, where the derivative dy₀/dx diverges, the balance of the terms on the right-hand side of the second Eq. (110) is no longer possible, and its magnitude abruptly becomes of the order of λ ≫ 1. As a result, the system jumps from this point to the opposite branch of the curve (111) very rapidly, during a time interval Δτ ~ Δy₀/λ ~ 1/λ ≪ 1, insufficient for x to change much. (The initial transient processes, i.e. the approaches to the limit cycle from almost arbitrary initial conditions, are equally fast, also with x ≈ const.) Upon reaching the new branch, the system “relaxes” to a relatively slow evolution in the opposite direction (hence the term “relaxation oscillations”), and the process repeats again and again.
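The crossover described above can be explored with a short simulation. This minimal sketch assumes that the normalized system (110) has the form dx/dτ = y, dy/dτ = −x + λ(1 − y²)y, consistent with Eq. (111). The fixed-step 4th-order Runge-Kutta integrator, the step h = 0.005, the run length, and the initial state (x, y) = (1, 0) are all our own choices. For λ = 0.2, the measured period should be close to 2π and the amplitude close to 2/√3. For λ = 20, the amplitude should be close to the branch endpoint ≈ 0.385λ, and the period should be of the order of λ.

```python
import math

def simulate(lam, x0=1.0, y0=0.0, tau_max=250.0, h=0.005):
    """Fixed-step RK4 integration of dx/dtau = y, dy/dtau = -x + lam*(1 - y**2)*y."""
    def rhs(x, y):
        return y, -x + lam * (1 - y * y) * y
    x, y = x0, y0
    taus, xs = [0.0], [x0]
    for i in range(1, int(round(tau_max / h)) + 1):
        k1 = rhs(x, y)
        k2 = rhs(x + h * k1[0] / 2, y + h * k1[1] / 2)
        k3 = rhs(x + h * k2[0] / 2, y + h * k2[1] / 2)
        k4 = rhs(x + h * k3[0], y + h * k3[1])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        taus.append(i * h)
        xs.append(x)
    return taus, xs

def period_and_amplitude(taus, xs, tau_skip=100.0):
    """Mean period (from upward zero crossings of x) and max |x|, after the transient."""
    ups = [taus[i] for i in range(1, len(xs))
           if taus[i] > tau_skip and xs[i - 1] < 0.0 <= xs[i]]
    periods = [b - a for a, b in zip(ups, ups[1:])]
    amp = max(abs(x) for t, x in zip(taus, xs) if t > tau_skip)
    return sum(periods) / len(periods), amp

results = {}
for lam in (0.2, 2.0, 20.0):
    T, amp = period_and_amplitude(*simulate(lam))
    results[lam] = (T, amp)
    print(f"lambda = {lam:4.1f}: period = {T:6.2f}, max|x| = {amp:5.2f}, "
          f"max|x|/lambda = {amp / lam:.3f}")
```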


Such oscillations take place in a large number of practical mechanical systems and electronic 
devices, ranging from the bowed string musical instruments (including those of the violin family), to 
usual mechanical clocks, to car light blinkers. Many of them allow for simple analyses; to save 
time/space, let me leave a couple of problems of this type for the reader’s exercise. 


5.10. Exercise problems 


5.1. A body of mass m is connected to its support not only with an elastic spring but also with a damper (say, an air brake) that provides a drag force obeying Eq. (5); see the figure on the right.


47 Its integration leads to an elementary function for τ(y), but to transcendental equations for y(τ) and x(τ).
(i) How should the constants κ and η be selected to minimize the body’s vibrations caused by vertical oscillations of its support with frequency ω?
(ii) What if the oscillations are random?


5.2. For a system with the response function given by Eq. (17):

(i) prove Eq. (26), and
(ii) use an approach different from the one used in Sec. 1 to derive Eq. (34).


Hint: You may like to use the Cauchy integral theorem and the Cauchy integral formula for analytical functions of a complex variable.⁴⁸

5.3. A square-wave pulse of force (see the figure on the right) is exerted on a linear oscillator of frequency ω₀ (with no damping), initially at rest. Calculate the law of motion q(t), sketch it, and interpret the result.
5.4. At t = 0, a sinusoidal external force F(t) = F₀ cos ωt starts to be exerted on a linear oscillator with frequency ω₀ and damping δ, which was at rest at t < 0.

(i) Derive the general expression for the time evolution of the oscillator’s displacement, and interpret the result.

(ii) Spell out the result for the exact resonance (ω = ω₀) in a system with low damping (δ ≪ ω₀), and, in particular, explore the limit δ → 0.


5.5. A pulse of external force F(t), with a finite duration, is exerted on a linear oscillator, initially at rest in its equilibrium position. Neglecting dissipation, calculate the change of the oscillator’s energy, using two different approaches, and compare the results.


5.6. A bead may slide, without friction, in a vertical plane along a parabolic curve y = ax²/2, in a uniform gravity field g = −g n_y. Calculate the frequency of its free oscillations as a function of their amplitude A, in the first nonvanishing approximation in A → 0, using two different approaches.


5.7. For a system with the Lagrangian function

$$L = \frac{m}{2}\dot q^2 - \frac{\kappa}{2}q^2 + \varepsilon\dot q^4,$$

with a small parameter ε, use the harmonic balance method to find the frequency of free oscillations as a function of their amplitude.


5.8. Use a different approach to derive Eq. (49) for the frequency of free oscillations of the system described by Eq. (43) with δ = 0, in the first nonvanishing approximation in the small parameter αA²/ω₀² ≪ 1.


5.9. On the plane [a₁, a₂] of two real, time-independent parameters a₁ and a₂, find the regions in which the fixed point of the following system of equations,


48 See, e.g., MA Eq. (15.1). 
Gq, =4,(9. —%),


Gx = 944, — |r, 


is unstable, and sketch the regions of each fixed point type — stable and unstable nodes, focuses, etc. 


5.10. Solve Problem 4(ii) using the reduced equations (57), and compare the result with the exact 
solution. 


5.11. Use the reduced equations to analyze forced oscillations in an oscillator with weak nonlinear damping, described by the following equation:

$$\ddot q + 2\delta\dot q + \omega_0^2 q + \beta\dot q^3 = f_0\cos\omega t,$$

with ω ≈ ω₀; β, δ > 0; and βωA² ≪ 1. In particular, find the stationary amplitude of the forced oscillations and analyze their stability. Discuss the effect(s) of the nonlinear term on the resonance.


5.12. Within the approach discussed in Sec. 4, calculate the average frequency of a self-oscillator 
outside of the range of its phase-locking by a small external sinusoidal force. 


5.13.* Use the reduced equations to analyze the stability of the forced nonlinear oscillations
described by the Duffing equation (43). Relate the result to the slope of resonance curves (Fig. 4). 


5.14. Use the van der Pol method to find the condition of parametric excitation of the oscillator described by the following equation:

$$\ddot q + 2\delta\dot q + \omega_0^2(t)\,q = 0,$$

where ω₀²(t) is the square-wave function shown in the figure on the right, with ω ≈ ω₀.


5.15. Use the van der Pol method to analyze parametric excitation of an oscillator with weak nonlinear damping, described by the following equation:

$$\ddot q + 2\delta\dot q + \beta\dot q^3 + \omega_0^2\left(1 + \mu\cos 2\omega t\right)q = 0,$$

with ω ≈ ω₀; β, δ > 0; and μ, βωA² ≪ 1. In particular, find the amplitude of stationary oscillations and analyze their stability.


5.16. Adding the nonlinear term αq³ to the left-hand side of Eq. (75),


(i) find the corresponding addition to the reduced equations, 

(ii) calculate the stationary amplitude A of the parametric oscillations, 

(iii) find the type and stability of each fixed point of the reduced equations, 
(iv) sketch the Poincaré phase planes of the system in major parameter regions. 


5.17. Use the van der Pol method to find the condition of parametric excitation of an oscillator with weak modulation of both the effective mass m(t) = m₀(1 + μ_m cos 2ωt) and the effective spring constant κ(t) = κ₀[1 + μ_κ cos(2ωt − ψ)], with the same frequency 2ω ≈ 2ω₀, at arbitrary modulation depths ratio μ_m/μ_κ and phase shift ψ. Interpret the result in terms of modulation of the instantaneous frequency ω₀(t) ≡ [κ(t)/m(t)]^{1/2} and the mechanical impedance Z(t) ≡ [κ(t)m(t)]^{1/2} of the oscillator.


5.18.* Find the condition of parametric excitation of a nonlinear oscillator described by the following equation:

$$\ddot q + 2\delta\dot q + \omega_0^2 q + \gamma q^2 = f_0\cos 2\omega t,$$

with sufficiently small δ, γ, f₀, and ξ ≡ ω − ω₀.


5.19. Find the condition of stability of the equilibrium point q = 0 of a parametric oscillator described by Eq. (75), in the limit when δ ≪ |ω₀| ≪ ω and μ ≪ 1. Use the result to analyze the stability of the Kapitza pendulum mentioned in Sec. 5.


5.20. Use numerical simulation to explore phase-plane trajectories $[q, \dot q]$ of an autonomous pendulum described by Eq. (42) with $f_0 = 0$, for both low and high damping, and discuss their most significant features.


5.21. Analyze relaxation oscillations of the system shown in the figure on the right: an elastic spring prevents a block of mass $m$ from being carried away by a horizontal conveyer belt moving with a constant velocity $u$. Assume that the coefficient of kinematic friction between the block and the belt is lower than the static friction coefficient.

[Figure: a spring-restrained block of mass m on a conveyer belt moving with u = const.]


5.22. The figure on the right shows the electric circuit of the simplest relaxation oscillator. In it, N is a bistable circuit element that switches very rapidly from its very-high-resistance state to a very-low-resistance state as the voltage across it is increased beyond some value $V_t$, and switches back as the voltage is decreased below another value $V_t' < V_t$.⁴⁹ Calculate the waveform and the time period of voltage oscillations in the circuit.

[Figure: circuit with a dc source (e.m.f. ℰ, internal resistance R), a capacitor C, and the element N.]

Hint: The solution of this problem requires a very basic understanding of electric circuits, including the e.m.f. $\mathcal{E}$ and internal resistance $R$ of a dc current source such as an electric battery.


49 This is a good model for many two-terminal gas-discharge devices (such as glow lamps), whose effective resistance may drop by up to 5 orders of magnitude when the discharge has been activated by voltage $V > V_t$. In usual neon glow lamps, $V_t'$ is about 30% lower than $V_t$.

Chapter 6. From Oscillations to Waves


In this chapter, the discussion of oscillations is extended to systems with two and more degrees of 
freedom. This extension naturally leads to another key notion of physics — waves, so far in simple, 
mostly 1D systems. (In the next chapter, this discussion will be extended to more complex elastic 
continua.) However, even the limited scope of the models analyzed in this chapter will still enable us to 
discuss such important general aspects of waves as their dispersion, phase and group velocities, 
impedance, reflection, and attenuation. 


6.1. Two coupled oscillators 


Let us discuss oscillations in systems with several degrees of freedom, starting from the simplest 
case of two linear (harmonic), dissipation-free, 1D oscillators. If the oscillators are independent of each 
other, the Lagrangian function of their system may be represented as a sum of two independent terms of 
the type (5.1): 

$$L = L_1 + L_2, \qquad L_{1,2} = T_{1,2} - U_{1,2}, \qquad T_{1,2} = \frac{m_{1,2}}{2}\dot q_{1,2}^2, \qquad U_{1,2} = \frac{\kappa_{1,2}}{2}q_{1,2}^2. \tag{6.1}$$

Correspondingly, Eqs. (2.19) for $q_j = q_{1,2}$ yield two independent equations of motion of the oscillators, each one being similar to Eq. (5.2):

$$m_{1,2}\ddot q_{1,2} + m_{1,2}\Omega_{1,2}^2 q_{1,2} = 0, \qquad \text{where } \Omega_{1,2}^2 \equiv \frac{\kappa_{1,2}}{m_{1,2}}. \tag{6.2}$$

(In the context of what follows, $\Omega_{1,2}$ are sometimes called the partial frequencies.) This means that in this simplest case, an arbitrary motion of the system is just a sum of independent sinusoidal oscillations at two frequencies equal to the partial frequencies (2).


However, as soon as the oscillators are coupled (i.e. interact), the full Lagrangian $L$ contains an additional mixed term $L_{\text{int}}$ depending on both generalized coordinates $q_1$ and $q_2$ and/or generalized velocities. As a simple example, consider the system shown in Fig. 1, where two small masses $m_{1,2}$ are constrained to move in only one direction (shown horizontal), and are kept between two stiff walls with three springs.


Fig. 6.1. A simple system of two coupled linear oscillators.


In this case, the kinetic energy is still separable, $T = T_1 + T_2$, but the total potential energy, consisting of the elastic energies of three springs, is not:

$$U = \frac{\kappa_L}{2}q_1^2 + \frac{\kappa_M}{2}(q_2 - q_1)^2 + \frac{\kappa_R}{2}q_2^2, \tag{6.3a}$$

where $q_{1,2}$ are the horizontal displacements of the particles from their equilibrium positions. It is convenient to rewrite this expression as

$$U = \frac{\kappa_1}{2}q_1^2 + \frac{\kappa_2}{2}q_2^2 - \kappa q_1 q_2, \qquad \text{where } \kappa_1 \equiv \kappa_L + \kappa_M, \quad \kappa_2 \equiv \kappa_R + \kappa_M, \quad \kappa \equiv \kappa_M. \tag{6.3b}$$


This formula shows that the Lagrangian function $L = T - U$ of this system contains, besides the partial terms (1), a bilinear interaction term:

$$L = L_1 + L_2 + L_{\text{int}}, \qquad L_{\text{int}} = \kappa q_1 q_2. \tag{6.4}$$


The resulting Lagrange equations of motion are as follows:

$$m_1\ddot q_1 + m_1\Omega_1^2 q_1 = \kappa q_2, \qquad m_2\ddot q_2 + m_2\Omega_2^2 q_2 = \kappa q_1. \tag{6.5}$$

Thus the interaction leads to an effective generalized force $\kappa q_2$ exerted on subsystem 1 by subsystem 2, and the reciprocal effective force $\kappa q_1$.


Please note two important aspects of this (otherwise rather simple) system of equations. First, in contrast to the actual physical interaction forces (such as $F_{12} = -F_{21} = \kappa_M(q_2 - q_1)$ for our system¹), the effective forces on the right-hand sides of Eqs. (5) do not obey the 3rd Newton law. Second, the forces are proportional to the same coefficient $\kappa$; this feature is a result of the general bilinear structure (4) of the interaction energy, rather than of any special symmetry.


From our prior discussions, we already know how to solve Eqs. (5), because it is still a system 
of linear and homogeneous differential equations, so that its general solution is a sum of particular 
solutions of the form similar to Eqs. (5.88), 


$$q_1 = c_1 e^{\lambda t}, \qquad q_2 = c_2 e^{\lambda t}, \tag{6.6}$$

with all possible values of $\lambda$. These values may be found by plugging Eq. (6) into Eqs. (5), and requiring the resulting system of two linear, homogeneous algebraic equations for the distribution coefficients $c_{1,2}$,

$$m_1\lambda^2 c_1 + m_1\Omega_1^2 c_1 = \kappa c_2, \qquad m_2\lambda^2 c_2 + m_2\Omega_2^2 c_2 = \kappa c_1, \tag{6.7}$$

to be self-consistent. In our particular case, we get a characteristic equation,


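$$\begin{vmatrix} m_1(\lambda^2 + \Omega_1^2) & -\kappa \\ -\kappa & m_2(\lambda^2 + \Omega_2^2) \end{vmatrix} = 0, \tag{6.8}$$

that is quadratic in $\lambda^2$, and thus has a simple analytical solution:

$$\lambda_\pm^2 = -\frac{1}{2}\left(\Omega_1^2 + \Omega_2^2\right) \mp \left[\frac{1}{4}\left(\Omega_1^2 - \Omega_2^2\right)^2 + \frac{\kappa^2}{m_1 m_2}\right]^{1/2}. \tag{6.9}$$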


1 Using these expressions, Eqs. (5) may be readily obtained from the Newton laws, but the Lagrangian approach used above will make their generalization, in the next section, more straightforward.

According to Eqs. (2) and (3b), for any positive spring constants, the product $\Omega_1\Omega_2 = [(\kappa_L + \kappa_M)(\kappa_R + \kappa_M)/(m_1m_2)]^{1/2}$ is always larger than $\kappa/(m_1m_2)^{1/2} = \kappa_M/(m_1m_2)^{1/2}$, so that the square root in Eq. (9) is always smaller than $(\Omega_1^2 + \Omega_2^2)/2$. As a result, both values of $\lambda^2$ are negative, i.e. the general solution to Eq. (5) is a sum of four terms, each proportional to $\exp\{\pm i\omega_\pm t\}$, where both normal frequencies (or “natural frequencies”, or “eigenfrequencies”) $\omega_\pm \equiv i\lambda_\pm$ are real:

$$\omega_\pm^2 = \frac{1}{2}\left(\Omega_1^2 + \Omega_2^2\right) \pm \left[\frac{1}{4}\left(\Omega_1^2 - \Omega_2^2\right)^2 + \frac{\kappa^2}{m_1 m_2}\right]^{1/2}. \tag{6.10}$$


A plot of these eigenfrequencies as a function of one of the partial frequencies (say, $\Omega_1$), with the other partial frequency fixed, gives us the famous anticrossing (also called the “avoided crossing” or “non-crossing”) diagram; see Fig. 2. One can see that at weak coupling, the normal frequencies $\omega_\pm$ are close to the partial frequencies $\Omega_{1,2}$ everywhere except in a narrow range near the anticrossing point $\Omega_1 = \Omega_2$. Most remarkably, on passing through this region, each of $\omega_\pm$ smoothly “switches” from following one partial frequency to following the other.


Fig. 6.2. The anticrossing diagram for two values of the normalized coupling strength $\kappa/(m_1m_2)^{1/2}\Omega_2^2$: 0.3 (red lines) and 0.1 (blue lines). In this plot, $\Omega_1$ is assumed to be changed by varying $\kappa_1$ rather than $m_1$, but in the opposite case, the diagram is qualitatively similar.
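The anticrossing of Fig. 2 is easy to reproduce numerically from Eq. (10). The sketch below is only an illustration, not part of the original notes: the function name `normal_frequencies` and the parameter values (unit masses, $\Omega_2 = 1$, normalized coupling 0.1) are assumptions chosen for the example. It sweeps $\Omega_1$ through $\Omega_2$ and reports where the splitting $\omega_+ - \omega_-$ is smallest.

```python
import math


def normal_frequencies(m1, m2, kappa1, kappa2, kappa):
    """Return (omega_minus, omega_plus) given by Eq. (6.10).

    kappa1 and kappa2 are the diagonal spring constants of Eq. (6.3b),
    and kappa is the coupling coefficient (kappa_M for Fig. 6.1).
    """
    om1_sq = kappa1 / m1                      # Omega_1**2, Eq. (6.2)
    om2_sq = kappa2 / m2                      # Omega_2**2
    mean = 0.5 * (om1_sq + om2_sq)
    root = math.sqrt(0.25 * (om1_sq - om2_sq) ** 2 + kappa ** 2 / (m1 * m2))
    if root >= mean:                          # would give lambda**2 >= 0
        raise ValueError("need Omega_1*Omega_2 > kappa/sqrt(m1*m2)")
    return math.sqrt(mean - root), math.sqrt(mean + root)


# Sweep Omega_1 (by varying kappa1 at m1 = 1) through Omega_2 = 1,
# with the normalized coupling kappa/(sqrt(m1*m2)*Omega_2**2) = 0.1.
M1 = M2 = 1.0
KAPPA2, KAPPA = 1.0, 0.1
sweep = []
for i in range(1, 201):
    om1 = 0.5 + 0.005 * i                     # Omega_1 from 0.505 to 1.5
    w_minus, w_plus = normal_frequencies(M1, M2, om1 ** 2, KAPPA2, KAPPA)
    sweep.append((w_plus - w_minus, om1))

gap, om1_at_gap = min(sweep)
print(f"smallest splitting {gap:.4f} at Omega_1 = {om1_at_gap:.3f}")
```

The smallest splitting, about $\kappa/[(m_1m_2)^{1/2}\Omega_2]$, appears near $\Omega_1 = \Omega_2$. Strictly, it is $\omega_+^2 - \omega_-^2$ that is minimal exactly at $\Omega_1 = \Omega_2$; the minimum of $\omega_+ - \omega_-$ itself is displaced from that point by an amount of second order in the coupling.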


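The reason for this counterintuitive behavior may be found by examining the distribution coefficients $c_{1,2}$ corresponding to each branch of the diagram, which may be obtained by plugging the corresponding value of $\lambda_\pm = -i\omega_\pm$ back into Eqs. (7). For example, at the anticrossing point $\Omega_1 = \Omega_2 \equiv \Omega$, Eq. (10) is reduced to

$$\omega_\pm^2 = \Omega^2 \pm \frac{\kappa}{(m_1 m_2)^{1/2}} \equiv \Omega^2\left[1 \pm \frac{\kappa}{(\kappa_1\kappa_2)^{1/2}}\right]. \tag{6.11}$$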


Plugging this expression back into any of Eqs. (7), we see that for the two branches of the anticrossing diagram, the distribution coefficient ratio is the same by magnitude but opposite by sign:

$$\left(\frac{c_1}{c_2}\right)_\pm = \mp\left(\frac{m_2}{m_1}\right)^{1/2}, \qquad \text{at } \Omega_1 = \Omega_2. \tag{6.12}$$
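The sign rule (12), as well as the strong mode localization far from the anticrossing point, can be checked with a few lines of code. This is a minimal sketch: the helper `coefficient_ratio` and all parameter values are illustrative assumptions. It solves the first of Eqs. (7) for $c_1/c_2$ on a chosen branch of Eq. (10).

```python
import math


def coefficient_ratio(m1, m2, kappa1, kappa2, kappa, branch):
    """Ratio c1/c2 on the branch '+' or '-' of Eq. (6.10).

    Uses the first of Eqs. (6.7) with lambda**2 = -omega**2:
    m1*(Omega_1**2 - omega**2)*c1 = kappa*c2.
    """
    om1_sq, om2_sq = kappa1 / m1, kappa2 / m2
    root = math.sqrt(0.25 * (om1_sq - om2_sq) ** 2 + kappa ** 2 / (m1 * m2))
    sign = 1.0 if branch == "+" else -1.0
    omega_sq = 0.5 * (om1_sq + om2_sq) + sign * root
    return kappa / (m1 * (om1_sq - omega_sq))


# At the anticrossing point (Omega_1**2 = Omega_2**2 = 2), Eq. (6.12) predicts
# c1/c2 = -/+ (m2/m1)**0.5 = -/+ 2 for m1 = 1, m2 = 4.
at_crossing = {b: coefficient_ratio(1.0, 4.0, 2.0, 8.0, 0.3, b) for b in "+-"}

# Far from it (Omega_1**2 = 1, Omega_2**2 = 4, weak coupling), each mode is
# localized: |c1/c2| is large on the lower branch and small on the upper one.
far_away = {b: coefficient_ratio(1.0, 1.0, 1.0, 4.0, 0.1, b) for b in "+-"}
print(at_crossing, far_away)
```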


In particular, if the system is symmetric ($m_1 = m_2$, $\kappa_L = \kappa_R$), then at the upper branch, corresponding to $\omega_+ > \omega_-$, we get $c_1 = -c_2$. This means that in this so-called hard mode,² masses oscillate in anti-phase: $q_1(t) \equiv -q_2(t)$. The resulting substantial extension/compression of the middle spring (see Fig. 1 again) yields an additional returning force, which increases the oscillation frequency. On the contrary, at the lower branch, corresponding to $\omega_-$, the particle oscillations are in phase: $c_1 = c_2$, i.e. $q_1(t) \equiv q_2(t)$, so that the middle spring is neither stretched nor compressed at all. As a result, in this soft mode, the oscillation frequency $\omega_-$ is lower than $\omega_+$, and does not depend on $\kappa_M$:

$$\omega_-^2 = \Omega^2 - \frac{\kappa}{m} = \frac{\kappa_L + \kappa_M}{m} - \frac{\kappa_M}{m} = \frac{\kappa_L}{m}. \tag{6.13}$$

Note that for both modes, the oscillations equally engage both particles.

2 In physics, the term “mode” (or “normal mode”) is typically used to describe the distribution of a variable in space, at its oscillations with a single frequency. In our current case, when the notion of space is reduced to two oscillator numbers, each mode is fully specified by the corresponding ratio of two distribution coefficients $c_{1,2}$.


Far from the anticrossing point, the situation is completely different. Indeed, a similar calculation of $c_{1,2}$ shows that on each branch of the diagram, the magnitude of one of the distribution coefficients is much larger than that of its counterpart. Hence, in this limit, any particular mode of oscillations involves virtually only one particle. A slow change of system parameters, bringing it through the anticrossing, leads, first, to a maximal delocalization of each mode at $\Omega_1 = \Omega_2$, and then to a restoration of the localization, but in a different partial degree of freedom.


We could readily carry out similar calculations for the case when the systems are coupled via their velocities, $L_{\text{int}} = m\dot q_1\dot q_2$, where $m$ is a coupling coefficient, not necessarily a certain physical mass.³ The results are generally similar to those discussed above, again with the maximum level splitting at $\Omega_1 = \Omega_2 \equiv \Omega$:

$$\omega_\pm = \frac{\Omega}{\left[1 \mp |m|/(m_1 m_2)^{1/2}\right]^{1/2}} \approx \Omega\left[1 \pm \frac{|m|}{2(m_1 m_2)^{1/2}}\right], \tag{6.14}$$

the last relation being valid for weak coupling. The generalization to the case of simultaneous coordinate and velocity coupling is also straightforward; see the next section.⁴


One more property of weakly coupled oscillators is a periodic slow transfer of energy between them, especially strong at or near the anticrossing point $\Omega_1 = \Omega_2$. Let me leave an analysis of such transfer for the reader’s exercise. (Due to the importance of this effect for quantum mechanics, it will be discussed in detail in the QM part of this series.)
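Without doing that exercise analytically, a short simulation may help to visualize the effect. The sketch below makes several assumptions: the symmetric system of Fig. 1 with $m = 1$, $\kappa_L = \kappa_R = 1$ and a weak middle spring $\kappa_M = 0.05$ (so that $\Omega_1 = \Omega_2$), and a hand-written classical Runge-Kutta integrator, which is a generic numerical choice rather than anything prescribed by the text. Starting with only oscillator 1 excited, it integrates Eqs. (5) up to the time $\pi/(\omega_+ - \omega_-)$, half the period of the slow energy exchange that follows from the superposition of the two normal modes (11), and compares the partial energies.

```python
import math

# Assumed parameters: the symmetric system of Fig. 6.1 with m = 1,
# kappa_L = kappa_R = 1 and a weak middle spring kappa_M = 0.05.
M, KAPPA_SIDE, KAPPA_MID = 1.0, 1.0, 0.05
OMEGA_SQ = (KAPPA_SIDE + KAPPA_MID) / M   # Omega_1**2 = Omega_2**2, Eqs. (6.2), (6.3b)
COUPLING = KAPPA_MID / M                  # kappa/m on the right of Eqs. (6.5)


def derivatives(state):
    """Time derivatives of (q1, q2, v1, v2) according to Eqs. (6.5)."""
    q1, q2, v1, v2 = state
    return (v1, v2, -OMEGA_SQ * q1 + COUPLING * q2, -OMEGA_SQ * q2 + COUPLING * q1)


def rk4_step(state, dt):
    """One step of the classical 4th-order Runge-Kutta method."""
    k1 = derivatives(state)
    k2 = derivatives([x + 0.5 * dt * k for x, k in zip(state, k1)])
    k3 = derivatives([x + 0.5 * dt * k for x, k in zip(state, k2)])
    k4 = derivatives([x + dt * k for x, k in zip(state, k3)])
    return [x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + e)
            for x, a, b, c, e in zip(state, k1, k2, k3, k4)]


def partial_energy(q, v):
    """Energy of one partial oscillator: m*v**2/2 + m*Omega**2*q**2/2."""
    return 0.5 * M * v ** 2 + 0.5 * M * OMEGA_SQ * q ** 2


# Normal frequencies at the anticrossing point, Eq. (6.11) with m1 = m2 = m:
w_plus, w_minus = math.sqrt(OMEGA_SQ + COUPLING), math.sqrt(OMEGA_SQ - COUPLING)
t_transfer = math.pi / (w_plus - w_minus)  # half of the exchange period 2*pi/(w_plus - w_minus)

state, dt = [1.0, 0.0, 0.0, 0.0], 0.01      # only oscillator 1 is excited at t = 0
n_steps = round(t_transfer / dt)
for _ in range(n_steps):
    state = rk4_step(state, dt)

e_start = partial_energy(1.0, 0.0)
e1_ratio = partial_energy(state[0], state[2]) / e_start
e2_ratio = partial_energy(state[1], state[3]) / e_start
print(f"t = {n_steps * dt:.2f}: E1/E(0) = {e1_ratio:.4f}, E2/E(0) = {e2_ratio:.4f}")
```

At this moment, virtually all the energy has moved to oscillator 2; continuing the integration for the same time returns it to oscillator 1.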


6.2. N coupled oscillators 


The calculations of the previous section may be readily generalized to the case of an arbitrary number (say, $N$) of coupled harmonic oscillators, with an arbitrary type of coupling. It is obvious that in this case Eq. (4) should be replaced with

$$L = \sum_{j=1}^{N} L_j + \sum_{\substack{j,j'=1 \\ j \neq j'}}^{N} L_{jj'}. \tag{6.15}$$

3 In mechanics, with $q_{1,2}$ standing for the actual displacements of particles, such coupling is not very natural, but there are many dynamic systems of non-mechanical nature in which such coupling is the most natural one. The simplest example is the system of two LC (“tank”) circuits, with either capacitive or inductive coupling. Indeed, as was discussed in Sec. 2.2, for such a system, the very notions of the potential and kinetic energies are conditional and interchangeable.

4 Note that the anticrossing diagram shown in Fig. 2 is even more ubiquitous in quantum mechanics, because, due to the time-oscillatory character of the Schrödinger equation solutions, a weak coupling of any two quantum states leads to qualitatively similar behavior of the eigenfrequencies $\omega_\pm$ of the system, and hence of its eigenenergies (“energy levels”) $E_\pm = \hbar\omega_\pm$.

Moreover, we can generalize the above expression for the mixed terms $L_{jj'}$, taking into account their possible dependence not only on the generalized coordinates but also on the generalized velocities, in a bilinear form similar to Eq. (4). The resulting Lagrangian may be represented in a compact form,

$$L = \sum_{j,j'=1}^{N}\left(\frac{m_{jj'}}{2}\dot q_j\dot q_{j'} - \frac{\kappa_{jj'}}{2}q_j q_{j'}\right), \tag{6.16}$$

where the off-diagonal terms are index-symmetric: $m_{jj'} = m_{j'j}$, $\kappa_{jj'} = \kappa_{j'j}$, and the factors $\frac{1}{2}$ compensate for the double-counting of each term with $j \neq j'$, at the summation over two independently running indices.


One may argue that Eq. (16) is quite general if we still want to keep the equations of motion linear — as 
they always are if the oscillations are small enough. 


Plugging Eq. (16) into the general form (2.19) of the Lagrange equation, we get $N$ equations of motion of the system, one for each value of the index $j' = 1, 2, \ldots, N$:

$$\sum_{j=1}^{N}\left(m_{jj'}\ddot q_j + \kappa_{jj'}q_j\right) = 0. \tag{6.17}$$

Just as in the previous section, let us look for a particular solution to this system in the form

$$q_j = c_j e^{\lambda t}. \tag{6.18}$$

As a result, we are getting a system of $N$ linear, homogeneous algebraic equations,

$$\sum_{j=1}^{N}\left(m_{jj'}\lambda^2 + \kappa_{jj'}\right)c_j = 0, \tag{6.19}$$

for the set of $N$ distribution coefficients $c_j$. The condition that this system is self-consistent is that the determinant of its matrix equals zero:


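$$\operatorname{Det}\left(m_{jj'}\lambda^2 + \kappa_{jj'}\right) = 0. \tag{6.20}$$

This characteristic equation is an algebraic equation of degree $N$ for $\lambda^2$, and so has $N$ roots $(\lambda^2)_n$. For any Hamiltonian system with a stable equilibrium, the matrices $m_{jj'}$ and $\kappa_{jj'}$ are positive-definite (the former because the kinetic energy is positive for any nonzero velocities, the latter because the potential energy has a minimum at the equilibrium point), which ensures that all these roots are real and negative. As a result, the general solution to Eq. (17) is the sum of $2N$ terms proportional to $\exp\{\pm i\omega_n t\}$, $n = 1, 2, \ldots, N$, where all $N$ normal frequencies $\omega_n$ are real.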
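In practice, Eq. (20) is the generalized symmetric eigenvalue problem $\kappa c = \omega^2 m c$. The following sketch assumes NumPy is available and reduces it to a standard symmetric problem by the Cholesky factorization $m = LL^{\mathrm T}$; the function name `normal_modes` is hypothetical, and `scipy.linalg.eigh(a, b)` would solve the same generalized problem directly. As a check, it is applied to the two-oscillator system of Fig. 1 with assumed spring constants and masses, for which Eq. (10) gives $\omega_-^2 = 0.8$ and $\omega_+^2 = 1.25$.

```python
import numpy as np


def normal_modes(mass, stiffness):
    """Normal frequencies and modes of Eq. (6.17).

    mass and stiffness are the symmetric, positive-definite matrices m_jj'
    and kappa_jj'. Det(m*lambda**2 + kappa) = 0 with lambda = +-i*omega is the
    generalized eigenproblem kappa @ c = omega**2 * m @ c, reduced here to a
    standard symmetric one via the Cholesky factorization m = L @ L.T.
    """
    L = np.linalg.cholesky(mass)
    L_inv = np.linalg.inv(L)
    reduced = L_inv @ stiffness @ L_inv.T
    omega_sq, y = np.linalg.eigh(reduced)       # ascending eigenvalues
    if np.any(omega_sq <= 0):
        raise ValueError("equilibrium is not stable: some omega**2 <= 0")
    modes = L_inv.T @ y                         # column n is the set c_jn
    return np.sqrt(omega_sq), modes


# Two-oscillator system of Fig. 6.1 (assumed values).
kL, kM, kR, m1, m2 = 1.0, 0.2, 1.5, 1.0, 2.0
mass = np.diag([m1, m2])
stiffness = np.array([[kL + kM, -kM],
                      [-kM, kR + kM]])          # kappa_12 = kappa_21 = -kappa, per Eq. (6.3b)
omegas, modes = normal_modes(mass, stiffness)
print("omega_n:", omegas)
```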


Plugging each of these $2N$ values of $\lambda = \pm i\omega_n$ back into the particular set of linear equations (19), one can find the corresponding sets of distribution coefficients $c_j$. Generally, the coefficients are complex, but to keep $q_j(t)$ real, the coefficients $c_{j+}$ corresponding to $\lambda = +i\omega_n$, and $c_{j-}$ corresponding to $\lambda = -i\omega_n$, have to be complex-conjugate of each other. Since the sets of the distribution coefficients may be different for each $\lambda$, they should be marked with two indices, $j$ and $n$. Thus, at general initial conditions, the time evolution of the $j$-th generalized coordinate may be represented as

$$q_j = \sum_{n=1}^{N}\left(c_{jn+}\exp\{+i\omega_n t\} + c_{jn-}\exp\{-i\omega_n t\}\right) \equiv \operatorname{Re}\sum_{n=1}^{N} c_{jn}\exp\{i\omega_n t\}, \qquad \text{with } c_{jn} \equiv 2c_{jn+}. \tag{6.21}$$

This formula shows very clearly again the physical sense of the distribution coefficients $c_{jn}$: a set of these coefficients, with different values of index $j$ but the same mode number $n$, gives the complex amplitudes of oscillations of the coordinates in this mode, i.e. for the special initial conditions that ensure purely sinusoidal motion of the whole system, with frequency $\omega_n$.


The calculation of the normal frequencies and the corresponding modes (distribution coefficient sets) of a particular coupled system with many degrees of freedom from Eqs. (19)-(20) is a task that frequently may be only done numerically.⁵ Let us discuss just two particular but very important cases. First, let all the coupling coefficients be small in the following sense: $|m_{jj'}| \ll m_{jj} \equiv m_j$ and $|\kappa_{jj'}| \ll \kappa_{jj} \equiv \kappa_j$, for all $j \neq j'$, and let all partial frequencies $\Omega_j \equiv (\kappa_j/m_j)^{1/2}$ be not too close to each other:

$$\frac{|\Omega_j^2 - \Omega_{j'}^2|}{\Omega_j^2} \gg \frac{|\kappa_{jj'}|}{\kappa_j},\ \frac{|m_{jj'}|}{m_j}, \qquad \text{for all } j \neq j'. \tag{6.22}$$


(Such a situation frequently happens if the parameters of the system are “random” in the sense that they do not follow any special, simple rule, for example, the one resulting from some simple symmetry of the system.) Results of the previous section imply that in this case, the coupling does not produce a noticeable change in the oscillation frequencies: $\{\omega_n\} \approx \{\Omega_j\}$. In this situation, oscillations at each eigenfrequency are heavily concentrated in one degree of freedom, i.e. in each set of the distribution coefficients $c_{jn}$ (for a given $n$), one coefficient’s magnitude is much larger than all others.
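This localization is easy to see in a small numerical example. The sketch below assumes NumPy and an illustrative system of three oscillators with unit masses, partial frequencies $\Omega_j^2 = 1, 2, 3$, and weak coordinate coupling $\kappa_{jj'} = -0.01$ for $j \neq j'$; these values satisfy condition (22). With a unit mass matrix, Eq. (20) reduces to a standard symmetric eigenvalue problem.

```python
import numpy as np

# Three oscillators with unit masses (m_jj' = delta_jj'), partial frequencies
# Omega_j**2 = 1, 2, 3, and weak coordinate coupling kappa_jj' = -0.01 for j != j'.
# These assumed values satisfy condition (6.22): |Omega_j**2 - Omega_j'**2|/Omega_j**2
# is at least 1/3, while |kappa_jj'|/kappa_j is at most 0.01.
partial_sq = np.array([1.0, 2.0, 3.0])
coupling = 0.01
stiffness = np.diag(partial_sq) - coupling * (np.ones((3, 3)) - np.eye(3))

# With a unit mass matrix, Eq. (6.20) is a standard symmetric eigenproblem.
omega_sq, modes = np.linalg.eigh(stiffness)   # column n of `modes` is the set c_jn

weights = modes ** 2                           # each column sums to 1
print("omega_n**2:", np.round(omega_sq, 5))
print("largest |c_jn|**2 in each mode:", np.round(weights.max(axis=0), 5))
```

The eigenfrequencies stay within about $10^{-4}$ of the partial ones, and each mode carries more than 99.9% of its weight in a single degree of freedom.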


Now let the conditions (22) be valid for all but one pair of partial frequencies, say $\Omega_1$ and $\Omega_2$, while these two frequencies are so close that the coupling of the corresponding partial oscillators becomes essential. In this case, the approximation $\{\omega_n\} \approx \{\Omega_j\}$ is still valid for all other degrees of freedom, and the corresponding terms may be neglected in Eqs. (19) for $j' = 1$ and 2. As a result, we return to Eqs. (7) (perhaps generalized for velocity coupling) and hence to the anticrossing diagram (Fig. 2) discussed in the previous section. Consequently, an extended change of only one partial frequency (say, $\Omega_1$) of a weakly coupled system produces a sequence of eigenfrequency anticrossings; see Fig. 3.

Fig. 6.3. The level anticrossing in a system of $N$ weakly coupled oscillators (schematically).
6.3. 1D waves 


The second case in which the general results of the last section may be simplified is that of coupled systems with a considerable degree of symmetry. Perhaps the most important of them are uniform systems that may sustain traveling and standing waves. Figure 4a shows a simple example of such a system: a long uniform chain of particles, of mass $m$, connected with light, elastic springs, pre-stretched with the tension force $\mathcal{T}$ to have equal lengths $d$. (This system may be understood as a natural generalization of the two-particle system considered in Sec. 1; cf. Fig. 1.)

5 Fortunately, very effective algorithms have been developed for this matrix diagonalization task; see, e.g., references in MA Sec. 16(iii)-(iv). For example, the popular MATLAB software package was initially created exactly for this purpose. (“MAT” in its name stood for “matrix” rather than “mathematics”.)


Fig. 6.4. (a) A uniform 1D chain of elastically coupled particles, with equilibrium positions $(j-1)d$, $jd$, $(j+1)d$, and their small (b) longitudinal and (c) transverse displacements (much exaggerated for clarity).


The spring’s pre-stretch does not affect small longitudinal oscillations $q_j$ of the particles about their equilibrium positions $z_j = jd$ (where the integer $j$ numbers the particles sequentially); see Fig. 4b.⁶ Indeed, in the 2nd Newton law for such a longitudinal motion of the $j$-th particle, the forces $\mathcal{T}$ and $-\mathcal{T}$ exerted by the springs on the right and the left of it cancel. However, the elastic additions, $\kappa\Delta q$, to these forces are generally different:

$$m\ddot q_j = \kappa(q_{j+1} - q_j) - \kappa(q_j - q_{j-1}). \tag{6.23}$$


On the contrary, for transverse oscillations within one plane (Fig. 4c), the net transverse component of the pre-stretch force exerted on the $j$-th particle, $F = \mathcal{T}(\sin\varphi_+ - \sin\varphi_-)$, where $\varphi_\pm$ are the force direction angles, does not vanish. As a result, direct contributions to this force from small longitudinal oscillations, with $|q_j| \ll d, \mathcal{T}/\kappa$, are negligible. Also, due to the first of these strong conditions, the angles $\varphi_\pm$ are small, and hence may be approximated, respectively, as $\varphi_+ \approx (q_{j+1} - q_j)/d$ and $\varphi_- \approx (q_j - q_{j-1})/d$. Plugging these expressions into a similar approximation, $F \approx \mathcal{T}(\varphi_+ - \varphi_-)$, for the transverse force, we see that it may be expressed as $\mathcal{T}(q_{j+1} - q_j)/d - \mathcal{T}(q_j - q_{j-1})/d$, i.e. is absolutely similar to that in the longitudinal case, just with the replacement $\kappa \to \mathcal{T}/d$. As a result, we may write the equation of motion of the $j$-th particle for these two cases in the same form:

$$m\ddot q_j = \kappa_{\text{ef}}(q_{j+1} - q_j) - \kappa_{\text{ef}}(q_j - q_{j-1}), \tag{6.24}$$

where $\kappa_{\text{ef}}$ is the “effective spring constant”, equal to $\kappa$ for the longitudinal oscillations, and to $\mathcal{T}/d$ for the transverse oscillations.⁷

6 Note the need for a clear distinction between the equilibrium position $z_j$ of the $j$-th point and its deviation $q_j$ from it. Such distinction has to be sustained in the continuous limit (see below), where it is frequently called the Eulerian description, named after L. Euler, even though it was introduced to mechanics by J. d’Alembert. In this course, the distinction is emphasized by using different letters: respectively, $z$ and $q$ (in the 3D case, $\mathbf{r}$ and $\mathbf{q}$).

Apart from the (formally) infinite size of the system, Eq. (24) is just a particular case of Eq. (17), and thus its particular solution may be looked for in the form (18), where, in light of our previous experience, we may immediately take $\lambda^2 = -\omega^2$. With this substitution, Eq. (24) gives the following simple form of the general system of equations (19) for the distribution coefficients $c_j$:

$$\left(-m\omega^2 + 2\kappa_{\text{ef}}\right)c_j - \kappa_{\text{ef}}c_{j+1} - \kappa_{\text{ef}}c_{j-1} = 0. \tag{6.25}$$
Now comes the most important conceptual step toward the wave theory. The translational symmetry of Eq. (25), i.e. its invariance with respect to the replacement $j \to j + 1$, allows it to have particular solutions of the following form:

$$c_j = a e^{i\alpha j}, \tag{6.26}$$

where the coefficient $a$ may depend on $\omega$ (and the system’s parameters), but not on the particle number $j$. Indeed, plugging Eq. (26) into Eq. (25) and canceling the common factor $e^{i\alpha j}$, we see that this finite-difference equation is indeed identically satisfied, provided that $\alpha$ obeys the following algebraic characteristic equation:

$$\left(-m\omega^2 + 2\kappa_{\text{ef}}\right) - \kappa_{\text{ef}}e^{+i\alpha} - \kappa_{\text{ef}}e^{-i\alpha} = 0. \tag{6.27}$$


The physical sense of the solution (26) becomes clear if we use it and Eq. (18) with $\lambda = \mp i\omega$ to write

$$q_j(t) = \operatorname{Re}\left[a\exp\{i\alpha j \mp i\omega t\}\right] = \operatorname{Re}\left[a\exp\{i(kz_j \mp \omega t)\}\right] = \operatorname{Re}\left[a\exp\{ik(z_j \mp v_{\text{ph}}t)\}\right], \tag{6.28}$$

where the wave number $k$ is defined as $k \equiv \alpha/d$. Eq. (28) describes a sinusoidal⁸ traveling wave of particle displacements, which propagates, depending on the sign before $v_{\text{ph}}$, to the right or the left along the particle chain, with the so-called phase velocity

$$v_{\text{ph}} \equiv \frac{\omega}{k}. \tag{6.29}$$

Perhaps the most important characteristic of a wave system is the so-called dispersion relation, i.e. the relation between the wave’s frequency $\omega$ and its wave number $k$; one may say, between the temporal and spatial frequencies of the wave. For our current system, this relation is given by Eq. (27) with $\alpha = kd$. Taking into account that $(2 - e^{i\alpha} - e^{-i\alpha}) = 2(1 - \cos\alpha) = 4\sin^2(\alpha/2)$, the dispersion relation may be rewritten in a simpler form:

$$\omega = \pm\omega_{\max}\sin\frac{\alpha}{2} \equiv \pm\omega_{\max}\sin\frac{kd}{2}, \qquad \text{where } \omega_{\max} \equiv 2\left(\frac{\kappa_{\text{ef}}}{m}\right)^{1/2}. \tag{6.30}$$

7 The re-derivation of Eq. (24) from the Lagrangian formalism, with the simultaneous strict proof that the small oscillations in the longitudinal direction and the two mutually perpendicular transverse directions are all independent of each other, is a very good exercise, left for the reader.

8 In optics and quantum mechanics, such waves are usually called monochromatic; I will try to avoid this term until the corresponding parts (EM and QM) of my series.


This result, sketched in Fig. 5, is rather remarkable in several aspects. I will discuss them in some detail, because most of these features are typical for waves of any type (including even the “de Broglie waves”, i.e. wavefunctions, in quantum mechanics), propagating in periodic structures.


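Fig. 6.5. The dispersion relation (30).

First, at low frequencies, $\omega \ll \omega_{\max}$, the dispersion relation is approximately linear:

$$\omega = \pm vk, \qquad \text{where } v \equiv \left(\frac{d\omega}{dk}\right)_{k=0} = \frac{\omega_{\max}d}{2} = \left(\frac{\kappa_{\text{ef}}}{m}\right)^{1/2}d. \tag{6.31}$$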


Plugging Eq. (31) into Eq. (29), we see that the constant $v$ plays, in the low-frequency limit, the role of the same phase velocity for waves of any frequency. Due to its importance, this acoustic-wave⁹ limit will be the subject of a special section, the next one.


Second, when the wave frequency is comparable with $\omega_{\max}$, the dispersion relation is not linear, and the system is dispersive. This means that as a wave, whose Fourier spectrum has several essential components with frequencies of the order of $\omega_{\max}$, travels along the structure, its waveform (which may be defined as the shape of the line connecting all points $q_j(t)$ at the same time) changes.¹⁰ This effect may be analyzed by representing the general solution of Eq. (24) as the sum (more generally, an integral) of the components (28) with different complex amplitudes $a_k$:


$$q_j(t) = \operatorname{Re}\int_{-\infty}^{+\infty} a_k\exp\{i[kz_j - \omega(k)t]\}\,dk. \tag{6.32}$$


This notation emphasizes the possible dependence of the component wave amplitudes $a_k$ and frequencies $\omega$ on the wave number $k$. While the latter dependence is given by the dispersion relation, in our current case by Eq. (30), the function $a_k$ is determined by the initial conditions. For applications, the case when $a_k$ is substantially different from zero only in a narrow interval, of a width $\Delta k \ll k_0$ around some central value $k_0$, is of special importance. The Fourier transform reciprocal to Eq. (32) shows that this is true, in particular, for the so-called wave packet: a sinusoidal (“carrier”) wave modulated by a spatial envelope function of a large width $\Delta z \sim 1/\Delta k \gg 1/k_0$; see, e.g., Fig. 6.


9 This term is purely historical. Though the usual sound waves in air, which are the subject of acoustics, belong to 
this class, the waves we are discussing may have frequencies both well below and well above the human ear’s 
sensitivity range. 

10 The waveform’s deformation due to dispersion (which we are considering now) should be clearly distinguished from its possible change due to attenuation caused by energy dissipation, which is not taken into account in our current energy-conserving model; cf. Sec. 6 below.

Fig. 6.6. The phase and group velocities of a wave packet.


Using the strong inequality $\Delta k \ll k_0$, the wave packet’s propagation may be analyzed by expanding the dispersion relation $\omega(k)$ into the Taylor series at point $k_0$, and, in the first approximation in $\Delta k/k_0$, restricting the expansion to its first two terms:

$$\omega(k) \approx \omega_0 + \left.\frac{d\omega}{dk}\right|_{k=k_0}\tilde k, \qquad \text{where } \omega_0 \equiv \omega(k_0), \text{ and } \tilde k \equiv k - k_0. \tag{6.33}$$


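In this approximation, Eq. (32) yields

$$\begin{aligned} q_j(t) &\approx \operatorname{Re}\int_{-\infty}^{+\infty} a_k\exp\left\{i\left[(k_0 + \tilde k)z_j - \left(\omega_0 + \frac{d\omega}{dk}\tilde k\right)t\right]\right\}d\tilde k \\ &= \operatorname{Re}\left(\exp\{i(k_0z_j - \omega_0t)\}\int_{-\infty}^{+\infty} a_k\exp\left\{i\tilde k\left(z_j - \frac{d\omega}{dk}t\right)\right\}d\tilde k\right). \end{aligned} \tag{6.34}$$

Comparing the last expression with the initial form of the wave packet,

$$q_j(0) = \operatorname{Re}\int_{-\infty}^{+\infty} a_k e^{ikz_j}\,dk = \operatorname{Re}\left(\exp\{ik_0z_j\}\int_{-\infty}^{+\infty} a_k\exp\{i\tilde kz_j\}\,d\tilde k\right), \tag{6.35}$$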


and taking into account that the phase factors before the integrals in the last forms of Eqs. (34) and (35) do not affect the envelope, we see that in this approximation, the envelope sustains its initial form and propagates along the system with the so-called group velocity

$$v_{\text{gr}} \equiv \left.\frac{d\omega}{dk}\right|_{k=k_0}. \tag{6.36}$$


Except for the acoustic wave limit (31), this velocity, which characterizes the propagation of the waveform’s envelope, is different from the phase velocity (29), which describes the propagation of the carrier wave, e.g., the spatial position of one of its zeros; see the red and blue arrows in Fig. 6.¹¹
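A direct numerical check of this result is sketched below. The assumptions are: units with $d = 1$ and $\kappa_{\text{ef}}/m = 1$ (so that $\omega_{\max} = 2$ and $v = 1$), a Gaussian spectral amplitude $a_k$ centered at $k_0 = \pi/(2d)$ with width 0.05, and a plain Riemann sum in place of the integral (32). The code locates the maximum of the envelope (the modulus of the complex sum in Eq. (32)) at a time $t$ and compares its position with $v_{\text{gr}}t$ and $v_{\text{ph}}t$; at this $k_0$, Eq. (30) gives $v_{\text{gr}} \approx 0.71v$ and $v_{\text{ph}} \approx 0.90v$.

```python
import cmath
import math

# Assumed units: d = 1 and kappa_ef/m = 1, so that omega_max = 2 and v = 1.
OMEGA_MAX = 2.0
K0, SIGMA = math.pi / 2.0, 0.05            # carrier wave number and spectral width
DK = SIGMA / 20.0                          # integration step in k
K_GRID = [K0 + DK * i for i in range(-100, 101)]   # covers k0 +- 5*sigma


def omega(k):
    """Positive branch of the dispersion relation (6.30), for 0 < k*d < 2*pi."""
    return OMEGA_MAX * math.sin(k / 2.0)


def group_velocity(k):
    """Eq. (6.36) applied to Eq. (6.30): d(omega)/dk = (omega_max*d/2)*cos(k*d/2)."""
    return 0.5 * OMEGA_MAX * math.cos(k / 2.0)


def packet(j, t):
    """Complex form of Eq. (6.32) at z_j = j*d, with a Gaussian a_k (assumed)."""
    total = 0j
    for k in K_GRID:
        a_k = math.exp(-((k - K0) ** 2) / (2.0 * SIGMA ** 2))
        total += a_k * cmath.exp(1j * (k * j - omega(k) * t)) * DK
    return total


T = 200.0
envelope = {j: abs(packet(j, T)) for j in range(0, 301)}
peak = max(envelope, key=envelope.get)
print(f"envelope peak at z = {peak}, v_gr*t = {group_velocity(K0) * T:.1f}, "
      f"v_ph*t = {omega(K0) / K0 * T:.1f}")
```

The physical displacement is the real part of `packet(j, t)`; its modulus tracks the envelope, whose maximum sits at $v_{\text{gr}}t$ rather than at $v_{\text{ph}}t$.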


Next, for our particular dispersion relation (30), the difference between $v_{\text{ph}}$ and $v_{\text{gr}}$ increases as $\omega$ approaches $\omega_{\max}$, with the group velocity (36) tending to zero, while the phase velocity stays finite, decreasing only from $v$ to $2v/\pi$. The physics of such a maximum frequency available for the wave propagation may be readily understood by noticing that according to Eq. (30), at $\omega = \omega_{\max}$, the wave number $k$ equals $n\pi/d$, where $n$ is an odd integer, and hence the phase shift $\alpha = kd$ is an odd multiple of $\pi$. Plugging this value into Eq. (28), we see that at $\omega = \omega_{\max}$, the oscillations of two adjacent particles are in anti-phase, for example:

$$q_0(t) = \operatorname{Re}\left[a\exp\{-i\omega t\}\right], \qquad q_1(t) = \operatorname{Re}\left[a\exp\{i\pi - i\omega t\}\right] \equiv -q_0(t). \tag{6.37}$$

11 Taking into account the next term in the Taylor expansion of the function $\omega(k)$, proportional to $d^2\omega/dk^2$, we would find that the dispersion leads to a gradual change of the envelope’s form. Such changes play an important role in quantum mechanics, so that they are discussed in detail in the QM part of these lecture notes.


It is clear, especially from Fig. 4b for longitudinal oscillations, that at such a phase shift, all the springs 
are maximally stretched/compressed (just as in the hard mode of the two coupled oscillators analyzed in 
Sec. 1), so that it is natural that this mode has the highest possible frequency. 


This fact invites a natural question: what happens with the system if it is agitated at a frequency $\omega > \omega_{\max}$, say by an external force exerted on its boundary? Reviewing the calculations that have led to the dispersion relation (30), we see that they are all valid not only for real but also for any complex values of $k$. In particular, at $\omega > \omega_{\max}$ it gives

$$k = \frac{n\pi}{d} \pm \frac{i}{\Delta}, \qquad \text{where } n = \pm 1, \pm 3, \ldots, \qquad \Delta \equiv \frac{d}{2\cosh^{-1}(\omega/\omega_{\max})}. \tag{6.38}$$

Plugging this relation into Eq. (28), we see that the wave’s amplitude becomes an exponential function of the particle’s position:

$$\left|a\,e^{ikz_j}\right| = |a|\exp\{\mp z_j/\Delta\}. \tag{6.39}$$


Physically this means that penetrating into the structure, the wave decays exponentially (from the excitation point), dropping by a factor of $e$ at the so-called penetration depth $\Delta$. (According to Eq. (38), at $\omega \sim \omega_{\max}$, this depth is of the order of the distance $d$ between the adjacent particles, although it diverges as $\omega$ approaches $\omega_{\max}$ from above; as the frequency is increased further beyond $\omega_{\max}$, the depth decreases, but rather slowly, only logarithmically.) Such a limited penetration is a very common property of waves, including the electromagnetic waves penetrating into various plasmas and superconductors, and the quantum-mechanical de Broglie waves penetrating into classically-forbidden regions of space. Note that this effect of “wave expulsion” from the medium’s bulk does not require any energy dissipation.
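The complex solution (38) may be verified numerically. In the following sketch, the function name `complex_wave_number` and the units ($d = 1$, $\omega_{\max} = 2$) are assumptions; it takes the root with $n = 1$ and $\operatorname{Im}k > 0$, checks that this complex $k$ still satisfies Eq. (30), and prints the amplitude drop per particle, $|c_{j+1}/c_j| = \exp\{-d/\Delta\}$.

```python
import cmath
import math


def complex_wave_number(omega, d=1.0, omega_max=2.0):
    """Root of Eq. (6.38) with n = 1 and Im k > 0, valid for omega > omega_max.

    Returns (k, depth), where depth is the penetration depth Delta.
    """
    depth = d / (2.0 * math.acosh(omega / omega_max))
    return math.pi / d + 1j / depth, depth


for ratio in (1.01, 1.5, 2.0, 10.0):
    k, depth = complex_wave_number(2.0 * ratio)
    # k still satisfies the dispersion relation (6.30), now with complex k:
    residual = abs(2.0 * cmath.sin(k / 2.0) - 2.0 * ratio)
    # |c_(j+1)/c_j| = |exp(i*k*d)| = exp(-d/depth): amplitude drop per particle
    drop = abs(cmath.exp(1j * k))
    print(f"omega/omega_max = {ratio:5.2f}: depth = {depth:.3f} d, "
          f"|c_(j+1)/c_j| = {drop:.4f}, residual = {residual:.1e}")
```

The printed depths (about $3.5d$ at $\omega = 1.01\,\omega_{\max}$, $0.38d$ at $2\omega_{\max}$, and $0.17d$ at $10\omega_{\max}$) illustrate both the divergence near $\omega_{\max}$ and the slow decrease at higher frequencies.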


Finally, one more fascinating feature of the dispersion relation (30) is its periodicity: if the relation is satisfied with some wave number $k_0(\omega)$, it is also satisfied with any $k_n(\omega) = k_0(\omega) + 2\pi n/d$, where $n$ is an integer. This property is independent of the particular dynamics of the system and is a common property of all systems that are $d$-periodic in the usual (“direct”) space. It has especially important implications for the quantum de Broglie waves in periodic systems (for example, crystals), leading, in particular, to the famous band/gap structure of their energy spectrum.¹²
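The main features of the dispersion relation discussed above (its linear low-frequency limit, the maximum frequency $\omega_{\max}$, and its periodicity in $k$) can all be checked with a short script. This is an illustrative sketch: the function names `omega_of_k` and `characteristic_lhs` are hypothetical, and units with $d = \kappa_{\text{ef}} = m = 1$ are assumed, so that $\omega_{\max} = 2$ and $v = 1$.

```python
import cmath
import math


def omega_of_k(k, d=1.0, kappa_ef=1.0, m=1.0):
    """Positive branch of the dispersion relation (6.30)."""
    omega_max = 2.0 * math.sqrt(kappa_ef / m)
    return omega_max * abs(math.sin(k * d / 2.0))


def characteristic_lhs(omega, k, d=1.0, kappa_ef=1.0, m=1.0):
    """Left-hand side of Eq. (6.27) with alpha = k*d; it vanishes on the dispersion curve."""
    alpha = k * d
    return (-m * omega ** 2 + 2.0 * kappa_ef
            - kappa_ef * cmath.exp(1j * alpha) - kappa_ef * cmath.exp(-1j * alpha))


V = 1.0   # acoustic velocity (6.31) for the default d = kappa_ef = m = 1
for kd in (0.01, 0.5, 1.5, math.pi):
    w = omega_of_k(kd)
    print(f"kd = {kd:.3f}: omega/omega_max = {w / 2.0:.5f}, "
          f"v_ph/v = {w / (V * kd):.5f}, |Eq. (6.27) LHS| = {abs(characteristic_lhs(w, kd)):.1e}")

# Periodicity in k: k and k + 2*pi*n/d give the same frequency.
for n in (-2, 1, 3):
    assert math.isclose(omega_of_k(0.7), omega_of_k(0.7 + 2.0 * math.pi * n))
```

At $kd = 0.01$ the ratio $v_{\text{ph}}/v$ is within $10^{-5}$ of unity (the acoustic limit), while at $kd = \pi$ the frequency reaches $\omega_{\max}$ and $v_{\text{ph}}/v$ drops to $2/\pi$.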


6.4. Acoustic waves 


Now let us return to the limit of low-frequency, dispersion-free acoustic waves, with $|\omega| \ll \omega_{\max}$, propagating with the frequency-independent velocity (31). Such waves are the general property of any elastic continuous medium and obey a simple (and very important) partial differential equation. To derive it, let us note that in the acoustic wave limit, $|kd| \ll 1$,¹³ the phase shift $\alpha = kd$ is very close to $2\pi n$. This means that the differences $q_{j+1}(t) - q_j(t)$ and $q_j(t) - q_{j-1}(t)$, participating in Eq. (24), are relatively small and may be approximated with $\partial q/\partial j = \partial q/\partial(z/d) = d(\partial q/\partial z)$, with the derivatives taken at middle points between the particles: respectively, $z_+ \equiv (z_{j+1} + z_j)/2$ and $z_- \equiv (z_j + z_{j-1})/2$. Let us now consider $z$ as a continuous argument, and introduce the particle displacement $q(z, t)$, a continuous function of space and time, satisfying the requirement $q(z_j, t) = q_j(t)$. In this notation, in the limit $kd \to 0$, the sum of the last two terms of Eq. (24) becomes $\kappa_{\text{ef}}d[\partial q/\partial z(z_+) - \partial q/\partial z(z_-)]$, and hence may be approximated as $\kappa_{\text{ef}}d^2(\partial^2q/\partial z^2)$, with the second derivative taken at point $(z_+ + z_-)/2 = z_j$, i.e. exactly at the same point as the time derivative. As the result, the whole set of ordinary differential equations (24), for different $j$, is reduced to just one partial differential equation

$$m\frac{\partial^2 q}{\partial t^2} - \kappa_{\text{ef}}d^2\frac{\partial^2 q}{\partial z^2} = 0. \tag{6.40a}$$

Using Eq. (31), we may rewrite this 1D wave equation in a more general form

$$\left(\frac{1}{v^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial z^2}\right)q = 0. \tag{6.40b}$$

12 For more detail see, e.g., QM Sec. 2.5.

13 Strictly speaking, per the discussion at the end of the previous section, in this reasoning $k$ means the distance of the wave number from the closest point $2\pi n/d$; see Fig. 5 again.
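The key approximation in this derivation, replacing the second difference on the right-hand side of Eq. (24) with $d^2\,\partial^2 q/\partial z^2$, can be tested on a smooth profile. In this minimal sketch, the test profile $q(z) = \cos kz$ with $k = 1$, the evaluation point, and the helper names are assumptions made for illustration.

```python
import math


def second_difference(q, z, d):
    """(q(z+d) - q(z)) - (q(z) - q(z-d)): the bracket in Eq. (6.24) divided by kappa_ef."""
    return q(z + d) - 2.0 * q(z) + q(z - d)


def relative_error(d, k=1.0, z=0.3):
    """Relative deviation of the second difference from d**2 * d2q/dz2 (Eq. 6.40a),
    for an assumed smooth profile q(z) = cos(k*z)."""
    def q(x):
        return math.cos(k * x)
    exact = d ** 2 * (-k ** 2 * q(z))
    return abs(second_difference(q, z, d) - exact) / abs(exact)


for d in (0.5, 0.1, 0.02):
    print(f"kd = {d:.2f}: relative error = {relative_error(d):.2e}")
```

The error falls off as $(kd)^2/12$, which is exactly the relative difference between $4\sin^2(kd/2)$ and $(kd)^2$, i.e. between the exact dispersion relation (30) and its acoustic limit (31).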


The most important property of the wave equation (40), which may be verified by an elementary substitution, is that it is satisfied by either of two traveling-wave solutions (or their linear superposition):

$$ q_+(z,t) = f_+(t - z/v), \qquad q_-(z,t) = f_-(t + z/v), \tag{6.41} $$

where $f_\pm$ are any smooth functions of one argument. The physical sense of these solutions may be revealed by noticing that the displacements $q_\pm$ do not change at the addition of an arbitrary change $\Delta t$ to their time argument, provided that it is accompanied by the addition of the proportional change $\pm v\Delta t$ to their space argument. This means that with time, the waveforms just move (respectively, to the right or to the left) with the constant speed $v$, retaining their form; see Fig. 7.¹⁴
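As a quick numerical illustration (a sketch, not part of the original derivation), one may check with central finite differences that a traveling waveform $f(t \mp z/v)$ satisfies Eq. (40a), and that it is unchanged when $t \to t + \Delta t$ and $z \to z \pm v\Delta t$. The Gaussian pulse, the step `h`, and all function names are arbitrary choices made only for illustration.

```python
import math

def f(s):
    """An arbitrary smooth waveform of one argument (a Gaussian pulse, chosen for illustration)."""
    return math.exp(-s ** 2)

def q(z, t, v, direction=+1):
    """Traveling wave of Eq. (41): direction=+1 gives f(t - z/v), direction=-1 gives f(t + z/v)."""
    return f(t - direction * z / v)

def wave_residual(z, t, v, direction=+1, h=1e-3):
    """Central finite-difference estimate of q_tt - v**2 * q_zz, which Eq. (40a) requires to vanish."""
    q_tt = (q(z, t + h, v, direction) - 2 * q(z, t, v, direction) + q(z, t - h, v, direction)) / h ** 2
    q_zz = (q(z + h, t, v, direction) - 2 * q(z, t, v, direction) + q(z - h, t, v, direction)) / h ** 2
    return q_tt - v ** 2 * q_zz

v, dt = 2.0, 0.7
for z, t in [(0.0, 0.3), (1.5, 2.0), (-2.0, -1.1)]:
    # The residual is limited only by the O(h**2) finite-difference error:
    print(f"z={z}, t={t}: residual = {wave_residual(z, t, v):.1e}")
    # Shifting t by dt and z by +v*dt leaves the right-moving waveform unchanged:
    print("  shape preserved:", math.isclose(q(z + v * dt, t + dt, v), q(z, t, v)))
```

Replacing the Gaussian by any other doubly differentiable function leaves both checks intact, in line with footnote 14.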


Fig. 6.7. Propagation of a 
traveling wave in a 
dispersion-free 1D system. 


Returning to the simple model shown in Fig. 4, let me emphasize that the acoustic-wave velocity $v$ is different for the waves of two types: for the longitudinal waves (with $\kappa_{\rm ef} = \kappa$, see Fig. 4b),

$$ v = v_l = \left(\frac{\kappa}{m}\right)^{1/2} d, \tag{6.42} $$

while for the transverse waves (with $\kappa_{\rm ef} = \mathscr{T}/d$, see Fig. 4c):


14 From the point of view of Eq. (40), the only requirement to the "smoothness" of the functions $f_\pm$ is to be doubly differentiable. However, we should not forget that in our case the wave equation is only an approximation of the discrete Eq. (24), so that according to Eq. (30), the traveling waveform conservation is limited by the acoustic wave limit condition $\omega \ll \omega_{\max}$, which should be fulfilled for all Fourier components of these functions.
$$ v = v_t = \left(\frac{\mathscr{T}}{md}\right)^{1/2} d \equiv \left(\frac{\mathscr{T}d}{m}\right)^{1/2} \equiv \left(\frac{\mathscr{T}}{\mu}\right)^{1/2}, \tag{6.43} $$

where the constant $\mu \equiv m/d$ has a simple physical sense of the particle chain's mass per unit length. Evidently, these velocities, in the same system, may be rather different.


The wave equation (40), with its only parameter $v$, may conceal the fact that any wave-supporting system is characterized by one more key parameter. In our current model (Fig. 4), this parameter may be revealed by calculating the forces $F(z, t)$ accompanying any of the traveling waves (41) of particle displacements. For example, in the acoustic wave limit $kd \to 0$ we are considering now, the force exerted by the $j^{\rm th}$ particle on its right neighbor may be approximated as

$$ F(z_j, t) = \kappa_{\rm ef}\left[q_j(t) - q_{j+1}(t)\right] \approx -\kappa_{\rm ef}\, d\left.\frac{\partial q}{\partial z}\right|_{z = z_+}, \tag{6.44} $$

where, as was discussed above, $\kappa_{\rm ef}$ is equal to $\kappa$ for the longitudinal waves, and to $\mathscr{T}/d$ for the transverse waves. But for the traveling waves (41), the partial derivatives $\partial q_\pm/\partial z$ are equal to $\mp\dot f_\pm/v$ (where the dot means the differentiation over the full arguments of the functions $f_\pm$), so that the corresponding forces are equal to

$$ F_\pm(z, t) = \pm\frac{\kappa_{\rm ef}\, d}{v}\,\dot f_\pm, \tag{6.45} $$

i.e. are proportional to the particles' velocities $u \equiv \partial q/\partial t$ in these waves,¹⁵ $u_\pm = \dot f_\pm$, for the same $z$ and $t$. This means that the ratio

$$ \frac{F_\pm(z,t)}{u_\pm(z,t)} \equiv \frac{F_\pm(z,t)}{\partial q_\pm/\partial t} = \pm\frac{\kappa_{\rm ef}\, d}{v} \tag{6.46} $$

depends only on the wave propagation direction, but is independent of $z$ and $t$, and also of the propagating waveform. Its magnitude,

$$ Z \equiv \left|\frac{F_\pm(z,t)}{u_\pm(z,t)}\right| = \frac{\kappa_{\rm ef}\, d}{v} = (\kappa_{\rm ef}\, m)^{1/2}, \tag{6.47} $$

characterizing the dynamic "stiffness" of the system for the propagating waves, is called the wave impedance.¹⁶ Note that the impedance is determined by the product of the system's generic parameters $\kappa_{\rm ef}$ and $m$, while the wave velocity (31) is determined by their ratio, so that these two parameters are completely independent, and both are important. According to Eq. (47), the wave impedance, just as the wave velocity, is different for the longitudinal and transverse waves:

$$ Z_l = (\kappa m)^{1/2}, \qquad Z_t = \left(\frac{\mathscr{T}m}{d}\right)^{1/2} \equiv (\mathscr{T}\mu)^{1/2}. \tag{6.48} $$

15 Of course, the particle's velocity $u$ (which is proportional to the wave amplitude) should not be confused with the wave's velocity $v$ (which is independent of this amplitude).
16 This notion is regretfully missing from many physics (but not engineering!) textbooks.
(Note that the first of these expressions for $Z$ coincides with the one used for a single oscillator in Sec. 5.6. In that case, $Z$ may be also recast in a form similar to Eq. (46), namely, as the ratio of the force and velocity amplitudes at free oscillations.)
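The independence of $v$ and $Z$ can be seen in a few lines of arithmetic. The sketch below simply evaluates Eqs. (42), (43), (47), and (48) for one set of hypothetical chain parameters (the numbers are arbitrary, in SI units), assuming that Eq. (31) has the form $v = d(\kappa_{\rm ef}/m)^{1/2}$ consistent with Eq. (42).

```python
import math

def acoustic_velocity(kappa_ef, m, d):
    """Eq. (31) in the form consistent with Eq. (42): v = d * (kappa_ef / m)**(1/2)."""
    return d * math.sqrt(kappa_ef / m)

def wave_impedance(kappa_ef, m, d):
    """Eq. (47): Z = kappa_ef * d / v, which reduces to (kappa_ef * m)**(1/2)."""
    return kappa_ef * d / acoustic_velocity(kappa_ef, m, d)

# Hypothetical chain parameters in SI units, chosen only for illustration:
m, d = 1e-3, 1e-2            # particle mass [kg] and spacing [m]
kappa, tension = 50.0, 2.0   # spring constant [N/m] and string tension [N]

for kind, kappa_ef in [("longitudinal", kappa), ("transverse", tension / d)]:
    v = acoustic_velocity(kappa_ef, m, d)
    Z = wave_impedance(kappa_ef, m, d)
    print(f"{kind}: v = {v:.3f} m/s, Z = {Z:.4f} kg/s, Z/v = {Z / v:.4f} kg/m")
```

Both wave types share the ratio $Z/v = m/d = \mu$, while $v$ and $Z$ separately scale as $\kappa_{\rm ef}^{1/2}$, so knowing one of them does not determine the other without knowing the mass.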


One of the wave impedance's key functions is to scale the power carried by a traveling wave:

$$ \mathscr{P}_\pm \equiv F_\pm u_\pm = -\kappa_{\rm ef}\, d\,\frac{\partial q_\pm}{\partial z}\frac{\partial q_\pm}{\partial t} = \pm\frac{\kappa_{\rm ef}\, d}{v}\,\dot f_\pm^{\,2} \equiv \pm Z\dot f_\pm^{\,2}. \tag{6.49} $$

Two remarks about this important result. First, the sign of $\mathscr{P}$ depends only on the direction of the wave propagation, but not on the waveform. Second, the instant value of the power does not change if we move with the wave in question, i.e. measure $\mathscr{P}$ at points with $z \mp vt = {\rm const}$. This is natural because in the Hamiltonian system we are considering, the wave energy is conserved. Hence, the wave impedance $Z$ characterizes the energy transfer along the system rather than its dissipation.


Another important function of the wave impedance notion becomes clear when we consider waves in nonuniform systems. Indeed, our previous analysis assumed that the 1D system supporting the waves (Fig. 4) is exactly periodic, i.e. macroscopically uniform, and extends all the way from $-\infty$ to $+\infty$. Now let us examine what happens when this is not true. The simplest, and very important, example of such nonuniform systems is a sharp interface, i.e. a point (say, $z = 0$) at which system parameters experience a jump while remaining constant on each side of the interface; see Fig. 8.


Fig. 6.8. Partial reflection of a wave from a sharp interface.


In this case, the wave equation (40) and its partial solutions (41) are still valid for $z < 0$ and $z > 0$, in the former case with primed parameters. However, the jump of parameters at the interface leads to a partial reflection of the incident wave from the interface, so that at least on the side of the incidence (in the case shown in Fig. 8, for $z > 0$), we need to use two such terms, one describing the incident wave and another one, the reflected wave:

$$ q(z,t) = \begin{cases} f'(t + z/v'), & \text{for } z \le 0, \\ f(t + z/v) + f_r(t - z/v), & \text{for } z \ge 0. \end{cases} \tag{6.50} $$


To find the relations between the functions $f$, $f_r$, and $f'$ (of which the first one, describing the incident wave, may be considered known), we may use two boundary conditions at $z = 0$. First, the displacement $q_0(t)$ of the particle at the interface has to be the same whether it is considered a part of the left or the right sub-system, and it participates in Eqs. (50) for both $z \le 0$ and $z \ge 0$. This gives us the first boundary condition:

$$ f'(t) = f(t) + f_r(t). \tag{6.51} $$


On the other hand, the forces exerted on the interface from the left and the right should also have equal magnitude, because the interface may be considered as an object with a vanishing mass, and any nonzero net force would give it an infinite (and hence unphysical) acceleration. Together with Eqs. (45) and (47), this gives us the second boundary condition:

$$ Z'\dot f'(t) = Z\left[\dot f(t) - \dot f_r(t)\right]. \tag{6.52} $$

Integrating both sides of this equation over time, and neglecting the integration constant (which describes a common displacement of all particles rather than their oscillations), we get

$$ Z'f'(t) = Z\left[f(t) - f_r(t)\right]. \tag{6.53} $$


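Now solving the system of two linear equations (51) and (53) for $f_r(t)$ and $f'(t)$, we see that both these functions are proportional to the incident waveform:

$$ f_r(t) = \mathcal{R}f(t), \qquad f'(t) = \mathcal{T}f(t), \tag{6.54} $$

with the following reflection ($\mathcal{R}$) and transmission ($\mathcal{T}$) coefficients:

$$ \mathcal{R} = \frac{Z - Z'}{Z + Z'}, \qquad \mathcal{T} = \frac{2Z}{Z + Z'}. \tag{6.55} $$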


Later in this series, we will see that with the appropriate re-definition of the impedance, these relations are also valid for waves of other physical nature (including the de Broglie waves in quantum mechanics) propagating in 1D continuous structures, and also in continua of higher dimensions, at the normal wave incidence upon the interface.¹⁷ Note that the coefficients $\mathcal{R}$ and $\mathcal{T}$ give the ratios of wave amplitudes, rather than their powers. Combining Eqs. (49) and (55), we get the following relations for the powers, either at the interface or at the corresponding points of the reflected and transmitted waves:

$$ \frac{\mathscr{P}_r}{\mathscr{P}} = \mathcal{R}^2 = \left(\frac{Z - Z'}{Z + Z'}\right)^2, \qquad \frac{\mathscr{P}'}{\mathscr{P}} = \frac{Z'}{Z}\,\mathcal{T}^2 = \frac{4ZZ'}{(Z + Z')^2}. \tag{6.56} $$

Note that $\mathscr{P}_r + \mathscr{P}' = \mathscr{P}$, again reflecting the wave energy conservation.


Perhaps the most important corollary of Eqs. (55)-(56) is that the reflected wave completely vanishes, i.e. the incident wave is completely transmitted through the interface ($\mathscr{P}' = \mathscr{P}$), if the so-called impedance matching condition $Z' = Z$ is satisfied, even if the wave velocities $v$ (32) are different on the left and the right sides of it. On the contrary, the equality of the acoustic velocities in the two continua does not guarantee the full transmission through their interface. Again, this is a very general result.
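A minimal sketch of Eqs. (55)-(56), assuming real impedances (i.e. the acoustic wave limit); the function names are illustrative only. It tabulates the reflected and transmitted power fractions and their sum for several impedance ratios.

```python
def reflection_transmission(Z, Z_prime):
    """Eq. (55): amplitude reflection and transmission coefficients for real impedances."""
    R = (Z - Z_prime) / (Z + Z_prime)
    T = 2 * Z / (Z + Z_prime)
    return R, T

def power_fractions(Z, Z_prime):
    """Eq. (56): fractions of the incident power carried by the reflected and transmitted waves."""
    R, T = reflection_transmission(Z, Z_prime)
    return R ** 2, (Z_prime / Z) * T ** 2

for ratio in [0.01, 0.5, 1.0, 2.0, 100.0]:      # Z'/Z, with Z = 1
    P_r, P_t = power_fractions(1.0, ratio)
    print(f"Z'/Z = {ratio:g}: P_r/P = {P_r:.4f}, P'/P = {P_t:.4f}, sum = {P_r + P_t:.4f}")
```

The wave velocities do not enter these functions at all, which is the numerical face of the statement that only the impedance ratio controls reflection.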


Finally, let us note that for the important particular case of a sinusoidal incident wave:¹⁸

$$ f(t) = {\rm Re}\left[ae^{-i\omega t}\right], \quad\text{so that}\quad f_r(t) = {\rm Re}\left[\mathcal{R}ae^{-i\omega t}\right], \tag{6.57} $$

where $a$ is its complex amplitude, the total wave (50) on the right of the interface is

$$ q(z,t) = {\rm Re}\left[ae^{-i\omega(t + z/v)} + \mathcal{R}ae^{-i\omega(t - z/v)}\right] \equiv {\rm Re}\left[a\left(e^{-ikz} + \mathcal{R}e^{ikz}\right)e^{-i\omega t}\right], \quad\text{for } z \ge 0, \tag{6.58} $$

17 See, e.g., the corresponding parts of this series: QM Sec. 2.3 and EM Sec. 7.3.

18 In the acoustic wave limit, when the impedances $Z$ and $Z'$, and hence the reflection coefficient $\mathcal{R}$, are real, $\mathcal{R}$ and $Z$ may be taken from under the Re operators in Eqs. (57)-(59). However, in the current, more general form of these relations they are also valid for the case of arbitrary frequencies, $\omega \sim \omega_{\max}$, when $\mathcal{R}$ and $Z$ may be complex.




while according to Eq. (45), the corresponding force distribution is

$$ F(z,t) = -Z\dot f(t + z/v) + Z\dot f_r(t - z/v) = {\rm Re}\left[i\omega Za\left(e^{-ikz} - \mathcal{R}e^{ikz}\right)e^{-i\omega t}\right]. \tag{6.59} $$


These expressions will be used in the next section. 


6.5. Standing waves

Now let us consider the two limits in which Eqs. (55)-(56) predict a total wave reflection ($\mathscr{P}' = 0$): $Z'/Z \to \infty$ (when $\mathcal{R} = -1$) and $Z'/Z \to 0$ (when $\mathcal{R} = +1$). According to Eq. (53), the former limit corresponds to $f(t) + f_r(t) = q(0, t) = 0$, i.e. to vanishing oscillations at the interface. This means that this particular limit describes a perfectly rigid boundary, not allowing the system's end to oscillate at all. In this case, Eqs. (58)-(59) yield

$$ q(z,t) = {\rm Re}\left[a\left(e^{-ikz} - e^{ikz}\right)e^{-i\omega t}\right] = -2\,{\rm Re}\left[iae^{-i\omega t}\right]\sin kz, \tag{6.60} $$

$$ F(z,t) = {\rm Re}\left[i\omega Za\left(e^{-ikz} + e^{ikz}\right)e^{-i\omega t}\right] = 2\omega Z\,{\rm Re}\left[iae^{-i\omega t}\right]\cos kz. \tag{6.61} $$


These equalities mean that we may interpret the process on the right of the interface using two 
mathematically equivalent, but physically different languages: either as the sum of two traveling waves 
(the incident one and the reflected one, propagating in opposite directions), or as a single standing wave. 
Note that in contrast with the traveling wave (Fig. 9a, cf. Fig. 7), in the standing sinusoidal wave (Fig. 
9b) all particles oscillate in time with the same phase. 


Fig. 6.9. The time evolution of (a) a traveling sinusoidal wave, and (b) a standing sinusoidal wave at a rigid boundary.


Note also that the phase of the force oscillations (61) is shifted, both in space and in time, by $\pi/2$ relative to the particle displacement oscillations. (In particular, at the rigid boundary the force amplitude reaches its maximum.) As a result, the average power flow vanishes, so that the average energy of the standing wave does not change, though its instant energy still oscillates, at each spatial point, between its kinetic and potential components, just as at the usual harmonic oscillations of one particle. A similar standing wave, but with a maximum of the displacement $q$, and with a zero ("node") of the force $F$, is formed at the open boundary, with $Z'/Z \to 0$, and hence $\mathcal{R} = +1$.
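The equivalence of the two "languages" can be checked numerically: the sketch below evaluates the sum of the incident and reflected waves, Eqs. (58)-(59), at $\mathcal{R} = -1$ and compares it with the standing-wave forms (60)-(61). The amplitude, frequency, impedance, and sample points are arbitrary illustrative values.

```python
import cmath
import math

def traveling_sum(a, R, Z, omega, v, z, t):
    """Displacement and force from Eqs. (58)-(59): incident plus reflected sinusoidal waves."""
    k = omega / v
    phase = cmath.exp(-1j * omega * t)
    q = (a * (cmath.exp(-1j * k * z) + R * cmath.exp(1j * k * z)) * phase).real
    F = (1j * omega * Z * a * (cmath.exp(-1j * k * z) - R * cmath.exp(1j * k * z)) * phase).real
    return q, F

def standing_wave(a, Z, omega, v, z, t):
    """Standing-wave forms (60)-(61) for a rigid boundary (R = -1)."""
    k = omega / v
    s = (1j * a * cmath.exp(-1j * omega * t)).real   # Re[i a exp(-i omega t)]
    return -2 * s * math.sin(k * z), 2 * omega * Z * s * math.cos(k * z)

a, Z, omega, v = 0.3 - 0.8j, 1.7, 2.0, 0.5
for z, t in [(0.0, 0.1), (0.37, 1.2), (2.5, -0.4)]:
    q1, F1 = traveling_sum(a, -1.0, Z, omega, v, z, t)
    q2, F2 = standing_wave(a, Z, omega, v, z, t)
    print(f"z={z}, t={t}: q {q1:+.6f} vs {q2:+.6f}, F {F1:+.6f} vs {F2:+.6f}")
```

Setting `R = +1.0` in `traveling_sum` instead gives the open-boundary case, with a displacement maximum and a force node at $z = 0$.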


Now I have to explain why I have used the sinusoidal waveform for the wave reflection analysis. Let us consider a 1D wave system, which obeys Eq. (40), of a finite length $l$, limited by two rigid walls (located, say, at $z = 0$ and $z = l$), which impose the corresponding boundary conditions,

$$ q(0,t) = q(l,t) = 0, \tag{6.62} $$

on its motion. Naturally, a sinusoidal traveling wave, induced in the system, will be reflected from both ends, forming the standing wave patterns of the type (60) near each of them. These two patterns are compatible if $l$ is exactly equal to an integer number (say, $n$) of $\lambda/2$, where $\lambda \equiv 2\pi/k$ is the wavelength:

$$ l = n\frac{\lambda}{2} \equiv n\frac{\pi}{k}. \tag{6.63} $$


This requirement yields the following spectrum of possible wave numbers:

$$ k_n = n\frac{\pi}{l}, \tag{6.64} $$

where the list of possible integers $n$ may be limited to positive values: $n = 1, 2, 3, \ldots$ (Indeed, negative values give absolutely similar waves (60), while $n = 0$ yields $k_0 = 0$, and the corresponding wave vanishes at all points: $\sin(0\cdot z) \equiv 0$.) In the acoustic wave limit we are discussing, Eq. (31), $\omega = vk$, may be used to translate this wave-number spectrum into an equally simple spectrum of possible standing-wave frequencies:¹⁹

$$ \omega_n = vk_n = \frac{\pi v}{l}\,n, \quad\text{with } n = 1, 2, 3, \ldots \tag{6.65} $$


Now let us notice that this spectrum, and the corresponding standing-wave patterns,²⁰

$$ q^{(n)}(z,t) = 2\,{\rm Re}\left[a_n\exp\{-i\omega_nt\}\right]\sin k_nz, \quad\text{for } 0 \le z \le l, \tag{6.66} $$

may be calculated in a different way, by a direct solution of the wave equation (40) with the boundary conditions (62). Indeed, let us look for the general solution of this partial differential equation in the so-called variable-separated form²¹

$$ q(z,t) = \sum_n Z_n(z)\,T_n(t), \tag{6.67} $$

where each partial product $Z_n(z)T_n(t)$ is supposed to satisfy the equation on its own. Plugging such a partial solution into Eq. (40), and then dividing all its terms by the same product, $Z_nT_n$, we may rewrite the result as

$$ \frac{1}{v^2}\frac{1}{T_n}\frac{d^2T_n}{dt^2} = \frac{1}{Z_n}\frac{d^2Z_n}{dz^2}. \tag{6.68} $$

Here comes the punch line of the variable separation method: since the left-hand side of the equation may depend only on $t$, while its right-hand side, only on $z$, Eq. (68) may be valid only if both its sides are constant. Denoting this constant as $-k_n^2$, we get two similar ordinary differential equations,²²

$$ \frac{d^2Z_n}{dz^2} + k_n^2Z_n = 0, \qquad \frac{d^2T_n}{dt^2} + \omega_n^2T_n = 0, \quad\text{where } \omega_n^2 \equiv v^2k_n^2, \tag{6.69} $$


19 Again, negative values of $\omega$ may be dropped, because they give similar real functions $q(z, t)$.

20 They describe, in particular, the well-known transverse standing waves on a guitar string.

21 This variable separation method is very general and is discussed in all parts of this series, especially in EM Chapter 2.

22 The first of them is the 1D form of what is frequently called the Helmholtz equation.
with well-known (and similar) sinusoidal solutions

$$ Z_n = c_n\cos k_nz + s_n\sin k_nz, \qquad T_n = u_n\cos\omega_nt + v_n\sin\omega_nt \equiv {\rm Re}\left[a_n\exp\{-i\omega_nt\}\right], \tag{6.70} $$

where $c_n$, $s_n$, $u_n$, and $v_n$ (or, alternatively, $a_n \equiv u_n + iv_n$) are constants. The first of these relations, with all $k_n$ different, may satisfy the boundary conditions only if, for all $n$, $c_n = 0$ and $\sin k_nl = 0$, giving the same wave number spectrum (64) and hence the own frequency spectrum (65), so that the general solution (67) of the so-called boundary problem, given by Eqs. (40) and (62), takes the form

$$ q(z,t) = {\rm Re}\sum_n a_n\exp\{-i\omega_nt\}\sin k_nz, \tag{6.71} $$

where the complex amplitudes $a_n$ are determined by the initial conditions.


Hence such sinusoidal standing waves (Fig. 10a) are not just an assumption, but a natural property of the 1D wave equation. It is also easy to verify that the result (71) is valid for the same system with different boundary conditions, though with a modified wave number spectrum. For example, if the rigid boundary condition ($q = 0$) is implemented at $z = 0$, and the so-called open boundary condition ($F = 0$, i.e. $\partial q/\partial z = 0$) is imposed at $z = l$, the spectrum becomes

$$ k_n = \frac{\pi}{l}\left(n - \frac{1}{2}\right), \quad\text{with } n = 1, 2, 3, \ldots, \tag{6.72} $$

so that the lowest standing waves look like Fig. 10b shows.²³


Fig. 6.10. The lowest standing wave modes for the 1D systems with (a) two rigid boundaries, and (b) one rigid and one open boundary.


Note that the difference between the sequential values of $k_n$ is still a constant:

$$ k_{n+1} - k_n = \frac{\pi}{l}, \tag{6.73} $$

the same one as for the spectrum (64). This is natural because in both cases the transfer from the $n^{\rm th}$ mode to the $(n+1)^{\rm th}$ mode corresponds just to an addition of one more half-wave; see Fig. 10. (This conclusion is valid for any combination of rigid and free boundary conditions.) As was discussed above, for the discrete-particle chain we have started with (Fig. 4), the wave equation (40), and hence the above derivation of Eq. (71), are only valid in the acoustic wave limit, i.e. when the distance $d$ between the particles is much less than the wavelengths $\lambda_n = 2\pi/k_n$ of the mode under analysis. For a chain of length $l$, this means that the number of particles, $N \sim l/d$, has to be much larger than 1. However, a remarkable property of Eq. (71) is that it remains valid, with the same wave number spectrum (64), not only in the acoustic limit but also for arbitrary $N > 0$. Indeed, since $\sin k_nz = (\exp\{ik_nz\} - \exp\{-ik_nz\})/2i$, each $n^{\rm th}$ term of Eq. (71) may be represented as a sum of two traveling waves with equal but opposite wave vectors. As was discussed in Sec. 3, such a wave is a solution of Eq. (24) describing the discrete-particle system for any $k_n$, with the only condition that its frequency obeys the general dispersion relation (30), rather than its acoustic limit (65).

23 The lowest standing wave of the system, with the smallest $k_n$ and $\omega_n$, is usually called its fundamental mode.


Moreover, the expressions for $k_n$ (with appropriate boundary conditions), such as Eq. (64) or Eq. (72), also survive the transition to arbitrary $N$, because their derivation above was based only on the sinusoidal form of the standing wave. The only new factor arising in the case of arbitrary $N$ is that, due to the equidistant property (73) of the wave number spectrum, as soon as $n$ exceeds $N$, the waveforms (71), at particle locations $z_j = jd$, start to repeat. For example,

$$ \sin k_{n+N}z_j = \sin(k_n + N\Delta k)\,jd = \sin\left(k_n + N\frac{\pi}{Nd}\right)jd = \sin(k_nz_j + \pi j) = \pm\sin k_nz_j. \tag{6.74} $$

Hence the system has only $N$ different (linearly independent) modes. But this result is in full compliance with the general conclusion made in Sec. 2, that any system of $N$ coupled oscillators has exactly $N$ own frequencies and corresponding oscillation modes. So, our analysis of the particular system shown in Fig. 4 just exemplifies this general conclusion. Fig. 11 below illustrates this result for a particular finite value of $N$; the curve connecting the points shows exactly the same dispersion relation as was shown in Fig. 5, but now it is just a guide for the eye, because for a system with a finite length $l$, the wave number spectrum is discrete, and the intermediate values of $k$ and $\omega$ do not have an immediate physical sense.²⁴ Note that the own frequencies of the system are generally not equidistant, while the wave numbers are.
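A minimal numerical check of this conclusion, written for the conceptually simpler case of two rigid boundaries (so that $l = (N+1)d$, as noted in footnote 24): the sketch diagonalizes the dynamical matrix of Eq. (24) for $N$ particles and compares its $N$ own frequencies with the dispersion relation evaluated at $k_n = n\pi/l$. It assumes that Eq. (30) reads $\omega = \omega_{\max}|\sin(kd/2)|$ with $\omega_{\max} = 2(\kappa_{\rm ef}/m)^{1/2}$; NumPy is used for the eigenvalue problem.

```python
import numpy as np

def chain_frequencies(N, kappa_ef=1.0, m=1.0):
    """Own frequencies of N particles between two rigid walls, i.e. Eq. (24) with q_0 = q_{N+1} = 0."""
    # m q_j'' = kappa_ef (q_{j+1} - 2 q_j + q_{j-1}) is written as q'' = -M q, with a tridiagonal M.
    M = (kappa_ef / m) * (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
    return np.sqrt(np.linalg.eigvalsh(M))  # eigvalsh returns the eigenvalues in ascending order

def dispersion_prediction(N, kappa_ef=1.0, m=1.0):
    """omega_max * sin(k_n d / 2) with k_n = n * pi / l and l = (N + 1) d, for n = 1, ..., N."""
    omega_max = 2 * np.sqrt(kappa_ef / m)
    n = np.arange(1, N + 1)
    return omega_max * np.sin(n * np.pi / (2 * (N + 1)))

N = 8
print(chain_frequencies(N))
print(dispersion_prediction(N))
print(np.diff(chain_frequencies(N)))   # the own frequencies are not equidistant
```

The matrix has exactly $N$ eigenvalues, so no more than $N$ distinct modes can appear, whatever values of $n$ one tries in Eq. (64).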


Fig. 6.11. The wave numbers and own frequencies of a chain of a finite number $N$ of particles with one rigid and one open boundary, schematically.


This insensitivity of the spacing (73) between the adjacent wave numbers to the particular physics of a macroscopically uniform system is a very general fact, common for waves of any nature, and is broadly used for analyses of systems with a very large number of particles (such as human-size crystals, with $N \sim 10^7$). For $N$ so large, the effect of the boundary conditions, e.g., the difference between the spectra (64) and (72), is negligible, and they may be summarized as the following rule for the number of different standing waves within some interval $\Delta k \gg \pi/l$:


24 Note that Fig. 11 shows the case of one rigid and one open boundary (see Fig. 10b), where $l = Nd$; for a conceptually simpler system with two rigid boundaries (Fig. 10a) we would need to take $l = (N + 1)d$, because neither of the end points can oscillate.
$$ \Delta N_{\rm standing} = \frac{\Delta k}{k_{n+1} - k_n} = \frac{l}{\pi}\,\Delta k. \tag{6.75a} $$

For such analyses, it is frequently more convenient to work with traveling waves rather than the standing ones. In this case, we have to take into account that (as was just discussed above) each standing wave (66) may be decomposed into two traveling waves with wave numbers $\pm k_n$, so that the interval $\Delta k$ doubles, and Eq. (75a) becomes²⁵

$$ \Delta N_{\rm traveling} = \frac{l}{2\pi}\,\Delta k. \tag{6.75b} $$


Note that this counting rule is valid for waves of just one type. As was discussed above, for the model system we have studied (Fig. 4), there are 3 types of such waves, one longitudinal and two transverse, so that if we need to count them all, $\Delta N$ should be multiplied by 3.
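The counting rule (75a) is easy to test by brute force. The sketch below (with a hypothetical length and interval, in units where $d = 1$, and with the interval kept well inside $|k| < \pi/d$) counts the wave numbers of spectra (64) and (72) falling into an interval $\Delta k$ and compares the result with $l\Delta k/\pi$.

```python
import math

def count_modes(l, k_lo, k_hi, open_end=False):
    """Count the wave numbers k_n of Eq. (64), or Eq. (72) if open_end, with k_lo <= k_n < k_hi."""
    shift = 0.5 if open_end else 0.0
    count, n = 0, 1
    while True:
        k_n = (math.pi / l) * (n - shift)
        if k_n >= k_hi:
            return count
        if k_n >= k_lo:
            count += 1
        n += 1

l, k_lo, k_hi = 1000.0, 1.0, 2.0       # hypothetical values, in units where d = 1
print("rigid-rigid:", count_modes(l, k_lo, k_hi))
print("rigid-open: ", count_modes(l, k_lo, k_hi, open_end=True))
print("Eq. (75a):  ", l * (k_hi - k_lo) / math.pi)
```

Both spectra give counts within one of $l\Delta k/\pi$, so the boundary conditions matter only at the level of a single mode, which becomes negligible as $\Delta k\, l/\pi$ grows.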


6.6. Wave decay and attenuation


Now let us discuss the effects of energy dissipation on the 1D waves, using the example of the same uniform system shown in Fig. 4. The simplest description of this effect is the linear drag that may be described, as it was done for a single oscillator in Sec. 5.1, by adding the term $\eta\,dq_j/dt$ to Eq. (24) for each particle:

$$ m\ddot q_j + \eta\dot q_j - \kappa_{\rm ef}(q_{j+1} - q_j) + \kappa_{\rm ef}(q_j - q_{j-1}) = 0. \tag{6.76} $$

(In a uniform system, the drag coefficient $\eta$ should be similar for all particles, though it may be different for the longitudinal and transverse oscillations.)


To analyze the dissipation effect on the standing waves, we may again use the variable separation method, i.e. look for the solution of Eq. (76) in a form similar to Eq. (67), naturally re-adjusting it for our current discrete case:

$$ q(z_j, t) = \sum_n Z_n(z_j)\,T_n(t). \tag{6.77} $$

After dividing all terms by $mZ_n(z_j)T_n(t)$ and separating the time-dependent and space-dependent terms, we get

$$ \frac{\ddot T_n}{T_n} + \frac{\eta}{m}\frac{\dot T_n}{T_n} = \frac{\kappa_{\rm ef}}{m}\left[\frac{Z_n(z_{j+1}) + Z_n(z_{j-1})}{Z_n(z_j)} - 2\right] = {\rm const}. \tag{6.78} $$

As we know from the previous section, the resulting equation for the function $Z_n(z_j)$ is satisfied if the variable separation constant is equal to $-\omega_n^2$, where $\omega_n$ obeys the dispersion relation (30) for the wave number $k_n$, properly calculated for the dissipation-free system, with the account of the given boundary conditions; see, e.g., Eqs. (62) and (72). Hence for the function $T_n(t)$ we are getting the following ordinary differential equation:


25 Note that this simple but very important relation is frequently derived using the so-called Born-von Karman boundary condition $q_0(t) = q_N(t)$, which implies bending the system of interest into a closed loop. For a 1D system with $N \gg 1$, such a mental exercise may be somewhat justified, but for systems of higher dimension, it is hardly physically plausible, and is unnecessary.
$$ \ddot T_n + 2\delta\dot T_n + \omega_n^2T_n = 0, \quad\text{with } \delta \equiv \frac{\eta}{2m}, \tag{6.79} $$

which is absolutely similar to Eq. (5.6b) for a single linear oscillator, which was studied in Sec. 5.1. As we already know, it has the solution (5.9) describing the free oscillation decay with the relaxation time given by Eq. (5.10), $\tau = 1/\delta$, and hence similar for all modes.²⁶


Hence, the above analysis of the dissipation effect on free standing waves has not brought any surprises, but it gives us a hint of how their forced oscillations, induced by some external forces $F_j(t)$ exerted on the particles, may be analyzed. Indeed, representing each of the forces as a sum over the system's modes (spatial harmonics),

$$ F_j(t) = m\sum_n f_n(t)\,Z_n(z_j), \tag{6.80} $$

and using the variable separation (77), we arrive at the natural generalization of Eq. (79):

$$ \ddot T_n + 2\delta\dot T_n + \omega_n^2T_n = f_n(t), \tag{6.81} $$

which is identical to Eq. (5.13b) for a single oscillator. This fact enables us to use Eq. (5.27), with $G(\tau) \to G_n(\tau)$, for the calculation of each $T_n(t)$. Now finding the functions $f_n(t)$ from Eq. (80) by the usual reciprocal Fourier transform, and plugging these results into Eq. (77), we get the following generalization of Eq. (5.27):

$$ q(z_j, t) = \sum_{j'=1}^{N}\int_0^\infty \frac{F_{j'}(t - \tau)}{m}\,\mathcal{G}(z_j, z_{j'}, \tau)\,d\tau, \quad\text{where}\quad \mathcal{G}(z_j, z_{j'}, \tau) \equiv \sum_n G_n(\tau)\,Z_n(z_j)\,Z_n(z_{j'}). \tag{6.82} $$


(Here the mutually orthogonal functions $Z_n(z_j)$ are assumed to be normalized, i.e. the sums of their squares over $j = 1, 2, \ldots, N$ are equal to 1.) Such $\mathcal{G}(z_j, z_{j'}, \tau)$ is called the spatial-temporal Green's function of the system; in our current case, of a discrete, 1D set of $N$ particles located at points $z_j = jd$. The reader is challenged to spell out this function for at least one of the particular cases discussed above and use it to solve at least one forced-oscillation problem.
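One possible answer to this challenge, offered only as a sketch: for the chain with two rigid ends (so that $l = (N+1)d$), the normalized modes are $Z_n(z_j) = [2/(N+1)]^{1/2}\sin[n\pi j/(N+1)]$, and, assuming underdamped modes ($\delta < \omega_1$), each $G_n(\tau)$ may be taken as the standard single-oscillator Green's function $G_n(\tau) = e^{-\delta\tau}\sin(\omega_n'\tau)/\omega_n'$ with $\omega_n' = (\omega_n^2 - \delta^2)^{1/2}$. The mode frequencies use the assumed form $\omega_n = 2(\kappa_{\rm ef}/m)^{1/2}\sin[n\pi/2(N+1)]$ of Eq. (30); the function names and default parameter values are hypothetical.

```python
import numpy as np

def chain_modes(N):
    """Normalized modes Z_n(z_j) for two rigid ends (l = (N + 1) d): row n - 1, column j - 1."""
    idx = np.arange(1, N + 1)
    return np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * np.outer(idx, idx) / (N + 1))

def green_matrix(tau, N, kappa_ef=1.0, m=1.0, delta=0.05):
    """Matrix of G(z_j, z_j', tau) from Eq. (82); assumes delta < omega_1 (all modes underdamped)."""
    if tau < 0:
        return np.zeros((N, N))            # causality: no response before the force acts
    n = np.arange(1, N + 1)
    omega = 2 * np.sqrt(kappa_ef / m) * np.sin(n * np.pi / (2 * (N + 1)))  # assumed form of Eq. (30)
    omega_d = np.sqrt(omega ** 2 - delta ** 2)
    G_n = np.exp(-delta * tau) * np.sin(omega_d * tau) / omega_d           # single-mode Green's functions
    Z = chain_modes(N)
    return Z.T @ np.diag(G_n) @ Z          # sum over n of G_n(tau) Z_n(z_j) Z_n(z_j')

N = 6
Z = chain_modes(N)
print(np.allclose(Z @ Z.T, np.eye(N)))     # the modes are orthonormal
print(np.round(green_matrix(2.0, N), 4))
```

At $\tau \to 0$ the matrix vanishes while its time derivative equals the unit matrix, i.e. a short force impulse applied to particle $j'$ initially sets only that particle in motion; the response to arbitrary forces then follows from Eq. (82) by numerical integration over $\tau$.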


Now let us discuss the dissipation effects on the traveling waves, where they may take a completely different form of attenuation. Let us discuss it on a simple example when one end (located at $z = 0$) of a very long chain ($l \to \infty$) is externally forced to perform sinusoidal oscillations of a certain frequency $\omega$ and a fixed amplitude $A_0$. In this case, it is natural to look for a particular solution to Eq. (76) in a form very different from Eq. (77):

$$ q(z_j, t) = {\rm Re}\left[c_je^{-i\omega t}\right], \tag{6.83} $$

with time-independent but generally complex amplitudes $c_j$. As our discussion of a single oscillator in Sec. 5.1 implies, this is not the general, but rather a partial solution, which describes the forced oscillations in the system, to which it settles after some initial transient process. (At non-zero damping, we may be sure that free oscillations fade after a finite time, and thus may be ignored for most purposes.)

26 Even an elementary experience with acoustic guitars shows that for their strings this particular conclusion of our theory is not valid: higher modes ("overtones") decay substantially faster, leaving the fundamental mode oscillations for a slower decay. This is a result of another important energy dissipation (i.e. wave decay) mechanism, not taken into account in Eq. (76): the radiation of the sound into the guitar's body through the string supports, mostly through the bridge. Such radiation may be described by a proper modification of the boundary conditions (62), in terms of the ratio of the wave impedance (47) of the string and those of the supports.


Plugging Eq. (83) into Eq. (76), we reduce it to an equation for the amplitudes $c_j$,

$$ \left(-m\omega^2 - i\omega\eta + 2\kappa_{\rm ef}\right)c_j - \kappa_{\rm ef}\,c_{j+1} - \kappa_{\rm ef}\,c_{j-1} = 0, \tag{6.84} $$

which is a natural generalization of Eq. (25). As a result, partial solutions of the set of these equations (for $j = 0, 1, 2, \ldots$) may be looked for in the form (26) again, but now, because of the new, imaginary term in Eq. (84), we should be ready to get a complex phase shift $\alpha$, and hence a complex wave number $k = \alpha/d$.²⁷ Indeed, the resulting characteristic equation for $k$,

$$ \sin^2\frac{kd}{2} = \frac{\omega^2}{\omega_{\max}^2} + i\,\frac{2\omega\delta}{\omega_{\max}^2}, \tag{6.85} $$

(where $\omega_{\max}$ is defined by Eq. (30), and the damping coefficient is defined just as for a single oscillator, $\delta \equiv \eta/2m$), does not have a real solution even at $\omega < \omega_{\max}$. Using the well-known expressions for the sine function of a complex argument,²⁸ Eq. (85) may be readily solved in the most important low-damping limit $\delta \ll \omega$. In the linear approximation in $\delta$, the damping does not affect the real part of $k$, but makes its imaginary part different from zero:

$$ k = \pm\left[\frac{2}{d}\sin^{-1}\frac{\omega}{\omega_{\max}} + i\,\frac{2\delta}{d\,\omega_{\max}}\left(1 - \frac{\omega^2}{\omega_{\max}^2}\right)^{-1/2}\right] \approx \pm\frac{\omega + i\delta}{v} \ \ \text{at } \omega \ll \omega_{\max}, \qquad\text{for } -\pi < {\rm Re}\,kd < \pi, \tag{6.86} $$


with a periodic extension to other periods; see Fig. 5. Just as was done in Eq. (28), due to the two values of the wave number, generally we have to take $c_j$ in the form of not a single wave (26), but of a linear superposition of two partial solutions:

$$ c_j = \sum_\pm c_\pm\exp\left\{\pm i({\rm Re}\,k)z_j \mp ({\rm Im}\,k)z_j\right\}, \quad\text{with } {\rm Re}\,k,\ {\rm Im}\,k > 0, \tag{6.87} $$

where the constants $c_\pm$ should be found from the boundary conditions. In our particular case, when $|c_0| = A_0$ and $c_\infty = 0$, only one of these two waves, namely the wave exponentially decaying at its penetration into the system, is different from zero: $|c_+| = A_0$, $c_- = 0$. Hence our solution describes a single wave, with the real amplitude and the oscillation energy decreasing as

$$ A_j \equiv |c_j| = A_0\exp\left\{-\frac{\delta}{v}z_j\right\}, \qquad E_j \propto A_j^2 \propto \exp\{-\alpha z_j\}, \quad\text{with } \alpha \equiv \frac{2\delta}{v}, \tag{6.88} $$

i.e. with an attenuation constant $\alpha = 2\delta/v$ that is frequency-independent in the acoustic limit,²⁹ so that the spatial scale of wave penetration into a dissipative system is given by $l_d \equiv 1/\alpha$. Certainly, our simple solution (88) is only valid for a system of length $l \gg l_d$; otherwise, we would need the second term in the sum (87) to describe the wave reflected from its opposite end.
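Eq. (85) can also be solved numerically with complex arithmetic, which gives a way to check the linear-in-$\delta$ result (86) and the attenuation constant in Eq. (88). The sketch below picks the root with positive real and imaginary parts using Python's principal-branch `cmath.asin`; the parameter values are arbitrary, in units with $d = 1$ and $\omega_{\max} = 2$, so that $v = d\omega_{\max}/2 = 1$ under the assumed form of Eq. (30).

```python
import cmath
import math

def complex_wave_number(omega, delta, omega_max, d=1.0):
    """Root of Eq. (85), sin^2(kd/2) = (omega^2 + 2i*omega*delta) / omega_max^2, with Re k, Im k > 0."""
    s = (omega ** 2 + 2j * omega * delta) / omega_max ** 2
    # Principal branches: for 0 < Re s < 1 and small Im s > 0, both Re and Im of this root are positive.
    return 2 * cmath.asin(cmath.sqrt(s)) / d

def im_k_linear(omega, delta, omega_max, d=1.0):
    """Imaginary part of k in the linear approximation of Eq. (86)."""
    return 2 * delta / (d * omega_max * math.sqrt(1 - (omega / omega_max) ** 2))

omega_max, delta = 2.0, 1e-3
for omega in [0.05, 0.5, 1.5]:
    k = complex_wave_number(omega, delta, omega_max)
    print(f"omega={omega}: Re k = {k.real:.5f}, Im k = {k.imag:.6f}, "
          f"linear approx. = {im_k_linear(omega, delta, omega_max):.6f}, alpha = 2 Im k = {2 * k.imag:.6f}")
```

At $\omega \ll \omega_{\max}$, $2\,{\rm Im}\,k$ approaches $2\delta/v$, while closer to $\omega_{\max}$ the attenuation grows because of the factor $(1 - \omega^2/\omega_{\max}^2)^{-1/2}$.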


27 As a reminder, we have already met such a situation in the absence of damping, but at $\omega > \omega_{\max}$; see Eq. (38).
28 See, e.g., MA Eq. (3.5).

29 I am sorry to use for the attenuation the same letter $\alpha$ as for the phase shift in Eq. (26) and a few of its corollaries, but both notations are traditional.
6.7. Nonlinear and parametric effects


Now let me discuss (because of the lack of time, very briefly, and on a semi-quantitative level) the new nonlinear and parametric phenomena that appear in oscillatory systems with more than one degree of freedom; cf. Secs. 5.4-5.8. One important new effect here is the mutual phase locking of (two or more) weakly coupled self-excited oscillators with close frequencies: if the own frequencies of the oscillators are sufficiently close, their oscillation frequencies "stick together" to become exactly equal. Though the dynamics of this process is very close to that of the phase locking of a single oscillator by an external signal, which was discussed in Sec. 5.4, it is rather counter-intuitive in view of the results of Sec. 1, and in particular, the anticrossing diagram shown in Fig. 2. The analysis of the effect using the van der Pol method (which is left for the reader's exercise) shows that the origin of the difference is the oscillators' nonlinearity, which makes oscillation amplitudes virtually independent of the phase evolution; see Eq. (5.68) and its discussion.


One more new effect is the so-called non-degenerate parametric excitation. It may be illustrated on the example of just two coupled oscillators; see Sec. 1 above. Let us assume that the coupling constant $\kappa$ participating in Eqs. (5) is not constant, but oscillates in time, say with some frequency $\omega_p$. In this case, the forces acting on each oscillator from its counterpart, described by the right-hand side of Eqs. (5), will be proportional to $\kappa q_{2,1}(1 + \mu\cos\omega_pt)$. Assuming that the oscillations of $q_1$ and $q_2$ are close to sinusoidal ones, with certain frequencies $\omega_{1,2}$, we see that the force exerted on each oscillator contains the so-called combinational frequencies

$$ \omega_p \pm \omega_{2,1}. \tag{6.89} $$

If one of these frequencies is close to the own oscillation frequency of the oscillator, we can expect a substantial parametric interaction between the oscillators (on top of the constant coupling effects discussed in Sec. 1). According to Eq. (89), this may happen in two cases:

$$ \omega_1 + \omega_2 = \omega_p, \tag{6.90a} $$

$$ |\omega_1 - \omega_2| = \omega_p. \tag{6.90b} $$
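The origin of the combinational frequencies (89) can be seen directly from the spectrum of the modulated coupling force. The sketch below, with hypothetical values of $\omega_2$, $\omega_p$, and $\mu$ in arbitrary units (and the overall factor $\kappa$ dropped), samples $q_2(t)(1 + \mu\cos\omega_pt)$ for a sinusoidal $q_2 = \cos\omega_2t$ and lists the angular frequencies present in its discrete Fourier transform. The sampling window is chosen to contain a whole number of periods of every component, so that each spectral line falls exactly on a frequency bin.

```python
import numpy as np

# Hypothetical parameters in arbitrary units, chosen only for illustration.
omega_2, omega_p, mu = 3.0, 10.0, 0.3
T_total, n_samples = 2 * np.pi * 50, 2 ** 14    # frequency-bin spacing 2*pi/T_total = 0.02
t = np.linspace(0.0, T_total, n_samples, endpoint=False)

# Force on oscillator 1 from oscillator 2 with a modulated coupling, up to the factor kappa.
force = np.cos(omega_2 * t) * (1 + mu * np.cos(omega_p * t))

spectrum = np.abs(np.fft.rfft(force)) / n_samples
freqs = 2 * np.pi * np.fft.rfftfreq(n_samples, d=T_total / n_samples)   # angular frequencies
peaks = sorted(freqs[spectrum > 0.01])
print(peaks)   # components at omega_2 and at the combinational frequencies omega_p -/+ omega_2
```

The lines at $\omega_p \mp \omega_2$ have the relative amplitude $\mu/2$ with respect to the line at $\omega_2$; if one of them lands near $\omega_1$, the conditions (90a) or (90b) are met.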


The quantitative analysis (also highly recommended to the reader) shows that in the case (90a), the parameter modulation indeed leads to energy "pumping" into the oscillations.³⁰ As a result, a sufficiently large $\mu$, at sufficiently small damping coefficients $\delta_{1,2}$ and the effective detuning

$$ \xi \equiv \omega_p - (\Omega_1 + \Omega_2), \tag{6.91} $$

may lead to a simultaneous self-excitation of two frequency components $\omega_{1,2}$. These frequencies, while being approximately equal to the corresponding own frequencies $\Omega_{1,2}$ of the system, are related to the pumping frequency $\omega_p$ by the exact relation (90a), but otherwise are arbitrary, e.g., may be incommensurate (Fig. 12a), thus justifying the term non-degenerate parametric excitation.³¹ (The parametric excitation of a single oscillator, which was analyzed in Sec. 5.5, is a particular, degenerate case of such excitation, with $\omega_1 = \omega_2 = \omega_p/2$.) On the other hand, for the case described by Eq. (90b), the parameter modulation always extracts energy from the oscillations, effectively increasing the system's damping.


30 Hence the common name of $\omega_p$: the pumping frequency.
31 Note that in some publications, the term parametric down-conversion (PDC) is used instead.
Somewhat counter-intuitively, this difference between the two cases (90) may be more simply interpreted using the basic notions of quantum mechanics. Namely, the equality $\omega_p = \omega_1 + \omega_2$ enables a decay of an external photon of energy $\hbar\omega_p$ into two photons of energies $\hbar\omega_1$ and $\hbar\omega_2$ of the oscillators. On the contrary, the complementary relation (90b), meaning that $\omega_1 = \omega_p + \omega_2$, results in a pumping-induced decay of photons of frequency $\omega_1$.


[Figure: (a) spectral lines at $\omega_1$, $\omega_2$, and $\omega_p = \omega_1 + \omega_2$; (b) spectral lines at $\omega_1$, $\omega$, and $\omega_2$. Horizontal axes: frequency.]

Fig. 6.12. Spectra of oscillations at (a) the non-degenerate parametric excitation, and (b) the four-wave mixing. The arrow directions symbolize the energy flows into and out of the system.


Note that even if the frequencies $\omega_1$ and $\omega_2$ of the parametrically excited oscillations are incommensurate, the oscillations are highly correlated. Indeed, the quantum-mechanical theory of this effect32 shows that the generated photons are entangled. This fact makes the parametric excitation very popular for a broad class of experiments in several currently active fields, including quantum computation and encryption, and the Bell inequality/local reality studies.33


Proceeding to nonlinear phenomena, let us note, first of all, that the simple reasoning that accompanied Eq. (5.108) in Sec. 5.8 is also valid in the case when oscillations consist of two (or more) sinusoidal components with incommensurate frequencies. Replacing the notation $2\omega$ with $\omega_p$, we see that the non-degenerate parametric excitation of the type (90a) is possible in a system of two coupled oscillators with a quadratic nonlinearity (of the type $\gamma q^2$), “pumped” by an intensive external signal at frequency $\omega_p \approx \Omega_1 + \Omega_2$. In optics, it is often more convenient to have all three of these frequencies within the same, relatively narrow range. A simple calculation, similar to the one made in Eqs. (5.107)-(5.108), shows that this may be done using the cubic nonlinearity34 of the type $\alpha q^3$, which allows a similar parametric energy exchange at the frequency relation shown in Fig. 12b:

$$2\omega = \omega_1 + \omega_2, \quad \text{with } \omega \approx \omega_1 \approx \omega_2. \qquad (6.92a)$$


This process is often called the four-wave mixing, because it may be interpreted quantum-mechanically as the transformation of two externally-delivered photons, each with energy $\hbar\omega$, into two other photons of energies $\hbar\omega_1$ and $\hbar\omega_2$. The word “wave” in this term stems from the fact that at optical frequencies, it is hard to couple a sufficient volume of a nonlinear medium with lumped-type resonators. It is much easier to implement the parametric excitation (as well as other nonlinear phenomena such as the higher harmonic generation) of light in distributed systems of a linear size much larger than the involved wavelengths. In such systems, the energy transfer from the incoming wave of frequency $\omega$ to


32 Which is, surprisingly, not much more complex than the classical theory; see, e.g., QM Sec. 5.5.

33 See, e.g., QM Secs. 8.5 and 10.3, respectively. 

34 In optics, such nonlinearity is implemented using transparent crystals such as lithium niobate (LiNbO$_3$), with the cubic-nonlinear dependence of the electric polarization on the applied electric field: $P \propto \mathscr{E} + \alpha\mathscr{E}^3$.
generated waves of frequencies $\omega_1$ and $\omega_2$ is gradually accumulated at their joint propagation along the system. From the analogy between Eq. (85) (describing the evolution of the wave’s amplitude in space) and the usual equation of the linear oscillator (describing its evolution in time), it is clear that this energy transfer accumulation requires not only the frequencies $\omega$ but also the wave numbers $k$ to obey similar relations. For example, the four-wave mixing requires that not only the frequency balance (92a) but also a similar relation

$$2k = k_1 + k_2 \qquad (6.92b)$$

be fulfilled. Since all three frequencies are close, this relation is easy to arrange. Unfortunately, due to the lack of time/space, for more discussion of this very interesting subject, called nonlinear optics, I have to refer the reader to special literature.35
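The statement that a cubic nonlinearity $\alpha q^3$ can couple three close frequencies obeying Eq. (92a), while a quadratic one cannot, is easy to check spectrally. The NumPy sketch below (the frequencies $\omega = 1.00$, $\omega_1 = 0.90$ and the helper name `line_amplitude` are arbitrary choices of ours) takes $q = \cos\omega t + \cos\omega_1 t$ and measures the spectral line at $\omega_2 = 2\omega - \omega_1 = 1.10$ in $q^3$ and in $q^2$; the time window is chosen so that all relevant lines fall exactly on FFT bins. It only shows which combination frequencies appear, and does not solve any equation of motion.

```python
import numpy as np


def line_amplitude(signal, t_total, omega):
    """Amplitude of the spectral line of `signal` (sampled uniformly over the
    window t_total) at the angular frequency omega. Assumes omega*t_total/(2*pi)
    is an integer, so that the line falls exactly on an FFT bin."""
    spectrum = np.fft.rfft(signal)
    k = int(round(omega * t_total / (2 * np.pi)))
    return 2 * abs(spectrum[k]) / len(signal)


omega, omega_1 = 1.00, 0.90            # arbitrary close frequencies
omega_2 = 2 * omega - omega_1          # = 1.10, from Eq. (92a)

n_samples = 4096                       # enough to resolve 3*omega without aliasing
t_total = 2 * np.pi * 100              # bin spacing in omega: 2*pi/t_total = 0.01
t = np.arange(n_samples) * t_total / n_samples
q = np.cos(omega * t) + np.cos(omega_1 * t)

cubic = line_amplitude(q**3, t_total, omega_2)
quadratic = line_amplitude(q**2, t_total, omega_2)
print(f"line at omega_2 in q**3: {cubic:.6f}, in q**2: {quadratic:.1e}")
```

Expanding the cube by hand, the line at $\omega_2$ comes from the term $3\cos^2(\omega t)\cos(\omega_1 t)$ and has amplitude $3/4$, while $q^2$ contains only the frequencies $0$, $\omega - \omega_1$, $2\omega_1$, $\omega + \omega_1$, and $2\omega$, so its line at $\omega_2$ vanishes up to rounding errors.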


It may look like a dispersion-free medium, with $\omega/k = v = \text{const}$, is the perfect solution for arranging the nonlinear/parametric interaction of waves, because in such a medium, for example, Eq. (92b) automatically follows from Eq. (92a). However, in such a medium, not only the desirable three parametrically interacting waves but also all their harmonics have the same velocity. Under these conditions, the energy transfer rates between all harmonics are of the same order. Perhaps the most important result of such a multi-harmonic interaction is that intensive incident traveling waves, interacting with a nonlinear medium, may develop sharply non-sinusoidal waveforms, in particular those with an almost instant change of the field at a certain moment. Such shock waves, especially those of mechanical nature, are of large interest for certain applications, some of them not quite innocent, e.g., the dynamics of explosion in the usual (chemical) and nuclear bombs.36
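The remark that in a dispersion-free medium Eq. (92b) follows automatically from Eq. (92a) can be checked with a few lines of Python. The sketch below uses two hypothetical dispersion laws, in units with $v = 1$: the linear one, $k = \omega$, and a weakly dispersive one, $k = \omega + b\omega^2$. For the symmetric choice $\omega_{1,2} = \omega \mp \Delta$, which satisfies Eq. (92a) exactly, it evaluates the phase mismatch $\Delta k \equiv 2k(\omega) - k(\omega_1) - k(\omega_2)$.

```python
def phase_mismatch(k, omega, detuning):
    """Return 2*k(omega) - k(omega_1) - k(omega_2) for omega_1,2 = omega -/+ detuning.
    This choice satisfies the frequency balance (92a) exactly, so a zero result
    means that the wave-number balance (92b) is satisfied as well."""
    omega_1, omega_2 = omega - detuning, omega + detuning
    return 2 * k(omega) - k(omega_1) - k(omega_2)


def k_linear(w):
    """Hypothetical dispersion-free medium: k = omega / v, with v = 1."""
    return w


def k_dispersive(w, b=0.5):
    """Hypothetical weakly dispersive medium: k = omega + b * omega**2."""
    return w + b * w**2


print(phase_mismatch(k_linear, 1.0, 0.1))       # zero up to rounding errors
print(phase_mismatch(k_dispersive, 1.0, 0.1))   # close to -2*b*detuning**2 = -0.01
```

For any smooth $k(\omega)$, a Taylor expansion gives $\Delta k \approx -k''(\omega)\Delta^2$ (here $-2b\Delta^2 = -0.01$), so the mismatch is small when the three frequencies are close, in line with the remark after Eq. (92b).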


To conclude this chapter, let me note that the above discussion of 1D acoustic waves will be 
extended, in Sec. 7.7, to elastic 3D media. There we will see that generally, the waves obey a more 
complex equation than the apparently natural generalization of Eq. (40): 


$$\left(\frac{\partial^2}{\partial t^2} - v^2\nabla^2\right)q(\mathbf{r}, t) = 0, \qquad (6.93)$$

where $\nabla^2$ is the 3D Laplace operator. This fact adds to the complexity of traveling-wave and standing-wave phenomena in higher dimensions. Moreover, in multi-dimensional systems, including such
pseudo-1D systems as thin rods and pseudo-2D systems such as thin membranes, even static elastic 
deformations may be very nontrivial. An introduction to the general theory of small deformations, with a 
focus on elastic continua, will be the subject of the next chapter. 


6.8. Exercise problems


For each of the systems specified in Problems 6.1-6.6: 


(i) introduce convenient generalized coordinates $q_j$ of the system,


35 See, e.g., N. Bloembergen, Nonlinear Optics, 4th ed., World Scientific, 1996, or a more modern treatment by R. Boyd, Nonlinear Optics, 3rd ed., Academic Press, 2008. This field is currently very active. As just a single example, let me mention the recent experiments with parametric amplification of ultrashort (~20-fs) optical pulses to peak power as high as ~5×10^15 W; see X. Zeng et al., Optics Lett. 42, 2014 (2017).

36 The classical (and perhaps still the best) monograph on the subject is Ya. Zeldovich, Physics of Shock Waves 
and High-Temperature Phenomena, Dover, 2002. 


Chapter 6 Page 25 of 30 


Essential Graduate Physics CM: Classical Mechanics 


(ii) calculate the frequencies of its small harmonic oscillations near the equilibrium,
(iii) calculate the corresponding distribution coefficients, and 
(iv) sketch the oscillation modes. 


6.1. Two elastically coupled pendula, confined to a vertical plane, with the parameters shown in the figure on the right (see also Problems 1.8 and 2.9).


6.2. The double pendulum, confined to a vertical plane containing the support point (considered in Problem 2.1), with $m' = m$ and $l = l'$; see the figure on the right.


6.3. The chime bell considered in Problem 4.12 (see the figure on the right), for the particular case $l = l'$.


6.4. The triple pendulum shown in the figure on the right, with the motion confined to a vertical plane containing the support point.

Hint: You may use any (e.g., numerical) method to calculate the characteristic equation’s roots.


6.5. A symmetric system of three particles, shown in the figure on the right, where the connections between the particles not only act as usual elastic springs (giving potential energies $U = \kappa(\Delta l)^2/2$) but also resist bending, giving additional potential energy $U' = \kappa'(l\theta)^2/2$, where $\theta$ is the (small) bending angle.37


6.6. Three similar beads of mass $m$, which may slide along a circle of radius $R$ without friction, connected with similar springs with elastic constants $\kappa$ and equilibrium lengths $l_0$ (generally not equal to $\sqrt{3}R$); see the figure on the right.


37 This is a good model for small oscillations of linear molecules such as the now-infamous CO$_2$.
6.7. On the example of the model considered in Problem 1, explore free oscillations in a system of two similar and weakly coupled linear oscillators.


6.8. A small body is held by four similar elastic springs as shown 
in the figure on the right. Analyze the effect of rotation of the system as a 
whole about the axis normal to its plane, on the body’s small oscillations 
within this plane. Assume that the oscillation frequency is much higher 
than the angular velocity $\omega$ of the rotation. Discuss the physical sense of
your results, and possible ways of using such systems for measurement 
of the rotation. 


6.9. An external longitudinal force $F(t)$ is applied to the right particle of the system shown in Fig. 1, with $\kappa_L = \kappa_R = \kappa'$ and $m_1 = m_2 = m$ (see the figure on the right), and the response $q_1(t)$ of the left particle to this force is being measured.

(i) Calculate the temporal Green’s function for this response.
(ii) Use this function to calculate the response to the following force:

$$F(t) = \begin{cases} 0, & \text{for } t < 0, \\ F_0\sin\omega t, & \text{for } 0 < t, \end{cases}$$

with constant amplitude $F_0$ and frequency $\omega$.


6.10. Use the Lagrangian formalism to re-derive Eqs. (24) for both the longitudinal and the 
transverse oscillations in the system shown in Fig. 4a. 


6.11. Calculate the energy (per unit length) of a sinusoidal traveling wave propagating in the 1D 
system shown in Fig. 4a. Use your result to calculate the average power flow created by the wave, and 
compare it with Eq. (49) in the acoustic wave limit. 


6.12. Calculate spatial distributions of the kinetic and potential energies in a standing, sinusoidal, 
1D acoustic wave, and analyze their evolution in time. 


6.13. The midpoint of a guitar string of length $l$ has been slowly pulled off by distance $h \ll l$ from its equilibrium position, and then let go. Neglecting dissipation, use two different approaches to calculate the midpoint’s displacement as a function of time.

Hint: You may like to use the following series:

$$\sum_{m=1}^{\infty}\frac{\cos[(2m-1)\xi]}{(2m-1)^2} = \frac{\pi^2}{8}\left(1 - \frac{\xi}{\pi/2}\right), \quad \text{for } 0 < \xi < \pi.$$


6.14. Spell out the spatial-temporal Green’s function (82) for waves in a 1D uniform system of N 
points, with the rigid boundary conditions (62). Explore the acoustic limit of your result. 
6.15. Calculate the dispersion law $\omega(k)$ and the maximum and minimum frequencies of small longitudinal waves in a long chain of similar, spring-coupled pendula; see the figure on the right.


6.16. Calculate and analyze the dispersion relation $\omega(k)$ for small waves in a long chain of elastically coupled particles with alternating masses; see the figure on the right. In particular, discuss the dispersion relation’s period $\Delta k$, and its evolution at $m' \to m$.
6.17. Analyze the traveling wave’s reflection from a “point inhomogeneity”: a single particle with a different mass $m_0 \neq m$, within an otherwise uniform 1D chain; see the figure on the right.

6.18.*


(i) Explore an approximate way to analyze waves in a continuous 1D system with 
parameters slowly varying along its length. 

(ii) Apply this method to calculate the frequencies of transverse standing waves on a freely hanging heavy rope of length $l$, with a constant mass $\mu$ per unit length; see the figure on the right.

(iii) For the three lowest standing wave modes, compare the results with those
obtained in the solution of Problem 4 for the triple pendulum. 


Hint: The reader familiar with the WKB approximation in quantum mechanics (see, e.g., QM 
Sec. 2.4) is welcome to adapt it for this classical application. Another possible starting point is the van 
der Pol approximation discussed in Sec. 5.3, which should be translated from the time domain to the 
space domain. 


6.19. A particle of mass $m$ is attached to an infinite string, of mass $\mu$ per unit length, stretched with tension $\mathscr{T}$. The particle is confined to move along the $x$-axis normal to the string (see the figure below), in an additional potential $U(x)$ with a minimum at $x = 0$. Assuming that the waves on the string are excited only by the motion of the particle (rather than any external source), reduce the system of equations describing the system to an ordinary differential equation for the small displacements $x(t)$. For the case of a linear oscillator, when $U(x) = m\omega_0^2x^2/2$, calculate its Q-factor due to the effective drag caused by the string.


6.20. Use the van der Pol method to analyze the mutual phase locking of two weakly coupled 
self-oscillators with the dissipative nonlinearity, for the cases of: 
(i) the direct coordinate coupling described by Eq. (5), and
(ii) a bilinear but otherwise arbitrary coupling of two similar oscillators. 


Hint: In Task (ii), describe the coupling by a linear operator, and express the result via its 
Fourier image. 


6.21. Extend Task (ii) of the previous problem to the mutual phase locking of $N$ similar self-oscillators. In particular, explore the in-phase mode’s stability for the case of the so-called global coupling via a single force $F$ contributed equally by all oscillators.


6.22.* Find the condition of non-degenerate parametric excitation in a system of two coupled oscillators described by Eqs. (5), but with a time-dependent coupling: $\kappa \to \kappa(1 + \mu\cos\omega_p t)$, with $\omega_p \approx \Omega_1 + \Omega_2$, and $\kappa \ll |\Omega_2^2 - \Omega_1^2|$.

Hint: Assuming the modulation depth $\mu$, the static coupling coefficient $\kappa$, and the detuning $\xi \equiv \omega_p - (\Omega_1 + \Omega_2)$ sufficiently small, use the van der Pol method for each of the coupled oscillators.


6.23. Show that the cubic nonlinearity of the type $\alpha q^3$ indeed enables the parametric interaction (“four-wave mixing”) of oscillations with incommensurate frequencies related by Eq. (92a).


6.24. In the first nonvanishing approximation in small oscillation amplitudes, calculate their 
effect on the own frequencies of the same double-pendulum system that was the subject of Problem 1. 


6.25. Calculate the velocity of small transverse waves propagating on a thin, planar, elastic membrane, with mass $\mu$ per unit area, pre-stretched with force $\mathscr{T}$ per unit width.


6.26. A membrane discussed in the previous problem is stretched on a thin but firm plane frame of area $a \times a$. Calculate the frequency spectrum of small transverse standing waves in the system; sketch a few lowest wave modes. Compare the results with those for a discrete-point analog of this system, with four particles of equal masses $m$, connected with light flexible strings that are stretched, with equal tensions $\mathscr{T}$, on a similar frame; see the figure on the right. (The frames do not allow the membrane edges/string ends to deviate from their planes.)
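The hint to Problem 6.4 mentions a numerical solution of the characteristic equation. One common way to do this, sketched below with NumPy, is to treat $\det(\mathsf{K} - \omega^2\mathsf{M}) = 0$ as a generalized symmetric eigenvalue problem for the mass matrix $\mathsf{M}$ and the stiffness matrix $\mathsf{K}$ of a Lagrangian $L = \frac{1}{2}\dot{\mathbf{q}}^{T}\mathsf{M}\dot{\mathbf{q}} - \frac{1}{2}\mathbf{q}^{T}\mathsf{K}\mathbf{q}$. The function name and the test system are hypothetical; deriving $\mathsf{M}$ and $\mathsf{K}$ for the triple pendulum is left to the reader, and the routine is only checked here on a symmetric pair of masses $m$ attached to walls by springs $\kappa_w$ and coupled by a spring $\kappa_c$.

```python
import numpy as np


def normal_mode_frequencies(M, K):
    """Own frequencies of small oscillations described by the Lagrangian
    L = (1/2) qdot^T M qdot - (1/2) q^T K q, i.e. the square roots of the
    roots omega**2 of det(K - omega**2 M) = 0, in ascending order.
    Assumes that M and K are symmetric and that M is positive definite."""
    M = np.asarray(M, dtype=float)
    K = np.asarray(K, dtype=float)
    # With the Cholesky factorization M = C C^T, the generalized problem
    # K a = omega^2 M a becomes the standard symmetric problem A b = omega^2 b,
    # where A = C^-1 K C^-T and b = C^T a.
    C_inv = np.linalg.inv(np.linalg.cholesky(M))
    A = C_inv @ K @ C_inv.T
    omega_sq = np.linalg.eigvalsh(A)          # real eigenvalues, ascending
    return np.sqrt(np.clip(omega_sq, 0.0, None))


# Hypothetical test system: two masses m, each tied to a wall by a spring
# kappa_w, and coupled to each other by a spring kappa_c.
m, kappa_w, kappa_c = 1.0, 1.0, 0.25
M = [[m, 0.0], [0.0, m]]
K = [[kappa_w + kappa_c, -kappa_c], [-kappa_c, kappa_w + kappa_c]]
print(normal_mode_frequencies(M, K))
```

For this test system, the in-phase mode has $\omega^2 = \kappa_w/m$ and the antiphase mode $\omega^2 = (\kappa_w + 2\kappa_c)/m$, i.e. $\omega = 1$ and $\omega = \sqrt{1.5} \approx 1.2247$ for the chosen numbers. The Cholesky reduction keeps the problem symmetric, so `eigvalsh` returns real eigenvalues in ascending order.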
Chapter 7. Deformations and Elasticity


The objective of this chapter is a discussion of small deformations of 3D continua, with a focus on the 
elastic properties of solids. The reader will see that such deformations are nontrivial even in the 
absence of their evolution in time, so that several key problems of statics will need to be discussed 
before proceeding to such dynamic phenomena as elastic waves in infinite media and thin rods. 


7.1. Strain 


As was already discussed in Chapters 4-6, in a continuum, i.e. a system of particles so close to each other that the system discreteness may be neglected, particle displacements $\mathbf{q}$ may be considered as a continuous function of not only time but also space. In this chapter, we will consider only small deviations from the rigid-body approximation discussed in Chapter 4, i.e. small deformations. The deformation smallness allows us to consider the displacement vector $\mathbf{q}$ as a function of the initial (pre-deformation) position of the particle, $\mathbf{r}$, and time $t$, just as was done in Chapter 6 for 1D waves.


The first task of the deformation theory is to exclude from consideration the types of motion considered in Chapter 4, namely the body’s translation and rotation, unrelated to deformations. This means, first of all, that the variables describing deformations should not depend on the displacement’s part that is independent of the position $\mathbf{r}$ (i.e. is common for the whole medium), because that part corresponds to a translational shift rather than to a deformation (Fig. 1a). Moreover, even certain non-uniform displacements do not contribute to deformation. For example, Eq. (4.9) (with $d\mathbf{r}$ replaced with $d\mathbf{q}$ to comply with our current notation) shows that a small displacement of the type

$$d\mathbf{q}_{\text{rotation}} = d\boldsymbol{\varphi} \times \mathbf{r}, \qquad (7.1)$$

where $d\boldsymbol{\varphi} = \boldsymbol{\omega}\,dt$ is an infinitesimal vector common for the whole continuum, corresponds to an elementary rotation of the body about the direction of that vector, and has nothing to do with its deformation (Fig. 1b).


[Figure: panel (a), translation, with $\mathbf{q} = \text{const}$; panel (b), rotation.]

Fig. 7.1. Two types of displacement vector distributions that are unrelated to deformation: (a) translation and (b) rotation.


This is why, to develop an adequate quantitative characterization of deformation (so far for fixed $t$), we should start by finding suitable functions of the spatial distribution of displacements, $\mathbf{q}(\mathbf{r})$, that exist only due to deformations. One such measure is the change of the distance $dl = |d\mathbf{r}|$ between two close points:

$$(dl)^2\big|_{\text{after deformation}} - (dl)^2\big|_{\text{before deformation}} = \sum_{j=1}^{3}(dr_j + dq_j)^2 - \sum_{j=1}^{3}(dr_j)^2, \qquad (7.2)$$

where $dq_j$ is the $j$-th Cartesian component of the difference $d\mathbf{q}$ between the displacements $\mathbf{q}$ of these close points. If the deformation is small in the sense $|d\mathbf{q}| \ll dl$, we may keep, in this expression, only the terms proportional to the first power of the infinitesimal vector $d\mathbf{q}$:


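$$(dl)^2\big|_{\text{after deformation}} - (dl)^2\big|_{\text{before deformation}} = \sum_{j=1}^{3}\left[2\,dr_j\,dq_j + (dq_j)^2\right] \approx 2\sum_{j=1}^{3}dr_j\,dq_j. \qquad (7.3)$$

Since $q_j$ is a function of three independent scalar arguments $r_{j'}$, its full differential (at fixed time) may be represented as

$$dq_j = \sum_{j'=1}^{3}\frac{\partial q_j}{\partial r_{j'}}\,dr_{j'}. \qquad (7.4)$$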


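The coefficients $\partial q_j/\partial r_{j'}$ may be considered as elements of a tensor providing a linear relation between the vectors $d\mathbf{r}$ and $d\mathbf{q}$.1 Plugging Eq. (4) into Eq. (3), we get

$$(dl)^2\big|_{\text{after deformation}} - (dl)^2\big|_{\text{before deformation}} = 2\sum_{j,j'=1}^{3}\frac{\partial q_j}{\partial r_{j'}}\,dr_j\,dr_{j'}. \qquad (7.5)$$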


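The convenience of the tensor $\partial q_j/\partial r_{j'}$ for characterizing deformations is that it automatically excludes the translation displacement (Fig. 1a), which is independent of $r_{j'}$. Its drawback is that its particular elements are still affected by the rotation of the body, even though the sum (5) is not. Indeed, according to the vector product’s definition, Eq. (1) may be represented in Cartesian coordinates as

$$\left(dq_{\text{rotation}}\right)_j = d\varphi_{j'}\,r_{j''} - d\varphi_{j''}\,r_{j'}, \qquad (7.6)$$

where the indices $j$, $j'$, and $j''$ follow in the “right” (cyclic) order. Differentiating this expression over the Cartesian components of the vector $\mathbf{r}$, and taking into account that this partial differentiation ($\partial$) is independent of (and hence may be swapped with) the differentiation ($d$) over the common rotation angle $\boldsymbol{\varphi}$, we get the amounts

$$d\left(\frac{\partial q_j}{\partial r_{j'}}\right)_{\text{rotation}} = \frac{\partial\left(dq_j\right)_{\text{rotation}}}{\partial r_{j'}} = -d\varphi_{j''}, \qquad d\left(\frac{\partial q_{j'}}{\partial r_j}\right)_{\text{rotation}} = \frac{\partial\left(dq_{j'}\right)_{\text{rotation}}}{\partial r_j} = d\varphi_{j''}, \qquad (7.7)$$

which may differ from 0. However, notice that the sum of these two differentials equals zero for any $d\boldsymbol{\varphi}$, which is possible only if2

$$\left(\frac{\partial q_j}{\partial r_{j'}} + \frac{\partial q_{j'}}{\partial r_j}\right)_{\text{rotation}} = 0. \qquad (7.8)$$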


This is why it is convenient to rewrite Eq. (5) in a mathematically equivalent form, 


1 Since both $d\mathbf{q}$ and $d\mathbf{r}$ are legitimate physical vectors (whose Cartesian components are properly transformed at the transfer between reference frames), the 3×3 matrix with elements $\partial q_j/\partial r_{j'}$ is indeed a legitimate physical tensor; see the discussion in Sec. 4.2.

2 As a result, the full sum (5), which includes three partial sums (8), is not affected by rotation — as we already 
know. 
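$$(dl)^2\big|_{\text{after deformation}} - (dl)^2\big|_{\text{before deformation}} = 2\sum_{j,j'=1}^{3}s_{jj'}\,dr_j\,dr_{j'}, \qquad (7.9a)$$

where $s_{jj'}$ are the elements of the so-called symmetrized strain tensor, defined as

$$s_{jj'} \equiv \frac{1}{2}\left(\frac{\partial q_j}{\partial r_{j'}} + \frac{\partial q_{j'}}{\partial r_j}\right). \qquad (7.9b)$$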


(Note that this modification does not affect the diagonal elements $s_{jj} = \partial q_j/\partial r_j$.) So, the advantage of the symmetrized tensor (9b) over the initial tensor with elements $\partial q_j/\partial r_{j'}$ is that, according to Eq. (8), at pure rotation all elements of the symmetrized strain tensor vanish.
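The statement that the symmetrized tensor (9b) vanishes at a pure rotation, while a shear gives off-diagonal elements, can be checked numerically. In the NumPy sketch below, the helper `strain_tensor` (our own name) estimates $\partial q_j/\partial r_{j'}$ by central finite differences and symmetrizes the result; the rotation vector, shear angle, and test point are arbitrary small numbers chosen for illustration.

```python
import numpy as np


def strain_tensor(q, r, h=1e-6):
    """Symmetrized strain tensor (9b) of a displacement field q(r) at the point r,
    with the derivatives dq_j/dr_j' estimated by central finite differences."""
    r = np.asarray(r, dtype=float)
    grad = np.empty((3, 3))                  # grad[j, jp] = dq_j / dr_jp
    for jp in range(3):
        dr = np.zeros(3)
        dr[jp] = h
        grad[:, jp] = (np.asarray(q(r + dr)) - np.asarray(q(r - dr))) / (2 * h)
    return 0.5 * (grad + grad.T)


dphi = np.array([1e-3, -2e-3, 5e-4])         # arbitrary small rotation vector


def rotation(r):                             # Eq. (1): q = dphi x r
    return np.cross(dphi, r)


alpha = 1e-3                                 # arbitrary small shear angle


def shear(r):                                # Fig. 2: q = n_x * alpha * y
    return np.array([alpha * r[1], 0.0, 0.0])


r0 = np.array([0.3, -1.2, 2.0])
print(strain_tensor(rotation, r0))   # zero up to rounding errors
print(strain_tensor(shear, r0))      # alpha/2 in the xy and yx elements, as in Eq. (15)
```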


Now let us discuss the physical meaning of this tensor. As was already mentioned in Sec. 4.2, any symmetric tensor may be diagonalized by an appropriate selection of the reference frame axes. In such principal axes, $s_{jj'} = s_{jj}\delta_{jj'}$, so that Eq. (4) takes a simple form:

$$dq_j = \sum_{j'=1}^{3}\frac{\partial q_j}{\partial r_{j'}}\,dr_{j'} = s_{jj}\,dr_j. \qquad (7.10)$$


We may use this expression to calculate the change of each side of an elementary cuboid (parallelepiped) with its sides $dr_j$ parallel to the principal axes:

$$dr_j\big|_{\text{after deformation}} - dr_j\big|_{\text{before deformation}} = dq_j = s_{jj}\,dr_j, \qquad (7.11)$$

and of the cuboid’s volume $dV = dr_1\,dr_2\,dr_3$:

$$dV\big|_{\text{after deformation}} - dV\big|_{\text{before deformation}} = \prod_{j=1}^{3}\left(dr_j + s_{jj}\,dr_j\right) - \prod_{j=1}^{3}dr_j = dV\left[\prod_{j=1}^{3}(1 + s_{jj}) - 1\right]. \qquad (7.12)$$
jal 


Since all our analysis is only valid in the linear approximation in small $s_{jj'}$, Eq. (12) is reduced to

$$dV\big|_{\text{after deformation}} - dV\big|_{\text{before deformation}} \approx dV\sum_{j=1}^{3}s_{jj} \equiv dV\,\mathrm{Tr}(s), \qquad (7.13)$$

where Tr (trace)3 of any matrix (in particular, any tensor) is the sum of its diagonal elements; in our current case

$$\mathrm{Tr}(s) \equiv \sum_{j=1}^{3}s_{jj}. \qquad (7.14)$$


The tensor theory shows that the trace does not depend on the particular choice of the coordinate axes; 
so, the diagonal elements of the strain tensor characterize the medium’s compression/extension. 
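Eqs. (12)-(14) and the invariance of the trace may be verified on a random example. In the NumPy sketch below, the displacement gradient $G_{jj'} = \partial q_j/\partial r_{j'}$ is a hypothetical random matrix of order $10^{-4}$. The exact volume ratio of the map $\mathbf{r} \to \mathbf{r} + \mathbf{q}(\mathbf{r})$ is its Jacobian $\det(\mathsf{I} + G)$, which is compared with the first-order result $\mathrm{Tr}(s)$; note that the antisymmetric (rotational) part of $G$ does not contribute to the trace.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical small displacement gradient G[j, jp] = dq_j/dr_jp at some point
G = 1e-4 * rng.standard_normal((3, 3))
s = 0.5 * (G + G.T)                          # symmetrized strain tensor, Eq. (9b)

# The exact volume ratio of the map r -> r + q(r) is its Jacobian det(I + G)
exact = np.linalg.det(np.eye(3) + G) - 1.0   # exact relative volume change
first_order = np.trace(s)                    # Eq. (13): relative change = Tr(s)

# A random rotation of the axes (orthogonal R) transforms s into R s R^T
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
rotated_trace = np.trace(R @ s @ R.T)

print(exact, first_order, rotated_trace)
```

The difference between `exact` and `first_order` is of the second order in $G$, while rotating the axes leaves the trace unchanged up to rounding errors.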


Next, what is the meaning of its off-diagonal elements? It may be illustrated by the simplest example of a purely shear deformation shown in Fig. 2. (The geometry is assumed to be uniform along the $z$-axis normal to the plane of the drawing.) In this case, all displacements (assumed small) have just one Cartesian component, in Fig. 2 along the $x$-axis: $\mathbf{q} = \mathbf{n}_x\alpha y$ (with $\alpha \ll 1$), so that the only nonzero element of the initial strain tensor $\partial q_j/\partial r_{j'}$ is $\partial q_x/\partial y = \alpha$, and the symmetrized tensor (9b) is


3 The traditional European notation for Tr is Sp (from the German Spur meaning “trace” or “track”). 
$$s = \begin{pmatrix} 0 & \alpha/2 & 0 \\ \alpha/2 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \qquad (7.15)$$


Evidently, the change of volume, given by Eq. (13), vanishes in this case. Thus, off-diagonal elements 
of the tensor s characterize shear deformations. 


Fig. 7.2. An example of pure shear. 


To conclude this section, let me note that Eq. (9) is only valid in Cartesian coordinates. For the 
solution of some important problems with the axial or spherical symmetry, it is frequently convenient to 
express the six different elements of the symmetric strain tensor in either cylindrical or spherical coordinates via three components of the displacement vector $\mathbf{q}$ in the same coordinates. A straightforward differentiation of the definitions of these curvilinear coordinates, similar to that used to derive the well-known expressions for spatial derivatives of arbitrary functions,4 yields, in particular, the
following formulas for the diagonal elements of the tensor: 


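(i) in the cylindrical coordinates:

$$s_{\rho\rho} = \frac{\partial q_\rho}{\partial\rho}, \qquad s_{\varphi\varphi} = \frac{1}{\rho}\left(q_\rho + \frac{\partial q_\varphi}{\partial\varphi}\right), \qquad s_{zz} = \frac{\partial q_z}{\partial z}; \qquad (7.16)$$

(ii) in the spherical coordinates:

$$s_{rr} = \frac{\partial q_r}{\partial r}, \qquad s_{\theta\theta} = \frac{1}{r}\left(q_r + \frac{\partial q_\theta}{\partial\theta}\right), \qquad s_{\varphi\varphi} = \frac{1}{r}\left(q_r + \frac{\cos\theta}{\sin\theta}\,q_\theta + \frac{1}{\sin\theta}\frac{\partial q_\varphi}{\partial\varphi}\right). \qquad (7.17)$$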


These expressions, which will be used below for the solution of some problems for symmetrical geometries, may be a bit counter-intuitive. Indeed, Eq. (16) shows that even for a purely radial, axially-symmetric deformation, $\mathbf{q} = q(\rho)\mathbf{n}_\rho$, the angular element of the strain tensor does not vanish: $s_{\varphi\varphi} = q/\rho$. (According to Eq. (17), in the spherical coordinates, both angular elements of the tensor exhibit the same property.) Note, however, that this relation describes a simple geometric fact: the change of the lateral distance $\rho\,d\varphi \ll \rho$ between two close points at the same distance from the symmetry axis, at a small change of $\rho$ that keeps the angle $d\varphi$ between the directions towards these two points intact.
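The geometric statement $s_{\varphi\varphi} = q/\rho$ can be confirmed with the Cartesian definition (9b): at a point on the $x$-axis, $\mathbf{n}_\rho = \mathbf{n}_x$ and $\mathbf{n}_\varphi = \mathbf{n}_y$, so there $s_{xx} = s_{\rho\rho}$ and $s_{yy} = s_{\varphi\varphi}$. The NumPy sketch below repeats a small finite-difference helper so that it is self-contained; the profile $q(\rho) = c\rho^2$ and all numbers are arbitrary test choices.

```python
import numpy as np


def strain_tensor(q, r, h=1e-6):
    """Cartesian strain (9b) of the displacement field q at the point r,
    with the derivatives estimated by central differences."""
    r = np.asarray(r, dtype=float)
    grad = np.empty((3, 3))                  # grad[j, jp] = dq_j / dr_jp
    for jp in range(3):
        dr = np.zeros(3)
        dr[jp] = h
        grad[:, jp] = (q(r + dr) - q(r - dr)) / (2 * h)
    return 0.5 * (grad + grad.T)


c = 1e-3                                     # arbitrary small amplitude


def radial(r):
    """Axially symmetric radial field q = q(rho) n_rho, with q(rho) = c*rho**2."""
    rho = np.hypot(r[0], r[1])
    return c * rho * np.array([r[0], r[1], 0.0])   # q(rho) * (x, y, 0) / rho


rho0 = 2.0
s = strain_tensor(radial, [rho0, 0.0, 0.0])  # on the x-axis: n_rho = n_x, n_phi = n_y
print(s[0, 0], 2 * c * rho0)                 # s_rho_rho  vs  dq/drho
print(s[1, 1], c * rho0)                     # s_phi_phi  vs  q/rho
```

Both printed pairs should agree: $s_{\rho\rho} = dq/d\rho = 2c\rho$, while $s_{\varphi\varphi} = q/\rho = c\rho$ is nonzero even though the displacement is purely radial.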


7.2. Stress 


Now let us discuss the forces that cause the strain — or, from a legitimate alternative point of 
view, are caused by the strain. Internal forces acting inside (i.e. between arbitrarily defined parts of) a 


4 See, e.g., MA Eqs. (10.1)-(10.12). 
continuum may be also characterized by a tensor. This stress tensor,5 with elements $\sigma_{jj'}$, relates the Cartesian components of the vector $d\mathbf{F}$ of the force acting on an elementary area $dA$ of an (in most cases, just imagined) interface between two parts of a continuum, to the components of the elementary vector $d\mathbf{A} = \mathbf{n}\,dA$ normal to the area; see Fig. 3:

$$dF_j = \sum_{j'=1}^{3}\sigma_{jj'}\,dA_{j'}. \qquad (7.18)$$


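The usual sign convention here is to take the outer normal $\mathbf{n}$, i.e. to direct $d\mathbf{A}$ out of “our” part of the continuum, i.e. the part on which the calculated force $d\mathbf{F}$ is exerted by the complementary part.

[Figure: an interface element separating “our” part of the continuum from the complementary part, with $d\mathbf{A} = \mathbf{n}\,dA$ directed outward.]

Fig. 7.3. The definition of vectors $d\mathbf{A}$ and $d\mathbf{F}$.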


In some cases, the stress tensor’s structure is very simple. For example, as will be discussed in detail in the next chapter, static and ideal fluids (i.e. liquids and gases) may only provide forces normal to any interface, and usually directed toward “our” part of the body, so that

$$\sigma_{jj'} = -\mathcal{P}\,\delta_{jj'}, \qquad (7.19)$$

where the scalar $\mathcal{P}$ (in most cases positive) is called pressure, and generally may depend on both the spatial position and time. This type of stress, with $\mathcal{P} > 0$, is frequently called hydrostatic compression, even if it takes place in solids, as it may.


However, in the general case, the stress tensor also has off-diagonal terms, which characterize the shear stress. For example, if the shear strain in Fig. 2 is caused by the shown pair of forces $\pm\mathbf{F}$, they create internal forces $F_x\mathbf{n}_x$, with $F_x > 0$ if we speak about the force acting upon a part of the sample below the imaginary horizontal interface we are discussing. To avoid a horizontal acceleration of each horizontal slice of the sample, the forces should not depend on $y$, i.e. $F_x = \text{const} = F$. Superficially, it may look as if in this case, the only nonzero element of the stress tensor is $dF_x/dA_y = F/A = \text{const}$, so that the tensor is asymmetric, in contrast to the strain tensor (15) of the same system. Note, however, that the displayed pair of forces $\pm\mathbf{F}$ creates not only the shear stress but also a nonzero rotating torque $\boldsymbol{\tau} = -Fh\,\mathbf{n}_z = -(dF_x/dA_y)Ah\,\mathbf{n}_z = -(dF_x/dA_y)V\mathbf{n}_z$, where $V = Ah$ is the sample’s volume. So, if we want to perform a static stress experiment, i.e. avoid the sample’s rotation, we need to apply some other forces, e.g., a pair of vertical forces creating an equal and opposite torque $\boldsymbol{\tau}' = (dF_y/dA_x)V\mathbf{n}_z$, implying that $dF_y/dA_x = dF_x/dA_y = F/A$. As a result, the stress tensor becomes symmetric, and similar in structure to the symmetrized strain tensor (15):


5 It is frequently called the Cauchy stress tensor, partly to honor Augustin-Louis Cauchy who introduced this notion (and is responsible for the development, mostly in the 1820s, of much of the theory described in this chapter), and partly to distinguish it from other possible definitions of the stress tensor, including the 1st and 2nd Piola-Kirchhoff tensors. For the small deformations discussed in this course, all these notions coincide.
$$\sigma = \begin{pmatrix} 0 & F/A & 0 \\ F/A & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \qquad (7.20)$$
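Eq. (18) is just a matrix-vector product, which makes the two stress examples easy to explore. The sketch below (the function name `traction` and all numbers are hypothetical) evaluates $d\mathbf{F}$ for the hydrostatic tensor (19) and for the symmetric shear tensor (20).

```python
import numpy as np


def traction(sigma, normal, area=1.0):
    """Force dF_j = sum_j' sigma_jj' dA_j' of Eq. (18) on an area element `area`
    whose unit normal `normal` points out of "our" part of the continuum."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    return np.asarray(sigma, dtype=float) @ (area * n)


P = 2.0                                      # hypothetical pressure
n = np.array([1.0, 2.0, 2.0]) / 3.0          # an arbitrary unit normal
print(traction(-P * np.eye(3), n))           # Eq. (19): equals -P n, directed into "our" part

F_over_A = 0.5                               # hypothetical shear stress F/A
sigma_shear = np.array([[0.0, F_over_A, 0.0],
                        [F_over_A, 0.0, 0.0],
                        [0.0, 0.0, 0.0]])    # Eq. (20)
print(traction(sigma_shear, [0.0, 1.0, 0.0]))   # face with normal n_y: force along n_x
print(traction(sigma_shear, [1.0, 0.0, 0.0]))   # face with normal n_x: force along n_y
```

For the pressure, the force is $-\mathcal{P}\mathbf{n}\,dA$ for any orientation of the area, while for the shear tensor (20), the force on a face normal to $\mathbf{n}_y$ is directed along $\mathbf{n}_x$ and vice versa, which is exactly the pair of stress components required by the torque balance discussed above.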


In many situations, the body may be stressed not only by forces applied to its surface but also by some volume-distributed (bulk) forces $d\mathbf{F} = \mathbf{f}\,dV$, with a certain bulk density $\mathbf{f}$. (The most evident example of such forces is gravity. If its field is uniform as described by Eq. (1.16), then $\mathbf{f} = \rho\mathbf{g}$, where $\rho$ is the mass density.) Let us derive the key formula describing the summation of the interface and bulk forces. For that, consider again an elementary cuboid with sides $dr_j$ parallel to the corresponding coordinate axes $\mathbf{n}_j$ (Fig. 4), now not necessarily the principal axes of the stress tensor.

[Figure: an elementary cuboid with opposite faces of area $dA_{j'}$, acted upon by the forces $d\mathbf{F}^{(j')}$ and $d\mathbf{F}^{(j')} + d(d\mathbf{F}^{(j')})$.]

Fig. 7.4. Deriving Eq. (23).


If elements $\sigma_{jj'}$ of the tensor do not depend on position, the force $d\mathbf{F}^{(j')}$ acting on the $j'$-th face of the cuboid is exactly balanced by the equal and opposite force acting on its opposite face, because the vectors $d\mathbf{A}^{(j')}$ at these faces are equal and opposite. However, if $\sigma_{jj'}$ is a function of $\mathbf{r}$, then the net force $d(d\mathbf{F}^{(j')})$ does not vanish. (In this expression, the first differential sign refers to the elementary shift $dr_{j'}$, while the second one, to the elementary area $dA_{j'}$.) Using the expression $\sigma_{jj'}\,dA_{j'}$ for the $j'$-th contribution to the sum (18), in the first order in $dr$ the $j$-th component of the vector $d(d\mathbf{F}^{(j')})$ is

$$d\left(dF_j^{(j')}\right) = d\left(\sigma_{jj'}\,dA_{j'}\right) = \frac{\partial\sigma_{jj'}}{\partial r_{j'}}\,dr_{j'}\,dA_{j'} = \frac{\partial\sigma_{jj'}}{\partial r_{j'}}\,dV, \qquad (7.21)$$

where the cuboid’s volume $dV = dr_{j'}\,dA_{j'}$ evidently does not depend on the index $j'$. The addition of these force components for all three pairs of cuboid faces, i.e. the summation of Eqs. (21) over all three values of the upper index $j'$, yields the following relation for the $j$-th Cartesian component of the net force exerted on the cuboid:

$$d(dF_j) = \sum_{j'=1}^{3}d\left(dF_j^{(j')}\right) = \sum_{j'=1}^{3}\frac{\partial\sigma_{jj'}}{\partial r_{j'}}\,dV. \qquad (7.22)$$


J J 


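Since any volume may be broken into such infinitesimal cuboids, Eq. (22) shows that the space-varying stress is equivalent to a volume-distributed force $d\mathbf{F}^{\text{ef}} = \mathbf{f}^{\text{ef}}dV$, whose effective (not real!) bulk density $\mathbf{f}^{\text{ef}}$ has the following Cartesian components

$$f_j^{\text{ef}} = \sum_{j'=1}^{3}\frac{\partial\sigma_{jj'}}{\partial r_{j'}}, \qquad (7.23)$$

so that in the presence of genuinely bulk forces $d\mathbf{F} = \mathbf{f}\,dV$, the densities $\mathbf{f}^{\text{ef}}$ and $\mathbf{f}$ just add up. This is the so-called Euler-Cauchy stress principle.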
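As an illustration of the Euler-Cauchy principle (23), consider a fluid at rest in a uniform gravity field: the pressure gradient must then produce an effective force density that exactly cancels $\mathbf{f} = \rho\mathbf{g}$. The NumPy sketch below, whose names and numbers are hypothetical, evaluates Eq. (23) by central differences for $\mathcal{P}(z) = \mathcal{P}_0 - \rho g z$ (with the $z$-axis directed up) and adds the gravity force density.

```python
import numpy as np


def effective_force_density(sigma, r, h=1e-3):
    """f^ef_j = sum_j' d(sigma_jj')/d(r_j'), Eq. (23), by central differences;
    `sigma` is a function returning the 3x3 stress tensor at the point r."""
    r = np.asarray(r, dtype=float)
    f = np.zeros(3)
    for jp in range(3):
        dr = np.zeros(3)
        dr[jp] = h
        dsigma = (sigma(r + dr) - sigma(r - dr)) / (2 * h)
        f += dsigma[:, jp]                   # adds d(sigma_{j,jp})/d(r_jp) for every j
    return f


# Hypothetical example: a fluid at rest in a uniform gravity field g = -g n_z,
# with the pressure P(z) = P0 - rho*g*z, i.e. sigma = -P(z) * identity, Eq. (19).
rho, g, P0 = 1000.0, 9.8, 1.0e5


def sigma_fluid(r):
    return -(P0 - rho * g * r[2]) * np.eye(3)


f_ef = effective_force_density(sigma_fluid, [0.1, 0.2, -3.0])
f_gravity = np.array([0.0, 0.0, -rho * g])   # genuine bulk force density, f = rho*g
print(f_ef, f_ef + f_gravity)                # the total vanishes for a fluid at rest
```

Here $f_z^{\text{ef}} = -d\mathcal{P}/dz = \rho g$, so the sum of the two densities vanishes, as static equilibrium requires.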


Let us use this addition rule to spell out the 2nd Newton law for a unit volume of a continuum:




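$$\rho\frac{\partial^2\mathbf{q}}{\partial t^2} = \mathbf{f}^{\text{ef}} + \mathbf{f}. \qquad (7.24)$$

Using Eq. (23), the $j$-th Cartesian component of Eq. (24) may be represented as

$$\rho\frac{\partial^2 q_j}{\partial t^2} = \sum_{j'=1}^{3}\frac{\partial\sigma_{jj'}}{\partial r_{j'}} + f_j. \qquad (7.25)$$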


This is the key equation of the continuum’s dynamics (and statics), which will be repeatedly used below. 


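For the solution of some problems, it is also convenient to have a general expression for the work $\delta W$ of the stress forces at a virtual deformation $\delta\mathbf{q}$, understood in the same variational sense as the virtual displacements $\delta q_j$ in Sec. 2.1. Using the Euler-Cauchy principle (23), for any volume $V$ of a medium not affected by volume-distributed forces, we may write6

$$\delta W = -\int_V\mathbf{f}^{\text{ef}}\cdot\delta\mathbf{q}\,d^3r = -\sum_{j=1}^{3}\int_V f_j^{\text{ef}}\,\delta q_j\,d^3r = -\sum_{j,j'=1}^{3}\int_V\frac{\partial\sigma_{jj'}}{\partial r_{j'}}\,\delta q_j\,d^3r. \qquad (7.26)$$

Let us work out this integral by parts for a volume so large that the deformations $\delta q_j$ on its surface are negligible. Then, swapping the operations of the variation and the spatial differentiation (just like it was done with the time differentiation in Sec. 2.1), we get

$$\delta W = \sum_{j,j'=1}^{3}\int_V\sigma_{jj'}\,\delta\frac{\partial q_j}{\partial r_{j'}}\,d^3r. \qquad (7.27)$$

Assuming that the tensor $\sigma_{jj'}$ is symmetric, we may rewrite this expression as

$$\delta W = \frac{1}{2}\sum_{j,j'=1}^{3}\int_V\left(\sigma_{jj'}\,\delta\frac{\partial q_j}{\partial r_{j'}} + \sigma_{j'j}\,\delta\frac{\partial q_j}{\partial r_{j'}}\right)d^3r. \qquad (7.28)$$

Now, swapping indices $j$ and $j'$ in the second expression, we finally get

$$\delta W = \frac{1}{2}\sum_{j,j'=1}^{3}\int_V\sigma_{jj'}\,\delta\left(\frac{\partial q_j}{\partial r_{j'}} + \frac{\partial q_{j'}}{\partial r_j}\right)d^3r = \sum_{j,j'=1}^{3}\int_V\sigma_{jj'}\,\delta s_{jj'}\,d^3r, \qquad (7.29)$$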


where $s_{jj'}$ are the elements of the strain tensor (9b). It is natural to rewrite this important formula as

$$\delta W = \int_V\delta w(\mathbf{r})\,d^3r, \quad \text{where } \delta w(\mathbf{r}) \equiv \sum_{j,j'=1}^{3}\sigma_{jj'}\,\delta s_{jj'}, \qquad (7.30)$$

and interpret the locally-defined scalar function $\delta w(\mathbf{r})$ as the work of the stress forces per unit volume, at a small variation of the deformation.


As a sanity check, for the pure pressure (19), Eq. (30) is reduced to the obviously correct result $\delta W = -\mathcal{P}\,\delta V$, where $V$ is the volume of “our” part of the continuum.
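Since Eq. (30) is a full contraction of two 3×3 tensors, it is simple to evaluate. The NumPy sketch below (with hypothetical numbers) reproduces the sanity check for the pure pressure (19), and also applies Eq. (30) to the shear stress (20) combined with a small increment $\delta\alpha$ of the shear strain (15).

```python
import numpy as np


def work_density(sigma, delta_s):
    """delta_w = sum_{j,j'} sigma_jj' * delta_s_jj', Eq. (30)."""
    return float(np.sum(np.asarray(sigma) * np.asarray(delta_s)))


# Pure pressure, Eq. (19), with a small isotropic compression (hypothetical numbers)
P, eps = 2.0e5, -1.0e-4
delta_s_iso = eps * np.eye(3)                # Tr(delta_s) = delta V / V = 3*eps
print(work_density(-P * np.eye(3), delta_s_iso), -P * 3 * eps)    # -P dV/V

# Shear stress (20) with F/A = tau, and an increment d_alpha of the shear strain (15)
tau, d_alpha = 1.0e6, 2.0e-5
sigma_shear = np.array([[0.0, tau, 0.0], [tau, 0.0, 0.0], [0.0, 0.0, 0.0]])
delta_s_shear = np.array([[0.0, d_alpha / 2, 0.0],
                          [d_alpha / 2, 0.0, 0.0],
                          [0.0, 0.0, 0.0]])
print(work_density(sigma_shear, delta_s_shear), tau * d_alpha)    # (F/A) * d_alpha
```

For the shear, the result is $(F/A)\,\delta\alpha$: the force $F$ times the displacement $h\,\delta\alpha$ of the top face, divided by the volume $Ah$. The positive result for the compression ($\delta V < 0$) reflects the sign convention described in footnote 6.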


6 Here the sign corresponds to the work of the “external” stress force $d\mathbf{F}$ exerted on “our” part of the continuum by its counterpart; see Fig. 3. Note that some texts make the opposite definition of $\delta W$, leading to its opposite sign.
7.3. Hooke’s law


In order to form a complete system of equations describing the continuum’s dynamics, one needs to complement Eq. (25) with an appropriate constitutive equation. This equation describes the relation between the forces, described by the stress tensor $\sigma_{jj'}$, and the deformations $\mathbf q$, described (in the small-deformation limit) by the strain tensor $s_{jj'}$. This relation depends on the medium and may generally be rather complicated. Even leaving aside various anisotropic solids (e.g., crystals) and macroscopically inhomogeneous materials (like ceramics or sand), strain typically depends not only on the current value of stress (possibly in a nonlinear way) but also on the previous history of stress application. Indeed, if strain exceeds a certain plasticity threshold, atoms (or nanocrystals) may slip to their new positions and never come back, even if the stress is reduced. As a result, deformations become irreversible (see Fig. 5).


Fig. 7.5. A typical relation between the stress and strain in solids (schematically). The plot shows stress $\sigma$ versus strain $s$, with the elastic (reversible) deformation range, the plastic (irreversible) deformation range, and the fracture point.


Only below the thresholds of nonlinearity and plasticity (which are typically close to each other) is the strain nearly proportional to stress, i.e. does it obey the famous Hooke’s law.⁷ However, even in this elastic range the law is not quite simple: even for an isotropic medium, it is described not by one but by two constants, called the elastic moduli. The reason is that most elastic materials resist strain accompanied by a volume change (say, hydrostatic compression) differently from how they resist a shear deformation.


To describe this difference, let us first represent the symmetrized strain tensor (9b) in the following mathematically equivalent form:

$$s_{jj'} = \left(s_{jj'} - \frac{1}{3}\delta_{jj'}\,\mathrm{Tr}(s)\right) + \frac{1}{3}\delta_{jj'}\,\mathrm{Tr}(s). \tag{7.31}$$


According to Eq. (13), the traceless tensor in the first parentheses does not contribute to the volume change. It may therefore be used to characterize a purely shear deformation, while the second term describes the hydrostatic compression alone. Hence we may expect that the stress tensor may be represented (again, within the elastic deformation range only!) as

$$\sigma_{jj'} = 2\mu\left(s_{jj'} - \frac{1}{3}\delta_{jj'}\,\mathrm{Tr}(s)\right) + 3K\left(\frac{1}{3}\delta_{jj'}\,\mathrm{Tr}(s)\right), \tag{7.32}$$

where $K$ and $\mu$ are constants. (The inclusion of the coefficients 2 and 3 in Eq. (32) is justified by the simplicity of some of its corollaries; see, e.g., Eqs. (36) and (41) below.) Indeed, experiments show that


7 Named after Robert Hooke (1635-1703), the polymath who was the first to describe the law in its simplest, 1D 
version. 
Hooke’s law in this form is followed, at small strain, by all isotropic materials. In accordance with the above discussion, the constant $\mu$ (in some texts denoted $G$) is called the shear modulus, while the constant $K$ (sometimes denoted $B$) is called the bulk modulus. The first two numerical columns of Table 1 show the approximate values of these moduli for typical representatives of several major classes of materials.⁸


Table 7.1. Elastic moduli, density, and sound velocities of a few representative materials (approximate values)

| Material | K (GPa) | μ (GPa) | E (GPa) | ν | ρ (kg/m³) | v_l (m/s) | v_t (m/s) |
|---|---|---|---|---|---|---|---|
| Diamond (a) | 600 | 450 | 1,100 | 0.20 | 3,500 | 1,830 | 1,200 |
| Hardened steel | 170 | 75 | 200 | 0.30 | 7,800 | 5,870 | 3,180 |
| Water (b) | 2.1 | 0 | 0 | 0.5 | 1,000 | 1,480 | 0 |
| Air | 0.00010 | 0 | 0 | 0.5 | 1.2 | 332 | 0 |

(a) Averages over crystallographic directions (~10% anisotropy).
(b) At the so-called ambient conditions ($T = 20$°C, $P = 1$ bar $= 10^5$ Pa).


To better appreciate these values, let us first discuss the quantitative meaning of $K$ and $\mu$, using two simple examples of elastic deformation. In preparation for that, let us solve the set of nine (or rather six different) linear equations (32) for $s_{jj'}$. This is easy to do, due to the simple structure of these equations: they relate the elements $\sigma_{jj'}$ and $s_{jj'}$ with the same indices, except for the contribution of the tensor’s trace to the diagonal elements. This slight complication may be readily overcome by noticing that, according to Eq. (32),

$$\mathrm{Tr}(\sigma) \equiv \sum_{j=1}^{3}\sigma_{jj} = 3K\,\mathrm{Tr}(s), \qquad\text{so that}\qquad \mathrm{Tr}(s) = \frac{1}{3K}\mathrm{Tr}(\sigma). \tag{7.33}$$


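Plugging this result into Eq. (32) and solving it for $s_{jj'}$, we readily get the reciprocal relation, which may be represented in a similar form:

$$s_{jj'} = \frac{1}{2\mu}\left(\sigma_{jj'} - \frac{1}{3}\delta_{jj'}\,\mathrm{Tr}(\sigma)\right) + \frac{1}{9K}\delta_{jj'}\,\mathrm{Tr}(\sigma). \tag{7.34}$$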
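Relations (32) and (34) can be checked numerically. The Python sketch below implements both for 3×3 tensors stored as nested lists and verifies that they are mutually inverse. It also confirms the trace relation (33) and the hydrostatic case of Eq. (19), where $\sigma_{jj'} = -P\delta_{jj'}$ should give a diagonal strain $-(P/3K)\delta_{jj'}$. The moduli (roughly those of hardened steel in Table 1), the sample strain, and $P$ are illustrative assumptions.

```python
# Hooke's law in the (K, mu) form: Eq. (32) gives stress from strain,
# Eq. (34) gives strain from stress. Tensors are 3x3 nested lists.

def trace(t):
    return sum(t[j][j] for j in range(3))

def stress_from_strain(s, K, mu):
    """Eq. (32): sigma = 2 mu (s - Tr(s) I/3) + 3K (Tr(s) I/3)."""
    tr = trace(s)
    return [[2 * mu * (s[j][k] - (tr / 3 if j == k else 0.0))
             + (K * tr if j == k else 0.0)
             for k in range(3)] for j in range(3)]

def strain_from_stress(sigma, K, mu):
    """Eq. (34): s = (sigma - Tr(sigma) I/3) / (2 mu) + Tr(sigma) I / (9K)."""
    tr = trace(sigma)
    return [[(sigma[j][k] - (tr / 3 if j == k else 0.0)) / (2 * mu)
             + (tr / (9 * K) if j == k else 0.0)
             for k in range(3)] for j in range(3)]

K, mu = 170e9, 75e9  # Pa, approximate values for hardened steel (Table 7.1)
s = [[1e-4, 2e-5, 0.0],
     [2e-5, -3e-5, 1e-5],
     [0.0, 1e-5, 5e-5]]
sigma = stress_from_strain(s, K, mu)
s_back = strain_from_stress(sigma, K, mu)  # should reproduce s

# Hydrostatic compression, sigma = -P I: expect s = -(P / 3K) I
P = 1e8
s_hydro = strain_from_stress([[-P if j == k else 0.0 for k in range(3)]
                              for j in range(3)], K, mu)
```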

Now let us apply Hooke’s law, in the form of Eqs. (32) or (34), to two simple situations in which the strain and stress tensors may be found without using the full differential equation of the elasticity theory and its boundary conditions. (That will be the subject of the next section.) The first situation is the hydrostatic compression, when the stress tensor is diagonal and all its diagonal elements are equal (see Eq. (19)).⁹ For this case, Eq. (34) yields


$$s_{jj'} = -\frac{P}{3K}\delta_{jj'}, \tag{7.35}$$


8 Since the strain tensor elements, defined by Eq. (9), are dimensionless, while the stress, defined by Eq. (18), has the dimensionality of pressure (force per unit area), so do the elastic moduli $K$ and $\mu$.

9 It may be proved that such a situation may be implemented not only in a fluid with pressure $P$ but also in a solid sample of an arbitrary shape, for example by placing it into a compressed fluid.
i.e., regardless of the shear modulus, the strain tensor is also diagonal, with all diagonal elements equal. According to Eqs. (11) and (13), this means that all linear dimensions of the body are reduced by the same factor, so that its shape is preserved, while the volume is reduced by

$$\frac{\Delta V}{V} = \sum_{j=1}^{3}s_{jj} = -\frac{P}{K}. \tag{7.36}$$


This formula clearly shows the physical sense of the bulk modulus $K$ as the reciprocal compressibility. As Table 1 shows, the values of $K$ may be dramatically different for various materials. Even for such “soft stuff” as water, this modulus is actually rather high. For example, even at the bottom of the deepest, ~10-km ocean trench ($P \approx 10^3$ bar $= 0.1$ GPa), the water’s density increases by just about 5%. As a result, in most human-scale experiments, water may be treated as an incompressible fluid, an approximation that will be widely used in the next chapter. Many solids are much less compressible still; see, for example, the first two rows of Table 1.
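The numbers quoted above follow directly from Eq. (36). The Python sketch below assumes the pressure is simply $\rho g h$, ignoring the small density increase with depth, and takes the Table 1 values of $K$ for water and hardened steel.

```python
# Eq. (36), Delta V / V = -P / K, for water at the bottom of a ~10-km-deep ocean.
# Assumption: the pressure is estimated as rho * g * h, ignoring the small
# density increase with depth; K values are from Table 7.1.
rho_water = 1000.0  # kg/m^3
g = 9.8             # m/s^2
depth = 10e3        # m
K_water = 2.1e9     # Pa
K_steel = 170e9     # Pa (hardened steel, Table 7.1), for comparison

P = rho_water * g * depth        # about 1e8 Pa = 0.1 GPa
dV_over_V_water = -P / K_water   # Eq. (36)
dV_over_V_steel = -P / K_steel
# To first order, the relative density increase equals -Delta V / V
print(f"P = {P / 1e9:.2f} GPa")
print(f"water: density up by {-100 * dV_over_V_water:.1f} %")
print(f"steel: density up by {-100 * dV_over_V_steel:.2f} %")
```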


Quite naturally, the most compressible media are gases. For a portion of gas, a certain background pressure $P$ is necessary just for containing it within its volume $V$, so that Eq. (36) is only valid for small increments of pressure, $\Delta P$:

$$\frac{\Delta V}{V} = -\frac{\Delta P}{K}. \tag{7.37}$$


Moreover, the compression of gases also depends on thermodynamic conditions. (In contrast, for most condensed media, the temperature effects are very small.) For example, at ambient conditions, most gases are reasonably well described by the equation of state of the ideal classical gas:

$$PV = Nk_BT, \qquad\text{i.e.}\qquad P = \frac{Nk_BT}{V}, \tag{7.38}$$

where $N$ is the number of molecules in volume $V$, and $k_B \approx 1.38\times10^{-23}$ J/K is the Boltzmann constant.¹⁰
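For a small volume change $\Delta V$ at a constant temperature $T$, this equation gives

$$\Delta P\big|_{T=\text{const}} \approx -\frac{Nk_BT}{V^2}\Delta V = -\frac{P}{V}\Delta V, \qquad\text{i.e.}\qquad \frac{\Delta V}{V}\bigg|_{T=\text{const}} = -\frac{\Delta P}{P}. \tag{7.39}$$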
V V 

Comparing this expression with Eq. (37), we get a remarkably simple result for the isothermal compression of gases,

$$K\big|_{T=\text{const}} = P, \tag{7.40}$$

which means, in particular, that the bulk modulus listed in Table 1 is actually valid, at ambient conditions, for almost any gas. Note, however, that a change of thermodynamic conditions (say, from isothermal to adiabatic¹¹) may affect the compressibility of the gas.
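Eq. (40) can also be checked numerically. The Python sketch below computes $K = -V\,dP/dV$ (the definition implied by Eq. (37)) for the ideal-gas equation of state (38), using a central finite difference. The values of $N$, $T$, and $V$ are illustrative, roughly ambient assumptions.

```python
# Numerical check of Eq. (40): for an ideal classical gas at constant T, the
# bulk modulus K = -V dP/dV (read off from Eq. (37)) equals the pressure P.
# N, T, and V are illustrative, roughly ambient values.
k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 293.0           # K
V = 1.0e-3          # m^3 (one liter)
N = 2.5e22          # number of molecules (assumed, ~ambient density times V)

def pressure(volume):
    """Ideal-gas equation of state, Eq. (38): P = N k_B T / V."""
    return N * k_B * T / volume

dV = 1e-6 * V  # small volume step for a central finite difference
dP_dV = (pressure(V + dV) - pressure(V - dV)) / (2 * dV)
K_isothermal = -V * dP_dV
print(K_isothermal, pressure(V))  # both close to 1e5 Pa, i.e. about 1 bar
```

Central differences are used so that the truncation error is only second order in $\Delta V/V$.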


Now let us consider the second, rather different, fundamental experiment: the purely shear deformation shown in Fig. 2. The traces of the matrices (15) and (20), which describe this situation, are equal to 0. Therefore, for their off-diagonal elements, Eq. (32) gives merely $\sigma_{jj'} = 2\mu s_{jj'}$, so that the deformation angle $\alpha$ (see Fig. 2) is just

$$\alpha = \frac{\sigma_{jj'}}{\mu} \qquad (j \neq j'). \tag{7.41}$$

10 For the derivation and a detailed discussion of Eq. (38), see, e.g., SM Sec. 3.1.
11 See, e.g., SM Sec. 1.3.


Note that the angle does not depend on the thickness $h$ of the sample, though of course the maximal linear deformation, equal to $\alpha h$, is proportional to the thickness. Naturally, as Table 1 shows, $\mu = 0$ for all fluids, because they do not resist static shear stress.


However, not all situations, even apparently simple ones, involve just either $K$ or $\mu$. Let us consider stretching a long and thin elastic rod of a uniform cross-section of area $A$: the so-called tensile stress experiment shown in Fig. 6.¹²


Fig. 7.6. The tensile stress experiment.


Though the deformation of the rod near its clamped ends depends on the exact way the forces $F$ are applied (we will discuss this issue later on), we may expect that over most of its length the tension forces are directed virtually along the rod, $d\mathbf F = dF_z\mathbf n_z$. Hence, with the coordinate choice shown in Fig. 6, $\sigma_{xj'} = \sigma_{yj'} = 0$ for all $j'$, including the diagonal elements $\sigma_{xx}$ and $\sigma_{yy}$. Moreover, the lateral surfaces are open, and evidently $dF_x = dF_y = 0$ on them. Therefore, there cannot be an internal stress force of any direction acting on any elementary internal boundary parallel to these surfaces. This means that $\sigma_{zx} = \sigma_{zy} = 0$. So, of all the elements of the stress tensor only one, $\sigma_{zz}$, is not equal to zero, and for a uniform sample, $\sigma_{zz} = \text{const} = F/A$. For this case, Eq. (34) shows that the strain tensor is also diagonal, but with different diagonal elements:

$$s_{zz} = \left(\frac{1}{9K} + \frac{1}{3\mu}\right)\sigma_{zz}, \tag{7.42}$$

$$s_{xx} = s_{yy} = \left(\frac{1}{9K} - \frac{1}{6\mu}\right)\sigma_{zz}. \tag{7.43}$$


Since the tensile stress is the most common one in engineering practice (including physical experiment design), both combinations of the elastic moduli participating in these two relations have earned their own names. In particular, the constant in Eq. (42) is usually denoted as $1/E$ (but in many texts, as $1/Y$), where $E$ is called Young’s modulus:¹³

$$\frac{1}{E} \equiv \frac{1}{9K} + \frac{1}{3\mu}, \qquad\text{i.e.}\qquad E = \frac{9K\mu}{3K + \mu}. \tag{7.44}$$


12 Though the analysis of compression in this situation gives similar results, in practical experiments a strong 
compression of a long sample may lead to the loss of the horizontal stability — the so-called buckling — of the rod. 
13 Named after another polymath, Thomas Young (1773-1829) — somewhat unfairly, because his work on 
elasticity was predated by a theoretical analysis by L. Euler in 1727 and detailed experiments by Giordano Riccati 
in 1782. 
As Fig. 6 shows, in the tensile stress geometry $s_{zz} = \partial q_z/\partial z = \Delta l/l$, so that Young’s modulus scales the linear relation between the relative extension of the rod and the force applied per unit area:¹⁴

$$\frac{\Delta l}{l} = \frac{1}{E}\,\frac{F}{A}. \tag{7.45}$$


The third numerical column of Table 1 above shows the values of this modulus for two well-known solids: diamond (with the highest known value of $E$ of all bulk materials¹⁵) and the steels (solid solutions of ~10% of carbon in iron) used in construction. Again, for all fluids, Young’s modulus equals zero, as follows from Eq. (44) for $\mu = 0$.


I am confident that most readers of these notes are familiar with Eq. (42), in the form of Eq. (45), from their undergraduate studies. However, this can hardly be said about its counterpart, Eq. (43), which shows that at the tensile stress, the rod’s cross-section dimensions also change. This effect is usually characterized by the following dimensionless Poisson’s ratio:¹⁶

$$\nu \equiv -\frac{s_{xx}}{s_{zz}} = -\left(\frac{1}{9K} - \frac{1}{6\mu}\right)\bigg/\left(\frac{1}{9K} + \frac{1}{3\mu}\right) = \frac{1}{2}\,\frac{3K - 2\mu}{3K + \mu}. \tag{7.46}$$

According to this formula, for realistic materials with $K > 0$ and $\mu > 0$, $\nu$ may vary from $-1$ to $+1/2$. However, for the vast majority of materials,¹⁷ its values are between 0 and ½ (see the corresponding column of Table 1). The lower limit of this range is reached in porous materials like cork, whose lateral dimensions almost do not change at the tensile stress. Some soft materials, such as natural and synthetic rubbers, present the opposite case: $\nu \approx 1/2$.¹⁸ According to Eqs. (13) and (42), the volume change is

$$\frac{\Delta V}{V} = s_{xx} + s_{yy} + s_{zz} = (1 - 2\nu)\frac{1}{E}\frac{F}{A} = (1 - 2\nu)\frac{\Delta l}{l}. \tag{7.47}$$


Hence such materials virtually do not change their volume at the tensile stress. The ultimate limit of this trend, $\Delta V/V = 0$, is provided by fluids and gases, because, as follows from Eq. (46) with $\mu = 0$, their Poisson’s ratio $\nu$ is exactly ½. However, for most practical construction materials, such as various steels (see Table 1), the relative volume change (47) is as high as ~40% of that of the length.


Due to the dominance of the tensile stress in practice, the coefficients $E$ and $\nu$ are frequently used as a pair of independent elastic moduli, instead of $K$ and $\mu$. Solving Eqs. (44) and (46) for the latter pair, we get

$$K = \frac{E}{3(1 - 2\nu)}, \qquad \mu = \frac{E}{2(1 + \nu)}. \tag{7.48}$$
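The conversions between the two pairs of moduli are easy to script. The Python sketch below applies Eqs. (44) and (46) to the approximate Table 1 values of $K$ and $\mu$, inverts the result with Eq. (48), and prints the factor $(1 - 2\nu)$ of Eq. (47). The function names are hypothetical, chosen for this illustration only.

```python
# Conversions between the moduli pairs (K, mu) and (E, nu):
# Eqs. (44) and (46) one way, Eq. (48) the other way.
# Inputs are the approximate Table 7.1 values, in GPa.

def young_and_poisson(K, mu):
    E = 9 * K * mu / (3 * K + mu)               # Eq. (44)
    nu = (3 * K - 2 * mu) / (2 * (3 * K + mu))  # Eq. (46)
    return E, nu

def bulk_and_shear(E, nu):
    K = E / (3 * (1 - 2 * nu))  # Eq. (48)
    mu = E / (2 * (1 + nu))
    return K, mu

table = {"diamond": (600.0, 450.0), "hardened steel": (170.0, 75.0)}
for name, (K, mu) in table.items():
    E, nu = young_and_poisson(K, mu)
    # Eq. (47): Delta V / V = (1 - 2 nu) * Delta l / l at the tensile stress
    print(f"{name}: E = {E:.0f} GPa, nu = {nu:.2f}, 1 - 2 nu = {1 - 2 * nu:.2f}")
```

For hardened steel this gives $E \approx 196$ GPa and $\nu \approx 0.31$, consistent with the rounded table entries. The resulting $1 - 2\nu \approx 0.4$ is the ~40% volume-to-length ratio mentioned above. Setting $\mu = 0$ reproduces the fluid values $E = 0$ and $\nu = 1/2$.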


14 According to Eq. (45), $E$ may be thought of as the force (per unit area) that would double the initial sample’s length, if only Hooke’s law were valid for deformations that large, which it typically is not.

15 $E$ is probably somewhat higher (up to 2,000 GPa) in such nanostructures as carbon nanotubes and monatomic sheets (graphene), though there is still substantial uncertainty in the experimentally measured elastic moduli of these structures; for a review see, e.g., G. Dimitrios et al., Prog. Mater. Sci. 90, 75 (2017).

16 In some older texts, the Poisson’s ratio is denoted $\sigma$, but its notation as $\nu$ dominates modern literature.

17 The only known exceptions are certain exotic solids with very specific internal microstructure; see, e.g., R. Lakes, Science 235, 1038 (1987) and references therein.

18 For example, silicone rubbers (synthetic polymers broadly used in engineering and physics experiment design) have, depending on their particular composition, synthesis, and thermal curing, $\nu \approx 0.47$–$0.49$. As a result, they combine respectable bulk moduli, $K \approx (1.5$–$2)$ GPa, with very low Young’s moduli: $E \approx (0.0001$–$0.05)$ GPa.
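Using these formulas, the two (equivalent) formulations of Hooke’s law, expressed by Eqs. (32) and (34), may be rewritten as

$$\sigma_{jj'} = \frac{E}{1+\nu}\left[s_{jj'} + \frac{\nu}{1-2\nu}\,\delta_{jj'}\,\mathrm{Tr}(s)\right], \tag{7.49a}$$

$$s_{jj'} = \frac{1}{E}\left[(1+\nu)\,\sigma_{jj'} - \nu\,\delta_{jj'}\,\mathrm{Tr}(\sigma)\right]. \tag{7.49b}$$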
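As a consistency check of the $(E, \nu)$ form of Hooke’s law, the Python sketch below applies Eq. (49b) to a pure tensile stress $\sigma_{zz}$. It confirms that $s_{zz} = \sigma_{zz}/E$ (Eq. (45)) and $s_{xx} = -\nu s_{zz}$ (Eq. (46)), then maps the strain back through Eq. (49a). The values of $E$, $\nu$, and $\sigma_{zz}$ are illustrative assumptions, close to those of hardened steel.

```python
# Hooke's law in the (E, nu) form, Eqs. (49a) and (49b), for 3x3 nested lists.
# E, nu, and the applied stress are illustrative (close to hardened steel).

def trace(t):
    return sum(t[j][j] for j in range(3))

def stress_from_strain(s, E, nu):
    """Eq. (49a): sigma = E/(1+nu) * [s + nu/(1-2nu) * Tr(s) * I]."""
    tr = trace(s)
    return [[E / (1 + nu) * (s[j][k] + (nu / (1 - 2 * nu) * tr if j == k else 0.0))
             for k in range(3)] for j in range(3)]

def strain_from_stress(sigma, E, nu):
    """Eq. (49b): s = [(1+nu) * sigma - nu * Tr(sigma) * I] / E."""
    tr = trace(sigma)
    return [[((1 + nu) * sigma[j][k] - (nu * tr if j == k else 0.0)) / E
             for k in range(3)] for j in range(3)]

E, nu = 200e9, 0.30
sigma_zz = 1e8  # Pa, tensile stress along z only
s_tensile = strain_from_stress([[0.0, 0.0, 0.0],
                                [0.0, 0.0, 0.0],
                                [0.0, 0.0, sigma_zz]], E, nu)
sigma_back = stress_from_strain(s_tensile, E, nu)  # should give back sigma_zz only
print(s_tensile[2][2], s_tensile[0][0])  # s_zz = sigma_zz / E, s_xx = -nu * s_zz
```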


The linear relation between the strain and stress tensors in elastic continua enables one more step in our calculation of the potential energy $U$ due to deformation, which was started at the end of Sec. 2. Indeed, to each infinitesimal part of this strain increase, we may apply Eq. (30), with the elementary work $\delta\mathscr{W}$ of the surface forces increasing the potential energy of “our” part of the body by the equal amount $dU$. Let us slowly increase the deformation from a completely unstrained state (in which we may take $U = 0$) to a certain strained state. This is done in the absence of bulk forces $\mathbf f$, keeping the deformation type, i.e. the ratios between the elements of the stress tensor, intact. In this case, all elements of the tensor $\sigma_{jj'}$ are proportional to the same single parameter characterizing the stress (say, the total applied force). According to Hooke’s law, all elements of the tensor $s_{jj'}$ are then proportional to that parameter as well. Integration of Eq. (30) through the variation therefore yields the following final value:¹⁹

$$U = \int_V u(\mathbf r)\, d^3r, \qquad\text{where}\quad u(\mathbf r) \equiv \frac{1}{2}\sum_{j,j'=1}^{3}\sigma_{jj'}\,s_{jj'}. \tag{7.50}$$


Evidently, this u(r) may be interpreted as the volumic density of the potential energy of the 
elastic deformation. 
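The factor ½ in Eq. (50) can be reproduced by literally carrying out the integration described above. The Python sketch below ramps a strain proportionally, $s(\lambda) = \lambda s_{\rm final}$ with $\lambda$ from 0 to 1, and sums the work increments (30) with the stress given by Hooke’s law (32). The moduli and the final strain are illustrative assumptions. The midpoint rule used here happens to be exact for this linear integrand.

```python
# Numerical check of Eq. (50): ramping a deformation proportionally,
# s(lam) = lam * s_final, lam from 0 to 1, and summing the work increments
# of Eq. (30) gives u = (1/2) sum sigma_jj' s_jj'. Hooke's law is taken
# in the (K, mu) form (32); all numbers are illustrative.

def trace(t):
    return sum(t[j][j] for j in range(3))

def stress_from_strain(s, K, mu):
    tr = trace(s)
    return [[2 * mu * (s[j][k] - (tr / 3 if j == k else 0.0)) + (K * tr if j == k else 0.0)
             for k in range(3)] for j in range(3)]

def contract(a, b):
    return sum(a[j][k] * b[j][k] for j in range(3) for k in range(3))

K, mu = 170e9, 75e9
s_final = [[2e-4, 5e-5, 0.0],
           [5e-5, -1e-4, 0.0],
           [0.0, 0.0, 3e-5]]

n_steps = 1000
ds = [[x / n_steps for x in row] for row in s_final]  # constant strain increment
work = 0.0
for i in range(n_steps):
    lam_mid = (i + 0.5) / n_steps  # midpoint of the i-th loading step
    s_mid = [[lam_mid * x for x in row] for row in s_final]
    work += contract(stress_from_strain(s_mid, K, mu), ds)  # Eq. (30)

u_formula = 0.5 * contract(stress_from_strain(s_final, K, mu), s_final)  # Eq. (50)
print(work, u_formula)
```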


7.4. Equilibrium 
Now we are fully equipped to discuss the elastic deformation dynamics, but let us start with statics. The static (equilibrium) state may be described by requiring the right-hand side of Eq. (25) to vanish. To find the elastic deformation, we need to plug $\sigma_{jj'}$ from Hooke’s law (49a), and then express the elements $s_{jj'}$ via the displacement distribution (see Eq. (9)). For a uniform material, the result is²⁰

$$\frac{E}{2(1+\nu)}\sum_{j'=1}^{3}\frac{\partial^2 q_j}{\partial r_{j'}^2} + \frac{E}{2(1+\nu)(1-2\nu)}\sum_{j'=1}^{3}\frac{\partial^2 q_{j'}}{\partial r_{j'}\,\partial r_j} + f_j = 0. \tag{7.51}$$


Taking into account that the first sum is just the $j$-th component of $\nabla^2\mathbf q$, while the second sum is the $j$-th component of $\nabla(\nabla\cdot\mathbf q)$, we see that all three equations (51), for the three Cartesian components ($j = 1$, 2, and 3) of the deformation vector $\mathbf q$, may be conveniently merged into one vector equation

$$\frac{E}{2(1+\nu)}\nabla^2\mathbf q + \frac{E}{2(1+\nu)(1-2\nu)}\nabla(\nabla\cdot\mathbf q) + \mathbf f = 0. \tag{7.52}$$


19 To give additional clarity to the arising factor ½, let me spell out this integration for the simple case of a 1D spring. In this case, Eq. (30) is reduced to $dU = \delta\mathscr{W} = F\,\delta x$. If the spring’s force is elastic, $F = \kappa x$, the integration over $x$ from 0 to its final value yields $U = \kappa x^2/2 = Fx/2$.

20 As follows from Eqs. (48), the coefficient before the first sum in Eq. (51) is just the shear modulus $\mu$, while that before the second sum is equal to $(K + \mu/3)$.
For some applications, it is more convenient to recast this equation into a different form, using the well-known vector identity²¹ $\nabla^2\mathbf q = \nabla(\nabla\cdot\mathbf q) - \nabla\times(\nabla\times\mathbf q)$. The result is

$$\frac{E(1-\nu)}{(1+\nu)(1-2\nu)}\nabla(\nabla\cdot\mathbf q) - \frac{E}{2(1+\nu)}\nabla\times(\nabla\times\mathbf q) + \mathbf f = 0. \tag{7.53}$$


It is interesting that in problems without volume-distributed forces ($\mathbf f = 0$), Young’s modulus $E$ cancels out. Even more fascinating, in this case the equation may be rewritten in a form not involving the Poisson’s ratio $\nu$ either. Indeed, calculating the divergence of the remaining terms of Eq. (53), and taking into account MA Eqs. (9.2) and (11.2), we get a surprisingly simple equation

$$\nabla^2(\nabla\cdot\mathbf q) = 0. \tag{7.54}$$


A natural question here is how the elastic moduli affect the deformation distribution if they do 
not participate in the differential equation describing it. The answer is different in the following two 
cases. If what is fixed at the body’s boundary are deformations, then the moduli are irrelevant, because 
the deformation distribution through the body does not depend on them. On the other hand, if the 
boundary conditions describe fixed stress (or a combination of stress and strain), then the elastic 
constants creep into the solution via the recalculation of these conditions into the strain. As a simple but 
representative example, let us calculate the deformation distribution in a (generally, thick) spherical 
shell under the effect of pressures inside and outside it — see Fig. 7a. 


Fig. 7.7. The spherical shell problem: (a) the general case, and (b) the thin shell limit.


Due to the spherical symmetry of the problem, the deformation is obviously spherically symmetric and radial, $\mathbf q(\mathbf r) = q(r)\,\mathbf n_r$, i.e. it is completely described by one scalar function $q(r)$. Since the curl of such a radial vector field is zero,²² Eq. (53) is reduced to

$$\nabla(\nabla\cdot\mathbf q) = 0. \tag{7.55}$$


This means that the divergence of the vector field $\mathbf q(\mathbf r)$ is constant within the shell. In spherical coordinates:²³

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2 q\right) = \text{const}. \tag{7.56}$$

Naming this constant $3a$ (with the numerical factor chosen just for the later notation’s convenience), and integrating Eq. (56) over $r$, we get its solution,


21 See, e.g., MA Eq. (11.3).
22 If this is not immediately evident, please have a look at MA Eq. (10.11) with $\mathbf f = f_r(r)\,\mathbf n_r$.
23 See, e.g., MA Eq. (10.10) with $\mathbf f = q(r)\,\mathbf n_r$.




$$q(r) = ar + \frac{b}{r^2}, \tag{7.57}$$


which also includes another integration constant, $b$. The constants $a$ and $b$ may be determined from the boundary conditions. Indeed, according to Eq. (19),

$$\sigma_{rr} = \begin{cases} -P_1, & \text{at } r = R_1,\\ -P_2, & \text{at } r = R_2.\end{cases} \tag{7.58}$$


In order to relate this stress to strain, let us use Hooke’s law, but for that, we first need to calculate the strain tensor components for the deformation distribution (57). Using Eqs. (17), we get

$$s_{rr} = \frac{dq}{dr} = a - \frac{2b}{r^3}, \qquad s_{\theta\theta} = s_{\varphi\varphi} = \frac{q}{r} = a + \frac{b}{r^3}, \tag{7.59}$$


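so that $\mathrm{Tr}(s) = 3a$. Plugging these relations into Eq. (49a) for $\sigma_{rr}$, we obtain

$$\sigma_{rr} = \frac{E}{1+\nu}\left[a - \frac{2b}{r^3} + \frac{3\nu}{1-2\nu}\,a\right] = \frac{E}{1+\nu}\left[\frac{1+\nu}{1-2\nu}\,a - \frac{2b}{r^3}\right]. \tag{7.60}$$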


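Now, plugging this relation into Eqs. (58), we get a system of two linear equations for the coefficients $a$ and $b$. An easy solution of this system yields

$$a = \frac{1-2\nu}{E}\,\frac{P_1R_1^3 - P_2R_2^3}{R_2^3 - R_1^3}, \qquad b = \frac{1+\nu}{2E}\,\frac{(P_1 - P_2)\,R_1^3R_2^3}{R_2^3 - R_1^3}. \tag{7.61}$$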


Formulas (57) and (61) give a complete solution to our problem. (Note that the elastic moduli are back, as was promised.) This solution is rich in physical content and deserves at least some analysis. First of all, note that according to Eq. (48), the coefficient $(1 - 2\nu)/E$ in the expression for $a$ is just $1/3K$. Hence the first term in Eq. (57) for the net deformation describes the hydrostatic compression. Now note that the second of Eqs. (61) yields $b = 0$ if $R_1 = 0$. Thus, for a solid sphere, we have only the hydrostatic compression that was discussed in the previous section. Perhaps less intuitively, making the two pressures equal also gives $b = 0$, i.e. the purely hydrostatic compression, for arbitrary $R_2 > R_1$.
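The solution (57), (61) can be verified numerically. The Python sketch below computes $a$ and $b$ from Eqs. (61) and checks that the radial stress (60) reproduces the boundary conditions (58). It also confirms the two special cases just discussed: a solid sphere ($R_1 = 0$) and equal pressures, both of which give $b = 0$ and $a = -P/3K$. The geometry, pressures, and moduli are illustrative assumptions, and the helper names are hypothetical.

```python
# Thick spherical shell, Eqs. (57), (60), (61): check the boundary conditions
# (58) and the special cases b = 0. All numeric inputs are illustrative.

def shell_coefficients(P1, P2, R1, R2, E, nu):
    """Return (a, b) of Eq. (61) for inner/outer pressures P1, P2."""
    denom = R2**3 - R1**3
    a = (1 - 2 * nu) / E * (P1 * R1**3 - P2 * R2**3) / denom
    b = (1 + nu) / (2 * E) * (P1 - P2) * R1**3 * R2**3 / denom
    return a, b

def radial_stress(r, a, b, E, nu):
    """Eq. (60): sigma_rr = E/(1+nu) * [(1+nu)/(1-2nu) * a - 2 b / r^3]."""
    return E / (1 + nu) * ((1 + nu) / (1 - 2 * nu) * a - 2 * b / r**3)

def displacement(r, a, b):
    """Eq. (57): q(r) = a r + b / r^2."""
    return a * r + b / r**2

E, nu = 200e9, 0.3
P1, P2, R1, R2 = 5e7, 1e5, 0.10, 0.15  # Pa, Pa, m, m
a, b = shell_coefficients(P1, P2, R1, R2, E, nu)
print(radial_stress(R1, a, b, E, nu), radial_stress(R2, a, b, E, nu))  # -P1, -P2
print(displacement(R1, a, b), displacement(R2, a, b))  # radial displacements, m

# Equal pressures: b = 0 and a = -P / (3K), with K = E / (3 (1 - 2 nu)), Eq. (48)
K = E / (3 * (1 - 2 * nu))
a_eq, b_eq = shell_coefficients(1e6, 1e6, R1, R2, E, nu)
```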


However, in the general case, $b \neq 0$, so that the second term in the deformation distribution (57), which describes the shear deformation,²⁴ is also substantial. In particular, let us consider the important thin-shell limit, when $R_2 - R_1 \equiv t \ll R_{1,2} \approx R$ (see Fig. 7b). In this case, $q(R_1) \approx q(R_2)$ is just the change of the shell radius $R$. For it, Eqs. (57) and (61) (with $R_2^3 - R_1^3 \approx 3R^2t$) give

$$\Delta R = q(R) \approx aR + \frac{b}{R^2} \approx (P_1 - P_2)\frac{R^2}{Et}\left(\frac{1-2\nu}{3} + \frac{1+\nu}{6}\right) = (P_1 - P_2)\frac{R^2(1-\nu)}{2Et}. \tag{7.62}$$


Naively, one could think that, at least in this limit, the problem could be analyzed by elementary means. For example, the total force exerted by the pressure difference $(P_1 - P_2)$ on the diametrical cross-section of the shell (see, e.g., the dashed line in Fig. 7b) is $F = \pi R^2(P_1 - P_2)$, giving the stress


24 Indeed, according to Eq. (48), the material-dependent factor $(1+\nu)/2E$ in the second of Eqs. (61) is just $1/4\mu$.
$$\sigma_{\theta\theta} = \sigma_{\varphi\varphi} = \frac{F}{A} = \frac{\pi R^2(P_1 - P_2)}{2\pi Rt} = (P_1 - P_2)\frac{R}{2t}, \tag{7.63}$$


directed along the shell’s walls. One can check that this simple formula may indeed be obtained, in this limit, from the strict expressions for $\sigma_{\theta\theta}$ and $\sigma_{\varphi\varphi}$ that follow from the general treatment carried out above. However, suppose we now tried to continue this approach by using the simple relation (45) to find the small change $Rs_{\theta\theta}$ of the sphere’s radius. We would arrive at a result with the general structure of Eq. (62), but without the factor $(1 - \nu) < 1$ in the numerator. This error may be as significant as ~30% for typical construction materials (see Table 1). Its reason is that Eq. (45), while valid for thin rods of arbitrary cross-section, is invalid for thin but broad sheets, and in particular for the thin shell in our problem. Indeed, at the tensile stress, both lateral dimensions of a thin rod may contract freely. In our last problem, by contrast, all dimensions of the shell are under stress, and actually under much more tangential stress than radial stress.²⁵
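The size of the $(1 - \nu)$ correction can be seen numerically. The Python sketch below evaluates the exact radius change $q(R)$ from Eqs. (57) and (61) for a thin shell, compares it with the approximation (62), and compares both with the naive estimate obtained by inserting the stress (63) into the rod formula (45). The shell size, wall thickness, pressures, and moduli are illustrative assumptions.

```python
# Thin-shell limit: exact radius change q(R) from Eqs. (57), (61), versus the
# approximation (62) and the naive estimate (stress (63) inserted into the rod
# formula (45), which lacks the factor (1 - nu)). Inputs are illustrative.

def shell_coefficients(P1, P2, R1, R2, E, nu):
    denom = R2**3 - R1**3
    a = (1 - 2 * nu) / E * (P1 * R1**3 - P2 * R2**3) / denom
    b = (1 + nu) / (2 * E) * (P1 - P2) * R1**3 * R2**3 / denom
    return a, b

E, nu = 200e9, 0.3
R, t = 1.0, 1e-3   # mid-radius and wall thickness (m), t << R
P1, P2 = 1e6, 0.0  # inner overpressure (Pa), no outer pressure
R1, R2 = R - t / 2, R + t / 2

a, b = shell_coefficients(P1, P2, R1, R2, E, nu)
dR_exact = a * R + b / R**2                           # Eq. (57) at r = R
dR_thin = (P1 - P2) * R**2 * (1 - nu) / (2 * E * t)   # Eq. (62)
dR_naive = (P1 - P2) * R**2 / (2 * E * t)             # without (1 - nu)
print(dR_exact, dR_thin, dR_naive)
```

With $\nu = 0.3$, the naive estimate overshoots by the factor $1/(1-\nu) \approx 1.43$. The exact and thin-shell values differ only at the relative level of $t/R$.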


7.5. Rod bending 


The general approach to the static deformation analysis, outlined at the beginning of the previous section, may be simplified not only for symmetric geometries but also for uniform thin structures, such as thin plates (also called “membranes” or “thin sheets”) and thin rods. Due to the shortage of time, in this course I will demonstrate typical approaches to such systems only on the example of thin rods. (The theory of thin plates and shells is conceptually similar but mathematically more involved.²⁶)


Besides the tensile stress analyzed in Sec. 3, the two other major types of rod deformation are bending and torsion. Let us start from a “local” analysis of bending caused by a pair of equal and opposite external torques $\boldsymbol\tau = \pm\mathbf n_y\tau_y$, perpendicular to the rod axis $z$ (Fig. 8). We assume that the rod is “quasi-uniform”, i.e. that on the scale of this analysis (comparable with the linear scale $a$ of the cross-section) its material parameters and the cross-section $A$ do not change substantially.


Fig. 7.8. Rod bending, in a local reference frame (specific for each cross-section). The bold arrows show the simplest way to create the two opposite torques $\tau_y$: a couple of opposite forces for each torque.


Just as in the tensile stress experiment (Fig. 6), the components of the stress forces $d\mathbf F$ normal to the rod’s length have to equal zero on the surface of the rod. Repeating the arguments made in the tensile stress discussion, we have to conclude that only one diagonal element of the tensor (in Fig. 8, $\sigma_{zz}$) may differ from zero:


25 Strictly speaking, this is only true if the pressure difference is not too small, namely, if $|P_1 - P_2| \gg P_{1,2}\,t/R$.


26 For its review see, e.g., Secs. 11-15 in L. Landau and E. Lifshitz, Theory of Elasticity, 3rd ed., Butterworth-Heinemann, 1986.
$$\sigma_{zz} \neq 0. \tag{7.64}$$


However, in contrast to the tensile stress, at pure static bending the net force directed along the rod has to vanish:

$$F_z = \int_S \sigma_{zz}\, d^2r = 0, \tag{7.65}$$

where $S$ is the rod’s cross-section. Hence $\sigma_{zz}$ has to change its sign at some point of the $x$-axis, which is selected to lie in the plane of the bent rod. Thus, the bending deformation may be viewed as a combination of a stretch of some layers of the rod (bottom layers in Fig. 8) with compression of other (top) layers.


Since it is hard to draw more conclusions about the stress distribution immediately, let us turn to strain, assuming that the rod’s cross-section is virtually constant over the length of our local analysis. From the above representation of bending as a combination of stretching and compression, it is evident that the longitudinal deformation $q_z$ has to vanish along some neutral line on the rod’s cross-section (in Fig. 8, represented by the dashed line).²⁷ Let us select the origin of the $x$-coordinate on this line and expand the relative deformation in a Taylor series in $x$. Due to the cross-section’s smallness, we may keep just the first, linear term of the expansion:

$$s_{zz} \equiv \frac{\partial q_z}{\partial z} = -\frac{x}{R}. \tag{7.66}$$


The constant $R$ has the sense of the curvature radius of the bent rod. Indeed, on a small segment $dz$, the cross-section turns by a small angle $d\varphi_y = -dq_z/x$ (Fig. 8b). Using Eq. (66), we get $d\varphi_y = dz/R$, which is the usual definition of the curvature radius $R$ in differential geometry, for our special choice of the coordinate axes.²⁸


Expressions for other elements of the strain tensor are harder to guess (as at the tensile stress, not all of them are equal to zero!). However, what we already know about $\sigma_{zz}$ and $s_{zz}$ is sufficient to start formal calculations. Indeed, plugging Eq. (64) into Hooke’s law in the form (49b), and comparing the result for $s_{zz}$ with Eq. (66), we find

$$\sigma_{zz} = -E\,\frac{x}{R}. \tag{7.67}$$

From the same Eq. (49b), we could also find the transverse elements of the strain tensor, and conclude that they are related to $s_{zz}$ exactly as at the tensile stress:

$$s_{xx} = s_{yy} = -\nu\, s_{zz}, \tag{7.68}$$


and then, by integrating these relations along the cross-section of the rod, find the deformation of the cross-section’s shape. More important for us, however, is to calculate the relation between the rod’s curvature and the net torque acting on a given cross-section $S$ (taking $dA_z > 0$):

$$\tau_y \equiv \int_S (\mathbf r\times d\mathbf F)_y = -\int_S x\,\sigma_{zz}\, d^2r = \frac{E}{R}\int_S x^2\, d^2r \equiv \frac{EI_y}{R}, \tag{7.69}$$


27 Strictly speaking, that dashed line is the intersection of the neutral surface (the continuous set of such neutral 
lines for all cross-sections of the rod) with the plane of the drawing. 

28 Indeed, for $(dx/dz)^2 \ll 1$, the general formula MA Eq. (4.3) for the curvature (with the appropriate replacements $f \to x$ and $x \to z$) is reduced to $1/R = d^2x/dz^2 = d(dx/dz)/dz = d(\tan\varphi_y)/dz \approx d\varphi_y/dz$.
where $I_y$ is a geometric constant defined as

$$I_y \equiv \int_S x^2\, dx\, dy. \tag{7.70}$$

Note that this factor, which defines the bending rigidity of the rod, grows as fast as $a^4$ with the linear scale $a$ of the cross-section.²⁹


In these expressions, $x$ has to be measured from the neutral line. Let us see where exactly this line passes through the rod’s cross-section. Plugging the result (67) into Eq. (65), we get the condition defining the neutral line:

$$\int_S x\, dx\, dy = 0. \tag{7.71}$$


This condition allows for a simple interpretation. Imagine a thin sheet of some material, with a constant mass density $\sigma$ per unit area, cut in the form of the rod’s cross-section. If we place a reference frame into its center of mass, then, by its definition,

$$\sigma\int_S \mathbf r\, dx\, dy = 0. \tag{7.72}$$


Comparing this condition with Eq. (71), we see that one of the neutral lines has to pass through the center of mass of the sheet, which may be called the “center of mass of the cross-section”. Using the same analogy, we see that the integral $I_y$ given by Eq. (70) may be interpreted as the moment of inertia of the same imaginary sheet of material, with $\sigma$ formally equal to 1, for its rotation about the neutral line (cf. Eq. (4.24)). This analogy is so convenient that the integral is usually called the moment of inertia of the cross-section and denoted similarly, just as has been done above. So, our basic result (69) may be rewritten as

$$\frac{1}{R} = \frac{\tau_y}{EI_y}. \tag{7.73}$$
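The geometric constant (70) is easy to evaluate numerically for any cross-section. The Python sketch below uses a simple midpoint grid. It locates the neutral line through the centroid (condition (71)) and compares $I_y$ with the well-known closed forms $wh^3/12$ for a rectangle (height $h$ in the bending plane, width $w$) and $\pi r_0^4/4$ for a disk. It then shows the $a^4$ scaling by doubling the disk size. The shapes, sizes, and grid resolution are illustrative assumptions.

```python
# Midpoint-grid evaluation of the cross-section "moment of inertia" of Eq. (70),
# I_y = integral of x^2 dx dy, with x measured from the neutral line through the
# centroid (Eq. (71)). Shapes and sizes are illustrative.
import math

def moment_of_inertia(inside, x_min, x_max, y_min, y_max, n=400):
    """Return (area, x_c, I_y) for the region where inside(x, y) is True."""
    dx, dy = (x_max - x_min) / n, (y_max - y_min) / n
    pts = [(x_min + (i + 0.5) * dx, y_min + (k + 0.5) * dy)
           for i in range(n) for k in range(n)]
    pts = [(x, y) for x, y in pts if inside(x, y)]
    area = len(pts) * dx * dy
    x_c = sum(x for x, _ in pts) * dx * dy / area  # centroid, Eq. (71)
    I_y = sum((x - x_c) ** 2 for x, _ in pts) * dx * dy
    return area, x_c, I_y

# Rectangle: height h along x (in the bending plane), width w along y
h, w = 0.02, 0.01
_, _, I_rect = moment_of_inertia(lambda x, y: True, 0.0, h, 0.0, w)
# Disk of radius r0 centered at the origin, and the same disk scaled by 2
r0 = 0.01
_, _, I_disk = moment_of_inertia(lambda x, y: x * x + y * y <= r0 * r0,
                                 -r0, r0, -r0, r0)
_, _, I_disk2 = moment_of_inertia(lambda x, y: x * x + y * y <= 4 * r0 * r0,
                                  -2 * r0, 2 * r0, -2 * r0, 2 * r0)
print(I_rect, w * h**3 / 12)
print(I_disk, math.pi * r0**4 / 4)
print(I_disk2 / I_disk)  # the a^4 scaling: doubling the size gives about 16
```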


This relation is only valid if the deformation is small in the sense $R \gg a$. Still, since the deviations of the rod from its unstrained shape may accumulate along its length, Eq. (73) may be used for calculations of large “global” deviations of the rod from equilibrium, on a length scale much larger than $a$. To describe such deformations, Eq. (73) has to be complemented by conditions of the balance of the bending forces and torques. Unfortunately, a general analysis of such deformations requires a bit more differential geometry than I have time for. So I will only discuss this procedure for the simplest case: relatively small transverse deviations $q \equiv q_x$ of an initially horizontal rod from its straight shape (which will be taken as the $z$-axis, Fig. 9a). These deviations are caused by some forces, possibly including bulk-distributed forces $\mathbf f = \mathbf n_x f_x(z)$. (Again, the simplest example is a uniform gravity field, for which $f_x = -\rho g = \text{const}$.) Note that in the forthcoming discussion the reference frame will be global, i.e. common for the whole rod, rather than local (pertaining to each cross-section) as it was in the previous analysis; cf. Fig. 8.


First of all, we may write a static relation for the total vertical force $\mathbf F = \mathbf n_x F_x(z)$ exerted on the part of the rod to the left of the considered cross-section, located at point $z$. The differential form of this relation expresses the balance of vertical forces exerted on a small fragment $dz$ of the rod (Fig. 9a), which is necessary for the absence of its linear acceleration: $F_x(z + dz) - F_x(z) + f_x(z)A\,dz = 0$, giving


29 In particular, this is the reason why the usual electric wires are made not of a solid copper core, but rather of a twisted set of thinner sub-wires, which may slip relative to each other, increasing the wire flexibility.
Fig. 7.9. A global picture of rod bending: (a) the forces acting on a small fragment of a rod, and (b) two bending problem examples, each with two typical but different boundary conditions.


$$\frac{dF_x}{dz} = -f_xA, \tag{7.74}$$
where $A$ is the cross-section’s area. Note that this vertical component of the internal forces has been neglected in our derivation of Eq. (73). Hence our final results will be valid only if the ratio $F_x/A$ is much smaller than the magnitude of $\sigma_{zz}$ described by Eq. (67). However, these are exactly the forces that create the very torque $\boldsymbol\tau = \mathbf n_y\tau_y$ that in turn causes the bending, and thus they have to be taken into account in the analysis of the global picture.


Such an account may be made by writing the balance of the $y$-components of the elementary torque exerted on the same rod fragment of length $dz$, which is necessary for the absence of its angular acceleration: $d\tau_y + F_x\,dz = 0$, so that

$$\frac{d\tau_y}{dz} = -F_x. \tag{7.75}$$


These two equations should be complemented by two geometric relations. The first of them is $d\varphi_y/dz = 1/R$, which has already been discussed above. We may immediately combine it with the basic result (73) of our local analysis, getting

$$\frac{d\varphi_y}{dz} = \frac{\tau_y}{EI_y}. \tag{7.76}$$

The final equation is the geometric relation evident from Fig. 9a:

$$\frac{dq}{dz} = \varphi_y, \tag{7.77}$$

which (like all expressions of our simple analysis) is only valid for small bending angles, $|\varphi_y| \ll 1$.


The four differential equations (74)-(77) are sufficient for the full solution of the weak-bending problem, if complemented by appropriate boundary conditions. Figure 9b shows the conditions most frequently met in practice. Let us solve, for example, the problem shown on the top panel of Fig. 9b: bending of a rod, “clamped” at one end (say, immersed into a rigid wall), under its own weight. As should be clear from their derivation, Eqs. (74)-(77) are valid for any distribution of parameters A, E, $I_y$, and ρ over the rod’s length, provided that the rod is quasi-uniform, i.e. its parameters’ changes are so slow that the local relation (76) is still valid at any point. However, just for simplicity, let us consider a uniform rod. The simple structure of Eqs. (74)-(77) allows for their integration one by one, each time using the appropriate boundary conditions. To start, Eq. (74) with $f_x = -\rho g = \text{const}$ yields


$$F_x = \rho g A z + \text{const} = \rho g A (z - l), \qquad (7.78)$$

where the integration constant has been selected to satisfy the right-end boundary condition: $F_x = 0$ at z = l. As a sanity check, at the left wall (z = 0), $F_x = -\rho g A l = -mg$, meaning that the whole weight of the rod is exerted on the supporting wall, as it should be.


Next, by plugging Eq. (78) into Eq. (75) and integrating, we get

$$\tau_y = -\frac{\rho g A}{2}\left(z^2 - 2lz\right) + \text{const} = -\frac{\rho g A}{2}\left(z^2 - 2lz + l^2\right) = -\frac{\rho g A}{2}(z - l)^2, \qquad (7.79)$$

where the integration constant’s choice ensures the second right-boundary condition: $\tau_y = 0$ at z = l (see Fig. 9b again). Now proceeding in the same fashion to Eq. (76), we get


$$\varphi_y = -\frac{\rho g A}{2EI_y}\,\frac{(z - l)^3}{3} + \text{const} = -\frac{\rho g A}{6EI_y}\left[(z - l)^3 + l^3\right], \qquad (7.80)$$

where the integration constant is selected to satisfy the clamping condition at the left end of the rod: $\varphi_y = 0$ at z = 0. (Note that this is different from the support condition illustrated on the lower panel of Fig. 9b, which allows the angle $\varphi_y$ to be different from zero at z = 0, but requires the torque to vanish at that point.) Finally, integrating Eq. (77) with $\varphi_y$ given by Eq. (80), we get the rod’s global deformation law,


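$$q_x(z) = -\frac{\rho g A}{6EI_y}\left[\frac{(z - l)^4}{4} + l^3 z + \text{const}\right] = -\frac{\rho g A}{6EI_y}\left[\frac{(z - l)^4}{4} + l^3 z - \frac{l^4}{4}\right], \qquad (7.81)$$

where the integration constant is selected to satisfy the second left-boundary condition: $q_x = 0$ at z = 0.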
So, the bending law is somewhat complicated even in this very simple problem. It is also remarkable how fast the end’s displacement grows with the increase of the rod’s length:

$$q_x(l) = -\frac{\rho g A l^4}{8EI_y}. \qquad (7.82)$$
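The one-by-one integration described above can be mirrored numerically. The sketch below uses plain Python and the trapezoidal rule. It integrates Eqs. (74) and (75) from the free end z = l, where $F_x = \tau_y = 0$, and Eqs. (76) and (77) from the clamped end z = 0, where $\varphi_y = q_x = 0$. It then compares the tip displacement with Eq. (82).

The following are illustrative assumptions, not values from the text:
- the function name and the grid size;
- the steel-like numbers: a 1 cm × 1 cm, 1 m long rod;
- $I_y$ taken as $a^4/12$, i.e. the integral of $x^2$ over a square cross-section.

```python
def cantilever_self_weight(rho, g, area, E, I_y, length, n=2000):
    """Integrate Eqs. (74)-(77) for a uniform rod clamped at z = 0, free at z = l.

    Returns lists z, F_x, tau_y, phi_y, q_x on a uniform grid of n + 1 points.
    """
    h = length / n
    z = [k * h for k in range(n + 1)]
    f_x = -rho * g  # gravity force per unit volume, pointing in the -x direction

    # Eq. (74): dF_x/dz = -f_x*A, integrated from the free end, where F_x(l) = 0
    F = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        F[k] = F[k + 1] - (-f_x * area) * h

    # Eq. (75): dtau_y/dz = -F_x, integrated from the free end, where tau_y(l) = 0
    tau = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        tau[k] = tau[k + 1] + 0.5 * (F[k] + F[k + 1]) * h

    # Eq. (76): dphi_y/dz = tau_y/(E*I_y), from the clamped end, where phi_y(0) = 0
    phi = [0.0] * (n + 1)
    for k in range(n):
        phi[k + 1] = phi[k] + 0.5 * (tau[k] + tau[k + 1]) / (E * I_y) * h

    # Eq. (77): dq_x/dz = phi_y, from the clamped end, where q_x(0) = 0
    q = [0.0] * (n + 1)
    for k in range(n):
        q[k + 1] = q[k] + 0.5 * (phi[k] + phi[k + 1]) * h
    return z, F, tau, phi, q


if __name__ == "__main__":
    # Illustrative (assumed) numbers: steel-like rod, 1 cm x 1 cm, 1 m long
    rho, g, E, a, l = 7800.0, 9.81, 2.0e11, 0.01, 1.0
    A, I_y = a * a, a**4 / 12
    z, F, tau, phi, q = cantilever_self_weight(rho, g, A, E, I_y, l)
    exact = -rho * g * A * l**4 / (8 * E * I_y)  # Eq. (82)
    print(f"numerical q_x(l) = {q[-1]:.6e} m, Eq. (82) gives {exact:.6e} m")
    print(f"|q_x(l)|/l = {abs(q[-1]) / l:.2e}, a/l = {a / l:.2e}")  # validity ratios
```

For these assumed numbers, $|q_x(l)|/l$ is of the order of $10^{-2}$. The small-angle condition discussed next is therefore comfortably satisfied.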

To conclude this solution, let us discuss the validity of this result. First, the geometric relation (77) is only valid if $|\varphi_y(l)| \ll 1$, and hence if $|q_x(l)| \ll l$. Next, the local formula (76) is valid if $1/R = |\tau_y(0)|/EI_y \ll 1/a \sim A^{-1/2}$. Using the results (79) and (82), we see that the latter condition is equivalent to $|q_x(l)| \ll l^2/a$, i.e. it is weaker than the former one, because all our analysis has been based on the assumption $l \gg a$. Another point of concern may be that the off-diagonal stress element $\sigma_{xz} \sim F_x/A$, which is created by the vertical gravity forces, has been ignored in our local analysis. For that approximation to be valid, this element must be much smaller than the diagonal element $\sigma_{zz} \sim aE/R = a\tau_y/I_y$, taken into account in that analysis. Using Eqs. (78) and (79), we get the following estimates: $\sigma_{xz} \sim \rho g l$, $\sigma_{zz} \sim a\rho g A l^2/I_y \sim a^3\rho g l^2/I_y$. According to its definition (70), $I_y$ may be crudely estimated as $a^4$, so that we finally get the simple condition $a \ll l$, which has been assumed from the very beginning of our solution.
7.6. Rod torsion


One more class of analytically solvable elasticity problems is the torsion of quasi-uniform, straight rods by a couple of axially-oriented torques $\boldsymbol{\tau} = \pm\mathbf{n}_z\tau_z$ (see Fig. 10).


Fig. 7.10. Rod torsion. Just as in Fig. 8, the couples of forces F are just vivid representations of the opposite torques ±τ.


This problem is simpler than the bending in the sense that due to its longitudinal uniformity, $d\varphi_z/dz = \text{const}$, it is sufficient to relate the torque $\tau_z$ to the so-called torsion parameter

$$\kappa \equiv \frac{d\varphi_z}{dz}. \qquad (7.83)$$

If the deformation is elastic and small (in the sense $\kappa a \ll 1$, where a is again the characteristic size of the rod’s cross-section), κ is proportional to $\tau_z$. Hence our task is to calculate their ratio,

$$C \equiv \frac{\tau_z}{\kappa}, \qquad (7.84)$$

called the torsional rigidity of the rod.


As the first guess (as we will see below, of a limited validity), one may assume that the torsion does not change either the shape or size of the rod’s cross-sections, but leads just to their mutual rotation about a certain central line. Using a reference frame with the origin on that line, this assumption immediately enables the calculation of Cartesian components of the displacement vector dq, by using Eq. (6) with $d\boldsymbol{\varphi} = \mathbf{n}_z d\varphi_z$:

$$dq_x = -y\,d\varphi_z = -\kappa y\,dz, \qquad dq_y = x\,d\varphi_z = \kappa x\,dz, \qquad dq_z = 0. \qquad (7.85)$$
From here, we can calculate all Cartesian elements (9) of the symmetrized strain tensor:

$$s_{xx} = s_{yy} = s_{zz} = 0, \quad s_{xy} = s_{yx} = 0, \quad s_{xz} = s_{zx} = -\frac{\kappa}{2}y, \quad s_{yz} = s_{zy} = \frac{\kappa}{2}x. \qquad (7.86)$$


The first of these equalities means that the elementary volume does not change, i.e. we are dealing with purely shear deformation. As a result, all nonzero elements of the stress tensor, calculated from Eqs. (32), are proportional to the shear modulus alone:³⁰

$$\sigma_{xx} = \sigma_{yy} = \sigma_{zz} = 0, \quad \sigma_{xy} = \sigma_{yx} = 0, \quad \sigma_{xz} = \sigma_{zx} = -\mu\kappa y, \quad \sigma_{yz} = \sigma_{zy} = \mu\kappa x. \qquad (7.87)$$


30 Note that for this problem, with a purely shear deformation, using the alternative elastic moduli E and ν would be rather unnatural. If needed, we may always use the second of Eqs. (48): μ = E/2(1 + ν).
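Now it is straightforward to use this result to calculate the full torque as an integral over the cross-section’s area A:

$$\tau_z = \int_A (\mathbf{r} \times d\mathbf{F})_z = \int_A (x\,dF_y - y\,dF_x) = \int_A (x\sigma_{yz} - y\sigma_{xz})\,dx\,dy. \qquad (7.88)$$

Using Eq. (87), we get $\tau_z = \mu\kappa I_z$, i.e.

$$C = \mu I_z, \qquad \text{where } I_z \equiv \int_A \left(x^2 + y^2\right)dx\,dy. \qquad (7.89)$$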


Again, just as in the case of thin rod bending, we have got an integral, in this case $I_z$, similar to a moment of inertia, this time for the rotation about the z-axis passing through a certain point of the cross-section. For any axially-symmetric cross-section, this has to be its central point. Then, for example, for the practically important case of a uniform round pipe with internal radius $R_1$ and external radius $R_2$, Eq. (89) yields


$$C = \mu\,2\pi\int_{R_1}^{R_2}\rho^3\,d\rho = \frac{\pi}{2}\mu\left(R_2^4 - R_1^4\right). \qquad (7.90)$$

In particular, for the solid rod of radius R (which may be treated as a pipe with $R_1 = 0$ and $R_2 = R$), this result gives the following torsional rigidity

$$C = \frac{\pi}{2}\mu R^4, \qquad (7.91a)$$

while for a hollow pipe of small thickness $t \ll R$, Eq. (90) is reduced to

$$C = 2\pi\mu R^3 t. \qquad (7.91b)$$
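Note that per unit cross-section area A (and hence per unit mass of the rod) the thin pipe’s rigidity is twice as high as that of a solid rod of the same radius R:

$$\frac{C}{A} = \begin{cases} \mu R^2/2, & \text{for a solid round rod,} \\ \mu R^2, & \text{for a thin round pipe.} \end{cases} \qquad (7.92)$$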


This fact is one reason for the broad use of thin pipes in engineering and physical experiment design. 
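Eqs. (90)-(92) can be checked numerically. The sketch below evaluates C/A for pipes of a fixed outer radius R and decreasing wall thickness t. It then compares the result with the solid-rod value. The function names and the unit choice μ = R = 1 are assumptions made for illustration.

```python
import math


def pipe_rigidity(mu, r_in, r_out):
    """Torsional rigidity of a round pipe, Eq. (90): C = (pi/2) mu (R2^4 - R1^4)."""
    return 0.5 * math.pi * mu * (r_out**4 - r_in**4)


def rigidity_per_area(mu, r_in, r_out):
    """C/A for the same pipe; A = pi (R2^2 - R1^2) is its cross-section area."""
    area = math.pi * (r_out**2 - r_in**2)
    return pipe_rigidity(mu, r_in, r_out) / area


if __name__ == "__main__":
    mu, R = 1.0, 1.0  # units chosen so that mu = R = 1
    solid = rigidity_per_area(mu, 0.0, R)  # Eq. (92): mu R^2 / 2
    for t in (0.5, 0.1, 0.01):
        ratio = rigidity_per_area(mu, R - t, R) / solid
        print(f"t/R = {t:5.2f}: (C/A)_pipe / (C/A)_solid = {ratio:.4f}")
```

For any pipe, $C/A = \mu(R_1^2 + R_2^2)/2$. The printed ratio is therefore $1 + (R_1/R_2)^2$, which tends to 2 as $t \to 0$.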


However, for rods with axially-asymmetric cross-sections, Eq. (89) gives wrong results. For example, for a narrow rectangle of area $A = w \times t$ with $t \ll w$, it yields the expression $C = \mu t w^3/12$ [WRONG!], which is even functionally different from the correct result; cf. Eq. (104) below. The reason for this error is that the above analysis does not describe possible bending $q_z(x, y)$ of the rod’s cross-section in the direction along the rod. (For axially-symmetric rods, such bending is evidently forbidden by the symmetry, so that Eq. (89) is valid, and the results (90)-(92) are absolutely correct.)


Let us describe³¹ this counter-intuitive effect by taking

$$q_z = \kappa\psi(x, y), \qquad (7.93)$$


31 I would not be terribly shocked if the reader skipped the balance of this section at the first reading. Though the calculation described in it is very elegant, instructive, and typical for the theory of elasticity (and for good physics as a whole!), its results will not be used in other chapters of this course or other parts of this series.
(where ψ is some function to be determined), but still keeping Eq. (85) for two other components of the displacement vector. The addition of ψ does not perturb the equality to zero of the diagonal elements of the strain tensor, as well as of $s_{xy}$ and $s_{yx}$, but contributes to other off-diagonal elements:

$$s_{xz} = s_{zx} = \frac{\kappa}{2}\left(-y + \frac{\partial\psi}{\partial x}\right), \qquad s_{yz} = s_{zy} = \frac{\kappa}{2}\left(x + \frac{\partial\psi}{\partial y}\right), \qquad (7.94)$$

and hence to the corresponding elements of the stress tensor:

$$\sigma_{xz} = \sigma_{zx} = \mu\kappa\left(-y + \frac{\partial\psi}{\partial x}\right), \qquad \sigma_{yz} = \sigma_{zy} = \mu\kappa\left(x + \frac{\partial\psi}{\partial y}\right). \qquad (7.95)$$


Now let us find the requirement imposed on the function ψ(x, y) by the fact that the stress force component parallel to the rod’s axis,

$$dF_z = \sigma_{zx}\,dA_x + \sigma_{zy}\,dA_y = \mu\kappa\,dA\left[\left(-y + \frac{\partial\psi}{\partial x}\right)\frac{dA_x}{dA} + \left(x + \frac{\partial\psi}{\partial y}\right)\frac{dA_y}{dA}\right], \qquad (7.96)$$

has to vanish at the rod’s surface(s), i.e. at a cross-section’s border. The coordinates {x, y} of any point at the border may be considered as unique functions, x(l) and y(l), of the arc l of that line (see Fig. 11).


Fig. 7.11. Deriving Eq. (99).


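As this sketch shows, the elementary area ratios participating in Eq. (96) may be readily expressed via the derivatives of these functions: $dA_x/dA = \sin\alpha = dy/dl$, $dA_y/dA = \cos\alpha = -dx/dl$, so that we may write

$$\frac{dF_z}{\mu\kappa\,dA} = \left(-y + \frac{\partial\psi}{\partial x}\right)\frac{dy}{dl} - \left(x + \frac{\partial\psi}{\partial y}\right)\frac{dx}{dl} = 0. \qquad (7.97)$$

Introducing, instead of ψ, a new function χ(x, y), defined by its derivatives as

$$\frac{\partial\chi}{\partial x} = -\frac{1}{2}\left(x + \frac{\partial\psi}{\partial y}\right), \qquad \frac{\partial\chi}{\partial y} = \frac{1}{2}\left(-y + \frac{\partial\psi}{\partial x}\right), \qquad (7.98)$$

we may rewrite Eq. (97) as

$$\frac{dF_z}{\mu\kappa\,dA} = 2\left(\frac{\partial\chi}{\partial y}\frac{dy}{dl} + \frac{\partial\chi}{\partial x}\frac{dx}{dl}\right) = 2\left.\frac{d\chi}{dl}\right|_{\text{border}} = 0, \qquad (7.99)$$

so that the function χ has to be constant at each border of the cross-section.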


Chapter 7 Page 23 of 38 


Essential Graduate Physics CM: Classical Mechanics 


In particular, for a singly-connected cross-section, limited to just one continuous border line (as in Fig. 11), this constant is arbitrary, because according to Eqs. (98), its choice does not affect the longitudinal deformation function ψ(x, y) and hence the deformation as a whole. Now let us use the definition (98) of χ(x, y) to calculate the 2D Laplace operator of this function:

$$\nabla^2\chi \equiv \frac{\partial^2\chi}{\partial x^2} + \frac{\partial^2\chi}{\partial y^2} = -\frac{1}{2} - \frac{1}{2}\frac{\partial^2\psi}{\partial x\,\partial y} + \frac{1}{2}\frac{\partial^2\psi}{\partial y\,\partial x} - \frac{1}{2} = -1. \qquad (7.100)$$


This is a 2D Poisson equation (frequently met, for example, in electrostatics), but with a very simple, constant right-hand side. Plugging Eqs. (98) into Eqs. (95), and those into Eq. (88), we may express the torque $\tau_z$, and hence the torsional rigidity C, via the same function:

$$C = -2\mu\int_A\left(x\frac{\partial\chi}{\partial x} + y\frac{\partial\chi}{\partial y}\right)dx\,dy. \qquad (7.101a)$$


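Sometimes, it is easier to use this result in either of its two different forms. The first of them may be readily obtained from Eq. (101a) using the integration by parts:

$$C = -2\mu\left(\int dy\int x\frac{\partial\chi}{\partial x}\,dx + \int dx\int y\frac{\partial\chi}{\partial y}\,dy\right) = -2\mu\left(\int dy\left[x\chi\big|_{\text{border}} - \int\chi\,dx\right] + \int dx\left[y\chi\big|_{\text{border}} - \int\chi\,dy\right]\right) = 4\mu\int_A\chi\,dx\,dy - 4\mu\,\chi\big|_{\text{border}}A, \qquad (7.101b)$$

while the proof of one more form,

$$C = 4\mu\int_A(\nabla\chi)^2\,dx\,dy, \qquad (7.101c)$$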


is left for the reader’s exercise. Thus, if we need to know the rod’s rigidity alone, it is sufficient to calculate the function χ(x, y) from Eq. (100) with the boundary condition $\chi|_{\text{border}} = \text{const}$, and then plug it into any of Eqs. (101). Only if we are also curious about the longitudinal deformation (93) of the cross-section, we may continue by using Eq. (98) to find the function ψ(x, y) describing this deformation.
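The recipe lends itself to a simple numerical implementation. The sketch below is an illustration under stated assumptions, not the method of the text. It works as follows:
- It solves Eq. (100) for a rectangular cross-section by finite differences with successive over-relaxation (SOR).
- It chooses χ = 0 on the border, which is allowed for a singly-connected cross-section.
- It then evaluates Eq. (101b), which for this choice reduces to $C = 4\mu\int\chi\,dx\,dy$.

The grid sizes, the relaxation factor, and the tolerance are arbitrary choices. The reference value $C \approx 0.1406\,\mu a^4$ for a square of side a is the standard tabulated Saint-Venant result. It is quoted from general knowledge rather than from the text.

```python
def torsional_rigidity_rectangle(w, t, nx, ny, mu=1.0, omega=1.8,
                                 tol=1e-12, max_iter=20000):
    """Finite-difference estimate of C for a w x t rectangular cross-section.

    Solves Eq. (100), laplacian(chi) = -1, with chi = 0 on the border, by
    successive over-relaxation on an (nx + 1) x (ny + 1) node grid, and then
    evaluates Eq. (101b) with chi_border = 0: C = 4 mu * integral(chi).
    """
    hx, hy = w / nx, t / ny
    hx2, hy2 = hx * hx, hy * hy
    denom = 2.0 * (hx2 + hy2)
    chi = [[0.0] * (ny + 1) for _ in range(nx + 1)]  # border nodes stay at 0
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(1, nx):
            for j in range(1, ny):
                # five-point Laplacian = -1, solved for chi[i][j] (Gauss-Seidel value)
                gs = (hy2 * (chi[i + 1][j] + chi[i - 1][j])
                      + hx2 * (chi[i][j + 1] + chi[i][j - 1])
                      + hx2 * hy2) / denom
                change = omega * (gs - chi[i][j])
                chi[i][j] += change
                max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    else:
        raise RuntimeError("SOR iterations did not converge")
    # chi vanishes on the border, so the trapezoidal rule is a plain sum
    integral = sum(sum(row[1:ny]) for row in chi[1:nx]) * hx * hy
    return 4.0 * mu * integral


if __name__ == "__main__":
    # Square a x a with a = 1: compare with the tabulated C ~ 0.1406 mu a^4
    print("square:", torsional_rigidity_rectangle(1.0, 1.0, 30, 30))
    # Narrow rectangle w x t with t/w = 0.1
    w, t = 1.0, 0.1
    print("narrow, numerical:", torsional_rigidity_rectangle(w, t, 100, 10))
    print("narrow, Eq. (104):", w * t**3 / 3)
    print("narrow, Eq. (89): ", (t * w**3 + w * t**3) / 12)  # the [WRONG!] estimate
```

For the narrow rectangle, the numerical C comes out several percent below the value $\mu w t^3/3$ of Eq. (104). The difference comes mostly from the ends, which the 1D approximation ignores. Eq. (89), by contrast, overshoots by more than an order of magnitude.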


Let us see how this recipe works for the two examples discussed above. For the round cross-section of radius R, both the Poisson equation (100) and the boundary condition, χ = const at $x^2 + y^2 = R^2$, are evidently satisfied by the following axially-symmetric function:

$$\chi = -\frac{1}{4}\left(x^2 + y^2\right) + \text{const}. \qquad (7.102)$$

For this case, Eq. (101a) yields

$$C = -2\mu\int_A\left[x\left(-\frac{x}{2}\right) + y\left(-\frac{y}{2}\right)\right]dx\,dy = \mu\int_A\left(x^2 + y^2\right)d^2r, \qquad (7.103)$$

i.e. the same result (89) that we had for ψ = 0. Indeed, plugging Eq. (102) into Eqs. (98), we see that in this case $\partial\psi/\partial x = \partial\psi/\partial y = 0$, so that ψ(x, y) = const, i.e. the cross-section is not bent. (As was discussed in Sec. 1, a uniform translation $q_z = \kappa\psi = \text{const}$ does not constitute a deformation.)


Now, turning to a rod with a narrow rectangular cross-section $A = w \times t$ with $t \ll w$, we may use this strong inequality to solve the Poisson equation (100) approximately, neglecting the second derivative of χ along the wider dimension (say, y). The remaining 1D differential equation $d^2\chi/dx^2 = -1$, with boundary conditions $\chi|_{x=+t/2} = \chi|_{x=-t/2}$, has an obvious solution: $\chi = -x^2/2 + \text{const}$. Plugging this expression into Eq. (101b) or (101c), we get the following (correct!) result for the torsional rigidity:

$$C = \frac{1}{3}\mu w t^3. \qquad (7.104)$$

(Eq. (101a) should not be used with this approximate solution. It also picks up a comparable contribution from the narrow end regions y ≈ ±w/2, where the solution is invalid.)
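Now let us have a look at the cross-section bending law (93) for this particular case. Using Eqs. (98), we get

$$\frac{\partial\psi}{\partial x} = 2\frac{\partial\chi}{\partial y} + y = y, \qquad \frac{\partial\psi}{\partial y} = -2\frac{\partial\chi}{\partial x} - x = x. \qquad (7.105)$$

Integrating these differential equations over the cross-section, and taking the integration constant (again, not contributing to the deformation) for zero, we get a beautifully simple result:

$$\psi = xy, \quad \text{i.e.} \quad q_z = \kappa xy. \qquad (7.106)$$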


It means that the longitudinal deformation of the rod has a “propeller bending” form: while the regions 
near the opposite corners (on the same diagonal) of the cross-section bend toward one direction of the z- 
axis, the corners on the other diagonal bend in the opposite direction. (This qualitative conclusion 
remains valid for rectangular cross-sections with any “aspect ratio” t/w.)


For rods with several surfaces, i.e. with cross-sections limited by several boundaries (say, hollow pipes), finding the function χ(x, y) requires a bit more care, and Eq. (101b) has to be modified, because the function may be equal to a different constant at each boundary. Let me leave the calculation of the torsional rigidity for this case for the reader’s exercise.


7.7. 3D acoustic waves 
Now moving from the statics to dynamics, we may start with Eq. (24), which may be 
transformed into the vector form exactly as this was done for the static case at the beginning of Sec. 4. 
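Comparing Eqs. (24) and (52), we immediately see that the result may be represented as

$$\rho\frac{\partial^2\mathbf{q}}{\partial t^2} = \frac{E}{2(1+\nu)}\nabla^2\mathbf{q} + \frac{E}{2(1+\nu)(1-2\nu)}\nabla(\nabla\cdot\mathbf{q}) + \mathbf{f}(\mathbf{r}, t). \qquad (7.107)$$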


Let us use this general equation for the analysis of the perhaps most important type of time-dependent deformations: acoustic waves. First, let us consider the simplest case of a virtually infinite, uniform elastic medium, with no external forces: f = 0. In this case, due to the linearity and homogeneity of the equation of motion, and taking clues from the analysis of the simple 1D model (see Fig. 6.4a) in Secs. 6.3-6.5,³² we may look for a particular time-dependent solution in the form of a sinusoidal, linearly-polarized, plane traveling wave

$$\mathbf{q}(\mathbf{r}, t) = \text{Re}\left[\mathbf{a}\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\right], \qquad (7.108)$$


32 Note though that Eq. (107) is more complex than the simple wave equation (6.40). 
where a is the constant complex amplitude of a wave (now a vector!), and k is the wave vector, whose magnitude is equal to the wave number k. The directions of these two vectors should be clearly distinguished: while a determines the wave’s polarization, i.e. the direction of particle displacements, the vector k is directed along the spatial gradient of the full phase of the wave

$$\Psi = \mathbf{k}\cdot\mathbf{r} - \omega t + \arg\mathbf{a}, \qquad (7.109)$$

i.e. along the direction of the wave front propagation.


The importance of the angle between these two vectors may be readily seen from the following simple calculation. Let us point the z-axis of an (inertial) reference frame along the direction of vector k, and the x-axis in such direction that the vector q, and hence a, lie within the {x, z} plane. In this case, all variables may change only along the z-axis, i.e. $\nabla = \mathbf{n}_z(\partial/\partial z)$, and the amplitude vector may be represented as the sum of just two Cartesian components:

$$\mathbf{a} = a_t\mathbf{n}_x + a_l\mathbf{n}_z. \qquad (7.110)$$


Let us first consider a longitudinal wave, with the particle motion along the wave direction: $a_t = 0$, $a_l = a$. Then the vector q in Eq. (107) describing this wave has only one (z-) component, so that $\nabla\cdot\mathbf{q} = \partial q_z/\partial z$ and $\nabla(\nabla\cdot\mathbf{q}) = \mathbf{n}_z(\partial^2 q_z/\partial z^2)$, and the Laplace operator gives the same expression: $\nabla^2\mathbf{q} = \mathbf{n}_z(\partial^2 q_z/\partial z^2)$. As a result, Eq. (107), with f = 0, is reduced to a 1D wave equation

$$\rho\frac{\partial^2 q_z}{\partial t^2} = \left[\frac{E}{2(1+\nu)} + \frac{E}{2(1+\nu)(1-2\nu)}\right]\frac{\partial^2 q_z}{\partial z^2} \equiv \frac{E(1-\nu)}{(1+\nu)(1-2\nu)}\frac{\partial^2 q_z}{\partial z^2}, \qquad (7.111)$$

similar to Eq. (6.40). As we already know from Sec. 6.4, this equation is indeed satisfied with the solution (108), provided that ω and k obey a linear dispersion relation, $\omega = v_l k$, with the following longitudinal wave velocity:


$$v_l^2 = \frac{E(1-\nu)}{(1+\nu)(1-2\nu)\rho} \equiv \frac{K + (4/3)\mu}{\rho}. \qquad (7.112)$$

The last expression allows for a simple interpretation. Let us consider a static experiment, similar to the tensile test experiment shown in Fig. 6, but with a sample much wider than l in both directions perpendicular to the force. Then the lateral contraction is impossible ($s_{xx} = s_{yy} = 0$), and we can calculate the longitudinal stress element, $\sigma_{zz}$, directly from Eq. (34) with Tr (s) = $s_{zz}$:

$$\sigma_{zz} = 2\mu\left(s_{zz} - \frac{1}{3}s_{zz}\right) + 3K\left(\frac{1}{3}s_{zz}\right) = \left(K + \frac{4}{3}\mu\right)s_{zz}. \qquad (7.113)$$


We see that the numerator in Eq. (112) is nothing more than the static elastic modulus for such a 
uniaxial deformation, and it is recalculated into the velocity exactly as the spring constant in the 1D 
waves considered in Secs. 6.3-6.4 — cf. Eq. (6.42). 


Formula (112) becomes especially simple in fluids, where μ = 0, and the wave velocity is described by the well-known expression

$$v_l = \left(\frac{K}{\rho}\right)^{1/2}. \qquad (7.114)$$




Note, however, that for gases, with their high compressibility and temperature sensitivity, the value of K participating in this formula may differ, at high frequencies, from that given by Eq. (40), because fast compressions/extensions of gas are usually adiabatic rather than isothermal. This difference is noticeable in Table 1, one of whose columns lists the values of $v_l$ for representative materials.


Now let us consider an opposite case of transverse waves with $a_t = a$, $a_l = 0$. In such a wave, the displacement vector is perpendicular to $\mathbf{n}_z$, so that $\nabla\cdot\mathbf{q} = 0$, and the second term on the right-hand side of Eq. (107) vanishes. In contrast, the Laplace operator acting on such a vector still gives the same nonzero contribution, $\nabla^2\mathbf{q} = \mathbf{n}_x(\partial^2 q_x/\partial z^2)$, to Eq. (107), so that the equation yields

$$\rho\frac{\partial^2 q_x}{\partial t^2} = \frac{E}{2(1+\nu)}\frac{\partial^2 q_x}{\partial z^2}, \qquad (7.115)$$

and we again get the linear dispersion relation, $\omega = v_t k$, but with a different velocity:³³

$$v_t^2 = \frac{E}{2(1+\nu)\rho} \equiv \frac{\mu}{\rho}. \qquad (7.116)$$


We see that the speed of the transverse waves depends exclusively on the shear modulus μ of the medium.³⁴ This is also very natural. In such waves, the particle displacements $\mathbf{q} = \mathbf{n}_x q$ are parallel to the planes z = const across which the elastic forces act, so that only one element of the stress tensor, $\sigma_{xz}$, is involved. Also, the strain tensor $s_{jj'}$ has no diagonal elements, Tr (s) = 0, so that μ is the only elastic modulus actively participating in Hooke’s law (32). In particular, fluids cannot carry transverse waves at all (formally, their velocity (116) vanishes), because they do not resist shear deformations. For all other materials, the longitudinal waves are faster than the transverse ones.³⁵ Indeed, for all known natural materials, Poisson’s ratio is positive, so that the velocity ratio that follows from Eqs. (112) and (116),

$$\frac{v_l}{v_t} = \left[\frac{2(1-\nu)}{1-2\nu}\right]^{1/2}, \qquad (7.117)$$

is above $\sqrt{2} \approx 1.4$. For the most popular construction materials, with ν ≈ 0.3, this ratio is about 2 (see Table 1).
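Eqs. (112), (116), and (117) are easy to evaluate. The sketch below does so for illustrative, steel-like inputs, which are assumed values not taken from Table 1. It also cross-checks the two forms of Eq. (112). The bulk-modulus relation K = E/3(1 − 2ν) used here is the standard one for isotropic media. It is assumed rather than quoted from this section.

```python
import math


def elastic_wave_speeds(E, nu, rho):
    """Bulk wave velocities (v_l, v_t) of an isotropic medium, Eqs. (112) and (116)."""
    v_l = math.sqrt(E * (1 - nu) / ((1 + nu) * (1 - 2 * nu) * rho))  # Eq. (112)
    v_t = math.sqrt(E / (2 * (1 + nu) * rho))                         # Eq. (116)
    return v_l, v_t


def moduli(E, nu):
    """Shear modulus mu and bulk modulus K of an isotropic medium."""
    mu = E / (2 * (1 + nu))      # second of Eqs. (48)
    K = E / (3 * (1 - 2 * nu))   # standard isotropic relation (assumed here)
    return mu, K


if __name__ == "__main__":
    E, nu, rho = 2.0e11, 0.3, 7800.0  # illustrative, steel-like values (assumed)
    v_l, v_t = elastic_wave_speeds(E, nu, rho)
    mu, K = moduli(E, nu)
    print(f"v_l = {v_l:.0f} m/s, v_t = {v_t:.0f} m/s, v_l/v_t = {v_l / v_t:.3f}")
    # second form of Eq. (112), and Eq. (117), for comparison
    print(math.sqrt((K + 4 * mu / 3) / rho), math.sqrt(2 * (1 - nu) / (1 - 2 * nu)))
```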


Let me emphasize again that for both the longitudinal and the transverse waves, the dispersion relation between the wave number and frequency is linear: ω = vk. As was already discussed in Chapter 6, in this case of acoustic waves (or just “sound”), the phase and group velocities are equal, and waves of more complex form, consisting of several (or many) Fourier components of the type (108), preserve


33 Just as in Chapter 6, let me emphasize that the wave velocities we are discussing in this section and Sec. 8 below have nothing to do with particle velocities ∂q/∂t. For example, in the transverse wave we are discussing now, $v_t$ is the velocity in the z-direction, while the particles of the medium move across it, in the x-direction. Also, $v_l$ and $v_t$ do not depend on the wave amplitudes, while the particle velocities are proportional to them.

34 Because of that, one can frequently meet the term shear waves. Note also that in contrast to the transverse waves in the simple 1D model analyzed in Chapter 6 (see Fig. 6.4a), those in a 3D continuum do not need a pre-stretch tension $\mathscr{T}$. We will return to the effect of tension in the next section.

35 Because of this difference between $v_l$ and $v_t$, in geophysics the longitudinal waves are known as P-waves (with the letter P standing for “primary”), because they arrive at the detection site, say from an earthquake, first, before the transverse waves, called the S-waves, with S standing for “secondary”.
their form during propagation. This means that both Eqs. (111) and (115) are satisfied by solutions of the type (6.41):

$$q_{l,t}(z, t) = f_\pm(z \mp v_{l,t}\,t), \qquad (7.118)$$


where the functions $f_\pm$ describe the propagating waveforms. (However, if the initial wave is a mixture, of the type (110), of the longitudinal and transverse components, then these components, propagating with different velocities, will “run from each other”.) As one may infer from the analysis of a periodic system model in Chapter 6, the wave dispersion becomes essential at very high (hypersound) frequencies. There, the wave number k becomes comparable with the reciprocal distance 1/d between the particles of the medium (e.g., atoms or molecules), and hence the approximation of the medium as a continuum, used throughout this chapter, becomes invalid.


As we already know from Chapter 6, besides the velocity, the waves of each type are characterized by one more important parameter, the wave impedance Z, for acoustic waves frequently called the acoustic impedance of the medium. Generalizing Eq. (6.46) to the 3D case, we may define the impedance as the ratio of the force per unit area (i.e. the corresponding element of the stress tensor) exerted by the wave, and the particles’ velocity. For the longitudinal waves,

$$Z_l \equiv \left|\frac{\sigma_{zz}}{\partial q_z/\partial t}\right| = \left|\frac{\sigma_{zz}}{s_{zz}}\right|\left|\frac{s_{zz}}{\partial q_z/\partial t}\right| = \left|\frac{\sigma_{zz}}{s_{zz}}\right|\left|\frac{\partial q_z/\partial z}{\partial q_z/\partial t}\right|. \qquad (7.119)$$

Plugging in Eqs. (108), (112), and (113), we get

$$Z_l = \left[\left(K + \frac{4}{3}\mu\right)\rho\right]^{1/2}, \qquad (7.120)$$


in a clear analogy with the first of Eqs. (6.48). Similarly, for the transverse waves, the appropriately modified definition, $Z_t \equiv |\sigma_{xz}/(\partial q_x/\partial t)|$, yields

$$Z_t = (\mu\rho)^{1/2}. \qquad (7.121)$$


Just like in the 1D models studied in Chapter 6, one role of the wave impedance is to scale the power $\mathscr{P}$ carried by the wave. For plane 3D waves in infinite media, with their infinite wave front area, it is more appropriate to speak about the power density, i.e. power $p \equiv d\mathscr{P}/dA$ per unit area of the front. It should be characterized not only by its magnitude,

$$p \equiv \frac{d\mathscr{P}}{dA} = \left|\frac{d\mathbf{F}}{dA}\cdot\frac{\partial\mathbf{q}}{\partial t}\right|, \qquad (7.122)$$

but also by the direction of the energy propagation, which (for a plane acoustic wave in an isotropic medium) coincides with the direction of the wave vector: $\mathbf{p} = p\,\mathbf{n}_k$. Using the definition (18) of the stress tensor, the Cartesian components of this Umov vector³⁶ may be expressed as

$$p_{j'} = -\sum_j \sigma_{jj'}\frac{\partial q_j}{\partial t}. \qquad (7.123)$$

36 Named after N. A. Umov, who introduced this concept in 1874, ten years before a similar notion for electromagnetic waves (see, e.g., EM Sec. 6.4) was suggested by J. Poynting. In a dissipation-free elastic medium, the Umov vector obeys the continuity equation $\partial[\rho(\partial\mathbf{q}/\partial t)^2/2 + u]/\partial t + \nabla\cdot\mathbf{p} = 0$, with u given by Eq. (52). This equation expresses the conservation of the total (kinetic plus potential) energy of the elastic deformation.
Returning to plane waves propagating along axis z, and acting exactly like in Chapter 6, for both the longitudinal and transverse waves we again arrive at Eq. (6.49), but for p rather than $\mathscr{P}$ (due to a different definition of the wave impedance: per unit area rather than per particle chain). For the sinusoidal waves of the type (108), it yields

$$\overline{p} = \frac{Z\omega^2|a|^2}{2}, \qquad (7.124)$$

with Z being the corresponding impedance, either $Z_l$ or $Z_t$.
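As a consistency check of Eqs. (123) and (124), the sketch below time-averages $p_z = -\sigma_{zz}\,\partial q_z/\partial t$ for a plane longitudinal wave $q_z = a\cos(kz - \omega t)$. It takes $\sigma_{zz}$ from Eq. (113) and compares the result with $Z_l\omega^2a^2/2$. The material parameters, frequency, and amplitude are arbitrary assumptions. The sampling is uniform over one period, which averages $\sin^2$ exactly.

```python
import math


def mean_umov_flux_longitudinal(K, mu, rho, omega, a, samples=64):
    """Time-averaged p_z of a plane longitudinal wave q_z = a cos(kz - omega t).

    Evaluates Eq. (123), p_z = -sigma_zz dq_z/dt, at z = 0 over one period,
    with sigma_zz = (K + 4 mu / 3) dq_z/dz taken from Eq. (113).
    """
    M = K + 4.0 * mu / 3.0      # uniaxial modulus, Eq. (113)
    v_l = math.sqrt(M / rho)    # Eq. (112)
    k = omega / v_l             # linear dispersion relation
    total = 0.0
    for m in range(samples):
        t = 2.0 * math.pi * m / (samples * omega)
        phase = -omega * t      # k*z - omega*t evaluated at z = 0
        dq_dz = -a * k * math.sin(phase)
        dq_dt = a * omega * math.sin(phase)
        total += -(M * dq_dz) * dq_dt
    return total / samples


if __name__ == "__main__":
    K, mu, rho, omega, a = 1.6e11, 8.0e10, 7800.0, 2 * math.pi * 1e4, 1e-9  # assumed
    Z_l = math.sqrt((K + 4 * mu / 3) * rho)  # Eq. (120)
    print(mean_umov_flux_longitudinal(K, mu, rho, omega, a), Z_l * omega**2 * a**2 / 2)
```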


Just as in the 1D case, one more important effect, in which the notion of impedance is crucial, is the partial wave reflection from an interface between two media. The two boundary conditions necessary for the analysis of the reflection may be obtained from the continuity of the vectors q and dF. (The former condition is evident, while the latter one may be obtained by applying the 2nd Newton law to any infinitesimal volume dV = dA dz, where the segment dz straddles the interface.) Let us start from the simplest case of the normal incidence on a plane interface between two uniform media, each with its own elastic moduli and mass density. Due to the symmetry of the system, it is obvious that the longitudinal/transverse incident wave may only excite similarly polarized reflected and transmitted waves. As a result, we may literally repeat the calculations of Sec. 6.4, again arriving at the fundamental relations (6.55) and (6.56), with the replacement of Z and Z’ with the corresponding values of either $Z_l$ (120) or $Z_t$ (121). Thus, at the normal incidence, the wave reflection is determined solely by the acoustic impedances of the media, while the sound velocities are not involved.
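Relations (6.55) and (6.56) are not reproduced in this section. The sketch below therefore uses the standard normal-incidence power coefficients instead:

$$R = \left(\frac{Z - Z'}{Z + Z'}\right)^2, \qquad T = \frac{4ZZ'}{(Z + Z')^2}.$$

These follow from the same two continuity conditions and do not depend on the sign convention chosen for the amplitudes. The impedances come from Eqs. (120) and (121). The two media and their moduli are hypothetical.

```python
import math


def impedance_longitudinal(K, mu, rho):
    """Acoustic impedance of longitudinal waves, Eq. (120)."""
    return math.sqrt((K + 4.0 * mu / 3.0) * rho)


def impedance_transverse(mu, rho):
    """Acoustic impedance of transverse waves, Eq. (121)."""
    return math.sqrt(mu * rho)


def normal_incidence_power_coefficients(Z1, Z2):
    """Fractions of the incident power that are reflected (R) and transmitted (T)."""
    R = ((Z1 - Z2) / (Z1 + Z2)) ** 2
    T = 4.0 * Z1 * Z2 / (Z1 + Z2) ** 2
    return R, T


if __name__ == "__main__":
    # Hypothetical media: a stiff solid and a fluid (mu = 0), with assumed moduli
    Z_solid = impedance_longitudinal(K=1.6e11, mu=8.0e10, rho=7800.0)
    Z_fluid = impedance_longitudinal(K=2.2e9, mu=0.0, rho=1000.0)
    R, T = normal_incidence_power_coefficients(Z_solid, Z_fluid)
    print(f"Z = {Z_solid:.3e}, Z' = {Z_fluid:.3e} kg/(m^2 s); R = {R:.3f}, T = {T:.3f}")
```

A transverse wave hitting a fluid (μ’ = 0, hence $Z'_t = 0$) is reflected completely. This is consistent with the fact that fluids cannot carry transverse waves.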


The situation, however, becomes more complicated at a nonzero incidence angle $\theta^{(i)}$ (Fig. 12), where the transmitted wave is generally also refracted, i.e. propagates under a different angle, $\theta' \neq \theta^{(i)}$, beyond the interface. Moreover, at $\theta^{(i)} \neq 0$ the directions of particle motion (vector q) and of the stress forces (vector dF) in the incident wave are neither exactly parallel nor exactly perpendicular to the interface. Thus this wave may serve as an actuator for the reflected and refracted waves of both polarizations; see Fig. 12, drawn for the particular case when the incident wave is transverse. The corresponding four angles, $\theta_t$, $\theta_l$, $\theta'_t$, $\theta'_l$, may be readily related to $\theta^{(i)}$ by the “kinematic” condition. It requires that the incident wave, as well as the reflected and refracted waves of both types, have the same spatial distribution along the interface plane, i.e. for the interface particles participating in all five waves. According to Eq. (108), the necessary boundary condition is the equality of the tangential components (in Fig. 12, $k_x$) of all five wave vectors:

$$k_t\sin\theta^{(i)} = k_t\sin\theta_t = k_l\sin\theta_l = k'_t\sin\theta'_t = k'_l\sin\theta'_l = k_x. \qquad (7.125)$$

Since the acoustic wave vector magnitudes k, at fixed frequency ω, are inversely proportional to the corresponding wave velocities, we immediately get the following relations:

$$\frac{\sin\theta^{(i)}}{v_t} = \frac{\sin\theta_l}{v_l} = \frac{\sin\theta'_t}{v'_t} = \frac{\sin\theta'_l}{v'_l}, \qquad (7.126)$$

so that generally all four angles are different. (This is of course an analog of the well-known Snell law in optics, where, however, only transverse waves are possible.) These relations show that, just like in optics, the direction of a wave propagating into a medium with lower velocity is closer to the normal (in Fig. 12, to the z-axis). In particular, if v’ > v, the acoustic waves at larger angles of incidence may exhibit the effect of total internal reflection, so well known from optics,³⁷ when the refracted wave vanishes. In addition, Eqs. (126) show that in acoustics, the reflected longitudinal wave, with velocity $v_l > v_t$, may vanish at sufficiently large angles of the transverse wave incidence.
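The kinematic relations (126) can be evaluated directly. The sketch below takes a transverse incident wave, the case of Fig. 12. It computes the angles of the reflected longitudinal wave and of the two refracted waves. It returns None whenever the required sine exceeds 1, i.e. whenever that wave cannot propagate away from the interface. The function name and all velocity values are assumptions chosen for illustration.

```python
import math


def transmitted_angle(theta_in, v_in, v_out):
    """Angle from the kinematic condition (126): sin(theta_out)/v_out = sin(theta_in)/v_in.

    Returns None when sin(theta_out) would exceed 1, i.e. when the
    corresponding wave cannot propagate away from the interface.
    """
    s = math.sin(theta_in) * v_out / v_in
    return math.asin(s) if s <= 1.0 else None


if __name__ == "__main__":
    v_t, v_l = 3200.0, 5900.0   # assumed bulk velocities of the first medium
    v_t2, v_l2 = 1.0e3, 2.0e3   # assumed bulk velocities beyond the interface
    for deg in (10, 20, 30, 40):
        th = math.radians(deg)  # transverse incident wave, as in Fig. 12
        out = {name: transmitted_angle(th, v_t, v)
               for name, v in (("reflected l", v_l), ("refracted t", v_t2),
                               ("refracted l", v_l2))}
        print(deg, {key: (None if x is None else round(math.degrees(x), 1))
                    for key, x in out.items()})
    # incidence angle beyond which the reflected longitudinal wave vanishes
    print(math.degrees(math.asin(v_t / v_l)))
```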


Fig. 7.12. Deriving the “kinematic” 
conditions (126) of the acoustic wave 
reflection and refraction (for the case of 
a transverse incident wave). 


All these facts automatically follow from general expressions for amplitudes of the reflected and refracted waves via the amplitude of the incident wave. These relations are straightforward to derive (again, from the continuity of the vectors q and dF). However, they are much bulkier than those in the electromagnetic wave theory (where they are called the Fresnel formulas³⁸), and I do not have the time and space to spell out and discuss them. Let me only note that, in contrast to the case of normal incidence, these relations involve eight media parameters: the impedances Z, Z’ and the velocities v, v’ on both sides of the interface, for both the longitudinal and transverse waves.


There are other interface effects as well. Within certain frequency ranges, interfaces and surfaces of elastic solids may sustain so-called surface acoustic waves (SAW), in particular, the Rayleigh waves and the Love waves.³⁹ The main feature that distinguishes such waves from their bulk (longitudinal and transverse) counterparts discussed above is that the displacement amplitudes are largest at the interface and decay exponentially into the bulk of both adjacent media. Hence these waves cannot be plane in the usual sense of being independent of two Cartesian coordinates.


For an analysis of such waves, it is important that in a uniform medium, even non-plane elastic waves may always be separated into independent longitudinal and transverse components. Indeed, it is straightforward (and hence left for the reader) to prove that Eq. (107) may be satisfied by a vector sum $\mathbf{q}(\mathbf{r}, t) = \mathbf{q}_l(\mathbf{r}, t) + \mathbf{q}_t(\mathbf{r}, t)$. The former component has zero curl ($\nabla\times\mathbf{q}_l = 0$) and propagates with the velocity (112), while the latter component has zero divergence ($\nabla\cdot\mathbf{q}_t = 0$) and propagates with the velocity (116). The plane waves $q\,\mathbf{n}_z$ and $q\,\mathbf{n}_x$ analyzed above certainly fall into these two categories, but in more general waves, there may be no clear association between the longitudinal and transverse components and their polarization.


This is true, in particular, in the Rayleigh waves, where the particle displacement vector q may be represented as the sum $\mathbf{q}_l + \mathbf{q}_t$, each of the vectors having more than one Cartesian component. In


37 See, e.g., EM Sec. 7.5. 

38 Their discussion may be also found in EM Sec. 7.5. 

39 Named, respectively, after Lord Rayleigh (born J. Strutt, 1842-1919), who theoretically predicted the very existence of surface acoustic waves, and A. Love (1863-1940).
contrast to the bulk waves, the longitudinal and transverse components are coupled via their interaction with the interface, and as a result, propagate with a single velocity $v_R$. A straightforward analysis of the Rayleigh waves on the surface of an elastic solid (i.e. its interface with free space) yields the following equation for $v_R$:

$$\left(1 - \frac{v_R^2}{2v_t^2}\right)^4 = \left(1 - \frac{v_R^2}{v_l^2}\right)\left(1 - \frac{v_R^2}{v_t^2}\right). \qquad (7.127)$$


According to this formula, and Eqs. (112) and (116), for realistic materials with Poisson’s ratio between 0 and ½, the Rayleigh waves are slightly (by 4 to 13%) slower than the bulk transverse waves, and hence substantially slower than the bulk longitudinal waves.
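Equation (127) is easily solved numerically. The sketch below finds ξ = $v_R/v_t$ by bisection, using $v_t^2/v_l^2 = (1 - 2\nu)/2(1 - \nu)$, which follows from Eq. (117). The function name and the bracket [0.5, 1] are assumptions. For 0 ≤ ν ≤ ½, the bracket is valid: the left-hand side minus the right-hand side is negative at ξ = 0.5 and equals 1/16 at ξ = 1.

```python
import math


def rayleigh_ratio(nu, tol=1e-14):
    """Solve Eq. (127) for xi = v_R / v_t by bisection on [0.5, 1].

    Uses v_t^2 / v_l^2 = (1 - 2 nu) / (2 (1 - nu)), which follows from Eq. (117).
    """
    r = (1.0 - 2.0 * nu) / (2.0 * (1.0 - nu))

    def f(xi):
        u = xi * xi
        # Eq. (127) rearranged as LHS - RHS = 0; note v_R^2/v_l^2 = r * u
        return (1.0 - u / 2.0) ** 4 - (1.0 - r * u) * (1.0 - u)

    lo, hi = 0.5, 1.0  # f(lo) < 0 < f(hi) for 0 <= nu <= 1/2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for nu in (0.0, 0.25, 0.3, 0.5):
        xi = rayleigh_ratio(nu)
        print(f"nu = {nu:4.2f}: v_R/v_t = {xi:.4f} ({100 * (1 - xi):.1f}% slower)")
```

The computed ratios run from about 0.87 at ν = 0 to about 0.96 at ν = ½, reproducing the 4 to 13% range quoted above. For ν = 0 and ν = ¼ the roots are known in closed form: ξ² = 3 − √5 and ξ² = 2 − 2/√3, respectively.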


In the simplest case of a “1D-plane” Rayleigh wave, independent of one Cartesian coordinate, the net vector q has just two Cartesian components, each contributed by both $\mathbf{q}_l$ and $\mathbf{q}_t$. One of them is parallel to the propagation direction and hence to the interface, and the other one is normal to it. As a result, the trajectory of each particle in the wave is an ellipse in the plane normal to the interface. In contrast, the Love waves are purely transverse, with q oriented parallel to the interface. However, the interaction of these waves with the interface reduces their velocity $v_L$ in comparison with that ($v_t$) of the bulk transverse waves, keeping it within the narrow interval between $v_t$ and $v_R$:

$$v_R \leq v_L \leq v_t. \qquad (7.128)$$


The practical importance of surface acoustic waves is that their amplitude decays very slowly with the distance r from their point-like source, as $a \propto 1/r^{1/2}$, while any bulk waves decay much faster, as $a \propto 1/r$. (Indeed, in the latter case, the power $\mathcal{P} \propto a^2$ emitted by such a source is distributed over a spherical surface area proportional to $r^2$, while in the former case, all the power goes into a thin surface circle whose length scales as r.) At least two areas of application of the surface acoustic waves have to be mentioned: geophysics (for earthquake detection and Earth crust seismology), and electronics (for signal processing, with a focus on frequency filtering). Unfortunately, I cannot dwell on these interesting topics, and have to refer the reader to special literature.⁴⁰
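The power-counting argument in the parentheses above reduces to simple arithmetic. The sketch below works in arbitrary units (the normalization parameters a0 and r0 are hypothetical), encodes only the geometric spreading of a wave from a point source, and ignores absorption: for each wave type, the product of $a^2$ and the length (or area) over which the power is spread should not depend on r.

```python
import math


def surface_amplitude(r, a0=1.0, r0=1.0):
    """Surface wave: power ~ a^2 spread over a circle of length 2*pi*r, so a ~ r^(-1/2)."""
    return a0 * math.sqrt(r0 / r)


def bulk_amplitude(r, a0=1.0, r0=1.0):
    """Bulk wave: power ~ a^2 spread over a sphere of area 4*pi*r^2, so a ~ 1/r."""
    return a0 * r0 / r


def power_budgets(r):
    """a^2 times the spreading length (area); each value should be independent of r."""
    surf = surface_amplitude(r) ** 2 * 2 * math.pi * r
    bulk = bulk_amplitude(r) ** 2 * 4 * math.pi * r ** 2
    return surf, bulk


for r in (1.0, 10.0, 100.0, 1000.0):
    s, b = power_budgets(r)
    print(f"r = {r:6.0f}: a_surface = {surface_amplitude(r):.4f}, "
          f"a_bulk = {bulk_amplitude(r):.4f}, budgets = ({s:.4f}, {b:.4f})")
```

At $r = 100\,r_0$, the surface wave is already ten times stronger than a bulk wave that had the same amplitude at $r_0$.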


7.8. Elastic waves in restricted geometries 


From what was discussed at the end of the last section, it should be clear that, generally, the propagation of acoustic waves in elastic bodies of finite size is rather complicated. There is, however, one important limit in which several simple results may be readily obtained: the limit of (relatively) low frequencies, where the wavelength is much larger than at least one dimension of the system.


Let us consider, for example, various waves that may propagate along thin rods, in this case “thin” meaning that the characteristic size a of the rod’s cross-section is much smaller than not only the length of the rod but also the wavelength $\lambda = 2\pi/k$. In this case, there is a considerable range $\Delta z$ of distances along the rod,

$$a \ll \Delta z \ll \lambda, \tag{7.129}$$


40 See, for example, K. Aki and P. Richards, Quantitative Seismology, 2nd ed., University Science Books, 2002; and D. Morgan, Surface Acoustic Waves, 2nd ed., Academic Press, 2007.
in which we can neglect the material’s inertia and apply the results of our earlier static analyses. For example, for a longitudinal wave of stress, which is essentially a wave of periodic tensile extensions and compressions of the rod, within the range (129) we may use the static relation (42):


$$\sigma_{zz} = E s_{zz}. \tag{7.130}$$


In this simple case, it is easier to use the general equation of elastic dynamics not in its vector form (107), but rather in the precursor, Cartesian-component form (25), with $f_j = 0$. For plane waves of stress propagating along the z-axis, only one component (with $j' = z$) of the sum on the right-hand side of that equation is different from zero, and the equation is reduced to

$$\rho\frac{\partial^2 q_j}{\partial t^2} = \frac{\partial \sigma_{jz}}{\partial z}. \tag{7.131}$$

In our current case of longitudinal waves, all components of the stress tensor but $\sigma_{zz}$ are equal to zero. With $\sigma_{zz}$ from Eq. (130), and using the definition $s_{zz} = \partial q_z/\partial z$, Eq. (131) is reduced to a simple 1D wave equation,


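$$\frac{\partial^2 q_z}{\partial t^2} = v^2\frac{\partial^2 q_z}{\partial z^2}, \tag{7.132}$$

with the following velocity:

$$v = \left(\frac{E}{\rho}\right)^{1/2}. \tag{7.133}$$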


Comparing this result with Eq. (112), we see that the tensile wave velocity, for any realistic material with a positive Poisson’s ratio, is lower than the velocity $v_l$ of longitudinal waves in the bulk of the same material. The reason for this difference is simple: in thin rods, the cross-section is free to oscillate (e.g., shrink in the longitudinal extension phase of the passing wave),⁴¹ so that the effective force resisting the longitudinal deformation is smaller than in a border-free space. Since (as is clearly visible from the wave equation) the scale of this force determines that of $v^2$, this difference translates into slower waves in rods. Of course, as the wave frequency is increased to $ka \sim 1$, there is a (rather complicated and cross-section-dependent) crossover from Eq. (133) to Eq. (112).
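To put numbers on this comparison, here is a short sketch using the steel parameters quoted in Problem 7.21 (E = 170 GPa, ν = 0.30, ρ = 7.8 g/cm³). It assumes the standard expressions of the bulk velocities via E and ν, $v_l^2 = E(1-\nu)/[\rho(1+\nu)(1-2\nu)]$ and $v_t^2 = E/[2\rho(1+\nu)]$, which is how I read Eqs. (112) and (116); the function names are illustrative.

```python
import math

# Steel parameters quoted in Problem 7.21, in SI units.
E = 170e9     # Young's modulus, Pa
nu = 0.30     # Poisson's ratio
rho = 7.8e3   # mass density, kg/m^3


def tensile_velocity(E, rho):
    """Eq. (133): velocity of long tensile waves in a thin rod."""
    return math.sqrt(E / rho)


def bulk_longitudinal_velocity(E, nu, rho):
    """Assumed standard form: v_l^2 = E (1 - nu) / [rho (1 + nu)(1 - 2 nu)]."""
    return math.sqrt(E * (1 - nu) / (rho * (1 + nu) * (1 - 2 * nu)))


def bulk_transverse_velocity(E, nu, rho):
    """Assumed standard form: v_t^2 = mu / rho, with mu = E / [2 (1 + nu)]."""
    return math.sqrt(E / (2 * rho * (1 + nu)))


v_rod = tensile_velocity(E, rho)
v_l = bulk_longitudinal_velocity(E, nu, rho)
v_t = bulk_transverse_velocity(E, nu, rho)
print(f"tensile (thin rod): {v_rod:.0f} m/s")
print(f"bulk longitudinal:  {v_l:.0f} m/s")
print(f"bulk transverse:    {v_t:.0f} m/s")
```

For these parameters, the tensile velocity (about 4.7 km/s) falls between $v_t$ and $v_l$; note that the ratio $v_l/v$ depends only on ν. For a round rod, $v_t$ is also the velocity of the torsional waves discussed later in this section.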


Proceeding to transverse waves on rods, let us first have a look at long bending waves for which the condition (129) is satisfied, so that the vector $\mathbf{q} = \mathbf{n}_x q_x$ (with the x-axis being the bending direction; see Fig. 8) is virtually constant in the whole cross-section. In this case, the only element of the stress tensor contributing to the net transverse force $F_x$ is $\sigma_{xz}$, so that the integral of Eq. (131) over the cross-section is

$$\rho A\frac{\partial^2 q_x}{\partial t^2} = \frac{\partial F_x}{\partial z}, \quad \text{where } F_x \equiv \int_A \sigma_{xz}\,d^2r. \tag{7.134}$$


Now, if Eq. (129) is satisfied, we again may use the static local relations (75)-(77), with all derivatives $d/dz$ duly replaced with their partial form $\partial/\partial z$, to express the force $F_x$ via the bending deformation $q_x$. Plugging these relations into each other one by one, we arrive at a rather unusual differential equation


41 For this reason, the tensile waves can be called longitudinal only in a limited sense: while the stress wave is purely longitudinal ($\sigma_{xx} = \sigma_{yy} = 0$), the strain wave is not: $s_{xx} = s_{yy} = -\nu s_{zz} \neq 0$, i.e. $\mathbf{q}(\mathbf{r}, t) \neq \mathbf{n}_z q_z$.
$$\rho A\frac{\partial^2 q_x}{\partial t^2} = -EI_y\frac{\partial^4 q_x}{\partial z^4}. \tag{7.135}$$

Looking for its solution in the form of a sinusoidal wave (108), we get the following dispersion relation:⁴²

$$\omega^2 = \frac{EI_y}{\rho A}k^4. \tag{7.136}$$


Such a relation means that the bending waves are not acoustic at any frequency, and cannot be characterized by a single velocity that would be valid for all wave numbers k, i.e. for all spatial Fourier components of a waveform. According to our discussion in Sec. 6.3, such strongly dispersive systems cannot carry non-sinusoidal waveforms over long distances without changing their shape considerably.
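A short sketch makes this dispersion explicit: it evaluates the phase velocity $\omega/k$ and the group velocity $d\omega/dk$ from Eq. (136), the latter by a numerical central difference. For $\omega \propto k^2$, both grow linearly with k, and the group velocity is exactly twice the phase velocity, so that different Fourier components of a pulse travel at different speeds. The rod parameters are hypothetical: steel values of Problem 7.21 and a round cross-section of radius R = 1 mm, for which the area moment is $I_y = \pi R^4/4$.

```python
import math


def bending_omega(k, E, I_y, rho, A):
    """Eq. (136): omega = (E I_y / (rho A))^(1/2) k^2."""
    return math.sqrt(E * I_y / (rho * A)) * k ** 2


def velocities(k, E, I_y, rho, A):
    """Phase velocity omega/k and group velocity d(omega)/dk, the latter
    estimated by a central difference rather than from the analytic form."""
    dk = 1e-6 * k
    v_ph = bending_omega(k, E, I_y, rho, A) / k
    v_gr = (bending_omega(k + dk, E, I_y, rho, A)
            - bending_omega(k - dk, E, I_y, rho, A)) / (2 * dk)
    return v_ph, v_gr


# Hypothetical example: steel rod with a round cross-section of radius R = 1 mm.
E, rho, R = 170e9, 7.8e3, 1e-3
A, I_y = math.pi * R ** 2, math.pi * R ** 4 / 4
for k in (1.0, 10.0, 100.0):  # 1/m; all values satisfy k R << 1
    v_ph, v_gr = velocities(k, E, I_y, rho, A)
    print(f"k = {k:5.0f} 1/m: v_ph = {v_ph:8.3f} m/s, v_gr = {v_gr:8.3f} m/s")
```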


This situation changes, however, if the rod is pre-stretched with a tension force $\mathcal{T}$, just as in the discrete 1D model that was analyzed in Sec. 6.3. The calculation of the effect of this force is essentially similar; let us repeat it for the continuous case, for a minute neglecting the bending stress (see Fig. 13).


Fig. 7.13. Additional forces in a thin rod (“string”), due to the background tension $\mathcal{T}$.


Still sticking to the limit of small angles $\varphi$, the additional vertical component $dF_x$ of the net force acting on a small rod fragment of length dz is $\mathcal{T}_x(z + dz) - \mathcal{T}_x(z) = \mathcal{T}[\varphi_y(z + dz) - \varphi_y(z)] = \mathcal{T}(\partial\varphi_y/\partial z)dz$, so that $\partial F_x/\partial z = \mathcal{T}(\partial\varphi_y/\partial z)$. With the geometric relation (77) in its partial-derivative form, $\partial q_x/\partial z = \varphi_y$, this additional term becomes $\mathcal{T}(\partial^2 q_x/\partial z^2)$. Now adding it to the right-hand side of Eq. (135), we get the following dispersion relation:

$$\omega^2 = \frac{1}{\rho A}\left(EI_yk^4 + \mathcal{T}k^2\right). \tag{7.137}$$

Since the product $\rho A$ in the denominator of this expression is just the rod’s mass per unit length (which was denoted $\mu$ in Chapter 6), at low k (and hence low frequencies), this expression is reduced to the linear dispersion law, with the velocity given by Eq. (6.43):

$$v = \left(\frac{\mathcal{T}}{\rho A}\right)^{1/2}. \tag{7.138}$$


So Eq. (137) describes a smooth crossover from the “guitar-string” acoustic waves to the highly 
dispersive bending waves (136). 
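The crossover can be located explicitly: the two terms in Eq. (137) are equal at $k_c = (\mathcal{T}/EI_y)^{1/2}$, and the phase velocity is $\omega/k = v[1 + (k/k_c)^2]^{1/2}$, with v given by Eq. (138). The sketch below checks this numerically for a hypothetical steel wire (radius 0.5 mm, tension 1 N, steel values of Problem 7.21); these parameters are assumptions, chosen so that $kR \ll 1$ holds over the whole range shown.

```python
import math


def phase_velocity(k, E, I_y, tension, mass_per_length):
    """omega / k from Eq. (137), with rho * A = mass per unit length."""
    omega = math.sqrt((E * I_y * k ** 4 + tension * k ** 2) / mass_per_length)
    return omega / k


# Hypothetical steel wire: radius R = 0.5 mm, tension 1 N.
E, rho, R, tension = 170e9, 7.8e3, 0.5e-3, 1.0
A, I_y = math.pi * R ** 2, math.pi * R ** 4 / 4
mu = rho * A                              # mass per unit length
v_string = math.sqrt(tension / mu)        # Eq. (138)
k_c = math.sqrt(tension / (E * I_y))      # both terms of Eq. (137) are equal here

for ratio in (0.1, 1.0, 10.0):
    k = ratio * k_c
    v = phase_velocity(k, E, I_y, tension, mu)
    print(f"k/k_c = {ratio:5.1f}, kR = {k * R:.4f}: v_ph / v_string = {v / v_string:.4f}")
```

Well below $k_c$, the wire behaves as a “guitar string” with a nearly constant velocity; well above it, the phase velocity grows almost linearly with k, as in the pure bending waves (136).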


42 Note that since the “moment of inertia” $I_y$, defined by Eq. (70), may depend on the bending direction (unless the cross-section is sufficiently symmetric), the dispersion relation (136) may give different results for different directions of the bending wave polarization.
Now let us consider another type of transverse waves in thin rods: the so-called torsional waves, which are essentially the dynamic propagation of the torsional deformation discussed in Sec. 6. The easiest way to describe these waves, again within the limits (129), is to write the equation of rotation of a small segment dz of the rod about the z-axis, passing through the “center of mass” of its cross-section, under the difference of torques $\boldsymbol{\tau} = \mathbf{n}_z\tau$ applied to its ends (see Fig. 10):

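$$\rho I_z\,dz\,\frac{\partial^2\varphi}{\partial t^2} = \tau(z + dz) - \tau(z) = \frac{\partial\tau}{\partial z}dz, \tag{7.139}$$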


where $I_z$ is the “moment of inertia” defined by Eq. (91), which now, after its multiplication by $\rho\,dz$, i.e. by the mass per unit area, has turned into the genuine moment of inertia of a dz-thick slice of the rod. Dividing both sides of Eq. (139) by dz, and using the static local relation (84), $\tau = C\kappa = C(\partial\varphi/\partial z)$, we get the following differential equation:

$$\rho I_z\frac{\partial^2\varphi}{\partial t^2} = C\frac{\partial^2\varphi}{\partial z^2}. \tag{7.140}$$
Just as Eqs. (111), (115), and (132), this equation describes an acoustic (dispersion-free) wave, which propagates with the following frequency-independent velocity:

$$v = \left(\frac{C}{\rho I_z}\right)^{1/2}. \tag{7.141}$$


As we have seen in Sec. 6, for rods with axially-symmetric cross-sections, the torsional rigidity C is described by the simple relation (89), $C = \mu I_z$, so that Eq. (141) is reduced to Eq. (116) for the transverse waves in infinite media. The reason for this similarity is straightforward: in a torsional wave, particles oscillate along small arcs (Fig. 14a), so that if the rod’s cross-section is round, its surface is stress-free, does not perturb the motion in any way, and hence does not affect the transverse velocity.


Fig. 7.14. Particle trajectories in two different transverse waves with the same velocity: (a) torsional waves in a thin round rod and (b) circularly-polarized waves in an infinite (or very broad) sample.


This fact raises an interesting issue of the relation between the torsional and circularly-polarized 
waves. Indeed, in Sec. 7, I have not emphasized enough that Eq. (116) is valid for a transverse wave 
polarized in any direction perpendicular to the wave vector k (in our notation, directed along the z-axis). 
In particular, this means that such waves are doubly-degenerate: any isotropic elastic continuum can 
carry simultaneously two non-interacting transverse waves propagating in the same direction with the 
same velocity (116), with two mutually perpendicular linear polarizations (directions of the vector a), 
for example, directed along the x- and y-axes.⁴³ If both waves are sinusoidal (108), with the same frequency, each point of the medium participates in two simultaneous sinusoidal motions within the [x, y] plane:


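$$q_x = \text{Re}\left[a_x e^{i(kz-\omega t)}\right] = A_x\cos\Psi, \qquad q_y = \text{Re}\left[a_y e^{i(kz-\omega t)}\right] = A_y\cos(\Psi + \varphi), \tag{7.142}$$

where $a_{x,y} \equiv A_{x,y}e^{i\varphi_{x,y}}$, $\Psi \equiv kz - \omega t + \varphi_x$, and $\varphi \equiv \varphi_y - \varphi_x$. Basic geometry tells us that the trajectory of such a motion on the [x, y] plane is an ellipse (Fig. 15), so that such waves are called elliptically polarized. The most important particular cases of such polarization are: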


(i) $\varphi = 0$ or $\varphi = \pi$: a linearly-polarized wave, with the displacement vector directed at the angle $\theta = \pm\tan^{-1}(A_y/A_x)$, respectively, to the x-axis; and

(ii) $\varphi = \pm\pi/2$ and $A_x = A_y$: two possible circularly-polarized waves, with the right or left polarization, respectively.⁴⁴
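The two special cases can be checked by sampling the displacement (142) over one period of Ψ. This sketch (hypothetical helper names) only tests the geometry of the trajectory, linear or circular; it does not decide which circular case is called right or left, since that depends on the sign convention for the sense of rotation.

```python
import math


def trajectory(A_x, A_y, phi, n=360):
    """Displacement (q_x, q_y) from Eq. (142), sampled over one period of Psi."""
    return [(A_x * math.cos(psi), A_y * math.cos(psi + phi))
            for psi in (2 * math.pi * j / n for j in range(n))]


def is_linear(points, tol=1e-12):
    """True if all points lie on one straight line through the origin."""
    x0, y0 = max(points, key=lambda p: p[0] ** 2 + p[1] ** 2)
    return all(abs(x0 * y - y0 * x) < tol for x, y in points)


def is_circular(points, tol=1e-12):
    """True if all points are at the same distance from the origin."""
    r2 = [x * x + y * y for x, y in points]
    return max(r2) - min(r2) < tol


for A_x, A_y, phi in [(1.0, 2.0, 0.0), (1.0, 2.0, math.pi),
                      (1.0, 1.0, math.pi / 2), (1.0, 2.0, math.pi / 3)]:
    pts = trajectory(A_x, A_y, phi)
    print(f"A_x = {A_x}, A_y = {A_y}, phi = {phi:.3f}: "
          f"linear = {is_linear(pts)}, circular = {is_circular(pts)}")
```

The last case, with a generic phase shift, is an ellipse that is neither linear nor circular.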


Fig. 7.15. The trajectory of a particle in an elliptically polarized transverse wave, within the plane perpendicular to the direction of wave propagation.


Now comparing the trajectories of particles in the torsional wave in a thin round rod (or pipe) 
and the circularly-polarized wave in a broad sample (Fig. 14), we see that, despite the same wave 
propagation velocity, these transverse waves are rather different. In the former case (Fig. 14a) each 
particle moves back and forth along an arc, with the arc’s length different for different particles (and 
vanishing at the rod’s center), so that the waves are not plane. On the other hand, in a circularly 
polarized wave, all particles move along similar, circular trajectories, so that such a wave is plane. 


To conclude this chapter, let me briefly mention the opposite limit, when the size of the body, from whose boundary the waves are completely reflected,⁴⁵ is much larger than the wavelength. In this case, the waves propagate almost as in an infinite 3D continuum (which was analyzed in Sec. 7), and the most important new effect is the finite number of wave modes in the body. Repeating the 1D analysis at the end of Sec. 6.5 for each dimension of a 3D cuboid of volume $V = l_1l_2l_3$, and taking into account that the wave numbers $k_j$ in each of the three dimensions are independent, we get the following generalization of


43 As was discussed in Sec. 6.3, this is also true in the simple 1D model shown in Fig. 6.4a. 

44 The circularly polarized waves play an important role in quantum mechanics, where they may be most naturally quantized, with their elementary excitations (in the case of mechanical waves we are discussing, called phonons) having either positive or negative angular momentum $L_z = \pm\hbar$.

45 For acoustic waves, such a condition is easy to implement. Indeed, from Sec. 7 we already know that a strong inequality of the wave impedances Z is sufficient for such reflection. The numbers in Table 1 show that, for example, the impedance of a longitudinal wave in a typical metal (say, steel) is about five orders of magnitude higher than that in air, ensuring its virtually full reflection from the surface.
Eq. (6.75) for the number $\Delta N$ of different traveling waves with wave vectors within a relatively small volume $d^3k$ of the wave vector space:

$$\Delta N = gV\frac{d^3k}{(2\pi)^3} \gg 1, \quad \text{for } \frac{1}{V} \ll d^3k \ll k^3, \tag{7.143}$$

where $k \gg 1/l_{1,2,3}$ is the center of this volume, and g is the number of different possible wave modes with the same wave vector k. For the mechanical waves analyzed above, with one longitudinal mode and two transverse modes with different polarizations, g = 3.


Note that since the derivation of Eqs. (6.75) and (143) does not use other properties of the waves (in particular, their dispersion relations), this mode-counting rule is ubiquitous in physics, being valid, in particular, for electromagnetic waves (where g = 2) and quantum “de Broglie waves” (i.e. wavefunctions), whose degeneracy factor g is usually determined by the particle’s spin.⁴⁶
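Because the rule does not depend on dispersion, it can be checked by brute force. The sketch below counts the allowed wave vectors inside a sphere $|\mathbf{k}| < k_{max}$ for a cuboid, assuming periodic boundary conditions $k_j = 2\pi n_j/l_j$ (one common way to set up the traveling-wave analysis; an assumption here), and compares the count with Eq. (143) summed over the whole sphere, $gV(4\pi k_{max}^3/3)/(2\pi)^3$. The box dimensions are hypothetical.

```python
import itertools
import math


def count_modes(lengths, k_max, g=3):
    """Brute-force count of modes with |k| < k_max in a box with the given side
    lengths, assuming periodic boundary conditions k_j = 2 pi n_j / l_j."""
    n_max = [int(k_max * l / (2 * math.pi)) + 1 for l in lengths]
    count = 0
    for ns in itertools.product(*(range(-m, m + 1) for m in n_max)):
        k2 = sum((2 * math.pi * n / l) ** 2 for n, l in zip(ns, lengths))
        if k2 < k_max ** 2:
            count += 1
    return g * count


def estimate_modes(lengths, k_max, g=3):
    """Eq. (143) summed over a sphere of radius k_max: g V (4 pi k_max^3 / 3) / (2 pi)^3."""
    V = math.prod(lengths)
    return g * V * (4 * math.pi * k_max ** 3 / 3) / (2 * math.pi) ** 3


lengths = (1.0, 1.3, 1.7)     # hypothetical cuboid sides, arbitrary units
k_max = 2 * math.pi * 20      # so that k_max * l_j >> 1 for every side
exact = count_modes(lengths, k_max)
approx = estimate_modes(lengths, k_max)
print(f"lattice count: {exact}, Eq. (143): {approx:.0f}, ratio: {exact / approx:.4f}")
```

The ratio of the two numbers should be close to 1, with a deviation that shrinks as $k_{max}l_j$ grows; the residual comes from lattice points near the sphere’s surface.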


7.9. Exercise problems 


7.1. Derive Eqs. (16). 


Hint: Besides the definition of the cylindrical coordinates and basic calculus, you may like to use Eq. (4.7) with $d\boldsymbol{\varphi} = (d\varphi)\mathbf{n}_z$.


7.2. A uniform thin sheet of an isotropic, elastic material, of thickness t and area $A \gg t^2$, is compressed by two plane, parallel, broad, rigid surfaces; see the figure on the right. Assuming that there is no slippage between the sheet and the surfaces, calculate the relative compression $(-\Delta t/t)$ as a function of the compressing force. Compare the result with that for the tensile stress calculated in Sec. 3.


7.3. Two opposite edges of a thin, very wide sheet of an isotropic, elastic material have been clamped in two rigid, plane, parallel walls that are pulled apart with force F, along the sheet’s length l. Find the relative extension $\Delta l/l$ of the sheet in the direction of the force and its relative compression $\Delta t/t$ in the perpendicular direction, and compare the results with Eqs. (45)-(46) for the tensile stress and the solution of the previous problem.


7.4. Calculate the radial extension $\Delta R$ of a thin ($t \ll R$), long, round cylindrical pipe, due to its rotation with a constant angular velocity ω about its symmetry axis (see the figure on the right), in terms of the elastic moduli E and ν. The external pressure both inside and outside the pipe is negligible.


7.5. A static force F is exerted on an inner point of a uniform,
isotropic elastic body. Calculate the spatial distribution of the deformation created by the force,
assuming that far from the point of its application and the points we are interested in, the body’s position 
is kept fixed. 


46 See, e.g., EM Secs. 7.8 and QM Sec. 1.7. 
7.6. A long, uniform rail with the cross-section shown in the figure on the right is being bent with the same (small) torque twice: first within the xz-plane and then within the yz-plane. Assuming that $t \ll l$, find the ratio of the rail bending deformations in these two cases.


7.7. Two thin rods of the same length and mass are 
made of the same elastic, isotropic material. The cross-section 
of one of them is a circle, while the other one is an equilateral 
triangle — see the figure on the right. Which of the rods is 
stiffer for bending along its length? Quantify the relation. Does 
the result depend on the bending plane orientation? 


7.8. A thin, elastic, uniform, initially straight beam is placed on two point supports at the same height; see the figure on the right. Calculate the support placements that:

(i) ensure that the beam ends are horizontal, and
(ii) minimize the largest deviation of the beam from the horizontal baseline.

Hint: For Task (ii), an approximate answer (with an accuracy better than 1%) is acceptable.


7.9. Calculate the largest longitudinal compression force that may be withstood by a thin, straight, elastic rod without buckling (see the figure on the right) for the two shown cases:

(i) the rod’s ends are clamped, and

(ii) the rod is free to turn about the support points.


7.10. An elastic, light, thin pole with a square cross-section of area $A = a \times a$ has been firmly dug into the ground in the vertical position, sticking out by height $h \gg a$. What largest compact mass M may be placed straight on the top of the pole without the loss of stability?


7.11. Calculate the potential energy of a small and slowly changing, but otherwise arbitrary bending deformation of a uniform, elastic, initially straight rod. Can the result be used to derive the dispersion relation (136)?


7.12. Calculate the torsional rigidity of a thin, uniform rod whose cross-section is an ellipse with 
semi-axes a and b. 


7.13. Calculate the potential energy of a small but otherwise arbitrary torsional deformation $\varphi(z)$ of a uniform, straight, elastic rod.
7.14. Calculate the spring constant $\kappa = dF/dl$ of a coil made of a uniform, elastic wire with a circular cross-section of diameter d, wound as a dense round spiral of $N \gg 1$ turns of radius $R \gg d$; see the figure on the right.


7.15. The coil described in the previous problem is now used as what is sometimes called the torsion spring; see the figure on the right. Find the corresponding spring constant $d\tau/d\varphi$, where τ is the torque of the external forces F relative to the center of the coil (point 0).


7.16. Use Eqs. (99) and (100) to recast Eq. (101b) for the torsional rigidity C of a thin rod into the form given by Eq. (101c).


7.17.* Generalize Eq. (101b) to the case of rods with more than one cross-section boundary. Use the result to calculate the torsional rigidity of a thin round pipe, and compare it with Eq. (91).


7.18. Prove that in a uniform medium, any (not necessarily plane) elastic wave may be decomposed into a longitudinal wave $\mathbf{q}_l$ with $\nabla\times\mathbf{q}_l = 0$ and a transverse wave $\mathbf{q}_t$ with $\nabla\cdot\mathbf{q}_t = 0$, and find the equations satisfied by these functions.


7.19.* Use the wave equations derived in the solution of the previous problem and the semi-quantitative description of the Rayleigh surface waves given in Sec. 7 of the lecture notes, to calculate the structure of the waves and derive Eq. (127).


7.20. Calculate the modes and frequencies of free radial oscillations of a sphere with radius R, 
made of a uniform elastic material. 


7.21. A long steel wire has a circular cross-section with a 3-mm diameter, and is pre-stretched with a constant force of 10 N. Which of the longitudinal and transverse waves with frequency 1 kHz has the largest group velocity in the wire? Accept the following parameters for the steel (see Table 1): E = 170 GPa, ν = 0.30, ρ = 7.8 g/cm³.


7.22. Define and calculate the wave impedances for (i) tensile and (ii) torsional waves in a thin rod, appropriate in the long-wave limit. Use the results to calculate the fraction of each wave’s power $\mathcal{P}$ reflected from a firm connection of a long rod with a round cross-section to a similar rod, but with a twice smaller diameter; see the figure on the right.


7.23. Calculate the fundamental frequency of small transverse standing waves on a free and uniform thin rod, and the position of displacement nodes in this mode.

Hint: Numerical solution of the final transcendental equations is acceptable.
Chapter 8. Fluid Mechanics


This chapter describes the basic notions of fluid mechanics, discusses a few core problems of statics and dynamics of ideal and viscous fluids, and gives a very brief review of such a complicated phenomenon as turbulence. In addition, the viscous fluid flow discussion is used as a platform for an elementary introduction to numerical methods of solving partial differential equations, whose importance extends well beyond this particular field.


8.1. Hydrostatics 


The mechanics of fluids (defined as materials that cannot keep their geometric form on their own, including both liquids and gases) is both simpler and more complex than that of elastic solids, with the simplifications mostly in statics.¹ Indeed, fluids, by definition, cannot resist static shear deformations. There are two ways to express this fact. First, we can formally take the shear modulus μ, describing this resistance, to equal zero. Then Hooke’s law (7.32) shows that the stress tensor is diagonal:


$$\sigma_{jj'} = \sigma_{jj}\delta_{jj'}. \tag{8.1}$$
Alternatively, the same conclusion may be reached just by looking at the stress tensor definition (7.19) and/or Fig. 7.3, and noting that in the absence of shear stress, the elementary interface force $d\mathbf{F}$ has to be normal to the area element, i.e. parallel to the vector $d\mathbf{A}$.


Moreover, in fluids at equilibrium, all three diagonal elements $\sigma_{jj}$ of the stress tensor have to be equal at each point. To prove that, it is sufficient to single out (mentally rather than physically), from a static fluid, a small volume in the shape of a right prism, with mutually perpendicular faces normal to the two directions we are interested in (in Fig. 1, along the x- and y-axes).


Fig. 8.1. Proving the pressure isotropy.


The prism is in equilibrium if each Cartesian component of the vector of the total force exerted on all its faces equals zero. For the x-component, this balance may be expressed as $\sigma_{xx}dA_x - (\sigma_{\alpha\alpha}dA)\cos\alpha = 0$. However, from the geometry (Fig. 1), $dA_x = dA\cos\alpha$, so that the above expression yields $\sigma_{\alpha\alpha} = \sigma_{xx}$. A similar argument for the y-component gives $\sigma_{\alpha\alpha} = \sigma_{yy}$, so that $\sigma_{xx} = \sigma_{yy}$. Changing the orientation of the prism, we can get such equalities for any pair of diagonal elements $\sigma_{jj}$ of the stress tensor, so that all three of them have to be equal.


1 It is often called hydrostatics because water has always been the most important liquid for the human race, and hence for science and engineering.
This common diagonal element of the stress matrix is usually denoted as $(-\mathcal{P})$, because in the vast majority of cases, the parameter $\mathcal{P}$, called pressure, is positive. Thus we arrive at the key relation (which was already mentioned in Sec. 7.2):


$$\sigma_{jj'} = -\mathcal{P}\delta_{jj'}. \tag{8.2}$$


In the absence of bulk forces, pressure should be constant throughout the volume of fluid, due to the translational symmetry. Let us see how this result is affected by bulk forces. With the simple stress tensor (2), the general condition of equilibrium of a continuous medium, expressed by Eq. (7.25) with the left-hand side equal to zero, becomes


$$-\frac{\partial\mathcal{P}}{\partial r_j} + f_j = 0, \tag{8.3}$$


and may be rewritten in the following convenient vector form:

$$-\nabla\mathcal{P} + \mathbf{f} = 0. \tag{8.4}$$


In the simplest case of a heavy fluid with mass density ρ, in a uniform gravity field, $\mathbf{f} = \rho\mathbf{g}$, and the equation of equilibrium becomes

$$-\nabla\mathcal{P} + \rho\mathbf{g} = 0, \tag{8.5}$$


with only one nonzero component: near the Earth’s surface, the vertical one. If, in addition, the fluid may be considered incompressible, with its density ρ constant,² this equation may be readily integrated over the vertical coordinate (say, y) to give the so-called Pascal equation:³

$$\mathcal{P} = -\rho g y + \text{const}, \tag{8.6}$$

where the direction of the y-axis is taken opposite to that of vector g.


Two manifestations of this key equation are well known. The first one is the fact that in interconnected vessels filled with a fluid, its pressure is equal at all points at the same height (y), regardless of the vessel shape, provided that the fluid is in equilibrium.⁴ In particular, if a heavy liquid has an open surface, then in equilibrium, it has to be horizontal, at least not too close to the retaining walls (see Sec. 2).


The second manifestation of Eq. (6) is the buoyant force $\mathbf{F}_b$ exerted by a liquid on a (possibly, partly) submerged body, i.e. the vector sum of the elementary pressure forces $dF = \mathcal{P}dA$ exerted on all elementary areas dA of the submerged part of the body’s surface (see Fig. 2). According to Eq. (6), with the constant equal to zero (corresponding to zero pressure at the liquid’s surface, taken for y = 0; see Fig. 2a), the vertical component of this elementary force is


2 As was discussed in Sec. 7.3 in the context of Table 7.1, this is an excellent approximation, for example, for 
human-scale experiments with water. 

3 The equation, and the SI unit of pressure, 1 Pa = 1 N/m², are named after Blaise Pascal (1623-1662), who not only pioneered hydrostatics, but also invented the first mechanical calculator, and made several other important contributions to mathematics, and to Christian philosophy!

4 This simple fact opens wide opportunities for the engineering field of hydraulics, in particular enabling a very 
simple and efficient way to magnify forces, using interconnected hydraulic cylinders of different diameters. 
$$dF_y = dF\cos\varphi = \mathcal{P}dA\cos\varphi = -\rho g y\cos\varphi\,dA = -\rho g y\,dA_y, \tag{8.7}$$


where $dA_y = \cos\varphi\,dA$ is the horizontal footprint (say, dxdz) of the elementary area dA. Now integrating this relation over the whole surface, we get the total vertical buoyant force:⁵


$$F_b = \rho g\int(-y)\,dA_y = \rho g V, \tag{8.8}$$


where V is the volume of the submerged part of the body, while ρ is the liquid’s density, so that by magnitude, $F_b$ equals the weight of the liquid that would fill the submerged volume.
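Eq. (8) can be checked by direct numerical integration of the pressure forces over a closed surface. The sketch below assumes a fully submerged sphere of radius R (hypothetical values), the pressure $\mathcal{P} = -\rho g y$ from Eq. (6) with zero constant, and the force $-\mathcal{P}\mathbf{n}\,dA$ on each surface element with the outward normal n; the result should equal $\rho g V$ regardless of depth.

```python
import math


def buoyant_force_sphere(R, depth, rho=1000.0, g=9.81, n=2000):
    """Vertical liquid-pressure force on a fully submerged sphere (depth > R).

    The y-axis points up, the liquid surface is at y = 0, and the pressure is
    P = -rho g y, i.e. Eq. (6) with a zero constant. Each surface element with
    outward normal n receives the force -P n dA, so F_y = -integral(P n_y dA).
    The polar angle theta is measured from +y; integration uses the midpoint rule.
    """
    if depth <= R:
        raise ValueError("the sphere must be fully submerged (depth > R)")
    y_center = -depth
    dtheta = math.pi / n
    F_y = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        y = y_center + R * math.cos(theta)
        P = -rho * g * y
        n_y = math.cos(theta)
        ring_area = 2 * math.pi * R ** 2 * math.sin(theta) * dtheta
        F_y += -P * n_y * ring_area
    return F_y


R = 0.1  # m, hypothetical
V = 4 * math.pi * R ** 3 / 3
for depth in (0.2, 1.0, 5.0):
    print(f"depth = {depth} m: F_y = {buoyant_force_sphere(R, depth):.4f} N, "
          f"rho g V = {1000.0 * 9.81 * V:.4f} N")
```

The depth-dependent part of the pressure cancels between the upper and lower hemispheres, which is why the result does not depend on depth.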




This well-known Archimedes principle may be proved even more simply using the following argument: the liquid’s pressure forces, and hence the resulting buoyant force, cannot depend on what is inside the body’s volume. Hence $F_b$ would be the same if we filled the volume V in question with a liquid similar to the surrounding one. But in this case, the liquid should still be in equilibrium even if the surface is completely flexible, so that both forces acting on its inner part, the buoyant force $\mathbf{F}_b$ and the inner liquid’s weight $m\mathbf{g} = \rho V\mathbf{g}$, have to be equal and opposite, thus proving Eq. (8) again.


Fig. 8.2. Calculating the buoyant force.


Despite the simplicity of the Archimedes principle, its erroneous formulations, such as “The buoyant force’s magnitude is equal to the weight of the displaced liquid” [WRONG!], creep from one undergraduate textbook to another, leading to application errors. A typical example is shown in Fig. 2b, where a solid vertical cylinder with the base area A is pressed into a liquid inside a container of comparable size, pushing the liquid’s level up by distance a. The correct answer for the buoyant force, following from Eq. (8), is

$$F_b = \rho g V = \rho g A(a + b), \tag{8.9a}$$


because the volume V of the submerged part of the cylinder is evidently A(a + b). But the wrong 
formulation cited above, using the term displaced liquid, would give a different answer: 


$$F_b = \rho g V_{\text{displaced}} = \rho g A b. \quad \text{[WRONG!]} \tag{8.9b}$$


(The latter result is correct only asymptotically, in the limit $b/a \to \infty$.)
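The size of the error in Eq. (9b) follows from liquid conservation. Assuming (beyond what Fig. 2b states) that the container has vertical walls and a cross-section area $A_c$, and that b is the depth of the cylinder’s bottom below the original liquid level, the liquid volume Ab moves into the gap of area $A_c - A$, so that $a = Ab/(A_c - A)$, and the ratio of Eqs. (9a) and (9b) is $A_c/(A_c - A)$. A minimal sketch with hypothetical numbers:

```python
def cylinder_forces(A, A_c, b, rho=1000.0, g=9.81):
    """Hypothetical setup for Fig. 2b: a container with vertical walls and
    cross-section area A_c holds a cylinder of base area A, whose bottom is at
    depth b below the original liquid level.

    Returns the level rise a, the correct buoyant force (Eq. 9a), and the
    "displaced liquid" value (Eq. 9b)."""
    if not 0 < A < A_c:
        raise ValueError("need 0 < A < A_c")
    a = A * b / (A_c - A)            # liquid volume A*b moves into the gap of area A_c - A
    correct = rho * g * A * (a + b)  # Eq. (9a)
    wrong = rho * g * A * b          # Eq. (9b)
    return a, correct, wrong


for A_c in (2.0, 10.0, 1000.0):  # m^2, with A = 1 m^2 and b = 0.1 m
    a, correct, wrong = cylinder_forces(A=1.0, A_c=A_c, b=0.1)
    print(f"A_c/A = {A_c:6.0f}: b/a = {0.1 / a:7.1f}, F_correct/F_wrong = {correct / wrong:.4f}")
```

For $A_c = 2A$, the “displaced liquid” formula underestimates the force by a factor of 2, while for a very broad container ($b/a \to \infty$), the two results converge.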


Another frequent error in hydrostatics concerns the angular stability of a freely floating body — 
the problem of vital importance for the boat/ship design. It is sometimes claimed that the body is stable 


5 The force is vertical, because the horizontal components of the elementary forces dF exerted on opposite 
elementary areas dA, at the same height y, cancel. 
only if the so-called buoyancy center, the effective point of buoyant force application (in Fig. 3, point B),⁶ is above the center of mass (C) of the floating body. However, as Fig. 3 shows, this is unnecessary: in the shown case, point B remains below point C even at a small tilt. Still, in this case, the torque created by the pair of forces $\mathbf{F}_b$ and $m\mathbf{g}$ tries to return the body to the equilibrium position, which is therefore stable. As Fig. 3 shows, the actual condition of angular stability may be expressed as the requirement for point M (in shipbuilding, called the metacenter of the ship’s hull) to be above the ship’s center of mass C.⁷


Fig. 8.3. Angular stability of a 
floating body. 


To conclude this section, let me note that the integration of Eq. (4) may be more complex if the bulk forces f depend on position,⁸ and/or if the fluid is substantially compressible. In the latter case, Eq. (4) has to be solved together with the medium-specific equation of state $\rho = \rho(\mathcal{P})$ describing its compressibility law, an example of which is given by Eq. (7.38) for ideal gases: $\rho = mN/V = m\mathcal{P}/k_BT$, where m is the mass of one gas molecule.
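As an illustration of the compressible case, one can integrate Eq. (4) with $\mathbf{f} = \rho\mathbf{g}$ and the ideal-gas density $\rho = m\mathcal{P}/k_BT$. Under the additional assumptions of a constant temperature T and a uniform g, the exact solution is the exponential $\mathcal{P}(y) = \mathcal{P}(0)\exp(-mgy/k_BT)$. The sketch below (hypothetical parameters: molecules of mass 28 amu at T = 300 K) integrates the equation with the classical fourth-order Runge-Kutta method and compares the result with that formula.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
AMU = 1.66053906660e-27  # atomic mass unit, kg
G = 9.81                 # free-fall acceleration, m/s^2 (assumed uniform)


def pressure_rk4(P0, T, m, height, steps=10_000):
    """Integrate dP/dy = -rho(P) g, with rho = m P / (k_B T), from y = 0 to
    y = height by the classical 4th-order Runge-Kutta method (isothermal gas assumed)."""
    def rhs(P):
        return -m * G * P / (K_B * T)

    h = height / steps
    P = P0
    for _ in range(steps):
        k1 = rhs(P)
        k2 = rhs(P + 0.5 * h * k1)
        k3 = rhs(P + 0.5 * h * k2)
        k4 = rhs(P + h * k3)
        P += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return P


# Hypothetical example: molecules of mass 28 amu at T = 300 K.
P0, T, m = 1.0e5, 300.0, 28 * AMU
scale_height = K_B * T / (m * G)
for y in (1e3, 5e3, 10e3):
    numeric = pressure_rk4(P0, T, m, y)
    analytic = P0 * math.exp(-y / scale_height)
    print(f"y = {y / 1e3:4.0f} km: P = {numeric:.1f} Pa (exponential formula: {analytic:.1f} Pa)")
```

The characteristic height $k_BT/mg$ is about 9 km for these parameters.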


8.2. Surface tension effects 


Besides the bulk (volume-distributed) forces, one more possible source of pressure is surface tension. This effect results from the difference between the potential energy of atomic interactions on the interface between two different fluids and that in their bulks, and thus may be described by an additional potential energy

$$U_i = \gamma A, \tag{8.10}$$

where A is the interface area, and γ is called the surface tension constant, or just the “surface tension”. For a stable interface of any two fluids, γ is always positive.⁹ For surfaces of typical liquids (or their interfaces with air), at room temperature, the surface tension equals a few $10^{-2}$ J/m²,¹⁰ corresponding to


6 A simple calculation, similar to the one resulting in Eq. (8), but for the total torque rather than the total force, 
shows that B is just the center of mass of the submerged volume V filled with any uniform material. 

7 It is easy (and hence is left for the reader) to prove that a small tilt of the body leads to a small lateral 
displacement of point B, but does not affect the position of the metacenter M. 

8 A simple example of such a problem is given by the fluid equilibrium in a container rotating with a constant angular velocity ω. If we solve such a problem in a reference frame rotating together with the container, the real bulk forces should be complemented by the centrifugal “force” (4.93), depending on r.

9 Indeed, if the γ of the interface of certain two fluids is negative, the system self-reconfigures to decrease $U_i$, i.e. to increase $|U_i|$, by increasing the interface area, eventually fragmenting into a macroscopically uniform solution.
10 For a better feeling of this number, one should remember that 1 J/m² = 1 N/m.
the potential energy $U_i$ of a few $10^{-2}$ eV per surface molecule, i.e. just a fraction of the full binding (or “cohesive”) energy of the same liquid, which is typically of the order of $10^{-1}$ eV per molecule.


In the absence of other forces, the surface tension makes a liquid drop spherical, to minimize its surface area A at a fixed volume. For the analysis of the surface tension effects in more complex geometries, and in the presence of other forces, it is convenient to reduce them to a certain additional effective pressure drop $\Delta\mathcal{P}$ at the interface. To calculate $\Delta\mathcal{P}$, let us consider the condition of equilibrium of a small part dA of a smooth interface between two fluids (Fig. 4), in the absence of bulk forces.


Fig. 8.4. Deriving the Young-Laplace formula (13).
If the pressures $\mathcal{P}_{1,2}$ on the two sides of the interface are different, the work of stress forces on fluid 1 at a small virtual displacement $\delta\mathbf{r} = \mathbf{n}\delta r$ of the interface (where $\mathbf{n} = d\mathbf{A}/dA$ is the unit vector normal to the interface) equals¹¹

$$\delta\mathcal{W} = dA\,\delta r\,(\mathcal{P}_1 - \mathcal{P}_2). \tag{8.11}$$


For equilibrium, this work has to be compensated by an equal change of the interface energy, $\delta U_i = \gamma\,\delta(dA)$. Differential geometry tells us that in the linear approximation in δr, the relative change of the elementary surface area, corresponding to a fixed solid angle dΩ, may be expressed as

$$\frac{\delta(dA)}{dA} = \frac{\delta r}{R_1} + \frac{\delta r}{R_2}, \tag{8.12}$$


where Rj» are the so-called principal radii of the interface curvature.!2 Combining Eqs. (10)-(12), we 
get the following Young-Laplace formula:'3 


(8.13) 


11 This equality follows from the general relation (7.30), with the stress tensor elements expressed by Eq. (2), but in this simple case of the net stress force dF = (P_1 − P_2)dA parallel to the interface element vector dA, it may be even more simply obtained just from the definition of work: δW = dF·δr at the virtual displacement δr = n δr.

12 This general formula may be readily verified for a sphere of radius r (for which R_1 = R_2 = r and dA = r²dΩ, so that δ(dA)/dA = δ(r²)/r² = 2δr/r), and for a round cylindrical interface of radius r (for which R_1 = r, R_2 = ∞, and dA = r dφ dz, so that δ(dA)/dA = δr/r). For more on curvature, see, for example, M. do Carmo, Differential Geometry of Curves and Surfaces, 2nd ed., Dover, 2016.

13 This result (not to be confused with Eq. (15), called Young’s equation) was derived in 1806 by Pierre-Simon 
Laplace (of the Laplace operator/equation fame) on the basis of the first analysis of the surface tension effects by 
Thomas Young (yes, the same Young who performed the famous two-slit experiment with light!) a year earlier. 
In particular, this formula shows that the additional pressure created by surface tension inside a spherical drop of a liquid, of radius R, equals 2γ/R, i.e. decreases with R. In contrast, according to Eqs. (5)-(6), the pressure effects of bulk forces, for example gravity, grow as ρgR. The comparison of these two pressure components shows that if the drop radius (or more generally, the characteristic linear size of a liquid’s sample) is much larger than the so-called capillary length

$$a_c \equiv \left(\frac{2\gamma}{\rho g}\right)^{1/2}, \tag{8.14}$$

the surface tension may be safely ignored — as will be done in all following sections of this chapter, besides a brief discussion at the end of Sec. 4. For the water surface, or more exactly its interface with air at ambient conditions, γ ≈ 0.073 J/m², while ρ ≈ 1,000 kg/m³, so that a_c ≈ 4 mm.


On the other hand, in very narrow tubes, such as blood capillary vessels with radius a ~ 1 μm, i.e. a << a_c, the surface tension effects are very important. The key notion for the analysis of these effects is the contact angle θ_c (also called the “wetting angle”) at an equilibrium edge of a liquid wetting a solid — see Fig. 5.


Fig. 8.5. Contact angles for (a) hydrophilic and (b) hydrophobic surfaces.


According to its definition (10), the constant γ may be interpreted as a force (per unit length of the interface boundary) directed normally to the boundary, and “trying” to reduce the interface area. As a result, the balance of horizontal components of the three such forces, shown in Fig. 5a, immediately yields Young’s equation

$$\gamma_{sl} + \gamma_{lg}\cos\theta_c = \gamma_{sg}, \tag{8.15}$$

where the indices of the three constants γ correspond to three possible interfaces between the liquid, solid, and gas. For the so-called hydrophilic surfaces that “like to be wet” by a particular liquid (not necessarily water), meaning that γ_sl < γ_sg, this relation yields cos θ_c > 0, i.e. θ_c < π/2 — the situation shown in Fig. 5a. On the other hand, for hydrophobic surfaces with γ_sl > γ_sg, Eq. (15) yields larger contact angles, θ_c > π/2 — see Fig. 5b.
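Young’s equation (15) can be solved for the contact angle once the three interface constants are known. The sketch below does this in Python; the function name `contact_angle` and the values of γ_sg and γ_sl are hypothetical, chosen only to land in the hydrophilic and hydrophobic regimes, while γ_lg = 0.073 J/m² is the water-air value quoted above.

```python
import math

def contact_angle(gamma_sg, gamma_sl, gamma_lg):
    """Solve Young's equation (8.15), gamma_sl + gamma_lg*cos(theta_c) = gamma_sg,
    for theta_c in radians. Returns None if |cos(theta_c)| would exceed 1,
    i.e. if the equation has no solution (complete wetting or non-wetting)."""
    c = (gamma_sg - gamma_sl) / gamma_lg
    if abs(c) > 1.0:
        return None
    return math.acos(c)

# hypothetical interface constants (J/m^2), chosen only to illustrate both cases
for label, g_sg, g_sl in (("hydrophilic", 0.10, 0.05), ("hydrophobic", 0.03, 0.06)):
    theta = contact_angle(g_sg, g_sl, 0.073)
    print(f"{label}: theta_c = {math.degrees(theta):.1f} deg")
```

The sign of γ_sg − γ_sl alone decides on which side of π/2 the angle falls, as stated after Eq. (15).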


Let us use this notion to solve the simplest and perhaps the most practically important problem of this field — find the height h of the fluid column lifted by the surface tension forces in a narrow vertical tube made of a hydrophilic material, assuming its internal surface to be a round cylinder of radius a — see Fig. 6. Inside an incompressible fluid, pressure drops with height according to the Pascal equation (6), so that just below the surface, P = P_0 − ρgh, where P_0 is the background (e.g., atmospheric) pressure. This means that at a << h, the pressure variation along the concave surface (called the meniscus) of the liquid is negligible, so that according to the Young-Laplace formula (13), the sum (1/R_1 + 1/R_2) has to be virtually constant along the surface. Due to the axial symmetry of the problem, this means that the surface has to be a part of a sphere. From the contact angle definition, the radius R of the sphere is equal to a/cos θ_c — see Fig. 6. Plugging this relation into Eq. (13) with P_0 − P = ρgh, we get the following result for h:

$$h = \frac{2\gamma\cos\theta_c}{\rho g a}. \tag{8.16a}$$

In hindsight, this result might be obtained more directly — by requiring the total weight ρgV = ρg(πa²h) of the lifted liquid’s column to be equal to the vertical component F cos θ_c of the full surface tension force F = γp, acting on the perimeter p = 2πa of the meniscus. Using the definition (14) of the capillary length a_c, Eq. (16a) may be represented as the so-called Jurin rule:

$$h = \frac{a_c^2}{a}\cos\theta_c. \tag{8.16b}$$

Fig. 8.6. Liquid’s rise in a vertical capillary tube.


This capillary rise is the basic mechanism of lifting water with nutrients from roots to the branches and leaves of plants, so that the tallest tree heights correspond to the Jurin rule (16), with cos θ_c = 1, and the pore radius a limited from below by a few microns, because of the viscosity effects restricting the fluid discharge — see Sec. 5 below.
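As a rough numerical check of Eqs. (13)-(16), the following Python sketch evaluates the capillary length and the capillary rise for water. It assumes the form a_c = (2γ/ρg)^{1/2} for Eq. (14), which is the length at which 2γ/R equals ρgR and which reproduces the estimate a_c ≈ 4 mm; γ, ρ, and g are the values quoted above, and the tube radii are illustrative.

```python
import math

G = 9.81                          # m/s^2
GAMMA_W, RHO_W = 0.073, 1000.0    # water-air values quoted in the text (SI units)

def capillary_length(gamma, rho, g=G):
    """Assumed form of (8.14): a_c = (2*gamma/(rho*g))**0.5."""
    return math.sqrt(2.0 * gamma / (rho * g))

def jurin_height(gamma, rho, a, theta_c=0.0, g=G):
    """Capillary rise (8.16a): h = 2*gamma*cos(theta_c)/(rho*g*a)."""
    return 2.0 * gamma * math.cos(theta_c) / (rho * g * a)

a_c = capillary_length(GAMMA_W, RHO_W)
print(f"a_c = {a_c * 1e3:.2f} mm")
# at R = a_c the Laplace pressure 2*gamma/R equals the hydrostatic scale rho*g*R
print(f"2*gamma/a_c = {2 * GAMMA_W / a_c:.1f} Pa, rho*g*a_c = {RHO_W * G * a_c:.1f} Pa")

for a in (1e-3, 1e-4, 1e-5, 1e-6):   # tube radius, m
    h = jurin_height(GAMMA_W, RHO_W, a)
    # the Jurin rule (8.16b), h = a_c**2 * cos(theta_c) / a, gives the same number
    assert math.isclose(h, a_c**2 / a)
    print(f"a = {a:.0e} m: h = {h:.3g} m")
```

The rise scales as 1/a, so it is negligible for millimeter-size tubes but reaches meters for micron-size pores.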


8.3. Kinematics 


In contrast to the stress tensor, which is frequently very simple — see Eq. (2), the strain tensor is not a very useful notion in fluid mechanics. Indeed, besides a very few situations,¹⁴ typical problems of this field involve fluid flow, i.e. a state when the velocity of fluid particles has some nonzero time average. This means that the trajectory of each particle is a long line, and the very notion of its displacement q from the initial position becomes impracticable. However, the particle’s velocity v = dq/dt remains a very useful notion, especially if it is considered as a function of the observation point r and (generally) time t. In an important class of fluid dynamics problems, the so-called stationary (or “steady”, or “static”) flow, the velocity defined in this way does not depend on time, v = v(r).


14 One of them is sound propagation, where the particle displacements q are typically small, so that the results of Sec. 7.7 are applicable. As a reminder, they show that in fluids, with their zero shear modulus, the transverse sound cannot propagate, while the longitudinal sound can — see Eq. (7.114).
There is, however, a price to pay for the convenience of this notion: namely, due to the difference between the vectors q and r, the particle’s acceleration a = d²q/dt² (that participates, in particular, in the 2nd Newton law) cannot be calculated just as the time derivative of the velocity v(r, t). This fact is evident, for example, for the static flow case, in which the acceleration of individual fluid particles may be very significant even if v(r) does not depend on time — just think about the acceleration of a drop of water flowing over the Niagara Falls’ rim, first accelerating fast and then virtually stopping below, while the water velocity v at every particular point, as measured from a bank-based reference frame, is nearly constant. Thus the primary task of fluid kinematics is to express a via v; let us do this.


Since each Cartesian component v_j of the velocity v has to be considered as a function of four independent scalar variables: three Cartesian components r_j′ of the vector r and time t, its full time derivative may be represented as

$$\frac{dv_j}{dt} = \frac{\partial v_j}{\partial t} + \sum_{j'=1}^{3}\frac{\partial v_j}{\partial r_{j'}}\frac{dr_{j'}}{dt}. \tag{8.17}$$


Let us apply this general relation to a specific set of infinitesimal changes {dr_1, dr_2, dr_3} that follows a small displacement dq of a certain particle of the fluid: dr = dq = v dt, i.e.

$$dr_{j'} = v_{j'}\,dt. \tag{8.18}$$

In this case, dv_j/dt is the j-th component a_j of the particle’s acceleration a, so that Eq. (17) yields the following key relation of fluid kinematics:

$$a_j = \frac{\partial v_j}{\partial t} + \sum_{j'=1}^{3} v_{j'}\frac{\partial v_j}{\partial r_{j'}}. \tag{8.19a}$$


Using the del operator ∇, this result may be rewritten in the following compact vector form:¹⁵

$$\mathbf{a} = \frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v}. \tag{8.19b}$$


This relation already signals the main technical problem of fluid dynamics: many equations involving the particle’s acceleration are nonlinear in velocity, excluding such a powerful tool as the linear superposition principle (which was used so frequently in the previous chapters of this course) from the applicable mathematical arsenal.
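A minimal sketch of Eq. (19b) for the simplest steady flow with nonzero particle acceleration: solid-body rotation, v = ω × r, for which ∂v/∂t = 0, so that the whole acceleration comes from the convective term (v·∇)v. The derivatives are taken by central finite differences, and the result should reproduce the centripetal acceleration −ω²r; the value of ω and the sample point are arbitrary.

```python
# Steady solid-body rotation about the z-axis: v = omega x r = (-omega*y, omega*x).
OMEGA = 2.0  # angular velocity, rad/s (illustrative value)

def velocity(x, y):
    return (-OMEGA * y, OMEGA * x)

def convective_acceleration(x, y, h=1e-5):
    """a = (v . grad) v for a steady 2D field; partial derivatives by central differences."""
    vx, vy = velocity(x, y)
    dvx_dx = (velocity(x + h, y)[0] - velocity(x - h, y)[0]) / (2 * h)
    dvx_dy = (velocity(x, y + h)[0] - velocity(x, y - h)[0]) / (2 * h)
    dvy_dx = (velocity(x + h, y)[1] - velocity(x - h, y)[1]) / (2 * h)
    dvy_dy = (velocity(x, y + h)[1] - velocity(x, y - h)[1]) / (2 * h)
    ax = vx * dvx_dx + vy * dvx_dy
    ay = vx * dvy_dx + vy * dvy_dy
    return ax, ay

ax, ay = convective_acceleration(1.0, 0.5)
print(ax, ay)   # approximately -OMEGA**2 * (1.0, 0.5) = (-4.0, -2.0)
```

Although the velocity at every fixed point is constant in time, each particle accelerates toward the axis, which is exactly the situation described for the Niagara Falls example above.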


One more basic relation of fluid kinematics is the so-called continuity equation, which is essentially just the differential version of the mass conservation law. Let us mark, inside a fluid flow, an arbitrary volume V limited by a stationary (time-independent) surface S. The total mass of the fluid inside the volume may change only due to its flow through the boundary:

$$\frac{dM}{dt} \equiv \frac{d}{dt}\int_V \rho\, d^3r = -\oint_S \rho v_n\, d^2r = -\oint_S \rho\mathbf{v}\cdot d\mathbf{A}, \tag{8.20a}$$


15 Note that the operator relation d/dt = ∂/∂t + (v·∇) is applicable to an arbitrary (scalar or vector) function; it is frequently called the convective derivative. (Alternative adjectives, such as “Lagrangian”, “substantial”, or “Stokes”, are sometimes used for this derivative as well.) The relation has numerous applications well beyond the fluid dynamics — see, e.g., EM Chapter 9 and QM Chapter 1.


Chapter 8 Page 8 of 30 




Essential Graduate Physics CM: Classical Mechanics 


where the elementary area vector dA is defined just as in Sec. 7.2 — see Fig. 7. 


Fig. 8.7. Deriving the continuity equation. 


Now using the same divergence theorem that has been used several times in this course,¹⁶ the surface integral in Eq. (20a) may be transformed into the integral of ∇·(ρv) over the volume V, so that the relation may be rewritten as

$$\int_V \left(\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j}\right) d^3r = 0, \tag{8.20b}$$

where the vector j = ρv is called either the mass flux density or the “mass current”. Since Eq. (20b) is valid for an arbitrary stationary volume V, the function under the integral has to vanish at any point:

$$\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j} = 0. \tag{8.21}$$
Note that similar continuity equations are valid not only for mass but also for other conserved physical quantities (e.g., the electric charge, probability, etc.), with the proper re-definitions of ρ and j.¹⁷
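The integral form (20a) suggests a numerical scheme that conserves mass by construction: if the density in each cell changes only through the fluxes across its faces, the total mass can change only through the outer boundary. The Python sketch below illustrates this for a 1D flow with a uniform velocity on a periodic grid; the first-order upwind flux and all numerical parameters are illustrative choices, not part of the text.

```python
# 1D continuity equation d(rho)/dt + d(j)/dx = 0, j = rho*v, solved in flux
# (finite-volume) form on a periodic grid with first-order upwinding.
import math

N, L = 200, 1.0          # number of cells and domain length (illustrative)
dx = L / N
v = 0.5                  # uniform flow velocity (assumed > 0 for upwinding)
dt = 0.5 * dx / v        # time step obeying the stability condition v*dt/dx <= 1

# initial density: uniform background plus a smooth bump
rho = [1.0 + 0.5 * math.exp(-((i + 0.5) * dx - 0.3) ** 2 / 0.005) for i in range(N)]
mass0 = sum(rho) * dx

for _ in range(400):
    # mass flux through the left face of each cell (upwind value for v > 0);
    # index i - 1 = -1 wraps around, which makes the boundaries periodic
    flux = [v * rho[i - 1] for i in range(N)]
    # each cell changes only by (inflow - outflow): a discrete analogue of Eq. (20a)
    rho = [rho[i] - dt / dx * (flux[(i + 1) % N] - flux[i]) for i in range(N)]

mass1 = sum(rho) * dx
print(mass0, mass1)   # equal up to floating-point rounding
```

Because the face fluxes cancel pairwise in the sum over cells, the total mass is conserved regardless of the accuracy of the upwind approximation itself.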


8.4. Dynamics: Ideal fluids 


Let us start our discussion of fluid dynamics from the simplest case when the stress tensor obeys Eq. (2) even in motion. Physically, this means that the fluid viscosity effects, leading to mechanical energy loss, are negligible. (The conditions of this assumption will be discussed in the next section.) Then the equation of motion of such an ideal fluid (essentially the 2nd Newton law for its unit volume) may be obtained from Eq. (7.25) using the simplifications of its right-hand side, discussed in Sec. 1:

$$\rho\mathbf{a} = -\nabla P + \mathbf{f}. \tag{8.22}$$

Now using the basic kinematic relation (19), we arrive at the following Euler equation:¹⁸

$$\rho\frac{\partial\mathbf{v}}{\partial t} + \rho(\mathbf{v}\cdot\nabla)\mathbf{v} = -\nabla P + \mathbf{f}. \tag{8.23}$$


Generally, this equation has to be solved together with the continuity equation (21) and the equation of state of the particular fluid, ρ = ρ(P). However, as we have already discussed, in many


16 If the reader still needs a reminder, see MA Eq. (12.1). 

17 See, e.g., EM Sec. 4.1, QM Sec. 1.4, and SM Sec. 5.6.

18 It was derived in 1755 by the same Leonhard Euler whose name has already been (reverently) mentioned
several times in this course. 




situations the compressibility of water and other important liquids is very low and may be ignored, so that ρ may be treated as a given constant. Moreover, in many cases the bulk forces f are conservative and may be represented as a gradient of a certain potential function u(r) — the potential energy per unit volume:

$$\mathbf{f} = -\nabla u; \tag{8.24}$$

for example, for a uniform, vertical gravity field, u(r) = ρgy, where y is referred to some (arbitrary) horizontal level. In this case, the right-hand side of Eq. (23) becomes −∇(P + u). For these cases, it is beneficial to recast the left-hand side of that equation as well, using the following well-known identity of vector algebra¹⁹


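$$(\mathbf{v}\cdot\nabla)\mathbf{v} = \nabla\left(\frac{v^2}{2}\right) - \mathbf{v}\times(\nabla\times\mathbf{v}). \tag{8.25}$$

As a result, the Euler equation takes the following form:

$$\rho\frac{\partial\mathbf{v}}{\partial t} - \rho\mathbf{v}\times(\nabla\times\mathbf{v}) + \nabla\left(P + u + \frac{\rho v^2}{2}\right) = 0. \tag{8.26}$$

In a stationary flow, the first term of this equation vanishes. If the second term, describing the fluid’s vorticity, is zero as well, then Eq. (26) has the first integral of motion,

$$P + u + \frac{\rho v^2}{2} = \text{const}, \tag{8.27}$$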


called the Bernoulli equation.2° Numerous examples of the application of Eq. (27) to simple problems of 
stationary flow in pipes, both with and without the Earth gravity field, should be well known to the 
readers from their undergraduate courses, so I hope I can skip their discussion without much harm. 
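The Bernoulli equation rests on the vector identity (25). As an informal sanity check (not a proof), the sketch below evaluates both sides of (25) by central finite differences for an arbitrary smooth velocity field with nonzero vorticity; the field v(r) and the sample point are hypothetical, chosen only for illustration.

```python
import math

def v(r):
    """An arbitrary smooth velocity field with nonzero curl (illustrative only)."""
    x, y, z = r
    return (y * z, x * x, math.sin(x * y))

def partial(f, r, j, h=1e-5):
    """Central-difference derivative of f (scalar- or tuple-valued) along r_j."""
    rp, rm = list(r), list(r)
    rp[j] += h
    rm[j] -= h
    fp, fm = f(rp), f(rm)
    if isinstance(fp, tuple):
        return tuple((a - b) / (2 * h) for a, b in zip(fp, fm))
    return (fp - fm) / (2 * h)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def half_v2(r):
    return 0.5 * sum(c * c for c in v(r))

r0 = (0.3, -0.7, 1.1)
vel = v(r0)
dv = [partial(v, r0, j) for j in range(3)]   # dv[j][i] = d v_i / d r_j

# left-hand side of (8.25): (v . grad) v
lhs = tuple(sum(vel[j] * dv[j][i] for j in range(3)) for i in range(3))

# curl v = (dvz/dy - dvy/dz, dvx/dz - dvz/dx, dvy/dx - dvx/dy)
curl = (dv[1][2] - dv[2][1], dv[2][0] - dv[0][2], dv[0][1] - dv[1][0])
grad_half_v2 = tuple(partial(half_v2, r0, j) for j in range(3))
rhs = tuple(g - c for g, c in zip(grad_half_v2, cross(vel, curl)))

print(lhs)
print(rhs)   # agrees with lhs up to finite-difference error
```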


In the general case, an ideal fluid may have vorticity, so that Eq. (27) is not always valid. Moreover, due to the absence of viscosity in an ideal fluid, the vorticity, once created, does not decrease along the so-called streamline — the fluid particle’s trajectory, to which the velocity is tangential at every point.²¹ Mathematically, this fact²² is expressed by the following Kelvin theorem: (∇×v)·dA = const along any small contiguous group of streamlines crossing an elementary area dA.²³


However, in many important cases, the vorticity is negligible. For example, even if the vorticity 
exists in some part of the fluid volume (say, induced by local turbulence, see Sec. 6 below), it may 
decay due to the fluid’s viscosity, to be discussed in Sec. 5, well before it reaches the region of our 
interest. (If this viscosity is sufficiently small, its effects on the fluid’s flow in the region of interest are 


19 It readily follows, for example, from MA Eq. (11.6) with g = f = v.

20 Named after Daniel Bernoulli (1700-1782), not to be confused with Jacob Bernoulli or one of several Johanns 
of the same famous Bernoulli family, which gave the world so many famous mathematicians and scientists. 

21 Perhaps the most spectacular manifestation of the vorticity conservation is the famous toroidal vortex rings 
(see, e.g., a nice photo and a movie at https://en.wikipedia.org/wiki/Vortex_ring), predicted in 1858 by H. von 
Helmholtz, and then demonstrated by P. Tait in a series of spectacular experiments with smoke in the air. The 
persistence of such a ring, once created, is only limited by the fluid’s viscosity — see the next section. 

22 This theorem was first formulated (verbally) by Hermann von Helmholtz. 

23 Its proof may be found, e.g., in Sec. 8 of L. Landau and E. Lifshitz, Fluid Mechanics, 2nd ed., Butterworth-Heinemann, 1987.
negligible, i.e. the ideal-fluid approximation is still acceptable.) Another important case is when a solid body of an arbitrary shape is embedded into an ideal fluid whose flow is uniform (meaning, by definition, that v(r, t) = v_0 = const) at large distances;²⁴ then the vorticity is zero everywhere. Indeed, since ∇×v = 0 at the uniform flow, the vorticity is zero at distant points of any streamline, and according to the Kelvin theorem, should equal zero everywhere.

In such cases, the velocity distribution, as any curl-free vector field, may be represented as a gradient of some effective potential function,

$$\mathbf{v} = -\nabla\phi. \tag{8.28}$$


Such potential flow may be described by a simple differential equation. Indeed, the continuity equation (21) for a steady flow of an incompressible fluid is reduced to ∇·v = 0. Plugging Eq. (28) into this relation, we get the scalar Laplace equation,

$$\nabla^2\phi = 0, \tag{8.29}$$

which should be solved with appropriate boundary conditions. For example, the fluid flow may be limited by solid bodies, inside which the fluid cannot penetrate. Then the fluid velocity v at the solid body boundaries should not have a normal component; according to Eq. (28), this means

$$\left.\frac{\partial\phi}{\partial n}\right|_{\text{surfaces}} = 0. \tag{8.30}$$

On the other hand, if at large distances the fluid flow is known, e.g., uniform, then:

$$\nabla\phi = -\mathbf{v}_0 = \text{const}, \quad \text{at } r \to \infty. \tag{8.31}$$


As the reader may already know (for example, from a course on electrodynamics²⁵), the Laplace equation (29) is analytically solvable in several simple (symmetric) but important situations. Let us consider, for example, the case of a round cylinder, with radius R, immersed into a flow with the initial velocity v_0 perpendicular to the cylinder’s axis (Fig. 8). For this problem, it is natural to use the cylindrical coordinates, with the z-axis coinciding with the cylinder’s axis. In this case, the velocity distribution is obviously independent of z, so that we may simplify the general expression of the Laplace operator in cylindrical coordinates²⁶ by taking ∂/∂z = 0. As a result, Eq. (29) is reduced to²⁷

$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\phi}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\phi}{\partial\varphi^2} = 0, \quad \text{at } \rho \geq R. \tag{8.32}$$


The general solution of this equation may be obtained using the variable separation method, similar to 
that used in Sec. 6.5 — see Eq. (6.67). The result is?8 


24 This case is very important, because the motion of a solid body, with a constant velocity u, in the otherwise stationary fluid, gives exactly the same problem (with v_0 = −u), in a reference frame bound to the body.

25 See, e.g., EM Secs. 2.3-2.8. 

26 See, e.g., MA Eq. (10.3). 

27 Let me hope that the letter ρ, used here to denote the magnitude of the 2D radius vector ρ = {x, y}, will not be confused with the fluid’s density ρ — which does not participate in this boundary problem.

28 See, e.g., EM Eq. (2.112). Note that the most general solution of Eq. (32) also includes a term proportional to φ, but in our geometry, this term should be zero for such a single-valued function as the velocity potential.
$$\phi = a_0 + b_0\ln\rho + \sum_{n=1}^{\infty}\left(c_n\cos n\varphi + s_n\sin n\varphi\right)\left(a_n\rho^n + b_n\rho^{-n}\right), \tag{8.33}$$

where the coefficients a_n and b_n have to be found from the boundary conditions (30) and (31). Choosing the x-axis to be parallel to the vector v_0 (Fig. 8a), so that x = ρ cos φ, we may spell out these conditions in the following form:

$$\frac{\partial\phi}{\partial\rho} = 0, \quad \text{at } \rho = R, \tag{8.34}$$

$$\phi \to -v_0\rho\cos\varphi + \phi_0, \quad \text{at } \rho \gg R, \tag{8.35}$$

where ϕ_0 is an arbitrary constant, which does not affect the velocity distribution and may be taken for zero. The condition (35) is incompatible with any term of the sum (33) except the term with n = 1 (with s_1 = 0 and c_1a_1 = −v_0), so that Eq. (33) is reduced to

$$\phi = \left(-v_0\rho + \frac{c_1b_1}{\rho}\right)\cos\varphi. \tag{8.36}$$

Now, plugging this solution into Eq. (34), we get c_1b_1 = −v_0R², so that, finally,

$$\phi = -v_0\left(\rho + \frac{R^2}{\rho}\right)\cos\varphi. \tag{8.37a}$$
p 
Fig. 8.8. The flow of an ideal, incompressible fluid around a cylinder: (a) equipotential surfaces and (b) streamlines.


Figure 8a shows the surfaces of constant velocity potential ϕ given by Eq. (37a). To find the fluid velocity, it is easier to rewrite that result in the Cartesian coordinates x = ρ cos φ, y = ρ sin φ:

$$\phi = -v_0\left(x + \frac{R^2x}{\rho^2}\right) = -v_0x\left(1 + \frac{R^2}{x^2 + y^2}\right). \tag{8.37b}$$

From here, we may readily calculate the Cartesian components of the fluid’s velocity:²⁹

$$\begin{aligned} v_x &= -\frac{\partial\phi}{\partial x} = v_0\left[1 + R^2\frac{y^2 - x^2}{(x^2 + y^2)^2}\right] = v_0\left(1 - \frac{R^2}{\rho^2}\cos 2\varphi\right),\\ v_y &= -\frac{\partial\phi}{\partial y} = -v_0\frac{2R^2xy}{(x^2 + y^2)^2} = -v_0\frac{R^2}{\rho^2}\sin 2\varphi. \end{aligned} \tag{8.38}$$

These expressions show that the maximum fluid speed is achieved at the transverse diameter’s ends (ρ = R, φ = ±π/2), where v = 2v_0, while at the longitudinal diameter’s ends (ρ = R, φ = 0, π), the velocity vanishes — the so-called stagnation points.
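The potential-flow solution can be checked numerically. This sketch, in illustrative units v_0 = R = 1, verifies that the components (38) equal −∇ϕ computed from Eq. (37b), that the radial velocity vanishes on the cylinder surface as required by (34), and that the surface speed is 2v_0 at φ = ±π/2 and zero at the stagnation points.

```python
import math

V0, R = 1.0, 1.0   # free-stream speed and cylinder radius (illustrative units)

def potential(x, y):
    """Velocity potential (8.37b)."""
    return -V0 * x * (1.0 + R**2 / (x * x + y * y))

def velocity(x, y):
    """Velocity components (8.38)."""
    r2 = x * x + y * y
    vx = V0 * (1.0 + R**2 * (y * y - x * x) / r2**2)
    vy = -2.0 * V0 * R**2 * x * y / r2**2
    return vx, vy

def surface_speed(ang):
    """Speed at the point of the cylinder surface with polar angle ang."""
    return math.hypot(*velocity(R * math.cos(ang), R * math.sin(ang)))

# 1) v = -grad(phi): compare (8.38) with central differences of (8.37b)
x0, y0, h = 1.7, 0.6, 1e-6
vx_fd = -(potential(x0 + h, y0) - potential(x0 - h, y0)) / (2 * h)
vy_fd = -(potential(x0, y0 + h) - potential(x0, y0 - h)) / (2 * h)
print(velocity(x0, y0), (vx_fd, vy_fd))

# 2) boundary condition (8.34): no radial velocity on the surface rho = R
for ang in (0.3, 1.0, 2.5):
    vx, vy = velocity(R * math.cos(ang), R * math.sin(ang))
    print(f"angle {ang}: radial velocity {vx * math.cos(ang) + vy * math.sin(ang):.1e}")

# 3) speed 2*V0 at the transverse ends, zero at the stagnation points
for ang in (math.pi / 2, -math.pi / 2, 0.0, math.pi):
    print(f"angle {ang:+.3f}: speed {surface_speed(ang):.6f}")
```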


Now the pressure distribution may be calculated by plugging Eqs. (38) into the Bernoulli equation (27) with u(r) = 0. The result shows that the pressure reaches its maximum at the stagnation points, while at the ends of the transverse diameter x = 0, where the velocity is largest, it is lower by 2ρv_0². Note that the distributions of both the velocity and the pressure are symmetric with respect to the transverse axis x = 0, so that the fluid flow does not create any net drag force in its direction. It may be shown that this result, which stems from the conservation of the mechanical energy of an ideal fluid, remains valid for a solid body of arbitrary shape moving inside an infinite volume of an ideal fluid — the so-called d’Alembert paradox. However, if a body moves near an ideal fluid’s surface, its energy may be transformed into that of the surface waves, and the drag becomes possible.


Speaking about the surface waves: the description of such waves in a gravity field³⁰ is one more classical problem of the ideal fluid dynamics.³¹ Let us consider an open surface of an ideal liquid of density ρ in a uniform gravity field f = ρg = −ρg n_y — see Fig. 9.

Fig. 8.9. Small surface wave on a deep heavy liquid. Dashed lines show particle trajectories. (For clarity, the displacement amplitude A is strongly exaggerated.)


If the wave amplitude A is sufficiently small, we may neglect the nonlinear term (v·∇)v ∝ A² in the Euler equation (23) in comparison with the first term, ∂v/∂t, which is linear in A. For a wave with


29 Figure 8b shows the flow streamlines. They may be found by the integration of the obvious equation dy/dx = v_y(x, y)/v_x(x, y). For our simple problem, this may be done analytically, giving y(1 − R²/ρ²) = const, where the constant is specific for each streamline.

30 The alternative, historic term “gravity waves” for this phenomenon may nowadays lead to confusion with the 
relativistic effect of gravity waves — which may propagate in free space. 

31 It was solved by Sir George Biddell Airy (1801-1892), of the Airy functions’ fame. (He was also a prominent astronomer and, in particular, established Greenwich as the prime meridian.)
frequency ω and wave number k, the particle’s velocity v = dq/dt is of the order of ωA, so that this approximation is legitimate if ω²A >> k(ωA)², i.e. when

$$kA \ll 1, \tag{8.39}$$

i.e. when the wave’s amplitude A is much smaller than its wavelength λ = 2π/k. Due to this assumption, we may neglect the liquid vorticity effects, and (for an incompressible fluid) again use the Laplace equation (29) for the wave’s analysis. Looking for its solution in the natural form of a sinusoidal wave, uniform in one of the horizontal directions (x),

$$\phi = \text{Re}\left[\Phi(y)\,e^{i(kz - \omega t)}\right], \tag{8.40}$$

we get a very simple equation

$$\frac{d^2\Phi}{dy^2} - k^2\Phi = 0, \tag{8.41}$$

with an exponential solution (properly decaying at y → −∞), Φ = Φ_A exp{ky}, so that Eq. (40) becomes

$$\phi = \text{Re}\left[\Phi_A e^{ky}e^{i(kz - \omega t)}\right] = \Phi_A e^{ky}\cos(kz - \omega t), \tag{8.42}$$


where the last form is valid if Φ_A is real — which may be always arranged by a proper selection of the origins of z and/or t. Note that the rate k of the wave’s decay in the vertical direction is exactly equal to the wave number of its propagation in the horizontal direction — along the fluid’s surface. Because of that, the trajectories of fluid particles are exactly circular — see Fig. 9. Indeed, using Eqs. (28) and (42) to calculate velocity components,

$$v_x = 0, \quad v_y = -\frac{\partial\phi}{\partial y} = -k\Phi_A e^{ky}\cos(kz - \omega t), \quad v_z = -\frac{\partial\phi}{\partial z} = k\Phi_A e^{ky}\sin(kz - \omega t), \tag{8.43}$$

we see that v_y and v_z, at the same height y, have equal real amplitudes, and are phase-shifted by π/2. This result becomes even more clear if we use the velocity definition v = dq/dt to integrate Eqs. (43) over time to recover the particle displacement law q(t). Due to the strong inequality (39), the integration may be done at fixed y and z:

$$q_y = q_A e^{ky}\sin(kz - \omega t), \quad q_z = q_A e^{ky}\cos(kz - \omega t), \quad \text{with } q_A = \frac{k\Phi_A}{\omega}. \tag{8.44}$$


Note that the phase of oscillations of v_z coincides with that of q_y. This means, in particular, that at the wave’s “crest”, particles are moving in the direction of the wave’s propagation — see arrows in Fig. 9.


It is remarkable that all this picture follows from the Laplace equation alone! The “only” remaining feature to calculate is the dispersion law ω(k), and for that, we need to combine Eq. (42) with what remains, in our linear approximation, of the Euler equation (23). In this approximation, and with the bulk force potential u = ρgy, the equation is reduced to

$$\nabla\left(-\rho\frac{\partial\phi}{\partial t} + P + \rho gy\right) = 0. \tag{8.45}$$
This equality means that the function in the parentheses is constant in space; at the surface, and at negligible surface tension, it should be equal to the pressure P_0 above the surface (say, the atmospheric pressure), which we assume to be constant. This means that on the surface, the contributions to P that come from the first and the third term in Eq. (45) have to compensate for each other. Let us take the average surface position for y = 0; then the surface with waves is described by the relation y(z, t) = q_y(0, z, t) — see Fig. 9. Due to the strong inequality (39), we can use Eqs. (42) and (44) with y = 0, so that the above compensation condition yields

$$-\rho\omega\Phi_A\sin(kz - \omega t) + \rho g\frac{k\Phi_A}{\omega}\sin(kz - \omega t) = 0. \tag{8.46}$$

This condition is identically satisfied on the whole surface (and for any Φ_A) as soon as

$$\omega^2 = gk. \tag{8.47}$$

This equality is the dispersion relation we were looking for. Looking at this very simple result (which includes just one constant, g), note, first of all, that it does not involve the fluid’s density. This is not too surprising, because due to the weak equivalence principle, particle masses always drop out from the solutions of problems involving gravitational forces alone. Second, the dispersion law (47) is strongly nonlinear, and in particular, does not have an acoustic wave limit at all. This means that the surface wave propagation is strongly dispersive, with both the phase velocity u_ph = ω/k = g/ω and the group velocity u_gr = dω/dk = g/2ω = u_ph/2 diverging at ω → 0.³²


This divergence is an artifact of our assumption of the infinitely deep liquid. A rather straightforward generalization of the above calculations to a layer of a finite thickness h, using the additional boundary condition v_y|_{y=−h} = 0, yields a more general dispersion relation:³³

$$\omega^2 = gk\tanh kh. \tag{8.48}$$

It shows that relatively long waves, with λ >> h, i.e. with kh << 1, propagate without dispersion (i.e. have ω/k = const ≡ u), with the following velocity:

$$u = (gh)^{1/2}. \tag{8.49}$$


For the Earth’s oceans, this velocity is rather high, close to 250 m/s (!) for the average ocean depth h ~ 5 
km. This result explains, in particular, the very fast propagation of tsunami waves. 
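The dispersion laws (47) and (48) are easy to explore numerically. The sketch below computes the group velocity dω/dk by a finite difference, confirming u_gr = u_ph/2 for the deep liquid, and shows how the phase velocity for a layer of depth h approaches (gh)^{1/2} at kh << 1 and (g/k)^{1/2} at kh >> 1; the depth h = 10 m and the wave numbers are illustrative.

```python
import math

G = 9.81  # m/s^2

def omega_deep(k):
    """Deep-liquid dispersion law (8.47): omega = (g*k)**0.5."""
    return math.sqrt(G * k)

def omega_layer(k, h):
    """Finite-depth dispersion law (8.48): omega**2 = g*k*tanh(k*h)."""
    return math.sqrt(G * k * math.tanh(k * h))

def group_velocity(omega, k, rel_step=1e-6):
    """u_gr = d(omega)/dk by a central finite difference."""
    dk = rel_step * k
    return (omega(k + dk) - omega(k - dk)) / (2 * dk)

# deep liquid: u_gr/u_ph = 1/2 for any k, while u_ph grows without limit as k -> 0
for k in (0.01, 1.0, 100.0):
    u_ph = omega_deep(k) / k
    ratio = group_velocity(omega_deep, k) / u_ph
    print(f"k = {k:6}: u_ph = {u_ph:.3f} m/s, u_gr/u_ph = {ratio:.6f}")

# layer of depth h: the phase velocity saturates at (g*h)**0.5 for long waves
h = 10.0  # m, illustrative
for kh in (0.001, 0.1, 1.0, 10.0):
    k = kh / h
    print(f"kh = {kh:6}: u_ph/(g*h)**0.5 = {omega_layer(k, h) / k / math.sqrt(G * h):.4f}")
```

The saturation of u_ph at small kh is what removes the unphysical divergence of the deep-liquid result.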


In the opposite limit of very short waves (large k), Eq. (47) also does not give a good description of typical experimental data, due to surface tension effects — see Sec. 2 above. Using Eq. (13), it is easy (and hence also left for the reader’s exercise) to show that their account leads (at kh >> 1) to the following modification of Eq. (47):

$$\omega^2 = gk + \frac{\gamma}{\rho}k^3. \tag{8.50}$$


32 Here, unlike in Chapters 6 and 7, the wave velocity is denoted by the letter u to avoid any chance of its 
confusion with the velocity v (43) of the liquid’s particles. 

33 This calculation (left for the reader’s exercise) shows also that at finite h, the particle trajectories are elliptical rather than circular, becoming more and more stretched in the wave propagation direction near the bottom of the layer.
According to this formula, the surface tension is important at wavelengths smaller than the capillary constant a_c given by Eq. (14). Much shorter waves, for which Eq. (50) yields ω ∝ k^{3/2}, are called the capillary waves — or just “ripples”.
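One consequence of Eq. (50) is that the phase velocity u_ph = ω/k = (g/k + γk/ρ)^{1/2} has a minimum, reached where the gravity and surface-tension terms are equal, at k* = (ρg/γ)^{1/2}, with u_min = (4gγ/ρ)^{1/4}. The Python sketch below finds this minimum by a brute-force scan for the water-air parameters quoted in Sec. 2 and compares it with these closed-form values.

```python
import math

G, GAMMA, RHO = 9.81, 0.073, 1000.0   # water-air values quoted in Sec. 2 (SI units)

def omega_cap_grav(k):
    """Deep-liquid dispersion with surface tension (8.50): omega**2 = g*k + (gamma/rho)*k**3."""
    return math.sqrt(G * k + GAMMA / RHO * k**3)

# scan wave numbers from 1 to 1e5 1/m on a logarithmic grid
ks = [10 ** (i / 1000) for i in range(5001)]
u_ph = [omega_cap_grav(k) / k for k in ks]
i_min = min(range(len(ks)), key=u_ph.__getitem__)

k_star = math.sqrt(RHO * G / GAMMA)   # where g*k equals (gamma/rho)*k**3
print(f"scan: k = {ks[i_min]:.1f} 1/m, u_ph = {u_ph[i_min]:.4f} m/s")
print(f"closed form: k* = {k_star:.1f} 1/m (wavelength {2 * math.pi / k_star * 100:.2f} cm), "
      f"u_min = {(4 * G * GAMMA / RHO) ** 0.25:.4f} m/s")
```

Waves longer than this crossover behave as gravity waves, while shorter ones are dominated by surface tension.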


8.5. Dynamics: Viscous fluids 


The viscosity of many fluids, at not overly high velocities, may be described surprisingly well by adding, to the static stress tensor (2), additional elements proportional to the velocity v = dq/dt:

$$\sigma_{jj'} = -P\delta_{jj'} + \tilde\sigma_{jj'}(\mathbf{v}). \tag{8.51}$$

In view of our experience with Hooke’s law (7.32) expressing a stress tensor proportional to particle displacements q, we may expect a similar expression with the replacement q → v = dq/dt:

$$\tilde\sigma_{jj'} = 2\eta\left(e_{jj'} - \frac{1}{3}\delta_{jj'}\text{Tr}(\mathrm{e})\right) + \zeta\,\delta_{jj'}\text{Tr}(\mathrm{e}), \tag{8.52a}$$

where e_jj′ are the elements of the symmetrized strain derivative tensor:

$$e_{jj'} \equiv \frac{1}{2}\left(\frac{\partial v_j}{\partial r_{j'}} + \frac{\partial v_{j'}}{\partial r_j}\right). \tag{8.52b}$$


Experiment confirms that Eq. (52) gives a good description of the viscosity effects in a broad range of isotropic fluids. The coefficient η is called either the shear viscosity, or the dynamic viscosity, or just viscosity, while ζ is called the second (or bulk) viscosity.

In the most frequent case of virtually incompressible fluids, Tr(e) = d[Tr(s)]/dt = (dV/dt)/V = 0, so that the term proportional to ζ vanishes, and η is the only important viscosity parameter.³⁴ Table 1 shows the approximate values of the viscosity, together with the mass density ρ, for several representative fluids.


Table 8.1. Important parameters of several representative fluids (approximate values)

Fluid (all at 300 K, unless indicated otherwise) | η (mPa·s) | ρ (kg/m³)
Glasses | 10²¹–10²⁴ | 2,200–2,500
Earth magmas (at 800 to 1,400 K) | 10³–10¹⁴ | 2,200–2,800
Machine oils (SAE 10W – 40W) | 65–320 | 900
Water | 0.89 | 1,000
Mercury | 1.53 | 13,530
Liquid helium-4 (at 4.2 K, 10⁵ Pa) | 0.019 | 130
Air (at 10⁵ Pa) | 0.018 | 1.3


34 Probably the most important effect we miss by neglecting ¢ is the attenuation of the (longitudinal) acoustic 
waves, into which the second viscosity makes a major contribution — whose (rather straightforward) analysis is 
left for the reader’s exercise. 
One can see that η may vary in very broad limits; the extreme cases of fluids are glasses (which, somewhat counter-intuitively, are not stable solids even at room temperature, but rather may “flow”, though extremely slowly, until they eventually crystallize) and liquid helium (whose viscosity is of the order of that of gases,³⁵ despite its much higher density).

Incorporating the additional elements of σ_jj′ into the equation (23) of fluid motion, absolutely similarly to how it was done at the derivation of Eq. (7.107) of the elasticity theory, and with the account of Eq. (19), we arrive at the famous Navier-Stokes equation:³⁶

$$\rho\frac{\partial\mathbf{v}}{\partial t} + \rho(\mathbf{v}\cdot\nabla)\mathbf{v} = -\nabla P + \mathbf{f} + \eta\nabla^2\mathbf{v} + \left(\zeta + \frac{\eta}{3}\right)\nabla(\nabla\cdot\mathbf{v}). \tag{8.53}$$


The apparent simplicity of this equation should not mask an enormous range of phenomena, notably including turbulence (see the next section) that are described by it, and the complexity of its solutions even for some simple geometries. In most problems interesting for practice, the only option is to use numerical methods, but due to a large number of parameters (ρ, η, ζ, plus geometrical parameters of the involved bodies, plus the distribution of bulk forces f, plus boundary conditions), this way is strongly plagued by the curse of dimensionality that was discussed at the end of Sec. 5.8.


Let us see how the Navier-Stokes equation works, on several simple examples. As the simplest 
case, let us consider the so-called Couette flow of an incompressible fluid layer between two wide, 
horizontal plates (Fig. 10), caused by their mutual sliding with a constant relative velocity v_0.


Fig. 8.10. The simplest problem of 
the viscous fluid flow. 


Let us assume a laminar (vortex-free) fluid flow. (As will be discussed in the next section, this assumption is only valid within certain limits.) Then we may use the evident symmetry of the problem to take, in the coordinate frame shown in Fig. 10, $\mathbf{v} = \mathbf{n}_x v(y)$. Let the bulk forces be vertical, $\mathbf{f} = \mathbf{n}_y f$, so that they do not give an additional drive to the fluid flow. Then for the stationary flow ($\partial\mathbf{v}/\partial t = 0$), the vertical, y-component of the Navier-Stokes equation is reduced to the static Pascal equation (6), showing that the pressure distribution is not affected by the plate (and fluid) motion. In the horizontal, x-component of the equation, the convective term $(\mathbf{v}\cdot\nabla)\mathbf{v} = v\,\partial\mathbf{v}/\partial x$ vanishes because v depends only on y, and only one term, $\eta\nabla^2 v$, survives, so that for the only Cartesian component of the fluid's velocity we get the 1D Laplace equation


$$\frac{d^2 v}{dy^2} = 0. \tag{8.54}$$


35 Actually, at even lower temperatures (for ⁴He, at T < T_λ ≈ 2.17 K), helium becomes a superfluid, i.e. loses its viscosity completely, as a result of the Bose-Einstein condensation (see, e.g., SM Sec. 3.4).

36 Named after Claude-Louis Navier (1785-1836), who suggested the equation, and Sir George Gabriel Stokes (1819-1903), who demonstrated its relevance by solving it for several key situations.
In contrast to the ideal fluid (see, e.g., Fig. 8b), the relative velocity of a viscous fluid and a solid wall it flows by should approach zero at the wall,³⁷ so that Eq. (54) should be solved with the boundary conditions


$$v = \begin{cases} 0, & \text{at } y = 0,\\ v_0, & \text{at } y = d. \end{cases} \tag{8.55}$$


Using the evident solution to this boundary problem, $v(y) = (y/d)v_0$, illustrated by the arrows in Fig. 10, we can now calculate the horizontal drag force acting on a unit area of each plate. For the bottom plate,


$$\frac{F_x}{A} = \eta\left.\frac{\partial v}{\partial y}\right|_{y=0} = \eta\frac{v_0}{d}. \tag{8.56}$$


(For the top plate, the derivative ∂v/∂y has the same value, but the sign of dA_y has to be changed to reflect the direction of the outer normal to the solid surface, so that we get a similar force but with the negative sign.) The well-known result (56) is often used, in undergraduate physics courses, as the definition of the dynamic viscosity η, and indeed shows its meaning very well.³⁸
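As a quick numerical illustration of Eqs. (54)-(56), here is a minimal Python sketch. The function names are hypothetical, and the numbers (water viscosity of about 10⁻³ Pa·s, a plate speed of 1 m/s, a 1-mm gap) are assumed illustrative values, not data from the text.

```python
def couette_velocity(y, v0, d):
    """Linear Couette profile v(y) = (y/d) * v0, the solution of Eqs. (54)-(55)."""
    return (y / d) * v0


def couette_drag_per_area(eta, v0, d):
    """Magnitude of the tangential force per unit plate area, Eq. (56): eta * v0 / d."""
    return eta * v0 / d


# Assumed illustrative values: water between plates 1 mm apart, sliding at 1 m/s.
eta_water = 1.0e-3   # Pa*s, approximate room-temperature value
v0 = 1.0             # m/s
d = 1.0e-3           # m

print(couette_velocity(0.5e-3, v0, d))          # mid-gap velocity, m/s (half of v0)
print(couette_drag_per_area(eta_water, v0, d))  # drag per unit area, Pa (about 1 Pa)
```

The linear profile makes the shear rate, and hence the stress, the same at every height, which is why both plates feel forces of equal magnitude.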


As the next, slightly less trivial example, let us consider the so-called Poiseuille problem:³⁹ finding the relation between the constant external pressure gradient $\chi \equiv -\partial\mathcal{P}/\partial z$ applied along a round pipe with internal radius R (Fig. 11) and the so-called discharge Q, defined as the mass of fluid flowing through the pipe's cross-section in unit time.


Fig. 8.11. The Poiseuille problem. (Figure labels: pipe radius R; higher pressure; lower pressure.)


Again assuming a laminar flow, we can use the problem's uniformity along the z-axis and its axial symmetry to infer that $\mathbf{v} = \mathbf{n}_z v(\rho)$ and $\mathcal{P} = -\chi z + f(\rho, \varphi) + \text{const}$ (where $\boldsymbol{\rho} = \{\rho, \varphi\}$ is again the 2D radius vector rather than the fluid density), so that the Navier-Stokes equation (53) for an incompressible fluid (with $\nabla\cdot\mathbf{v} = 0$) is reduced to the following 2D Poisson equation:


$$\eta\nabla^2 v = -\chi. \tag{8.57}$$


After spelling out the 2D Laplace operator in polar coordinates for our axially symmetric case ($\partial/\partial\varphi = 0$), Eq. (57) becomes a simple ordinary differential equation,


37 This is essentially an additional experimental fact, but it may be understood as follows. The tangential component of the velocity should be continuous at the interface between two viscous fluids, in order to avoid infinite stress (see Eq. (52)), and a solid may be considered as the limiting case of a fluid with infinite viscosity.

38 The very notion of viscosity η was introduced (by none other than the same Sir Isaac Newton) via a formula similar to Eq. (56), so that any effect resulting in a drag force proportional to velocity is frequently called the Newtonian viscosity.

39 It was solved by G. Stokes in 1845 to explain the experimental results obtained by Gotthilf Hagen in 1839 and 
(independently) by Jean Poiseuille in 1840-41. 
$$\frac{1}{\rho}\frac{d}{d\rho}\left(\rho\frac{dv}{d\rho}\right) = -\frac{\chi}{\eta}, \tag{8.58}$$


which has to be solved on the segment 0 ≤ ρ ≤ R, with the following boundary conditions:


$$v = 0 \ \ \text{at } \rho = R, \qquad \frac{dv}{d\rho} = 0 \ \ \text{at } \rho = 0. \tag{8.59}$$


(The latter condition is required by the axial symmetry.) A straightforward double integration yields:

$$v = \frac{\chi}{4\eta}\left(R^2 - \rho^2\right), \tag{8.60}$$
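so that the (easy) integration of the mass flow density over the cross-section of the pipe (in which the prefactor ρ is the fluid density, while under the integral ρ is again the radial coordinate),

$$Q = \int_A \rho v\, d^2r = 2\pi\rho\,\frac{\chi}{4\eta}\int_0^R \left(R^2 - \rho^2\right)\rho\, d\rho, \tag{8.61}$$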


immediately gives us the so-called Poiseuille (or "Hagen-Poiseuille") law for the fluid discharge:

$$Q = \frac{\pi}{8}\frac{\rho\chi}{\eta}R^4. \tag{8.62}$$
The most prominent (and practically important) feature of this result is the very strong, fourth-power dependence of the discharge on the pipe's radius.
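The step from Eq. (61) to Eq. (62) is easy to check numerically. The Python sketch below integrates the profile (60) by the midpoint rule and compares the result with the closed-form Poiseuille law. To avoid the double use of ρ in the text, it calls the radial coordinate `r` and the fluid density `density`; the function names and parameter values are illustrative assumptions.

```python
import math


def poiseuille_velocity(r, R, chi, eta):
    """Velocity profile of Eq. (60): v = chi / (4 eta) * (R^2 - r^2)."""
    return chi / (4.0 * eta) * (R**2 - r**2)


def discharge_numeric(R, chi, eta, density, n=20_000):
    """Eq. (61) by the midpoint rule: Q = 2 pi * density * int_0^R v(r) r dr."""
    dr = R / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += poiseuille_velocity(r, R, chi, eta) * r * dr
    return 2.0 * math.pi * density * total


def discharge_poiseuille(R, chi, eta, density):
    """Closed form of Eq. (62): Q = (pi / 8) * density * chi * R^4 / eta."""
    return math.pi / 8.0 * density * chi * R**4 / eta


# Assumed illustrative values: water in a tube of 1-mm radius, gradient of 100 Pa/m.
R, chi, eta, density = 1.0e-3, 100.0, 1.0e-3, 1.0e3
print(discharge_numeric(R, chi, eta, density))     # kg/s
print(discharge_poiseuille(R, chi, eta, density))  # agrees closely with the line above
print(discharge_poiseuille(2 * R, chi, eta, density)
      / discharge_poiseuille(R, chi, eta, density))  # about 16: doubling R gives 2^4
```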


Of course, the 2D Poisson equation (57) is not so readily solvable for every cross-section shape. For example, consider a very simple, square-shaped cross-section with side a (Fig. 12).


Fig. 8.12. Application of the finite-difference method with a very coarse mesh (with step h = a/2) to the problem of viscous fluid flow in a pipe with a square cross-section.


In this case, it is natural to use the Cartesian coordinates aligned with the cross-section’s sides, 
so that Eq. (57) becomes 


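$$\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = -\frac{\chi}{\eta} = \text{const}, \quad \text{for } 0 < x, y < a, \tag{8.63}$$

and has to be solved with the boundary conditions

$$v = 0, \quad \text{at } x, y = 0, a. \tag{8.64}$$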
For this boundary problem, analytical methods such as the variable separation lead to answers in the form of infinite sums (series), which ultimately require computers anyway, at least for their plotting and comprehension. Let me use this pretext to discuss how explicitly numerical methods may be used for such problems, or for any partial differential equations involving the Laplace operator. The simplest of them is the finite-difference method,⁴⁰ in which the function to be calculated, f(r), is represented by its values f(r₁), f(r₂), ... at discrete points of a rectangular grid (frequently called a mesh) of the corresponding dimensionality (Fig. 13).
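To make the remark about series concrete, here is a sketch of the variable-separation answer. It assumes the standard Fourier-series solution of Eqs. (63)-(64), which the text does not derive:

$$v(x,y) = \frac{\chi}{2\eta}x(a-x) - \frac{4\chi a^2}{\pi^3\eta}\sum_{n\ \text{odd}}\frac{\sin(n\pi x/a)\,\cosh[n\pi(y-a/2)/a]}{n^3\cosh(n\pi/2)}.$$

The first term satisfies Eq. (63) and vanishes at x = 0, a; each term of the sum is a harmonic function that also vanishes at x = 0, a, and together they cancel the first term at y = 0, a. The Python function below (its name and default units are illustrative choices) simply sums this series.

```python
import math


def square_duct_velocity(x, y, a=1.0, chi_over_eta=1.0, n_max=99):
    """Series solution of Eqs. (63)-(64) for a square pipe of side a (assumed standard result)."""
    v = 0.5 * chi_over_eta * x * (a - x)        # particular solution of Eq. (63)
    pref = 4.0 * chi_over_eta * a**2 / math.pi**3
    for n in range(1, n_max + 1, 2):            # only odd n contribute
        k = n * math.pi / a
        v -= (pref * math.sin(k * x) * math.cosh(k * (y - a / 2))
              / (n**3 * math.cosh(k * a / 2)))
    return v


v_center = square_duct_velocity(0.5, 0.5)
print(v_center)   # about 0.0737, i.e. v_max of about 0.0737 chi a^2 / eta
```

The central value obtained this way is the "exact" maximal velocity against which the finite-difference estimate (67) may be compared.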


Fig. 8.13. The idea of the finite-difference method in (a) one and (b) two dimensions.


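In Sec. 5.7, we have already discussed how to use such a grid to approximate the first derivative of the function (see Eq. (5.97)). Its extension to the second derivative is straightforward (see Fig. 13a):⁴¹

$$\frac{\partial^2 f}{\partial r^2} \approx \frac{1}{h}\left[\left.\frac{\partial f}{\partial r}\right|_{r+h/2} - \left.\frac{\partial f}{\partial r}\right|_{r-h/2}\right] \approx \frac{1}{h}\left[\frac{f_+ - f_0}{h} - \frac{f_0 - f_-}{h}\right] \equiv \frac{f_+ + f_- - 2f_0}{h^2}, \tag{8.65}$$

where $f_0 \equiv f(r)$ and $f_\pm \equiv f(r \pm h)$.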


The relative error of this approximation is of the order of $h^2(\partial^4 f/\partial r^4)/(\partial^2 f/\partial r^2)$, quite acceptable in many cases. As a result, the left-hand side of Eq. (63), treated on a square mesh with step h (Fig. 13b), may be approximated with the so-called five-point scheme:


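$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \approx \frac{f_{+0} + f_{-0} - 2f_{00}}{h^2} + \frac{f_{0+} + f_{0-} - 2f_{00}}{h^2} = \frac{f_{+0} + f_{-0} + f_{0+} + f_{0-} - 4f_{00}}{h^2}, \tag{8.66}$$

where the two subscripts of f denote the mesh shifts (+h, 0, or −h) along x and y, respectively.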
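Both accuracy statements are easy to check numerically. In the Python sketch below (function names are illustrative), the error of Eq. (65) for f = sin r shrinks roughly fourfold each time h is halved, as expected for an O(h²) scheme, while the five-point formula (66) applied to a quadratic function, such as the round-pipe profile (60), returns the exact Laplacian even with a coarse step.

```python
import math


def second_derivative_3pt(f, r, h):
    """Three-point approximation of Eq. (65): [f(r+h) + f(r-h) - 2 f(r)] / h^2."""
    return (f(r + h) + f(r - h) - 2.0 * f(r)) / h**2


def laplacian_5pt(f, x, y, h):
    """Five-point approximation of Eq. (66) for the 2D Laplacian."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4.0 * f(x, y)) / h**2


# Error of Eq. (65) for f = sin r, whose exact second derivative is -sin r.
for h in (0.1, 0.05, 0.025):
    err = abs(second_derivative_3pt(math.sin, 1.0, h) + math.sin(1.0))
    print(f"h = {h}: error = {err:.2e}")   # shrinks roughly 4x per halving of h


def q(x, y):
    """Quadratic test function (Eq. (60) with R = 1/2 and chi/(4 eta) = 1); its Laplacian is -4."""
    return 0.25 - x**2 - y**2


print(laplacian_5pt(q, 0.0, 0.0, 0.5))   # -4.0, exact even with this coarse step
```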

(The generalization to the seven-point scheme, appropriate for 3D problems, is straightforward.) Let us apply this scheme to the pipe with the square cross-section, using an extremely coarse mesh with step h = a/2, shown in Fig. 12. In this case, the fluid velocity v should equal zero at the walls, i.e. at all points of the five-point scheme except for the central point (at which the velocity obviously reaches its maximum), so that Eqs. (63) and (66) yield⁴²


40 For more details see, e.g., R. Leveque, Finite Difference Methods for Ordinary and Partial Differential 
Equations, SIAM, 2007. 

41 As a reminder, at the beginning of Sec. 6.4 we have already discussed the reciprocal transition: from a similar sum to the second derivative in the continuous limit (h → 0).

42 Note that value (67) of $v_{\max}$ is exactly the same as given by the analytical formula (60) for the round cross-section with radius R = a/2. This is not an accidental coincidence. The velocity distribution given by (60) is a quadratic function of both x and y. For such functions, with all derivatives higher than $\partial^2 f/\partial r^2$ equal to zero, Eq. (66) is exact rather than approximate.
$$\frac{0 + 0 + 0 + 0 - 4v_{\max}}{(a/2)^2} = -\frac{\chi}{\eta}, \quad \text{giving} \quad v_{\max} = \frac{\chi a^2}{16\eta}. \tag{8.67}$$


This result for the maximal velocity is only ~20% different from the exact value. Using a slightly finer mesh with h = a/4, which gives a readily solvable system of three linear equations for three different velocity values (an exercise left for the reader), brings us within about 5% of the exact result. So numerical methods may be practically more efficient than the "analytical" ones, even if the only available tool is a calculator app on your smartphone rather than an advanced computer.
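Carrying this procedure to finer meshes takes only a few lines of code. The Python/NumPy sketch below (the function name is hypothetical, and a dense matrix is used only for brevity, not efficiency) assembles the five-point scheme (66) for Eqs. (63)-(64) in units where a = 1 and χ/η = 1, and solves the resulting linear system. For n = 2 it reproduces Eq. (67), v_max = 1/16; for n = 4 it gives 9/128 ≈ 0.0703; further refinement approaches the series value of about 0.0737.

```python
import numpy as np


def square_duct_fd(n):
    """
    Five-point finite-difference solution of Eqs. (63)-(64) on a square of side a = 1,
    with mesh step h = 1/n and chi/eta = 1 (velocities in units of chi * a^2 / eta).
    Returns an (n-1) x (n-1) array of velocities at the interior mesh points.
    """
    h = 1.0 / n
    m = n - 1                          # interior points per side
    A = np.zeros((m * m, m * m))
    b = np.full(m * m, -h**2)          # Eq. (66) times h^2: neighbors - 4 v = -(chi/eta) h^2

    def idx(i, j):
        return i * m + j

    for i in range(m):
        for j in range(m):
            k = idx(i, j)
            A[k, k] = -4.0
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < m and 0 <= jj < m:   # wall neighbors contribute v = 0, Eq. (64)
                    A[k, idx(ii, jj)] = 1.0
    return np.linalg.solve(A, b).reshape(m, m)


for n in (2, 4, 8, 16, 32):
    print(n, square_duct_fd(n).max())   # maximal velocity for each mesh step h = a/n
```

For meshes much finer than this, a sparse-matrix or iterative solver would be the natural replacement for the dense `np.linalg.solve` call.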


Of course, many practical problems of fluid dynamics do require high-performance computing, especially in conditions of turbulence, with its complex, irregular spatial-temporal structure (see the next section). In such cases, the finite-difference approach discussed above may become unsatisfactory, because it implies the same accuracy of the derivative approximation throughout the whole area of interest. A more powerful (but also much more complex to implement) approach is the finite-element method, in which the discrete-point mesh is based on triangles with unequal sides and is (in most cases, automatically) generated from the system geometry, placing many more mesh points at the location(s) of the highest gradients of the calculated function (Fig. 14), and hence giving better calculation accuracy for the same total number of points. Unfortunately, I do not have time for going into the details of that method, so the interested reader is referred to the special literature on this subject.⁴³


Fig. 8.14. A typical finite-element mesh generated automatically for a system with relatively complex geometry: a round cylindrical shell inside another one, with mutually perpendicular axes. (Adapted from the original by I. Zureks, https://commons.wikimedia.org/w/index.php?curid=2358783, under the CC license BY-SA 3.0.)


Before proceeding to our next topic, let me mention one more important problem that is analytically solvable using the Navier-Stokes equation: a slow motion of a solid sphere of radius R, with a constant velocity v₀, through an incompressible viscous fluid or, equivalently, a slow flow of the fluid (uniform at large distances) around an immobile sphere. In the limit v₀ → 0, the second term on the left-hand side of Eq. (53) is negligible (just as in the surface wave analysis in Sec. 4), and the equation takes the form


$$-\nabla\mathcal{P} + \eta\nabla^2\mathbf{v} = 0, \quad \text{for } R \le r < \infty, \tag{8.68}$$


43 I can recommend, e.g., C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element Method, Dover, 2009, or T. Hughes, The Finite Element Method, Dover, 2000.
and should be complemented with the incompressibility condition $\nabla\cdot\mathbf{v} = 0$ and the boundary conditions

$$\mathbf{v} = 0 \ \ \text{at } r = R, \qquad \mathbf{v} \to \mathbf{v}_0 \ \ \text{at } r \to \infty. \tag{8.69}$$

In spherical coordinates, with the polar axis directed along the vector $\mathbf{v}_0$, this boundary problem has axial symmetry (so that $\partial\mathbf{v}/\partial\varphi = 0$ and $v_\varphi = 0$), and allows the following analytical solution:


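$$v_r = v_0\cos\theta\left(1 - \frac{3R}{2r} + \frac{R^3}{2r^3}\right), \quad v_\theta = -v_0\sin\theta\left(1 - \frac{3R}{4r} - \frac{R^3}{4r^3}\right), \quad \mathcal{P} = -\frac{3\eta v_0 R}{2r^2}\cos\theta. \tag{8.70}$$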

Now, calculating the tensor elements (52b) at r = R, using them to find the stress tensor elements from Eq. (52a), and integrating the elementary forces (7.18) over the surface of the sphere, we may readily obtain the famous Stokes formula for the drag force acting on the sphere:⁴⁴


$$F = 6\pi\eta R v_0. \tag{8.71}$$


For water drops with a 1-micron diameter, usually taken for the border between aerosols and droplets, descending in ambient-condition air under their own weight, it predicts an equilibrium velocity of about 0.1 meter per hour, with the further scaling v ∝ R².⁴⁵ (Note, however, that at R below ~10 µm, corrections due to the discreteness of air molecules become noticeable.)
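The estimate in the previous paragraph follows from balancing the Stokes drag (71) against the droplet's weight, 6πηRv = (4/3)πR³ρ_w g, which gives v = 2ρ_w gR²/9η. A minimal Python sketch, assuming η ≈ 1.8×10⁻⁵ Pa·s for ambient air and ρ_w = 10³ kg/m³ for water, and neglecting the buoyancy of air (about 10⁻³ of the weight); the function name is illustrative:

```python
def stokes_terminal_velocity(R, rho_drop=1.0e3, eta_air=1.8e-5, g=9.81):
    """
    Settling speed from the balance 6 pi eta R v = (4/3) pi R^3 rho_drop g,
    i.e. v = 2 rho_drop g R^2 / (9 eta). Air buoyancy is neglected;
    the default eta_air (Pa*s) is an assumed ambient-air value.
    """
    return 2.0 * rho_drop * g * R**2 / (9.0 * eta_air)


v = stokes_terminal_velocity(0.5e-6)          # droplet of 1-micron diameter
print(f"{v:.2e} m/s = {v * 3600:.3f} m/hr")   # roughly 0.1 m/hr
print(stokes_terminal_velocity(5e-6) / v)     # about 100: v scales as R^2
```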


For what follows in the next section, it is convenient to recast this result into the following form:

$$C_d = \frac{24}{\text{Re}}, \tag{8.72}$$

where $C_d$ is the drag coefficient defined as

$$C_d \equiv \frac{F}{\rho v_0^2 A/2}, \tag{8.73}$$


with $A = \pi R^2$ being the sphere's cross-section "as seen by the incident fluid flow", and Re is the so-called Reynolds number.⁴⁶ In the general case, this number is defined as

$$\text{Re} \equiv \frac{\rho v l}{\eta}, \tag{8.74}$$

where l is the linear-size scale of the problem, and v is its velocity scale. (In the particular case of Eq. (72) for the sphere, l is identified with the sphere's diameter D = 2R, and v with v₀.) The physical sense of these two definitions will be discussed in the next section.
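The equivalence of Eqs. (71) and (72) can be verified with a few lines of arithmetic. The Python sketch below assumes approximate air parameters (ρ ≈ 1.2 kg/m³, η ≈ 1.8×10⁻⁵ Pa·s) and the 1-micron droplet from the estimate above; the function names are illustrative. It also shows that Re is then tiny, so the Stokes regime indeed applies.

```python
import math


def reynolds(rho, v, l, eta):
    """Eq. (74): Re = rho * v * l / eta."""
    return rho * v * l / eta


def drag_coefficient(F, rho, v0, A):
    """Eq. (73): C_d = F / (rho * v0^2 * A / 2)."""
    return F / (rho * v0**2 * A / 2.0)


# Assumed illustrative values: a 1-micron-diameter droplet settling in air.
rho, eta = 1.2, 1.8e-5        # air density (kg/m^3) and viscosity (Pa*s), approximate
R, v0 = 0.5e-6, 3.0e-5        # droplet radius (m) and speed (m/s)

F = 6 * math.pi * eta * R * v0                     # Stokes formula (71)
Re = reynolds(rho, v0, 2 * R, eta)                 # l = D = 2R, as for Eq. (72)
Cd = drag_coefficient(F, rho, v0, math.pi * R**2)  # A = pi R^2
print(Re)        # about 2e-6: deep inside the regime Re << 1
print(Cd * Re)   # 24 up to rounding, i.e. Eq. (72)
```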


44 This formula has played an important role in the first precise (better than 1%) calculation of the fundamental 
electric charge e by R. Millikan and H. Fletcher from their famous oil drop experiments in 1909-1913. 

45 These numbers are of key importance not only for the analysis of contagious disease transmission, but also for many other fields, including atmospheric physics. For example, for an average water droplet in clouds, with R ~ 10 µm, Eq. (71) (even with due account of the slightly lower air viscosity at typical cloud heights) yields an equilibrium descent velocity of the order of 10 m/hr, substantiating the correct answer to the popular high-school question, "Why don't clouds fall?" (The answer is: each water droplet does descend, but so slowly that it has ample time to evaporate at the lower surface of the cloud, so that the cloud as a whole may maintain its height.)

46 This notion was introduced in 1851 by the same G. Stokes but eventually named after O. Reynolds who 
popularized it three decades later. 
8.6. Turbulence


As Fig. 15 shows, the Stokes result (71)-(72) is only valid at Re << 1, while for larger values of the Reynolds number, i.e. at higher velocities v₀, the drag force is larger. This fact by itself is not surprising, because in the derivation of the Stokes result, the nonlinear term $(\mathbf{v}\cdot\nabla)\mathbf{v}$ of the Navier-Stokes equation (53), which scales as v², was neglected in comparison with the linear terms, scaling as v. What is more surprising is that the function $C_d(\text{Re})$ exhibits such complicated behavior over many orders of magnitude of the velocity, hinting that the fluid flow at large Reynolds numbers should also be very complicated. Indeed, this complexity is due to the gradual development of very intricate, time-dependent fluid patterns, called turbulence, rich with vortices (see, for example, Fig. 16). These vortices are especially pronounced in the region behind the moving body (the so-called wake), while the region before the body remains almost unperturbed. As Fig. 15 indicates, turbulence exhibits rather different behaviors at various velocities (i.e. values of Re), and sometimes changes rather abruptly; see, for example, the significant drop of the drag at Re ~ 5×10⁵.
Fig. 8.15. The drag coefficient for a sphere and a thin round disk as functions of the Reynolds number. Adapted from F. Eisner, Das Widerstandsproblem, Proc. 3rd Int. Cong. on Appl. Mech., Stockholm, 1931.


In order to understand the conditions of this phenomenon, let us estimate the scale of various terms of the Navier-Stokes equation (53) for a generic body with characteristic size l, moving with velocity v through an otherwise static incompressible fluid. In this case, the time scale of possible non-stationary phenomena is given by the ratio l/v,⁴⁷ so that we arrive at the following estimates:


47 The time scale of phenomena in externally driven systems may be different; for example, for forced oscillations with frequency ω, it may be the oscillation period T = 2π/ω. For such problems, the ratio S = (l/v)/T, commonly called either the Strouhal number or the reduced frequency, serves as another dimensionless parameter.
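$$\begin{array}{lcccc} \text{Equation term:} & \rho\dfrac{\partial\mathbf{v}}{\partial t} & \rho(\mathbf{v}\cdot\nabla)\mathbf{v} & \mathbf{f} & \eta\nabla^2\mathbf{v} \\[2ex] \text{Order of magnitude:} & \rho\dfrac{v^2}{l} & \rho\dfrac{v^2}{l} & \rho g & \eta\dfrac{v}{l^2} \end{array} \tag{8.75}$$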


(I have skipped the term $\nabla\mathcal{P}$ because, as we have seen in the previous section, in typical fluid flow problems it balances the viscosity term, and hence is of the same order of magnitude.)


Fig. 8.16. A snapshot of the turbulent tail (wake) behind a sphere moving in a fluid with a high Reynolds number, showing the so-called von Karman vortex street. Adapted from the original (actually, a very nice animation, http://www.mcef.ep.usp.br/staff/jmeneg/cesareo/vort2.gif) by Cesareo de La Rosa Siqueira, as a copyright-free material, available at https://commons.wikimedia.org/w/index.php?curid=87351.


Estimates (75) show that the relative importance of the terms may be characterized by two dimensionless ratios.⁴⁸ The first of them is the so-called Froude number,⁴⁹

$$\mathcal{F} \equiv \frac{\rho v^2/l}{\rho g} = \frac{v^2}{gl}, \tag{8.76}$$

which characterizes the relative importance of gravity or, upon appropriate modification, of other bulk forces. In most practical problems (with the important exception of surface waves, see Sec. 4 above), $\mathcal{F} \gg 1$, so that the gravity effects may be neglected.


Much more important is the other ratio, the Reynolds number (74), which may be rewritten as

$$\text{Re} \equiv \frac{\rho v l}{\eta} = \frac{\rho v^2/l}{\eta v/l^2}, \tag{8.77}$$

and hence is a measure of the relative importance of the fluid particles' inertia in comparison with the viscosity effects.⁵⁰ So again, it is natural that for a sphere, the role of the vorticity-creating term $(\mathbf{v}\cdot\nabla)\mathbf{v}$


48 For substantially compressible fluids (e.g., gases), the most important additional dimensionless parameter is the Mach number M ≡ v/v_l, where $v_l = (K/\rho)^{1/2}$ is the velocity of the longitudinal sound, which is, as we already know from Chapter 7, the only wave mode possible in an infinite fluid. Especially significant for practice are the supersonic effects (including the shock wave in the form of the famous Mach cone with half-angle $\theta_M = \sin^{-1}(M^{-1})$) that arise at M > 1. For a more thorough discussion of these issues, I have to refer the reader to more specialized texts: either Chapter IX of the Landau-Lifshitz volume cited above or Chapter 15 in I. Cohen and P. Kundu, Fluid Mechanics, 4th ed., Academic Press, 2007, which is generally a good book on the subject.

49 Named after William Froude (1810-1879), one of the applied hydrodynamics pioneers. 

50 Note that the "dynamic" viscosity η participates in this number (and in many other problems of fluid dynamics) only in the combination η/ρ, which has thereby earned the special name of kinematic viscosity.
becomes noticeable already at Re ~ 1 (see Fig. 15). What is very counter-intuitive is the onset of turbulence in systems where the laminar (turbulence-free) flow is formally an exact solution to the Navier-Stokes equation for any Re. For example, at Re > Re_c ≈ 2,100 (with l = 2R and v = v_max), the laminar flow in a round pipe, described by Eq. (60), becomes unstable, and the resulting turbulence decreases the fluid discharge Q in comparison with the Poiseuille law (62). Even more strikingly, the critical value of Re is rather insensitive to the pipe wall roughness and does not diverge even in the limit of perfectly smooth walls.


Since Re >> 1 in many real-life situations, turbulence is very important for practice. (Indeed, the values of η and ρ for water listed in Table 1 imply that even for a few-meter-sized object, such as a human body or a small boat, Re > 1,000 at any speed above just ~1 mm/s.) However, despite nearly a century of intensive research, there is no general, quantitative analytical theory of this phenomenon, and most results are still obtained either by rather approximate analytical treatments, or by the numerical solution of the Navier-Stokes equations using the approaches discussed in the previous section, or in experiments (e.g., on scaled models⁵¹ in wind tunnels). A rare exception is the relatively recent theoretical result by S. Orszag (1971) for the turbulence threshold in a flow of an incompressible fluid through a gap of thickness t between two parallel plane walls (see Fig. 10): Re_c ≈ 5,772 (for l = t/2 and v = v_max). However, even for this simplest geometry, the analytical theory still cannot predict the turbulence patterns at Re > Re_c. Only certain general, semi-quantitative features of turbulence may be understood from simple arguments.
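The estimates quoted above are easy to reproduce. The Python sketch below writes Eq. (74) in terms of the kinematic viscosity ν = η/ρ (see footnote 50), with assumed approximate room-temperature values ν ≈ 10⁻⁶ m²/s for water and ν ≈ 1.5×10⁻⁵ m²/s for air; the scenarios themselves are illustrative, not taken from the text.

```python
def reynolds(v, l, nu):
    """Eq. (74) rewritten with the kinematic viscosity nu = eta/rho: Re = v * l / nu."""
    return v * l / nu


NU_WATER = 1.0e-6        # m^2/s, approximate room-temperature value (assumption)
NU_AIR = 1.5e-5          # m^2/s, approximate room-temperature value (assumption)
RE_PIPE_CRITICAL = 2100  # pipe-flow threshold quoted above, for l = 2R and v = v_max

examples = {
    "1-m body in water at 1 mm/s": reynolds(1e-3, 1.0, NU_WATER),
    "water pipe, R = 1 cm, v_max = 0.2 m/s": reynolds(0.2, 2 * 0.01, NU_WATER),
    "4-m car in air at 30 m/s": reynolds(30.0, 4.0, NU_AIR),
}
for name, value in examples.items():
    print(f"{name}: Re ~ {value:.3g}")

pipe_re = examples["water pipe, R = 1 cm, v_max = 0.2 m/s"]
print("pipe Re above the quoted threshold:", pipe_re > RE_PIPE_CRITICAL)
```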


For example, Fig. 15 shows that within a very broad range of Reynolds numbers, from ~10³ to ~3×10⁵, the drag coefficient $C_d$ of a thin round disk perpendicular to the incident flow is very close to 1.1, and that of a sphere is not too far away. The approximate equality $C_d \approx 1$, meaning that the drag force F is close to $\rho v_0^2 A/2$, may be understood (in the picture where the object is moved by an external force F with velocity v₀ through a fluid that was initially at rest) as the equality of the force-delivered power $Fv_0$ and the fluid's kinetic energy $(\rho v_0^2/2)V$ created in the volume $V = v_0 A$ in unit time. This relation would be exact if the object gave its velocity v₀ to each and every fluid particle its cross-section runs into, for example by dragging all such particles behind itself. In reality, much of this kinetic energy goes into vortices, where the particle velocity may differ from v₀, so that the equality $C_d \approx 1$ is only approximate.
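The energy-balance argument above condenses into a one-line estimate. It is an order-of-magnitude relation resting on the same idealized assumption (every fluid particle in the swept volume acquires the velocity v₀), not an exact result:

$$F v_0 \approx \frac{\rho v_0^2}{2}\,V = \frac{\rho v_0^2}{2}\,v_0 A \quad\Longrightarrow\quad C_d \equiv \frac{F}{\rho v_0^2 A/2} \approx 1.$$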


Another important general effect is that at very high values of Re, the fluid flow at the leading surface of solid objects forms a thin, highly turbulent boundary layer that matches the zero relative velocity of the fluid at the surface with its substantial velocity in the outer region, which is almost free of turbulence and, in many cases, of other viscosity effects. This fact, clearly visible in Fig. 16, enables semi-quantitative analyses of several effects, for example, of the so-called Magnus lift force⁵² $F_l$, exerted (on top of the usual drag force $F_d$) on rotating objects and directed across the fluid flow (see Fig. 17).


An even more important application of this concept is an approximate analysis of the forces 
exerted on non-rotating airfoils (such as aircraft wings) with special cross-sections forming sharp angles 
at their back ends. Such a shape minimizes the airfoil’s contacts with the vortex street it creates in its 


51 The crucial condition of correct modeling is the equality of the Reynolds numbers (74) (and, if relevant, also of the Froude numbers and/or the Mach numbers) of the object of interest and its model.

52 Named after G. Magnus, who studied this effect in detail in 1852, though it had been described much earlier (in 
1672) by I. Newton, and by B. Robins after him (in 1742). 


Chapter 8 Page 25 of 30 


Essential Graduate Physics CM: Classical Mechanics 


wake, and allows the thin boundary layer to extend over virtually all of its surface, enhancing the lift 
force. 


Fig. 8.17. The Magnus effect. 


Unfortunately, due to time/space restrictions, for a more detailed discussion of these results I have to refer the reader to more specialized literature,⁵³ and will conclude this chapter with a brief discussion of just one issue: can turbulence be explained by a single mechanism? (In other words, can it be reduced, at least on a semi-quantitative level, to a set of simpler phenomena that are commonly considered "well understood"?) Apparently the answer is no,⁵⁴ though the nonlinear dynamics of simpler systems may provide some useful insights.


In the middle of the last century, the most popular qualitative explanation of turbulence was the formation of an "energy cascade" that would transfer the energy from the regular fluid flow to a hierarchy of vortices of various sizes.⁵⁵ With our background, it is easier to retell that story in the time-domain language (with the velocity v serving as the conversion factor), using the fact that in a rotating vortex, each Cartesian component of a particle's radius vector oscillates in time, so that to some extent the vortex plays the role of an oscillatory motion mode.


Let us consider the passage of a solid body between two, initially close, small parts of the fluid. 
The body pushes them apart, but after its passage, these partial volumes are free to return to their initial 
positions. However, the dominance of inertia effects at motion with Re >> 1 means that the volumes 
continue to oscillate for a while about those equilibrium positions. (Since elementary volumes of an 
incompressible fluid cannot merge, these oscillations take the form of rotating vortices — see Fig. 16 
again.) 


Now, from Sec. 5.8 we know that intensive oscillations in a system with a quadratic nonlinearity (in this case, provided by the convective term $(\mathbf{v}\cdot\nabla)\mathbf{v}$) are equivalent, for small perturbations, to the oscillation of the system's parameters at the corresponding frequency. On the other hand, as was briefly discussed in Sec. 6.7, in a system with two oscillatory degrees of freedom, a periodic parameter change with frequency $\omega_p$ may lead to the non-degenerate parametric excitation ("down-conversion") of oscillations with frequencies $\omega_{1,2}$ satisfying the relation $\omega_1 + \omega_2 = \omega_p$. Moreover, the spectrum of oscillations in such a system also has higher combinational frequencies such as $(\omega_p + \omega_1)$, thus pushing the oscillation energy up the frequency scale ("up-conversion"). In the presence of other oscillatory modes, these oscillations may in turn produce, via the same nonlinearity, even higher frequencies, etc. In a fluid, the spectrum of these "oscillatory modes" (actually, vortex

53 See, e.g., P. Davidson, Turbulence, Oxford U. Press, 2004. 

54 The following famous quote is attributed to Werner Heisenberg on his deathbed: "When I meet God, I will ask him two questions: Why relativity? And why turbulence? I think he will have an answer for the first question." Though probably inaccurate, this story reflects rather well the frustration of the fundamental physics community, renowned for its reductionist mentality, with the enormous complexity of phenomena that obey simple equations (e.g., the Navier-Stokes equations), i.e. equations that, from their point of view, do not describe any new physics.

55 This picture was suggested in 1922 by Lewis F. Richardson. 
structures) is essentially continuous, so that the above arguments make very plausible a sequential transfer of the energy from the moving body to a broad range of oscillatory modes, whose frequency spectrum is limited from above by the energy dissipation due to the fluid's viscosity. When excited, these modes interact (in particular, mutually phase-lock) via the system's nonlinearity, creating the complex motion we call turbulence.
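The frequency-mixing step of this argument can be illustrated with a toy signal. The Python/NumPy sketch below is not a fluid simulation: it simply squares a two-tone signal (a stand-in for a quadratic nonlinearity such as the convective term) and lists the spectral components that appear. The tone frequencies and sampling parameters are arbitrary illustrative choices, picked so that every component falls exactly on an FFT bin.

```python
import numpy as np

fs, n = 64, 64                  # sampling rate (Hz) and number of samples: 1-Hz resolution
t = np.arange(n) / fs
f1, f2 = 3.0, 5.0               # two "mode" frequencies (illustrative)
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

y = x**2                        # a quadratic nonlinearity acting on the two-mode signal
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(n, d=1 / fs)

present = freqs[spectrum > 1e-6 * spectrum.max()]
print(present)   # 0, f2 - f1, 2 f1, f1 + f2, and 2 f2 (0, 2, 6, 8, 10 Hz)
```

Feeding such an output back as a new input would generate still higher combinations, which is the essence of the "up-conversion" chain described above.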


Though not having much quantitative predictive power, such handwaving explanations, which are essentially based on the excitation of a large number of effective degrees of freedom, had dominated turbulence reviews until the mid-1960s. At that point, the discovery (or rather re-discovery) of quasi-random motions in classical dynamic systems with just a few degrees of freedom altered the discussion substantially. Since this phenomenon, called deterministic chaos, extends well beyond fluid dynamics, I will devote to it a separate (albeit short) next chapter, and at its end will briefly return to the discussion of turbulence.


8.7. Exercise problems 


8.1. For a mirror-symmetric but otherwise arbitrary shape of a ship's hull, derive an explicit expression for the height of the metacenter M (see Fig. 3). Spell out this expression for a rectangular hull.


8.2. Find the stationary shape of the open surface of an incompressible, heavy fluid in a container rotated about its vertical axis with a constant angular velocity ω (see the figure on the right).


8.3. In the first order in the so-called flattening $f \equiv (R_e - R_p)/R_p \ll 1$ of the Earth (where $R_e$ and $R_p$ are, respectively, its equatorial and polar radii), calculate it within a simple model in which our planet is a uniformly rotating, nearly spherical fluid ball whose gravity field is dominated by a relatively small spherical core. Compare your result with the experimental value of f, and interpret the difference.


Hint: You may use the experimental values R_e ≈ 6,378 km, R_p ≈ 6,357 km, and g = 9.807 m/s².


8.4.* Use two different approaches to calculate the stationary shape of the surface of an incompressible fluid of density ρ near a vertical plane wall, in a uniform gravity field (see the figure on the right). In particular, find the height h of the liquid's rise at the wall surface as a function of the contact angle $\theta_c$.


8.5. A soap film with surface tension γ is stretched between two similar, coaxial, thin, round rings of radius R, separated by distance d (see the figure on the right). Neglecting gravity, calculate the equilibrium shape of the film, and the external force needed for keeping the rings at this distance.




8.6. A solid sphere of radius R is kept in a steady, vorticity-free flow of an ideal incompressible fluid, with velocity v₀. Find the spatial distribution of velocity and pressure, and in particular their extreme values. Compare the results with those obtained in Sec. 4 for a round cylinder.


8.7. Solve the same problem for a long and thin solid strip of width 2w, with its plane normal to 
the unperturbed fluid flow. 


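Hint: You may like to use the so-called elliptic coordinates {μ, ν}, defined by their relations with the Cartesian coordinates {x, y}:

$$x = C\cosh\mu\cos\nu, \quad y = C\sinh\mu\sin\nu, \quad \text{with } 0 \le \mu < \infty, \ \ -\pi < \nu \le +\pi,$$

where C is a constant; in these coordinates,

$$\nabla^2 = \frac{1}{C^2\left(\cosh^2\mu - \cos^2\nu\right)}\left(\frac{\partial^2}{\partial\mu^2} + \frac{\partial^2}{\partial\nu^2}\right).$$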


8.8. A small source, located at distance d from a plane wall of a container filled with an ideal, incompressible fluid of density ρ, injects additional fluid isotropically, at a constant mass current ("discharge") Q ≡ dM/dt (see the figure on the right). Calculate the fluid's velocity distribution, and its pressure on the wall, created by the flow.

Hint: Recall the charge image method in electrostatics,⁵⁶ and contemplate its possible analog.


8.9. Calculate the average kinetic, potential, and full energies (per unit area) of a traveling sinusoidal wave of a small amplitude on the horizontal surface of an ideal, incompressible, deep fluid of density ρ, in a uniform gravity field g.


8.10. Calculate the average power (per unit width of the wave’s front) carried by the surface 
wave discussed in the previous problem, and relate the result to the wave’s energy. 


8.11. Derive Eq. (48) for the surface waves on a finite-thickness layer of a heavy liquid. 


8.12. The utmost simplicity of Eq. (49) for the velocity of waves on a relatively shallow (h << λ) layer of an ideal incompressible liquid implies that they may be described using a very simple physical picture. Develop such a picture, and verify that it yields the same expression for the velocity.


8.13. Use the solution of the previous problem to calculate the energy and power of the shallow-layer waves, and use the result to explain the high tides on some ocean shores, using two models:

(i) the water depth h decreases gradually toward the shore, and
(ii) h decreases sharply, at some distance l from the shore, as it does at the ocean shelf border.


8.14.* Derive a 2D differential equation describing the propagation of relatively long (λ >> h) waves on the surface of a broad, plane layer of thickness h of an ideal, incompressible fluid, and use it

56 See, e.g., EM Secs. 2.9, 3.3, and 4.3. 
to calculate the longest standing wave modes and frequencies in a layer covering a spherical planet of radius R >> h.


Hint: The second task requires some familiarity with the basic properties of spherical harmonics.⁵⁷


8.15. Calculate the velocity distribution and the dispersion relation of the waves propagating 
along the horizontal interface of two ideal, incompressible fluids of different densities. 


8.16. Derive Eq. (50) for the capillary waves ("ripples").

8.17. Use the finite-difference approximation for the Laplace operator, with the mesh step h = 
a/4, to find the maximum velocity and total mass flow Q of a viscous, incompressible fluid through a 
long pipe with a square-shaped cross-section of side a. Compare the results with those described in Sec. 
5 for the same problem with the mesh step h = a/2, and for a pipe with the circular cross-section of the 
same area. 


8.18. A layer, of thickness $h$, of a heavy, viscous, incompressible fluid flows down a long and wide inclined plane, under its own weight (see the figure on the right). Find the stationary velocity distribution profile, and the total fluid discharge (per unit width).


8.19. An external force moves two coaxial round disks of radius $R$, with an incompressible viscous fluid in the gap between them, toward each other with a constant velocity $u$. Calculate the applied force in the limit when the gap's thickness $t$ is already much smaller than $R$.


8.20. Calculate the drag torque exerted on a unit length of a solid round cylinder of radius $R$ that rotates about its axis, with angular velocity $\omega$, inside an incompressible fluid with viscosity $\eta$, kept static far from the cylinder.


8.21. Solve a similar problem for a sphere of radius $R$, rotating about one of its principal axes.


8.22. Calculate the tangential force (per unit area) exerted by an incompressible fluid, with density $\rho$ and viscosity $\eta$, on a broad solid plane placed over its surface and forced to oscillate, along the surface, with amplitude $a$ and frequency $\omega$.


8.23. A massive barge, of mass $M$, with a flat bottom of area $A$, floats in shallow water, with clearance $h \ll A^{1/2}$ (see the figure on the right). Analyze the time dependence of the barge's velocity $V(t)$, and the water velocity profile, after the barge's engine has been turned off. Discuss the limits of large and small values of the dimensionless parameter $M/\rho Ah$.


57 See, e.g., EM Sec. 2.8 and/or QM Sec. 3.6. 
8.24.* Derive a general expression for the mechanical energy loss rate in a viscous incompressible fluid that obeys the Navier-Stokes equation, and use this expression to calculate the attenuation coefficient of the surface waves, assuming that the viscosity is small. (Quantify this condition.)


8.25. Use the Navier-Stokes equation to calculate the coefficient of attenuation of a plane, 
sinusoidal acoustic wave. 


8.26. Use two different approaches for a semi-quantitative calculation of the Magnus lift force $F_L$ exerted by an incompressible fluid of density $\rho$ on a round cylinder of radius $R$, with its axis normal to the fluid's velocity $v_0$, and rotating about the axis with an angular velocity $\omega$ (see Fig. 17). Discuss the relation of the results.
Chapter 9. Deterministic Chaos


This chapter gives a very brief review of chaotic phenomena in deterministic maps and dynamic systems 
with and without dissipation, and an even shorter discussion of the possible role of chaos in fluid 
turbulence. 


9.1. Chaos in maps 


The possibility of quasi-random dynamics of deterministic systems with a few degrees of freedom (nowadays called deterministic chaos, or just “chaos”) had been noticed before the 20th century,1 but became broadly recognized only after the publication of a 1963 paper by theoretical meteorologist Edward Lorenz. In that work, he examined numerical solutions of the following system of three nonlinear, ordinary differential equations,


$$\begin{aligned} \dot q_1 &= a_1(q_2 - q_1),\\ \dot q_2 &= a_2 q_1 - q_2 - q_1 q_3,\\ \dot q_3 &= q_1 q_2 - a_3 q_3, \end{aligned} \tag{9.1}$$


as a rudimentary model of heat transfer through a horizontal layer of fluid separating two solid plates. (Experiment shows that if the bottom plate is kept hotter than the top one, the fluid may exhibit turbulent convection.) He found that within a certain range of the constants $a_{1,2,3}$, the solution to Eq. (1) follows complex, unpredictable, non-repeating trajectories in the 3D $q$-space. Moreover, the functions $q_j(t)$ (where $j = 1, 2, 3$) are so sensitive to the initial conditions $q_j(0)$ that at sufficiently large times $t$, solutions corresponding to slightly different initial conditions become completely different.


Very soon it was realized that such behavior is typical even of simpler mathematical objects called maps, so I will start my discussion of chaos with these objects. A 1D map is essentially a rule for finding the next number $q_{n+1}$ of a discrete sequence numbered by the integer index $n$, in the simplest cases using only its last known value $q_n$. The most famous example is the so-called logistic map:2


$$q_{n+1} = f(q_n) \equiv r q_n (1 - q_n). \tag{9.2}$$


The basic properties of this map may be understood using its (hopefully, self-explanatory) graphical representation shown in Fig. 1.3 One can readily see that at $r < 1$ (Fig. 1a) the logistic map sequence rapidly converges to the trivial fixed point $q^{(0)} = 0$, because each next value of $q$ is less than the previous one. However, if $r$ is increased above 1 (as in the example shown in Fig. 1b), the fixed point


1 It may be traced back at least to an 1892 paper by the same Jules Henri Poincaré who was already reverently mentioned in Chapter 5. To quote it: “...it may happen that small differences in the initial conditions produce very great ones in the final phenomena. [...] Prediction becomes impossible.”

2 Its chaotic properties were first discussed in 1976 by Robert May, though the map itself is one of the simple 
ecological models repeatedly discussed much earlier, and may be traced back at least to the 1838 work by Pierre 
Francois Verhulst. 

3 Since the maximum value of the function $f(q)$, achieved at $q = 1/2$, equals $r/4$, the mapping may be limited to the segment $q \in [0, 1]$ if the parameter $r$ is between 0 and 4. Since all interesting properties of the map, including chaos, may be found within these limits, I will discuss only this range of $r$.
$q^{(0)}$ becomes unstable. Indeed, at $q_n \ll 1$, the map yields $q_{n+1} \approx r q_n$, so that at $r > 1$, the values $q_n$ grow with each iteration. Instead of the unstable point $q^{(0)} = 0$, in the range $1 < r < r_1$, where $r_1 = 3$, the map has a stable fixed point $q^{(1)}$ that may be found by plugging this value into both sides of Eq. (2):


$$q^{(1)} = f\left(q^{(1)}\right) = r q^{(1)}\left(1 - q^{(1)}\right), \tag{9.3}$$

giving $q^{(1)} = 1 - 1/r$; see the leftmost branch of the plot shown in Fig. 2.


[Figure: limit values of q versus r, for 2.4 ≤ r ≤ 4.0]
Fig. 9.2. The fixed points and chaotic regions of the logistic map. Adapted, under the CC0 Universal Public Domain Dedication, from the original by Jordan Pierce, available at http://en.wikipedia.org/wiki/Logistic_map. (A very nice live simulation of the map is also available on this website.)


However, at $r > r_1 = 3$, the fixed point $q^{(1)}$ also becomes unstable. To prove that, let us take $q_n = q^{(1)} + \tilde q_n$, assume that the deviation $\tilde q_n$ from the fixed point $q^{(1)}$ is small, and linearize the map (2) in $\tilde q_n$, just as we repeatedly did for differential equations earlier in this course. The result is

$$\tilde q_{n+1} = \left.\frac{df}{dq}\right|_{q = q^{(1)}} \tilde q_n = r\left(1 - 2q^{(1)}\right)\tilde q_n = (2 - r)\,\tilde q_n. \tag{9.4}$$
It shows that at $0 < 2 - r < 1$, i.e. at $1 < r < 2$, the deviations $\tilde q_n$ decrease monotonically. At $-1 < 2 - r < 0$, i.e. in the range $2 < r < 3$, the deviations' sign alternates, but their magnitude still decreases, as in a stable focus (see Sec. 5.6). However, at $2 - r < -1$, i.e. at $r > r_1 = 3$, the deviations grow in magnitude, while still changing their sign, at each step. Since Eq. (2) has no other fixed points, this means that at $n \to \infty$, the values $q_n$ do not converge to one point; rather, within the range $r_1 < r < r_2$, they approach a limit cycle of alternation of two points, $q_+^{(2)}$ and $q_-^{(2)}$, which satisfy the following system of algebraic equations:

$$q_+^{(2)} = f\left(q_-^{(2)}\right), \qquad q_-^{(2)} = f\left(q_+^{(2)}\right). \tag{9.5}$$


These points are also plotted in Fig. 2, as functions of the parameter $r$. What has happened at the point $r_1 = 3$ is called the period-doubling bifurcation.


The story repeats at $r = r_2 = 1 + \sqrt{6} \approx 3.45$, where the system goes from the 2-point limit cycle to a 4-point cycle, then at $r = r_3 \approx 3.54$, where the limit cycle begins to consist of 8 alternating points, etc. Most remarkably, the period-doubling bifurcation points $r_n$, at which the number of points in the limit cycle doubles from $2^{n-1}$ points to $2^n$ points, become closer and closer to each other. Numerical simulations show that at $n \to \infty$, these points obey the following asymptotic behavior:

$$r_\infty - r_n \propto \delta^{-n}, \quad \text{where } r_\infty = 3.5699\ldots, \quad \delta = 4.6692\ldots \tag{9.6}$$


The parameter $\delta$ is called the Feigenbaum constant; for other maps, and some dynamic systems (see the next section), period-doubling sequences follow a similar law, but with different values of $\delta$.


More important for us, however, is what happens at $r > r_\infty$. Numerous numerical experiments, repeated with increasing precision,4 have confirmed that here the system is disordered, with no reproducible limit cycle, though (as Fig. 2 shows) just above $r_\infty$, all sequential values $q_n$ are still confined to a few narrow regions.5 However, as the parameter $r$ is increased well beyond $r_\infty$, these regions broaden and merge. This is the so-called deep chaos, with no apparent order at all.6


The most important feature of chaos (in this and any other system) is the exponential divergence of trajectories. For a 1D map, this means that even if the initial conditions $q_1$ in two map implementations differ by a very small amount $\Delta q_1$, the difference $\Delta q_n$ between the corresponding sequences $q_n$ grows, on average, exponentially with $n$. The rate of this growth may be used to characterize chaos quantitatively. Indeed, an evident generalization of the linearized Eq. (4) to an arbitrary point $q_n$ is

$$\Delta q_{n+1} = e_n \Delta q_n, \quad \text{where } e_n \equiv \left.\frac{df}{dq}\right|_{q = q_n}. \tag{9.7}$$


4 The reader should remember that, just like the usual (“natural”) experiments, numerical experiments also have limited accuracy, due to unavoidable rounding errors.

5 The geometry of these regions is essentially fractal, i.e. has a dimensionality intermediate between 0 (which any finite set of geometric points would have) and 1 (pertinent to a 1D continuum). An extensive discussion of fractal geometries and their relation to deterministic chaos may be found, e.g., in the book by B. Mandelbrot, The Fractal Geometry of Nature, W.H. Freeman, 1983.

6 This does not mean that chaos' depth is always a monotonic function of $r$. As Fig. 2 shows, within certain intervals of this parameter, the chaotic behavior suddenly disappears, being replaced, typically, with a few-point limit cycle, just to resume on the other side of the interval. Sometimes (but not always!) the “route to chaos” on the borders of these intervals follows the same Feigenbaum sequence of period-doubling bifurcations.
Let us assume that $\Delta q_1$ is so small that the first $N$ values of the two sequences are relatively close to each other. Then using Eq. (7) iteratively for these steps, we get

$$\Delta q_N = \Delta q_1 \prod_{n=1}^{N} e_n, \quad \text{so that} \quad \ln\left|\frac{\Delta q_N}{\Delta q_1}\right| = \sum_{n=1}^{N} \ln|e_n|. \tag{9.8}$$


Numerical experiments show that in most chaotic regimes, at $N \to \infty$, such a sum fluctuates about an average that grows as $\lambda N$, with the parameter

$$\lambda \equiv \lim_{N\to\infty} \frac{1}{N} \ln\left|\frac{\Delta q_N}{\Delta q_1}\right| = \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \ln|e_n|, \tag{9.9}$$

called the Lyapunov exponent,7 which is independent of the initial conditions. The bottom panel in Fig. 3 shows $\lambda$ as a function of the parameter $r$ for the logistic map (2). (Its top panel shows the same pattern as Fig. 2, which is reproduced here just for the sake of comparison.)


[Figure: top panel, q versus r (as in Fig. 9.2); bottom panel, λ versus r]
Fig. 9.3. The Lyapunov exponent for the logistic map. Adapted, with permission, from the monograph by Schuster and Just (cited below). © Wiley-VCH Verlag GmbH & Co. KGaA.

Note that at $r < r_\infty$, $\lambda$ is negative, indicating the sequence's stability, except at the points $r_1, r_2, \ldots$, where $\lambda$ would become positive if the limit cycle changes (bifurcations) had not brought it back into negative territory. However, at $r > r_\infty$, $\lambda$ becomes positive, returning to negative values only in limited intervals of stable limit cycles. It is evident that in numerical experiments (which dominate the studies of deterministic chaos) the Lyapunov exponent may be used as a good measure of the chaos' depth.8


7 After Alexandr Mikhailovich Lyapunov (1857-1918), famous for his studies of the stability of dynamic systems.
8 N-dimensional maps, which relate N-dimensional vectors rather than scalars, may be characterized by N Lyapunov exponents rather than one. For chaotic behavior, it is sufficient for just one of them to become positive. For such systems, another measure of chaos, the Kolmogorov entropy, may be more relevant. This measure and its relation with the Lyapunov exponents are discussed, for example, in SM Sec. 2.2.
Despite the abundance of results published for particular maps,9 and several interesting observations (like the already discussed existence of the Feigenbaum bifurcation sequences), to the best of my knowledge, nobody can yet predict patterns like those shown in Figs. 2 and 3 from just studying the mapping rule itself, i.e. without carrying out actual numerical experiments. Unfortunately, the understanding of deterministic chaos in other systems is not much better.


9.2. Chaos in dynamic systems 


Proceeding to the discussion of chaos in dynamic systems, it is more natural, with our background, to illustrate this discussion not with the Lorenz equations, but with the system of equations describing a dissipative pendulum driven by a sinusoidal external force, which was repeatedly discussed in Chapter 5. Introducing two new variables, the normalized momentum $p = (dq/dt)/\omega_0$, and the external force's full phase $\psi = \omega t$, we may rewrite Eq. (5.42), describing the pendulum, in a form similar to Eq. (1), i.e. as a system of three first-order ordinary differential equations:


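$$\begin{aligned} \dot q &= \omega_0 p,\\ \dot p &= -\omega_0 \sin q - 2\delta p + (f_0/\omega_0)\cos\psi,\\ \dot\psi &= \omega. \end{aligned} \tag{9.10}$$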


Figure 4 shows several results of a numerical solution of Eq. (10).10 In all cases, the parameters $\delta$, $\omega_0$, and $f_0$ are fixed, while the external frequency $\omega$ is gradually changed. For the case shown on the top two panels, the system still tends to a stable periodic solution, with very low content of higher harmonics. If the external force frequency is reduced by just a few percent, the 3rd subharmonic may be excited. (This effect has already been discussed in Sec. 5.8; see, e.g., Fig. 5.15.) The next row shows that just a small further reduction of the frequency $\omega$ leads to a new tripling of the period, i.e. the generation of a complex waveform with the 9th subharmonic. Finally (see the bottom panels of Fig. 4), even a minor further change of $\omega$ leads to oscillations without any visible period, i.e. to chaos.


In order to trace this transition, a direct inspection of the oscillation waveforms $q(t)$ is not very convenient, and trajectories on the phase plane $[q, p]$ also become messy if plotted for many periods of the external frequency. In situations like this, the Poincaré (or “stroboscopic”) plane, already discussed in Sec. 5.6, is much more useful. As a reminder, this is essentially just the phase plane $[q, p]$, but with the points highlighted only once a period, e.g., at $\psi = 2\pi n$, with $n = 1, 2, \ldots$ On this plane, periodic oscillations of frequency $\omega$ are represented just as one fixed point; see, e.g., the top panel in the right column of Fig. 4. The 3rd subharmonic generation, shown on the next panel, means the oscillation period's tripling and is represented as the splitting of the fixed point into three. It is evident that this transition is similar to the period-doubling bifurcation in the logistic map, besides the fact (already discussed in Sec. 5.8) that in systems with an antisymmetric nonlinearity, such as the pendulum (10), the 3rd subharmonic is easier to excite. From this point, the 9th subharmonic generation (shown on the 3rd panel of Fig. 4), i.e. one more splitting of the points on the Poincaré plane, may be understood as one more step on a Feigenbaum-like route to chaos; see the bottom panel of that figure.


9 See, e.g., Chapters 2-4 in H. Schuster and W. Just, Deterministic Chaos, 4th ed., Wiley-VCH, 2005, or Chapters 8-9 in J. Thompson and H. Stewart, Nonlinear Dynamics and Chaos, 2nd ed., Wiley, 2002.

10 In the actual simulation, a small term $\varepsilon q$, with $\varepsilon \ll 1$, has been added to the left-hand side of this equation. This term slightly tames the tendency of the solution to spread along the $q$-axis, and makes the presentation of results easier, without affecting the system's dynamics too much.
Fig. 9.4. Oscillations in a pendulum with weak damping, $\delta/\omega_0 = 0.1$, driven by a sinusoidal external force with a fixed effective amplitude $f_0/\omega_0^2 = 1$, and several close values of the frequency $\omega$ (listed on the panels). Left panel column: the oscillation waveforms $q(t)$ recorded after certain initial transient intervals. Right column: representations of the same processes on the Poincaré plane of the variables $[q, p]$, with the $q$-axis turned vertically, for the convenience of comparison with the left panels.
So, the transition to chaos in dynamic systems may be at least qualitatively similar to that in 1D maps, with a law similar to Eq. (6) for the critical values of some parameter of the system (in Fig. 4, frequency $\omega$), though with a system-specific value of the coefficient $\delta$. Moreover, we may consider the first two differential equations of the system (10) as a 2D map that relates the vector $\{q_{n+1}, p_{n+1}\}$ of the coordinate and momentum, measured at $\psi = 2\pi(n + 1)$, with the previous value $\{q_n, p_n\}$ of that vector, reached at $\psi = 2\pi n$.


Unfortunately, this similarity also implies that deterministic chaos in dynamic systems is at least as complex, and is as little understood, as in maps. For example, Fig. 5 shows (a part of) the phase diagram of the externally-driven pendulum, with the red bar marking the route to chaos traced in Fig. 4, and shading/hatching styles marking different oscillation regimes. One can see that the pattern is at least as complex as that shown in Figs. 2 and 3, and, besides a few features,11 is equally unpredictable from the form of the equation.


Fig. 9.5. The phase diagram of an externally-driven pendulum with weak damping ($\delta/\omega_0 = 0.1$). The regions of oscillations with the basic period are not shaded; the notation for other regions is as follows. Dotted: subharmonic generation; cross-hatched: chaos; hatched: either chaos or the basic period (depending on the initial conditions); hatch-dotted: either the basic period or subharmonics. Solid lines show the boundaries of single-regime regions, while dashed lines are the boundaries of the regions where several types of motion are possible. (Figure courtesy of V. Kornev.)


Are there any valuable general results concerning the deterministic chaos in dynamic systems? 
The most important (though an almost evident) result is that this phenomenon is impossible in any 
system described by one or two first-order differential equations with time-independent right-hand sides. 
Indeed, let us start with a single equation 


$$\dot q = f(q), \tag{9.11}$$

where $f(q)$ is any single-valued function. This equation may be directly integrated to give

$$t = \int^{q} \frac{dq'}{f(q')} + \text{const}, \tag{9.12}$$

showing that the relation between $q$ and $t$ is unique and hence does not leave any room for chaos.


11 In some cases, it is possible to predict a parameter region where chaos cannot happen, due to the lack of any instability-amplification mechanism. Unfortunately, typically the analytically predicted boundaries of such a region form a rather loose envelope of the actual (numerically simulated) chaotic regions.
Next, let us explore a system of two such equations:

$$\dot q_1 = f_1(q_1, q_2), \qquad \dot q_2 = f_2(q_1, q_2). \tag{9.13}$$


Consider its phase plane shown schematically in Fig. 6. In a “usual” system, the trajectories approach either some fixed point (Fig. 6a) describing static equilibrium, or a limit cycle (Fig. 6b) describing periodic oscillations. (Both notions are united by the term attractor because they “attract” trajectories launched from various initial conditions.) On the other hand, phase plane trajectories of a chaotic system of equations that describe physical variables (which cannot be infinite) should be confined to a limited phase plane area, and simultaneously cannot start repeating each other. (This topology is frequently called the strange attractor.) For that, the 2D trajectories need to cross; see, e.g., point A in Fig. 6c.


Fig. 9.6. Attractors in dynamical systems: (a) a fixed point, (b) a limit cycle, and (c) a strange attractor.


However, in the case described by Eqs. (13), such a crossing is clearly impossible, because according to these equations, the tangent of a phase plane trajectory is a unique function of the coordinates $\{q_1, q_2\}$:

$$\frac{dq_2}{dq_1} = \frac{f_2(q_1, q_2)}{f_1(q_1, q_2)}. \tag{9.14}$$

Thus, in this case, deterministic chaos is impossible.12 It becomes, however, readily possible if the right-hand sides of a system similar to Eq. (13) depend either on other variables of the system or on time. For example, if we consider the first two differential equations of the system (10), in the case $f_0 = 0$ they have the structure of the system (13), and hence chaos is impossible, even at $\delta < 0$, when (as we know from Sec. 5.4) the system allows self-excitation of oscillations, leading to a limit-cycle attractor. However, if $f_0 \neq 0$, this argument does not work any longer, and (as we have already seen) the system may have a strange attractor, which is, for dynamic systems, a synonym for deterministic chaos.


Thus, chaos is only possible in autonomous dynamic systems described by three or more 
differential equations of the first order.!3 


12 A mathematically strict formulation of this statement is called the Poincaré-Bendixson theorem, which was proved by Ivar Bendixson in 1901.

13 Since a typical dynamic system with one degree of freedom is described by two such equations, the number of 
first-order equations describing a dynamic system is sometimes called the number of its half-degrees of freedom. 
This notion is very useful and popular in statistical mechanics — see, e.g., SM Sec. 2.2 and on. 
9.3. Chaos in Hamiltonian systems


The last conclusion is of course valid for Hamiltonian systems, which are just a particular type of dynamic systems. However, one may wonder whether these systems, which feature at least one first integral of motion, $H = \text{const}$, and hence are more “ordered” than the systems discussed above, can exhibit chaos at all. The answer is yes, because such systems still can have mechanisms for the exponential growth of a small initial perturbation.


As the simplest way to show it, let us consider the so-called mathematical billiard, i.e. a system with a ballistic particle (a “ball”) moving freely by inertia on a horizontal plane surface (“table”) limited by rigid walls. In this idealized model of the usual game of billiards, the ball's velocity $\mathbf{v}$ is conserved when it moves on the table, and when it runs into a wall, the ball is elastically reflected from it as from a mirror,14 with the reversal of the sign of the normal velocity $v_n$ and the conservation of the tangential velocity $v_\tau$, and hence without any loss of its kinetic (and hence full) energy

$$E = H = T = \frac{m}{2}v^2 = \frac{m}{2}\left(v_n^2 + v_\tau^2\right). \tag{9.15}$$


This model, while being a legitimate 2D dynamic system,15 allows geometric analyses for several simple table shapes. The simplest of them is a rectangular billiard of area $a \times b$ (Fig. 7), whose analysis may be readily carried out just by the replacement of each ball reflection event with the mirror reflection of the table in that wall; see the dashed lines on panel (a).


Fig. 9.7. Ball motion on a rectangular billiard at (a) a commensurate, and (b) an incommensurate launch angle.


Such analysis (left for the reader's pleasure :-) shows that if the tangent of the ball launching angle $\varphi$ is commensurate with the side length ratio,

$$\tan\varphi = \pm\frac{m}{n}\,\frac{b}{a}, \tag{9.16}$$

where $n$ and $m$ are non-negative integers without common integer multipliers, the ball returns exactly to the launch point O, after bouncing $m$ times from each wall of length $a$, and $n$ times from each wall of length $b$. (Red lines in Fig. 7a show an example of such a trajectory for $n = m = 1$, while blue lines, for $m = 3$, $n = 1$.) The larger the sum $(m + n)$, the more complex is such a closed “orbit”.


14 A more scientific-sounding name for such a reflection is specular, from the Latin word “speculum” meaning a metallic mirror.

15 Indeed, it is fully described by the following Lagrangian function: $L = mv^2/2 - U(\boldsymbol\rho)$, with $U(\boldsymbol\rho) = 0$ for the 2D radius vectors $\boldsymbol\rho$ belonging to the table area, and $U(\boldsymbol\rho) = +\infty$ outside the area.
Finally, if $(n + m) \to \infty$, i.e. $\tan\varphi$ and $b/a$ are incommensurate (meaning that their ratio is an irrational number), the trajectory covers all of the table area, and the ball never returns exactly to the launch point. Still, this is not genuine chaos. Indeed, a small shift of the launch point O shifts all the trajectory fragments by the same displacement. Moreover, at any time $t$, each of the Cartesian components $v_j(t)$ of the ball's velocity (with coordinate axes parallel to the table sides) may take only two values, $\pm v_j(0)$, and hence may vary only as much as the initial velocity is being changed.


In 1963, i.e. before E. Lorenz's work became broadly known, Yakov Sinai showed that the situation changes completely if an additional wall, in the shape of a circle, is inserted into the rectangular billiard (Fig. 8). For most initial conditions, the ball's trajectory eventually runs into the circle (see the red line on panel (a) as an example), and the further trajectory becomes essentially chaotic. Indeed, let us consider the ball's reflection from a circle-shaped wall (Fig. 8b). Due to the conservation of the tangential velocity, and the sign change of the normal velocity component, the reflection obeys a simple law: $\theta_r = \theta_i$. Figure 8b shows that as a result, the magnitude of a small difference $\delta\varphi$ between the angles of two close trajectories (as measured in the lab system) doubles at each reflection from the curved wall. This means that the small deviation grows along the ball trajectory as


$$|\delta\varphi(N)| \approx |\delta\varphi(0)| \times 2^N = |\delta\varphi(0)|\, e^{N\ln 2}, \tag{9.17}$$


where $N$ is the number of reflections from the convex wall.16 As we already know, such exponential divergence of trajectories, with a positive Lyapunov exponent, is the main feature of deterministic chaos.17


Fig. 9.8. (a) Motion on a Sinai billiard table, and (b) the mechanism of the exponential divergence of close trajectories.


The most important new feature of the dynamic chaos in Hamiltonian systems is its dependence on initial conditions. (In the systems discussed in the previous two sections, which lack integrals of motion, the initial conditions are rapidly “forgotten”, and chaos is usually characterized after an initial transient period; see, e.g., Fig. 4.) Indeed, even a Sinai billiard allows periodic motion, along closed orbits, under certain initial conditions; see the blue and green lines in Fig. 8a as examples. Thus


16 Superficially, Eq. (17) is also valid for a plane wall, but as was discussed above, a billiard with such walls features a full correlation between sequential reflections, so that angle $\varphi$ always returns to its initial value. In a Sinai billiard, such correlation disappears. Concave walls may also make a billiard chaotic; a famous example is the stadium billiard, suggested by Leonid Bunimovich in 1974, with two straight, parallel walls connecting two semi-circular, concave walls. Another example, which allows a straightforward analysis (first carried out by Martin Gutzwiller in the 1980s), is the so-called Hadamard billiard: an infinite (or rectangular) table with a non-horizontal surface of negative curvature.

17 Curved-wall billiards are also a convenient platform for studies of quantum properties of classically chaotic systems (for their conceptual discussion, see QM Sec. 3.5), in particular, the features called “quantum scars”; see, e.g., the spectacular numerical simulation results by E. Heller, Phys. Rev. Lett. 53, 1515 (1984).
the chaos “depth” in such systems may be characterized by the “fraction”18 of the phase space of initial parameters (for a 2D billiard, of the 3D space of the initial values of $x$, $y$, and $\varphi$) resulting in chaotic trajectories.


This conclusion is also valid for Hamiltonian systems that are met in physics much more frequently than exotic billiards, for example, coupled nonlinear oscillators without damping. Perhaps the earliest and the most popular example is the so-called Hénon-Heiles system,19 which may be described by the following Lagrangian function:


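$$L = \frac{m_1}{2}\left(\dot q_1^2 - \omega_1^2 q_1^2\right) + \frac{m_2}{2}\left(\dot q_2^2 - \omega_2^2 q_2^2\right) - \varepsilon\left(q_1^2 q_2 - \frac{q_2^3}{3}\right). \tag{9.18}$$

It is straightforward to use this function to derive the corresponding Lagrange equations of motion,

$$\begin{aligned} m_1\left(\ddot q_1 + \omega_1^2 q_1\right) &= -2\varepsilon q_1 q_2,\\ m_2\left(\ddot q_2 + \omega_2^2 q_2\right) &= -\varepsilon\left(q_1^2 - q_2^2\right), \end{aligned} \tag{9.19}$$

and find their first integral of motion (physically, the energy conservation law):

$$H = E = \frac{m_1}{2}\left(\dot q_1^2 + \omega_1^2 q_1^2\right) + \frac{m_2}{2}\left(\dot q_2^2 + \omega_2^2 q_2^2\right) + \varepsilon\left(q_1^2 q_2 - \frac{q_2^3}{3}\right) = \text{const}. \tag{9.20}$$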


In the context of our discussions in Chapters 5 and 6, Eqs. (19) may be readily interpreted as those describing two oscillators, with small-oscillation frequencies $\omega_1$ and $\omega_2$, coupled only by the quadratic terms on the right-hand sides of the equations. This means that when the oscillation amplitudes $A_{1,2}$, and hence the total energy $E$ of the system, are close to zero, the oscillator subsystems are virtually independent, each performing sinusoidal oscillations at its own frequency. This observation suggests a convenient way to depict the system's motion.20 Let us consider a Poincaré plane for one of the oscillators (say, with coordinate $q_2$), similar to that discussed in Sec. 2 above, with the only difference that (because of the absence of an explicit function of time in the system's equations) the trajectory on the phase plane $[q_2, \dot q_2]$ is highlighted at the moments when $q_1 = 0$.


Let us start from the limit $A_{1,2} \to 0$, when the oscillations of $q_2$ are virtually sinusoidal. As we already know (see Fig. 5.9 and its discussion), if the representation point highlighting were perfectly synchronous with frequency $\omega_2$ of the oscillations, there would be only one point on the Poincaré plane; see, e.g., the right top panel of Fig. 4. However, at the $q_1$-initiated highlighting, there is no such synchronism, so that each period, a different point of the elliptical (at the proper scaling of the velocity,


18 Actually, quantitative characterization of the fraction is not trivial, because it may have fractal dimensionality. 
Unfortunately, due to lack of time I have to refer the reader interested in this issue to special literature, e.g., the 
monograph by B. Mandelbrot (cited above) and references therein. 

19 It was first studied in 1964 by M. Hénon and C. Heiles as a simple model of star rotation about a galactic center. Most studies of this equation have been carried out for the following particular case: $m_1 = m_2 \equiv m$, $\omega_1 = \omega_2 \equiv \omega$. In this case, by introducing new variables $x = \varepsilon q_1/m\omega^2$, $y = \varepsilon q_2/m\omega^2$, and $\tau = \omega t$, it is possible to rewrite Eqs. (19) in a parameter-free form. All the results shown in Fig. 9 below are for this case.

20 Generally, the system has a trajectory in 4D space, e.g., that of the coordinates $q_{1,2}$ and their time derivatives, although the first integral of motion (20) means that for each fixed energy $E$, the motion is limited to a 3D subspace. Still, this is one dimension too many for a convenient representation of the motion.
circular) trajectory is highlighted, so that the resulting points, for certain initial conditions, reside on a circle of radius $A_2$. If we now vary the initial conditions, i.e. redistribute the initial energy between the oscillators, but keep the total energy $E$ constant, on the Poincaré plane we get a set of ellipses.


Now, if the initial energy is increased, the nonlinear interaction of the oscillations starts to 
deform these ellipses, causing also their crossings — see, e.g., the top left panel of Fig. 9. Still, below a 
certain threshold value of E, all Poincaré points belonging to a certain initial condition sit on a single 
closed contour. Moreover, these contours may be calculated approximately, but with pretty good 
accuracy, using a straightforward generalization of the method discussed in Sec. 5.2.²¹


Fig. 9.9. Poincaré planes of the Hénon-Heiles system (19), in notation $y = \varepsilon q_2$, for three values of the dimensionless energy $e = E/E_0$, with $E_0 = m\omega^2/\varepsilon^2$. Adapted from M. Hénon and C. Heiles, The Astron. J. 69, 73 (1964). © AAS, reproduced with permission.


However, starting from some value of energy, certain initial conditions lead to sequences of 
points scattered over parts of the Poincaré plane, with a nonzero area — see the top right panel of Fig. 9. 
This means that the corresponding oscillations q2(t) do not repeat from one (quasi-) period to the next 
one — cf. Fig. 4 for the dissipative, forced pendulum. This is chaos.22 However, some other initial 


21 See, e.g., M. Berry, in: S. Jorna (ed.), Topics in Nonlinear Dynamics, AIP Conf. Proc. No. 46, AIP, 1978, pp. 
16-120. 


22 This fact complies with the necessary condition of chaos, discussed at the end of Sec. 2, because Eqs. (19) may 
be rewritten as a system of four differential equations of the first order. 
conditions still lead to closed contours. This feature is similar to that in Sinai billiards, and is typical for 
Hamiltonian systems. As the energy is increased, larger and larger parts of the Poincaré plane 
correspond to the chaotic motion, signifying deeper and deeper chaos — see the bottom panel of Fig. 9. 
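The construction of a Poincaré plane like those of Fig. 9 may be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the procedure behind the figure: it assumes that Eqs. (19), in the parameter-free variables of footnote 19, take the standard Hénon-Heiles form $\ddot x = -x - 2xy$, $\ddot y = -y - x^2 + y^2$, with the conserved dimensionless energy $e = (\dot x^2 + \dot y^2)/2 + (x^2 + y^2)/2 + x^2y - y^3/3$. The fixed-step fourth-order Runge-Kutta integrator and all function names are our own (hypothetical) choices. Each upward crossing of the plane $x = 0$ (i.e. $q_1 = 0$) contributes one point $(y, \dot y)$ to the Poincaré plane.

```python
def hh_rhs(s):
    """Right-hand side of the assumed parameter-free Henon-Heiles equations.
    State s = (x, y, vx, vy), with x ~ q1, y ~ q2, and dots over tau."""
    x, y, vx, vy = s
    return (vx, vy, -x - 2.0 * x * y, -y - x * x + y * y)


def rk4_step(s, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = hh_rhs(s)
    k2 = hh_rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k1)))
    k3 = hh_rhs(tuple(a + 0.5 * h * b for a, b in zip(s, k2)))
    k4 = hh_rhs(tuple(a + h * b for a, b in zip(s, k3)))
    return tuple(a + h * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))


def hh_energy(s):
    """Dimensionless energy e, conserved by the exact equations."""
    x, y, vx, vy = s
    return 0.5 * (vx * vx + vy * vy) + 0.5 * (x * x + y * y) + x * x * y - y ** 3 / 3.0


def poincare_section(e, y0, vy0, t_end=300.0, h=0.01):
    """Start at x = 0 with vx > 0 fixed by the energy e; return the list of
    (y, vy) points at upward crossings of x = 0, and the final state."""
    vx0_sq = 2.0 * e - (vy0 ** 2 + y0 ** 2 - 2.0 * y0 ** 3 / 3.0)
    if vx0_sq < 0.0:
        raise ValueError("initial (y0, vy0) is not accessible at this energy")
    s = (0.0, y0, vx0_sq ** 0.5, vy0)
    points = []
    for _ in range(int(round(t_end / h))):
        s_new = rk4_step(s, h)
        if s[0] < 0.0 <= s_new[0]:           # upward crossing of the plane x = 0
            f = -s[0] / (s_new[0] - s[0])    # linear interpolation inside the step
            points.append((s[1] + f * (s_new[1] - s[1]),
                           s[3] + f * (s_new[3] - s[3])))
        s = s_new
    return points, s


pts, final = poincare_section(e=1.0 / 12.0, y0=0.1, vy0=0.0)
print(len(pts), hh_energy(final))   # number of Poincare points, and the (conserved) energy
```

Running the function for several initial values of $(y_0, \dot y_0)$ at the same energy, and plotting all returned points together, should reproduce the qualitative picture described above: closed contours at low $e$, and points scattered over parts of the plane at higher $e$.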


9.4. Chaos and turbulence 


This extremely short section consists of essentially just one statement, extending the discussion in Sec. 8.5. The (re-)discovery of deterministic chaos in systems with just a few degrees of freedom in the 1960s changed the tone of the debates concerning the origins of turbulence very considerably. At first, an extreme point of view that equated the notions of chaos and turbulence became the debate’s favorite.²³ However, after the initial excitement, the significant role of Richardson-style energy-cascade mechanisms, involving many degrees of freedom, was rediscovered and could not be ignored any longer. To the best knowledge of this author, who is a very distant albeit interested observer of that field, most experimental and numerical-simulation data carry features of both mechanisms, so that the debate continues.²⁴ Due to the age difference, most readers of these notes have much better chances than the author to see where this discussion will eventually lead.²⁵


9.5. Exercise problems 


9.1. Generalize the reasoning of Sec. 1 to an arbitrary 1D map $q_{n+1} = f(q_n)$, with a function $f(q)$ differentiable at all points of interest. In particular, derive the condition of stability of an $N$-point limit cycle $q^{(1)} \to q^{(2)} \to \ldots \to q^{(N)} \to q^{(1)} \ldots$


9.2. Use the stability condition derived in the previous problem to analyze the possibility of deterministic chaos in the so-called tent map, with

$$ f(q) = \begin{cases} rq, & \text{for } 0 \le q \le 1/2, \\ r(1 - q), & \text{for } 1/2 \le q \le 1, \end{cases} \qquad \text{with } 0 < r \le 2. $$


9.3. Find the conditions of existence and stability of fixed points of the so-called standard circle map:

$$ q_{n+1} = q_n + \Omega - \frac{K}{2\pi}\sin 2\pi q_n, $$

where $q_n$ are real numbers defined modulo 1 (i.e. with $q_n + 1$ identified with $q_n$), while $\Omega$ and $K$ are constant parameters. Discuss the relevance of the result for phase locking of self-oscillators; see, e.g., Sec. 5.4.


23 An important milestone on that path was the work by S. Newhouse et al., Comm. Math. Phys. 64, 35 (1978), who proved the existence of a strange attractor in a rather abstract model of fluid flow.

24 See, e.g., U. Frisch, Turbulence: The Legacy of A. N. Kolmogorov, Cambridge U. Press, 1996. 

25 The reader interested in the deterministic chaos as such may like to have a look at a very popular book by S. 
Strogatz, Nonlinear Dynamics and Chaos, Westview, 2001. 

9.4. Find the conditions of existence and stability of fixed points of the so-called Hénon map:²⁶

$$ q_{n+1} = 1 - a q_n^2 + p_n, \qquad p_{n+1} = b q_n, \qquad \text{with } a, b > 0. $$


9.5. Is the deterministic chaos possible in our “testbed” problem shown in Fig. 2.1? What if an 
additional periodic external force is applied to the bead? Explain your answers. 


26 This map, first explored by M. Hénon in 1976 (for a particular set of constants a and b), has played an important 
historic role in the study of strange attractors. 

Chapter 10. A Bit More of Analytical Mechanics 


This concluding chapter reviews two alternative approaches to analytical mechanics, whose major 
value is a closer parallel to quantum mechanics in general and its quasiclassical (WKB) approximation 
in particular. One of them, the Hamiltonian formalism, is also convenient for the derivation of an 
important asymptotic result, the adiabatic invariance, for classical systems with slowly changing 
parameters. 


10.1. Hamilton equations 


Throughout this course, we have seen how analytical mechanics, in its Lagrangian form, is 
invaluable for solving various particular problems of classical mechanics. Now let us discuss several 
alternative formulations¹ that may not be much more useful for this purpose, but shed additional light on 
possible extensions of classical mechanics, most importantly to quantum mechanics. 


As was already discussed in Sec. 2.3, the partial derivative $p_j \equiv \partial L/\partial \dot q_j$ participating in the Lagrange equation (2.19),

$$ \frac{d}{dt}\frac{\partial L}{\partial \dot q_j} - \frac{\partial L}{\partial q_j} = 0, \qquad (10.1) $$

may be considered as the generalized momentum corresponding to the generalized coordinate $q_j$, and the full set of these momenta may be used to define the Hamiltonian function (2.32):

$$ H \equiv \sum_j p_j \dot q_j - L. \qquad (10.2) $$


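Now let us rewrite the full differential of this function² in the following form:

$$ dH = d\Big(\sum_j p_j \dot q_j - L\Big) = \sum_j \big[dp_j\,\dot q_j + p_j\,d(\dot q_j)\big] - dL = \sum_j \big[dp_j\,\dot q_j + p_j\,d(\dot q_j)\big] - \Big[\frac{\partial L}{\partial t}dt + \sum_j\Big(\frac{\partial L}{\partial q_j}dq_j + \frac{\partial L}{\partial \dot q_j}d(\dot q_j)\Big)\Big]. \qquad (10.3) $$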


According to the definition of the generalized momentum, the second terms of each sum over $j$ in the last expression cancel each other, while according to the Lagrange equation (1), the derivative $\partial L/\partial q_j$ is equal to $\dot p_j$, so that

$$ dH = -\frac{\partial L}{\partial t}dt + \sum_j\big(\dot q_j\,dp_j - \dot p_j\,dq_j\big). \qquad (10.4) $$


So far, this is just a universal identity. Now comes the main trick of Hamilton’s approach: let us consider $H$ as a function of the following independent arguments: time $t$, the generalized coordinates $q_j$,

1 Due to not only William Rowan Hamilton (1805-1865), but also Carl Gustav Jacob Jacobi (1804-1851).
2 Actually, this differential was already spelled out (but partly and implicitly) in Sec. 2.3; see Eqs. (2.33)-(2.35).

and the generalized momenta $p_j$, rather than generalized velocities $\dot q_j$ as in the Lagrangian formalism. With this new commitment, the general “chain rule” of differentiation of a function of several arguments gives

$$ dH = \frac{\partial H}{\partial t}dt + \sum_j\Big(\frac{\partial H}{\partial q_j}dq_j + \frac{\partial H}{\partial p_j}dp_j\Big), \qquad (10.5) $$

where $dt$, $dq_j$, and $dp_j$ are independent differentials. Since Eq. (5) should be valid for any choice of these argument differentials, it should hold in particular if they correspond to the real law of motion, for which Eq. (4) is valid as well. The comparison of Eqs. (4) and (5) gives us three relations:

$$ \frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}, \qquad (10.6) $$

$$ \dot q_j = \frac{\partial H}{\partial p_j}, \qquad \dot p_j = -\frac{\partial H}{\partial q_j}. \qquad (10.7) $$

Comparing the first of them with Eq. (2.35), we see that

$$ \frac{dH}{dt} = \frac{\partial H}{\partial t}, \qquad (10.8) $$


meaning that the function $H(t, q_j, p_j)$ can change in time only via its explicit dependence on $t$. The two Eqs. (7) are even more substantial: provided that such a function $H(t, q_j, p_j)$ has been calculated, they give us two first-order differential equations (called the Hamilton equations) for the time evolution of the generalized coordinate and generalized momentum of each degree of freedom of the system.³


Let us have a look at these equations for the simplest case of a system with one degree of freedom, with the Lagrangian function (3.3):

$$ L = \frac{m_{\rm ef}}{2}\dot q^2 - U_{\rm ef}(q, t). \qquad (10.9) $$


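In this case, $p = \partial L/\partial \dot q = m_{\rm ef}\dot q$, and $H \equiv p\dot q - L = m_{\rm ef}\dot q^2/2 + U_{\rm ef}(q, t)$. To honor our new commitment, we need to express the Hamiltonian function explicitly via $t$, $q$, and $p$ (rather than $\dot q$). From the above expression for $p$, we immediately have $\dot q = p/m_{\rm ef}$; plugging this expression back into the above expression for $H$, we get

$$ H = \frac{p^2}{2m_{\rm ef}} + U_{\rm ef}(q, t). \qquad (10.10) $$

Now we can spell out Eqs. (7) for this particular case:

$$ \dot q = \frac{\partial H}{\partial p} = \frac{p}{m_{\rm ef}}, \qquad (10.11) $$

$$ \dot p = -\frac{\partial H}{\partial q} = -\frac{\partial U_{\rm ef}}{\partial q}. \qquad (10.12) $$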


3 Of course, the right-hand side of each equation (7) may include coordinates and momenta of other degrees of freedom as well, so that the equations of motion for different $j$ are generally coupled.

While the first of these equations just repeats the definition of the generalized momentum corresponding to the coordinate $q$, the second one gives the equation of the momentum’s change. Differentiating Eq. (11) over time, and plugging Eq. (12) into the result, we get:

$$ \ddot q = \frac{\dot p}{m_{\rm ef}} = -\frac{1}{m_{\rm ef}}\frac{\partial U_{\rm ef}}{\partial q}. \qquad (10.13) $$

So, we have returned to the same equation (3.4) that had been derived from the Lagrangian approach.⁴


Thus, the Hamiltonian formalism does not give much help for the solution of this problem — and 
indeed most problems of classical mechanics. (This is why its discussion had been postponed until the 
very end of this course.) Moreover, since the Hamiltonian function $H(t, q_j, p_j)$ does not include 
generalized velocities explicitly, the phenomenological introduction of dissipation in this approach is 
less straightforward than that in the Lagrangian equations, whose precursor form (2.17) is valid for 
dissipative forces as well. However, the Hamilton equations (7), which treat the generalized coordinates 
and momenta in a manifestly symmetric way, are heuristically fruitful — besides being very appealing 
aesthetically. This is especially true in the cases where these arguments participate in H in a similar way. 
For example, in the very important case of a dissipation-free linear (“harmonic”) oscillator, for which $U_{\rm ef} = \kappa_{\rm ef}q^2/2$, Eq. (10) gives the symmetric form


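$$ H = \frac{p^2}{2m_{\rm ef}} + \frac{\kappa_{\rm ef}q^2}{2} \equiv \frac{p^2}{2m_{\rm ef}} + \frac{m_{\rm ef}\omega_0^2q^2}{2}, \qquad \text{where } \omega_0^2 \equiv \frac{\kappa_{\rm ef}}{m_{\rm ef}}. \qquad (10.14) $$

The Hamilton equations (7) for this system preserve that symmetry, which is especially evident if we introduce the normalized momentum $\tilde p \equiv p/m_{\rm ef}\omega_0$ (already used in Secs. 5.6 and 9.2):

$$ \frac{dq}{dt} = \omega_0\tilde p, \qquad \frac{d\tilde p}{dt} = -\omega_0 q. \qquad (10.15) $$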
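Eqs. (15) are simple enough to be solved by hand, but a numerical integration makes their symmetric structure tangible. The minimal sketch below uses the leapfrog (kick-drift-kick) scheme, which is our choice for illustration rather than anything prescribed by the Hamiltonian formalism, and compares the result with the exact solution $q = A\cos\omega_0 t$, $\tilde p = -A\sin\omega_0 t$; the function name and parameter values are hypothetical.

```python
import math


def leapfrog(q, pt, w0, dt, n_steps):
    """Integrate Eqs. (15), dq/dt = w0*pt and dpt/dt = -w0*q,
    by the leapfrog (kick-drift-kick) scheme; return the final (q, pt)."""
    for _ in range(n_steps):
        pt -= 0.5 * dt * w0 * q   # half-step "kick" of the normalized momentum
        q += dt * w0 * pt         # full-step "drift" of the coordinate
        pt -= 0.5 * dt * w0 * q   # second half-kick
    return q, pt


A, w0, dt, n = 1.0, 2.0, 1e-3, 5000            # integrate up to t = n*dt = 5
q, pt = leapfrog(A, 0.0, w0, dt, n)
t = n * dt
print(q - A * math.cos(w0 * t), pt + A * math.sin(w0 * t))   # both deviations are small
print(q * q + pt * pt)                                        # stays close to A**2
```

In the $[q, \tilde p]$ plane, the exact motion is a uniform rotation along a circle of radius $A$; the leapfrog scheme keeps the numerical point very close to that circle even over many periods.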


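More practically, the Hamilton approach gives additional tools for the search for the integrals of motion. To see that, let us consider the full time derivative of an arbitrary function $f(t, q_j, p_j)$:

$$ \frac{df}{dt} = \frac{\partial f}{\partial t} + \sum_j\Big(\frac{\partial f}{\partial q_j}\dot q_j + \frac{\partial f}{\partial p_j}\dot p_j\Big). \qquad (10.16) $$

Plugging in $\dot q_j$ and $\dot p_j$ from the Hamilton equations (7), we get

$$ \frac{df}{dt} = \frac{\partial f}{\partial t} + \sum_j\Big(\frac{\partial H}{\partial p_j}\frac{\partial f}{\partial q_j} - \frac{\partial H}{\partial q_j}\frac{\partial f}{\partial p_j}\Big) \equiv \frac{\partial f}{\partial t} + \{H, f\}. \qquad (10.17) $$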


The last term on the right-hand side of this expression is the so-called Poisson bracket,⁵ which is defined, for two arbitrary functions $f(t, q_j, p_j)$ and $g(t, q_j, p_j)$, as


4 The reader is strongly encouraged to perform a similar check for a few more problems, for example those listed 
at the end of the chapter, to get a better feeling of how the Hamiltonian formalism works. 

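5 Named after Siméon Denis Poisson (1781-1840), of the Poisson equation and the Poisson statistical distribution 
fame. 

$$ \{g, f\} \equiv \sum_j\Big(\frac{\partial g}{\partial p_j}\frac{\partial f}{\partial q_j} - \frac{\partial g}{\partial q_j}\frac{\partial f}{\partial p_j}\Big). \qquad (10.18) $$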


From this definition, one can readily verify that besides the evident relations $\{f, f\} = 0$ and $\{f, g\} = -\{g, f\}$, the Poisson brackets obey the following important Jacobi identity:

$$ \{f, \{g, h\}\} + \{g, \{h, f\}\} + \{h, \{f, g\}\} = 0. \qquad (10.19) $$


Now let us use these relations for a search for integrals of motion. First, Eq. (17) shows that if a function $f$ does not depend on time explicitly, and

$$ \{H, f\} = 0, \qquad (10.20) $$

then $df/dt = 0$, i.e. that function is an integral of motion. Moreover, it turns out that if we already know two integrals of motion, say $f$ and $g$, then the following function,

$$ F \equiv \{f, g\}, \qquad (10.21) $$


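is also an integral of motion: the so-called Poisson theorem. In order to prove it, we may use the Jacobi identity (19) with $h = H$. Next, using Eq. (17) to express the Poisson brackets $\{g, H\}$, $\{H, f\}$, and $\{H, \{f, g\}\} = \{H, F\}$ via the full and partial time derivatives of the functions $f$, $g$, and $F$, we get

$$ \Big\{f, \frac{\partial g}{\partial t} - \frac{dg}{dt}\Big\} + \Big\{g, \frac{df}{dt} - \frac{\partial f}{\partial t}\Big\} + \frac{dF}{dt} - \frac{\partial F}{\partial t} = 0, \qquad (10.22) $$

so that if $f$ and $g$ are indeed integrals of motion, i.e., $df/dt = dg/dt = 0$, then

$$ \frac{dF}{dt} = \frac{\partial F}{\partial t} + \Big\{g, \frac{\partial f}{\partial t}\Big\} - \Big\{f, \frac{\partial g}{\partial t}\Big\} = \frac{\partial F}{\partial t} - \Big[\Big\{\frac{\partial f}{\partial t}, g\Big\} + \Big\{f, \frac{\partial g}{\partial t}\Big\}\Big]. \qquad (10.23) $$

Plugging Eq. (21) into the first term on the right-hand side of this equation, and differentiating it by parts, we get $dF/dt = 0$, i.e. $F$ is indeed an integral of motion as well.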
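The Poisson bracket (18) and the Poisson theorem may be checked numerically. The sketch below is a minimal illustration, not an efficient implementation: it estimates the partial derivatives by central finite differences with a fixed step (our assumption), and uses a hypothetical test system, a particle of unit mass in the central field $U = -1/r$, for which the Cartesian components $L_x$ and $L_y$ of the angular momentum are integrals of motion. With the sign convention of Eq. (18), one gets $\{q, p\} = -1$, in agreement with Eq. (24) below, and $\{L_x, L_y\} = -L_z$, which, by the Poisson theorem, must be an integral of motion as well.

```python
def _shifted(q, p, which, j, delta):
    """Copies of the coordinate and momentum lists, with q_j or p_j shifted by delta."""
    q2, p2 = list(q), list(p)
    if which == "q":
        q2[j] += delta
    else:
        p2[j] += delta
    return q2, p2


def partial(func, q, p, which, j, h=1e-5):
    """Central-difference estimate of d(func)/d(q_j) or d(func)/d(p_j)."""
    return (func(*_shifted(q, p, which, j, h))
            - func(*_shifted(q, p, which, j, -h))) / (2.0 * h)


def poisson(g, f, q, p):
    """{g, f} = sum_j (dg/dp_j * df/dq_j - dg/dq_j * df/dp_j), cf. Eq. (18)."""
    return sum(partial(g, q, p, "p", j) * partial(f, q, p, "q", j)
               - partial(g, q, p, "q", j) * partial(f, q, p, "p", j)
               for j in range(len(q)))


def H(q, p):
    """Hypothetical test Hamiltonian: unit mass in the central field U = -1/r."""
    r = (q[0] ** 2 + q[1] ** 2 + q[2] ** 2) ** 0.5
    return 0.5 * (p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - 1.0 / r


def Lx(q, p):
    return q[1] * p[2] - q[2] * p[1]


def Ly(q, p):
    return q[2] * p[0] - q[0] * p[2]


def Lz(q, p):
    return q[0] * p[1] - q[1] * p[0]


q0, p0 = [0.3, -0.7, 0.5], [0.2, 0.4, -0.1]
print(poisson(H, Lx, q0, p0), poisson(H, Ly, q0, p0))   # both ~0: Eq. (20) holds
print(poisson(Lx, Ly, q0, p0), -Lz(q0, p0))             # equal: F = {Lx, Ly} = -Lz
```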


Finally, one more important role of the Hamilton formalism is that it allows one to trace the close formal connection between classical and quantum mechanics. Indeed, using Eq. (18) to calculate the Poisson brackets of the generalized coordinates and momenta, we readily get

$$ \{q_j, q_{j'}\} = 0, \qquad \{p_j, p_{j'}\} = 0, \qquad \{q_j, p_{j'}\} = -\delta_{jj'}. \qquad (10.24) $$

In quantum mechanics, the operators of these variables (“observables”) obey the commutation relations⁶

$$ [\hat q_j, \hat q_{j'}] = 0, \qquad [\hat p_j, \hat p_{j'}] = 0, \qquad [\hat q_j, \hat p_{j'}] = i\hbar\delta_{jj'}, \qquad (10.25) $$

where the definition of the commutator, $[\hat g, \hat f] \equiv \hat g\hat f - \hat f\hat g$, is to a certain extent⁷ similar to that (18) of the Poisson bracket. We see that the classical relations (24) are similar to the quantum-mechanical relations (25) if the following parallel has been made:


6 See, e.g., QM Sec. 2.1. 

$$ \{g, f\} \leftrightarrow \frac{i}{\hbar}[\hat g, \hat f]. \qquad (10.26) $$


This analogy extends well beyond Eqs. (24)-(25). For example, making the replacement (26) in Eq. (17), we get

$$ \frac{d\hat f}{dt} = \frac{\partial \hat f}{\partial t} + \frac{i}{\hbar}[\hat H, \hat f], \qquad \text{i.e.} \qquad i\hbar\frac{d\hat f}{dt} = i\hbar\frac{\partial \hat f}{\partial t} + [\hat f, \hat H], \qquad (10.27) $$

which is the correct equation of operator evolution in the Heisenberg picture of quantum mechanics.⁸ The parallel (26) may give important clues in the search for the proper quantum-mechanical operator of a given observable, which is not always elementary.


10.2. Adiabatic invariance 


One more application of the Hamiltonian formalism in classical mechanics is the solution of the following problem.⁹ Earlier in the course, we already studied some effects of time variation of parameters of a single oscillator (Sec. 5.5) and coupled oscillators (Sec. 6.5). However, those discussions were focused on the case when the parameter variation speed is comparable with the system’s own oscillation frequency (or frequencies). Another practically important case is when some system’s parameter (let us call it $\lambda$) is changed much more slowly (adiabatically¹⁰),

$$ \Big|\frac{\dot\lambda}{\lambda}\Big| \ll \frac{1}{\mathcal T}, \qquad (10.28) $$

where $\mathcal T$ is a typical period of oscillations in the system. Let us consider a 1D system whose Hamiltonian $H(q, p, \lambda)$ depends on time only via such a slow evolution of such a parameter $\lambda = \lambda(t)$, and whose initial energy restricts the system’s motion to a finite coordinate interval; see, e.g., Fig. 3.2c.

Then, as we know from Sec. 3.3, if the parameter $\lambda$ is constant, the system performs a periodic (though not necessarily sinusoidal) motion back and forth along the $q$-axis, or, in a different language, along a closed trajectory on the phase plane $[q, p]$; see Fig. 1.¹¹ According to Eq. (8), in this case $H$ is constant along the trajectory. (To distinguish this particular value of $H$ from the Hamiltonian function as such, I will call it $E$, implying that this constant coincides with the full mechanical energy $E$, as it does for the Hamiltonian (10), though this assumption is not necessary for the calculation made below.)


The oscillation period $\mathcal T$ may be calculated as a contour integral along this closed trajectory: 


7 There is, of course, a conceptual difference between the “usual” products of the function derivatives 
participating in the Poisson brackets, and the operator “products” (meaning their sequential action on a state 
vector) forming the commutator. 

8 See, e.g., QM Sec. 4.6. 

9 Various aspects of this problem and its quantum-mechanical extensions were first discussed by L. Le Cornu 
(1895), Lord Rayleigh (1902), H. Lorentz (1911), P. Ehrenfest (1916), and M. Born and V. Fock (1928). 

10 This term is also used in thermodynamics and statistical mechanics, where it implies not only a slow parameter 
variation (if any) but also thermal insulation of the system — see, e.g., SM Sec. 1.3. Evidently, the latter condition 
is irrelevant in our current context. 

11 As a reminder, we discussed such phase-plane representations in Chapter 5; see, e.g., Figs. 5.5, 5.9, and 5.16.

$$ \mathcal T \equiv \oint dt = \oint\frac{dt}{dq}dq = \oint\frac{1}{\dot q}dq. \qquad (10.29) $$

Using the first of the Hamilton equations (7), we may represent this integral as

$$ \mathcal T = \oint\frac{1}{\partial H/\partial p}dq. \qquad (10.30) $$

At each given point $q$, $H = E$ is a function of $p$ alone, so that we may flip the partial derivative in the denominator just as the full derivative, and rewrite Eq. (30) as

$$ \mathcal T = \oint\frac{\partial p}{\partial E}dq. \qquad (10.31) $$


For the particular Hamiltonian (10), this relation is immediately reduced to Eq. (3.27), now in the form of a contour integral:

$$ \mathcal T = \Big(\frac{m_{\rm ef}}{2}\Big)^{1/2}\oint\frac{dq}{[E - U_{\rm ef}(q)]^{1/2}}. \qquad (10.32) $$

Fig. 10.1. Phase-plane representation of periodic oscillations of a 1D Hamiltonian system, for two values of energy (schematically): the closed contours $H(p, q, \lambda) = E$ and $H(p, q, \lambda) = E' > E$.
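Eq. (32) can also be evaluated numerically. The sketch below assumes $m_{\rm ef} = 1$ and a potential with two simple turning points $q_- < q_+$ supplied by the user; the substitution $q = q_c - a\cos\theta$ (our choice) turns the integrable $[E - U]^{-1/2}$ singularities at the turning points into a smooth integrand, which the midpoint rule then handles well. For the harmonic potential it returns $\mathcal T = 2\pi$ for any amplitude, while for a hypothetical quartic potential $U = q^4/4$ the period depends on the amplitude (in this case, as $1/A$).

```python
import math


def period(U, E, q_minus, q_plus, m=1.0, n=2000):
    """Eq. (32): T = (m/2)^(1/2) * contour integral of dq / [E - U(q)]^(1/2),
    i.e. twice the integral between the turning points q_minus < q_plus.
    The substitution q = q_c - a*cos(theta), theta in (0, pi), removes the
    inverse-square-root singularities at the turning points."""
    q_c, a = 0.5 * (q_plus + q_minus), 0.5 * (q_plus - q_minus)
    h = math.pi / n
    total = 0.0
    for k in range(n):                       # midpoint rule: theta never hits 0 or pi
        theta = (k + 0.5) * h
        q = q_c - a * math.cos(theta)
        total += a * math.sin(theta) / math.sqrt(E - U(q))
    return 2.0 * math.sqrt(m / 2.0) * total * h   # factor 2: there and back


def harmonic(q):
    return 0.5 * q * q        # kappa = m = 1, so that omega_0 = 1


def quartic(q):
    return 0.25 * q ** 4      # hypothetical anharmonic example


for A in (0.5, 1.0, 2.0):
    print(A, period(harmonic, harmonic(A), -A, A), period(quartic, quartic(A), -A, A))
```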


Naively, it may seem that these formulas may also be used to find the change of the motion period when the parameter $\lambda$ is being changed adiabatically, for example, by plugging the given functions $m_{\rm ef}(\lambda)$ and $U_{\rm ef}(q, \lambda)$ into Eq. (32). However, there is no guarantee that the energy $E$ in that integral would stay constant as the parameter changes, and indeed we will see below that this is not necessarily the case. Even more interestingly, in the most important case of the harmonic oscillator ($U_{\rm ef} = \kappa_{\rm ef}q^2/2$), whose oscillation period $\mathcal T$ does not depend on $E$ (see Eq. (3.29) and its discussion), its variation in the adiabatic limit (28) may be readily predicted, $\mathcal T(\lambda) = 2\pi/\omega_0(\lambda) = 2\pi[m_{\rm ef}(\lambda)/\kappa_{\rm ef}(\lambda)]^{1/2}$, but the dependence of the oscillation energy $E$ (and hence of the oscillation amplitude) on $\lambda$ is not immediately obvious.

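In order to address this issue, let us use Eq. (8) (with $E = H$) to represent the rate of the energy change with $\lambda(t)$, i.e. in time, as

$$ \frac{dE}{dt} = \frac{dH}{dt} = \frac{\partial H}{\partial t} = \frac{\partial H}{\partial \lambda}\frac{d\lambda}{dt}. \qquad (10.33) $$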
Since we are interested in a very slow (adiabatic) time evolution of energy, we can average Eq. (33) over fast oscillations in the system, for example over one oscillation period $\mathcal T$, treating $d\lambda/dt$ as a constant during this averaging. (This is the most critical point of this argumentation, because at any non-vanishing rate of parameter change the oscillations are, strictly speaking, non-periodic.¹²) The averaging yields


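$$ \Big\langle\frac{dE}{dt}\Big\rangle = \frac{d\lambda}{dt}\Big\langle\frac{\partial H}{\partial \lambda}\Big\rangle \equiv \frac{d\lambda}{dt}\,\frac{1}{\mathcal T}\int_0^{\mathcal T}\frac{\partial H}{\partial \lambda}dt. \qquad (10.34) $$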


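Transforming this time integral to the contour one, just as we did at the transition from Eq. (29) to Eq. (30), and then using Eq. (31) for $\mathcal T$, we get

$$ \Big\langle\frac{dE}{dt}\Big\rangle = \frac{d\lambda}{dt}\,\frac{\oint\dfrac{\partial H/\partial \lambda}{\partial H/\partial p}dq}{\oint\dfrac{\partial p}{\partial E}dq}. \qquad (10.35) $$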


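At each point $q$ of the contour, $H$ is a function of not only $\lambda$, but also of $p$, which may be also $\lambda$-dependent, so that if $E$ is fixed, the partial differentiation of the relation $E = H$ over $\lambda$ yields

$$ \frac{\partial H}{\partial \lambda} + \frac{\partial H}{\partial p}\frac{\partial p}{\partial \lambda} = 0, \qquad \text{i.e.} \qquad \frac{\partial H/\partial \lambda}{\partial H/\partial p} = -\frac{\partial p}{\partial \lambda}. \qquad (10.36) $$

Plugging the last relation into Eq. (35), we get

$$ \Big\langle\frac{dE}{dt}\Big\rangle = -\frac{d\lambda}{dt}\,\frac{\oint(\partial p/\partial \lambda)\,dq}{\oint(\partial p/\partial E)\,dq}. \qquad (10.37) $$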


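Since the left-hand side of Eq. (37) and the derivative $d\lambda/dt$ do not depend on $q$, we may move them into the integrals over $q$ as constants, and rewrite Eq. (37) as

$$ \oint\Big(\frac{\partial p}{\partial E}\Big\langle\frac{dE}{dt}\Big\rangle + \frac{\partial p}{\partial \lambda}\frac{d\lambda}{dt}\Big)dq = 0. \qquad (10.38) $$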


Now let us consider the following integral over the same phase-plane contour,

$$ J \equiv \frac{1}{2\pi}\oint p\,dq, \qquad (10.39) $$

called the action variable. Just to understand its physical sense, let us calculate $J$ for a harmonic oscillator (14). As we know very well from Chapter 5, for such an oscillator, $q = A\cos\Psi$, $p = -m_{\rm ef}\omega_0A\sin\Psi$ (with $\Psi = \omega_0t + \text{const}$), so that $J$ may be easily expressed either via the oscillations’ amplitude $A$, or via their energy $E = H = m_{\rm ef}\omega_0^2A^2/2$:

$$ J \equiv \frac{1}{2\pi}\oint p\,dq = \frac{1}{2\pi}\int_{\Psi = 0}^{\Psi = 2\pi}(-m_{\rm ef}\omega_0A\sin\Psi)\,d(A\cos\Psi) = \frac{m_{\rm ef}\omega_0}{2}A^2 = \frac{E}{\omega_0}. \qquad (10.40) $$
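The action variable (39) may be computed numerically in the same way. The minimal sketch below (again assuming $m_{\rm ef} = 1$, a user-supplied pair of turning points, and the hypothetical quartic potential $U = q^4/4$ as a nonlinear example) evaluates $J = (1/\pi)\int_{q_-}^{q_+}[2m(E - U)]^{1/2}dq$, i.e. Eq. (39) with the closed contour split into its upper ($p > 0$) and lower ($p < 0$) halves. It reproduces Eq. (40), $J = E/\omega_0$, for the harmonic oscillator, and shows that for the quartic potential $J \propto E^{3/4}$, so that there $J$ is not simply proportional to $E$.

```python
import math


def action_variable(U, E, q_minus, q_plus, m=1.0, n=2000):
    """Eq. (39): J = (1/2pi) * contour integral of p dq
    = (1/pi) * integral from q_minus to q_plus of [2m(E - U(q))]^(1/2) dq,
    with the substitution q = q_c - a*cos(theta) and the midpoint rule."""
    q_c, a = 0.5 * (q_plus + q_minus), 0.5 * (q_plus - q_minus)
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        q = q_c - a * math.cos(theta)
        total += math.sqrt(2.0 * m * max(E - U(q), 0.0)) * a * math.sin(theta)
    return total * h / math.pi


def harmonic(q, w0=1.5):
    return 0.5 * w0 ** 2 * q * q     # m = 1, kappa = w0**2


def quartic(q):
    return 0.25 * q ** 4             # hypothetical anharmonic example


A = 0.8
E = harmonic(A)
print(action_variable(harmonic, E, -A, A), E / 1.5)       # Eq. (40): J = E/omega_0
J1 = action_variable(quartic, quartic(1.0), -1.0, 1.0)
J2 = action_variable(quartic, quartic(2.0), -2.0, 2.0)    # 16 times larger energy
print(J2 / J1)                                            # 16**(3/4) = 8
```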


12 Because of the implied nature of this conjecture (which is very close to the assumptions made at the derivation of the reduced equations in Sec. 5.3), new, stricter (but also much more cumbersome) proofs of the final Eq. (42) are still being offered in the literature; see, e.g., C. Wells and S. Siklos, Eur. J. Phys. 28, 105 (2007) and/or A. Lobo et al., Eur. J. Phys. 33, 1063 (2012).

Returning to a general system with an adiabatically changed parameter $\lambda$, let us use the definition of $J$, Eq. (39), to calculate its time derivative, again taking into account that at each point $q$ of the trajectory, $p$ is a function of $E$ and $\lambda$:

$$ \frac{dJ}{dt} = \frac{1}{2\pi}\oint\frac{dp}{dt}dq = \frac{1}{2\pi}\oint\Big(\frac{\partial p}{\partial E}\frac{dE}{dt} + \frac{\partial p}{\partial \lambda}\frac{d\lambda}{dt}\Big)dq. \qquad (10.41) $$

Within the accuracy of our approximation, in which the contour integrals (38) and (41) are calculated along a closed trajectory, the factor $dE/dt$ is indistinguishable from its time average, and these integrals coincide, so that the result (38) is applicable to Eq. (41) as well. Hence, we have finally arrived at a very important result: at a slow parameter variation, $dJ/dt = 0$, i.e. the action variable remains constant:

$$ J = \text{const}. \qquad (10.42) $$

This is the famous adiabatic invariance.¹³ In particular, according to Eq. (40), in a harmonic oscillator, the energy of oscillations changes proportionately to its own (slowly changed) frequency.
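The adiabatic invariance (42) is easy to observe numerically. The sketch below is a hypothetical example, not taken from the literature cited above: it integrates Eqs. (11)-(12) for a harmonic oscillator with $m_{\rm ef} = 1$, whose frequency is ramped linearly from $\omega_i = 1$ to $\omega_f = 2$ over a time interval of 1000, i.e. over more than a hundred oscillation periods, so that condition (28) is well satisfied. The leapfrog integrator and all parameter values are our assumptions. The energy roughly doubles, while $J = E/\omega_0$ stays nearly constant, as Eqs. (40) and (42) predict.

```python
def omega(t, w_i=1.0, w_f=2.0, t_ramp=1000.0):
    """Hypothetical slow linear ramp of the oscillator's frequency."""
    if t >= t_ramp:
        return w_f
    return w_i + (w_f - w_i) * max(t, 0.0) / t_ramp


def evolve(q, p, t_end, dt=0.01):
    """Leapfrog integration of Eqs. (11)-(12) with m = 1 and U = omega(t)**2 * q**2 / 2."""
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        p -= 0.5 * dt * omega(t) ** 2 * q    # half kick with the force at time t
        q += dt * p                          # drift
        t += dt
        p -= 0.5 * dt * omega(t) ** 2 * q    # half kick with the force at time t + dt
    return q, p, t


def energy(q, p, t):
    return 0.5 * p * p + 0.5 * omega(t) ** 2 * q * q


q0, p0 = 1.0, 0.0
E0 = energy(q0, p0, 0.0)
J0 = E0 / omega(0.0)
q1, p1, t1 = evolve(q0, p0, t_end=1100.0)
E1 = energy(q1, p1, t1)
J1 = E1 / omega(t1)
print(E1 / E0, J1 / J0)    # the first ratio is close to 2, the second one close to 1
```

Making the ramp much faster (comparable with the oscillation period) violates condition (28), and then $J$ is no longer conserved; its final value depends on the oscillation phase at which the parameter changes.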


Before moving on, let me briefly note that the adiabatic invariance is not the only application of the action variable $J$. Since the initial choice of generalized coordinates and velocities (and hence the generalized momenta) in analytical mechanics is arbitrary (see Sec. 2.1), it is almost evident that $J$ may be taken for a new generalized momentum corresponding to a certain new generalized coordinate $\Theta$,¹⁴ and that the pair $\{J, \Theta\}$ should satisfy the Hamilton equations (7), in particular,

$$ \frac{d\Theta}{dt} = \frac{\partial H}{\partial J}. \qquad (10.43) $$

Following the commitment of Sec. 1 (made there for the “old” arguments $q_j$, $p_j$), before the differentiation on the right-hand side of Eq. (43), $H$ should be expressed as a function (besides $t$) of the “new” arguments $J$ and $\Theta$. For time-independent Hamiltonian systems, $H$ is uniquely defined by $J$; see, e.g., Eq. (40). Hence in this case the right-hand side of Eq. (43) depends on neither $t$ nor $\Theta$, so according to that equation, $\Theta$ (called the angle variable) is a linear function of time:

$$ \Theta = \frac{\partial H}{\partial J}\,t + \text{const}. \qquad (10.44) $$

For a harmonic oscillator, according to Eq. (40), the derivative $\partial H/\partial J = \partial E/\partial J$ is just $\omega_0 \equiv 2\pi/\mathcal T$, so that $\Theta = \omega_0t + \text{const}$, i.e. it is just the full phase that was repeatedly used in this course, especially in Chapter 5. It may be shown that a more general form of this relation,

$$ \frac{\partial H}{\partial J} = \frac{2\pi}{\mathcal T}, \qquad (10.45) $$


13 For certain particular oscillators, e.g., a point pendulum, Eq. (42) may be also proved directly; this exercise is highly recommended to the reader.

14 This, again, is a plausible argument but not a strict proof. Indeed: though, according to its definition (39), $J$ is nothing more than a sum of several (formally, an infinite number of) values of the momentum $p$, they are not independent, but have to be selected on the same closed trajectory on the phase plane. For more mathematical rigor, the reader is referred to Sec. 45 of Mechanics by Landau and Lifshitz (which was repeatedly cited above), which discusses the general rules of the so-called canonical transformations from one set of Hamiltonian arguments to another one, say from $\{p, q\}$ to $\{J, \Theta\}$.

is valid for an arbitrary system described by Eq. (10). Thus, Eq. (44) becomes

$$ \Theta = 2\pi\frac{t}{\mathcal T} + \text{const}. \qquad (10.46) $$

This means that for an arbitrary (nonlinear) 1D oscillator, the angle variable $\Theta$ is a convenient generalization of the full phase $\Psi$. For this reason, the variables $J$ and $\Theta$ present a convenient tool for the discussion of certain fine points of the dynamics of strongly nonlinear oscillators, for whose discussion I, unfortunately, do not have time/space.¹⁵


10.3. The Hamilton principle 


Now let me show that the Lagrange equations of motion, which were derived in Sec. 2.1 from the Newton laws, may also be obtained from the so-called Hamilton principle,¹⁶ namely the condition of a minimum (or rather an extremum) of the following integral called action:

$$ S \equiv \int_{t_{\rm ini}}^{t_{\rm fin}} L\,dt, \qquad (10.47) $$

where $t_{\rm ini}$ and $t_{\rm fin}$ are, respectively, the initial and final moments of time, at which all generalized coordinates and velocities are considered fixed (not varied); see Fig. 2.

Fig. 10.2. Deriving the Hamilton principle. (Figure labels: “virtual motion”; time axis marked at $t_{\rm ini}$ and $t_{\rm fin}$.)


The proof of that statement is rather simple. Considering, similarly to Sec. 2.1, a possible virtual variation of the motion, described by infinitesimal deviations $\{\delta q_j(t), \delta\dot q_j(t)\}$ from the real motion, the necessary condition for $S$ to be minimal is

$$ \delta S = \int_{t_{\rm ini}}^{t_{\rm fin}}\delta L\,dt = 0, \qquad (10.48) $$

where $\delta S$ and $\delta L$ are the variations of the action and the Lagrange function, corresponding to the set $\{\delta q_j(t), \delta\dot q_j(t)\}$. As has been already discussed in Sec. 2.1, we can use the operation of variation just


15 An interested reader may be referred, for example, to Chapter 6 in J. Jose and E. Saletan, Classical Dynamics, Cambridge U. Press, 1998.

16 It is also called the “principle of least action”. (This name may be fairer in the context of a long history of the development of the principle, starting from its simpler particular forms, which includes the names of P. de Fermat, P. Maupertuis, L. Euler, and J.-L. Lagrange.)

as the usual differentiation (but at a fixed time, see Fig. 2), swapping these two operations if needed; see Fig. 2.3 and its discussion. Thus, we may write

$$ \delta L = \sum_j\Big(\frac{\partial L}{\partial q_j}\delta q_j + \frac{\partial L}{\partial \dot q_j}\delta\dot q_j\Big) = \sum_j\Big(\frac{\partial L}{\partial q_j}\delta q_j + \frac{\partial L}{\partial \dot q_j}\frac{d}{dt}(\delta q_j)\Big). \qquad (10.49) $$

After plugging the last expression into Eq. (48), we can integrate the second term by parts:

$$ \delta S = \sum_j\int_{t_{\rm ini}}^{t_{\rm fin}}\Big[\frac{\partial L}{\partial q_j}\delta q_j\,dt + \frac{\partial L}{\partial \dot q_j}d(\delta q_j)\Big] = \sum_j\int_{t_{\rm ini}}^{t_{\rm fin}}\frac{\partial L}{\partial q_j}\delta q_j\,dt + \sum_j\Big[\frac{\partial L}{\partial \dot q_j}\delta q_j\Big]_{t_{\rm ini}}^{t_{\rm fin}} - \sum_j\int_{t_{\rm ini}}^{t_{\rm fin}}\delta q_j\,d\Big(\frac{\partial L}{\partial \dot q_j}\Big) = 0. \qquad (10.50) $$

Since the generalized coordinates in the initial and final points are considered fixed (not affected by the variation), all $\delta q_j(t_{\rm ini})$ and $\delta q_j(t_{\rm fin})$ vanish, so that the second term in the last form of Eq. (50) vanishes as well. Now multiplying and dividing the last term of that expression by $dt$, we finally get

$$ \delta S = -\int_{t_{\rm ini}}^{t_{\rm fin}}\sum_j\Big[\frac{d}{dt}\Big(\frac{\partial L}{\partial \dot q_j}\Big) - \frac{\partial L}{\partial q_j}\Big]\delta q_j\,dt = 0. \qquad (10.51) $$

This relation should hold for an arbitrary set of functions $\delta q_j(t)$, and for any time interval, and this is only possible if the expressions in the square brackets equal zero for all $j$, giving us the set of the Lagrange equations (2.19). So, the Hamilton principle indeed gives the Lagrange equations of motion.


It is fascinating to see how the Hamilton principle works for particular cases. As a very simple example, let us consider the usual 1D linear oscillator, with the Lagrangian function used so many times before in this course:

$$ L = \frac{m_{\rm ef}}{2}\dot q^2 - \frac{m_{\rm ef}\omega_0^2}{2}q^2. \qquad (10.52) $$

As we know very well, the Lagrange equations of motion for this $L$ are exactly satisfied by any sinusoidal function with the frequency $\omega_0$, in particular by a symmetric function of time

$$ q_e(t) = A\cos\omega_0t, \qquad \text{so that} \qquad \dot q_e(t) = -A\omega_0\sin\omega_0t. \qquad (10.53) $$

On a limited time interval, say $0 \le \omega_0t \le \pi/2$, this function is rather smooth and may be well approximated by another simple, reasonably selected function of time, for example

$$ q_a(t) = A(1 - \lambda t^2), \qquad \text{so that} \qquad \dot q_a(t) = -2\lambda At, \qquad (10.54) $$

provided that the parameter $\lambda$ is also selected reasonably. Let us take $\lambda = (2\omega_0/\pi)^2$, so that the approximate function $q_a(t)$ coincides with the exact function $q_e(t)$ at both ends of our time interval (Fig. 3):

$$ q_a(t_{\rm ini}) = q_e(t_{\rm ini}) = A, \qquad q_a(t_{\rm fin}) = q_e(t_{\rm fin}) = 0, \qquad \text{where } t_{\rm ini} = 0, \quad t_{\rm fin} = \frac{\pi}{2\omega_0}, \qquad (10.55) $$

and check which of them the Hamilton principle “prefers”, i.e. which function gives the least action.

Fig. 10.3. Plots of the functions $q(t)$ given by Eqs. (53) and (54). (Horizontal axis: $\omega_0t/\pi$.)

An elementary calculation of the action (47) corresponding to these two functions yields

$$ S_e = \Big(\frac{\pi}{8} - \frac{\pi}{8}\Big)m_{\rm ef}\omega_0A^2 = 0, \qquad S_a = \Big(\frac{4}{3\pi} - \frac{2\pi}{15}\Big)m_{\rm ef}\omega_0A^2 \approx (0.4244 - 0.4189)\,m_{\rm ef}\omega_0A^2 > 0, \qquad (10.56) $$

with the first terms in all the parentheses coming from the time integrals of the kinetic energy, and the second terms, from those of the potential energy.


This result shows, first, that the exact function of time, for which these two contributions exactly cancel,¹⁷ is indeed “preferable” for minimizing the action. Second, for the approximate function, the two contributions to the action are rather close to the exact ones, and hence almost cancel each other, signaling that this approximation is very reasonable. It is evident that in some cases when the exact analytical solution of the equations of motion cannot be found, the minimization of $S$ by adjusting one or more free parameters, incorporated into a guessed “trial” function, may be used to find a reasonable approximation for the actual law of motion.¹⁸
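The numbers in Eq. (56), and the trial-function idea just mentioned, are easy to check numerically. In the sketch below (assuming $m_{\rm ef} = \omega_0 = A = 1$, and using Simpson’s rule as an arbitrary choice of quadrature), the action (47) is evaluated for $q_e$, for $q_a$, and for a hypothetical one-parameter family of trial functions $q_c(t) = (1 - s^2) + c\,s(1 - s)$, with $s = t/t_{\rm fin}$, all of which satisfy the boundary conditions (55); $c = 0$ gives back $q_a$. Since the Lagrangian (52) is quadratic in $q$ and $\dot q$, $S(c)$ is a quadratic function of $c$, so that three evaluations suffice to locate its minimum.

```python
import math

T_FIN = math.pi / 2.0   # t_fin = pi/(2*omega_0) with omega_0 = 1


def action(q, qdot, t_fin=T_FIN, n=2000):
    """Eq. (47) for the Lagrangian (52) with m = omega_0 = 1, by Simpson's rule (n even)."""
    h = t_fin / n
    total = 0.0
    for k in range(n + 1):
        t = k * h
        lag = 0.5 * qdot(t) ** 2 - 0.5 * q(t) ** 2
        weight = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        total += weight * lag
    return total * h / 3.0


def trial_action(c):
    """Action of the hypothetical trial function q_c = (1 - s^2) + c*s*(1 - s), s = t/t_fin."""
    def q(t):
        s = t / T_FIN
        return (1.0 - s * s) + c * s * (1.0 - s)

    def qdot(t):
        s = t / T_FIN
        return (-2.0 * s + c * (1.0 - 2.0 * s)) / T_FIN

    return action(q, qdot)


S_e = action(math.cos, lambda t: -math.sin(t))           # exact solution (53)
S_a = trial_action(0.0)                                   # Eq. (54) with lambda = (2/pi)**2
print(S_e, S_a, 4 / (3 * math.pi) - 2 * math.pi / 15)    # compare with Eq. (56)

# S(c) = S0 + S1*c + S2*c**2 exactly, so three evaluations determine it.
Sm, S0, Sp = trial_action(-1.0), trial_action(0.0), trial_action(1.0)
S1, S2 = 0.5 * (Sp - Sm), 0.5 * (Sp + Sm) - S0
c_best = -S1 / (2.0 * S2)
S_min = S0 - S1 ** 2 / (4.0 * S2)
print(c_best, S_min)   # 0 <= S_min < S_a: the optimized trial function is closer to q_e
```

Since the interval $[0, t_{\rm fin}]$ is shorter than half an oscillation period, the exact solution is a true minimum of $S$ here, so no admissible trial function can push the action below $S_e = 0$.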


It is also very useful to make the notion of action $S$, defined by Eq. (47), more transparent by calculating it for the simple case of a single particle moving in a potential field that conserves its energy $E = T + U$. In this case, the Lagrangian function $L = T - U$ may be represented as

$$ L = T - U \equiv 2T - (T + U) = 2T - E = mv^2 - E, \qquad (10.57) $$

with a time-independent $E$, so that

$$ S = \int L\,dt = \int mv^2\,dt - Et + \text{const}. \qquad (10.58) $$

Recasting the expression under the remaining integral as $m\mathbf v\cdot\mathbf v\,dt = \mathbf p\cdot(d\mathbf r/dt)\,dt = \mathbf p\cdot d\mathbf r$, we finally get

$$ S = \int\mathbf p\cdot d\mathbf r - Et + \text{const} \equiv S_0 - Et + \text{const}, \qquad (10.59) $$


17 Such cancellation, i.e. the equality S = 0, is of course not the general requirement; it is specific only for this 
particular example, with a specific choice of the arbitrary constant in the potential energy of the system. 
18 This is essentially a classical analog of the variational method of quantum mechanics; see, e.g., QM Sec. 2.9.

where the time-independent integral

$$ S_0 \equiv \int\mathbf p\cdot d\mathbf r \qquad (10.60) $$

is frequently called the abbreviated action.¹⁹


This expression may be used to establish one more important connection between classical and quantum mechanics, now in its Schrödinger picture. Indeed, in the quasiclassical (WKB) approximation of that picture,²⁰ a particle of fixed energy $E$ is described by a de Broglie wave

$$ \psi(\mathbf r, t) \propto \exp\Big\{i\Big(\int\mathbf k\cdot d\mathbf r - \omega t + \text{const}\Big)\Big\}, \qquad (10.61) $$

where the wave vector $\mathbf k$ is proportional to the particle’s momentum (which is possibly a slow function of $\mathbf r$), and the frequency $\omega$ to its energy:

$$ \mathbf k = \frac{\mathbf p}{\hbar}, \qquad \omega = \frac{E}{\hbar}. \qquad (10.62) $$

Plugging these expressions into Eq. (61) and comparing the result with Eq. (59), we see that the WKB wavefunction may be represented as

$$ \psi \propto \exp\{iS/\hbar\}. \qquad (10.63) $$

Hence the Hamilton principle (48) means that the total phase of the quasiclassical wavefunction should be minimal along the particle’s real trajectory. But this is exactly the so-called eikonal minimum principle well known from optics (though it is valid for any other waves as well), where it serves to define the ray paths in the geometric optics limit, similar to the WKB approximation. Thus, the ratio $S/\hbar$ may be considered just as the eikonal, i.e. the total phase accumulation, of the de Broglie waves.²¹


Now, comparing Eq. (60) with Eq. (39), we see that the action variable $J$ is just the change of the abbreviated action $S_0$ along a single phase-plane contour, divided by $2\pi$. This means, in particular, that in the WKB approximation, $J/\hbar$ is the number of de Broglie waves along the classical trajectory of a particle, i.e. an integer value of the corresponding quantum number. If the system's parameters are changed slowly, the quantum number has to stay integer, and hence $J$ cannot change, giving a quantum-mechanical interpretation of the adiabatic invariance. The reader should agree that this is really fascinating: a fact of classical mechanics may be "derived" (or at least understood) more easily from the standpoint of quantum mechanics. (As a reminder, we ran into a similarly pleasant surprise in our discussion of the non-degenerate parametric excitation in Sec. 6.7.)
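The relation between $J$ and the abbreviated action is easy to check numerically. The sketch below is a minimal illustration, not part of the original notes: it assumes a harmonic oscillator, $U = m\omega^2 q^2/2$, evaluates $J = (1/2\pi)\oint p\,dq$ by the midpoint rule, and compares the result with the oscillator's adiabatic invariant $E/\omega$. The function name `action_variable` and the substitution $q = A\sin\varphi$ (used to tame the square-root behavior of $p$ at the turning points $\pm A$) are choices made for this illustration only.

```python
import math

def action_variable(E, m, omega, n=2000):
    """Action variable J = (1/2pi) * (closed integral of p dq) for U(q) = m*omega**2*q**2/2.

    The closed phase-plane contour consists of two branches, p = +/- sqrt(2m[E - U(q)]),
    which contribute equally, so J = (1/pi) * integral of |p| dq between the turning points.
    """
    A = math.sqrt(2.0 * E / (m * omega**2))      # turning points at q = -A and q = +A
    h = math.pi / n                              # step in the auxiliary angle phi
    total = 0.0
    for i in range(n):
        phi = -math.pi / 2 + (i + 0.5) * h       # midpoint rule in phi
        q = A * math.sin(phi)                    # substitution q = A sin(phi)
        p = math.sqrt(max(2.0 * m * (E - 0.5 * m * omega**2 * q**2), 0.0))
        total += p * A * math.cos(phi) * h       # dq = A cos(phi) dphi
    return total / math.pi

print(action_variable(E=2.0, m=1.0, omega=0.5))   # expected to be close to E/omega = 4
```

Dividing the returned $J$ by $\hbar$ would give the number of de Broglie wavelengths along the trajectory discussed above; for macroscopic parameters this number is, of course, astronomically large.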


19 Comparing Eq. (59) with the Hamilton principle (48), we see that if the variational trajectories are limited to those of only one (actual) energy $E$, the real motion corresponds to the minimum of not only $S$ but $S_0$ as well. This fact is called the Maupertuis principle. (Historically, this result, rather than Eq. (48), was called the "principle of least action", and some authors still use this terminology, so the reader's caution is advised.)

20 See, e.g., QM Sec. 3.1. 

21 Indeed, Eq. (63) was the starting point for R. Feynman’s development of his path-integral formulation of 
quantum mechanics — see, e.g., QM Sec. 5.3. 




Essential Graduate Physics CM: Classical Mechanics 


10.4. The Hamilton-Jacobi equation 


The action $S$, defined by Eq. (47), may be used for one more analytical formulation of classical mechanics. For that, we need to make one more, different commitment: $S$ has to be considered as a function of the following independent arguments: the final time point $t_{\rm fin}$ (which I will, for brevity, denote as $t$ in this section), and the set of generalized coordinates (but not of the generalized velocities!) at that point:


$$S \equiv \int_{t_{\rm ini}}^{t} L\,dt = S[t, q_j(t)]. \tag{10.64}$$


Let us calculate the variation of this (from the variational point of view, new!) function, resulting from an arbitrary combination of variations of the final values $q_j(t)$ of the coordinates while keeping $t$ fixed. Formally, this may be done by repeating the variational calculations described by Eqs. (49)-(51), except that now the variations $\delta q_j$ at the final point ($t$) do not necessarily equal zero. As a result, we get


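$$\delta S = \sum_j \frac{\partial L}{\partial \dot q_j}\,\delta q_j\bigg|_{t} + \int_{t_{\rm ini}}^{t} dt \sum_j \left[\frac{\partial L}{\partial q_j} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot q_j}\right)\right]\delta q_j. \tag{10.65}$$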


For the motion along the real trajectory, i.e. satisfying the Lagrange equations (2.19), the second term of this expression equals zero. Hence Eq. (65) shows that, for (any) fixed time $t$,


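$$\frac{\partial S}{\partial q_j} = \frac{\partial L}{\partial \dot q_j}. \tag{10.66}$$
But the last derivative is nothing else than the generalized momentum $p_j$, so that
$$\frac{\partial S}{\partial q_j} = p_j. \tag{10.67}$$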


(As a reminder, both parts of this relation refer to the final moment $t$ of the trajectory.) As a result, the full derivative of the action $S[t, q_j(t)]$ over time takes the form


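$$\frac{dS}{dt} = \frac{\partial S}{\partial t} + \sum_j \frac{\partial S}{\partial q_j}\dot q_j = \frac{\partial S}{\partial t} + \sum_j p_j \dot q_j. \tag{10.68}$$
Now, by the definition of $S$, the full derivative $dS/dt$ is nothing more than the Lagrangian function $L$, so that Eq. (68) yields
$$\frac{\partial S}{\partial t} = L - \sum_j p_j \dot q_j. \tag{10.69}$$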
However, according to the definition (2) of the Hamiltonian function $H$, the right-hand side of Eq. (69) is just $(-H)$, and we get an extremely simple-looking Hamilton-Jacobi equation
$$\frac{\partial S}{\partial t} = -H. \tag{10.70}$$
This simplicity is, however, rather deceptive, because to use this equation for the calculation of the function $S(t, q_j)$ for any particular problem, the Hamiltonian function has to be first expressed as a function of time $t$, the generalized coordinates $q_j$, and the generalized momenta $p_j$ (which may be, according to Eq. (67), represented just as the derivatives $\partial S/\partial q_j$). Let us see how this procedure works for the simplest case of a 1D system with the Hamiltonian function given by Eq. (10). In this case, the only generalized momentum is $p = \partial S/\partial q$, so that


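$$H = \frac{p^2}{2m_{\rm ef}} + U_{\rm ef}(q,t) = \frac{1}{2m_{\rm ef}}\left(\frac{\partial S}{\partial q}\right)^2 + U_{\rm ef}(q,t), \tag{10.71}$$
and Eq. (70) is reduced to the following partial differential equation:
$$\frac{\partial S}{\partial t} + \frac{1}{2m_{\rm ef}}\left(\frac{\partial S}{\partial q}\right)^2 + U_{\rm ef}(q,t) = 0. \tag{10.72}$$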
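As a quick consistency check of Eq. (72), one may verify numerically that known closed-form actions satisfy it. The sketch below assumes two standard textbook expressions, quoted here without derivation: the free-particle action $S = m(q - q_0)^2/2t$ and the harmonic-oscillator action $S = (m\omega/2\sin\omega t)[(q^2 + q_0^2)\cos\omega t - 2qq_0]$, both for a trajectory starting at $q = q_0$ at $t_{\rm ini} = 0$, with $m_{\rm ef} = m$. The helper `hj_residual` and the parameter values are hypothetical; the partial derivatives are approximated by central differences.

```python
import math

def hj_residual(S, U, m, t, q, h=1e-5):
    """Left-hand side of Eq. (72), dS/dt + (dS/dq)**2 / 2m + U(q, t),
    with the partial derivatives replaced by central differences."""
    dS_dt = (S(t + h, q) - S(t - h, q)) / (2.0 * h)
    dS_dq = (S(t, q + h) - S(t, q - h)) / (2.0 * h)
    return dS_dt + dS_dq**2 / (2.0 * m) + U(q, t)

# Hypothetical parameters; both trajectories start at q = q0 at t_ini = 0.
m, omega, q0 = 1.3, 2.0, 0.4

def S_free(t, q):
    """Action of a free particle (U = 0)."""
    return m * (q - q0)**2 / (2.0 * t)

def S_osc(t, q):
    """Action of a harmonic oscillator; valid while sin(omega*t) != 0."""
    s, c = math.sin(omega * t), math.cos(omega * t)
    return m * omega / (2.0 * s) * ((q**2 + q0**2) * c - 2.0 * q * q0)

def U_free(q, t):
    return 0.0

def U_osc(q, t):
    return 0.5 * m * omega**2 * q**2

for t, q in [(0.3, -1.0), (0.7, 0.2), (1.1, 1.5)]:
    print(hj_residual(S_free, U_free, m, t, q), hj_residual(S_osc, U_osc, m, t, q))
```

Both residuals should stay at the level of the finite-difference error, many orders of magnitude below the individual terms of Eq. (72).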


The solution of Eq. (72) may be readily found in the easiest case of time-independent potential energy, $U_{\rm ef} = U_{\rm ef}(q)$. In this case, Eq. (72) is evidently satisfied by the following variable-separated solution:


$$S(t,q) = S_0(q) + \text{const}\times t. \tag{10.73}$$


Plugging this solution into Eq. (72), we see that since the sum of the last two terms on the left-hand side of that equation represents the full mechanical energy $E$, the constant in Eq. (73) is nothing but $(-E)$. Thus, for the function $S_0(q)$, we get an ordinary differential equation


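$$-E + \frac{1}{2m_{\rm ef}}\left(\frac{dS_0}{dq}\right)^2 + U_{\rm ef}(q) = 0. \tag{10.74}$$
Integrating it, we get
$$S_0 = \int \{2m_{\rm ef}[E - U_{\rm ef}(q)]\}^{1/2}\,dq + \text{const}, \tag{10.75}$$
so that, finally, the action is equal to
$$S = \int \{2m_{\rm ef}[E - U_{\rm ef}(q)]\}^{1/2}\,dq - Et + \text{const}. \tag{10.76}$$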


For the case of 1D motion of a single 1D particle, i.e. for $q = x$, $m_{\rm ef} = m$, $U_{\rm ef}(q) = U(x)$, this solution is just the 1D case of the more general Eqs. (59)-(60), which were obtained above in a much simpler way. (In particular, $S_0$ is just the abbreviated action.)
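The variable-separated solution (76) may also be checked numerically for a potential where the quadrature in Eq. (75) is not elementary. The sketch below assumes a hypothetical quartic potential $U_{\rm ef}(q) = q^4$ (with $m_{\rm ef} = 1$ and $E = 2$), computes $S_0(q)$ by Simpson's rule with the lower limit set to $q = 0$ (which fixes the arbitrary constant), and returns the left-hand side of Eq. (72), which should vanish inside the classically allowed region. All function names are illustrative.

```python
import math

def S0(q, E, m, U, n=2000):
    """Abbreviated action, Eq. (75): integral of sqrt(2m[E - U(q')]) from q' = 0 to q.

    Simpson's rule with n subintervals (n must be even); q must stay inside
    the classically allowed region, where E > U.
    """
    h = q / n
    f = lambda x: math.sqrt(2.0 * m * (E - U(x)))
    total = f(0.0) + f(q)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

def S(t, q, E, m, U):
    """Full action, Eq. (76), with the additive constant set to zero."""
    return S0(q, E, m, U) - E * t

def residual(t, q, E, m, U, h=1e-4):
    """Left-hand side of Eq. (72) for time-independent U, via central differences."""
    dS_dt = (S(t + h, q, E, m, U) - S(t - h, q, E, m, U)) / (2.0 * h)
    dS_dq = (S(t, q + h, E, m, U) - S(t, q - h, E, m, U)) / (2.0 * h)
    return dS_dt + dS_dq**2 / (2.0 * m) + U(q)

quartic = lambda x: x**4          # hypothetical potential U_ef(q) = q^4
for q in (0.3, 0.8, -0.5):
    print(q, S0(q, 2.0, 1.0, quartic), residual(0.7, q, 2.0, 1.0, quartic))
```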


This particular example illustrates that the Hamilton-Jacobi equation is not the most efficient way to solve most practical problems of classical mechanics. However, it may be rather useful for studies of certain mathematical aspects of dynamics.22 Moreover, in the early 1950s this approach was extended to a completely different field, the optimal control theory, in which the role of the action $S$ is played by the so-called cost function: a certain functional of a system (understood in a very general sense of this term) that should be minimized by an optimal choice of a control signal, i.e. of a function of time that affects the system's evolution. From the point of view of this theory, Eq. (70) is a particular case of a more general Hamilton-Jacobi-Bellman equation.23


22 See, e.g., Chapters 6-9 in I. C. Percival and D. Richards, Introduction to Dynamics, Cambridge U. Press, 1983. 
23 See, e.g., D. Bertsekas, Dynamic Programming and Optimal Control, vols. 1 and 2, Athena Scientific, 2005 and 2007. The reader should not be intimidated by the very unnatural term "dynamic programming", which was invented by the founding father of this field, Richard Bellman, to lure government bureaucrats into funding his research, deemed too theoretical at that time. (Presently, it has a broad range of important applications.)
10.5. Exercise problems


In each of Problems 10.1-10.3, for the given system: 


(i) derive the Hamilton equations of motion, and 
(ii) check whether these equations are equivalent to those derived from the Lagrangian
formalism. 


10.1. Our "testbed" system: a bead on a ring, being rotated with a fixed angular velocity $\omega$ about its vertical diameter; see Fig. 2.1, reproduced on the right.


10.2. The system considered in Problem 2.3: a pendulum hanging from a horizontal support whose motion law $x_0(t)$ is fixed; see the figure on the right. (No vertical-plane constraint.)


10.3. The system considered in Problem 2.8: a block of mass $m$ that can slide, without friction, along the inclined surface of a heavy wedge of mass $m'$. The wedge is free to move, also without friction, along a horizontal surface; see the figure on the right. (Both motions are within the vertical plane containing the steepest slope line.)


10.4. Derive and solve the equations of motion of a particle with the following Hamiltonian 
function: 


$$H = \frac{(\mathbf{p} + a\mathbf{r})^2}{2m},$$


where a is a constant scalar. 


10.5. Let $L$ be the Lagrangian function, and $H$ the Hamiltonian function, of the same system. Which three of the following four statements,
$$\text{(i)}\ \frac{dL}{dt} = 0, \qquad \text{(ii)}\ \frac{\partial L}{\partial t} = 0, \qquad \text{(iii)}\ \frac{dH}{dt} = 0, \qquad \text{(iv)}\ \frac{\partial H}{\partial t} = 0,$$
are equivalent? Give an example when those three equalities hold, but the fourth one does not.
10.6. Calculate the Poisson brackets of a Cartesian component of the angular momentum L of a 


particle moving in a central force field and its Hamiltonian function H, and discuss the most evident 
implication of the result. 
10.7. After small oscillations had been initiated in the point pendulum shown in the figure on the right, the supporting string is being pulled up slowly, so that the pendulum's length $l$ is being reduced. Neglecting dissipation,

(i) prove by a direct calculation that the oscillation energy is indeed changing proportionately to the oscillation frequency, as follows from the constancy of the corresponding adiabatic invariant (40); and

(ii) find the $l$-dependence of the amplitudes of the angular and linear deviations from the equilibrium.


10.8. The mass $m$ of a small body that performs 1D oscillations in the potential well $U(x) = ax^n$, with $n > 0$, is being changed slowly. Calculate the oscillation energy $E$ as a function of $m$.


10.9. A stiff ball is bouncing vertically from the floor of an elevator whose upward acceleration changes very slowly. Neglecting the energy dissipation, calculate how much the bounce height $h$ changes during the acceleration's increase from 0 to $g$. Is your result valid for an equal but abrupt increase of the elevator's acceleration?


10.10.* A 1D particle of a constant mass $m$ moves in a time-dependent potential $U(q, t) = m\omega^2(t)q^2/2$, where $\omega(t)$ is a slow function of time, with $|\dot\omega| \ll \omega^2$. Develop an approximate method for the solution of the corresponding equation of motion, similar to the WKB approximation used in quantum mechanics.24 Use the approximation to confirm the conservation of the action variable (40) for this system.


Hint: You may like to look for the solution to the equation of motion in the form
$$q(t) = \exp\{A(t) + i\Psi(t)\},$$
where $A$ and $\Psi$ are some real functions of time, and then make proper approximations in the resulting equations for these functions.


24 See, e.g., QM Sec. 2.4. 
Chapter 1. Electric Charge Interaction


This chapter reviews the basics of electrostatics: the description of interactions between stationary (or relatively slowly moving) electric charges. Much of this material should be known to the reader from their undergraduate studies;1 because of that, the explanations are very brief.


1.1. The Coulomb law 


A quantitative discussion of classical electrodynamics, starting from the electrostatics, requires 
common agreement on the meaning of the following notions: 


- electric charges $q_k$, as revealed, most explicitly, by observation of electrostatic interaction between the charged particles;

- point charges: the charged particles so small that their position in space, for the given problem, may be completely described (in the given reference frame) by their radius-vectors $\mathbf{r}_k$; and

- electric charge conservation: the fact that the algebraic sum of all charges $q_k$ inside any closed volume is conserved unless the charged particles cross the volume's border.


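I will assume that these notions are well known to the reader. Using them, the Coulomb law3 for the interaction of two stationary point charges may be formulated as follows:
$$\mathbf{F}_{kk'} = \kappa\, q_k q_{k'} \frac{\mathbf{r}_k - \mathbf{r}_{k'}}{|\mathbf{r}_k - \mathbf{r}_{k'}|^3}, \tag{1.1}$$
where $\mathbf{F}_{kk'}$ denotes the electrostatic (Coulomb) force exerted on the charge number $k$ by the charge number $k'$, separated from it by distance $R_{kk'} \equiv |\mathbf{r}_k - \mathbf{r}_{k'}|$; see Fig. 1.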


Fig. 1.1. Coulomb force directions (for the case $q_k q_{k'} > 0$).


1 For remedial reading, I can recommend, for example, D. Griffiths, Introduction to Electrodynamics, 4th ed., Pearson, 2015.

2 On top of the more general notions of the classical Newtonian space, point particles and forces, as used in 
classical mechanics — see, e.g., CM Sec. 1.1. 

3 Formulated in 1785 by Charles-Augustin de Coulomb, on the basis of his earlier experiments, in turn rooted in prior studies of electrostatic phenomena, with notable contributions by William Gilbert, Otto von Guericke, Charles François de Cisternay Du Fay, Benjamin Franklin, and Henry Cavendish.
I am confident that this law is very familiar to the reader, but a few comments may still be due:


(i) Flipping the indices $k$ and $k'$, we see that Eq. (1) complies with the 3rd Newton law: the reciprocal force is equal in magnitude but opposite in direction: $\mathbf{F}_{k'k} = -\mathbf{F}_{kk'}$.


(ii) Since the vector $\mathbf{R}_{kk'} \equiv \mathbf{r}_k - \mathbf{r}_{k'}$, by its definition, is directed from point $\mathbf{r}_{k'}$ toward point $\mathbf{r}_k$ (Fig. 1), Eq. (1) correctly describes the experimental fact that charges of the same sign (i.e. with $q_k q_{k'} > 0$) repel, while those with opposite signs ($q_k q_{k'} < 0$) attract each other.


(iii) In some textbooks, the Coulomb law (1) is given with the qualifier "in free space" or "in vacuum". However, Eq. (1) actually remains valid even in the presence of any other charges, for example, of internal charges in a quasi-continuous medium that may surround the two charges (number $k$ and $k'$) under consideration. The confusion stems from the fact, to be discussed in detail in Chapter 3 below, that in some cases it is convenient to formally represent the effect of the other charges as an effective (rather than actual!) modification of the Coulomb law.


(iv) The constant $\kappa$ in Eq. (1) depends on the system of units we use. In the Gaussian units, $\kappa$ is set to 1, at the price of introducing a special unit of charge (the statcoulomb) that would make experimental data compatible with Eq. (1) if the force $F_{kk'}$ is measured in the Gaussian units (dynes). On the other hand, in the International System ("SI") of units, the charge's unit is one coulomb (abbreviated C), and $\kappa$ is different from 1:
$$\kappa = \frac{1}{4\pi\varepsilon_0}, \tag{1.2}$$
where $\varepsilon_0 \approx 8.854\times 10^{-12}$ SI units is called the electric constant.4


Unfortunately, the continuing struggle between zealous proponents of these two systems of units bears all the not-so-nice features of a religious war, with a similarly slim chance for any side to win it in any foreseeable future. In my humble view, each of these systems has its advantages and handicaps (to be noted on several occasions below), and every educated physicist should have no problem with using either of them. Following the insistent recommendations of international scientific unions, I am using the SI units throughout my series. However, for the readers' convenience, in this course (where the difference between the Gaussian and SI systems is especially significant) I will write the most important formulas with the constant (2) clearly displayed, for example, Eq. (1) as


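$$\mathbf{F}_{kk'} = \frac{1}{4\pi\varepsilon_0}\, q_k q_{k'} \frac{\mathbf{r}_k - \mathbf{r}_{k'}}{|\mathbf{r}_k - \mathbf{r}_{k'}|^3}, \tag{1.3}$$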


so that the transfer to the Gaussian units may be performed just by the formal replacement $4\pi\varepsilon_0 \to 1$. (In the rare cases when the transfer is not obvious, I will duplicate formulas in the Gaussian units.)
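A minimal sketch of Eq. (3) in SI units, written for this illustration; the function name `coulomb_force` and the use of plain 3-component lists for vectors are assumptions, not notation from the text. It returns the force on charge $k$ exerted by charge $k'$, which makes it easy to check comment (i), $\mathbf{F}_{k'k} = -\mathbf{F}_{kk'}$, and the sign rule of comment (ii). The value of $\varepsilon_0$ is the approximate one quoted above.

```python
import math

EPS0 = 8.854e-12                      # electric constant, SI units (approximate)
KAPPA = 1.0 / (4.0 * math.pi * EPS0)  # the constant kappa of Eq. (1.2)

def coulomb_force(q_k, r_k, q_kp, r_kp):
    """Force F_kk' exerted on charge q_k (at r_k) by charge q_k' (at r_k'), Eq. (1.3).

    Charges in coulombs, positions in meters (3-component sequences); returns newtons.
    """
    R = [a - b for a, b in zip(r_k, r_kp)]     # R_kk' = r_k - r_k', points toward charge k
    dist = math.sqrt(sum(c * c for c in R))
    if dist == 0.0:
        raise ValueError("coincident point charges: Eq. (1.3) diverges")
    factor = KAPPA * q_k * q_kp / dist**3
    return [factor * c for c in R]

# Two hypothetical 1-nC charges 1 cm apart: the force on each points away from the other.
F_12 = coulomb_force(1e-9, [0.0, 0.0, 0.0], 1e-9, [0.01, 0.0, 0.0])
F_21 = coulomb_force(1e-9, [0.01, 0.0, 0.0], 1e-9, [0.0, 0.0, 0.0])
print(F_12, F_21)
```

Switching to the Gaussian units would amount to setting `KAPPA = 1.0` and measuring charges in statcoulombs and distances in centimeters.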


Besides Eq. (3), another key experimental law of electrostatics is the linear superposition principle: the electrostatic forces exerted on some point charge (say, $q_k$) by other charges add up as vectors, forming the net force


4 Since 2018, one coulomb is defined, in the "legal" metrology, as a certain, exactly fixed number of the fundamental electric charges $e$, and the "legal" SI value of $\varepsilon_0$ is no longer exactly equal to $10^7/4\pi c^2$ (where $c$ is the speed of light) as it was before that, but remains extremely close to that fraction, with a relative difference of the order of $10^{-10}$; see appendix CA: Selected Physical Constants. In this series, this minute difference is ignored.
$$\mathbf{F}_k = \sum_{k' \neq k} \mathbf{F}_{kk'}, \tag{1.4}$$
where the summation is extended over all charges but $q_k$, and the partial force $\mathbf{F}_{kk'}$ is described by Eq. (3). The fact that the sum is restricted to $k' \neq k$ means that a point charge, in statics, does not interact with itself. This fact may look obvious from Eq. (3), whose right-hand side diverges at $\mathbf{r}_{k'} \to \mathbf{r}_k$, but becomes less evident (though still true) in quantum mechanics, where the charge of even an elementary particle is effectively spread around some volume, together with the particle's wavefunction.5


Now we may combine Eqs. (3) and (4) to get the following expression for the net force F acting 
on a probe charge q located at point r: 


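$$\mathbf{F}(\mathbf{r}) = q\,\frac{1}{4\pi\varepsilon_0}\sum_{\mathbf{r}_k \neq \mathbf{r}} q_k \frac{\mathbf{r} - \mathbf{r}_k}{|\mathbf{r} - \mathbf{r}_k|^3}. \tag{1.5}$$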


This equality implies that it makes sense to introduce the notion of the electric field (as an entity independent of $q$), whose distribution in space is characterized by the following vector:
$$\mathbf{E}(\mathbf{r}) \equiv \frac{\mathbf{F}(\mathbf{r})}{q}, \tag{1.6}$$


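formally called the electric field strength, but much more frequently, just the "electric field". In these terms, Eq. (5) becomes
$$\mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\sum_{\mathbf{r}_k \neq \mathbf{r}} q_k \frac{\mathbf{r} - \mathbf{r}_k}{|\mathbf{r} - \mathbf{r}_k|^3}. \tag{1.7}$$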
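Eq. (7) translates directly into a sum over charges. In the sketch below (an illustration with hypothetical names), charges are given as `(q_k, r_k)` pairs, and a charge located exactly at the observation point is skipped, mirroring the restriction $\mathbf{r}_k \neq \mathbf{r}$ in the sum.

```python
import math

EPS0 = 8.854e-12   # electric constant, SI units (approximate)

def field(r, charges):
    """Electric field at point r produced by point charges, Eq. (1.7).

    charges: iterable of (q_k, r_k) pairs, with r_k a 3-component sequence.
    A charge located exactly at r is skipped (the condition r_k != r in the sum).
    """
    E = [0.0, 0.0, 0.0]
    for q_k, r_k in charges:
        d = [a - b for a, b in zip(r, r_k)]          # r - r_k
        dist = math.sqrt(sum(c * c for c in d))
        if dist == 0.0:
            continue
        f = q_k / (4.0 * math.pi * EPS0 * dist**3)
        for i in range(3):
            E[i] += f * d[i]
    return E

# Hypothetical pair of opposite charges +/-1 nC at z = +/-5 cm; field at the origin.
q, a = 1e-9, 0.05
print(field([0.0, 0.0, 0.0], [(q, [0.0, 0.0, a]), (-q, [0.0, 0.0, -a])]))
```

For this pair, the two contributions at the origin add up, giving $E_z = -2q/4\pi\varepsilon_0 a^2$ and zero transverse components.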


Merely convenient in electrostatics, the notion of the field becomes virtually unavoidable for the description of time-dependent phenomena (such as electromagnetic waves; see Chapter 7 and on), where the electromagnetic field shows up as a specific form of matter, different from the usual "material" particles, even though quantum electrodynamics (to be reviewed in QM Chapter 9) offers their joint description.


Many real-world problems involve multiple point charges located so closely that it is possible to approximate them with a continuous charge distribution. Indeed, let us consider a group of many ($dN \gg 1$) close charges, located at points $\mathbf{r}_k$, all within an elementary volume $d^3r'$. For relatively distant field observation points, with $|\mathbf{r} - \mathbf{r}_k| \gg dr'$, the geometrical factor in the corresponding terms of Eq. (7) is essentially the same. As a result, these charges may be treated as a single elementary charge $dQ(\mathbf{r}')$. Since at $dN \gg 1$ this elementary charge is proportional to the elementary volume $d^3r'$, we can define the local 3D charge density $\rho(\mathbf{r}')$ by the following relation:
$$\rho(\mathbf{r}')\,d^3r' \equiv dQ(\mathbf{r}') \equiv \sum_{\mathbf{r}_k \in d^3r'} q_k, \tag{1.8}$$


and rewrite Eq. (7) as an integral (over the volume containing all essential charges): 


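$$\mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\int \rho(\mathbf{r}')\,\frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3}\,d^3r'. \tag{1.9}$$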


5 Note that some widely used approximations, e.g., the density functional theory (DFT) of multiparticle systems, 
essentially violate this law, thus limiting their accuracy and applicability — see, e.g., QM Sec. 8.4. 
Note that for a continuous, smooth charge density $\rho(\mathbf{r}')$, the integral in Eq. (9) does not diverge at $R \equiv |\mathbf{r} - \mathbf{r}'| \to 0$, because in this limit, the fraction under the integral increases as $R^{-2}$, i.e. slower than the decrease of the elementary volume $d^3r'$, proportional to $R^3$.


Let me emphasize the dual use of Eq. (9). In the case when $\rho(\mathbf{r})$ is a continuous function representing the average charge defined by Eq. (8), Eq. (9) is not valid at distances $|\mathbf{r} - \mathbf{r}_k|$ of the order of the distance between adjacent point charges, i.e. does not describe rapid variations of the electric field at these distances. Such an approximate, smoothly changing field $\mathbf{E}(\mathbf{r})$ is called macroscopic; we will repeatedly return to this notion in the following chapters. On the other hand, Eq. (9) may also be used for the description of the exact (frequently called microscopic) field of discrete point charges, by employing the notion of Dirac's delta function, which is the mathematical description of a very sharp function equal to zero everywhere but one point, and still having a finite integral (equal to 1).6 Indeed, in this formalism, a set of point charges $q_k$ located at points $\mathbf{r}_k$ may be represented by the pseudo-continuous density


$$\rho(\mathbf{r}') = \sum_k q_k\,\delta(\mathbf{r}' - \mathbf{r}_k). \tag{1.10}$$


Plugging this expression into Eq. (9), we return to its exact, discrete version (7). In this sense, Eq. (9) is 
exact, and we may use it as the general expression for the electric field. 


1.2. The Gauss law 


Due to the extension of Eq. (9) to point (“discrete”) charges, it may seem that we do not need 
anything besides it for solving any problem of electrostatics. In practice, however, this is not quite true — 
first of all, because the direct use of Eq. (9) frequently leads to complex calculations. Indeed, let us try 
to solve a problem that is conceptually very simple: find the electric field induced by a spherically-symmetric charge distribution with density $\rho(r')$; see Fig. 2.


Fig. 1.2. One of the simplest problems of 
electrostatics: the electric field produced by 
a spherically-symmetric charge distribution. 


We may immediately use the problem's symmetry to argue that the electric field should also be spherically-symmetric, with only one component in the spherical coordinates: $\mathbf{E}(\mathbf{r}) = E(r)\mathbf{n}_r$, where $\mathbf{n}_r \equiv \mathbf{r}/r$ is the unit vector in the direction of the field observation point $\mathbf{r}$. Taking this direction for the polar


6 See, e.g., MA Sec. 14. The 2D (areal) charge density $\sigma$ and the 1D (linear) density $\lambda$ may be defined absolutely similarly to the 3D (volumic) density $\rho$: $dQ = \sigma d^2r$, $dQ = \lambda dr$. Note that the approximations in which either $\sigma \neq 0$ or $\lambda \neq 0$ imply that $\rho$ is formally infinite at the charge location; for example, the model in which a plane $z = 0$ is charged with areal density $\sigma \neq 0$ means that $\rho = \sigma\delta(z)$, where $\delta(z)$ is Dirac's delta function.
axis of a spherical coordinate system, we can use the evident axial symmetry of the system to reduce Eq. (9) to
$$E = \frac{1}{4\pi\varepsilon_0}\, 2\pi \int_0^\pi \sin\theta'\,d\theta' \int_0^\infty r'^2\,dr'\,\frac{\rho(r')}{R^2}\cos\theta, \tag{1.11}$$

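where $\theta$, $\theta'$, and $R$ are the geometrical parameters marked in Fig. 2. Since $\theta$ and $R$ may be readily expressed via $r'$ and $\theta'$, using the auxiliary parameters $a$ and $h$,
$$\cos\theta = \frac{r - a}{R}, \qquad R^2 = h^2 + (r - r'\cos\theta')^2, \qquad \text{where } a = r'\cos\theta', \quad h = r'\sin\theta', \tag{1.12}$$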


Eq. (11) may be eventually reduced to an explicit integral over r’ and 6’, and worked out analytically, 
but that would require some effort. 
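To see what "some effort" amounts to, Eqs. (11)-(12) can simply be integrated numerically. The sketch below is a brute-force illustration, not the author's method: it substitutes $u = \cos\theta'$ (so that $\sin\theta'\,d\theta' = du$ and $R^2 = r^2 + r'^2 - 2rr'u$), applies a 2D midpoint rule, and is intended only for observation points outside a charged sphere of a hypothetical radius `R_s`, where the integrand has no singularity. For a uniform density, the result should reproduce the point-charge field $Q/4\pi\varepsilon_0 r^2$.

```python
import math

EPS0 = 8.854e-12   # electric constant, SI units (approximate)

def field_direct(r, rho, R_s, n=400):
    """Radial field E(r) of a spherically symmetric charge confined to r' < R_s,
    from Eqs. (1.11)-(1.12) by a brute-force 2D midpoint rule.

    With u = cos(theta'): sin(theta') d(theta') = du, R^2 = r^2 + r'^2 - 2 r r' u,
    and cos(theta)/R^2 = (r - r' u)/R^3.  Intended for r > R_s only.
    """
    h_r, h_u = R_s / n, 2.0 / n
    total = 0.0
    for i in range(n):
        rp = (i + 0.5) * h_r                     # r'
        weight = rp * rp * rho(rp)
        for j in range(n):
            u = -1.0 + (j + 0.5) * h_u           # u = cos(theta')
            R2 = r * r + rp * rp - 2.0 * r * rp * u
            total += weight * (r - rp * u) / R2**1.5
    return 2.0 * math.pi * total * h_r * h_u / (4.0 * math.pi * EPS0)

# Hypothetical uniform sphere: Q = 1 nC, R_s = 10 cm.
Q, R_s = 1e-9, 0.1
rho0 = Q / (4.0 / 3.0 * math.pi * R_s**3)
for r in (0.2, 0.5):
    print(r, field_direct(r, lambda x: rho0, R_s), Q / (4.0 * math.pi * EPS0 * r**2))
```

Inside the sphere, the integrand peaks sharply near $R \to 0$, and this naive grid would need considerable refinement, which is one practical face of the difficulty mentioned above.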


For other problems, the integral (9) may be much more complicated, defying an analytical 
solution. One could argue that with the present-day abundance of computers and numerical algorithm 
libraries, one can always resort to numerical integration. This argument may be enhanced by the fact 
that numerical integration is based on the replacement of the required integral by a discrete sum, and the 
summation is much more robust to the (unavoidable) rounding errors than the finite-difference schemes 
typical for the numerical solution of differential equations. These arguments, however, are only partly 
justified, since in many cases the numerical approach runs into a problem sometimes called the curse of 
dimensionality: the exponential dependence of the number of needed calculations on the number of independent parameters of the problem.7 Thus, despite the proliferation of numerical methods in physics, analytical results have an everlasting value, and we should try to get them whenever we can. For our current problem of finding the electric field generated by a fixed set of electric charges, great help may come from the so-called Gauss law.


To derive it, let us consider a single point charge $q$ inside a smooth closed surface $S$ (Fig. 3), and calculate the product $E_n d^2r$, where $d^2r$ is an elementary area of the surface (which may be well approximated with a plane fragment of that area), and $E_n \equiv \mathbf{E}\cdot\mathbf{n}$ is the component of the electric field at that point, normal to the plane.


Fig. 1.3. Deriving the Gauss law: a point charge $q$ (a) inside the volume $V$, and (b) outside of that volume.


7 For a more detailed discussion of this problem, see, e.g., CM Sec. 5.8. 
This component may be calculated as $E\cos\theta$, where $\theta$ is the angle between the vector $\mathbf{E}$ and the unit vector $\mathbf{n}$ normal to the surface. Now let us notice that the product $\cos\theta\,d^2r$ is nothing more than the area $d^2r'$ of the projection of $d^2r$ onto the plane normal to the vector $\mathbf{r}$ connecting the charge $q$ with the considered point of the surface (Fig. 3), because the angle between the elementary areas $d^2r'$ and $d^2r$ is also equal to $\theta$. Using the Coulomb law for $\mathbf{E}$, we get


$$E_n d^2r = E\cos\theta\,d^2r = \frac{1}{4\pi\varepsilon_0}\frac{q}{r^2}\,d^2r'. \tag{1.13}$$


But the ratio $d^2r'/r^2$ is nothing more than the elementary solid angle $d\Omega$ under which the areas $d^2r'$ and $d^2r$ are seen from the charge point, so that $E_n d^2r$ may be represented just as a product of $d\Omega$ by a constant $(q/4\pi\varepsilon_0)$. Summing these products over the whole surface, we get


$$\oint_S E_n\,d^2r = \frac{q}{4\pi\varepsilon_0}\oint_S d\Omega = \frac{q}{\varepsilon_0}, \tag{1.14}$$


since the full solid angle equals $4\pi$. (The integral on the left-hand side of this relation is called the flux of electric field through the surface $S$.)


Relation (14) expresses the Gauss law for one point charge. However, it is only valid if the charge is located inside the volume $V$ limited by the surface $S$. To find the flux created by a charge located outside of this volume, we still can use Eq. (13), but have to be careful with the signs of the elementary contributions $E_n d^2r$. Let us use the common convention to direct the unit vector $\mathbf{n}$ out of the closed volume we are considering (the so-called outer normal), so that the elementary product $E_n d^2r = (\mathbf{E}\cdot\mathbf{n})d^2r$, and hence the elementary solid angle $d\Omega = (E_n/E)\,d^2r/r^2$, is positive if the vector $\mathbf{E}$ is pointing out of the volume (as in the example shown in Fig. 3a and at the upper-right area in Fig. 3b), and negative in the opposite case (for example, at the lower-left area in Fig. 3b). As the latter panel shows, if the charge is located outside of the volume, for each positive contribution $d\Omega$ there is always an equal and opposite contribution to the integral. As a result, at the integration over the solid angle, the positive and negative contributions cancel exactly, so that


$$\oint_S E_n\,d^2r = 0. \tag{1.15}$$
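Eqs. (14) and (15) are easy to test numerically for a surface that is not a sphere. The sketch below is an illustration only: the cube $[-1, 1]^3$ and the charge positions are arbitrary choices, and the function name is hypothetical. It integrates $E_n$ of a single point charge over the six faces of the cube, using the outer normal and the midpoint rule, and reports the flux in units of $q/\varepsilon_0$, so that the result should be close to 1 for a charge inside the cube and close to 0 for a charge outside it.

```python
import math

def flux_over_q(r0, half=1.0, n=150):
    """Flux of the field of a point charge at r0 through the cube [-half, half]^3,
    divided by q/eps0, computed with the outer normal and the midpoint rule."""
    h = 2.0 * half / n
    total = 0.0
    for axis in range(3):                    # face normal along x, y, or z
        for sign in (-1.0, 1.0):             # outer normal n = sign * (unit vector of axis)
            for i in range(n):
                for j in range(n):
                    p = [-half + (i + 0.5) * h, -half + (j + 0.5) * h]
                    p.insert(axis, sign * half)          # point on the face
                    d = [p[k] - r0[k] for k in range(3)]
                    dist3 = (d[0]**2 + d[1]**2 + d[2]**2) ** 1.5
                    total += sign * d[axis] / dist3 * h * h   # (d . n)/|d|^3 d^2r
    return total / (4.0 * math.pi)           # E_n d^2r = (q/4 pi eps0)(d . n)/|d|^3 d^2r

print(flux_over_q([0.2, -0.1, 0.3]))   # charge inside the cube
print(flux_over_q([2.5, 0.0, 0.0]))    # charge outside the cube
```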


The real power of the Gauss law is revealed by its generalization to the case of several, 
especially many charges. Since the calculation of flux is a linear operation, the linear superposition 
principle (4) means that the flux created by several charges is equal to the (algebraic) sum of individual 
fluxes from each charge, for which either Eq. (14) or Eq. (15) is valid, depending on whether the charge is in or out of the volume. As a result, for the total flux we get:


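$$\oint_S E_n\,d^2r = \frac{1}{\varepsilon_0}\sum_{\mathbf{r}_k \in V} q_k \equiv \frac{Q_V}{\varepsilon_0} = \frac{1}{\varepsilon_0}\int_V \rho(\mathbf{r}')\,d^3r', \tag{1.16}$$
where $Q_V$ is the net charge inside volume $V$. This is the full version of the Gauss law.8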


In order to appreciate the problem-solving power of the law, let us revisit the problem shown in 
Fig. 2, i.e. the field of a spherical charge distribution. Due to its symmetry, which had already been 


8 The law is named after the famed Carl Gauss (1777-1855), even though it was first formulated earlier (in 1773) by Joseph-Louis Lagrange, who was also a founding father of analytical mechanics; see, e.g., CM Chapter 2.
discussed above, if we apply Eq. (16) to a sphere of a certain radius $r$, the electric field has to be normal to the sphere at each of its points (i.e., $E_n = E$), and its magnitude has to be the same at all points: $E_n = E(r)$. As a result, the flux calculation is elementary:


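$$\oint_S E_n\,d^2r = 4\pi r^2 E(r). \tag{1.17}$$
Now applying the Gauss law (16), we get:
$$4\pi r^2 E(r) = \frac{1}{\varepsilon_0}\int_{r'<r}\rho(r')\,d^3r' = \frac{4\pi}{\varepsilon_0}\int_0^r r'^2\rho(r')\,dr', \tag{1.18}$$
so that, finally,
$$E(r) = \frac{1}{\varepsilon_0 r^2}\int_0^r r'^2\rho(r')\,dr' \equiv \frac{1}{4\pi\varepsilon_0}\frac{Q(r)}{r^2}, \tag{1.19}$$
where $Q(r)$ is the full charge inside the sphere of radius $r$:
$$Q(r) \equiv \int_{r'<r}\rho(\mathbf{r}')\,d^3r' = 4\pi\int_0^r \rho(r')\,r'^2\,dr'. \tag{1.20}$$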


In particular, this formula shows that the field outside of a sphere of a finite radius $R$ is exactly the same as if all its charge $Q \equiv Q(R)$ were concentrated at the sphere's center. (Note that this important result is only valid for a spherically-symmetric charge distribution.) For the field inside the sphere, finding the electric field still requires the explicit integration (20), but this 1D integral is much simpler than the 2D integral (11), and in some important cases may be readily worked out analytically. For example, if the charge $Q$ is uniformly distributed inside a sphere of radius $R$,
$$\rho(r) = \rho \equiv \frac{Q}{V} = \frac{Q}{(4\pi/3)R^3}, \tag{1.21}$$
then the integration is elementary:
$$E(r) = \frac{\rho}{\varepsilon_0 r^2}\int_0^r r'^2\,dr' = \frac{\rho r}{3\varepsilon_0} = \frac{1}{4\pi\varepsilon_0}\frac{Qr}{R^3}. \tag{1.22}$$
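For densities without a convenient antiderivative, Eqs. (19)-(20) still reduce the problem to a single 1D quadrature. The sketch below (hypothetical function names; Simpson's rule chosen for simplicity) computes $Q(r)$ and $E(r)$ for an arbitrary spherically symmetric $\rho(r)$. For the uniform sphere (21), it should reproduce Eq. (22) inside and Eq. (19) with $Q(r) = Q$ outside. Simpson's rule loses accuracy at a density jump such as the one at $r = R$, so `n` should be increased if higher precision is needed there.

```python
import math

EPS0 = 8.854e-12   # electric constant, SI units (approximate)

def enclosed_charge(rho, r, n=2000):
    """Q(r) = 4 pi * integral_0^r rho(r') r'^2 dr', Eq. (1.20), by Simpson's rule (n even)."""
    h = r / n
    f = lambda x: rho(x) * x * x
    s = f(0.0) + f(r)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(i * h)
    return 4.0 * math.pi * s * h / 3.0

def radial_field(rho, r):
    """E(r) = Q(r) / (4 pi eps0 r^2), Eq. (1.19); valid for spherically symmetric rho only."""
    return enclosed_charge(rho, r) / (4.0 * math.pi * EPS0 * r * r)

# Hypothetical uniform sphere, Eq. (1.21): Q = 1 nC, R = 10 cm.
Q, R = 1e-9, 0.1
rho0 = Q / (4.0 / 3.0 * math.pi * R**3)
rho = lambda x: rho0 if x < R else 0.0
for r in (0.05, 0.1, 0.2):
    inside = Q * r / (4.0 * math.pi * EPS0 * R**3)     # Eq. (1.22)
    outside = Q / (4.0 * math.pi * EPS0 * r**2)         # Eq. (1.19) with Q(r) = Q
    print(r, radial_field(rho, r), inside if r < R else outside)
```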


We see that in this case, the field is growing linearly from the center to the sphere's surface, and only at $r > R$ starts to decrease, in agreement with Eq. (19) with constant $Q(r) = Q$. Note also that the electric field is continuous for all $r$ (including $r = R$), as for all systems with finite volumic density.


In order to underline the importance of the last condition, let us consider one more elementary but very important example of the Gauss law's application. Let a thin plane sheet (Fig. 4) be charged uniformly, with a finite areal density $\sigma = \text{const}$. In this case, it is fruitful to use the Gauss volume in the form of a planar "pillbox" of thickness $2z$ (where $z$ is the Cartesian coordinate perpendicular to the plane) and a certain area $A$; see the dashed lines in Fig. 4. Due to the symmetry of the problem, it is evident that the electric field should be: (i) directed along the $z$-axis, (ii) constant on each of the upper and bottom sides of the pillbox, (iii) equal and opposite on these sides, and (iv) parallel to the side surfaces of the box. As a result, the full electric field flux through the pillbox's surface is just $2AE(z)$, so the Gauss law (16) yields $2AE(z) = Q_A/\varepsilon_0 = \sigma A/\varepsilon_0$, and we get a very simple but important formula
$$E(z) = \frac{\sigma}{2\varepsilon_0} = \text{const}. \tag{1.23}$$
Fig. 1.4. The electric field of a charged plane.


Notice that, somewhat counter-intuitively, the field magnitude does not depend on the distance from the charged plane. From the point of view of the Coulomb law (5), this result may be explained as follows: the farther the observation point from the plane, the weaker the effect of each elementary charge, $dQ = \sigma d^2r$, but the more such elementary charges give contributions to the $z$-component of vector $\mathbf{E}$, because they are "seen" from the observation point at relatively small angles to the $z$-axis.


Note also that though the magnitude $E = |\mathbf{E}|$ of this electric field is constant, its component $E_n$ normal to the plane (for our coordinate choice, $E_z$) changes its sign at the plane, experiencing a discontinuity (jump) equal to
$$\Delta E_z \equiv E_z(z = +0) - E_z(z = -0) = \frac{\sigma}{\varepsilon_0}. \tag{1.24}$$


This jump disappears if the surface is not charged. Returning for a split second to our charged sphere problem (Fig. 2): when solving it, we considered the volumic charge density $\rho$ to be finite everywhere, including the sphere's surface, so that on it $\sigma = 0$, and the electric field should be continuous, as it is.
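The distance independence of Eq. (23) can be illustrated by building the charged plane from the Coulomb law, ring by ring. The sketch below (an illustration, not from the text) takes a uniformly charged disk of a hypothetical radius $a$, sums the axial field of its rings, $dE_z = (\sigma z/2\varepsilon_0)\,s\,ds/(s^2 + z^2)^{3/2}$, and shows that for $z \ll a$ the result approaches $\sigma/2\varepsilon_0$, whatever the value of $z$. The same integral can be done in closed form, giving $(\sigma/2\varepsilon_0)[1 - z/(z^2 + a^2)^{1/2}]$, which serves as a check.

```python
import math

EPS0 = 8.854e-12   # electric constant, SI units (approximate)

def disk_field_on_axis(sigma, a, z, n=100_000):
    """E_z on the axis of a uniformly charged disk of radius a, at height z > 0,
    summed over rings of radius s (midpoint rule in s)."""
    h = a / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += s / (s * s + z * z) ** 1.5      # ring of charge sigma * 2 pi s ds
    return sigma * z / (2.0 * EPS0) * total * h

sigma, a = 1e-6, 1.0      # hypothetical areal density (C/m^2) and disk radius (m)
for z in (0.001, 0.01):
    print(z, disk_field_on_axis(sigma, a, z) / (sigma / (2.0 * EPS0)))
```

Both printed ratios should be close to 1, deviating only by roughly $z/a$, even though $z$ changes tenfold.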


Admittedly, the integral form (16) of the Gauss law is immediately useful only for highly 
symmetrical geometries, such as in the two problems discussed above. However, it may be recast into an 
alternative, differential form whose field of useful applications is much wider. This form may be 
obtained from Eq. (16) using the divergence theorem of the vector algebra, which is valid for any space-differentiable vector, in particular $\mathbf{E}$, and for the volume $V$ limited by any closed surface $S$:9


$$\oint_S E_n\,d^2r = \int_V \nabla\cdot\mathbf{E}\,d^3r, \tag{1.25}$$


where $\nabla$ is the del (or "nabla") operator of spatial differentiation.10 Combining Eq. (25) with the Gauss law (16), we get
$$\int_V \left(\nabla\cdot\mathbf{E} - \frac{\rho}{\varepsilon_0}\right) d^3r = 0. \tag{1.26}$$


For a given spatial distribution of electric charge (and hence of its electric field), this equation should be valid for any choice of the volume $V$. This can hold only if the function under the integral vanishes at each point, i.e. if11


9 See, e.g., MA Eq. (12.2). Note also that the scalar product under the volumic integral in Eq. (25) is nothing else 
than the divergence of the vector E — see, e.g., MA Eq. (8.4), hence the theorem’s name. 

10 See, e.g., MA Secs. 8-10. 

11 In the Gaussian units, just as in the initial Eq. (3), $\varepsilon_0$ has to be replaced with $1/4\pi$, so that the Maxwell equation (27) looks like $\nabla\cdot\mathbf{E} = 4\pi\rho$, while Eq. (28) stays the same.




Essential Graduate Physics EM: Classical Electrodynamics 


$$\nabla\cdot\mathbf{E} = \frac{\rho}{\varepsilon_0}. \tag{1.27}$$


Note that in sharp contrast with the integral form (16), Eq. (27) is local: it relates the electric field’s 
divergence to the charge density at the same point. This equation, being the differential form of the 
Gauss law, is frequently called one of the famed Maxwell equations,¹² to be discussed again and again later in this course.


In mathematical terminology, Eq. (27) is inhomogeneous, because it has a right-hand side independent (at least explicitly) of the field E that it describes. Another, homogeneous Maxwell equation’s “embryo” (this one valid for the stationary case only!) may be obtained by noticing that the curl of the point charge’s field, and hence that of the field of any system of charges, equals zero:¹³

$$\nabla\times\mathbf E = 0. \qquad (1.28)$$


(We will arrive at two other Maxwell equations, for the magnetic field, in Chapter 5, and then generalize 
all the equations to their full, time-dependent form at the end of Chapter 6. However, Eq. (27) will stay 
the same.) 


Just to get a better gut feeling for Eq. (27), let us apply it to the same example of a uniformly charged sphere (Fig. 2). Vector algebra tells us that the divergence of a spherically symmetric vector function E(r) = E(r)n_r may be simply expressed in spherical coordinates:¹⁴ ∇·E = [d(r²E)/dr]/r². As a result, Eq. (27) yields a linear ordinary differential equation for the scalar function E(r):


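$$\frac{1}{r^2}\frac{d}{dr}\left(r^2E\right) = \begin{cases}\rho/\varepsilon_0, & \text{for } r<R,\\ 0, & \text{for } R\le r,\end{cases} \qquad (1.29)$$
which may be readily integrated on each of these segments:
$$E(r) = \frac{1}{\varepsilon_0 r^2}\times\begin{cases}\displaystyle\int \rho r^2\,dr = \frac{\rho r^3}{3} + c_1, & \text{for } r<R,\\ c_2, & \text{for } R\le r.\end{cases} \qquad (1.30)$$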


To determine the integration constant c1, we can use the following boundary condition: E(0) = 0. (It follows from the problem’s spherical symmetry: in the center of the sphere, the electric field has to vanish, because otherwise, where would it be directed?) This requirement gives c1 = 0. The second constant, c2, may be found from the continuity condition E(R − 0) = E(R + 0), which has already been discussed above, giving c2 = ρR³/3 = Q/4π. As a result, we arrive at our previous results (19) and (22).
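As a quick numerical sanity check of Eqs. (29) and (30), the following Python sketch evaluates E(r) with c1 = 0 and c2 = ρR³/3, and verifies by a central finite difference that it satisfies Eq. (29) on both segments and is continuous at r = R. The helper names `field_uniform_sphere` and `div_radial` are hypothetical, and units with ε0 = 1 are assumed by default.

```python
import math

def field_uniform_sphere(r, rho, R, eps0=1.0):
    """Radial field E(r) of a uniformly charged sphere: Eq. (30) with
    c1 = 0 (finite field at the center) and c2 = rho*R**3/3 (continuity)."""
    if r < R:
        return rho * r / (3.0 * eps0)
    return rho * R ** 3 / (3.0 * eps0 * r ** 2)

def div_radial(E, r, h=1e-5):
    """Left-hand side of Eq. (29), (1/r^2) d(r^2 E)/dr, by a central difference."""
    return ((r + h) ** 2 * E(r + h) - (r - h) ** 2 * E(r - h)) / (2.0 * h * r ** 2)

rho, R = 2.0, 1.5
E = lambda r: field_uniform_sphere(r, rho, R)
Q = 4.0 * math.pi * R ** 3 * rho / 3.0

print(E(R - 1e-12), E(R + 1e-12))                # continuity at r = R
print(div_radial(E, 0.7), div_radial(E, 3.0))    # rho/eps0 inside, 0 outside
print(E(3.0), Q / (4.0 * math.pi * 3.0 ** 2))    # outside: the point-charge field
```

The finite-difference check is deliberately crude; it only confirms that the piecewise solution obeys the differential equation it was derived from.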


We can see that in this particular, highly symmetric case, using the differential form of the Gauss law is a bit more complex than using its integral form. (For our second example, shown in Fig. 4, it would be even less natural.) However, Eq. (27) and its generalizations are more convenient for asymmetric charge distributions, and are invaluable in the cases where the distribution ρ(r) is not known a priori and has to be found in a self-consistent way. (We will start discussing such cases in the next chapter.)


12 Named after the genius of classical electrodynamics and statistical physics, James Clerk Maxwell (1831-1879).
13 This follows, for example, from the direct application of MA Eq. (10.11) to any spherically-symmetric vector function of the type f(r) = f(r)n_r (in particular, to the electric field of a point charge placed at the origin), giving f_θ = f_φ = 0 and ∂f_r/∂θ = ∂f_r/∂φ = 0, so that all components of the vector ∇ × f vanish. Since nothing prevents us from placing the reference frame’s origin at the point charge’s location, this result remains valid for any position of the charge.

14 See, e.g., MA Eq. (10.10) for the particular case ∂/∂θ = ∂/∂φ = 0.


1.3. Scalar potential and electric field energy 


One more aid for solving problems of electrostatics (and of electrodynamics as a whole) may be obtained from the notion of the electrostatic potential, which is just the electrostatic potential energy U of a probe point charge q placed into the field in question, normalized by its charge:


$$\phi \equiv \frac{U}{q}. \qquad (1.31)$$


As we know from classical mechanics,¹⁵ the notion of U (and hence φ) makes the most sense for the case of potential forces, for example, those depending just on the particle’s position. Eqs. (6) and (9) show that stationary electric fields fall into this category. For such a field, the potential energy may be defined as a scalar function U(r) that allows the force to be calculated as its gradient (with the opposite sign):

$$\mathbf F = -\nabla U. \qquad (1.32)$$


Dividing both sides of this equation by the probe charge, and using Eqs. (6) and (31), we get¹⁶
$$\mathbf E = -\nabla\phi. \qquad (1.33)$$


To calculate the scalar potential, let us start from the simplest case of a single point charge q 
placed at the origin. For it, Eq. (7) takes the simple form 
$$\mathbf E = \frac{1}{4\pi\varepsilon_0}\frac{q\mathbf r}{r^3} = \frac{q}{4\pi\varepsilon_0}\frac{\mathbf n_r}{r^2}. \qquad (1.34)$$


It is straightforward to verify that the last fraction in the last form of Eq. (34) is equal to −∇(1/r).¹⁷ Hence, according to the definition (33), for this particular case


$$\phi = \frac{q}{4\pi\varepsilon_0 r}. \qquad (1.35)$$


(In the Gaussian units, this result is spectacularly simple: φ = q/r.) Note that we could add an arbitrary constant to this potential (and indeed to any other distribution of φ discussed below) without changing the field, but it is convenient to define the potential energy so that it approaches zero at infinity.
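The relation E = −∇φ, Eq. (33), may be checked numerically for the point charge of Eqs. (34) and (35). The Python sketch below compares a central-difference gradient of φ with the exact field, and also confirms that an additive constant in φ leaves the field unchanged. The helper names are hypothetical, and units with ε0 = 1 are assumed by default.

```python
import math

def phi_point(r_vec, q, eps0=1.0):
    """Eq. (35): potential of a point charge q placed at the origin."""
    r = math.sqrt(sum(c * c for c in r_vec))
    return q / (4.0 * math.pi * eps0 * r)

def field_point(r_vec, q, eps0=1.0):
    """Eq. (34): E = q r / (4 pi eps0 r^3)."""
    r = math.sqrt(sum(c * c for c in r_vec))
    return [q * c / (4.0 * math.pi * eps0 * r ** 3) for c in r_vec]

def minus_grad(f, r_vec, h=1e-6):
    """-grad f by central differences, i.e. the right-hand side of Eq. (33)."""
    result = []
    for i in range(3):
        plus, minus = list(r_vec), list(r_vec)
        plus[i] += h
        minus[i] -= h
        result.append(-(f(plus) - f(minus)) / (2.0 * h))
    return result

point, q = [0.3, -1.2, 0.8], 2.5
E_numeric = minus_grad(lambda v: phi_point(v, q), point)
E_exact = field_point(point, q)
# An additive constant in phi does not change the field:
E_shifted = minus_grad(lambda v: phi_point(v, q) + 7.0, point)
print(E_numeric)
print(E_exact)
print(E_shifted)
```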


In order to justify the introduction and the forthcoming exploration of U and φ, let me demonstrate (I hope, unnecessarily :-) how useful these notions are, on a very simple example. Let two similar charges q be launched from afar, with the same initial speed v0 << c each, straight toward each other (i.e. with zero impact parameter); see Fig. 5. Since, according to the Coulomb law, the charges repel each other with increasing force, they will stop at some minimum distance r_min from each other, and then fly back. We could of course find r_min directly from the Coulomb law. However, for that, we would need to write the 2nd Newton law for each particle (actually, due to the problem’s symmetry, the two equations would be similar), then integrate them over time to find the particle velocity v as a function of distance, and only then recover r_min from the requirement v = 0.


15 See, e.g., CM Sec. 1.4.

16 Eq. (28) could also be derived from this relation because, according to vector algebra, any gradient field has no curl; see, e.g., MA Eq. (11.1).

17 This may be done either by Cartesian components or using the well-known expression ∇f = (df/dr)n_r, valid for any spherically-symmetric scalar function f(r); see, e.g., MA Eq. (10.8) for the particular case ∂/∂θ = ∂/∂φ = 0.


Fig. 1.5. A simple problem of charged particle motion. (Figure labels: particles of mass m and charge q; minimum distance r_min.)


The notion of potential allows this problem to be solved in one line. Indeed, in the field of potential forces, the system’s total energy ℰ = T + U = T + qφ is conserved. In our non-relativistic case v << c, the kinetic energy T is just mv²/2. Hence, equating the total energy of two particles at the points r = ∞ and r = r_min, and using Eq. (35) for φ, we get


$$2\,\frac{mv_0^2}{2} = \frac{q^2}{4\pi\varepsilon_0 r_{\min}}, \qquad (1.36)$$


immediately giving us the final answer: r_min = q²/4πε0mv0². So, the notion of scalar potential is indeed very useful.
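The one-line energy argument can be cross-checked against the "hard way" described above, i.e. by integrating the 2nd Newton law in time. The Python sketch below is only an illustration under stated assumptions: it uses velocity-Verlet integration (a standard numerical choice, not taken from the text), dimensionless units with m = v0 = 1, the hypothetical parameter `k` standing for q²/4πε0 (set to 1), and a large but finite initial separation r0. With a finite r0, energy conservation predicts r_min = k/(mv0² + k/r0), which tends to q²/4πε0mv0² as r0 → ∞.

```python
def simulate_r_min(m=1.0, v0=1.0, k=1.0, r0=200.0, dt=1e-3):
    """Head-on collision of Fig. 5, integrated with velocity Verlet.

    k plays the role of q**2/(4*pi*eps0). By symmetry, only the right-hand
    particle, at x = r/2, is integrated; the left one mirrors it.
    """
    def acc(x):
        # Coulomb repulsion from the mirror particle at -x, distance r = 2x
        return k / (m * (2.0 * x) ** 2)

    x, v = r0 / 2.0, -v0          # start far away, moving toward the other charge
    a = acc(x)
    r_min = 2.0 * x
    while v < 0.0:                # stop once the particle has turned back
        x += v * dt + 0.5 * a * dt * dt
        a_new = acc(x)
        v += 0.5 * (a + a_new) * dt
        a = a_new
        r_min = min(r_min, 2.0 * x)
    return r_min


def r_min_energy(m=1.0, v0=1.0, k=1.0, r0=200.0):
    """Energy conservation: 2*(m*v0**2/2) + k/r0 = k/r_min."""
    return k / (m * v0 ** 2 + k / r0)


print(simulate_r_min(), r_min_energy())
```

The time-stepping loop needs about 10⁵ steps to reproduce what the energy argument gives at once.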


With this motivation, let us calculate φ for an arbitrary configuration of charges. For a single charge in an arbitrary position (say, at point r_k), r = |r| in Eq. (35) should evidently be replaced with |r − r_k|. Now, the linear superposition principle (3) allows for an easy generalization of this formula to the case of an arbitrary set of discrete charges,


$$\phi(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\sum_k \frac{q_k}{|\mathbf r - \mathbf r_k|}. \qquad (1.37)$$
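Eq. (37) translates directly into code. In the minimal Python sketch below, the function name `potential` and the list-of-pairs format for the charges are illustrative assumptions, units with ε0 = 1 are used by default, and `math.dist` (Python 3.8 and later) computes |r − r_k|. The two checks illustrate the symmetry of a pair of opposite charges and the linear superposition principle.

```python
import math

def potential(r, charges, eps0=1.0):
    """Eq. (37): phi(r) = (1/(4 pi eps0)) * sum_k q_k / |r - r_k|.

    `charges` is a list of (q_k, r_k) pairs, r_k being a 3-tuple
    (an illustrative format, not taken from the text)."""
    total = 0.0
    for q_k, r_k in charges:
        total += q_k / math.dist(r, r_k)   # |r - r_k|
    return total / (4.0 * math.pi * eps0)

# Two equal and opposite charges: phi vanishes on their symmetry plane x = 0.
pair = [(+1.0, (0.5, 0.0, 0.0)), (-1.0, (-0.5, 0.0, 0.0))]
print(potential((0.0, 2.0, -1.0), pair))   # 0.0

# Linear superposition: the potential of two charge sets is the sum of their potentials.
extra = [(3.0, (1.0, 1.0, 1.0))]
p = (2.0, -1.0, 0.5)
print(potential(p, pair + extra), potential(p, pair) + potential(p, extra))
```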


Finally, using the same arguments as in Sec. 1, we can use this result to argue that in the case of an 
arbitrary continuous charge distribution 


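$$\phi(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\int \frac{\rho(\mathbf r')}{|\mathbf r - \mathbf r'|}\,d^3r'. \qquad (1.38)$$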


Again, Dirac’s delta function allows using the last equation to recover Eq. (37) for discrete charges as 
well, so that Eq. (38) may be considered as the general expression for the electrostatic potential. 

For most practical calculations, using this expression and then applying Eq. (33) to the result is preferable to using Eq. (9), because φ is a scalar, while E is a 3D vector, mathematically equivalent to three scalars. Still, this approach may lead to technical problems similar to those discussed in Sec. 2. For example, applying it to the spherically-symmetric distribution of charge (Fig. 2), we get the integral


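$$\phi(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\,2\pi\int_0^\pi \sin\theta'\,d\theta' \int_0^R r'^2\,dr'\,\frac{\rho}{\left(r^2 + r'^2 - 2rr'\cos\theta'\right)^{1/2}}, \qquad (1.39)$$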


which is not much simpler than Eq. (11). 
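Although the double integral (39) is awkward analytically, it is easy to evaluate numerically. The Python sketch below uses a plain midpoint rule on a grid of n_theta × n_r points (hypothetical parameter names; units with ρ = R = ε0 = 1 by default) and compares the result with the potential Q/4πε0r outside the sphere, which follows from Eq. (19), and with the expression Q(3R² − r²)/8πε0R³ inside it, which is derived below as Eq. (51).

```python
import math

def phi_sphere_eq39(r, rho=1.0, R=1.0, eps0=1.0, n_theta=300, n_r=300):
    """Midpoint-rule evaluation of Eq. (39) for a uniformly charged sphere."""
    d_theta = math.pi / n_theta
    d_r = R / n_r
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * d_theta
        s, c = math.sin(th), math.cos(th)
        for j in range(n_r):
            rp = (j + 0.5) * d_r
            # integrand of Eq. (39): sin(theta') r'^2 rho / |r - r'|
            total += s * rp * rp * rho / math.sqrt(r * r + rp * rp - 2.0 * r * rp * c)
    return 2.0 * math.pi * total * d_theta * d_r / (4.0 * math.pi * eps0)

Q = 4.0 * math.pi / 3.0                                  # total charge for rho = R = 1
phi_out_exact = Q / (4.0 * math.pi * 2.0)                # at r = 2R
phi_in_exact = Q * (3.0 - 0.5 ** 2) / (8.0 * math.pi)    # at r = R/2
out_num = phi_sphere_eq39(2.0)
in_num = phi_sphere_eq39(0.5)
print(out_num, phi_out_exact)
print(in_num, phi_in_exact)
```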
The situation may be much improved by recasting Eq. (38) into a differential form. For that, it is sufficient to plug the definition of φ, Eq. (33), into Eq. (27):


$$\nabla\cdot(-\nabla\phi) = \frac{\rho}{\varepsilon_0}. \qquad (1.40)$$

The left-hand side of this equation is nothing else than the Laplace operator of φ (with the minus sign), so that we get the famous Poisson equation¹⁸ for the electrostatic potential:

$$\nabla^2\phi = -\frac{\rho}{\varepsilon_0}. \qquad (1.41)$$

(In the Gaussian units, the Poisson equation is ∇²φ = −4πρ.) This differential equation is so convenient for applications that even its particular case for ρ = 0,

$$\nabla^2\phi = 0, \qquad (1.42)$$

has earned a special name: the Laplace equation.¹⁹
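A finite-difference sketch makes Eqs. (41) and (42) tangible. The Python code below uses the standard seven-point approximation of the Laplace operator (a numerical choice, not taken from the text) with ε0 = 1. It confirms that the point-charge potential ∝ 1/r of Eq. (35) satisfies the Laplace equation away from the charge, and that the quadratic potential φ = −ρr²/6ε0 satisfies the Poisson equation with ρ/ε0 = 1.

```python
import math

def laplacian_fd(f, x, y, z, h=1e-3):
    """Seven-point finite-difference approximation of the Laplace operator."""
    return (f(x + h, y, z) + f(x - h, y, z)
            + f(x, y + h, z) + f(x, y - h, z)
            + f(x, y, z + h) + f(x, y, z - h)
            - 6.0 * f(x, y, z)) / (h * h)

# Eq. (35) up to the constant factor q/(4 pi eps0): phi proportional to 1/r
inv_r = lambda x, y, z: 1.0 / math.sqrt(x * x + y * y + z * z)
# phi = -rho r^2 / (6 eps0) with rho/eps0 = 1, since the Laplacian of r^2 equals 6
parabola = lambda x, y, z: -(x * x + y * y + z * z) / 6.0

print(laplacian_fd(inv_r, 0.7, -0.4, 1.1))      # near 0: Laplace equation (42)
print(laplacian_fd(parabola, 0.3, 0.2, -0.5))   # near -1: Poisson equation (41)
```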


In order to get a gut feeling for the Poisson equation’s value as a problem-solving tool, let us return to the spherically-symmetric charge distribution (Fig. 2) with a constant charge density ρ. Exploiting this symmetry, we can represent the potential as φ(r), and hence use the following simple expression for its Laplace operator:²⁰


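$$\nabla^2\phi = \frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\phi}{dr}\right), \qquad (1.43)$$
so that for the points inside the charged sphere (r < R) the Poisson equation yields
$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\phi}{dr}\right) = -\frac{\rho}{\varepsilon_0}, \quad\text{i.e.}\quad \frac{d}{dr}\left(r^2\frac{d\phi}{dr}\right) = -\frac{\rho}{\varepsilon_0}r^2. \qquad (1.44)$$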


Integrating the last form of the equation over r once, with the natural boundary condition dφ/dr|_{r=0} = 0 (because of the condition E(0) = 0, which has been discussed above), we get


$$\frac{d\phi}{dr}(r) = -\frac{\rho}{\varepsilon_0 r^2}\int_0^r r'^2\,dr' = -\frac{\rho r}{3\varepsilon_0} \equiv -\frac{Q}{4\pi\varepsilon_0}\frac{r}{R^3}. \qquad (1.45)$$

Since this derivative is nothing more than −E(r), in this formula we can readily recognize our previous result (22). Now we may like to carry out the second integration to calculate the potential itself:


$$\phi(r) = -\frac{Q}{4\pi\varepsilon_0 R^3}\int r\,dr = -\frac{Qr^2}{8\pi\varepsilon_0 R^3} + c_1, \quad\text{for } r\le R. \qquad (1.46)$$


18 Named after Siméon Denis Poisson (1781-1840), also famous for the Poisson distribution, one of the central results of probability theory; see, e.g., SM Sec. 5.2.

19 Named after the famous mathematician (and astronomer) Pierre-Simon Laplace (1749-1827) who, together with Alexis Clairaut, is credited with the development of the very concept of potential.

20 See, e.g., MA Eq. (10.8) for ∂/∂θ = ∂/∂φ = 0.
Before making any judgment on the integration constant c1, let us solve the Poisson equation (in this case, just the Laplace equation) for the range outside the sphere (r > R):


$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\phi}{dr}\right) = 0. \qquad (1.47)$$
Its first integral,
$$\frac{d\phi}{dr}(r) = \frac{c_2}{r^2}, \qquad (1.48)$$


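also gives the electric field (with the minus sign). Now using Eq. (45) and requiring the field to be continuous at r = R, we get

$$\frac{c_2}{R^2} = -\frac{Q}{4\pi\varepsilon_0}\frac{R}{R^3}, \quad\text{so that}\quad \frac{d\phi}{dr}(r) = -\frac{Q}{4\pi\varepsilon_0 r^2}, \qquad (1.49)$$
in evident agreement with Eq. (19). Integrating this result again,
$$\phi(r) = -\frac{Q}{4\pi\varepsilon_0}\int\frac{dr}{r^2} = \frac{Q}{4\pi\varepsilon_0 r} + c_3, \quad\text{for } R\le r, \qquad (1.50)$$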


we can select c3 = 0, so that φ(∞) = 0, in accordance with the usual (though not compulsory) convention. Now we can finally determine the constant c1 in Eq. (46) by requiring that this equation and Eq. (50) give the same value of φ at the boundary r = R. (According to Eq. (33), if the potential had a jump, the electric field at that point would be infinite.) The final answer may be represented as


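$$\phi(r) = \frac{Q}{4\pi\varepsilon_0}\times\begin{cases}1/r, & \text{for } R\le r,\\ (3R^2 - r^2)/2R^3, & \text{for } r\le R.\end{cases} \qquad (1.51)$$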


This calculation shows that using the Poisson equation to find the electrostatic potential 
distribution for highly symmetric problems may be a bit more cumbersome than directly finding the 
electric field — say, from the Gauss law. However, we will repeatedly see below that if the electric 
charge distribution is not fixed in advance, using Eq. (41) may be the only practicable way to proceed. 
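The final answer (51) can be checked against the equations it came from. The Python sketch below (hypothetical helper names; Q = R = ε0 = 1 by default) applies a nested central-difference version of the radial Laplace operator (43) to φ(r), expecting −ρ/ε0 = −3Q/4πε0R³ inside the sphere and zero outside, and checks that both φ and dφ/dr are continuous at r = R.

```python
import math

def phi_51(r, Q=1.0, R=1.0, eps0=1.0):
    """Eq. (51): potential of a uniformly charged sphere."""
    k = Q / (4.0 * math.pi * eps0)
    if r >= R:
        return k / r
    return k * (3.0 * R ** 2 - r ** 2) / (2.0 * R ** 3)

def radial_laplacian(f, r, h=1e-4):
    """Eq. (43), (1/r^2) d/dr (r^2 df/dr), via nested central differences."""
    def r2_dfdr(s):
        return s * s * (f(s + h) - f(s - h)) / (2.0 * h)
    return (r2_dfdr(r + h) - r2_dfdr(r - h)) / (2.0 * h * r * r)

rho_over_eps0 = 3.0 / (4.0 * math.pi)      # rho/eps0 for Q = R = eps0 = 1
print(radial_laplacian(phi_51, 0.5), -rho_over_eps0)   # Eq. (44), inside
print(radial_laplacian(phi_51, 2.0))                   # Eq. (47), outside: ~0

# Continuity of phi and of d(phi)/dr = -E at r = R (one-sided differences)
h = 1e-6
print(phi_51(1.0 - 1e-12), phi_51(1.0))
print((phi_51(1.0) - phi_51(1.0 - h)) / h, (phi_51(1.0 + h) - phi_51(1.0)) / h)
```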


Returning now to the general theory of electrostatic phenomena, let us calculate the potential energy U of an arbitrary system of point electric charges q_k. Despite the apparently simple relation (31) between U and φ, the result is not that straightforward. Indeed, let us assume that the charge distribution has a finite spatial extent, so that at large distances from it (formally, at r = ∞) the electric field tends to zero, and the electrostatic potential tends to a constant. Selecting this constant, for convenience, to equal zero, we may calculate U as a sum of the energy increments ΔU_k created by bringing the charges, one by one, from infinity to their final positions r_k; see Fig. 6.²¹ According to the integral form of Eq. (32), such a contribution is


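$$\Delta U_k = -\int_\infty^{\mathbf r_k}\mathbf F(\mathbf r)\cdot d\mathbf r = -q_k\int_\infty^{\mathbf r_k}\mathbf E(\mathbf r)\cdot d\mathbf r \equiv q_k\phi(\mathbf r_k), \qquad (1.52)$$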


21 Indeed, by the very definition of the potential energy of a system, it should not depend on the way we arrive at its final configuration.
where E(r) is the total electric field, and φ(r) is the total electrostatic potential during this process, excluding the field created by the very charge q_k that is being moved.


Fig. 1.6. Deriving Eqs. (55) and (60) for potential energies of a system of several point charges. (Figure labels: external charges creating E_ext(r); charge q_k brought in from ∞ to r_k; charges with k' < k already in place; the system of charges under analysis.)


This expression shows that the increment ΔU_k, and hence the total potential energy U, depend on the source of the electric field E. If the field is dominated by an external field E_ext, induced by some external charges that are not a part of the charge configuration under our analysis (whose energy we are calculating; see Fig. 6), then the spatial distribution φ(r) is determined by this field, i.e. does not depend on how many charges we have already brought in, so that Eq. (52) is reduced to


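$$\Delta U_k = q_k\phi_{\rm ext}(\mathbf r_k), \quad\text{where}\quad \phi_{\rm ext}(\mathbf r) \equiv -\int_\infty^{\mathbf r}\mathbf E_{\rm ext}(\mathbf r')\cdot d\mathbf r'. \qquad (1.53)$$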


Summing up these contributions, we get what is called the charge system’s energy in the external field:²²


$$U_{\rm ext} \equiv \sum_k \Delta U_k = \sum_k q_k\phi_{\rm ext}(\mathbf r_k). \qquad (1.54)$$


Now repeating the argumentation that has led us to Eq. (9), we see that for a continuously distributed 
charge, this sum turns into an integral: 


$$U_{\rm ext} = \int \rho(\mathbf r)\,\phi_{\rm ext}(\mathbf r)\,d^3r. \qquad (1.55)$$


(As was discussed above, using the delta-functional representation of point charges, we may always 
return from here to Eq. (54), so that Eq. (55) may be considered as a final, universal result.) 


The result is different in the opposite limit when the electric field E(r) is created only by the very charges whose energy we are calculating. In this case, φ(r_k) in Eq. (52) is the potential created only by the charges with numbers k' = 1, 2, ..., (k − 1) that are already in place when the kth charge is moved in (in Fig. 6, the charges inside the dashed boundary), and we may use the linear superposition principle to write


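$$\Delta U_k = q_k\sum_{k'<k}\phi_{k'}(\mathbf r_k), \quad\text{so that}\quad U = \sum_k \Delta U_k = \sum_{\substack{k,k'\\(k'<k)}} q_k\,\phi_{k'}(\mathbf r_k). \qquad (1.56)$$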


This result is so important that it is worth rewriting in several other forms. First, we may use Eq. (35) to represent Eq. (56) in a more symmetric form:


22 An alternative, perhaps more accurate term for U_ext is the energy of the system’s interaction with the external field.
$$U = \frac{1}{4\pi\varepsilon_0}\sum_{\substack{k,k'\\(k'<k)}}\frac{q_k q_{k'}}{|\mathbf r_k - \mathbf r_{k'}|}. \qquad (1.57)$$


The expression under this sum is evidently symmetric with respect to the index swap, so that it may be 
extended into a different form, 


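$$U = \frac{1}{4\pi\varepsilon_0}\,\frac{1}{2}\sum_{\substack{k,k'\\(k'\neq k)}}\frac{q_k q_{k'}}{|\mathbf r_k - \mathbf r_{k'}|}, \qquad (1.58)$$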


where the interaction between each couple of charges is described by two equal terms under the sum, and the front coefficient ½ is used to compensate for this double-counting. The convenience of the last form is that it may be readily generalized to the continuous case:


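$$U = \frac{1}{4\pi\varepsilon_0}\,\frac{1}{2}\int d^3r\int d^3r'\,\frac{\rho(\mathbf r)\,\rho(\mathbf r')}{|\mathbf r - \mathbf r'|}. \qquad (1.59)$$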


(As before, in this case the restriction expressed in the discrete charge case as k' ≠ k is not important, because if the charge density is a continuous function, the integral (59) does not diverge at point r = r'.)


To represent this result in one more form, let us notice that according to Eq. (38), the inner integral over r' in Eq. (59), divided by 4πε0, is just the full electrostatic potential at point r, and hence


$$U = \frac{1}{2}\int \rho(\mathbf r)\,\phi(\mathbf r)\,d^3r. \qquad (1.60)$$


For the discrete charge case, this result is 
$$U = \frac{1}{2}\sum_k q_k\,\phi(\mathbf r_k), \qquad (1.61)$$


but here it is important to remember that the “full” potential’s value φ(r_k) should exclude the (infinite) contribution from the point charge k itself. Comparing the last two formulas with Eqs. (54) and (55), we see that the electrostatic energy of charge interaction within the system, as expressed via the charge-by-potential product, contains an extra factor ½ in comparison with the energy of charge interaction with a fixed (“external”) field. This is the result of the fact that in the case of mutual interaction of the charges, the electric field E in the basic Eq. (52) is proportional to the charge’s magnitude, rather than constant.²³
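The equivalence of the pair sum (57) and the charge-by-potential form (61), including the exclusion of each charge’s own contribution, is easy to check numerically. In this Python sketch, the constant K stands for 1/4πε0 and is set to 1, and the charge values and positions are arbitrary illustrative numbers.

```python
import itertools
import math

K = 1.0  # stands for 1/(4*pi*eps0); set to 1 for simplicity

def energy_pairs(charges):
    """Eq. (57): each pair (k, k') counted once."""
    return sum(K * qa * qb / math.dist(ra, rb)
               for (qa, ra), (qb, rb) in itertools.combinations(charges, 2))

def energy_half_q_phi(charges):
    """Eq. (61): (1/2) sum_k q_k phi(r_k), with phi(r_k) excluding charge k itself."""
    U = 0.0
    for k, (q_k, r_k) in enumerate(charges):
        phi_k = sum(K * q / math.dist(r_k, r)
                    for j, (q, r) in enumerate(charges) if j != k)
        U += 0.5 * q_k * phi_k
    return U

charges = [(1.0, (0.0, 0.0, 0.0)), (-2.0, (1.0, 0.0, 0.0)),
           (0.5, (0.0, 2.0, 0.0)), (1.5, (1.0, 1.0, 1.0))]
print(energy_pairs(charges), energy_half_q_phi(charges))
```

Reversing the order of the list leaves `energy_pairs` unchanged, in line with the remark that the energy does not depend on the order in which the charges are brought in.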


Now we are ready to address an important conceptual question: can we locate this interaction 
energy in space? This task may seem trivial: Eqs. (58)-(61) seem to imply that non-zero contributions to 
U come only from the regions where the electric charges are located. However, one of the most beautiful 
features of physics is that sometimes completely different interpretations of the same mathematical 
result are possible. To get an alternative view of our current result, let us write Eq. (60) for a volume V 
so large that the electric field on the limiting surface S is negligible, and plug into it the charge density 
expressed from the Poisson equation (41): 


23 The nature of this additional factor ½ is absolutely the same as in the well-known formula U = (½)κx² for the potential energy of an elastic spring providing the returning force F = −κx, proportional to its displacement x from the equilibrium position.
$$U = -\frac{\varepsilon_0}{2}\int_V \phi\,\nabla^2\phi\,d^3r. \qquad (1.62)$$


This expression may be integrated by parts as²⁴
$$U = -\frac{\varepsilon_0}{2}\left[\oint_S \phi\,(\nabla\phi)_n\,d^2r - \int_V (\nabla\phi)^2\,d^3r\right]. \qquad (1.63)$$


According to our condition of negligible field E = −∇φ at the surface, the first integral vanishes, and we get a very important formula


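$$U = \frac{\varepsilon_0}{2}\int (\nabla\phi)^2\,d^3r = \frac{\varepsilon_0}{2}\int E^2\,d^3r. \qquad (1.64)$$
This result, represented in the following equivalent form:²⁵
$$U = \int u(\mathbf r)\,d^3r, \quad\text{with}\quad u(\mathbf r) \equiv \frac{\varepsilon_0}{2}E^2(\mathbf r), \qquad (1.65)$$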


certainly invites an interpretation very much different from Eq. (60): it is natural to consider u(r) as the spatial density of the electric field energy, which is continuously distributed over all the space where the field exists, rather than just over the part of it where the charges are located.


Let us have a look at how these two alternative pictures work for our testbed problem, a uniformly charged sphere. If we start with Eq. (60), we may limit the integration to the sphere’s volume (0 ≤ r ≤ R), where ρ ≠ 0. Using Eq. (51), and the spherical symmetry of the problem (giving d³r = 4πr²dr), we get

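$$U = \frac{1}{2}\int_0^R \rho\,\phi(r)\,4\pi r^2\,dr = \frac{1}{2}\,4\pi\rho\,\frac{Q}{4\pi\varepsilon_0}\int_0^R \frac{3R^2 - r^2}{2R^3}\,r^2\,dr = \frac{6}{5}\,\frac{Q^2}{4\pi\varepsilon_0\,2R}. \qquad (1.66)$$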


On the other hand, if we use Eq. (65), we need to integrate the energy density everywhere, i.e. both 
inside and outside of the sphere: 


$$U = \frac{\varepsilon_0}{2}\,4\pi\left[\int_0^R E^2(r)\,r^2\,dr + \int_R^\infty E^2(r)\,r^2\,dr\right]. \qquad (1.67)$$
Using Eqs. (19) and (22) for, respectively, the external and internal regions, we get 
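$$U = \frac{\varepsilon_0}{2}\,4\pi\left(\frac{Q}{4\pi\varepsilon_0}\right)^2\left[\int_0^R\left(\frac{r}{R^3}\right)^2 r^2\,dr + \int_R^\infty\left(\frac{1}{r^2}\right)^2 r^2\,dr\right] = \left(\frac{1}{5} + 1\right)\frac{Q^2}{4\pi\varepsilon_0\,2R}. \qquad (1.68)$$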


This is (fortunately :-) the same answer as given by Eq. (66), but to some extent, Eq. (68) is more informative because it shows how exactly the electric field’s energy is distributed between the interior and exterior of the charged sphere.²⁶
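Both routes to the energy of the uniformly charged sphere can also be followed numerically. The Python sketch below (hypothetical helper names; units with Q = R = ε0 = 1 by default) integrates Eq. (66) and the two parts of Eq. (68) with a simple midpoint rule. To handle the infinite upper limit, the exterior integral is mapped to the variable u = R/r, a change of variables chosen here for convenience that makes its integrand constant.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule for the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def sphere_energies(Q=1.0, R=1.0, eps0=1.0, n=2000):
    """Return (U via Eq. (66), interior and exterior field energies of Eq. (68))."""
    k = Q / (4.0 * math.pi * eps0)
    rho = 3.0 * Q / (4.0 * math.pi * R ** 3)
    phi_in = lambda r: k * (3.0 * R ** 2 - r ** 2) / (2.0 * R ** 3)   # Eq. (51), r <= R
    E_in = lambda r: k * r / R ** 3                                    # Eq. (22)
    E_out = lambda r: k / r ** 2                                       # Eq. (19)
    # Eq. (66): (1/2) int rho*phi d^3r, only over the charged volume
    U_charges = 0.5 * midpoint(lambda r: rho * phi_in(r) * 4.0 * math.pi * r ** 2, 0.0, R, n)
    # Eq. (68), interior part: (eps0/2) int E^2 d^3r for 0 < r < R
    U_in = 0.5 * eps0 * midpoint(lambda r: E_in(r) ** 2 * 4.0 * math.pi * r ** 2, 0.0, R, n)
    # Exterior part, with r = R/u (dr = -R du/u^2) mapping R < r < inf onto 0 < u < 1
    U_out = 0.5 * eps0 * midpoint(
        lambda u: E_out(R / u) ** 2 * 4.0 * math.pi * (R / u) ** 2 * R / u ** 2, 0.0, 1.0, n)
    return U_charges, U_in, U_out

U_charges, U_in, U_out = sphere_energies()
U_exact = 0.6 / (4.0 * math.pi)     # (3/5) Q^2/(4 pi eps0 R) for Q = R = eps0 = 1
print(U_charges, U_in + U_out, U_exact)
print(U_out / U_in)
```

With these defaults, U_out/U_in comes out close to 5, matching the ratio of the terms 1 and 1/5 in Eq. (68).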


24 This transformation follows from the divergence theorem MA (12.2) applied to the vector function f = φ∇φ, taking into account the differentiation rule MA Eq. (11.4a): ∇·(φ∇φ) = (∇φ)·(∇φ) + φ∇·(∇φ) = (∇φ)² + φ∇²φ.

25 In the Gaussian units, the standard replacement ε0 → 1/4π turns the last of Eqs. (65) into u(r) = E²/8π.

26 Note that U → ∞ at R → 0. Such a divergence appears at the application of Eq. (65) to any point charge. Since it does not affect the force acting on the charge, the divergence does not create any technical difficulty for analysis
We see that, as we could expect, within the realm of electrostatics, Eqs. (60) and (65) are equivalent. However, when we examine electrodynamics (in Chapter 6 and beyond), we will see that the latter equation is more general, and that it is more adequate to associate the electric energy with the field itself rather than with its sources, in our current case the electric charges.


Finally, let us calculate the potential energy of a system of charges in the general case when both 
the internal interaction of the charges and their interaction with an external field are important. One 
might fancy that such a calculation should be very hard since, in both ultimate limits, when one of these 
interactions dominates, we have got different results. However, once again we get help from the 
almighty linear superposition principle: in the general case, for the total electric field we may write 


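$$\mathbf E(\mathbf r) = \mathbf E_{\rm ext}(\mathbf r) + \mathbf E_{\rm int}(\mathbf r), \qquad \phi(\mathbf r) = \phi_{\rm ext}(\mathbf r) + \phi_{\rm int}(\mathbf r), \qquad (1.69)$$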


where the index “int” now marks the field induced by the charge system under analysis, i.e. the variables participating (without indices) in Eqs. (56)-(65). Now let us imagine that our system is being built up in the following way: first, the charges are brought together at E_ext = 0, giving the potential energy U_int expressed by Eq. (60), and then E_ext is slowly increased. Evidently, the energy contribution from the latter process cannot depend on the internal interaction of the charges, and hence may be expressed in the form (55). As a result, the total potential energy²⁷ is the sum of these two components:


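$$U = U_{\rm int} + U_{\rm ext} = \frac{1}{2}\int \rho(\mathbf r)\,\phi_{\rm int}(\mathbf r)\,d^3r + \int \rho(\mathbf r)\,\phi_{\rm ext}(\mathbf r)\,d^3r. \qquad (1.70)$$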


Now making the transition from the potentials to the fields, absolutely similar to that performed in Eqs. 
(62)-(65), we may rewrite this expression as 


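$$U = \int u(\mathbf r)\,d^3r, \quad\text{with}\quad u(\mathbf r) = \frac{\varepsilon_0}{2}\left[E_{\rm int}^2(\mathbf r) + 2\,\mathbf E_{\rm ext}(\mathbf r)\cdot\mathbf E_{\rm int}(\mathbf r)\right]. \qquad (1.71)$$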

One might think that this result, more general than Eq. (65) and perhaps less familiar to the reader, is something entirely new; however, it is not. Indeed, let us add E_ext²(r) to, and subtract it from, the sum in the brackets, and use Eq. (69) for the total electric field E(r); then Eq. (71) takes the form

$$U = \frac{\varepsilon_0}{2}\int E^2(\mathbf r)\,d^3r - \frac{\varepsilon_0}{2}\int E_{\rm ext}^2(\mathbf r)\,d^3r. \qquad (1.72)$$

Hence, in the most important case when we are using the potential energy to analyze the statics and 
dynamics of a system of charges in a fixed external field, i.e. when the second term on the right-hand 
side of Eq. (72) may be considered as a constant, we may still use for U an expression similar to the 
familiar Eq. (65), but with the field E(r) being the sum (69) of the internal and external fields. 
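The step from Eq. (71) to Eq. (72) is a one-line completion of the square; written out, it uses only Eq. (69):

```latex
\frac{\varepsilon_0}{2}\left[E_{\rm int}^2 + 2\,\mathbf E_{\rm ext}\cdot\mathbf E_{\rm int}\right]
  = \frac{\varepsilon_0}{2}\left[\left(\mathbf E_{\rm int} + \mathbf E_{\rm ext}\right)^2 - E_{\rm ext}^2\right]
  = \frac{\varepsilon_0}{2}\,E^2 - \frac{\varepsilon_0}{2}\,E_{\rm ext}^2 .
```

Integrating this energy density over all space gives Eq. (72).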


Let us see how this works in a very simple situation. A uniform external electric field E_ext is applied normally to a very broad, plane layer that contains a very large and equal number of free electric charges of both signs; see Fig. 7. What is the equilibrium distribution of the charges over the layer?


of charge statics or non-relativistic dynamics, but it points to a possible conceptual problem of classical electrodynamics as a whole in describing point charges. This issue will be discussed at the very end of the course (Sec. 10.6).

27 This total U (or rather its part dependent on our system of charges) is sometimes called the Gibbs potential 
energy of the system. (I will discuss this notion in detail in Sec. 3.5.) 
Fig. 1.7. A simple model of the electric field screening in a conductor. Here (and in all figures below) the red and blue colors are used to denote the opposite charge signs.


Since any area-uniform distribution of the charge inside the layer does not affect the field (and hence its energy) outside it, and the equilibrium distribution has to minimize the total potential energy of the system, Eq. (72) immediately gives the answer: the distribution should provide E = E_int + E_ext = 0 inside the whole layer. This effect is called electric field screening. The only way to ensure this equality is to have enough free charges of opposite signs residing on the layer’s surfaces to induce a uniform field E_int = −E_ext, exactly compensating the external field at each point inside the layer; see Fig. 7. According to Eq. (24), the areal densities of these surface charges should equal ±σ, with σ = ε0E_ext. This is a rudimentary but reasonable model of the polarization of conductors, to be discussed in detail in the next chapter.
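The screening picture of Fig. 7 can be checked with the standard field of an infinitely thin, uniformly charged plane, σ/2ε0 on each side and directed away from a positive sheet. The Python sketch below is a minimal 1D illustration under these assumptions: E_ext points along +z, the layer occupies 0 < z < d, and, in the hypothetical helper `layer_field`, the surface charges are −σ at z = 0 and +σ at z = d, with σ = ε0E_ext.

```python
def layer_field(z, E_ext, d, eps0=1.0):
    """E_z at coordinate z for the layer 0 < z < d of Fig. 7 (hypothetical helper).

    Assumes E_ext along +z and surface charges -sigma at z = 0 and +sigma at
    z = d, with sigma = eps0*E_ext; each thin sheet of areal density s at z0
    contributes s*sign(z - z0)/(2*eps0).
    """
    sigma = eps0 * E_ext

    def sheet(s, z0):
        return s * (1.0 if z > z0 else -1.0) / (2.0 * eps0)

    return E_ext + sheet(-sigma, 0.0) + sheet(+sigma, d)

E_ext, d = 3.0, 1.0
print([layer_field(z, E_ext, d) for z in (-1.0, 0.25, 0.75, 2.0)])
```

The field vanishes inside the layer and equals E_ext on both sides of it, so the surface charges indeed leave the exterior field untouched. In SI units, for example, E_ext = 10⁶ V/m requires σ = ε0E_ext ≈ 8.85 × 10⁻⁶ C/m².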


1.4. Exercise problems 


1.1. Calculate the electric field of a thin, long, straight filament, electrically charged with a constant linear density λ, using two approaches:


(i) directly from the Coulomb law, and
(ii) using the Gauss law.


1.2. Two thin, straight parallel filaments, separated by distance ρ, carry equal and opposite uniformly distributed charges with linear density λ; see the figure on the right (filament charge densities −λ and +λ). Calculate the force (per unit length) of the Coulomb interaction of the wires. Compare its dependence on ρ with the Coulomb law for the force ∝ ρ⁻² between two point charges, and interpret their difference.


1.3. Calculate the electric field of the following spherically-symmetric charge distribution: ρ(r) = ρ0 exp{−λr}.


1.4. A sphere of radius R, whose volume had been charged with a constant density ρ, is split with a very narrow planar gap passing through its center. Calculate the force of the mutual electrostatic repulsion of the resulting two hemispheres.


1.5. A thin spherical shell of radius R, which had been charged with a constant areal density σ, is split into two equal halves with a very narrow, planar cut passing through the sphere’s center. Calculate the force of electrostatic repulsion between the resulting hemispheric shells, and compare the result with that of the previous problem.


1.6. Calculate the distribution of the electrostatic potential created by a straight, thin filament of a finite length 2l, charged with a constant linear density λ, and explore the result in the limits of very small and very large distances from the filament.
1.7. A thin plane sheet, perhaps of an irregular shape, carries an electric charge with a constant areal density σ.


(i) Express the electric field’s component normal to the plane, at a certain distance from it, via the solid angle Ω at which the sheet is visible from the observation point.


(ii) Use the result to calculate the field in the center of a cube, with one face charged with a constant density σ.


1.8. Can one create, in an extended region of space, electrostatic fields with the Cartesian 
components proportional to the following products of Cartesian coordinates {x, y, z}: 


(i) {yz, xz, xy},
(ii) {xy, xy, yz}? 


1.9. Distant sources have been used to create different uniform electric fields in two semi-spaces:

$$\mathbf E(\mathbf r)\big|_{r\gg R} = \mathbf n_z\times\begin{cases}E_1, & \text{at } z<0,\\ E_2, & \text{at } z>0,\end{cases}$$

everywhere except for a transitional region of scale R near the origin, where the field is perturbed but still axially-symmetric. (As will be discussed in the next chapter, this may be done, for example, using a thin conducting membrane with a round hole of radius R in it; see the figure on the right.) Prove that such a field may serve as an electrostatic lens for charged particles flying along the z-axis, at distances ρ << R from it, and calculate the focal distance f of the lens. Spell out the conditions of validity of your result.


1.10. Eight equal point charges q are located at the corners of a cube of side a. Calculate all Cartesian components E_j of the electric field, and their spatial derivatives ∂E_j/∂r_j', at the cube’s center, where r_j are the Cartesian coordinates oriented along the cube’s sides; see the figure on the right. Are all your results valid for the center of a plane square, with four equal charges at its corners?


1.11. By a direct calculation, find the average electric potential of a spherical surface of radius R, 
created by a point charge q located at a distance r > R from the sphere’s center. Use the result to prove 
the following general mean value theorem: the electric potential at any point is always equal to its 


average value on any spherical surface with the center at that point, and containing no electric charges 
inside it. 


1.12. Two similar thin, circular, coaxial disks of radius R, separated by distance 2d, are uniformly charged with equal and opposite areal densities ±σ; see the figure on the right. Calculate and sketch the distribution of the electrostatic potential and the electric field of the disks along their common axis.
1.13. The electrostatic potential created by some electric charge distribution is


g(r) = ts + Je0\- a 


where C and r0 are constants, and r = |r| is the distance from the origin. Calculate the charge distribution in space.


1.14. A thin flat sheet, cut in the form of a rectangle of size a×b, is electrically charged with a constant areal density σ. Without an explicit calculation of the spatial distribution φ(r) of the electrostatic potential induced by this charge, find the ratio of its values at the center and at the corners of the rectangle.


Hint: Consider partitioning the rectangle into several similar parts and using the linear superposition principle.


1.15. Calculate the electrostatic energy per unit area of the system of two thin, parallel planes with equal and opposite charges of a constant areal density σ, separated by distance d.


1.16. The system analyzed in the previous problem (two thin, parallel, oppositely charged planes) is now placed into an external, uniform, normal electric field E_ext = σ/ε0; see the figure on the right (planes with charge densities +σ and −σ). Find the force (per unit area) acting on each plane, by two methods:


(i) directly from the electric field distribution, and
(ii) from the potential energy of the system.


1.17. Explore the relationship between the Laplace equation (42) and the condition of the 
minimum of the electrostatic field energy (65). 


1.18. Prove the following reciprocity theorem of electrostatics:²⁸ if two spatially-confined charge distributions ρ1(r) and ρ2(r) create respective distributions φ1(r) and φ2(r) of the electrostatic potential, then

$$\int \rho_1(\mathbf r)\,\phi_2(\mathbf r)\,d^3r = \int \rho_2(\mathbf r)\,\phi_1(\mathbf r)\,d^3r.$$

Hint: Consider the integral ∫E1·E2 d³r.


1.19. Calculate the energy of electrostatic interaction of two spheres, of radii R1 and R2, each with a spherically-symmetric charge distribution, separated by distance d > R1 + R2.


1.20. Calculate the electrostatic energy U of a (generally, thick) spherical shell, with charge Q uniformly distributed through its volume; see the figure on the right. Analyze and interpret the dependence of U on the inner cavity’s radius R1, at fixed Q and R2.


28 This is only the simplest one of several reciprocity theorems in electromagnetism — see, e.g., Sec. 6.8 below. 






Chapter 2. Charges and Conductors 


This chapter starts our discussion of the very common situations when the electric charge distribution in space is not known a priori, but rather should be calculated in a self-consistent way together with the electric field it creates. The simplest situations of this kind involve conductors and lead to the so-called boundary problems, in which the partial differential equations describing the field distribution have to be solved with appropriate boundary conditions. Such problems are also typical for other parts of electrodynamics (and indeed for other fields of physics as well), so that, following tradition, I will use this chapter’s material as a playground for a discussion of various methods of boundary problem solution, and of the special functions most frequently encountered along the way.


2.1. Polarization and screening 


The basic principles of electrostatics outlined in Chapter 1 provide a conceptually complete solution of the problem of finding the electrostatic field (and hence the Coulomb forces) induced by electric charges distributed over space with some density $\rho(\mathbf{r})$. However, in most practical situations, this function is not known but should be found self-consistently with the field. For example, if a sample of relatively dense material is placed into an external electric field, it is typically polarized, i.e. acquires some local charges of its own, which contribute to the total electric field $\mathbf{E}(\mathbf{r})$ inside, and even outside it (see Fig. 1a).


Fig. 2.1. Two typical electrostatic situations involving conductors: (a) polarization by an external field, and (b) re-distribution of the conductor's own charge over its surface (schematically). Here and below, the red and blue points denote charges of opposite signs.


The full solution of such problems should satisfy not only the fundamental Eq. (1.7) but also the so-called constitutive relations between the macroscopic variables describing the sample's material. Due to the atomic character of real materials, such relations may be very involved. In this part of my series, I will have time to address these relations, for various materials, only rather superficially,¹ focusing on their simple approximations. Fortunately, in most practical cases such approximations work very well.

1 A more detailed discussion of the electrostatic field screening may be found, e.g., in SM Sec. 6.4. (Alternatively, see either Sec. 13.5 of J. Hook and H. Hall, Solid State Physics, 2nd ed., Wiley, 1991; or Chapter 17 of N. Ashcroft and N. Mermin, Solid State Physics, Brooks Cole, 1976.)


In particular, for the polarization of good conductors, a very reasonable approximation is given by the so-called macroscopic model, in which the free charges in the conductor are treated as a charged continuum that is free to move under the effect of the force $\mathbf{F} = q\mathbf{E}$ exerted by the macroscopic electric field $\mathbf{E}$, i.e. the field averaged over space on the atomic scale (see also the discussion at the end of Sec. 1.1). In electrostatics (which excludes the case of dc currents, to be discussed in Chapter 4 below), there should be no such motion, so that everywhere inside the conductor the macroscopic electric field should vanish:

$$\mathbf{E} = 0. \tag{2.1a}$$


This is the electric field screening² effect, meaning, in particular, that conductors' polarization in an external electric field has the extreme form shown (rather schematically) in Fig. 1a, with the field of the induced surface charges completely compensating the external field in the conductor's bulk. Note that Eq. (1a) may be rewritten in another, frequently more convenient form:

$$\phi = \text{const}, \tag{2.1b}$$

where $\phi$ is the macroscopic electrostatic potential related to the macroscopic field by Eq. (1.33).³ (If a problem includes several unconnected conductors, the constant in Eq. (1b) may be specific for each of them.)


Now let us examine what we can say about the electric field in free space just outside a conductor, within the same macroscopic model. At close proximity, any smooth surface (in our current case, that of a conductor) looks planar. Let us integrate Eq. (1.28) over the area limited by a narrow ($d \ll l$) rectangular loop $C$ encircling a part of such a plane conductor's surface (see the dashed line in Fig. 2a), and apply to the electric field vector $\mathbf{E}$ the well-known vector algebra equality, the Stokes theorem:⁴

$$\int_S (\nabla\times\mathbf{E})_n\,d^2r = \oint_C \mathbf{E}\cdot d\mathbf{r}, \tag{2.2}$$

where $S$ is any surface limited by the contour $C$.

Fig. 2.2. (a) The surface charge layer at a conductor's surface, and (b) the electric field lines and equipotential surfaces near it.


In our current case, the contour is dominated by two straight lines of length $l$, so that if $l$ is much smaller than the characteristic spatial scale of the field's changes but much larger than the interatomic distances, the right-hand side of Eq. (2) may be well approximated as $[(E_\tau)_{\rm in} - (E_\tau)_{\rm out}]\,l$, where $E_\tau$ is the tangential component of the corresponding macroscopic field, parallel to the surface. On the other hand, according to Eq. (1.28), the left-hand side of Eq. (2) equals zero. Hence, the macroscopic field's component $E_\tau$ should be continuous at the surface, and to satisfy Eq. (1a) inside the conductor, the component has to vanish immediately outside it: $(E_\tau)_{\rm out} = 0$. This means that the electrostatic potential immediately outside of a conducting surface cannot change along it. In other words, the equipotential surfaces outside a conductor should "lean" to the conductor's surface, with their potential values approaching the constant potential of the conductor (see Fig. 2b).

2 This term, used for the electric field, should not be confused with shielding, the term used for the description of the magnetic field's reduction by magnetic materials (see Chapter 5 below).

3 Since averaging of a function over space is a linear operation, any linear relation between genuine (microscopic) variables, including Eq. (1.33), is also valid for the corresponding macroscopic variables.

4 See, e.g., MA Eq. (12.1).


So, the electrostatic field just outside any conductor has to be normal to its surface. To find this normal field, we may apply the universal relation (1.24) to our macroscopic field $\mathbf{E}$. Since in our current case $E_n = 0$ inside the conductor, we get

$$E_n = \frac{\sigma}{\varepsilon_0}, \tag{2.3}$$

where $\sigma$ is the macroscopic areal density of the conductor's surface charge. Note that in deriving this universal relation between the normal component of the field and the surface charge density, we have not used any cause-vs-effect arguments, so that Eq. (3) is valid regardless of whether the surface charge is induced by an externally applied field (as in the case of the conductor's polarization, shown in Fig. 1a), or the electric field is induced by the electric charge placed on the conductor and then self-redistributed over its surface (Fig. 1b), or it is some combination of both effects.


Before starting to use the macroscopic model for the solution of particular problems of 
electrostatics, let me use the balance of this section to briefly discuss its limitations. (The reader in a 
rush may skip this discussion and proceed to Sec. 2; however, I believe that every educated physicist has 
to understand when this model works, and when it does not.) 


Since the argumentation which has led us to Eq. (1.24), and hence to Eq. (3), is valid for any thickness $d$ of the Gauss pillbox, within the macroscopic model the whole surface charge is located within an infinitely thin surface layer. This is of course physically impossible: for one, it would require an infinite volumic density $\rho$ of the charge. In reality, the charged layer (and hence the region of the electric field's crossover from the finite value (3) to zero) has a nonzero thickness $\lambda$. At least three effects contribute to $\lambda$.


(i) Atomic structure of matter. Within each atom, and frequently between the adjacent atoms as well, the genuine ("microscopic") electric field is highly nonuniform. Thus, as was already stated above, Eq. (1) is valid only for the macroscopic field, i.e. the field averaged over distances of the order of the atomic size scale $a_0 \sim 10^{-10}$ m,⁵ and cannot be applied to the field changes on that scale. As a result, the surface layer of charges cannot be much thinner than $a_0$.


(ii) Thermal excitation. According to Eq. (1.9), in the whole field-free bulk of a conductor, the net charge density, $\rho = e(n - n_e)$,⁶ has to vanish, so that the numbers of protons in atomic nuclei ($n$) and electrons ($n_e$) per unit volume have to be balanced. However, if an external electric field penetrates a conductor, free electrons can shift in or out of its affected part, depending on the field's contribution to their potential energy, $\Delta U = q_e\phi = -e\phi$. (Here the arbitrary constant in $\phi$ is chosen to give $\phi = 0$ well inside the conductor.) In classical statistics, this change is described by the Boltzmann distribution:⁷

$$n_e(\mathbf{r}) = n\exp\left\{-\frac{\Delta U(\mathbf{r})}{k_BT}\right\}, \tag{2.4}$$

where $T$ is the absolute temperature in kelvins (K), and $k_B \approx 1.38\times10^{-23}$ J/K is the Boltzmann constant.

5 This scale originates from the quantum-mechanical effects of electron motion, characterized by the Bohr radius $r_B \equiv \hbar^2/(m_e e^2/4\pi\varepsilon_0) \approx 0.53\times10^{-10}$ m (see, e.g., QM Eq. (1.10)). It also defines the scale $E_0 \equiv e/4\pi\varepsilon_0 r_B^2 \sim 10^{11}$ SI units (V/m) of the microscopic electric fields inside atoms. (Please note how large these fields are.)

6 In this series, $e$ denotes the fundamental charge, $e \approx 1.6\times10^{-19}$ C $> 0$, so that the electron's charge equals $(-e)$.

7 See, e.g., SM Sec. 3.1.


As a result, the net charge density is

$$\rho(\mathbf{r}) = e\left[n - n_e(\mathbf{r})\right] = en\left[1 - \exp\left\{\frac{e\phi(\mathbf{r})}{k_BT}\right\}\right]. \tag{2.5}$$

The penetrating electric field polarizes the atoms as well. As will be discussed in the next chapter, such polarization results in the reduction of the electric field by a material-specific dimensionless factor $\kappa$ (larger, but typically not much larger, than 1), called the dielectric constant. As a result, the Poisson equation (1.41) takes the so-called Poisson-Boltzmann form,⁸

$$\frac{d^2\phi}{dz^2} = -\frac{\rho}{\kappa\varepsilon_0} = \frac{en}{\kappa\varepsilon_0}\left[\exp\left\{\frac{e\phi}{k_BT}\right\} - 1\right], \tag{2.6}$$

where we have taken advantage of the 1D geometry of the system to simplify the Laplace operator, with the $z$-axis normal to the surface.


Even with this simplification, Eq. (6) is a nonlinear differential equation allowing an analytical but rather bulky solution. Since our current goal is just to estimate the field penetration depth $\lambda$, let us simplify the equation further by considering the low-field limit: $e|\phi| \sim e|E|\lambda \ll k_BT$. In this limit, we may expand the exponent into the Taylor series and keep only the two leading terms (of which the first one cancels with the following unity). As a result, Eq. (6) becomes linear,

$$\frac{d^2\phi}{dz^2} = \frac{en}{\kappa\varepsilon_0}\,\frac{e\phi}{k_BT}, \quad\text{i.e.}\quad \frac{d^2\phi}{dz^2} = \frac{1}{\lambda^2}\,\phi, \tag{2.7}$$

where the constant $\lambda$, in this case, is called the Debye (or "Debye-Hückel") screening length $\lambda_D$:

$$\lambda_D^2 \equiv \frac{\kappa\varepsilon_0 k_BT}{e^2 n}. \tag{2.8}$$


As the reader certainly knows, Eq. (7) describes an exponential decrease of the electric potential, with the characteristic length $\lambda_D$: $\phi \propto \exp\{-z/\lambda_D\}$, where the $z$-axis is directed into the conductor. Plugging the involved fundamental constants into Eq. (8), we get the following estimate: $\lambda_D[\text{m}] \approx 70\times(\kappa\,T[\text{K}]/n[\text{m}^{-3}])^{1/2}$. According to this formula, in semiconductors at room temperature, the Debye length may be rather substantial. For example, in silicon ($\kappa \approx 12$) doped to the free charge carrier concentration $n = 3\times10^{18}$ cm$^{-3}$ (the value typical for modern integrated circuits),⁹ $\lambda_D \approx 2$ nm, still well above the atomic size scale $a_0$, thus justifying the estimate. However, for typical good metals ($n \sim 10^{29}$ m$^{-3}$, $\kappa \sim 10$) the same formula gives $\lambda_D \sim 10^{-11}$ m, less than $a_0$. In this case, Eq. (8) should not be taken literally, because it is based on the assumption of a continuous charge distribution.

8 This equation and/or its straightforward generalization to the case of charged particles (ions) of several kinds is also called (especially in the theories of electrolytes and plasmas) the Debye-Hückel equation.

9 There is a good reason for making an estimate of $\lambda_D$ for this case: the electric field created by the gate electrode of a field-effect transistor, penetrating into doped silicon by a depth $\sim\lambda_D$, controls the electric current in this most important electronic device, on whose back all our information technology rides. Because of that, $\lambda_D$ sets the possible scale of semiconductor circuit shrinking, which is the basis of the well-known Moore's law. (Practically, the scale is determined by integrated circuit patterning techniques, and Eq. (8) may be used to find the proper charge carrier density $n$ and hence the necessary level of silicon doping; see, e.g., SM Sec. 6.4.)
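As a quick numerical check of these estimates, Eq. (8) may be evaluated directly. The sketch below is a minimal Python illustration that assumes hard-coded CODATA values of $\varepsilon_0$, $k_B$ and $e$ (to avoid external dependencies); the function name `debye_length` is hypothetical, introduced only for this example. It should reproduce the prefactor of about 70 in the quick formula, and the values quoted above for doped silicon (about 2 nm) and for a typical metal (about $10^{-11}$ m).

```python
import math

# CODATA values, hard-coded so that the sketch needs only the standard library
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23       # Boltzmann constant, J/K
E = 1.602176634e-19      # fundamental charge, C


def debye_length(kappa: float, temperature: float, n: float) -> float:
    """Debye screening length of Eq. (2.8), in meters.

    kappa: dielectric constant; temperature: T in K; n: carrier density in m^-3.
    """
    return math.sqrt(kappa * EPS0 * K_B * temperature / (E**2 * n))


# Prefactor of the quick estimate lambda_D[m] ~ 70 * (kappa*T[K]/n[m^-3])**0.5
prefactor = math.sqrt(EPS0 * K_B / E**2)

# Doped silicon: kappa ~ 12, n = 3e18 cm^-3 = 3e24 m^-3, T = 300 K
lam_si = debye_length(12, 300, 3e24)
# Typical good metal: kappa ~ 10, n ~ 1e29 m^-3 (Eq. (2.8) taken literally)
lam_metal = debye_length(10, 300, 1e29)

print(f"prefactor ~ {prefactor:.1f}")
print(f"silicon: {lam_si * 1e9:.2f} nm; metal: {lam_metal:.1e} m")
```

For the metal, the number is shown only to confirm the order of magnitude; as explained in the text, it should not be taken literally.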


(iii) Quantum statistics. Actually, the last estimate is not valid for good metals (and highly doped semiconductors) for one more reason: their free electrons obey the quantum (Fermi-Dirac) statistics rather than the Boltzmann distribution (4).¹⁰ As a result, at all realistic temperatures the electrons form a degenerate quantum gas, occupying all available energy states below some energy level $\varepsilon_F \gg k_BT$, called the Fermi energy. In these conditions, the screening of a relatively low electric field may be described by replacing Eq. (5) with

$$\rho = e(n - n_e) = e\,g(\varepsilon_F)\,\Delta U = -e^2 g(\varepsilon_F)\,\phi, \tag{2.9}$$

where $g(\varepsilon)$ is the density of quantum states (per unit volume per unit energy) at the electron's energy $\varepsilon$. At the Fermi surface, this density is of the order of $n/\varepsilon_F$.¹¹ As a result, we again get the second of Eqs. (7), but with a different characteristic scale $\lambda$, defined by the following relation:

$$\lambda_{TF}^2 \equiv \frac{\kappa\varepsilon_0}{e^2 g(\varepsilon_F)}, \tag{2.10}$$

and called the Thomas-Fermi screening length. Since for most good metals $n$ is of the order of $10^{29}$ m$^{-3}$, and $\varepsilon_F$ is of the order of 10 eV, Eq. (10) typically gives $\lambda_{TF}$ close to a few $a_0$, which makes the Thomas-Fermi screening theory valid at least semi-quantitatively.
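Eq. (10) can be estimated in the same spirit. This is only an order-of-magnitude sketch: it assumes the rough estimate $g(\varepsilon_F) \sim n/\varepsilon_F$ quoted above, and borrows $\kappa \sim 10$ from the metal example used with Eq. (8); `thomas_fermi_length` is a hypothetical helper name.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
E = 1.602176634e-19      # fundamental charge, C (also the number of joules in 1 eV)


def thomas_fermi_length(n: float, fermi_energy_ev: float, kappa: float = 1.0) -> float:
    """Order-of-magnitude Thomas-Fermi length from Eq. (2.10), in meters.

    Uses the rough estimate g(eps_F) ~ n / eps_F for the density of states.
    n: electron density in m^-3; fermi_energy_ev: eps_F in eV.
    """
    g = n / (fermi_energy_ev * E)  # states per joule per cubic meter
    return math.sqrt(kappa * EPS0 / (E**2 * g))


lam_tf = thomas_fermi_length(n=1e29, fermi_energy_ev=10.0, kappa=10.0)
a0 = 1e-10  # atomic size scale used in the text, m
print(f"lambda_TF ~ {lam_tf:.1e} m ~ {lam_tf / a0:.1f} a0")
```

With these assumed inputs the result is of the order of $10^{-10}$ m, i.e. of the order of $a_0$; since $g(\varepsilon_F)$ is known here only to within a numerical factor, so is $\lambda_{TF}$.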


To summarize, the electric field penetration into good conductors is limited to a depth $\lambda$ ranging from a fraction of a nanometer to a few nanometers, so that for problems with a characteristic linear size much larger than that scale, the macroscopic model (1) gives a very good accuracy, and we will use it in the rest of this chapter. However, the reader should remember that in many situations involving semiconductors, as well as in some nanoscale experiments with metals, the electric field penetration should be taken into account.


Another important condition of the macroscopic model's validity is imposed on the electric field's magnitude, and it is especially significant for semiconductors. Indeed, as Eq. (6) shows, Eq. (7) is only valid if $e|\phi| \ll k_BT$, so that $|E| \sim |\phi|/\lambda_D$ should be much lower than $k_BT/e\lambda_D$. In the example given above ($\lambda_D \approx 2$ nm, $T = 300$ K), this means $|E| \ll E_c \sim 10^7$ V/m $= 10^5$ V/cm, a value readily reachable in the lab. In larger fields, the field penetration becomes nonlinear, leading in particular to the very important effect of carrier depletion; it will be discussed in SM Sec. 6.4. For typical metals, the corresponding linearity limit, $E_c \sim \varepsilon_F/e\lambda_{TF}$, is much higher, $\sim 10^{11}$ V/m, but the model may be violated at lower fields by other effects, such as impact ionization leading to electric breakdown, which may start at $\sim 10^9$ V/m.
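The two linearity limits quoted above follow from a couple of lines of arithmetic. In this sketch, $\lambda_D = 2$ nm is the silicon example from the text, while $\lambda_{TF} \approx 10^{-10}$ m and $\varepsilon_F = 10$ eV are the rough metal values assumed above; the function names are hypothetical.

```python
K_B = 1.380649e-23   # Boltzmann constant, J/K
E = 1.602176634e-19  # fundamental charge, C


def thermal_linearity_limit(temperature: float, lambda_d: float) -> float:
    """Field scale k_B*T/(e*lambda_D), in V/m, above which the linear Eq. (2.7) fails."""
    return K_B * temperature / (E * lambda_d)


def degenerate_linearity_limit(fermi_energy_ev: float, lambda_tf: float) -> float:
    """Field scale eps_F/(e*lambda_TF), in V/m; with eps_F in eV, eps_F/e is that many volts."""
    return fermi_energy_ev / lambda_tf


ec_si = thermal_linearity_limit(300.0, 2e-9)        # silicon example from the text
ec_metal = degenerate_linearity_limit(10.0, 1e-10)  # assumed lambda_TF ~ 1e-10 m
print(f"silicon: {ec_si:.1e} V/m = {ec_si / 100:.1e} V/cm; metal: {ec_metal:.0e} V/m")
```

The silicon value comes out at about $1.3\times10^7$ V/m, consistent with the order-of-magnitude figure in the text.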


2.2. Capacitance 


Let us start using the macroscopic model with systems consisting of charged conductors only, with no so-called stand-alone charges in the free space outside them.¹² Our goal here is to calculate the distributions of the electric field $\mathbf{E}$ and potential $\phi$ in space, and the distribution of the surface charge density $\sigma$ over the conductor surfaces. However, before doing that for particular situations, let us see if there are any integral measures of these distributions, which should be our primary focus.

10 See, e.g., SM Sec. 2.8. For a more detailed derivation of Eq. (10), see SM Chapter 3.

11 See, e.g., SM Sec. 3.3.


The simplest case is of course a single conductor in otherwise free space. According to Eq. (1b), all its volume should have the same electrostatic potential $\phi$, evidently providing one convenient global measure of the situation. Another integral measure is provided by the total charge

$$Q \equiv \int_V \rho\,d^3r = \oint_S \sigma\,d^2r, \tag{2.11}$$

where the last integral is extended over the whole surface $S$ of the conductor. In the general case, what can we tell about the relation between $Q$ and $\phi$? At $Q = 0$, there is no electric field in the system, and it is natural (though not absolutely necessary) to select the arbitrary constant in the electrostatic potential to have $\phi = 0$ everywhere. Then, if the conductor is charged with a nonzero $Q$, according to the linear Eq. (1.7), the electric field at any point of space has to be proportional to that charge. Hence the electrostatic potential at all points, including its value $\phi$ inside the conductor, is also proportional to $Q$:

$$\phi = pQ. \tag{2.12}$$

The proportionality coefficient $p$, which depends on the conductor's size and shape, but on neither $\phi$ nor $Q$, is called its reciprocal capacitance (or, not too often, "electric elastance"). Usually, Eq. (12) is rewritten in a different form,

$$Q = C\phi, \quad\text{with}\quad C \equiv \frac{1}{p}, \tag{2.13}$$

where $C$ is called self-capacitance. (Frequently, $C$ is called just capacitance, but as we will see very soon, for more complex situations the latter term may be ambiguous.)


Before calculating $C$ for particular geometries, let us have a look at the electrostatic energy $U$ of a single conductor. To calculate it, of the several relations discussed in Chapter 1, Eq. (1.61) is the most convenient, because all elementary charges $q_k$ are now parts of the conductor charge, and hence reside at the same potential $\phi$ (see Eq. (1b) again). As a result, the equality becomes very simple:

$$U = \frac{1}{2}\,\phi\sum_k q_k = \frac{1}{2}\,\phi Q. \tag{2.14}$$

Moreover, using the linear relation (13), the same result may be rewritten in two more forms:

$$U = \frac{Q^2}{2C} = \frac{C}{2}\,\phi^2. \tag{2.15}$$


We will discuss several ways to calculate $C$ in the next sections, and right now will have a quick look at just the simplest example, for which we have calculated everything necessary in the previous chapter: a conducting sphere of radius $R$. Indeed, we already know the electric field distribution: according to Eq. (1), $\mathbf{E} = 0$ inside the sphere, while Eq. (1.19), with $Q(r) = Q$, describes the field distribution outside it, because of the evident spherical symmetry of the surface charge distribution.


12 In some texts, these charges are called "free". This term is somewhat misleading, because they may well be bound, i.e. unable to move freely.
Moreover, since the latter formula is exactly the same as for a point charge placed in the sphere's center, the potential's distribution in space may be obtained from Eq. (1.35) by replacing $q$ with the sphere's full charge $Q$. Hence, on the surface of the sphere (and, according to Eq. (1b), throughout its interior),

$$\phi = \frac{1}{4\pi\varepsilon_0}\,\frac{Q}{R}. \tag{2.16}$$

Comparing this result with the definition (13), for the sphere's self-capacitance we obtain a very simple formula:¹³

$$C = 4\pi\varepsilon_0 R. \tag{2.17}$$


This formula, which should be well familiar to the reader, is convenient to get some feeling of how large the SI unit of capacitance (1 farad, abbreviated as F) is: the self-capacitance of Earth ($R_E \approx 6.37\times10^6$ m) is below 1 mF! Another important note is that while Eq. (17) is not exactly valid for a conductor of arbitrary shape, it implies an important general estimate

$$C \sim 2\pi\varepsilon_0 a, \tag{2.18}$$

where $a$ is the scale of the linear size of any conductor.¹⁴
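Eq. (17) makes the scale of the farad easy to check numerically. The minimal sketch below evaluates it for the Earth (assuming a mean radius of about $6.37\times10^6$ m) and for a 1-cm metallic ball, and also evaluates the conversion of 1 pF into Gaussian units of capacitance (centimeters) discussed in footnote 13; `sphere_self_capacitance` is a hypothetical helper name.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def sphere_self_capacitance(radius_m: float) -> float:
    """Self-capacitance of a conducting sphere, Eq. (2.17): C = 4*pi*eps0*R, in farads."""
    return 4 * math.pi * EPS0 * radius_m


c_earth = sphere_self_capacitance(6.37e6)  # assumed mean Earth radius, m
c_ball = sphere_self_capacitance(0.01)     # 1-cm metallic ball

# A Gaussian capacitance is a length: C[cm] = C[F] / (4*pi*eps0 * 1e-2 m)
pf_in_cm = 1e-12 / (4 * math.pi * EPS0 * 1e-2)

print(f"Earth: {c_earth * 1e3:.2f} mF")
print(f"1-cm ball: {c_ball * 1e12:.3f} pF")
print(f"1 pF = {pf_in_cm:.4f} cm (Gaussian units)")
```

The Earth's self-capacitance comes out close to 0.7 mF, and 1 pF corresponds to about 0.899 cm, so a 1-cm ball indeed has a capacitance slightly above 1 pF.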


Now proceeding to a system of two arbitrary conductors, we immediately see why we should be careful with the capacitance definition: one constant $C$ is insufficient to describe all electrostatic properties of such a system. Indeed, here we have two, generally different, conductor potentials, $\phi_1$ and $\phi_2$, that may depend on both conductor charges, $Q_1$ and $Q_2$. Using the same arguments as for the single-conductor case, we may conclude that the dependence is always linear:

$$\begin{aligned}\phi_1 &= p_{11}Q_1 + p_{12}Q_2,\\ \phi_2 &= p_{21}Q_1 + p_{22}Q_2,\end{aligned} \tag{2.19}$$

but now has to be described by more than one coefficient. Actually, it turns out that there are three rather than four different coefficients in these relations, because

$$p_{12} = p_{21}. \tag{2.20}$$


This equality may be proved in several ways, for example, using the general reciprocity theorem of electrostatics (whose proof was the subject of Problem 1.18):

$$\int \rho_1(\mathbf{r})\,\phi_2(\mathbf{r})\,d^3r = \int \rho_2(\mathbf{r})\,\phi_1(\mathbf{r})\,d^3r, \tag{2.21}$$

where $\phi_1(\mathbf{r})$ and $\phi_2(\mathbf{r})$ are the potential distributions induced, respectively, by two electric charge distributions, $\rho_1(\mathbf{r})$ and $\rho_2(\mathbf{r})$. In our current case, each of these integrals is limited to the volume (or, more exactly, the surface) of the corresponding conductor, where each potential is constant and may be taken out of the integral. As a result, Eq. (21) is reduced to

$$Q_1\,\phi_2(\mathbf{r}_1) = Q_2\,\phi_1(\mathbf{r}_2). \tag{2.22}$$

In terms of Eq. (19), $\phi_2(\mathbf{r}_1)$ is just $p_{12}Q_2$, while $\phi_1(\mathbf{r}_2)$ equals $p_{21}Q_1$. Plugging these expressions into Eq. (22), and canceling the product $Q_1Q_2$, we arrive at Eq. (20).

13 In the Gaussian units, using the standard replacement $4\pi\varepsilon_0 \to 1$, this relation takes an even simpler form: $C = R$, very easy to remember. Generally, in the Gaussian units (but not in the SI system!) the capacitance has the dimensionality of length, i.e. is measured in centimeters. Note also that a fractional SI unit, 1 picofarad ($10^{-12}$ F), is very close to the Gaussian unit: 1 pF $= [(1\times10^{-12})/(4\pi\varepsilon_0\times10^{-2})]$ cm $\approx 0.8988$ cm. So, 1 pF is close to the capacitance of a metallic ball with a 1-cm radius, making this unit very convenient for human-scale systems.

14 These arguments are somewhat insufficient to say which size should be used for $a$ in the case of narrow, extended conductors, e.g., a thin, long wire. Very soon we will see that in such cases the electrostatic energy, and hence $C$, depends mostly on the larger size of the conductor.

Hence the 2×2 matrix of coefficients $p_{jk}$ (called the reciprocal capacitance matrix) is always symmetric, and using the natural notation $p_{11} \equiv p_1$, $p_{22} \equiv p_2$, $p_{12} = p_{21} \equiv p$, we may rewrite it in a simpler form:

$$\begin{pmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{pmatrix} = \begin{pmatrix} p_1 & p\\ p & p_2\end{pmatrix}. \tag{2.23}$$

Plugging the relation (19), in this new notation, into Eq. (1.61), we see that the full electrostatic energy of the system may be expressed as a quadratic form of its charges:

$$U = \frac{p_1}{2}\,Q_1^2 + pQ_1Q_2 + \frac{p_2}{2}\,Q_2^2. \tag{2.24}$$
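The equivalence of Eq. (1.61), written for two equipotential conductors as $U = (\phi_1Q_1 + \phi_2Q_2)/2$, and the quadratic form (24) can be checked numerically. In the minimal sketch below, the coefficients and charges are arbitrary illustrative numbers (not taken from the text), and the helper names are hypothetical.

```python
def potentials(p1: float, p: float, p2: float, q1: float, q2: float) -> tuple[float, float]:
    """Eq. (2.19) in the notation of Eq. (2.23)."""
    return p1 * q1 + p * q2, p * q1 + p2 * q2


def energy_from_potentials(p1: float, p: float, p2: float, q1: float, q2: float) -> float:
    """Eq. (1.61) for two equipotential conductors: U = (phi1*Q1 + phi2*Q2) / 2."""
    phi1, phi2 = potentials(p1, p, p2, q1, q2)
    return 0.5 * (phi1 * q1 + phi2 * q2)


def energy_quadratic_form(p1: float, p: float, p2: float, q1: float, q2: float) -> float:
    """Eq. (2.24): U = p1*Q1**2/2 + p*Q1*Q2 + p2*Q2**2/2."""
    return 0.5 * p1 * q1**2 + p * q1 * q2 + 0.5 * p2 * q2**2


# Illustrative values only: coefficients (p1, p, p2) in 1/F, charges (Q1, Q2) in C
args = (2.0e11, 0.5e11, 3.0e11, 1e-9, -2e-9)
u_direct = energy_from_potentials(*args)
u_form = energy_quadratic_form(*args)
print(u_direct, u_form)
```

The two results agree because the symmetry (20) merges the two cross terms $p_{12}Q_1Q_2/2$ and $p_{21}Q_1Q_2/2$ into the single middle term of Eq. (24).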


It is evident that the middle term on the right-hand side of this equality describes the electrostatic coupling of the conductors. (Without it, the energy would be just a sum of two independent electrostatic energies of conductors 1 and 2.)¹⁵ Still, even with this simplification, Eqs. (19) and (20) show that in the general case of arbitrary charges $Q_1$ and $Q_2$, the system of two conductors should be characterized by three, rather than just one, coefficient ("the capacitance"). This is why we may attribute a single capacitance to the system only in some particular cases.


For practice, the most important of them is when the system as a whole is electrically neutral: $Q_1 = -Q_2 \equiv Q$. In this case, the most important function of $Q$ is the difference between the conductors' potentials, called the voltage:¹⁶

$$V \equiv \phi_1 - \phi_2. \tag{2.25}$$

For that function, the subtraction of two Eqs. (19) gives

$$V = (p_1 + p_2 - 2p)\,Q \equiv \frac{Q}{C}, \quad\text{with}\quad C \equiv \frac{1}{p_1 + p_2 - 2p}, \tag{2.26}$$

where the coefficient $C$ is called the mutual capacitance between the conductors, or, again, just "capacitance" if the term's meaning is absolutely clear from the context. The same coefficient describes the electrostatic energy of the system. Indeed, plugging Eqs. (19) and (20) into Eq. (24), we see that both forms of Eq. (15) are reproduced if $\phi$ is replaced with $V$, $Q_1$ with $Q$, and $C$ means the mutual capacitance:

$$U = \frac{Q^2}{2C} = \frac{C}{2}\,V^2. \tag{2.27}$$

15 This is why systems with $p \ll p_1, p_2$ are called weakly coupled, and may be analyzed using approximate methods; see, e.g., Fig. 4 and its discussion below.

16 A word of caution: in condensed matter physics and electrical engineering, voltage is most commonly defined as the difference between electrochemical rather than electrostatic potentials. These two notions coincide if the conductors have equal workfunctions, for example, if they are made of the same material. In this course, this condition will be implied, and the difference between the two voltages ignored; it will be discussed in detail in SM Sec. 6.3.
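A short sketch can confirm that Eq. (26) and both forms of Eq. (27) are mutually consistent for a neutral pair, $Q_1 = -Q_2 = Q$. The values of $p_1$, $p$, $p_2$ and $Q$ below are arbitrary illustrative assumptions, and `mutual_capacitance` is a hypothetical helper name.

```python
def mutual_capacitance(p1: float, p: float, p2: float) -> float:
    """Eq. (2.26): C = 1 / (p1 + p2 - 2p)."""
    return 1.0 / (p1 + p2 - 2.0 * p)


p1, p, p2 = 2.0e11, 0.5e11, 3.0e11  # illustrative coefficients, 1/F
q = 1e-9                            # Q1 = -Q2 = Q, in C

phi1 = p1 * q + p * (-q)            # Eq. (2.19), first conductor
phi2 = p * q + p2 * (-q)            # Eq. (2.19), second conductor
v = phi1 - phi2                     # Eq. (2.25)
c = mutual_capacitance(p1, p, p2)

u_quadratic = 0.5 * p1 * q**2 + p * q * (-q) + 0.5 * p2 * q**2  # Eq. (2.24)
print(f"V = {v:.1f} V, Q/C = {q / c:.1f} V")
print(u_quadratic, q**2 / (2 * c), c * v**2 / 2)                # three forms of U
```

All three printed energies coincide, as Eq. (27) requires.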


The best-known system for which the mutual capacitance $C$ may be readily calculated is the plane (or "parallel-plate") capacitor: a system of two conductors separated with a narrow plane gap of a constant thickness $d$ and an area $A \sim a^2 \gg d^2$ (see Fig. 3).

Fig. 2.3. Plane capacitor (schematically), with the gap thickness $d \ll a$.

Since the surface charges that contribute to the opposite charges $\pm Q$ of the conductors of this system attract each other, in the limit $d \ll a$ they sit entirely on the opposite surfaces limiting the gap, so there is virtually no electric field outside of the gap, while (according to the discussion in Sec. 1) inside the gap it is normal to the surfaces. According to Eq. (3), the magnitude of this field is $E = \sigma/\varepsilon_0$. Integrating this field across the thickness $d$ of the narrow gap, we get $V = \phi_1 - \phi_2 = Ed = \sigma d/\varepsilon_0$, so that $\sigma = \varepsilon_0V/d$. But due to the constancy of the potential of each electrode, $V$ should not depend on the position in the gap area. As a result, $\sigma$ should also be constant over all the gap area $A$, regardless of the external geometry of the conductors (see Fig. 3 again), and hence $Q = \sigma A = \varepsilon_0AV/d$. Thus we may write $V = Q/C$, with

$$C = \frac{\varepsilon_0 A}{d}. \tag{2.28}$$

Let me offer a few comments on this well-known formula. First, it is valid even if the gap is not quite planar, for example, if it gently curves on a scale much larger than $d$ but retains its thickness. Second, Eq. (28), which is valid only if $A \sim a^2$ is much larger than $d^2$, ignores the nonuniform electric fields spreading to distances $\sim d$ beyond the gap edges. Such fringe fields result in an additional stray capacitance $C' \sim \varepsilon_0 a \ll C \sim \varepsilon_0 a\times(a/d)$.¹⁷ Finally, the same condition ($A \gg d^2$) ensures that $C$ is much larger than the self-capacitance $C_k$ of each conductor (see Eq. (18)).
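To see how small the fringe-field correction is, one may compare the main term of Eq. (28) with the analytical stray capacitance of two round disks quoted in footnote 17. The sketch below assumes disks of radius 5 cm and a few illustrative gaps (these numbers are not from the text); both formulas hold only for $d \ll R$, and `disk_capacitances` is a hypothetical helper name.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def disk_capacitances(radius: float, gap: float) -> tuple[float, float]:
    """Main term of Eq. (2.28) and the stray term of footnote 17 for two round disks.

    Both expressions assume gap << radius.
    """
    main = EPS0 * math.pi * radius**2 / gap                                 # eps0*A/d with A = pi*R^2
    stray = EPS0 * radius * (math.log(16 * math.pi * radius / gap) - 1.0)  # C' from footnote 17
    return main, stray


ratios = []
for gap in (1e-3, 1e-4, 1e-5):  # illustrative gaps for 5-cm-radius disks, m
    c_main, c_stray = disk_capacitances(0.05, gap)
    ratios.append(c_stray / c_main)
    print(f"d = {gap:.0e} m: C = {c_main:.2e} F, C' = {c_stray:.2e} F, C'/C = {c_stray / c_main:.4f}")
```

The ratio $C'/C$ falls roughly in proportion to $d/R$ (up to the slowly varying logarithm), in line with the estimate $C'/C \sim d/a$.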


The opportunities opened by the last fact for electronic engineering and experimental physics practice are rather astonishing. For example, a very realistic 3-nm layer of high-quality aluminum oxide, which may provide nearly perfect electric insulation between two thin conducting films, with an area of 0.1 m² (a typical area of silicon wafers used in the semiconductor industry), provides $C \sim 1$ mF,¹⁸ larger than the self-capacitance of the whole planet Earth!
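The oxide-layer estimate can be reproduced by evaluating Eq. (28) with the dielectric factor $\kappa \approx 10$ of footnote 18 and comparing it with Eq. (17) for the Earth. This is a minimal sketch; the mean Earth radius of $6.37\times10^6$ m is an assumed input, and `plane_capacitance` is a hypothetical helper name.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def plane_capacitance(area: float, gap: float, kappa: float = 1.0) -> float:
    """Eq. (2.28), with the dielectric factor kappa of footnote 18 in the numerator."""
    return kappa * EPS0 * area / gap


c_oxide = plane_capacitance(area=0.1, gap=3e-9, kappa=10.0)  # 3-nm oxide, kappa ~ 10
c_earth = 4 * math.pi * EPS0 * 6.37e6                        # Eq. (2.17), assumed mean Earth radius
print(f"oxide capacitor: {c_oxide * 1e3:.1f} mF; Earth: {c_earth * 1e3:.2f} mF")
```

The result, about 3 mF, agrees in order of magnitude with the estimate $C \sim 1$ mF and indeed exceeds the Earth's self-capacitance.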


17 The exact value of $C'$ depends on the shape of the conductors. In a rare case when it has been calculated analytically, two thin round concentric disks of radius $R$, $C' = \varepsilon_0 R\,[\ln(16\pi R/d) - 1]$.

18 Just as in Sec. 1, for the estimate to be realistic, I took into account the additional factor $\kappa$ (for aluminum oxide, close to 10), which should be included in the numerator of Eq. (28) to make it applicable to dielectrics (see Chapter 3 below).
In a plane capacitor with $d \ll a$, the electrostatic coupling of the two conductors is evidently very strong. As an opposite example of a weakly coupled system, let us consider two conducting spheres of the same radius $R$, separated by a much larger distance $d$ (Fig. 4).

Fig. 2.4. A system of two far-separated ($d \gg R$), similar conducting spheres.


In this case, the diagonal components of the matrix (23) may be approximately found from Eq. (16), i.e. by neglecting the coupling altogether:

$$p_1 = p_2 \approx \frac{1}{4\pi\varepsilon_0 R}. \tag{2.29}$$

Now, if we had just one sphere (say, number 1), the electric potential at distance $d$ from its center would be given by Eq. (16): $\phi = Q_1/4\pi\varepsilon_0 d$. If we move to this point a small ($R \ll d$) sphere without its own charge, we may expect that its potential should not be too far from this result, so that $\phi_2 \approx Q_1/4\pi\varepsilon_0 d$. Comparing this expression with the second of Eqs. (19) (taken for $Q_2 = 0$), we get

$$p_{21} = p_{12} \equiv p \approx \frac{1}{4\pi\varepsilon_0 d} \ll p_1, p_2. \tag{2.30}$$

From here and Eq. (26), the mutual capacitance

$$C = \frac{1}{p_1 + p_2 - 2p} \approx \frac{1}{p_1 + p_2} = 2\pi\varepsilon_0 R. \tag{2.31}$$

We see that (somewhat counter-intuitively), in this limit $C$ does not depend substantially on the distance between the spheres, i.e. does not describe their electrostatic coupling. The off-diagonal coefficients of the reciprocal capacitance matrix (23) play this role much better (see Eq. (30)).
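The weak dependence of $C$ on $d$ in Eq. (31), and the factor of two between the partial and mutual capacitances discussed next, can be illustrated numerically using the approximate coefficients (29)-(30). This is a sketch under the same approximations as the text, so it is only meaningful for $d \gg R$; the sphere radius and distances are illustrative assumptions, and `two_sphere_coefficients` is a hypothetical helper name.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def two_sphere_coefficients(radius: float, distance: float) -> tuple[float, float, float]:
    """Approximate (p1, p, p2) of Eqs. (2.29)-(2.30) for two equal spheres, distance >> radius."""
    p_self = 1.0 / (4 * math.pi * EPS0 * radius)
    p_mutual = 1.0 / (4 * math.pi * EPS0 * distance)
    return p_self, p_mutual, p_self


R = 0.01                          # illustrative 1-cm spheres, m
c_limit = 2 * math.pi * EPS0 * R  # Eq. (2.31)
mutual = []
for d in (0.1, 1.0, 10.0):
    p1, p, p2 = two_sphere_coefficients(R, d)
    c = 1.0 / (p1 + p2 - 2 * p)   # Eq. (2.26)
    mutual.append(c)
    print(f"d = {d:5.1f} m: p = {p:.2e} 1/F, C = {c * 1e12:.3f} pF (2*pi*eps0*R = {c_limit * 1e12:.3f} pF)")

partial = 1.0 / two_sphere_coefficients(R, 1.0)[0]  # C1 = 1/p1 = 4*pi*eps0*R
print(f"partial C1 = {partial * 1e12:.3f} pF, i.e. about twice C")
```

While $p$ drops by two orders of magnitude over this range of distances, $C$ changes by only about 10%.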


Now let us consider the case when only one conductor of the two is charged, for example $Q_1 = Q$, while $Q_2 = 0$. Then Eqs. (19)-(20) yield

$$\phi_1 = p_1Q, \qquad \phi_2 = pQ. \tag{2.32}$$

Now, we may follow Eq. (13) and define $C_1 = 1/p_1$ (and $C_2 = 1/p_2$), just to see that such partial capacitances of the conductors of the system differ from its mutual capacitance $C$ (cf. Eq. (26)). For example, in the case shown in Fig. 4, $C_1 = C_2 \approx 4\pi\varepsilon_0 R \approx 2C$.


Finally, let us consider one more frequent case when one of the conductors carries a certain charge (say, $Q_1 = Q$), but the potential of its counterpart is sustained constant, say $\phi_2 = 0$.¹⁹ (This condition is especially easy to implement if the second conductor is much larger than the first one. Indeed, as the estimate (18) shows, in this case it would take a much larger charge $Q_2$ to make the potential $\phi_2$ comparable with $\phi_1$.) In this case the second of Eqs. (19), with the account of Eq. (20), yields $Q_2 = -(p/p_2)\,Q_1$. Plugging this relation into the first of those equations, we get

$$Q_1 = C_{\rm ef}\,\phi_1, \quad\text{with}\quad C_{\rm ef} \equiv \left(p_1 - \frac{p^2}{p_2}\right)^{-1} = \frac{p_2}{p_1p_2 - p^2}. \tag{2.33}$$

Thus, this effective capacitance of the first conductor is generally different both from its partial capacitance $C_1$ and from the mutual capacitance $C$ of the system, emphasizing again how careful one should be when using the term "capacitance" without a qualifier.

19 In electrical engineering, such a constant-potential conductor is called the ground. This term stems from the fact that in many cases the electrostatic potential of the (weakly) conducting ground at the Earth's surface is virtually unaffected by laboratory-scale electric charges.


Note also that neither the partial capacitances nor the mutual capacitance is equal to any element of the matrix reciprocal to the matrix (23); only the effective capacitance (33) coincides with one of its diagonal elements:

$$\begin{pmatrix} p_1 & p\\ p & p_2\end{pmatrix}^{-1} = \frac{1}{p_1p_2 - p^2}\begin{pmatrix} p_2 & -p\\ -p & p_1\end{pmatrix}. \tag{2.34}$$

Largely for this reason, this physical capacitance matrix, which expresses the vector of conductor charges via the vector of their potentials, is less convenient for most applications than the reciprocal capacitance matrix (23). The same conclusion is valid for multi-conductor systems, which are most conveniently characterized by an evident generalization of Eq. (19). Indeed, in this case, even the mutual capacitance between two selected conductors may depend on the electrostatic conditions of other components of the system.
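The relation between the three capacitances of a two-conductor system and the elements of the matrix (34) can be checked numerically. In this sketch the values of $p_1$, $p$, $p_2$ are arbitrary illustrative assumptions, and the helper names are hypothetical; a numerical run shows that $C_{\rm ef}$ of Eq. (33) coincides with the (1,1) element of the inverse matrix, while $C_1$ and $C$ do not match any of its elements.

```python
def capacitances(p1: float, p: float, p2: float) -> dict[str, float]:
    """The three capacitances discussed in the text, from the reciprocal coefficients."""
    det = p1 * p2 - p**2
    return {
        "partial C1": 1.0 / p1,                 # Eq. (2.13) applied to conductor 1 alone
        "mutual C": 1.0 / (p1 + p2 - 2.0 * p),  # Eq. (2.26)
        "effective C_ef": p2 / det,             # Eq. (2.33), with phi2 = 0
    }


def inverse_2x2(p1: float, p: float, p2: float) -> list[list[float]]:
    """Eq. (2.34): inverse of the symmetric matrix [[p1, p], [p, p2]]."""
    det = p1 * p2 - p**2
    return [[p2 / det, -p / det], [-p / det, p1 / det]]


p1, p, p2 = 2.0e11, 0.5e11, 3.0e11  # illustrative coefficients, 1/F
caps = capacitances(p1, p, p2)
c_matrix = inverse_2x2(p1, p, p2)
elements = [x for row in c_matrix for x in row]

for name, value in caps.items():
    matches = [abs(value - x) <= 1e-9 * abs(value) for x in elements]
    print(f"{name} = {value:.4e} F; equals an element of the inverse matrix: {any(matches)}")

# Sanity check: the first row of (p-matrix times its inverse) should be (1, 0)
product_11 = p1 * c_matrix[0][0] + p * c_matrix[1][0]
product_12 = p1 * c_matrix[0][1] + p * c_matrix[1][1]
print(product_11, product_12)
```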


Logically, at this point I would need to discuss the particular, but practically very important, case when the regions where the electric field between each pair of conductors is most significant do not overlap, as in the example shown in Fig. 5a. In this case, the system's properties may be discussed using the equivalent-circuit language, representing each such region as a lumped (localized) capacitor, with a certain mutual capacitance $C$, and the whole system as some connection of these capacitors by conducting "wires", whose length and geometry are not important (see Fig. 5b).

Fig. 2.5. (a) A simple system of conductors, with three well-localized regions of high electric field (and hence surface charge) concentration, and (b) its representation with an equivalent circuit of three lumped capacitors.


Since the analysis of such equivalent circuits is covered in typical introductory physics courses, I will save time by skipping their discussion. However, since such circuits are very frequently met in physical experiments and in electrical engineering practice, I would urge the reader to self-test their understanding of this topic by solving a couple of problems offered at the end of this chapter,²⁰ and, if their solution presents any difficulty, to review the corresponding section in an undergraduate textbook.


20 These problems have been selected to emphasize the fact that not every circuit may be reduced to the simplest 
connections of the capacitors in parallel and/or in series. 
2.3. The simplest boundary problems 


In the general case when the electric field distribution in the free space between the conductors 
cannot be easily found from the Gauss law or a particular symmetry, the best approach is to try to solve 
the differential Laplace equation (1.42), with the boundary conditions (1b): 


$$\nabla^2\phi = 0, \qquad \phi\big|_{S_k} = \phi_k, \tag{2.35}$$


where S_k is the surface of the k-th conductor of the system. After this boundary problem has been solved, 
i.e. the spatial distribution φ(r) has been found at all points outside the conductors, it is straightforward 
to use Eq. (3) to find the surface charge density, and finally the total charge 

$$Q_k = \oint_{S_k}\sigma\,d^2r \tag{2.36}$$

of each conductor, and hence any component of the reciprocal capacitance matrix. As an illustration, let 
us implement this program for three very simple problems. 


(i) Plane capacitor (Fig. 3). In this case, the easiest way to solve the Laplace equation is to use 
the linear (Cartesian) coordinates with one axis (say, z) normal to the conductor surfaces — see Fig. 6. 


Fig. 2.6. The plane capacitor as the system for the simplest illustration of the boundary problem (35) and its solution. (Axes: x along the plates, z normal to them.) 


In these coordinates, the Laplace operator is just the sum of three second derivatives.21 It is 
evident that due to the problem's translational symmetry within the [x, y] plane, deep inside the gap (i.e. 
at any lateral distance from the edges much larger than d) the electrostatic potential may only depend on 
the coordinate normal to the gap surfaces: φ(r) = φ(z). For such a function, the derivatives over x and y 
vanish, and the boundary problem (35) is reduced to a very simple ordinary differential equation 


$$\frac{d^2\phi}{dz^2}(z) = 0, \tag{2.37}$$

with boundary conditions

$$\phi(0) = 0, \qquad \phi(d) = V. \tag{2.38}$$


(For the sake of notational simplicity, I have used the freedom of adding a constant to the potential to 
make one of the potentials vanish, and also the definition (25) of the voltage V.) The general solution of 
Eq. (37) is a linear function, φ(z) = c_1z + c_2, whose constant coefficients c_{1,2} may be readily found from 
the boundary conditions (38). The final solution is 


$$\phi = V\frac{z}{d}. \tag{2.39}$$


21 See, e.g. MA Eq. (9.1). 
From here the only nonzero component of the electric field is 

$$E_z = -\frac{d\phi}{dz} = -\frac{V}{d}, \tag{2.40}$$

and the surface charge of the capacitor plates is 

$$\sigma = \varepsilon_0 E_n = \mp\varepsilon_0 E_z = \pm\frac{\varepsilon_0 V}{d}, \tag{2.41}$$


where the upper and lower signs correspond to the upper and lower plate, respectively. Since σ does not 
depend on x and y, we can get the full charges Q_1 = -Q_2 = Q of the surfaces by multiplying it by the 
gap area A, giving us again the already obtained result (28) for the mutual capacitance, C = Q/V = ε0A/d. I believe 
that this calculation, though very easy, may serve as a good illustration of the boundary problem 
solution approach, which will be used below for more complex cases. 
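The boundary-problem program of Eqs. (37)-(41) can also be followed numerically. The Python sketch below is an illustration, not part of the original derivation: it solves Eq. (37) by Jacobi relaxation on a uniform grid with the boundary conditions (38), then evaluates E_z, σ, and C from Eqs. (40)-(41). The function name `relax_plane_capacitor`, the grid size, the sweep count, and the numerical values of V, d, and A are arbitrary assumptions.

```python
EPS0 = 8.8541878128e-12  # electric constant, F/m


def relax_plane_capacitor(V, n=21, sweeps=5000):
    """Solve d^2(phi)/dz^2 = 0 by Jacobi relaxation on n equally spaced nodes,
    with the boundary values phi(0) = 0 and phi(d) = V of Eq. (2.38)."""
    phi = [0.0] * n
    phi[-1] = V
    for _ in range(sweeps):
        # Discrete Laplace equation: each interior node becomes the mean of its neighbours.
        interior = [0.5 * (phi[i - 1] + phi[i + 1]) for i in range(1, n - 1)]
        phi = [phi[0]] + interior + [phi[-1]]
    return phi


V, d, A = 10.0, 1e-3, 1e-4          # hypothetical voltage (V), gap (m), plate area (m^2)
phi = relax_plane_capacitor(V)
h = d / (len(phi) - 1)              # grid step
E_z = -(phi[-1] - phi[-2]) / h      # Eq. (2.40): E_z = -dphi/dz
sigma_upper = -EPS0 * E_z           # Eq. (2.41), upper sign (upper plate)
C = sigma_upper * A / V             # compare with eps0*A/d
print(phi[10], C, EPS0 * A / d)
```

Because the 1D discrete Laplace equation simply makes each node the average of its neighbors, the converged profile is exactly linear, as in Eq. (39), and the computed C agrees with ε0A/d to rounding accuracy.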


(ii) Coaxial-cable capacitor. A coaxial cable is a system of two round, coaxial cylindrical 
conductors, with the cross-section shown in Fig. 7. 


Fig. 2.7. The cross-section of a coaxial cable (inner radius a, outer radius b > a). 


Evidently, in this case the cylindrical coordinates {ρ, φ, z}, with the z-axis coinciding with the 
common axis of the cylinders, are most convenient.22 Due to the axial symmetry of the problem, in these 
coordinates E(r) = n_ρE(ρ) and φ(r) = φ(ρ), so that in the general expression for the Laplace operator23 we 
may take ∂/∂φ = ∂/∂z = 0. As a result, only the radial term of the operator survives, and the boundary 
problem (35) takes the form 
$$\frac{1}{\rho}\frac{d}{d\rho}\left(\rho\frac{d\phi}{d\rho}\right) = 0, \qquad \phi(a) = V, \qquad \phi(b) = 0. \tag{2.42}$$

The sequential double integration of this ordinary linear differential equation is elementary (and similar 
to that of the Poisson equation in spherical coordinates, carried out in Sec. 1.3), giving 


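$$\rho\frac{d\phi}{d\rho} = c_1, \qquad \phi(\rho) = c_1\int_a^{\rho}\frac{d\rho'}{\rho'} + c_2 = c_1\ln\frac{\rho}{a} + c_2. \tag{2.43}$$

The constants c_{1,2} may be found using the boundary conditions (42): 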


22 I am sorry for using for the 2D radius the same letter ρ as for the volumic density of charge. (Both notations are 
too common to refuse.) I do not believe this may lead to confusion, because the letter will not be used in two 
different meanings during any particular discussion. 

23 See, e.g., MA Eq. (10.3). 
$$c_2 = V, \qquad c_1\ln\frac{b}{a} + c_2 = 0, \tag{2.44}$$

giving c_1 = -V/ln(b/a), so that Eq. (43) takes the following form: 

$$\phi(\rho) = V\left[1 - \frac{\ln(\rho/a)}{\ln(b/a)}\right]. \tag{2.45}$$


Next, for our axial symmetry, the general expression for the gradient of a scalar function is 
reduced to its radial derivative, so that 

$$E(\rho) = -\frac{d\phi(\rho)}{d\rho} = \frac{V}{\rho\ln(b/a)}. \tag{2.46}$$


This expression, plugged into Eq. (2), allows us to find the density of the conductors' surface charge. For 
example, for the inner electrode 

$$\sigma_a = \varepsilon_0 E(a) = \frac{\varepsilon_0 V}{a\ln(b/a)}, \tag{2.47}$$

so that its full charge (per unit length of the system) is 

$$\frac{Q}{l} = 2\pi a\sigma_a = \frac{2\pi\varepsilon_0 V}{\ln(b/a)}. \tag{2.48}$$

(It is straightforward to check that the charge of the outer electrode is equal and opposite.) Hence, by 
the definition of the mutual capacitance, its value per unit length is 

$$\frac{C}{l} = \frac{Q}{lV} = \frac{2\pi\varepsilon_0}{\ln(b/a)}. \tag{2.49}$$


This expression shows that the total capacitance C is proportional to the system's length l (if l >> 
a, b), while depending only logarithmically on the dimensions of its cross-section. Since the 
logarithm of a large argument is an extremely slow function (sometimes called a quasi-constant), if the 
external conductor is made very large (b >> a), the capacitance tends to zero, but very slowly: only the 
logarithm ln(b/a) in its denominator diverges. Such logarithmic divergence may be cut off by any minuscule 
additional effect, for example by the finite length l of the system. This fact yields the following very 
useful estimate of the self-capacitance of a single round wire of radius a: 

$$C \approx \frac{2\pi\varepsilon_0 l}{\ln(l/a)}, \qquad \text{for } l \gg a. \tag{2.50}$$


On the other hand, if the gap d between the conductors is very narrow, d ≡ b - a << a, then 
ln(b/a) = ln(1 + d/a) may be approximated as d/a, and Eq. (49) is reduced to C ≈ 2πε0al/d, i.e. to Eq. 
(28) for the plane capacitor of the appropriate area A = 2πal. 
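Both limits of Eq. (49) just discussed are easy to check numerically. In this minimal sketch the helper names and the inner radius are assumptions made for illustration only.

```python
import math

EPS0 = 8.8541878128e-12  # electric constant, F/m


def coax_c_per_length(a, b):
    """Eq. (2.49): mutual capacitance per unit length of a coaxial cable, b > a."""
    return 2 * math.pi * EPS0 / math.log(b / a)


def plane_c_per_length(a, d):
    """Eq. (28) with the area A = 2*pi*a*l, divided by the length l."""
    return EPS0 * 2 * math.pi * a / d


a = 1e-3  # hypothetical inner radius, m
# Narrow gaps: the ratio (d/a)/ln(1 + d/a) tends to 1 as d/a -> 0.
for d in (1e-4, 1e-5, 1e-6):
    print(d / a, coax_c_per_length(a, a + d) / plane_c_per_length(a, d))
# Wide gaps: the capacitance per unit length falls only as 1/ln(b/a).
for ratio in (10, 100, 1000):
    print(ratio, coax_c_per_length(a, ratio * a) / (2 * math.pi * EPS0))
```

The second loop illustrates the "quasi-constant" behavior: a hundredfold increase of b/a (from 10 to 1000) reduces the capacitance only threefold.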


(iii) Spherical capacitor. This is a system of two conductors, with the central cross-section 
similar to that of the coaxial cable (Fig. 7), but now with spherical rather than axial symmetry. This 
symmetry implies that we may be better off using spherical coordinates, so that the potential φ depends 
only on one of them, the distance r from the common center of the conductors: φ(r) = φ(r). As we 
already know from Sec. 1.3, in this case the general expression for the Laplace operator is reduced to its 
first (radial) term, so that the Laplace equation takes the simple form (1.47). Moreover, we have already 
found the general solution of this equation (see Eq. (1.50)): 


$$\phi(r) = \frac{c_1}{r} + c_2. \tag{2.51}$$


Now acting exactly as above, i.e. determining the (only essential) constant c_1 from the boundary 
condition φ(a) - φ(b) = V, we get 

$$c_1 = V\left(\frac{1}{a} - \frac{1}{b}\right)^{-1}, \quad\text{so that}\quad \phi(r) = \frac{V}{r}\left(\frac{1}{a} - \frac{1}{b}\right)^{-1} + c_2. \tag{2.52}$$

Next, we can use the spherical symmetry to find the electric field, E(r) = n_rE(r), with 

$$E(r) = -\frac{d\phi}{dr} = \frac{V}{r^2}\left(\frac{1}{a} - \frac{1}{b}\right)^{-1}, \tag{2.53}$$


and hence its values on the conductors' surfaces, and then the surface charge density σ from Eq. (3). For 
example, for the inner conductor's surface, 

$$\sigma_a = \varepsilon_0 E(a) = \varepsilon_0\frac{V}{a^2}\left(\frac{1}{a} - \frac{1}{b}\right)^{-1}, \tag{2.54}$$

so that, finally, for the full charge of that conductor we get the following result: 

$$Q = 4\pi a^2\sigma_a = 4\pi\varepsilon_0\left(\frac{1}{a} - \frac{1}{b}\right)^{-1}V. \tag{2.55}$$


(Again, the charge of the outer conductor is equal and opposite.) Now we can use the definition (26) of 
the mutual capacitance to get the final result: 

$$C = \frac{Q}{V} = 4\pi\varepsilon_0\left(\frac{1}{a} - \frac{1}{b}\right)^{-1}. \tag{2.56}$$

For b >> a, it coincides with Eq. (17) for the self-capacitance of the inner conductor. On the 
other hand, if the gap d between the two conductors is narrow, d ≡ b - a << a, then 

$$C = 4\pi\varepsilon_0\frac{ab}{d} \approx 4\pi\varepsilon_0\frac{a^2}{d} \equiv \frac{\varepsilon_0 A}{d}, \tag{2.57}$$

i.e. the capacitance approaches that of the planar capacitor of the area A = 4πa², as it should. 
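The two limits of Eq. (56) can be checked in the same way as for the coaxial cable. In this sketch the helper name and the inner radius are illustrative assumptions.

```python
import math

EPS0 = 8.8541878128e-12  # electric constant, F/m


def spherical_c(a, b):
    """Eq. (2.56): mutual capacitance of two concentric spheres, b > a."""
    return 4 * math.pi * EPS0 / (1 / a - 1 / b)


a = 0.01  # hypothetical inner radius, m
# b >> a: approaches the self-capacitance 4*pi*eps0*a of the inner sphere.
print(spherical_c(a, 1e6 * a) / (4 * math.pi * EPS0 * a))
# Narrow gap d: compare with the plane-capacitor formula eps0*A/d, A = 4*pi*a^2.
for d in (1e-3 * a, 1e-5 * a):
    plane = EPS0 * 4 * math.pi * a**2 / d
    print(d / a, spherical_c(a, a + d) / plane)  # equals b/a = 1 + d/a analytically
```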


All this seems (and indeed is) very straightforward, but let us contemplate the reason 
for such easy successes. In each of the cases (i)-(iii) we have managed to find coordinates such that both 
the Laplace equation and the boundary conditions involve only one of them. The necessary condition 
for the former fact is for the coordinates to be orthogonal. This means that the three vector components 
of the local differential dr, due to small variations of the new coordinates (say, dr, dθ, and dφ for the 
spherical coordinates), are mutually perpendicular. 
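The orthogonality condition just stated is easy to test numerically. The following Python sketch (the helper names and the finite-difference step are illustrative assumptions) estimates the three tangent vectors ∂r/∂q_k by central differences and reports the largest cosine of the angle between any two of them: it vanishes for spherical coordinates, but not for a deliberately oblique coordinate system included for contrast.

```python
import math


def spherical_to_cartesian(r, theta, phi):
    """Spherical coordinates {r, theta, phi} -> Cartesian {x, y, z}."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


def oblique_to_cartesian(q1, q2, q3):
    """A deliberately non-orthogonal (oblique) coordinate system, for contrast."""
    return (q1 + 0.5 * q2, q2, q3)


def tangent_vectors(to_cartesian, q, h=1e-6):
    """Central-difference estimates of dr/dq_k for k = 0, 1, 2."""
    vectors = []
    for k in range(3):
        qp, qm = list(q), list(q)
        qp[k] += h
        qm[k] -= h
        rp, rm = to_cartesian(*qp), to_cartesian(*qm)
        vectors.append([(p - m) / (2 * h) for p, m in zip(rp, rm)])
    return vectors


def max_cosine(to_cartesian, q):
    """Largest |cos| of the angle between two tangent vectors at the point q;
    zero (to rounding) for orthogonal coordinates."""
    t = tangent_vectors(to_cartesian, q)

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    return max(abs(dot(t[i], t[j])) / math.sqrt(dot(t[i], t[i]) * dot(t[j], t[j]))
               for i in range(3) for j in range(i + 1, 3))


print(max_cosine(spherical_to_cartesian, (2.0, 0.7, 1.1)))  # ~0: orthogonal
print(max_cosine(oblique_to_cartesian, (1.0, 2.0, 3.0)))    # ~0.447: not orthogonal
```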


2.4. Using other orthogonal coordinates 


The cylindrical and spherical coordinates used above are only the simplest examples of the 
curvilinear orthogonal (or just "orthogonal") coordinates, and that approach may be extended to other 
coordinate systems of this type. As an example, let us calculate the self-capacitance of a thin, round 
conducting disk. The cylindrical or spherical coordinates would not give much help here, because while 
they have the appropriate axial symmetry, they would make the boundary condition on the disk too 
complicated, involving two coordinates: either ρ and z, or r and θ. Help comes from noting that the flat 
disk, i.e. the area with z = 0, ρ ≤ R, may be viewed as the limiting case of an axially-symmetric ellipsoid 
(or "degenerate ellipsoid", or "ellipsoid of rotation", or "spheroid"): the surface formed by rotation of 
the usual ellipse about one of its principal axes, which would be also the symmetry axis of the disk (in 
Fig. 8, the z-axis). 


Fig. 2.8. Solving the disk’s capacitance problem. (The 
cross-section of the system by the vertical plane y = 0.) 


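In Cartesian coordinates, such a spheroid is described by the equation 

$$\frac{x^2 + y^2}{a^2} + \frac{z^2}{b^2} = 1, \tag{2.58}$$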


where a and b are the so-called principal semi-axes, whose ratio determines the ellipse's eccentricity, i.e. the 
degree of its "squeezing". For our problem, we will only need oblate ellipsoids with a ≥ b; according to 
Eq. (58), they may be represented as surfaces of constant α in the oblate spheroidal (also called 
"degenerate ellipsoidal") coordinates {α, β, φ} that are related to the Cartesian coordinates as follows:24 


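$$\begin{aligned} x &= R\cosh\alpha\,\sin\beta\,\cos\varphi, &\qquad 0 \le \alpha < \infty,\\ y &= R\cosh\alpha\,\sin\beta\,\sin\varphi, &\qquad\text{with}\quad 0 \le \beta \le \pi,\\ z &= R\sinh\alpha\,\cos\beta, &\qquad 0 \le \varphi < 2\pi. \end{aligned} \tag{2.59}$$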


Such spheroidal coordinates are an evident generalization of the spherical coordinates, which 
correspond to the limit α >> 1 (i.e. r >> R). In the opposite limit, the surface of constant α = 0 describes 
our thin disk of radius R, with the coordinate β describing the distance ρ = (x² + y²)^{1/2} = R sin β of its 
point from the z-axis. It is almost evident (and easy to prove) that the curvilinear coordinates (59) are 
also orthogonal; the Laplace operator in them is: 


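$$\nabla^2 = \frac{1}{R^2(\cosh^2\alpha - \sin^2\beta)}\left[\frac{1}{\cosh\alpha}\frac{\partial}{\partial\alpha}\left(\cosh\alpha\frac{\partial}{\partial\alpha}\right) + \frac{1}{\sin\beta}\frac{\partial}{\partial\beta}\left(\sin\beta\frac{\partial}{\partial\beta}\right) + \left(\frac{1}{\sin^2\beta} - \frac{1}{\cosh^2\alpha}\right)\frac{\partial^2}{\partial\varphi^2}\right]. \tag{2.60}$$

Though this expression may look a bit intimidating, let us notice that since in our current 
problem the boundary conditions depend only on α:25 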


24 For the solution of some problems, it is convenient to use Eqs. (59) with -∞ < α < +∞ and 0 ≤ β ≤ π/2. 
25 I have called the disk's potential V, to distinguish it from the potential φ at an arbitrary point of space. 
$$\phi\big|_{\alpha=0} = V, \qquad \phi\big|_{\alpha=\infty} = 0, \tag{2.61}$$


there is every reason to assume that the electrostatic potential in all space is a function of α alone; in 
other words, that all ellipsoids α = const are equipotential surfaces. Indeed, acting on such a function 
φ(α) by the Laplace operator (60), we see that the last two terms in the square brackets vanish, and the 
Laplace equation (35) is reduced to a simple ordinary differential equation 


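$$\frac{d}{d\alpha}\left(\cosh\alpha\frac{d\phi}{d\alpha}\right) = 0. \tag{2.62}$$

Integrating it twice, just as we did in the three previous problems, we get 

$$\phi(\alpha) = c_1\int\frac{d\alpha}{\cosh\alpha} + c_2. \tag{2.63}$$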


This integral may be readily worked out using the substitution ξ ≡ sinh α (which gives dξ = cosh α dα, 
i.e. dα = dξ/cosh α, and cosh²α = 1 + sinh²α = 1 + ξ²): 

$$\phi(\alpha) = c_1\int_0^{\sinh\alpha}\frac{d\xi}{1+\xi^2} + c_2 = c_1\tan^{-1}(\sinh\alpha) + c_2. \tag{2.64}$$


The integration constants c_{1,2} may be simply found from the boundary conditions (61), and we arrive at 
the following final expression for the electrostatic potential: 

$$\phi(\alpha) = V\left[1 - \frac{2}{\pi}\tan^{-1}(\sinh\alpha)\right] \equiv \frac{2V}{\pi}\tan^{-1}\left(\frac{1}{\sinh\alpha}\right). \tag{2.65}$$
This solution satisfies both the Laplace equation and the boundary conditions. Mathematicians tell us 
that the solution of any boundary problem of the type (35) is unique, so we do not need to look any 
further. 
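The claim that Eq. (65) satisfies both Eq. (62) and the boundary conditions (61) can be spot-checked numerically. In this sketch (the helper names and the step h are illustrative assumptions) the left-hand side of Eq. (62) is estimated by nested central differences, and the large value α = 40 stands in for α → ∞.

```python
import math


def phi_disk(alpha, V=1.0):
    """Eq. (2.65): potential around a disk held at potential V, versus the
    oblate spheroidal coordinate alpha of Eq. (2.59)."""
    return V * (1 - (2 / math.pi) * math.atan(math.sinh(alpha)))


def ode_residual(alpha, h=1e-4):
    """Finite-difference value of d/dalpha(cosh(alpha) * dphi/dalpha),
    which Eq. (2.62) requires to vanish."""
    def flux(a):
        return math.cosh(a) * (phi_disk(a + h / 2) - phi_disk(a - h / 2)) / h
    return (flux(alpha + h / 2) - flux(alpha - h / 2)) / h


print(phi_disk(0.0), phi_disk(40.0))               # boundary conditions (2.61)
print([ode_residual(a) for a in (0.1, 0.5, 2.0)])  # ~0, up to discretization error
```

The physical reason the residual vanishes is visible analytically: cosh α dφ/dα = -2V/π is a constant "flux", independent of α.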


Now we may use Eq. (3) to find the surface density of electric charge, but in the case of a thin 
disk, it is more natural to add up such densities on its top and bottom surfaces at the same distance 
ρ = (x² + y²)^{1/2} from the disk's center. The densities are evidently equal, due to the problem's symmetry about 
the plane z = 0, so that the total density is σ = 2ε0E_z|_{z=+0}. According to Eq. (65) and the last of Eqs. (59), 
the electric field on the upper surface is 

$$E_z\big|_{z=+0} = -\frac{\partial\phi}{\partial z}\bigg|_{z=+0} = -\frac{\partial\phi(\alpha)}{\partial(R\sinh\alpha\cos\beta)}\bigg|_{\alpha=0} = \frac{2V}{\pi}\frac{1}{R\cos\beta} \equiv \frac{2V}{\pi}\frac{1}{(R^2-\rho^2)^{1/2}}, \tag{2.66}$$
and we see that the charge is distributed over the disk very nonuniformly: 

$$\sigma = \frac{4}{\pi}\varepsilon_0 V\frac{1}{(R^2-\rho^2)^{1/2}}, \tag{2.67}$$


with a singularity at the disk edge. Below we will see that such singularities are very typical for sharp 
edges of conductors. Fortunately, in our current case the divergence is integrable, giving a finite disk 
charge: 

$$Q = \int_{\text{disk surface}}\sigma\,d^2\rho = \int_0^R\sigma(\rho)\,2\pi\rho\,d\rho = \frac{4}{\pi}\varepsilon_0 V\int_0^R\frac{2\pi\rho\,d\rho}{(R^2-\rho^2)^{1/2}} = 4\varepsilon_0 VR\int_0^1\frac{d\xi}{(1-\xi)^{1/2}} = 8\varepsilon_0 RV. \tag{2.68}$$


Thus, for the disk's self-capacitance we get a very simple result, 

$$C = 8\varepsilon_0 R \equiv \frac{4\pi\varepsilon_0 R}{\pi/2}, \tag{2.69}$$

a factor of π/2 ≈ 1.57 lower than that for the conducting sphere of the same radius, but still complying 
with the general estimate (18). 
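The integral (68) can also be evaluated numerically, which shows one practical way to handle the integrable edge singularity of Eq. (67). This is a minimal sketch, assuming the substitution ρ = R sin θ (a choice made here, not in the text) and a plain midpoint rule; the values of R and V are arbitrary.

```python
import math

EPS0 = 8.8541878128e-12  # electric constant, F/m


def sigma_disk(rho, R, V):
    """Eq. (2.67): total (top plus bottom) surface charge density of the disk."""
    return (4 / math.pi) * EPS0 * V / math.sqrt(R * R - rho * rho)


def disk_charge(R, V, n=2000):
    """Midpoint-rule evaluation of Eq. (2.68) after substituting rho = R sin(theta);
    the Jacobian R cos(theta) cancels the integrable edge singularity of sigma."""
    dtheta = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * dtheta          # midpoints never reach the edge rho = R
        rho = R * math.sin(theta)
        drho = R * math.cos(theta) * dtheta
        total += sigma_disk(rho, R, V) * 2 * math.pi * rho * drho
    return total


R, V = 0.05, 3.0                            # hypothetical radius (m) and potential (V)
C_disk = disk_charge(R, V) / V
print(C_disk / (8 * EPS0 * R))              # close to 1: Eq. (2.69)
print(4 * math.pi * EPS0 * R / C_disk)      # close to pi/2 = 1.5708
```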


Can we always find such a "good" system of orthogonal coordinates? Unfortunately, the answer 
is no, even for highly symmetric geometries. This is why the practical value of this approach is limited, 
and other, more general methods of boundary problem solution are clearly needed. Before proceeding to 
their discussion, however, let me note that in the case of 2D problems (i.e. cylindrical geometries26), the 
orthogonal coordinate method gets much help from the following conformal mapping approach. 


Let us consider a pair of Cartesian coordinates {x, y} of the cylinder's cross-section plane as a 
complex variable z = x + iy,27 where i is the imaginary unit (i² = -1), and let w(z) = u + iv be an analytic 
complex function of z.28 For our current purposes, the most important property of an analytic function is 
that its real and imaginary parts obey the following Cauchy-Riemann relations:29 


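$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}. \tag{2.70}$$

For example, for the function 

$$w = z^2 = (x+iy)^2 = (x^2-y^2) + 2ixy, \tag{2.71}$$

whose real and imaginary parts are 

$$u \equiv \text{Re}\,w = x^2 - y^2, \qquad v \equiv \text{Im}\,w = 2xy, \tag{2.72}$$

we immediately see that ∂u/∂x = 2x = ∂v/∂y, and ∂v/∂x = 2y = -∂u/∂y, in accordance with Eq. (70). 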
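The Cauchy-Riemann relations (70) can be verified numerically for any complex function by finite differences. In this sketch (the helper name and the step size are assumptions), the residuals vanish for the analytic functions z², exp z, and sin z, but not for the non-analytic complex conjugate, for which u = x and v = -y give ∂u/∂x - ∂v/∂y = 2.

```python
import cmath


def cauchy_riemann_residuals(w, x, y, h=1e-6):
    """Central-difference check of Eq. (2.70) for a complex function w(z) at z = x + iy.
    Returns (du/dx - dv/dy, dv/dx + du/dy); both vanish for an analytic w."""
    dwdx = (w(complex(x + h, y)) - w(complex(x - h, y))) / (2 * h)
    dwdy = (w(complex(x, y + h)) - w(complex(x, y - h))) / (2 * h)
    du_dx, dv_dx = dwdx.real, dwdx.imag
    du_dy, dv_dy = dwdy.real, dwdy.imag
    return du_dx - dv_dy, dv_dx + du_dy


for f in (lambda z: z**2, cmath.exp, cmath.sin):
    print(cauchy_riemann_residuals(f, 0.7, -0.4))                    # both ~0
print(cauchy_riemann_residuals(lambda z: z.conjugate(), 0.7, -0.4))  # (2, 0): not analytic
```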
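Let us differentiate the first of Eqs. (70) over x again, then change the order of differentiation, 
and after that use the latter of those equations: 

$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial}{\partial x}\frac{\partial v}{\partial y} = \frac{\partial}{\partial y}\frac{\partial v}{\partial x} = -\frac{\partial}{\partial y}\frac{\partial u}{\partial y} = -\frac{\partial^2 u}{\partial y^2}, \tag{2.73}$$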


26 Let me remind the reader that the term cylindrical describes any surface formed by a translation, along a 
straight line, of an arbitrary curve, and hence is more general than the usual circular cylinder. (In this terminology, 
for example, a prism is also a cylinder of a particular type, formed by a translation of a polygon.) 

27 The complex variable z should not be confused with the (real) 3rd spatial coordinate z! We are considering 2D 
problems now, with the potential independent of z. 

28 An analytic (or "holomorphic") function may be defined as the one that may be expanded into the Taylor series 
in its complex argument, i.e. is infinitely differentiable at the given point. (Almost all "regular" functions, such as 
z^n, exp z, ln z, etc., and their linear combinations are analytic at all z, maybe besides certain special points.) 
If the reader needs to brush up on their background on this subject, I can recommend a popular textbook by M. 
Spiegel et al., Complex Variables, 2nd ed., McGraw-Hill, 2009. 

29 These relations may be used, in particular, to prove the Cauchy integral formula — see, e.g., MA Eq. (15.1). 
and similarly for v. This means that the sum of second-order partial derivatives of each of the real 
functions u(x, y) and v(x, y) is zero, i.e. that both functions obey the 2D Laplace equation. This 
mathematical fact opens a nice way of solving problems of electrostatics for (relatively simple) 2D 
geometries. Imagine that for a particular boundary problem we have found a function w(z) for which either 
u(x, y) or v(x, y) is constant on all electrode surfaces. Then all lines of constant u (or v) represent 
equipotential surfaces, i.e. the problem of the potential distribution has been essentially solved. 
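The conclusion that u and v both obey the 2D Laplace equation can be spot-checked with a five-point finite-difference Laplacian. The test function w(z) = exp z + z³ and the sample point are arbitrary choices for this sketch; the non-harmonic function x² + y² is included for contrast, since its Laplacian equals 4.

```python
import cmath


def laplacian_2d(f, x, y, h=1e-3):
    """Five-point finite-difference Laplacian of a real function f(x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 * f(x, y)) / h**2


def w(z):
    """An arbitrary analytic test function (an assumption, for illustration only)."""
    return cmath.exp(z) + z**3


def u(x, y):
    return w(complex(x, y)).real


def v(x, y):
    return w(complex(x, y)).imag


print(laplacian_2d(u, 0.3, 0.8), laplacian_2d(v, 0.3, 0.8))  # both ~0
print(laplacian_2d(lambda x, y: x * x + y * y, 0.3, 0.8))    # |z|^2 is not harmonic: ~4
```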


As a simple example, let us consider a problem important for practice: the quadrupole 
electrostatic lens, a system of four cylindrical electrodes with hyperbolic cross-sections, whose 
boundaries are described by the following relations: 

$$x^2 - y^2 = \begin{cases} +a^2, & \text{for the left and right electrodes},\\ -a^2, & \text{for the top and bottom electrodes}, \end{cases} \tag{2.74}$$

voltage-biased as shown in Fig. 9a. 


Fig. 2.9. (a) The quadrupole electrostatic lens’ cross-section and (b) its conformal mapping. 


Comparing these relations with Eqs. (72), we see that each electrode surface corresponds to a 
constant value of the real part u(x, y) of the function given by Eq. (71): u = ±a². Moreover, the potentials 
of both surfaces with u = +a² are equal to +V/2, while those with u = -a² are equal to -V/2. Hence we 
may conjecture that the electrostatic potential at each point is a function of u alone; moreover, a simple 
linear function, 

$$\phi = c_1 u + c_2 = c_1(x^2 - y^2) + c_2, \tag{2.75}$$


is a valid (and hence the unique) solution of our boundary problem. Indeed, it does satisfy the Laplace 
equation, while the constants c_{1,2} may be readily selected to satisfy all the boundary conditions 
shown in Fig. 9a: 

$$\phi = \frac{V}{2}\frac{x^2 - y^2}{a^2}, \tag{2.76}$$


so that the boundary problem has been solved. 


According to Eq. (76), all equipotential surfaces are hyperbolic cylinders, similar to the 
electrode surfaces. What remains is to find the electric field at an arbitrary point inside the system: 

$$E_x = -\frac{\partial\phi}{\partial x} = -\frac{Vx}{a^2}, \qquad E_y = -\frac{\partial\phi}{\partial y} = \frac{Vy}{a^2}. \tag{2.77}$$
These formulas show, in particular, that if charged particles (e.g., electrons in an electron-optics system) 
are launched to fly ballistically through such a lens, along the z-axis, they experience, in one direction, a 
force pushing them toward the symmetry axis and proportional to the particle's deviation from the axis (and 
thus equivalent in action to an optical lens with a positive refractive power), and, in the perpendicular 
direction, a force pushing them out (negative refractive power). One can show that letting the particles fly 
through several such lenses in series, with alternating voltage polarities, enables beam focusing.30 
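The last statement can be made plausible with standard 2×2 ray-transfer matrices acting on a particle's (offset, slope) pair. This is a minimal sketch under assumptions that are not in the text: each lens is treated as a thin lens whose linear force (77) gives a focal length +f in one transverse plane and -f in the other, and two such lenses of opposite polarity are separated by a field-free drift of length L. The matrix product then gives the same positive net focal length, f²/L, in both planes.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]


def thin_lens(f):
    """Ray-transfer matrix for (offset, slope); f > 0 focuses, f < 0 defocuses."""
    return [[1.0, 0.0], [-1.0 / f, 1.0]]


def drift(L):
    """Field-free flight over a distance L."""
    return [[1.0, L], [0.0, 1.0]]


def doublet_focal_length(f1, L):
    """Net focal length of a lens f1, a drift L, and then a lens of opposite power -f1.
    The rightmost matrix acts first on the particle."""
    M = matmul(thin_lens(-f1), matmul(drift(L), thin_lens(f1)))
    return -1.0 / M[1][0]


f, L = 0.5, 0.1                       # hypothetical focal length and lens spacing, m
F_x = doublet_focal_length(f, L)      # plane in which the first lens focuses
F_y = doublet_focal_length(-f, L)     # perpendicular plane: lens powers swapped
print(F_x, F_y, f * f / L)            # both positive, both close to f^2/L
```

The thin-lens picture is only a caricature of a real quadrupole of finite length, but it shows why an alternating sequence can focus in both transverse planes at once: the beam is wider, on average, inside the focusing lens of each pair than inside the defocusing one.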


Hence, we have reduced the 2D Laplace boundary problem to that of finding the proper analytic 
function w(z). This task may be also understood as that of finding a conformal map, i.e. a 
correspondence between components of any point pair, {x, y} and {u, v}, residing, respectively, on the 
initial Cartesian plane z and the plane w of the new variables. For example, Eq. (71) maps the real 
electrode configuration onto a plane capacitor of an infinite area (Fig. 9b), and the simplicity of Eq. (75) 
is due to the fact that for the latter system the equipotential surfaces are just parallel planes u = const. 


For more complex geometries, the suitable analytic function w(z) may be hard to find. However, 
for conductors with piecewise-linear cross-section boundaries, substantial help may be obtained from the 
following Schwarz-Christoffel integral 

$$w(z) = \text{const}\times\int\frac{dz}{(z-x_1)^{k_1}(z-x_2)^{k_2}\cdots(z-x_{N-1})^{k_{N-1}}}, \tag{2.78}$$

that provides a conformal mapping of the interior of an arbitrary N-sided polygon on the plane w = u + 
iv onto the upper half (y > 0) of the plane z = x + iy. In Eq. (78), x_j (j = 1, 2, ..., N - 1) are the points of the 
y = 0 axis (i.e., of the boundary of the mapped region on plane z) to which the corresponding polygon 
vertices are mapped, while k_j are the exterior angles at the polygon vertices, measured in units of π, 
with -1 ≤ k_j ≤ +1 (see Fig. 10).31 Of the points x_j, two may be selected arbitrarily (because their effects 
may be compensated by the multiplicative constant in Eq. (78) and the additive constant of integration), 
while all the others have to be adjusted to provide the correct mapping. 


Fig. 2.10. The Schwarz-Christoffel mapping of a polygon's interior onto the upper half-plane (plane z). 


30 See, e.g., the textbook by P. Grivet, Electron Optics, 2nd ed., Pergamon, 1972. 

31 The integral (78) includes only (N - 1) rather than N poles because a polygon's shape is fully determined by (N 
- 1) positions w_j of its vertices and (N - 1) angles πk_j. In particular, since the algebraic sum of all external angles 
of a polygon equals 2π, the last angle parameter k_N is uniquely determined by the set of the previous ones. 
In the general case, the complex integral (78) may be hard to tackle. However, in some important 
cases, in particular those with right angles (k_j = ±1/2) and/or with some points w_j at infinity, the integrals 
may be readily worked out, giving explicit analytical expressions for the mapping functions w(z). For 
example, let us consider a semi-infinite strip defined by restrictions -1 ≤ u ≤ +1 and 0 ≤ v on the w-plane 
(see the left panel of Fig. 11). 


Fig. 2.11. A semi-infinite strip (plane w, left panel) mapped onto the upper half-plane (plane z, right panel), with x_1 = -1 and x_2 = +1. 


The strip may be considered as a triangle, with one vertex at the infinitely distant vertical point 
w_3 = 0 + i∞. Let us map the polygon on the upper half of plane z, shown on the right panel of Fig. 11, 
with the vertex w_1 = -1 + i0 mapped onto the point z_1 = -1 + i0, and the vertex w_2 = +1 + i0 mapped 
onto the point z_2 = +1 + i0. Since the external angles at both these vertices are equal to +π/2, and hence 
k_1 = k_2 = +1/2, Eq. (78) is reduced to 

$$w(z) = \text{const}\times\int\frac{dz}{(z+1)^{1/2}(z-1)^{1/2}} = \text{const}\times\int\frac{dz}{(z^2-1)^{1/2}} = \text{const}\times\int\frac{dz}{(1-z^2)^{1/2}}. \tag{2.79}$$


This complex integral may be worked out, just as for real z, with the substitution z = sin ξ, giving 

$$w(z) = \text{const}'\times\int^{\sin^{-1}z}d\xi = c_1\sin^{-1}z + c_2. \tag{2.80}$$


Determining the constants c_{1,2} from the required mapping, i.e. from the conditions w(-1 + i0) = -1 + i0 
and w(+1 + i0) = +1 + i0 (see the arrows in Fig. 11), we finally get32 

$$w(z) = \frac{2}{\pi}\sin^{-1}z, \qquad\text{i.e.}\qquad z = \sin\frac{\pi w}{2}. \tag{2.81a}$$
a 2 
Using the well-known expression for the sine of a complex argument,33 we may rewrite this elegant 
result in either of the following two forms for the real and imaginary components of z and w: 

$$u = \frac{2}{\pi}\sin^{-1}\frac{2x}{[(x+1)^2+y^2]^{1/2} + [(x-1)^2+y^2]^{1/2}}, \qquad v = \frac{2}{\pi}\cosh^{-1}\frac{[(x+1)^2+y^2]^{1/2} + [(x-1)^2+y^2]^{1/2}}{2},$$

$$x = \sin\frac{\pi u}{2}\cosh\frac{\pi v}{2}, \qquad y = \cos\frac{\pi u}{2}\sinh\frac{\pi v}{2}. \tag{2.81b}$$

32 Note that this function differs only by a linear transformation of variables from the function z = c cosh w, which 
is the canonical form of the definition of the so-called elliptic (not ellipsoidal!) orthogonal coordinates. 
33 See, e.g., MA Eq. (3.5). 


It is amazing how perfectly the last formula manages to keep y = 0 at the different borders of our w- 
region (Fig. 11): at its side borders (u = ±1, 0 ≤ v < ∞), this is performed by the first multiplier, while at 
the bottom border (-1 ≤ u ≤ +1, v = 0), the equality is enforced by the second multiplier. 
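The boundary behavior just described, and the consistency of Eqs. (81a) and (81b), can be checked numerically with Python's `cmath.sin`. This sketch assumes nothing beyond those equations; the clamp in the illustrative helper `u_of_xy` only protects `math.asin` from arguments that rounding might push slightly outside [-1, 1].

```python
import cmath
import math


def z_of_w(w):
    """Eq. (2.81a): z = sin(pi*w/2)."""
    return cmath.sin(math.pi * w / 2)


def u_of_xy(x, y):
    """First of Eqs. (2.81b): u as an explicit real function of x and y."""
    r1 = math.hypot(x + 1, y)
    r2 = math.hypot(x - 1, y)
    s = max(-1.0, min(1.0, 2 * x / (r1 + r2)))  # guard against rounding past +/-1
    return (2 / math.pi) * math.asin(s)


# Side borders u = +/-1 and the bottom border v = 0 all land on y = Im z = 0.
for v in (0.0, 0.5, 2.0):
    print(z_of_w(complex(1.0, v)).imag, z_of_w(complex(-1.0, v)).imag)
for u in (-0.6, 0.0, 0.9):
    print(z_of_w(complex(u, 0.0)).imag)
# Round trip: map an interior point w to z, then recover u from Eqs. (2.81b).
z = z_of_w(0.3 + 0.8j)
print(u_of_xy(z.real, z.imag))  # close to 0.3
```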


This mapping may be used to solve several electrostatics problems with the geometry shown in 
Fig. 11a; probably the most surprising of them is the following one. A straight gap of width 2t is cut in a 
very thin conducting plane, and voltage V is applied between the resulting half-planes (see the bold 
straight lines in Fig. 12). 


Fig. 2.12. The equipotential surfaces of the electric field between two thin conducting semi-planes (or rather their cross-sections by the plane z = const). 


Selecting a Cartesian coordinate system with the z-axis directed along the cut, the y-axis normal 
to the plane, and the origin in the middle of the cut (Fig. 12), we can write the boundary conditions of 
this Laplace problem as 

$$\phi = \begin{cases} +V/2, & \text{for } x > +t,\ y = 0,\\ -V/2, & \text{for } x < -t,\ y = 0. \end{cases} \tag{2.82}$$

(Due to the problem's symmetry, we may expect that in the middle of the gap, i.e. at -t < x < +t and y = 
0, the electric field is parallel to the plane, and hence ∂φ/∂y = 0.) The comparison of Figs. 11 and 12 
shows that if we normalize our coordinates {x, y} to t, Eqs. (81) provide the conformal mapping of our 
system on the plane z to a plane capacitor on the plane w, with the voltage V between two conducting 
planes located at u = ±1. Since we already know that in that case φ = (V/2)u, we may immediately use 
the first of Eqs. (81b) to write the final solution of the problem:34 


$$\phi = \frac{V}{2}u = \frac{V}{\pi}\sin^{-1}\frac{2x}{[(x+t)^2+y^2]^{1/2} + [(x-t)^2+y^2]^{1/2}}. \tag{2.83}$$
The thin lines in Fig. 12 show the corresponding equipotential surfaces;35 it is evident that the 
electric field concentrates at the gap edges, just as it did at the edge of the thin disk (Fig. 8). Let me 


34 This result may be also obtained by the Green’s function method, to be discussed in Sec. 10 below. 

35 Another graphical representation of the electric field distribution, by field lines, is less convenient. (It is more 
useful for the magnetic field, which may be represented by a scalar potential only in particular cases, so there is 
no surprise that the field lines were introduced only by Michael Faraday in the 1830s.) As a reminder, the field 
leave the remaining calculation of the surface charge distribution and the mutual capacitance between 
the half-planes (per unit length of the system in the z-direction) for the reader’s exercise. 
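Solution (83) can be spot-checked in the same spirit. The sketch below (the helper names, the sample point, and t = V = 1 are arbitrary assumptions) confirms the electrode potentials (82), the zero potential on the symmetry plane x = 0, and the Laplace equation away from the conductors. It deliberately stops short of the surface charge and capacitance, which are left above as an exercise.

```python
import math


def phi_gap(x, y, t, V):
    """Eq. (2.83): potential around a slit of width 2t between two thin
    half-planes held at +V/2 (x > t) and -V/2 (x < -t)."""
    r1 = math.hypot(x + t, y)
    r2 = math.hypot(x - t, y)
    s = max(-1.0, min(1.0, 2 * x / (r1 + r2)))  # guard against rounding past +/-1
    return (V / math.pi) * math.asin(s)


def laplacian_2d(f, x, y, h=1e-3):
    """Five-point finite-difference Laplacian of f(x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 * f(x, y)) / h**2


t, V = 1.0, 1.0                                                 # hypothetical half-width and voltage
print(phi_gap(2.0, 0.0, t, V), phi_gap(-3.0, 0.0, t, V))        # +V/2 and -V/2 on the electrodes
print(phi_gap(0.0, 0.8, t, V))                                  # zero on the symmetry plane x = 0
print(laplacian_2d(lambda x, y: phi_gap(x, y, t, V), 0.4, 0.7))  # ~0: Laplace equation
```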


2.5. Variable separation — Cartesian coordinates 


The general approach of the methods discussed in the last two sections was to satisfy the Laplace 
equation by a function of a single variable that also satisfies the boundary conditions. Unfortunately, in 
many cases this cannot be done, at least using reasonably simple functions. In this case, a very 
powerful method called variable separation36 may work, typically producing "semi-analytical" 
results in the form of series (infinite sums) of either elementary or well-studied special functions. Its 
main idea is to look for the solution of the boundary problem (35) as the sum of partial solutions, 

$$\phi = \sum_k c_k\phi_k, \tag{2.84}$$

where each function φ_k satisfies the Laplace equation, and then select the set of coefficients c_k to satisfy 
the boundary conditions. More specifically, in the variable separation method, the partial solutions φ_k 
are looked for in the form of a product of functions, each depending on just one spatial coordinate. 


Let us discuss this approach on the classical example of a rectangular box with conducting walls 
(Fig. 13), with the same potential (that I will take for zero) at all its sidewalls and the lower lid, but a 
different potential V at the top lid (z = c). Moreover, to demonstrate the power of the variable separation 
method, let us carry out all the calculations for a more general case when the top lid’s potential is an 
arbitrary 2D function V(x, y).37 


Fig. 2.13. The standard playground for the variable separation discussion: a rectangular box with five conducting, grounded walls and a fixed potential distribution V(x, y) on the top lid. 


For this geometry, it is natural to use the Cartesian coordinates {x, y, z}, representing each of the 
partial solutions in Eq. (84) as the following product 


line is the curve to which the field vectors are tangential at each point. Hence the electric field lines are always 
normal to the equipotential surfaces, so that it is always straightforward to sketch them, if desirable, from the 
equipotential surface pattern — like the one shown in Fig. 12. 

36 This method was already discussed in CM Sec. 6.5 and then used also in Secs. 6.6 and 8.4 of that course. 
However, it is so important that I need to repeat its discussion in this part of my series, for the benefit of the 
readers who have skipped the Classical Mechanics course for whatever reason. 

37 Such voltage distributions may be implemented in practice, for example, using the so-called mosaic electrodes 
consisting of many electrically-insulated and individually-biased panels. 
$$\phi_k = X(x)Y(y)Z(z). \tag{2.85}$$

Plugging it into the Laplace equation expressed in the Cartesian coordinates, 

$$\frac{\partial^2\phi_k}{\partial x^2} + \frac{\partial^2\phi_k}{\partial y^2} + \frac{\partial^2\phi_k}{\partial z^2} = 0, \tag{2.86}$$

and dividing the result by XYZ, we get 

$$\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \frac{1}{Z}\frac{d^2Z}{dz^2} = 0. \tag{2.87}$$


Here comes the punch line of the variable separation method: since the first term of this sum may 
depend only on x, the second one only on y, etc., Eq. (87) may be satisfied everywhere in the volume 
only if each of these terms equals a constant. In a minute we will see that for our current problem (Fig. 
13), these constant x- and y-terms have to be negative; hence let us denote these variable separation 
constants as (-α²) and (-β²), respectively. Now Eq. (87) shows that the constant z-term has to be 
positive; denoting it as γ², we get the following relation: 

$$\alpha^2 + \beta^2 = \gamma^2. \tag{2.88}$$

Now the variables are separated in the sense that for the functions X(x), Y(y), and Z(z) we have got 
separate ordinary differential equations, 

$$\frac{d^2X}{dx^2} + \alpha^2X = 0, \qquad \frac{d^2Y}{dy^2} + \beta^2Y = 0, \qquad \frac{d^2Z}{dz^2} - \gamma^2Z = 0, \tag{2.89}$$

which are related only by Eq. (88) for their constant parameters. 


Let us start from the equation for X(x). Its general solution is the sum of functions sin αx and 
cos αx, multiplied by arbitrary coefficients. Let us select these coefficients to satisfy our boundary 
conditions. First, since φ ∝ X should vanish at the back vertical wall of the box (i.e., with the coordinate 
origin choice shown in Fig. 13, at x = 0 for any y and z), the coefficient at cos αx should be zero. The 
remaining coefficient (at sin αx) may be included in the general factor c_k in Eq. (84), so that we may take 
X in the form 

$$X = \sin\alpha x. \tag{2.90}$$


This solution satisfies the boundary condition at the opposite wall (x = a) only if the product αa is a 
multiple of π, i.e. if α is equal to any of the following numbers (commonly called eigenvalues):38 

$$\alpha_n = \frac{\pi}{a}n, \qquad\text{with}\quad n = 1, 2, \ldots \tag{2.91}$$

(Terms with negative values of n would not be linearly independent from those with positive n, and may 
be dropped from the sum (84). The value n = 0 is formally possible, but would give X = 0, i.e. φ_k = 0, at 


38 Note that according to Eqs. (91)-(92), as the spatial dimensions a and b of the system are increased, the 
distances between the adjacent eigenvalues tend to zero. This fact implies that for spatially-infinite systems, the 
eigenvalue spectra are continuous, so that the sums of the type (84) become integrals; however, the general 
approach remains the same. A few problems of this type are provided in Sec. 9 for the reader’s exercise. 
any x, i.e. no contribution to sum (84), so it may be dropped as well.) Now we see that we indeed had to 
take α real, i.e. α² positive; otherwise, instead of the oscillating function (90) we would have a sum of 
two exponential functions, which cannot equal zero at two independent points of the x-axis. 


Since the equation (89) for the function Y(y) is similar to that for X(x), and the boundary conditions on the walls perpendicular to the y-axis (y = 0 and y = b) are similar to those on the x-walls, entirely similar reasoning gives

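$$Y = \sin\beta y, \qquad \beta_m = \frac{\pi m}{b}, \qquad \text{with } m = 1, 2, \ldots, \qquad (2.92)$$
where the integer m may be selected independently of n. Now we see that according to Eq. (88), the separation constant γ depends on two integers, n and m, so that relationship (88) may be rewritten as

$$\gamma_{nm} = \left[\alpha_n^2 + \beta_m^2\right]^{1/2} = \pi\left[\left(\frac{n}{a}\right)^2 + \left(\frac{m}{b}\right)^2\right]^{1/2}. \qquad (2.93)$$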


The corresponding solution of the differential equation for Z may be represented as a linear combination of two exponents exp{±γ_nm z}, or alternatively of two hyperbolic functions, sinh γ_nm z and cosh γ_nm z, with arbitrary coefficients. At our choice of the coordinate origin, the latter option is preferable, because cosh γ_nm z cannot satisfy the zero boundary condition at the bottom lid of the box (z = 0). Hence we may take Z in the form

$$Z = \sinh\gamma_{nm}z, \qquad (2.94)$$

which automatically satisfies that condition.


Now it is the right time to merge Eqs. (84)-(85) and (90)-(94), replacing the temporary index k with the full set of possible eigenvalues, in our current case, the two integer indices n and m:

$$\phi(x,y,z) = \sum_{n,m=1}^{\infty} c_{nm}\sin\frac{\pi nx}{a}\sin\frac{\pi my}{b}\sinh\gamma_{nm}z, \qquad (2.95)$$


where γ_nm is given by Eq. (93). This solution satisfies not only the Laplace equation but also the boundary conditions on all walls of the box except the top lid, for arbitrary coefficients c_nm. The only job left is to choose these coefficients to satisfy the top-lid requirement:

$$\phi(x,y,c) = \sum_{n,m=1}^{\infty} c_{nm}\sin\frac{\pi nx}{a}\sin\frac{\pi my}{b}\sinh\gamma_{nm}c = V(x,y). \qquad (2.96)$$


It may look bad to have just one equation for the infinite set of coefficients c_nm. However, decisive help comes from the fact that the functions of x and y that participate in Eq. (96) form complete, orthogonal sets of 1D functions. The latter term means that the integrals of the products of the functions with different integer indices over the region of interest equal zero. Indeed, a direct integration gives

$$\int_0^a \sin\frac{\pi nx}{a}\sin\frac{\pi n'x}{a}\,dx = \frac{a}{2}\delta_{nn'}, \qquad (2.97)$$


where δ_nn′ is the Kronecker symbol, and similarly for y (with the evident replacements a → b and n → m). Hence, a fruitful way to proceed is to multiply both sides of Eq. (96) by the product of the basis functions with arbitrary indices n′ and m′, and integrate the result over x and y:

$$\sum_{n,m=1}^{\infty} c_{nm}\sinh\gamma_{nm}c\int_0^a \sin\frac{\pi nx}{a}\sin\frac{\pi n'x}{a}\,dx\int_0^b \sin\frac{\pi my}{b}\sin\frac{\pi m'y}{b}\,dy = \int_0^a dx\int_0^b dy\,V(x,y)\sin\frac{\pi n'x}{a}\sin\frac{\pi m'y}{b}. \qquad (2.98)$$

Due to Eq. (97), all terms on the left-hand side of the last equation, besides those with n = n’ and m = 
m’, vanish, and (replacing n’ with n, and m’ with m, for notation brevity) we finally get 


$$c_{nm} = \frac{4}{ab\sinh\gamma_{nm}c}\int_0^a dx\int_0^b dy\,V(x,y)\sin\frac{\pi nx}{a}\sin\frac{\pi my}{b}. \qquad (2.99)$$

The relations (93), (95), and (99) give the complete solution of the posed boundary problem; we can see both good and bad news here. The first bit of bad news is that in the general case we still need to work out the integrals (99), formally an infinite number of them. In some cases, it is possible to do this analytically, in one shot. For example, if the top lid in our problem is a single conductor, i.e. has a constant potential V₀, we may take V(x, y) = V₀ = const, and both 1D integrations are elementary; for example
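$$\int_0^a \sin\frac{\pi nx}{a}\,dx = \frac{a}{\pi n}\int_0^{\pi n}\sin\xi\,d\xi = \frac{a}{\pi n}\times\begin{cases}2, & \text{for } n \text{ odd},\\ 0, & \text{for } n \text{ even},\end{cases} \qquad (2.100)$$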


and similarly for the integral over y, so that 


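$$c_{nm} = \frac{16V_0}{\pi^2nm\sinh\gamma_{nm}c}\times\begin{cases}1, & \text{if both } n \text{ and } m \text{ are odd},\\ 0, & \text{otherwise}.\end{cases} \qquad (2.101)$$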
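For a lid potential that is only known numerically, the integrals (99) have to be computed numerically. The following Python sketch, an illustration added here rather than part of the original derivation, applies a simple N × N midpoint rule to Eq. (99) and, for a constant lid potential, compares the result with the closed form (101). The function names `coeff_numeric` and `coeff_closed`, the unit box dimensions, and the grid size N = 200 are arbitrary choices made for this example.

```python
import math

def coeff_numeric(V, n, m, a=1.0, b=1.0, c=1.0, N=200):
    """Coefficient c_nm of Eq. (99), with an N x N midpoint rule over the top lid."""
    gamma = math.pi * math.hypot(n / a, m / b)          # gamma_nm, Eq. (93)
    hx, hy = a / N, b / N
    xs = [(i + 0.5) * hx for i in range(N)]
    ys = [(j + 0.5) * hy for j in range(N)]
    sx = [math.sin(math.pi * n * x / a) for x in xs]    # basis functions along x
    sy = [math.sin(math.pi * m * y / b) for y in ys]    # basis functions along y
    total = 0.0
    for x, wx in zip(xs, sx):
        for y, wy in zip(ys, sy):
            total += V(x, y) * wx * wy
    return 4.0 * total * hx * hy / (a * b * math.sinh(gamma * c))

def coeff_closed(n, m, a=1.0, b=1.0, c=1.0, V0=1.0):
    """Closed-form coefficient of Eq. (101), for a constant lid potential V0."""
    if n % 2 == 0 or m % 2 == 0:
        return 0.0
    gamma = math.pi * math.hypot(n / a, m / b)
    return 16.0 * V0 / (math.pi ** 2 * n * m * math.sinh(gamma * c))

if __name__ == "__main__":
    lid = lambda x, y: 1.0                               # V(x, y) = V0 = 1
    for n, m in [(1, 1), (1, 3), (2, 1), (3, 3)]:
        print(n, m, coeff_numeric(lid, n, m), coeff_closed(n, m))
```

The same `coeff_numeric` works unchanged for any callable V(x, y); only the closed-form comparison is specific to a constant lid.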


The second piece of bad news is that even on such a happy occasion, we still have to sum up the series (95), so that our result may only be called analytical with some reservations, because in most cases we need to perform numerical summation to get the final numbers or plots.


Now for the first good news. Computers are very efficient for both operations (95) and (99), i.e. for the summation and integration. (As was discussed in Sec. 1.2, random errors are averaged out at these operations.) As an example, Fig. 14 shows the plots of the electrostatic potential in a cubic box (a = b = c) with an equipotential top lid (V = V₀ = const), obtained by a numerical summation of the series (95), using the analytical expression (101). The remarkable feature of this calculation is a very fast convergence of the series; for the middle cross-section of the cubic box (z/c = 0.5), already the first term (with n = m = 1) gives an accuracy of about 6%, while the sum of the four leading terms (with n, m = 1, 3) reduces the error to just 0.2%. (For a longer box, c > a, b, the convergence is even faster; see the discussion below.) Only very close to the corners between the top lid and the sidewalls, where the potential changes rapidly, are several more terms necessary to get a reasonable accuracy.
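The fast convergence described above is easy to reproduce. The sketch below (an illustration; the function name, the unit cube, and the cutoff n_max are assumptions of this example) sums the series (95) with the coefficients (101) over odd n, m ≤ n_max. As an independent check, it uses a symmetry argument that is not made in the text: the six problems in which one face of a cube is at V₀ and the other five are grounded add up, by superposition, to a cube with all faces at V₀, where ϕ = V₀ everywhere; hence at the cube's center each of them contributes exactly V₀/6.

```python
import math

def box_potential(x, y, z, a=1.0, b=1.0, c=1.0, V0=1.0, n_max=41):
    """Partial sum of Eq. (95) with coefficients (101), over odd n, m <= n_max."""
    phi = 0.0
    for n in range(1, n_max + 1, 2):
        for m in range(1, n_max + 1, 2):
            g = math.pi * math.hypot(n / a, m / b)              # gamma_nm, Eq. (93)
            # sinh(g z)/sinh(g c), rewritten with exponentials to avoid overflow
            ratio = (math.exp(g * (z - c)) * (1.0 - math.exp(-2.0 * g * z))
                     / (1.0 - math.exp(-2.0 * g * c)))
            phi += (16.0 * V0 / (math.pi ** 2 * n * m)
                    * math.sin(math.pi * n * x / a)
                    * math.sin(math.pi * m * y / b) * ratio)
    return phi

if __name__ == "__main__":
    exact = 1.0 / 6.0                      # cube center, by the superposition argument
    for n_max in (1, 3, 5, 41):
        value = box_potential(0.5, 0.5, 0.5, n_max=n_max)
        print(n_max, value, abs(value / exact - 1.0))
```

Writing the sinh ratio through exponentials is a design choice that keeps large γ_nm c from overflowing, which matters once n_max grows.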


The related piece of good news is that our “semi-analytical” result allows its limiting cases to be explored analytically. For example, Eq. (93) shows that for a very flat box (with c ≪ a, b), γ_nm z ≤ γ_nm c ≪ 1, at least for the lowest terms of the series (95), with n ≪ a/c and m ≪ b/c. In this case, the sinh functions in Eqs. (95) and (99) may be well approximated by their arguments, and their ratio by z/c. So if we limit the summation to these terms, Eq. (95) gives a very simple result

$$\phi(x,y,z) \approx \frac{z}{c}V(x,y), \qquad (2.102)$$


which means that each elementary segment of the flat box behaves just as a plane capacitor. Only near the sidewalls are the higher terms in the series (95) important, producing some deviations from Eq. (102). (For the general problem with an arbitrary function V(x, y), this is also true in all regions where this function changes sharply.)


[Plots of Fig. 2.14: ϕ(x, y, z)/V₀ as a function of x/a (left panel) and of z/c (right panel); y = b/2.]


Fig. 2.14. The electrostatic potential’s distribution inside a cubic box (a = b = c) with a constant voltage V₀ on the top lid (Fig. 13), calculated numerically from Eqs. (93), (95), and (101). The dashed line on the left panel shows the contribution of the main term of the series (with n = m = 1) to the full result, for z/c = 0.5.


In the opposite limit (a, b ≪ c), Eq. (93) shows that, on the contrary, γ_nm c ≫ 1 for all n and m. Moreover, the ratio sinh γ_nm z/sinh γ_nm c drops sharply if either n or m is increased, provided that z is not too close to c. Hence in this case a very good approximation may be obtained by keeping just the leading term, with n = m = 1, in Eq. (95), so that the challenge of summation disappears. (As was discussed above, this approximation works reasonably well even for a cubic box.) In particular, for the constant potential of the upper lid, we can use Eq. (101) and the exponential asymptotic form of both sinh functions to get a very simple formula:

$$\phi \approx \frac{16V_0}{\pi^2}\sin\frac{\pi x}{a}\sin\frac{\pi y}{b}\exp\left\{-\pi\left(\frac{1}{a^2} + \frac{1}{b^2}\right)^{1/2}(c - z)\right\}. \qquad (2.103)$$
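The two limits discussed above can be checked against the truncated series as well. In the sketch below (an illustration with arbitrarily chosen box proportions, sample points, and cutoffs, none of which come from the text), the series (95) with coefficients (101) is compared with the plane-capacitor formula (102) in the middle of a flat box with a = b = 10c, and with the single-term formula (103) on the axis of a long box with c = 3a = 3b.

```python
import math

def box_potential(x, y, z, a, b, c, V0=1.0, n_max=201):
    """Partial sum of Eq. (95) with coefficients (101), over odd n, m <= n_max."""
    phi = 0.0
    for n in range(1, n_max + 1, 2):
        for m in range(1, n_max + 1, 2):
            g = math.pi * math.hypot(n / a, m / b)              # Eq. (93)
            ratio = (math.exp(g * (z - c)) * (1.0 - math.exp(-2.0 * g * z))
                     / (1.0 - math.exp(-2.0 * g * c)))          # sinh(gz)/sinh(gc)
            phi += (16.0 * V0 / (math.pi ** 2 * n * m)
                    * math.sin(math.pi * n * x / a)
                    * math.sin(math.pi * m * y / b) * ratio)
    return phi

def flat_box_limit(z, c, V0=1.0):
    """Plane-capacitor approximation, Eq. (102), for a constant lid potential."""
    return z / c * V0

def long_box_limit(x, y, z, a, b, c, V0=1.0):
    """Single-term approximation, Eq. (103)."""
    g11 = math.pi * math.sqrt(1.0 / a ** 2 + 1.0 / b ** 2)
    return (16.0 * V0 / math.pi ** 2 * math.sin(math.pi * x / a)
            * math.sin(math.pi * y / b) * math.exp(-g11 * (c - z)))

if __name__ == "__main__":
    # flat box, a = b = 10, c = 1: middle of the box, halfway between the lids
    print(box_potential(5.0, 5.0, 0.5, 10.0, 10.0, 1.0), flat_box_limit(0.5, 1.0))
    # long box, a = b = 1, c = 3: on the axis, halfway up
    print(box_potential(0.5, 0.5, 1.5, 1.0, 1.0, 3.0, n_max=21),
          long_box_limit(0.5, 0.5, 1.5, 1.0, 1.0, 3.0))
```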


These results may be readily generalized to some other problems. For example, if all walls of the box shown in Fig. 13 have an arbitrary potential distribution, we may use the linear superposition principle to represent the electrostatic potential distribution as the sum of six partial solutions of the type of Eq. (95), each with one wall biased by the corresponding voltage, and all others grounded (ϕ = 0).


To summarize, the results given by the variable separation method in the Cartesian coordinates are closer to what we could call a genuinely analytical solution than to a purely numerical solution. Now, let us explore the issues that arise when this method is applied in other orthogonal coordinate systems.


2.6. Variable separation — polar coordinates 


If a system of conductors is cylindrical, the potential distribution is independent of the z-coordinate along the cylinder axis: ∂ϕ/∂z = 0, and the Laplace equation becomes two-dimensional. If the conductor’s cross-section is rectangular, the variable separation method works best in the Cartesian coordinates {x, y}, and is just a particular case of the 3D solution discussed above. However, if the cross-section is circular, much more compact results may be obtained by using the polar coordinates {ρ, φ}. As we already know from Sec. 3(ii), these 2D coordinates are orthogonal, so that the two-dimensional Laplace operator is a sum of two separable terms.³⁹ Requiring, just as we have done above, each component of the sum (84) to satisfy the Laplace equation, we get


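$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\phi}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\phi}{\partial\varphi^2} = 0. \qquad (2.104)$$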


In full analogy with Eq. (85), let us represent each particular solution ϕ as a product ℛ(ρ)ℱ(φ). Plugging this expression into Eq. (104) and then dividing all its terms by ℛℱ/ρ², we get


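$$\frac{\rho}{\mathcal{R}}\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) + \frac{1}{\mathcal{F}}\frac{d^2\mathcal{F}}{d\varphi^2} = 0. \qquad (2.105)$$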
Following the same reasoning as for the Cartesian coordinates, we get two separated ordinary 
differential equations 
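$$\rho\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) = \nu^2\mathcal{R}, \qquad (2.106)$$

$$\frac{d^2\mathcal{F}}{d\varphi^2} + \nu^2\mathcal{F} = 0, \qquad (2.107)$$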


where ν is the variable separation constant.


Let us start their analysis from Eq. (106), plugging into it a probe solution ℛ = cρ^α, where c and α are some constants. Elementary differentiation shows that if α ≠ 0, the equation is indeed satisfied for any c, with just one requirement imposed on the constant α, namely α² = ν². This means that the following linear superposition

$$\mathcal{R} = a_\nu\rho^{+\nu} + b_\nu\rho^{-\nu}, \qquad \text{for } \nu \neq 0, \qquad (2.108)$$

with any constant coefficients a_ν and b_ν, is also a solution of Eq. (106). Moreover, the general theory of linear ordinary differential equations tells us that the solution of a second-order equation like Eq. (106) may only depend on two constant factors that scale two linearly independent functions. Hence, for all values ν ≠ 0, Eq. (108) presents the general solution of that equation. The case ν = 0, in which the functions ρ^{+ν} and ρ^{−ν} are just constants and hence are not linearly independent, is special, but in this case the integration of Eq. (106) is straightforward,⁴⁰ giving


39 See, e.g., MA Eq. (10.3) with ∂/∂z = 0.
40 Actually, we have already performed it in Sec. 3 — see Eq. (43).

$$\mathcal{R} = a_0 + b_0\ln\rho, \qquad \text{for } \nu = 0. \qquad (2.109)$$


In order to specify the separation constant, let us explore Eq. (107), whose general solution is 


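$$\mathcal{F} = \begin{cases} c_\nu\cos\nu\varphi + s_\nu\sin\nu\varphi, & \text{for } \nu \neq 0,\\ c_0 + s_0\varphi, & \text{for } \nu = 0.\end{cases} \qquad (2.110)$$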


There are two possible cases here. In many boundary problems solvable in cylindrical coordinates, the free-space region, in which the Laplace equation is valid, extends continuously around the origin point ρ = 0. In this region, the potential has to be continuous and uniquely defined, so that ℱ has to be a 2π-periodic function of φ. For that, one needs the product ν(φ + 2π) to equal νφ + 2πn, with n being an integer, immediately giving us a discrete spectrum of possible values of the variable separation constant:


$$\nu = n = 0, \pm1, \pm2, \ldots \qquad (2.111)$$


In this case, both functions ℛ and ℱ may be labeled with the integer index n. Taking into account that the terms with negative values of n may be summed up with those with positive n, and that s₀ has to equal zero (otherwise the 2π-periodicity of the function ℱ would be violated), we see that the general solution of the 2D Laplace equation for such geometries may be represented as

$$\phi(\rho,\varphi) = a_0 + b_0\ln\rho + \sum_{n=1}^{\infty}\left(a_n\rho^n + \frac{b_n}{\rho^n}\right)\left(c_n\cos n\varphi + s_n\sin n\varphi\right). \qquad (2.112)$$
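As a quick consistency check of Eq. (112), each of its terms can be verified to be a harmonic function. The Python sketch below (an added illustration; the test points and the step h are arbitrary) applies a five-point finite-difference Laplacian, written in Cartesian coordinates, to ln ρ and to the products ρ^{±n} cos nφ and ρ^{±n} sin nφ for n = 1, 2, 3. The result vanishes up to discretization error, while a non-harmonic control function, ρ², gives the expected Laplacian of 4.

```python
import math

def laplacian_2d(f, x, y, h=1e-3):
    """Five-point finite-difference estimate of the 2D Laplacian of f at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4.0 * f(x, y)) / h ** 2

def polar_term(n, power_sign, trig):
    """Term rho**(+-n) * trig(n * angle) of Eq. (112), as a function of (x, y)."""
    def f(x, y):
        rho, angle = math.hypot(x, y), math.atan2(y, x)
        return rho ** (power_sign * n) * trig(n * angle)
    return f

def log_term(x, y):
    """The ln(rho) term of Eq. (112)."""
    return math.log(math.hypot(x, y))

if __name__ == "__main__":
    points = [(1.3, 0.4), (-0.8, 1.1), (0.5, -1.7)]     # arbitrary points away from rho = 0
    candidates = [log_term] + [polar_term(n, s, t) for n in (1, 2, 3)
                               for s in (+1, -1) for t in (math.cos, math.sin)]
    for f in candidates:
        print(max(abs(laplacian_2d(f, x, y)) for x, y in points))
    print(laplacian_2d(lambda x, y: x * x + y * y, 0.7, 0.2))   # control: should be 4
```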


Let us see how all this machinery works on the famous problem of a round cylindrical conductor placed into an electric field that is uniform and perpendicular to the cylinder’s axis at large distances (see Fig. 15a), as if it were created by a large plane capacitor. First of all, let us explore the effect of the system’s symmetries on the coefficients in Eq. (112). Selecting the coordinate system as shown in Fig. 15a, and taking the cylinder’s potential to be zero, we immediately get a₀ = 0.


Fig. 2.15. A conducting cylinder inserted into an initially uniform electric field perpendicular to its axis: (a) the problem’s geometry, and (b) the equipotential surfaces given by Eq. (117).

Moreover, due to the mirror symmetry about the plane [x, z], the solution has to be an even function of the angle φ, and hence all coefficients s_n should also equal zero. Also, at large distances (ρ ≫ R) from the cylinder, its effect on the electric field should vanish, and the potential should approach that of the uniform external field E = E₀n_x:


$$\phi \to -E_0x = -E_0\rho\cos\varphi, \qquad \text{for } \rho \to \infty. \qquad (2.113)$$


This is only possible if in Eq. (112), b₀ = 0, and also all coefficients a_n with n ≠ 1 vanish, while the product a₁c₁ is equal to (−E₀). Thus the solution is reduced to the following form

$$\phi(\rho,\varphi) = -E_0\rho\cos\varphi + \sum_{n=1}^{\infty}\frac{B_n}{\rho^n}\cos n\varphi, \qquad (2.114)$$


in which the coefficients B_n ≡ b_nc_n should be found from the boundary condition at ρ = R:
$$\phi(R,\varphi) = 0. \qquad (2.115)$$


This requirement yields the following equation, 


$$\left(-E_0R + \frac{B_1}{R}\right)\cos\varphi + \sum_{n=2}^{\infty}\frac{B_n}{R^n}\cos n\varphi = 0, \qquad (2.116)$$


which should be satisfied for all φ. This equality, read backward, may be considered as an expansion of a function identically equal to zero into a series over the mutually orthogonal functions cos nφ. It is evidently valid if all coefficients of the expansion, including (−E₀R + B₁/R), and all B_n for n ≥ 2, are equal to zero. Moreover, mathematics tells us that such expansions are unique, so this is the only possible solution of Eq. (116). So, B₁ = E₀R², and our final answer (valid only outside of the cylinder, i.e. for ρ ≥ R) is

$$\phi(\rho,\varphi) = -E_0\left(\rho - \frac{R^2}{\rho}\right)\cos\varphi = -E_0\left(1 - \frac{R^2}{\rho^2}\right)x. \qquad (2.117)$$


This result, which may be graphically represented with the equipotential surfaces shown in Fig. 15b, shows a smooth transition between the uniform field (113) far from the cylinder and the equipotential surface of the cylinder (with ϕ = 0). Such smoothing is very typical for Laplace equation solutions. Indeed, as we know from Chapter 1, these solutions correspond to the lowest integral of the potential gradient’s square, i.e. to the lowest potential energy (1.65) possible at the given boundary conditions.


To complete the problem, let us use Eq. (3) to calculate the distribution of the surface charge 
density over the cylinder’s cross-section: 


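$$\sigma = \varepsilon_0E_n\big|_{\text{surface}} = -\varepsilon_0\frac{\partial\phi}{\partial\rho}\bigg|_{\rho=R} = \varepsilon_0E_0\cos\varphi\,\frac{\partial}{\partial\rho}\left(\rho - \frac{R^2}{\rho}\right)\bigg|_{\rho=R} = 2\varepsilon_0E_0\cos\varphi. \qquad (2.118)$$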


This very simple formula shows that with the field direction shown in Fig. 15a (E₀ > 0), the surface charge is positive on the right-hand side of the cylinder and negative on its left-hand side, thus creating a field directed from the right to the left, which exactly compensates the external field inside the conductor, where the net field is zero. (Please take one more look at the schematic Fig. 1a.) Note also that the net electric charge of the cylinder is zero, in correspondence with the problem’s symmetry.

Another useful by-product of the calculation (118) is that the surface electric field equals 2E₀cos φ, and hence its largest magnitude is twice the field far from the cylinder. Such electric field concentration is very typical for all convex conducting surfaces.
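The boundary condition (115) and the surface charge (118) are easy to confirm numerically from the final answer (117). In the sketch below (an illustration; E₀ = R = 1 and ε₀ = 1 are arbitrary unit choices, and the step h is an assumption), σ/ε₀ = −∂ϕ/∂ρ at ρ = R is estimated by a one-sided difference and compared with 2E₀ cos φ; its sum over a uniform grid of angles shows that the net charge is zero.

```python
import math

E0, R = 1.0, 1.0   # illustrative units: far-field strength, cylinder radius; eps0 = 1

def potential(rho, angle):
    """Potential outside the cylinder (rho >= R), Eq. (117)."""
    return -E0 * (rho - R ** 2 / rho) * math.cos(angle)

def sigma_over_eps0(angle, h=1e-6):
    """sigma/eps0 = -d(phi)/d(rho) at rho = R, by a one-sided finite difference."""
    return -(potential(R + h, angle) - potential(R, angle)) / h

if __name__ == "__main__":
    for deg in (0, 45, 90, 135, 180):
        ang = math.radians(deg)
        print(deg, potential(R, ang), sigma_over_eps0(ang), 2 * E0 * math.cos(ang))
    K = 360                                              # number of angular grid points
    net = sum(sigma_over_eps0(2 * math.pi * k / K) for k in range(K)) * (2 * math.pi / K) * R
    print("net charge per unit length, in units of eps0:", net)
```

The largest printed σ/ε₀, at φ = 0, is close to 2E₀, i.e. the surface field is twice the far field, as stated above.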


The last observation gets additional confirmation from the second possible topology, when Eq. (110) is used to describe problems with no angular periodicity. A typical example of this situation is a cylindrical conductor with a cross-section that features a corner limited by two straight-line segments (Fig. 16). Indeed, we may argue that at ρ < R (where R is the radial extension of the planar sides of the corner, see Fig. 16), the Laplace equation may be satisfied by a sum of partial solutions ℛ(ρ)ℱ(φ), if the angular components of the products satisfy the boundary conditions on the corner sides. Taking (just for the simplicity of notation) the conductor’s potential to be zero, and one of the corner’s sides as the x-axis (φ = 0), these boundary conditions are

$$\mathcal{F}(0) = \mathcal{F}(\beta) = 0, \qquad (2.119)$$

where the angle β may be anywhere between 0 and 2π — see Fig. 16.


Fig. 2.16. The cross-sections of cylindrical conductors with (a) a corner and (b) a wedge.


Comparing this condition with Eq. (110), we see that it requires s₀ and all c_ν to vanish, and ν to take one of the values of the following discrete spectrum:

$$\nu_m = \frac{\pi m}{\beta}, \qquad \text{with } m = 1, 2, \ldots \qquad (2.120)$$


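Hence the full solution of the Laplace equation for this geometry takes the form
$$\phi = \sum_{m=1}^{\infty}a_m\rho^{\pi m/\beta}\sin\frac{\pi m\varphi}{\beta}, \qquad \text{for } \rho < R,\ 0 < \varphi < \beta, \qquad (2.121)$$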


where the constants s_ν have been incorporated into a_m. The set of coefficients a_m cannot be universally determined, because it depends on the exact shape of the conductor outside the corner, and on the externally applied electric field. However, whatever the set is, in the limit ρ → 0, the solution (121) is almost⁴¹ always dominated by the term with the lowest m = 1:

$$\phi \to a_1\rho^{\pi/\beta}\sin\frac{\pi\varphi}{\beta}, \qquad (2.122)$$


because the higher terms go to zero faster. This potential distribution corresponds to the surface charge 
density 


41 Exceptions are possible only for highly symmetric configurations when the external field is specially crafted to make a₁ = 0. In this case, the solution at ρ → 0 is dominated by the first nonzero term of the series (121).

$$\sigma = \varepsilon_0E_n\big|_{\text{surface}} = -\varepsilon_0\frac{\partial\phi}{\partial(\rho\varphi)}\bigg|_{\rho=\text{const},\ \varphi\to+0} = -\frac{\varepsilon_0\pi a_1}{\beta}\rho^{(\pi/\beta)-1}. \qquad (2.123)$$


(It is similar, with the opposite sign, on the opposite face of the angle.) 


The result (123) shows that if we are dealing with a concave corner (β < π, see Fig. 16a), the charge density (and the surface electric field) tends to zero. In the opposite case, at a “convex corner” with β > π (actually, a wedge — see Fig. 16b), both the charge and the field’s strength concentrate, formally diverging at ρ → 0. (So, do not sit on a roof’s ridge during a thunderstorm; rather hide in a ditch!) We have already seen qualitatively similar effects for the thin round disk and the split plane.
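The power law in Eq. (123) can be checked against a direct numerical differentiation of the leading term (122). In this sketch (an illustration; the amplitude a₁ = 1, ε₀ = 1, and the step h are arbitrary), the exponent π/β − 1 is positive for a concave corner (β < π), zero for a flat surface (β = π), and negative for a wedge (β > π), reaching −1/2 at the edge of a thin plane (β = 2π).

```python
import math

def corner_potential(rho, angle, beta, a1=1.0):
    """Leading term (122) of the potential near the corner (a1 is an arbitrary amplitude)."""
    return a1 * rho ** (math.pi / beta) * math.sin(math.pi * angle / beta)

def sigma_closed(rho, beta, a1=1.0, eps0=1.0):
    """Surface charge density on the face angle = 0, Eq. (123)."""
    return -eps0 * math.pi * a1 / beta * rho ** (math.pi / beta - 1.0)

def sigma_numeric(rho, beta, a1=1.0, eps0=1.0, h=1e-7):
    """sigma = -eps0 * d(phi)/d(rho*angle) at the face, by a forward difference in angle."""
    dphi = corner_potential(rho, h, beta, a1) - corner_potential(rho, 0.0, beta, a1)
    return -eps0 * dphi / (rho * h)

if __name__ == "__main__":
    for label, beta in [("concave", math.pi / 2), ("flat", math.pi),
                        ("wedge", 1.5 * math.pi), ("thin plane", 2 * math.pi)]:
        ratio = sigma_closed(0.01, beta) / sigma_closed(0.1, beta)   # effect of a 10x approach
        print(label, math.pi / beta - 1.0, ratio,
              sigma_numeric(0.1, beta) / sigma_closed(0.1, beta))
```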


2.7. Variable separation — cylindrical coordinates 


Now, let us discuss how to generalize the approach discussed in the previous section to problems whose geometry is still axially symmetric, but where the electrostatic potential depends not only on the radial and angular coordinates but also on the axial coordinate: ∂ϕ/∂z ≠ 0. The classical example of such a problem is shown in Fig. 17. Here the sidewall and the bottom lid of a hollow round cylinder are kept at a fixed potential (say, ϕ = 0), but the potential V fixed at the top lid is different. Evidently, this problem is qualitatively similar to the rectangular box problem solved above (Fig. 13), and we will also try to solve it first for the case of an arbitrary voltage distribution over the top lid: V = V(ρ, φ).


Fig. 2.17. A cylindrical volume 
with conducting walls. 


Following the main idea of the variable separation method, let us require that each partial function ϕ_k in Eq. (84) satisfy the Laplace equation, now in the full cylindrical coordinates {ρ, φ, z}:⁴²


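$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\phi}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\phi}{\partial\varphi^2} + \frac{\partial^2\phi}{\partial z^2} = 0. \qquad (2.124)$$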


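Plugging ϕ in the form of the product ℛ(ρ)ℱ(φ)Z(z) into Eq. (124) and then dividing all resulting terms by this product, we get

$$\frac{1}{\rho\mathcal{R}}\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) + \frac{1}{\rho^2\mathcal{F}}\frac{d^2\mathcal{F}}{d\varphi^2} + \frac{1}{Z}\frac{d^2Z}{dz^2} = 0. \qquad (2.125)$$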
Since the first two terms of Eq. (125) can only depend on the polar variables ρ and φ, while the third term depends only on z, at least that term should equal a constant. Denoting it (just as we did in the rectangular box problem) by γ², we get the following set of two equations:


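42 See, e.g., MA Eq. (10.3).

$$\frac{d^2Z}{dz^2} = \gamma^2Z, \qquad (2.126)$$

$$\frac{1}{\rho\mathcal{R}}\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) + \frac{1}{\rho^2\mathcal{F}}\frac{d^2\mathcal{F}}{d\varphi^2} = -\gamma^2. \qquad (2.127)$$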


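Now, multiplying all the terms of Eq. (127) by ρ², we see that the last term of the result, (d²ℱ/dφ²)/ℱ, may depend only on φ, and thus should equal a constant. Calling that constant −ν² (just as in Sec. 6 above), we separate Eq. (127) into an angular equation,

$$\frac{d^2\mathcal{F}}{d\varphi^2} + \nu^2\mathcal{F} = 0, \qquad (2.128)$$

and a radial equation:

$$\frac{d^2\mathcal{R}}{d\rho^2} + \frac{1}{\rho}\frac{d\mathcal{R}}{d\rho} + \left(\gamma^2 - \frac{\nu^2}{\rho^2}\right)\mathcal{R} = 0. \qquad (2.129)$$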


We see that the ordinary differential equations for the functions Z(z) and ℱ(φ) (and hence their solutions) are identical to those discussed earlier in this chapter. However, Eq. (129) for the radial function ℛ(ρ) (called the Bessel equation) is more complex than in the 2D case and depends on two independent constant parameters, γ and ν. The latter challenge may be readily overcome if we notice that any change of γ may be reduced to the corresponding re-scaling of the radial coordinate ρ. Indeed, introducing a dimensionless variable ξ ≡ γρ,⁴³ Eq. (129) may be reduced to an equation with just one parameter, ν:

$$\frac{d^2\mathcal{R}}{d\xi^2} + \frac{1}{\xi}\frac{d\mathcal{R}}{d\xi} + \left(1 - \frac{\nu^2}{\xi^2}\right)\mathcal{R} = 0. \qquad (2.130)$$

Moreover, we already know that for angle-periodic problems, the spectrum of eigenvalues of Eq. (128) is discrete: ν = n, with integer n.


Unfortunately, even in this case, Eq. (130), which is the canonical form of the Bessel equation, cannot be satisfied by a single “elementary” function. Its solutions that we need for our current problem are called the Bessel functions of the first kind of order ν, commonly denoted J_ν(ξ). Let me briefly review those properties of these functions that are most relevant for our problem, and for many other problems discussed in this series.⁴⁴


First of all, the Bessel function of a negative integer order is very simply related to that of the positive order:

$$J_{-n}(\xi) = (-1)^nJ_n(\xi), \qquad (2.131)$$

enabling us to limit our discussion to the functions with n ≥ 0. Figure 18 shows four of these functions with the lowest values of n.


43 Note that this normalization is specific for each value of the variable separation parameter γ. Also, please notice that the normalization is meaningless for γ = 0, i.e. for the case when Z(z) is a linear function of z (in particular, a constant). However, if we need the partial solutions for this particular value of γ, we can always use Eqs. (108)-(109).

44 For a more complete discussion of these functions, see the literature listed in MA Sec. 16, for example, Chapter 9 (written by F. Olver) in the famous collection compiled and edited by Abramowitz and Stegun, available online.

Fig. 2.18. Several Bessel functions J_n(ξ) of integer order. The dashed lines show the envelope of the asymptotes (135).


As its argument is increased, each function is initially close to a power law: J₀(ξ) ≈ 1, J₁(ξ) ≈ ξ/2, J₂(ξ) ≈ ξ²/8, etc. This behavior follows from the Taylor series

$$J_n(\xi) = \left(\frac{\xi}{2}\right)^n\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,(n+k)!}\left(\frac{\xi}{2}\right)^{2k}, \qquad (2.132)$$


which is formally valid for any ξ, and may even serve as an alternative definition of the functions J_n(ξ). However, the series converges fast only at small arguments, ξ ≪ n, where its leading term is

$$J_n(\xi) \approx \frac{1}{n!}\left(\frac{\xi}{2}\right)^n. \qquad (2.133)$$
At ξ ≈ n + 0.81n^{1/3}, the Bessel function reaches its maximum⁴⁵

$$\max_\xi\left[J_n(\xi)\right] \approx \frac{0.675}{n^{1/3}}, \qquad (2.134)$$


and then starts to oscillate with a period gradually approaching 2π, a phase shift that increases by π/2 with each unit increment of n, and an amplitude that decreases as ξ^{−1/2}. All these features are described by the following asymptotic formula:

$$J_n(\xi) \to \left(\frac{2}{\pi\xi}\right)^{1/2}\cos\left(\xi - \frac{\pi}{4} - \frac{n\pi}{2}\right), \qquad \text{for } \xi \to \infty, \qquad (2.135)$$


which starts to give a reasonable approximation soon after the function peaks — see Fig. 18.⁴⁶
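The behavior described by Eqs. (132)-(135) is easy to explore numerically. The sketch below (an illustration; the function names, the number of series terms, and the grid step are assumptions of this example) sums the Taylor series (132), which in double precision is adequate for moderate arguments (roughly ξ < 20), compares it with the asymptotic formula (135), and locates the first maximum of J_n(ξ) by a simple grid search, for comparison with Eq. (134).

```python
import math

def bessel_j(n, xi, terms=60):
    """J_n(xi) for integer n >= 0, summed from the Taylor series (132).

    Adequate in double precision for moderate arguments (roughly xi < 20).
    """
    term = (xi / 2.0) ** n / math.factorial(n)          # the k = 0 term
    total = 0.0
    for k in range(terms):
        total += term
        term *= -(xi / 2.0) ** 2 / ((k + 1) * (n + k + 1))   # next term of Eq. (132)
    return total

def bessel_j_asymptotic(n, xi):
    """Large-argument asymptotic formula (135)."""
    return math.sqrt(2.0 / (math.pi * xi)) * math.cos(xi - math.pi / 4 - n * math.pi / 2)

def first_peak(n, step=1e-3, xi_max=20.0):
    """Grid search for the first maximum of J_n(xi), n >= 1 (J_0 peaks at xi = 0)."""
    prev, k = bessel_j(n, step), 2
    while k * step < xi_max:
        cur = bessel_j(n, k * step)
        if cur < prev:                                   # the function started to decrease
            return (k - 1) * step, prev
        prev, k = cur, k + 1
    raise ValueError("no maximum found below xi_max")

if __name__ == "__main__":
    for xi in (5.0, 10.0, 20.0):
        print(xi, bessel_j(0, xi), bessel_j_asymptotic(0, xi))
    for n in (1, 2, 3):
        xi_peak, j_peak = first_peak(n)
        print(n, xi_peak, j_peak, 0.675 / n ** (1.0 / 3.0))   # last column: Eq. (134)
```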


45 These two approximations for the Bessel function peak are strictly valid for n ≫ 1, but may be used for reasonable estimates starting already from n = 1. For example, max_ξ[J₁(ξ)] is close to 0.58, within about 15% of the value given by Eq. (134), and is reached at ξ ≈ 1.84.

Now we are ready for our case study (Fig. 17). Since the functions Z(z) have to satisfy not only Eq. (126) but also the bottom-lid boundary condition Z(0) = 0, they are proportional to sinh γz — cf. Eq. (94). Then Eq. (84) becomes


$$\phi = \sum_{n=0}^{\infty}\sum_{\gamma}J_n(\gamma\rho)\left(c_{n\gamma}\cos n\varphi + s_{n\gamma}\sin n\varphi\right)\sinh\gamma z. \qquad (2.136)$$
Next, we need to satisfy the zero boundary condition at the cylinder’s side wall (ρ = R). This may be ensured by taking
$$J_n(\gamma R) = 0. \qquad (2.137)$$

Since each function J_n(ξ) has an infinite number of positive zeros (see Fig. 18 again), which may be numbered by an integer index m = 1, 2, ..., Eq. (137) may be satisfied with an infinite number of discrete values of the parameter γ:
$$\gamma_{nm} = \frac{\xi_{nm}}{R}, \qquad (2.138)$$


where ξ_nm is the m-th zero of the function J_n(ξ) — see the top numbers in the cells of Table 1. (Very soon we will see what we need the bottom numbers for.)


Table 2.1. Approximate values of the first few zeros, ξ_nm, of a few lowest-order Bessel functions J_n(ξ) (the top number in each cell), and the values of dJ_n(ξ)/dξ at these points (the bottom number).

 n \ m      1          2          3          4          5          6

 0       2.40482    5.52008    8.65372   11.79153   14.93091   18.07106
        -0.51914   +0.34026   -0.27145   +0.23245   -0.20654   +0.18773

 1       3.83171    7.01559   10.17347   13.32369   16.47063   19.61586
        -0.40276   +0.30012   -0.24970   +0.21836   -0.19647   +0.18006

 2       5.13562    8.41724   11.61984   14.79595   17.95982   21.11700
        -0.33967   +0.27138   -0.23244   +0.20654   -0.18773   +0.17326

 3       6.38016    9.76102   13.01520   16.22347   19.40942   22.58273
        -0.29827   +0.24942   -0.21828   +0.19644   -0.18005   +0.16718

 4       7.58834   11.06471   14.37254   17.61597   20.82693   24.01902
        -0.26836   +0.23188   -0.20636   +0.18766   -0.17323   +0.16168

 5       8.77148   12.33860   15.70017   18.98013   22.21780   25.43034
        -0.24543   +0.21743   -0.19615   +0.17993   -0.16712   +0.15669


Hence, Eq. (136) may be represented in a more explicit form: 


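$$\phi(\rho,\varphi,z) = \sum_{n=0}^{\infty}\sum_{m=1}^{\infty}J_n\left(\xi_{nm}\frac{\rho}{R}\right)\left(c_{nm}\cos n\varphi + s_{nm}\sin n\varphi\right)\sinh\left(\xi_{nm}\frac{z}{R}\right). \qquad (2.139)$$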


46 Eq. (135) and Fig. 18 clearly show the close analogy between the Bessel functions and the usual trigonometric functions, sine and cosine. To emphasize this similarity, and help the reader to develop more gut feeling for the Bessel functions, let me mention one result of the elasticity theory: while the sinusoidal functions describe, in particular, transverse standing waves on a guitar string, the functions J_n(ξ) describe, in particular, transverse standing waves on an elastic round membrane (say, a round drum), with J₀(ξ) describing its lowest (fundamental) mode. The modes with n = 0 are the only ones with a nonzero amplitude of the membrane center’s oscillations.

Here the coefficients c_nm and s_nm have to be selected to satisfy the only remaining boundary condition — that on the top lid:


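$$\phi(\rho,\varphi,c) = \sum_{n=0}^{\infty}\sum_{m=1}^{\infty}J_n\left(\xi_{nm}\frac{\rho}{R}\right)\left(c_{nm}\cos n\varphi + s_{nm}\sin n\varphi\right)\sinh\left(\xi_{nm}\frac{c}{R}\right) = V(\rho,\varphi). \qquad (2.140)$$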

To use it, let us multiply both sides of Eq. (140) by the product J_{n′}(ξ_{n′m′}ρ/R) cos n′φ, integrate the result over the lid area, and use the following property of the Bessel functions:


$$\int_0^1 J_n(\xi_{nm}s)J_n(\xi_{nm'}s)\,s\,ds = \frac{1}{2}\left[J_{n+1}(\xi_{nm})\right]^2\delta_{mm'}. \qquad (2.141)$$


As a small but important detour, the last relation expresses a very specific (“2D”) orthogonality of the Bessel functions with different indices m — do not confuse them with the function orders n, please!⁴⁷ Since it relates two Bessel functions of the same order n, it is natural to ask why its right-hand side contains the function of a different order (n + 1). Some gut feeling for that may come from one more very important property of the Bessel functions, the so-called recurrence relations:⁴⁸


$$J_{n-1}(\xi) + J_{n+1}(\xi) = \frac{2n}{\xi}J_n(\xi), \qquad (2.142a)$$

$$J_{n-1}(\xi) - J_{n+1}(\xi) = 2\frac{dJ_n(\xi)}{d\xi}, \qquad (2.142b)$$


which in particular yield the following formula (convenient for working out some Bessel function 
integrals): 


$$\frac{d}{d\xi}\left[\xi^nJ_n(\xi)\right] = \xi^nJ_{n-1}(\xi). \qquad (2.143)$$


Let us apply the recurrence relations at the special points ξ_nm. At these points, J_n vanishes, and the system of two equations (142) may be readily solved to get, in particular,

$$J_{n+1}(\xi_{nm}) = -\frac{dJ_n}{d\xi}(\xi_{nm}), \qquad (2.144)$$
so that the square bracket on the right-hand side of Eq. (141) is just (dJ_n/dξ)² at ξ = ξ_nm. Thus the values of the Bessel function derivatives at the zero points of the function, given by the lower numbers in the cells of Table 1, are as important for boundary problem solutions as the zeros themselves.
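The entries of Table 1 and the relations (141) and (144) can be reproduced numerically. The sketch below (an illustration; it re-implements the series (132) to stay self-contained, and the bracketing step, bisection tolerance, and quadrature size are arbitrary choices) finds the zeros ξ_nm by bisection, evaluates dJ_n/dξ from the recurrence relation (142b), and checks the orthogonality integral (141) with a midpoint rule.

```python
import math

def bessel_j(n, xi, terms=60):
    """J_n(xi) for integer n from the series (132); negative n via Eq. (131)."""
    if n < 0:
        return (-1) ** n * bessel_j(-n, xi, terms)
    term = (xi / 2.0) ** n / math.factorial(n)
    total = 0.0
    for k in range(terms):
        total += term
        term *= -(xi / 2.0) ** 2 / ((k + 1) * (n + k + 1))
    return total

def dbessel_j(n, xi):
    """dJ_n/dxi from the recurrence relation (142b)."""
    return 0.5 * (bessel_j(n - 1, xi) - bessel_j(n + 1, xi))

def bessel_zeros(n, count, step=0.1, tol=1e-12):
    """First `count` positive zeros of J_n: sign changes on a grid, refined by bisection."""
    zeros, x = [], step                  # start at xi > 0, since J_n(0) = 0 for n >= 1
    while len(zeros) < count:
        lo, hi = x, x + step
        if bessel_j(n, lo) * bessel_j(n, hi) < 0:
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if bessel_j(n, lo) * bessel_j(n, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            zeros.append(0.5 * (lo + hi))
        x += step
    return zeros

def overlap(n, xi1, xi2, N=2000):
    """Midpoint-rule value of the integral on the left-hand side of Eq. (141)."""
    h = 1.0 / N
    return h * sum(bessel_j(n, xi1 * s) * bessel_j(n, xi2 * s) * s
                   for s in ((i + 0.5) * h for i in range(N)))

if __name__ == "__main__":
    for n in range(3):                    # compare with the first rows of Table 1
        zs = bessel_zeros(n, 3)
        print(n, ["%.5f" % z for z in zs], ["%+.5f" % dbessel_j(n, z) for z in zs])
    z0 = bessel_zeros(0, 2)
    print(overlap(0, z0[0], z0[1]), overlap(0, z0[0], z0[0]), 0.5 * bessel_j(1, z0[0]) ** 2)
```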


Now returning to our problem: since the angular functions cos nφ are also orthogonal — both to each other,


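47 The Bessel functions of the same argument but different orders are also orthogonal, but differently (in this form, for positive integer orders n and n′ of the same parity):

$$\int_0^{\infty}J_n(\xi)J_{n'}(\xi)\frac{d\xi}{\xi} = \frac{\delta_{nn'}}{n + n'}.$$
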
48 These relations provide, in particular, a convenient way for numerical computation of all J_n(ξ) — after J₀(ξ) has been computed. (The latter task is usually performed using Eq. (132) for smaller ξ and an extension of Eq. (135) for larger ξ.) Note that most mathematical software packages, including all those listed in MA Sec. 16(iv), include ready subroutines for calculation of the functions J_n(ξ) and other special functions used in this lecture series. In this sense, the conventional line separating these “special functions” from “elementary functions” is rather fine.

$$\int_0^{2\pi}\cos(n\varphi)\cos(n'\varphi)\,d\varphi = \pi\delta_{nn'} \quad (n, n' \geq 1), \qquad (2.145)$$


and to all functions sin nφ, the integration over the lid area kills all terms of both series in Eq. (140), except just one term proportional to c_{n′m′}, and hence gives an explicit expression for that coefficient. The counterpart coefficients s_nm may be found by repeating the same procedure with the replacement of cos n′φ by sin n′φ. This evaluation (left for the reader’s exercise) completes the solution of our problem for an arbitrary lid potential V(ρ, φ).


Still, before leaving the Bessel functions behind (for a while only :-), let me address two important issues. First, we have seen that in our cylinder problem (Fig. 17), the set of functions J_n(ξ_nm ρ/R) with different indices m (which characterize the degree of the Bessel function’s stretch along the axis ρ) plays a role similar to that of the functions sin(πnx/a) in the rectangular box problem shown in Fig. 13. In this context, what is the analog of the functions cos(πnx/a), which may be important for some boundary problems? In more formal language, are there any functions of the same argument ξ = ξ_nm ρ/R that would be linearly independent of the Bessel functions of the first kind, while satisfying the same Bessel equation (130)?


The answer is yes. For the definition of such functions, we first need to generalize our prior formulas for J_n(ξ), and in particular Eq. (132), to the case of an arbitrary, not necessarily integer order ν. Mathematics says that the generalization may be performed in the following way:


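$$J_\nu(\xi) = \left(\frac{\xi}{2}\right)^\nu\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma(\nu+k+1)}\left(\frac{\xi}{2}\right)^{2k}, \qquad (2.146)$$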


where Γ(s) is the so-called gamma function, which may be defined as⁴⁹
$$\Gamma(s) = \int_0^{\infty}\xi^{s-1}e^{-\xi}\,d\xi. \qquad (2.147)$$


The simplest and most important property of the gamma function is that for positive integer values of its argument, it gives the factorial of the number smaller by one:

$$\Gamma(n+1) = n! \equiv 1\cdot2\cdot\ldots\cdot n, \qquad (2.148)$$
so it is essentially a generalization of the notion of the factorial to all real numbers.
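The properties (147)-(148) and the generalized series (146) can be spot-checked with Python's standard `math.gamma`. In the sketch below (an illustration; the quadrature cutoff and grid size are arbitrary), the integral (147) is evaluated by a midpoint rule, and J_ν(ξ) from Eq. (146) is compared, for ν = 1/2, with the elementary closed form J_{1/2}(ξ) = (2/πξ)^{1/2} sin ξ, a standard result that is not derived in the text.

```python
import math

def gamma_integral(s, upper=60.0, N=60000):
    """Midpoint-rule estimate of the integral (147), truncated at xi = upper (s > 0)."""
    h = upper / N
    return h * sum(((i + 0.5) * h) ** (s - 1.0) * math.exp(-(i + 0.5) * h) for i in range(N))

def bessel_j_nu(nu, xi, terms=60):
    """J_nu(xi) from the series (146); nu must not be a negative integer."""
    total = 0.0
    for k in range(terms):
        total += ((-1) ** k / (math.factorial(k) * math.gamma(nu + k + 1.0))
                  * (xi / 2.0) ** (2 * k + nu))
    return total

if __name__ == "__main__":
    for n in range(1, 6):                                 # Eq. (148)
        print(n, math.gamma(n + 1), math.factorial(n))
    print(gamma_integral(4.5), math.gamma(4.5))          # Eq. (147)
    for xi in (0.5, 2.0, 7.0):                           # half-integer order
        print(xi, bessel_j_nu(0.5, xi), math.sqrt(2.0 / (math.pi * xi)) * math.sin(xi))
```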


The Bessel functions defined by Eq. (146) satisfy, after the replacements n → ν and n! → Γ(ν + 1), virtually all the relations discussed above, including the Bessel equation (130), the asymptotic formula (135), the orthogonality condition (141), and the recurrence relations (142). Moreover, it may be shown that for ν ≠ n, the functions J_ν(ξ) and J_{−ν}(ξ) are linearly independent of each other, and hence their linear combination may be used to represent the general solution of the Bessel equation. Unfortunately, as Eq. (131) shows, for ν = n this is not true, and a solution linearly independent of J_n(ξ) has to be formed differently. The most common way to do that is first to define, for all ν ≠ n, the following functions:

$$Y_\nu(\xi) = \frac{J_\nu(\xi)\cos\nu\pi - J_{-\nu}(\xi)}{\sin\nu\pi}, \qquad (2.149)$$
49 See, e.g., MA Eq. (6.7a). Note that Γ(s) → ∞ at s → 0, −1, −2, ...

called the Bessel functions of the second kind, or more often the Weber functions,⁵⁰ and then to take the limit ν → n. In this limit, both the numerator and denominator of the right-hand side of Eq. (149) tend to zero, but their ratio tends to a finite value called Y_n(ξ). It may be shown that the resulting functions are still solutions of the Bessel equation and are linearly independent of J_n(ξ), though they are related just as those functions if the sign of n changes:


$$Y_{-n}(\xi) = (-1)^n\, Y_n(\xi). \tag{2.150}$$
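Definition (149) and the limit ν → n can be explored numerically. In the minimal sketch below, the hypothetical helpers `bessel_j` (summing the series (146)) and `weber_y` (Eq. (149) taken literally) are our own. Eq. (149) can only be evaluated directly for non-integer ν, so we check it against the standard half-integer case Y_{1/2}(ξ) = −(2/πξ)^{1/2} cos ξ, and then let ν approach 0 to watch the 0/0 ratio settle to a finite value.

```python
import math


def bessel_j(nu: float, xi: float, terms: int = 60) -> float:
    """J_nu(xi) from the series (2.146); terms at the poles of Gamma are dropped."""
    total = 0.0
    for k in range(terms):
        s = nu + k + 1
        if s <= 0 and s == int(s):
            continue
        total += (-1) ** k / (math.factorial(k) * math.gamma(s)) * (xi / 2) ** (2 * k + nu)
    return total


def weber_y(nu: float, xi: float) -> float:
    """Y_nu(xi) from definition (2.149); meaningful only for non-integer nu."""
    return (bessel_j(nu, xi) * math.cos(nu * math.pi) - bessel_j(-nu, xi)) / math.sin(nu * math.pi)


# nu = 1/2: compare with -sqrt(2 / (pi xi)) cos(xi)
xi = 1.0
print(weber_y(0.5, xi), -math.sqrt(2 / (math.pi * xi)) * math.cos(xi))

# nu -> 0: numerator and denominator both vanish, but their ratio converges
for eps in (1e-2, 1e-3, 1e-4, 1e-5):
    print(eps, weber_y(eps, xi))
```

The sequence approaches Y_0(1) ≈ 0.0883, with an error roughly proportional to ε, as expected from a smooth dependence of Y_ν on ν.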
Figure 19 shows a few Weber functions of the lowest integer orders. 


Fig. 2.19. A few Bessel functions of the second kind (a.k.a. the Weber functions, a.k.a. the Neumann functions).


The plots show that their asymptotic behavior is very similar to that of the functions J_n(ξ):
$$Y_n(\xi) \to \left(\frac{2}{\pi\xi}\right)^{1/2} \sin\left(\xi - \frac{\pi n}{2} - \frac{\pi}{4}\right), \quad \text{for } \xi \to \infty, \tag{2.151}$$


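but with the phase shift necessary to make these Bessel functions orthogonal to those of the first kind — cf. Eq. (135). However, for small values of the argument ξ, the Bessel functions of the second kind behave completely differently from those of the first kind:
$$Y_n(\xi) \to \begin{cases} \dfrac{2}{\pi}\left[\ln\dfrac{\xi}{2} + \gamma\right], & \text{for } n = 0, \\[2ex] -\dfrac{(n-1)!}{\pi}\left(\dfrac{2}{\xi}\right)^{n}, & \text{for } n > 0, \end{cases} \quad \text{at } \xi \to 0, \tag{2.152}$$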


where γ is the so-called Euler constant, defined as follows:


50 Sometimes, they are called the Neumann functions and denoted as N_n(ξ).
$$\gamma \equiv \lim_{n\to\infty}\left[1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n} - \ln n\right] = 0.5772157\ldots \tag{2.153}$$
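The limit (153) converges slowly, which is easy to see numerically. The sketch below evaluates the bracket for several n; the function name is our own. Subtracting 1/(2n) is a standard asymptotic correction (from the Euler-Maclaurin formula) that is not used in the text, shown only to make the convergence visible.

```python
import math


def euler_gamma_estimate(n: int) -> float:
    """The bracket of Eq. (2.153): 1 + 1/2 + ... + 1/n - ln n."""
    return math.fsum(1.0 / k for k in range(1, n + 1)) - math.log(n)


for n in (10, 1000, 100000):
    raw = euler_gamma_estimate(n)
    # The leading correction is +1/(2n); removing it speeds up convergence a lot.
    print(n, raw, raw - 1 / (2 * n))
```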


As Eqs. (152) and Fig. 19 show, the functions Y_n(ξ) diverge at ξ → 0 and hence cannot describe the behavior of any physical variable, in particular the electrostatic potential, in a region that includes the axis ρ = 0.
One may wonder: if this is true, when do we need these functions in physics? Figure 20 shows an example of a simple boundary problem of electrostatics, whose solution by the variable separation method involves both functions J_n(ξ) and Y_n(ξ).


Fig. 2.20. A simple boundary problem that cannot be solved using just one kind of Bessel functions.


Here two round, conducting coaxial cylindrical tubes are kept at the same (say, zero) potential, but at least one of two lids has a different potential. The problem is almost completely similar to that discussed above (Fig. 17), but now we need to find the potential distribution in the free space between the tubes, i.e. for R₁ < ρ < R₂. If we use the same variable separation as in the simpler counterpart problem, we need the radial functions ℛ(ρ) to satisfy two zero boundary conditions: at ρ = R₁ and ρ = R₂. With the Bessel functions of just the first kind, J_n(γρ), it is impossible to do, because the two boundaries would impose two independent (and generally incompatible) conditions, J_n(γR₁) = 0 and J_n(γR₂) = 0, on one “stretching parameter” γ. The existence of the Bessel functions of the second kind immediately saves the day, because if the radial function solution is represented as a linear combination,
$$\mathcal{R} = c_J\, J_n(\gamma\rho) + c_Y\, Y_n(\gamma\rho), \tag{2.154}$$


two zero boundary conditions give two equations for γ and the ratio c ≡ c_Y/c_J.⁵¹ (Due to the oscillating character of both Bessel functions, these conditions would typically be satisfied by an infinite set of discrete pairs {γ, c}.) Note, however, that generally none of these pairs would correspond to zeros of either J_n or Y_n, so that having an analog of Table 1 for the latter function would not help much. Hence, even the simplest problems of this kind (like the one shown in Fig. 20) typically require the numerical solution of transcendental equations.
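Here is a minimal sketch of such a numerical solution for the axially symmetric case n = 0, with the example radii R₁ = 1 and R₂ = 2 (both our own choices). For a nontrivial pair (c_J, c_Y), the two conditions ℛ(R₁) = ℛ(R₂) = 0 require the determinant J_0(γR₁)Y_0(γR₂) − J_0(γR₂)Y_0(γR₁) to vanish; we scan γ for its first sign change and refine it by bisection. The power series used for Y_0 is the standard ν → 0 limit of Eq. (149), quoted here without derivation; all function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Eq. (2.153)


def j0(x, terms=60):
    """J_0(x) from the power series (2.146) with nu = 0."""
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= -((x / 2) ** 2) / (k * k)
        total += term
    return total


def y0(x, terms=60):
    """Y_0(x) from its standard power series (the nu -> 0 limit of Eq. (2.149)), x > 0."""
    term, harmonic, series = 1.0, 0.0, 0.0
    for k in range(1, terms):
        term *= (x / 2) ** 2 / (k * k)   # (x/2)^(2k) / (k!)^2
        harmonic += 1.0 / k              # 1 + 1/2 + ... + 1/k
        series += (-1) ** (k + 1) * harmonic * term
    return (2 / math.pi) * ((math.log(x / 2) + EULER_GAMMA) * j0(x) + series)


def cross(g, r1, r2):
    """Determinant of the conditions R(r1) = R(r2) = 0 for R = c_J J_0(g rho) + c_Y Y_0(g rho)."""
    return j0(g * r1) * y0(g * r2) - j0(g * r2) * y0(g * r1)


def first_root(r1, r2, step=0.01, tol=1e-12):
    """Scan g upward until 'cross' changes sign, then refine the root by bisection."""
    lo = step
    while cross(lo, r1, r2) * cross(lo + step, r1, r2) > 0:
        lo += step
    hi = lo + step
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cross(lo, r1, r2) * cross(mid, r1, r2) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


R1, R2 = 1.0, 2.0                  # example radii (arbitrary units)
g = first_root(R1, R2)
c = -j0(g * R1) / y0(g * R1)       # ratio c_Y / c_J enforcing R(R1) = 0
print(g, c)
print(j0(g * R1) + c * y0(g * R1), j0(g * R2) + c * y0(g * R2))  # both should be ~0
```

The first root comes out close to, but not exactly at, π/(R₂ − R₁), the value suggested by the large-argument forms (135) and (151). The scan step must be smaller than the spacing between neighboring roots, or a pair of sign changes could be missed.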


51 A pair of linearly independent functions, used for the representation of the general solution of the Bessel equation, may be also chosen differently, using the so-called Hankel functions
$$H_n^{(1,2)}(\xi) = J_n(\xi) \pm i\,Y_n(\xi).$$
For representing the general solution of Eq. (130), this alternative is completely similar, for example, to using the pair of complex functions exp{±iax} = cos ax ± i sin ax instead of the pair of real functions {cos ax, sin ax} for the representation of the general solution of Eq. (89) for X(x).
In order to complete the discussion of variable separation in the cylindrical coordinates, one more issue to address is the so-called modified Bessel functions: of the first kind, I_ν(ξ), and of the second kind, K_ν(ξ). They are two linearly independent solutions of the modified Bessel equation,


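$$\frac{d^2\mathcal{R}}{d\xi^2} + \frac{1}{\xi}\frac{d\mathcal{R}}{d\xi} - \left(1 + \frac{\nu^2}{\xi^2}\right)\mathcal{R} = 0, \tag{2.155}$$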


which differs from Eq. (130) “only” by the sign of one of its terms. Figure 21 shows a simple problem 
that leads (among many others) to this equation: a round thin conducting cylindrical pipe is sliced, 
perpendicular to its axis, into rings of equal height h, which are kept at equal but sign-alternating
potentials. 


Fig. 2.21. A typical boundary problem whose solution may be conveniently described in terms of the modified Bessel functions. (The rings are kept at alternating potentials φ = ±V/2.)


If the system is very long (formally, infinite) in the z-direction, we may use the variable 
separation method for the solution of this problem, but now evidently need periodic (rather than 
exponential) solutions along the z-axis, i.e. linear combinations of sin kz and cos kz with various real 
values of the constant k. Separating the variables, we arrive at a differential equation similar to Eq. 
(129), but with the negative sign before the separation constant: 


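$$\frac{d^2\mathcal{R}}{d\rho^2} + \frac{1}{\rho}\frac{d\mathcal{R}}{d\rho} - \left(k^2 + \frac{\nu^2}{\rho^2}\right)\mathcal{R} = 0. \tag{2.156}$$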


The same radial coordinate’s normalization, ξ = kρ, immediately leads us to Eq. (155), and hence (for ν = n) to the modified Bessel functions I_n(ξ) and K_n(ξ).


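Figure 22 shows the behavior of such functions, of a few lowest orders. One can see that at ξ → 0 the behavior is virtually similar to that of the “usual” Bessel functions — cf. Eqs. (132) and (152), with K_n(ξ) multiplied (for purely historical reasons) by an additional coefficient, −π/2:
$$I_n(\xi) \to \frac{1}{n!}\left(\frac{\xi}{2}\right)^{n}, \qquad K_n(\xi) \to \begin{cases} -\left[\ln\dfrac{\xi}{2} + \gamma\right], & \text{for } n = 0, \\[2ex] \dfrac{(n-1)!}{2}\left(\dfrac{2}{\xi}\right)^{n}, & \text{for } n > 0. \end{cases} \tag{2.157}$$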


However, the asymptotic behavior of the modified functions is very different, with I_n(ξ) exponentially growing, and K_n(ξ) exponentially dropping at ξ → ∞:
$$I_n(\xi) \to \left(\frac{1}{2\pi\xi}\right)^{1/2} e^{\xi}, \qquad K_n(\xi) \to \left(\frac{\pi}{2\xi}\right)^{1/2} e^{-\xi}. \tag{2.158}$$
Fig. 2.22. The modified Bessel functions of the first kind (left panel) and the second kind (right panel).

This behavior is completely natural in the context of the problem shown in Fig. 21, in which the electrostatic potential may be represented as a sum of terms proportional to I_n(kρ) inside the thin pipe, and of terms proportional to K_n(kρ) outside it.
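The first-kind modified functions are easy to check numerically. The minimal sketch below (with our own helper names) sums I_n(ξ) = Σ_k (ξ/2)^{2k+n}/[k!(k+n)!], a standard series with all terms positive, and verifies two statements from this section. The first is the exponential growth (158); the ratio to the asymptotic form tends to 1 only slowly, as 1 + O(1/ξ). The second is the relation I_n(ξ) = i^{−n}J_n(iξ) of Eq. (159), evaluated with the series (146) at a complex argument. Integer n ≥ 0 is assumed throughout.

```python
import math


def bessel_i(n, xi, terms=200):
    """I_n(xi) for integer n >= 0; each term follows from the previous one by a ratio."""
    term = (xi / 2) ** n / math.factorial(n)
    total = term
    for k in range(1, terms):
        term *= (xi / 2) ** 2 / (k * (k + n))
        total += term
    return total


def bessel_j_complex(n, z, terms=200):
    """J_n(z) for complex z from the series (2.146), integer n >= 0."""
    term = (z / 2) ** n / math.factorial(n)
    total = term
    for k in range(1, terms):
        term *= -((z / 2) ** 2) / (k * (k + n))
        total += term
    return total


# Asymptotic growth (2.158): I_0(xi) * sqrt(2 pi xi) * exp(-xi) -> 1
for xi in (5.0, 10.0, 30.0):
    print(xi, bessel_i(0, xi) * math.sqrt(2 * math.pi * xi) * math.exp(-xi))

# Eq. (2.159): I_n(xi) = i^(-n) J_n(i xi)
for n in range(4):
    print(n, bessel_i(n, 2.5), (1j) ** (-n) * bessel_j_complex(n, 1j * 2.5))
```

Computing each term from the previous one, rather than from explicit factorials, avoids overflowing floating-point numbers at large ξ.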


To complete our brief survey of the Bessel functions, let me note that all of them discussed so far may be considered as particular cases of Bessel functions of a complex argument, say J_n(z) and Y_n(z), or, alternatively, H_n^{(1,2)}(z) = J_n(z) ± iY_n(z).⁵² In this picture, the “usual” Bessel functions J_n(ξ) and Y_n(ξ) may be considered as the sets of values of these generalized functions on the real axis (z = ξ), while the modified functions are their particular case on the imaginary axis, i.e. at z = iξ, also with real ξ:


$$I_n(\xi) = i^{-n} J_n(i\xi), \qquad K_n(\xi) = \frac{\pi}{2}\, i^{n+1} H_n^{(1)}(i\xi). \tag{2.159}$$


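Moreover, this generalization of the Bessel functions to the whole complex plane z enables the use of their values along other directions on that plane, for example at the angles 3π/4 and π/4 to the real axis. As a result, one arrives at the so-called Kelvin functions:
$$\begin{aligned} \mathrm{ber}_\nu\,\xi + i\,\mathrm{bei}_\nu\,\xi &= J_\nu\!\left(\xi e^{3\pi i/4}\right), \\ \mathrm{ker}_\nu\,\xi + i\,\mathrm{kei}_\nu\,\xi &= e^{-\nu\pi i/2} K_\nu\!\left(\xi e^{\pi i/4}\right), \end{aligned} \tag{2.160}$$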


which are also useful for some important problems in physics and engineering. Unfortunately, I do not have time/space to discuss these problems in this course.⁵³


52 These complex functions still obey the general relations (143) and (146), with ξ replaced with z.

53 In the QM part of this series, we will run into the so-called spherical Bessel functions j_n(ξ) and y_n(ξ), which may be expressed via the Bessel functions of semi-integer orders. Surprisingly enough, these functions turn out to be simpler than J_n(ξ) and Y_n(ξ).
2.8. Variable separation — spherical coordinates


The spherical coordinates are very important in physics, because of the (at least approximate) spherical symmetry of many physical objects — from nuclei and atoms, to water drops in clouds, to planets and stars. Let us again require each component φ_k of Eq. (84) to satisfy the Laplace equation. Using the full expression for the Laplace operator in spherical coordinates,⁵⁴ we get


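$$\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial\phi_k}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial\phi_k}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\phi_k}{\partial\varphi^2} = 0. \tag{2.161}$$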


Let us look for a solution of this equation in the following variable-separated form:
$$\phi_k = \frac{\mathcal{R}(r)}{r}\,\mathcal{P}(\cos\theta)\,\mathcal{F}(\varphi). \tag{2.162}$$


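Separating the variables one by one, starting from φ, just like this has been done in cylindrical coordinates, we get the following equations for the partial functions participating in this solution:
$$\frac{d^2\mathcal{R}}{dr^2} - \frac{l(l+1)}{r^2}\,\mathcal{R} = 0, \tag{2.163}$$
$$\frac{d}{d\xi}\left[\left(1-\xi^2\right)\frac{d\mathcal{P}}{d\xi}\right] + \left[l(l+1) - \frac{\nu^2}{1-\xi^2}\right]\mathcal{P} = 0, \tag{2.164}$$
$$\frac{d^2\mathcal{F}}{d\varphi^2} + \nu^2\,\mathcal{F} = 0, \tag{2.165}$$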


where ξ ≡ cos θ is a new variable used in lieu of θ (so that −1 ≤ ξ ≤ +1), while ν² and l(l + 1) are the separation constants. (The reason for selecting the latter one in this form will be clear in a minute.)


One can see that Eq. (165) is very simple, and is absolutely similar to Eq. (107) we have got for the polar and cylindrical coordinates. Moreover, the equation for the radial functions is simpler than in the cylindrical coordinates. Indeed, let us look for its partial solution in the form cr^α — just as we have done with Eq. (106). Plugging this solution into Eq. (163), we immediately get the following condition on the parameter α:


$$\alpha(\alpha - 1) = l(l+1). \tag{2.166}$$
This quadratic equation has two roots, α = l + 1 and α = −l, so that the general solution of Eq. (163) is
$$\mathcal{R} = a\,r^{l+1} + \frac{b}{r^{l}}. \tag{2.167}$$


However, the general solution of Eq. (164) (called either the general or associated Legendre equation) cannot be expressed via what are usually called elementary functions.⁵⁵ Let us start its discussion from the axially-symmetric case, when the potential does not depend on the azimuthal angle φ. This means ℱ(φ) = const, and thus ν = 0, so that Eq. (164) is reduced to the so-called Legendre differential equation:


54 See, e.g., MA Eq. (10.9). 
55 Again, there is no generally accepted line between the “elementary” and “special” functions. 
$$\frac{d}{d\xi}\left[\left(1-\xi^2\right)\frac{d\mathcal{P}}{d\xi}\right] + l(l+1)\,\mathcal{P} = 0. \tag{2.168}$$


One can readily verify that the solutions of this equation for integer values of l are specific (Legendre) polynomials⁵⁶ that may be described by the following Rodrigues’ formula:


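$$P_l(\xi) = \frac{1}{2^l\, l!}\,\frac{d^l}{d\xi^l}\left(\xi^2 - 1\right)^l, \quad \text{with } l = 0, 1, 2, \ldots \tag{2.169}$$
According to this formula, the first few Legendre polynomials are pretty simple:
$$\begin{aligned} P_0(\xi) &= 1, \\ P_1(\xi) &= \xi, \\ P_2(\xi) &= \tfrac{1}{2}\left(3\xi^2 - 1\right), \\ P_3(\xi) &= \tfrac{1}{2}\left(5\xi^3 - 3\xi\right), \\ P_4(\xi) &= \tfrac{1}{8}\left(35\xi^4 - 30\xi^2 + 3\right), \ \ldots \end{aligned} \tag{2.170}$$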


though such explicit expressions become more and more bulky as l is increased. As Fig. 23 shows, all these polynomials, which are defined on the [−1, +1] segment, end at the same point: P_l(+1) = +1, while starting either at the same point or at the opposite point: P_l(−1) = (−1)^l. Between these two endpoints, the l-th Legendre polynomial has l zeros. It is straightforward to use Eq. (169) for proving that these polynomials form a full, orthogonal set of functions, with the following normalization rule:


$$\int_{-1}^{+1} P_l(\xi)\,P_{l'}(\xi)\,d\xi = \frac{2}{2l+1}\,\delta_{ll'}, \tag{2.171}$$
so that any function f(ξ) defined on the segment [−1, +1] may be represented as a unique series over the polynomials.⁵⁷
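Because Rodrigues’ formula (169) involves only polynomial algebra, it can be carried out exactly with rational arithmetic. The sketch below (helper names are ours) expands (ξ² − 1)^l by the binomial theorem, differentiates it l times, and then checks the list (170), the endpoint values P_l(±1), and the orthonormality rule (171) with no rounding error at all.

```python
from fractions import Fraction
from math import comb, factorial


def legendre_coeffs(l):
    """Exact coefficients of P_l(xi), lowest power first, via Rodrigues' formula (2.169)."""
    poly = [Fraction(0)] * (2 * l + 1)
    for m in range(l + 1):                      # (xi^2 - 1)^l by the binomial theorem
        poly[2 * m] = Fraction(comb(l, m) * (-1) ** (l - m))
    for _ in range(l):                          # differentiate l times
        poly = [k * poly[k] for k in range(1, len(poly))]
    return [c / (2**l * factorial(l)) for c in poly]


def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def integrate(p):
    """Exact integral of sum_k p[k] xi^k over [-1, +1]; odd powers drop out."""
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(p) if k % 2 == 0)


def evaluate(p, x):
    return sum(c * x**k for k, c in enumerate(p))


for l in range(5):
    print(l, legendre_coeffs(l))               # compare with Eq. (2.170)

# Orthogonality and normalization (2.171), computed exactly
for l1 in range(4):
    print([integrate(poly_mul(legendre_coeffs(l1), legendre_coeffs(l2))) for l2 in range(4)])
```

The printed matrix is diagonal, with the entries 2, 2/3, 2/5, 2/7, i.e. 2/(2l + 1).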


Thus, taking into account the additional division by r in Eq. (162), the general solution of any axially-symmetric Laplace problem may be represented as
$$\phi(r,\theta) = \sum_{l=0}^{\infty}\left(a_l r^l + \frac{b_l}{r^{l+1}}\right) P_l(\cos\theta). \tag{2.172}$$


Note a strong similarity between this solution and Eq. (112) for the 2D Laplace problem in the polar 
coordinates. However, besides the difference in the angular functions, there is also a difference (by one) 
in the power of the second radial function, and this difference immediately shows up in problem 
solutions. 


56 Just for reference: if l is not an integer, the general solution of Eq. (2.168) may be represented as a linear combination of the so-called Legendre functions (not polynomials!) of the first and second kind, P_l(ξ) and Q_l(ξ).

57 This is why, at least for the purposes of this course, there is no good reason for pursuing (more complicated) solutions to Eq. (168) for non-integer values of l, mentioned in the previous footnote.
Fig. 2.23. A few lowest Legendre polynomials P_l(ξ).


Indeed, let us solve a problem similar to that shown in Fig. 15: find the electric field around a conducting sphere of radius R, placed into an initially uniform external field E₀ (whose direction I will now take for the z-axis) — see Fig. 24a.


Fig. 2.24. Conducting sphere in a uniform electric field: (a) the problem’s geometry, and (b) the equipotential surface pattern given by Eq. (176). The pattern is qualitatively similar but quantitatively different from that for the conducting cylinder in a perpendicular field — cf. Fig. 15.


If we select the arbitrary constant in the electrostatic potential so that φ|_{r=R} = 0, then in Eq. (172) we should take a₀ = b₀ = 0. Now, just as has been argued for the cylindrical case, at r >> R the potential should approach that of the uniform field:
$$\phi \to -E_0 z = -E_0 r\cos\theta, \tag{2.173}$$
so that in Eq. (172), only one of the coefficients a_l survives: a_l = −E₀δ_{l,1}. As a result, from the boundary condition on the surface, φ(R, θ) = 0, we get the following equation for the coefficients b_l:


$$\left(-E_0 R + \frac{b_1}{R^2}\right)\cos\theta + \sum_{l\neq 1}\frac{b_l}{R^{l+1}}\,P_l(\cos\theta) = 0. \tag{2.174}$$
Now repeating the argumentation that led to Eq. (117), we may conclude that Eq. (174) is satisfied if
$$b_l = E_0 R^3\,\delta_{l,1}, \tag{2.175}$$
so that, finally, Eq. (172) is reduced to
$$\phi = -E_0\left(r - \frac{R^3}{r^2}\right)\cos\theta. \tag{2.176}$$


This distribution, shown in Fig. 24b, is very similar to Eq. (117) for the cylindrical case (cf. Fig. 15b, taking into account the different plot orientation), but with a different power of the radius in the second term. This difference leads to a quantitatively different distribution of the surface electric field:
$$E_n\Big|_{r=R} = -\frac{\partial\phi}{\partial r}\Big|_{r=R} = 3E_0\cos\theta, \tag{2.177}$$


so that its maximal value is a factor of 3 (rather than 2) larger than the external field. 
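Solution (176) is simple enough to check directly. The minimal sketch below uses illustrative units E₀ = R = 1 (our choice) to confirm the boundary condition φ(R, θ) = 0, the surface field (177) with its maximum 3E₀ at the poles, and the approach to the uniform-field form (173) far from the sphere. The analytic derivative is cross-checked by a finite difference.

```python
import math


def potential(r, theta, e0=1.0, radius=1.0):
    """Eq. (2.176): grounded sphere of the given radius in a uniform field e0 along z."""
    return -e0 * (r - radius**3 / r**2) * math.cos(theta)


def radial_field(r, theta, e0=1.0, radius=1.0):
    """E_r = -d(phi)/dr, differentiated analytically from Eq. (2.176)."""
    return e0 * (1 + 2 * radius**3 / r**3) * math.cos(theta)


thetas = [i * math.pi / 12 for i in range(13)]
print(max(abs(potential(1.0, t)) for t in thetas))   # boundary condition phi(R, theta) = 0
print(radial_field(1.0, 0.0))                        # 3 E0 at the pole, Eq. (2.177)

h, t = 1e-6, 0.4                                     # finite-difference cross-check
print(-(potential(1.0 + h, t) - potential(1.0 - h, t)) / (2 * h), 3 * math.cos(t))
```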


Now let me briefly (mostly just for the reader’s reference) mention the Laplace equation solutions in the general case — with no axial symmetry. If the conductor-free space surrounds the origin from all sides, the solutions to Eq. (165) have to be 2π-periodic, and hence ν = n = 0, ±1, ±2, … Mathematics says that Eq. (164) with integer ν = n and a fixed integer l has a solution only for a limited range of n:⁵⁸
$$-l \le n \le +l. \tag{2.178}$$


These solutions are called associated Legendre functions (generally, they are not polynomials). For n ≥ 0, these functions may be defined via the Legendre polynomials, using the following formula:⁵⁹
$$P_l^n(\xi) = (-1)^n\left(1-\xi^2\right)^{n/2}\frac{d^n}{d\xi^n}P_l(\xi). \tag{2.179}$$
On the segment ξ ∈ [−1, +1], each set of the associated Legendre functions with a fixed index n and all allowed values l ≥ n forms a full, orthogonal set, with the normalization relation
$$\int_{-1}^{+1} P_l^n(\xi)\,P_{l'}^n(\xi)\,d\xi = \frac{2}{2l+1}\,\frac{(l+n)!}{(l-n)!}\,\delta_{ll'}, \tag{2.180}$$
that is evidently a generalization of Eq. (171).


Since these relations may seem a bit intimidating, let me write down explicit expressions for a few P_l^n(cos θ) with the three lowest values of l and n ≥ 0, which are most important for applications.
$$l = 0:\quad P_0^0(\cos\theta) = 1; \tag{2.181}$$


58 In quantum mechanics, the letter n is typically reserved for the “principal quantum number”, while the 
azimuthal functions are numbered by index m. However, here I will keep using n as their index because for this 
course’s purposes, this seems more logical, in view of the similarity of the spherical and cylindrical functions. 

59 Note that some texts use different choices for the front factor (called the Condon-Shortley phase) in the functions P_l^n, which do not affect the final results for the spherical harmonics Y_l^n.
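$$l = 1:\quad \begin{cases} P_1^0(\cos\theta) = \cos\theta, \\ P_1^1(\cos\theta) = -\sin\theta; \end{cases} \tag{2.182}$$
$$l = 2:\quad \begin{cases} P_2^0(\cos\theta) = \left(3\cos^2\theta - 1\right)/2, \\ P_2^1(\cos\theta) = -3\sin\theta\cos\theta, \\ P_2^2(\cos\theta) = 3\sin^2\theta. \end{cases} \tag{2.183}$$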


The reader should agree there is not much to fear in these functions — they are just certain sums of products of cos θ ≡ ξ and sin θ ≡ (1 − ξ²)^{1/2}. Fig. 25 shows the plots of a few lowest functions P_l^n(ξ).
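Eqs. (179)-(183) can be verified exactly in the same spirit as the Legendre polynomials. In this minimal sketch (helper names are ours), P_l^n(cos θ) is built from the n-th derivative of the Rodrigues polynomial, with the sign (−1)^n of Eq. (179). The normalization integral (180) is computed exactly in rational arithmetic, because the product of two functions (179) with the same n contains (1 − ξ²)^{n/2} squared, i.e. it is a polynomial.

```python
import math
from fractions import Fraction
from math import comb, factorial


def legendre_coeffs(l):
    """Exact coefficients of P_l(xi), lowest power first, via Rodrigues' formula (2.169)."""
    poly = [Fraction(0)] * (2 * l + 1)
    for m in range(l + 1):
        poly[2 * m] = Fraction(comb(l, m) * (-1) ** (l - m))
    for _ in range(l):
        poly = [k * poly[k] for k in range(1, len(poly))]
    return [c / (2**l * factorial(l)) for c in poly]


def derivative(p, times):
    for _ in range(times):
        p = [k * p[k] for k in range(1, len(p))] or [Fraction(0)]
    return p


def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def assoc_legendre(l, n, theta):
    """P_l^n(cos theta) from Eq. (2.179), for 0 <= n <= l and 0 <= theta <= pi."""
    d = derivative(legendre_coeffs(l), n)
    x = math.cos(theta)
    return (-1) ** n * math.sin(theta) ** n * sum(float(c) * x**k for k, c in enumerate(d))


def norm_integral(l1, l2, n):
    """Exact left-hand side of Eq. (2.180): (1 - xi^2)^n times the two n-th derivatives."""
    weight = [Fraction(1)]
    for _ in range(n):
        weight = poly_mul(weight, [Fraction(1), Fraction(0), Fraction(-1)])
    d1 = derivative(legendre_coeffs(l1), n)
    d2 = derivative(legendre_coeffs(l2), n)
    p = poly_mul(weight, poly_mul(d1, d2))
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(p) if k % 2 == 0)


theta = 0.7
print(assoc_legendre(2, 1, theta), -3 * math.sin(theta) * math.cos(theta))   # Eq. (2.183)
print(norm_integral(3, 3, 2), Fraction(2, 7) * Fraction(factorial(5), factorial(1)))
```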


Fig. 2.25. A few lowest associated Legendre functions. (Adapted from an original by Geek3, available at https://en.wikipedia.org/wiki/Associated_Legendre_polynomials, under the GNU Free Documentation License.)


Using the associated Legendre functions, the general solution (162) to the Laplace equation in the spherical coordinates may be expressed as
$$\phi(r,\theta,\varphi) = \sum_{l=0}^{\infty}\left(a_l r^l + \frac{b_l}{r^{l+1}}\right)\sum_{n=0}^{l} P_l^n(\cos\theta)\,\mathcal{F}_n(\varphi), \qquad \mathcal{F}_n(\varphi) = c_n\cos n\varphi + s_n\sin n\varphi. \tag{2.184}$$
Since the difference between the angles θ and φ is somewhat artificial, physicists prefer to think not in terms of the functions 𝒫 and ℱ in separation, but directly about their products that participate in this solution.⁶⁰


60 In quantum mechanics, it is more convenient to use a slightly different alternative set of basic functions of the same problem, namely the following complex functions called the spherical harmonics:
$$Y_l^n(\theta,\varphi) = \left[\frac{2l+1}{4\pi}\,\frac{(l-n)!}{(l+n)!}\right]^{1/2} P_l^n(\cos\theta)\,e^{in\varphi},$$
which are defined for both positive and negative n (within the limits −l ≤ n ≤ +l) — see, e.g., QM Secs. 3.6 and 5.6. (Note again that in that field, our index n is traditionally denoted as m, and called the magnetic quantum number.)
As a rare exception for my courses, to save time I will skip giving an example of using the
associated Legendre functions in electrostatics, because quite a few examples of these functions’ 
applications will be given in the quantum mechanics part of this series. 


2.9. Charge images 


So far, we have discussed various methods of solution of the Laplace boundary problem (35). 
Let us now move on to the discussion of its generalization, the Poisson equation (1.41). We need it 
when besides conductors, we also have stand-alone charges with a known spatial distribution ρ(r). (Its
discussion will also allow us, better equipped, to revisit the Laplace problem in the next section.) 


Let us start from a somewhat limited, but very useful charge image (or “image charge”) method. 
Consider a very simple problem: a single point charge near a conducting half-space — see Fig. 26. 


Fig. 2.26. The simplest problem readily solvable by the 
charge image method. The points’ colors are used, as 
before, to denote the charges of the original (red) and 
opposite (blue) sign. 


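Let us prove that its solution, above the conductor’s surface (z ≥ 0), may be represented as:
$$\phi(\mathbf{r}) = \frac{q}{4\pi\varepsilon_0}\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|} - \frac{1}{|\mathbf{r}-\mathbf{r}''|}\right), \tag{2.185}$$
or in a more explicit form, using the cylindrical coordinates shown in Fig. 26:
$$\phi(\mathbf{r}) = \frac{q}{4\pi\varepsilon_0}\left\{\frac{1}{\left[\rho^2 + (z-d)^2\right]^{1/2}} - \frac{1}{\left[\rho^2 + (z+d)^2\right]^{1/2}}\right\}, \tag{2.186}$$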


where ρ is the distance of the field observation point r from the “vertical” line on which the charge is located. Indeed, this solution satisfies both the boundary condition φ = 0 at the surface of the conductor (z = 0), and the Poisson equation (1.41), with the single δ-functional source at point r′ = {0, 0, +d} on its right-hand side, because the second singularity of the solution, at point r″ = {0, 0, −d}, is outside the region of the solution’s validity (z ≥ 0). Physically, this solution may be interpreted as the sum of the fields of the actual charge (+q) at point r′, and an equal but opposite charge (−q) at the “mirror image” point r″ (Fig. 26). This is the basic idea of the charge image method.


Before moving on to more complex problems, let us discuss the situation shown in Fig. 26 in a 
little bit more detail, due to its fundamental importance. First, we can use Eqs. (3) and (186) to calculate 
the surface charge density: 
$$\sigma = -\varepsilon_0\frac{\partial\phi}{\partial z}\Big|_{z=0} = -\frac{q}{4\pi}\,\frac{2d}{\left(\rho^2 + d^2\right)^{3/2}}. \tag{2.187}$$


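From this, the total surface charge is
$$Q \equiv \int_S \sigma\, d^2r = -\frac{q}{4\pi}\,2d\int_0^{\infty}\frac{2\pi\rho\, d\rho}{\left(\rho^2 + d^2\right)^{3/2}}. \tag{2.188}$$
This integral may be easily worked out using the substitution ξ ≡ ρ²/d² (giving dξ = 2ρdρ/d²):
$$Q = -\frac{q}{2}\int_0^{\infty}\frac{d\xi}{(\xi + 1)^{3/2}} = -q. \tag{2.189}$$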


This result is very natural: the conductor brings as much surface charge from its interior to the surface as 
necessary to fully compensate for the initial charge (+q) and hence kill the electric field at large 
distances as efficiently as possible, hence reducing the total electrostatic energy (1.65) to the lowest 
possible value. 
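Eqs. (186)-(189) can be cross-checked numerically. In the minimal sketch below, the example values q = 10⁻⁹ C and d = 1 m, the CODATA value of ε₀, and all function names are our own choices. It confirms that σ from Eq. (187) equals −ε₀∂φ/∂z of the potential (186), and it integrates σ over disks of growing radius. The induced charge approaches −q only slowly: the part outside radius ρ is −q·d/(ρ² + d²)^{1/2}, which decays just as 1/ρ.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def potential(rho, z, q, d):
    """Eq. (2.186): charge q at z = +d above a grounded plane, plus its image -q at z = -d."""
    k = q / (4 * math.pi * EPS0)
    return k * (1 / math.hypot(rho, z - d) - 1 / math.hypot(rho, z + d))


def sigma(rho, q, d):
    """Eq. (2.187): induced surface charge density on the plane z = 0."""
    return -q * d / (2 * math.pi * (rho**2 + d**2) ** 1.5)


def charge_within(rho_max, q, d, steps=20000):
    """Induced charge on the disk rho < rho_max: Simpson's rule for the integral (2.188)."""
    def ring(r):
        return 2 * math.pi * r * sigma(r, q, d)

    h = rho_max / steps
    s = ring(0.0) + ring(rho_max)
    s += sum((4 if i % 2 else 2) * ring(i * h) for i in range(1, steps))
    return s * h / 3


q, d = 1e-9, 1.0
h = 1e-5  # step of the central difference across the plane z = 0
print(sigma(0.7, q, d), -EPS0 * (potential(0.7, h, q, d) - potential(0.7, -h, q, d)) / (2 * h))
for rho_max in (1.0, 10.0, 1000.0):
    print(rho_max, charge_within(rho_max, q, d) / q, -(1 - d / math.hypot(rho_max, d)))
```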


For a better feeling of this polarization charge of the surface, let us take our calculations to the extreme — to q equal to one elementary charge e, and place a particle with this charge (for example, a proton) at a macroscopic distance — say 1 m — from the conductor’s surface. Then, according to Eq. (189), the total polarization charge of the surface equals that of an electron, and according to Eq. (187), its spatial extent is of the order of d² = 1 m². This means that if we consider a much smaller part of the surface, ΔA << d², its polarization charge magnitude ΔQ = σΔA is much less than one electron! For example, Eq. (187) shows that the polarization charge of quite a macroscopic area ΔA = 1 cm² right under the initial charge (ρ = 0) is eΔA/2πd² ≈ 1.6×10⁻⁵ e. Can this be true, or is our theory somehow limited to charges q much larger than e? (After all, the theory is substantially based on the approximate macroscopic model (1); maybe it is the culprit?)
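The number quoted above is a one-line calculation. The sketch below also compares the small-area estimate |σ(0)|ΔA with the exact charge on a disk of the same area, −e[1 − d/(r² + d²)^{1/2}], which follows from integrating Eq. (187). The disk shape of the area ΔA is our assumption; the two agree because ΔA << d².

```python
import math

d = 1.0      # m, distance of the proton from the surface
area = 1e-4  # m^2, i.e. 1 cm^2 directly under the charge (rho = 0)

# |sigma(0)| = e / (2 pi d^2) from Eq. (2.187), so |dQ| / e = area / (2 pi d^2)
fraction_of_e = area / (2 * math.pi * d**2)
print(f"{fraction_of_e:.2e}")  # 1.59e-05

# Exact induced charge (in units of e) on a disk of the same area
r_disk = math.sqrt(area / math.pi)
exact_fraction = 1 - d / math.hypot(r_disk, d)
print(f"{exact_fraction:.2e}")
```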


Surprisingly enough, the answer to this question has become clear (at least to some physicists :-) only as late as in the mid-1980s when several experiments demonstrated, and theorists accepted (some of them rather grudgingly), that the usual polarization charge formulas are valid for elementary charges as well, i.e., the polarization charge ΔQ of a macroscopic surface area may differ from a multiple of e. The underlying reason for this paradox is the physical nature of the polarization charge of a conductor’s surface: as was discussed in Sec. 1, it is due not to new charged particles brought into the conductor (such charge would indeed be a multiple of e), but to a shift of the free charges of a conductor by a very small distance from the equilibrium positions that they had in the absence of the external field induced by charge q. This shift is not quantized, at least on the scale relevant to our problem, and hence neither is ΔQ.


This understanding has paved the way toward the invention and experimental demonstration of 
several new devices including the so-called single-electron transistors,⁶¹ which may be used, in particular,
for ultrasensitive measurement of polarization charges as small as ~10° e. Another important class of 
single-electron devices is the dc and ac current standards based on the fundamental relation I = −ef,


61 Actually, this term (for which the author of these notes may be blamed :-) is misleading: the operation of the “single-electron transistor” is based on the interplay of discrete charges (multiples of e) transferred between conductors, and sub-single-electron polarization charges — see, e.g., K. Likharev, Proc. IEEE 87, 606 (1999).
where I is the dc current carried by electrons transferred with the frequency f. The experimentally achieved⁶² relative accuracy of such standards is of the order of 10⁻⁷, and is not too far from that provided by the competing approach based on a combination of the Josephson effect and the quantum Hall effect.⁶³


Second, let us find the potential energy U of the charge-to-surface interaction. For that, we may 
use the value of the electrostatic potential (185) at the point of the charge itself (r = r’), of course 
ignoring the infinite potential created by the real charge, so that the remaining potential is that of the 
image charge:
$$\phi_{\text{image}}(\mathbf{r}') = -\frac{1}{4\pi\varepsilon_0}\,\frac{q}{2d}. \tag{2.190}$$


Looking at the electrostatic potential’s definition given by Eq. (1.31), it may be tempting to immediately write U = qφ_image = −(1/4πε₀)(q²/2d) [WRONG!]. The reason this is incorrect is that the potential φ_image is not independent of q, but is actually induced by this charge. This is why the correct approach is to calculate U from Eq. (1.61), with just one term:
$$U = \frac{1}{2}\,q\,\phi_{\text{image}}(\mathbf{r}') = -\frac{1}{4\pi\varepsilon_0}\,\frac{q^2}{4d}, \tag{2.191}$$
giving an energy whose magnitude is half that of the wrong result cited above. To double-check Eq. (191), and also get a better feeling of the factor ½ that distinguishes it from the wrong guess, we can calculate U as the integral of the force exerted on the charge by the conductor’s surface charge (i.e., in our formalism, by the image charge):
$$U = -\int_d^{\infty} F(z)\,dz = -\frac{1}{4\pi\varepsilon_0}\int_d^{\infty}\frac{q^2}{(2z)^2}\,dz = -\frac{1}{4\pi\varepsilon_0}\,\frac{q^2}{4d}. \tag{2.192}$$


This calculation clearly accounts for the gradual build-up of the force F, as the real charge is being 
brought from afar (where we have opted for U =0) toward the surface. 
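The factor ½ can also be confirmed by doing the integral (192) numerically rather than analytically. The minimal sketch below works in units where q²/4πε₀ = 1 (our choice) and sums the work of the image force with the trapezoid rule on a geometric grid of distances. It adds the exact remainder of the integral beyond the last grid point, then compares the result with qφ_image from Eq. (190) and with Eq. (191).

```python
import math


def image_force(z):
    """Attraction of the charge to its image at distance 2z, in units q^2 / (4 pi eps0) = 1."""
    return 1.0 / (2 * z) ** 2


def interaction_energy(d, z_max_factor=1e6, n=20000):
    """U(d) = -integral_d^inf F(z) dz, Eq. (2.192), via the trapezoid rule on a geometric grid."""
    ratio = z_max_factor ** (1 / n)
    zs = [d * ratio**k for k in range(n + 1)]
    work = sum((image_force(a) + image_force(b)) / 2 * (b - a) for a, b in zip(zs, zs[1:]))
    tail = 1.0 / (4 * zs[-1])  # exact remainder of the integral beyond z_max
    return -(work + tail)


d = 1.0
naive = -1.0 / (2 * d)  # q * phi_image from Eq. (2.190), the [WRONG!] guess
print(interaction_energy(d), 0.5 * naive, naive)
```

The numerical value matches ½·qφ_image, i.e. Eq. (191), and not the naive product qφ_image.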


This result has several important applications. For example, let us plot the electrostatic energy U of an electron, i.e. a particle with charge q = −e, near a metallic surface, as a function of d. For that, we may use Eq. (191) until our macroscopic model (1) becomes invalid, and U transitions to some negative constant value (−ψ) inside the conductor — see Fig. 27a. Since we have taken the potential energy of the electron at infinity to be zero, at relatively low temperatures, k_BT << ψ, electrons in metals may occupy only the states with energies below −ψ (the so-called Fermi level).⁶⁴ The positive constant ψ is called the workfunction because it describes the smallest work needed to remove the electron from a metal. As was discussed in Sec. 1, in good metals the electric field screening takes place at interatomic distances a₀ ~ 10⁻¹⁰ m. Plugging d = 1×10⁻¹⁰ m and q = −e ≈ −1.6×10⁻¹⁹ C into Eq. (191), we get ψ ≈ 5.8×10⁻¹⁹ J ≈ 3.6 eV. This crude estimate is in surprisingly good agreement with the experimental values of the workfunction, ranging between 4 and 5 eV for most metals.⁶⁵
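The workfunction estimate above is plain arithmetic with Eq. (191). The sketch below uses the exact SI value of e and the CODATA value of ε₀; d = a₀ = 10⁻¹⁰ m is the screening length assumed in the text.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m


def image_energy_joule(d, q=E_CHARGE):
    """|U| from Eq. (2.191): q^2 / (16 pi eps0 d)."""
    return q**2 / (16 * math.pi * EPS0 * d)


a0 = 1e-10  # m, the screening length assumed in the text
psi = image_energy_joule(a0)
print(f"psi ~ {psi:.2e} J ~ {psi / E_CHARGE:.2f} eV")
```

The result is about 5.8×10⁻¹⁹ J, i.e. close to 3.6 eV. Because ψ ∝ 1/a₀, the estimate is only as good as the choice of a₀.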


62 See, e.g., M. Keller et al., Appl. Phys. Lett. 69, 1804 (1996); F. Stein et al., Metrologia 54, 1 (2017).
63 J. Brun-Picard et al., Phys. Rev. X 6, 041051 (2016).

64 More discussion of these states may be found in SM Secs. 3.3 and 6.3. 

65 More discussion of the workfunction, and its effect on the electrons’ kinetics, is given in SM Sec. 6.3. 
Fig. 2.27. (a) The origin of the workfunction, and (b) the field emission of electrons (schematically).


Next, let us consider the effect of an additional uniform external electric field E₀, applied normally to a metallic surface, on this potential profile. For that, we may add the potential energy that the field gives to the electron at distance d from the surface, U_ext = −eE₀d, to that created by the image charge. (As we know from Eq. (1.53), since the field E₀ is independent of the electron’s position, its recalculation into the potential energy does not require the coefficient ½.) As the result, the potential energy of an electron near the surface becomes


$$U(d) = -eE_0 d - \frac{1}{4\pi\varepsilon_0}\,\frac{e^2}{4d}, \quad \text{for } d \gg a_0, \tag{2.193}$$
with a similar crossover to U = −ψ inside the conductor — see Fig. 27b. One can see that at the appropriate sign, and a sufficient magnitude of the applied field, it lowers the potential barrier that prevents electrons from leaving the conductor. At E₀ ~ ψ/ea₀ (for metals, ~10¹⁰ V/m), this suppression becomes so strong that electrons with energies at, and just below, the Fermi level start quantum-mechanical tunneling through the remaining thin barrier. This is the field electron emission (or just “field emission”) effect, which is used in vacuum electronics to provide efficient cathodes that do not require heating to high temperatures.⁶⁶
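How much a given field lowers the barrier follows directly from Eq. (193). Setting dU/dd = 0 gives the barrier top at d* = (e/16πε₀E₀)^{1/2}, with U_max = −(e³E₀/4πε₀)^{1/2}. This classical barrier lowering is often called the Schottky effect; it is our own addition and is not derived in the text. The minimal sketch below checks the closed form against a brute-force maximization. It also estimates the field at which the barrier top drops all the way to the Fermi level −ψ, using ψ = 4.5 eV as a typical value and ignoring tunneling.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m


def energy_ev(d, e0):
    """Eq. (2.193), converted to eV: applied-field term plus image-charge term."""
    u_joule = -E_CHARGE * e0 * d - E_CHARGE**2 / (16 * math.pi * EPS0 * d)
    return u_joule / E_CHARGE


def barrier_top_numeric(e0, d_min=1e-11, d_max=1e-6, n=200000):
    """Maximum of U(d) found by brute force on a logarithmic grid of distances."""
    ratio = (d_max / d_min) ** (1.0 / n)
    return max(energy_ev(d_min * ratio**k, e0) for k in range(n + 1))


def barrier_top_closed_form(e0):
    """dU/dd = 0 at d* = sqrt(e / (16 pi eps0 E0)), giving U_max = -sqrt(e^3 E0 / (4 pi eps0))."""
    return -math.sqrt(E_CHARGE**3 * e0 / (4 * math.pi * EPS0)) / E_CHARGE


def field_without_barrier(psi_ev):
    """Field at which the classical barrier top reaches the Fermi level -psi."""
    psi_joule = psi_ev * E_CHARGE
    return 4 * math.pi * EPS0 * psi_joule**2 / E_CHARGE**3


e0 = 1e9  # V/m, an arbitrary example field
print(barrier_top_numeric(e0), barrier_top_closed_form(e0))
print(f"{field_without_barrier(4.5):.2e} V/m")
```

The last number lands at roughly 1.4×10¹⁰ V/m, consistent with the order-of-magnitude estimate ψ/ea₀ given above.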


Returning to the basic electrostatics, let us find some other conductor geometries where the 
method of charge images may be effectively applied. First, let us consider a right-angle corner (Fig. 
28a). Reflecting the initial charge in the vertical plane, we get the image shown in the top left corner of 
that panel. This image makes the boundary condition φ = const satisfied on the vertical surface of the
corner. However, for the same to be true on the horizontal surface, we have to reflect both the initial 
charge and the image charge in the horizontal plane, flipping their signs. The final configuration of four 
charges, shown in Fig. 28a, satisfies all boundary conditions. The resulting potential distribution may be 
readily written as an evident generalization of Eq. (185). From it, the electric field and electric charge 
distributions, and the potential energy and forces acting on the charge may be calculated exactly as 
above — an easy exercise left for the reader. 


Next, consider a corner with the angle π/4 (Fig. 28b). Here we need to repeat the reflection
operation not two but four times before we arrive at the final pattern of eight positive and negative 
charges. (Any attempt to continue this process would lead to overlap with the already existing charges.) 


66 The practical use of such “cold” cathodes is affected by the fact that, as it follows from our discussion in Sec. 4, any nanoscale irregularity of a conducting surface (a protrusion, an atomic cluster, or even a single “adatom” stuck to it) may cause a strong increase of the local field well above the applied uniform field E₀, making the electron emission reproducibility and stability in time significant challenges. In addition, the impact-ionization effects may lead to avalanche-type electric breakdown at dc fields as low as ~3×10⁶ V/m.
This reasoning may be readily extended to corners of angles α = π/n, with any integer n, which require 2n charges (including the initial one) to satisfy all the boundary conditions.
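The reflection procedure can be automated. In the minimal sketch below, the function names, the test geometry, and the units q/4πε₀ = 1 are all our own choices. The code starts from the real charge in a grounded wedge 0 < θ < π/n, whose edge lies along the z-axis. It reflects every known charge in both walls, flipping its sign, until no new positions appear. It then confirms that exactly 2n charges result and that their total potential vanishes on both walls, including at points off the plane of the charges.

```python
import math


def wedge_images(n, r0, theta0, tol=1e-9):
    """Real charge (+1) plus images for a grounded wedge 0 < theta < pi/n (edge along z).

    Returns a list of (x, y, sign) in the plane z = 0 containing the real charge.
    """
    alpha = math.pi / n
    charges = [(theta0 % (2 * math.pi), 1)]
    frontier = list(charges)
    while frontier:
        new = []
        for angle, sign in frontier:
            for wall in (0.0, alpha):                       # mirror in each wall plane
                image_angle = (2 * wall - angle) % (2 * math.pi)
                known = any(
                    abs((image_angle - a + math.pi) % (2 * math.pi) - math.pi) < tol
                    for a, _ in charges
                )
                if not known:
                    charges.append((image_angle, -sign))
                    new.append((image_angle, -sign))
        frontier = new
    return [(r0 * math.cos(a), r0 * math.sin(a), s) for a, s in charges]


def potential(x, y, z, charges):
    """Sum of sign / distance over all point charges, in units q / (4 pi eps0) = 1."""
    return sum(s / math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + z**2) for cx, cy, s in charges)


for n in (1, 2, 3, 4, 6):
    alpha = math.pi / n
    imgs = wedge_images(n, r0=1.0, theta0=0.3 * alpha)
    on_wall_1 = potential(1.7, 0.0, 0.4, imgs)
    on_wall_2 = potential(1.7 * math.cos(alpha), 1.7 * math.sin(alpha), -0.2, imgs)
    print(n, len(imgs), on_wall_1, on_wall_2)
```

The closure after exactly 2n steps reflects the fact that reflections in two planes at the angle π/n generate a finite symmetry group with 2n elements. For angles that are not of the form π/n, the procedure would instead produce images inside the physical region, and the method fails.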


Fig. 2.28. The charge images for (a, b) the corners with angles π/2 and π/4, (c) a plane capacitor, and (d) a rectangular box; (e) typical equipotential surfaces for the last system.


Some configurations require an infinite number of images but are still tractable. The most 
important of them is a system of two parallel conducting surfaces, i.e. an unbiased plane capacitor of 
infinite area (Fig. 28c). Here the repeated reflection leads to an infinite system of charges ±q at points
$$x_j^{\pm} = 2aj \pm d, \tag{2.194}$$


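where d (with 0 < d < a) is the position of the initial charge, and j is an arbitrary integer. The resulting infinite sum for the potential φ′ created at the location of the real charge q by all its images,
$$\phi'(d) = \frac{1}{4\pi\varepsilon_0}\left[-\frac{q}{2d} + \sum_{j\neq 0}\left(\frac{q}{|d - x_j^+|} - \frac{q}{|d - x_j^-|}\right)\right] = -\frac{q}{4\pi\varepsilon_0}\left[\frac{1}{2d} + \frac{d^2}{a^3}\sum_{j=1}^{\infty}\frac{1}{j\left[j^2 - (d/a)^2\right]}\right], \tag{2.195}$$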

is converging (in its last form) very fast. For example, the exact value, φ′(a/2) = −2 ln 2 (q/4πε₀a), differs by less than 5% from the approximation using just the first term of the sum.
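Both claims about Eq. (195) are easy to check numerically, in units where q/4πε₀ = 1 and a = 1 (our choice). In the sketch below, `phi_direct` sums the image potentials directly, with the same number of periods kept on each side of the charge, and `phi_series` uses the last, fast-converging form. The symmetric truncation matters: the separate sums over the positive and the negative images diverge logarithmically, and only their properly paired difference is meaningful.

```python
import math


def phi_series(d, a=1.0, terms=100000):
    """Last form of Eq. (2.195), in units q / (4 pi eps0) = 1."""
    delta = d / a
    s = math.fsum(1.0 / (j * (j * j - delta * delta)) for j in range(1, terms + 1))
    return -(1.0 / (2 * d) + (d * d / a**3) * s)


def phi_direct(d, a=1.0, jmax=20000):
    """Brute-force sum over images, truncated symmetrically at |j| <= jmax."""
    total = 0.0
    for j in range(-jmax, jmax + 1):
        if j != 0:
            total += 1.0 / abs(d - (2 * a * j + d))   # +q images at x_j^+ = 2aj + d
        total -= 1.0 / abs(d - (2 * a * j - d))       # -q images at x_j^- = 2aj - d
    return total


def first_term_estimate(d, a=1.0):
    """Eq. (2.195) with only the j = 1 term of the sum kept."""
    delta = d / a
    return -(1.0 / (2 * d) + (d * d / a**3) / (1 - delta * delta))


print(phi_series(0.5), -2 * math.log(2), first_term_estimate(0.5))
print(phi_series(0.3), phi_direct(0.3))
```

At d = a/2, the one-term estimate gives −4/3 instead of −2 ln 2 ≈ −1.386, an error of about 4%.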
The same method may be applied to 2D (cylindrical) and 3D rectangular conducting boxes that require, respectively, 2D or 3D infinite rectangular lattices of images; for example in a 3D box with sides a, b, and c, charges ±q are located at points (Fig. 28d)
$$\mathbf{r}_{jkl} = \left\{2ja \pm x',\ 2kb \pm y',\ 2lc \pm z'\right\}, \tag{2.196}$$
where r′ = {x′, y′, z′} is the location of the initial (real) charge, j, k, and l are arbitrary integers, and the sign of each image charge is the product of the three ± signs. Figure 28e shows a typical result of the summation of the potentials of such a charge set, including the real one, in a 2D box (within the plane of the real charge). One can see that the equipotential surfaces, concentric near the charge, are naturally leaning along the conducting walls of the box, which have to be equipotential.


Even more surprisingly, the image charge method works very efficiently not only for rectilinear 
geometries but also for spherical ones. Indeed, let us consider a point charge q at distance d from the 
center of a conducting, grounded sphere of radius R (Fig. 29a), and try to satisfy the boundary condition 
φ = 0 for the electrostatic potential on the sphere’s surface using an imaginary charge q’ located at some
point beyond the surface, i.e. inside the sphere. 


Fig. 2.29. Method of charge images for a conducting sphere: (a) the idea, and (b) the resulting potential distribution in the central plane containing the charge, for the particular case d = 2R.


From the problem’s symmetry, it is clear that the point should be on the line passing through the real charge and the sphere’s center, at some distance d’ from the center. Then the total potential created by the two charges at an arbitrary point of free space, i.e. at r > R (Fig. 29a), is

$$\phi(r,\theta) = \frac{1}{4\pi\varepsilon_0}\left[\frac{q}{\left(r^2+d^2-2rd\cos\theta\right)^{1/2}} + \frac{q'}{\left(r^2+d'^2-2rd'\cos\theta\right)^{1/2}}\right]. \qquad (2.197)$$


This expression shows that we can make the two fractions equal and opposite at all points on the sphere’s surface (i.e. for any θ at r = R) if we take⁶⁷

$$d' = \frac{R^2}{d}, \qquad q' = -\frac{R}{d}\,q. \qquad (2.198)$$
Since the solution of any Poisson boundary problem is unique, Eqs. (197) and (198) give us the final solution of this problem. Fig. 29b shows a typical equipotential pattern following from this solution. It may be surprising that such simple formulas describe such an elaborate field distribution.
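A quick numerical check that the image charge (198) indeed makes the potential (197) vanish on the whole sphere. This is a sketch assuming units with 1/4πε0 = 1; the function names and the sample values R = 1, d = 2 are arbitrary choices made for illustration.

```python
import math


def sphere_image(q, d, R):
    """Image charge q' and its distance d' from the sphere's center, Eq. (2.198)."""
    return -R / d * q, R**2 / d


def potential(r, theta, q, d, R):
    """Eq. (2.197) in units with 1/(4*pi*eps0) = 1; valid for r >= R."""
    q_img, d_img = sphere_image(q, d, R)
    dist_real = math.sqrt(r**2 + d**2 - 2 * r * d * math.cos(theta))
    dist_img = math.sqrt(r**2 + d_img**2 - 2 * r * d_img * math.cos(theta))
    return q / dist_real + q_img / dist_img


R, d, q = 1.0, 2.0, 1.0
# Largest |phi| on the surface r = R, sampled every degree of the polar angle
worst = max(abs(potential(R, math.pi * k / 180, q, d, R)) for k in range(181))
print(f"max |phi| on the sphere: {worst:.1e}")
```

The maximum deviation from zero is at the level of floating-point rounding, whereas off the surface (say, at r = 1.5, θ = 0) the potential is of order unity.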


67 In geometry, such points, with dd’ = R², are referred to as the result of mutual inversion in a sphere of radius R.
Now we can calculate the total charge Q on the grounded sphere’s surface, induced by the external charge q. We could do this, as we have done for the conducting plane, by brute-force integration of the surface charge density σ = −ε0∂φ/∂r|r=R. It is more elegant, however, to use the following Gauss law argument. Equality (197) is valid (at r ≥ R) regardless of whether we are dealing with our real problem (charge q and the conducting sphere) or with the equivalent charge configuration, with the point charges q and q’ but no sphere at all. Hence, according to Eq. (1.16), the Gaussian integral over a surface of radius r = R + 0, and hence the total charge inside it, should also be the same. Thus we immediately get

$$Q = q' = -\frac{R}{d}\,q. \qquad (2.199)$$
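The Gauss-law shortcut leading to Eq. (199) can be cross-checked by the brute-force route mentioned above, i.e. by integrating σ = −ε0∂φ/∂r|r=R over the sphere. The sketch below assumes units with 1/4πε0 = 1 (so that ε0 = 1/4π) and uses Simpson’s rule over the polar angle; the helper names are hypothetical.

```python
import math


def dphi_dr(r, theta, charges):
    """Radial derivative of the potential of point charges on the z-axis.

    charges: list of (charge, distance from the center) pairs;
    units with 1/(4*pi*eps0) = 1.
    """
    total = 0.0
    for qk, s in charges:
        dist2 = r**2 + s**2 - 2 * r * s * math.cos(theta)
        total -= qk * (r - s * math.cos(theta)) / dist2**1.5
    return total


def induced_charge(q, d, R, n=2000):
    """Integrate sigma = -eps0 * dphi/dr over the sphere surface (Simpson's rule)."""
    eps0 = 1 / (4 * math.pi)  # consistent with 1/(4*pi*eps0) = 1
    charges = [(q, d), (-R / d * q, R**2 / d)]  # real charge and its image, Eq. (2.198)
    h = math.pi / n  # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        theta = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        sigma = -eps0 * dphi_dr(R, theta, charges)
        total += weight * sigma * 2 * math.pi * R**2 * math.sin(theta)  # ring area factor
    return total * h / 3


print(induced_charge(1.0, 2.0, 1.0))  # compare with -qR/d = -0.5
```

The numerically integrated surface charge agrees with −qR/d to high accuracy, which is just what the Gauss-law argument predicts.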
A similar argument may be used to calculate the charge-to-sphere interaction force:

$$F = \frac{qq'}{4\pi\varepsilon_0(d-d')^2} = -\frac{q^2}{4\pi\varepsilon_0}\,\frac{R}{d\,(d-R^2/d)^2} = -\frac{q^2}{4\pi\varepsilon_0}\,\frac{Rd}{(d^2-R^2)^2}. \qquad (2.200)$$


(Note that this expression is legitimate only at d > R.) At large distances, d ≫ R, this attractive force decreases as 1/d³. This unusual dependence arises because, as Eq. (199) specifies, the induced charge of the sphere, responsible for the force, is not constant but decreases as 1/d. In the next chapter, we will see that such a force is also typical for the interaction between a point charge and a dipole.


All previous formulas were for a sphere that is grounded to keep its potential equal to zero. But what if we keep the sphere galvanically insulated, so that its net charge is fixed, for example, equal to zero? Instead of solving this problem from scratch, let us use (again!) the almighty linear superposition principle. For that, we may add to the previous problem an additional charge, equal to −Q = −q’, placed on the sphere, and argue that this addition gives, at all points, an additional, spherically-symmetric potential that does not depend on the potential induced by the external charge q, and was calculated in Sec. 1.2 (see Eq. (1.19)). For the interaction force, such an addition yields


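$$F = \frac{qq'}{4\pi\varepsilon_0(d-d')^2} - \frac{qq'}{4\pi\varepsilon_0 d^2} = -\frac{q^2R}{4\pi\varepsilon_0}\left[\frac{d}{(d^2-R^2)^2} - \frac{1}{d^3}\right]. \qquad (2.201)$$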


At large distances, the two terms proportional to 1/d³ cancel each other, giving F ∝ 1/d⁵, so that the potential energy of such interaction behaves as U ∝ 1/d⁴. Such a rapid force decay is due to the fact that the field of the uncharged sphere is equivalent to that of two (equal and opposite) induced charges q’ and −q’, and the distance between them, d’ = R²/d, tends to zero at d → ∞.
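Eqs. (200) and (201) and their large-distance behavior can be checked with a few lines of code. This sketch assumes units with 1/4πε0 = 1; it computes both forces directly from the image charges and compares them with the closed forms. Expanding the square brackets of Eq. (201) at d ≫ R gives the leading term F ≈ −2q²R³/(4πε0d⁵), which the last check below uses.

```python
def force_from_images(q, d, R, insulated):
    """Force on the charge q from its image(s); units with 1/(4*pi*eps0) = 1.

    Grounded sphere: image q' = -qR/d at d' = R**2/d (Eq. (2.198)).
    Insulated neutral sphere: additionally, the charge -q' at the center.
    """
    q_img, d_img = -R / d * q, R**2 / d
    force = q * q_img / (d - d_img) ** 2
    if insulated:
        force += q * (-q_img) / d**2
    return force


def force_grounded(q, d, R):
    """Closed form, Eq. (2.200); negative values mean attraction."""
    return -(q**2) * R * d / (d**2 - R**2) ** 2


def force_insulated(q, d, R):
    """Closed form, Eq. (2.201)."""
    return -(q**2) * R * (d / (d**2 - R**2) ** 2 - 1 / d**3)


for d in (1.5, 3.0, 10.0):
    print(d, force_grounded(1.0, d, 1.0), force_insulated(1.0, d, 1.0))
```

At d = 1000R, the products F·d³ (grounded sphere) and F·d⁵ (neutral sphere) approach −q²R and −2q²R³, respectively, illustrating the 1/d³ and 1/d⁵ laws.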


2.10. Green’s functions 


I have spent so much time/space discussing potential distributions created by a single point charge in various conductor geometries because for any of the geometries, the generalization of these results to an arbitrary distribution ρ(r) of free charges is straightforward. Namely, if a single charge q, located at some point r’, creates at point r the electrostatic potential


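$$\phi(\mathbf{r}) = \frac{q}{4\pi\varepsilon_0}\,G(\mathbf{r},\mathbf{r}'), \qquad (2.202)$$

then, due to the linear superposition principle, an arbitrary charge distribution creates the potential

$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\sum_k q_k\,G(\mathbf{r},\mathbf{r}_k) \to \frac{1}{4\pi\varepsilon_0}\int \rho(\mathbf{r}')\,G(\mathbf{r},\mathbf{r}')\,d^3r'. \qquad (2.203)$$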


The function G(r, r’) is called the (spatial) Green’s function, a notion that is very fruitful and hence popular in all fields of physics.⁶⁸ Evidently, as Eq. (1.35) shows, in the unlimited free space

$$G(\mathbf{r},\mathbf{r}') = \frac{1}{|\mathbf{r}-\mathbf{r}'|}, \qquad (2.204)$$

i.e. the Green’s function depends only on one scalar argument: the distance between the field-observation point r and the field-source (charge) point r’. However, as soon as there are conductors around, the situation changes. In this course, I will only discuss the Green’s functions defined to vanish as soon as the radius-vector r points to the surface (S) of any conductor:⁶⁹

$$G(\mathbf{r},\mathbf{r}')\big|_{\mathbf{r}\in S} = 0. \qquad (2.205)$$


With this definition, it is straightforward to deduce the Green’s functions for the solutions of the last section’s problems in which conductors were grounded, i.e. had potential φ = 0. For example, for the semi-space z ≥ 0 limited by a grounded conducting plane z = 0 (Fig. 26), Eq. (185) yields

$$G(\mathbf{r},\mathbf{r}') = \frac{1}{|\mathbf{r}-\mathbf{r}'|} - \frac{1}{|\mathbf{r}-\mathbf{r}''|}, \quad \text{with } \boldsymbol{\rho}'' = \boldsymbol{\rho}' \text{ and } z'' = -z', \qquad (2.206)$$

where ρ is the 2D radius vector. We see that in the presence of conductors (and, as we will see later, any other polarizable media), the Green’s function may depend not only on the difference r − r’, but on each of these two arguments in a specific way.


So far, this is just re-naming our old results. The really non-trivial result of the Green’s function formalism in electrostatics is that, somewhat counter-intuitively, the knowledge of this function for a system with grounded conductors (Fig. 30a) enables the calculation of the field created by voltage-biased conductors (Fig. 30b) with the same geometry. To show this, let us use the so-called Green’s theorem of vector calculus.⁷⁰ The theorem states that for any two scalar, differentiable functions f(r) and g(r), and any volume V,

$$\int_V \left(f\nabla^2 g - g\nabla^2 f\right)d^3r = \oint_S \left(f\nabla g - g\nabla f\right)_n d^2r, \qquad (2.207)$$

where S is the surface limiting the volume. Applying the theorem to the electrostatic potential φ(r) and the Green’s function G (also considered as a function of r), let us use the Poisson equation (1.41) to replace ∇²φ with (−ρ/ε0), and notice that G, considered as a function of r, obeys the Poisson equation with the δ-functional source:

$$\nabla^2 G(\mathbf{r},\mathbf{r}') = -4\pi\,\delta(\mathbf{r}-\mathbf{r}'). \qquad (2.208)$$


68 See, e.g., CM Sec. 5.1, QM Secs. 2.2 and 7.4, and SM Sec. 5.5. 

69 G so defined is sometimes called the Dirichlet function. 

70 See, e.g., MA Eq. (12.3). Actually, this theorem is a ready corollary of the better-known divergence (“Gauss”) 
theorem, MA Eq. (12.2). 
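(Indeed, according to its definition (202), this function may be formally considered as the potential of a point charge q = 4πε0.) Now swapping the notation of the radius-vectors, r ↔ r’, and using the Green’s function symmetry, G(r, r’) = G(r’, r),⁷¹ we get

$$-4\pi\phi(\mathbf{r}) - \int_V\left[-\frac{\rho(\mathbf{r}')}{\varepsilon_0}\right]G(\mathbf{r},\mathbf{r}')\,d^3r' = \oint_S\left[\phi(\mathbf{r}')\,\frac{\partial G(\mathbf{r},\mathbf{r}')}{\partial n'} - G(\mathbf{r},\mathbf{r}')\,\frac{\partial \phi(\mathbf{r}')}{\partial n'}\right]d^2r'. \qquad (2.209)$$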


Fig. 2.30. Green’s function method allows the solution of a simpler boundary problem (a) to be used for the solution of a more complex problem (b), for the same conductor geometry.


Let us apply this relation to the volume V of free space between the conductors, and the boundary S drawn immediately outside of their surfaces. In this case, by its definition, the Green’s function G(r, r’) vanishes at the conductor surfaces, i.e. at r’ ∈ S (see Eq. (205) and the symmetry of G). Now changing the sign of ∂n’ (so that it is the outer normal for the conductors, rather than for the free-space volume V), dividing all terms by 4π, and partitioning the total surface S into the parts Sk (numbered by index k) corresponding to different conductors (possibly kept at different potentials φk), we finally arrive at the famous result:⁷²

$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\int_V \rho(\mathbf{r}')\,G(\mathbf{r},\mathbf{r}')\,d^3r' + \frac{1}{4\pi}\sum_k \phi_k \oint_{S_k}\frac{\partial G(\mathbf{r},\mathbf{r}')}{\partial n'}\,d^2r'. \qquad (2.210)$$


While the first term on the right-hand side of this relation is a direct and evident expression of the superposition principle, given by Eq. (203), the second term is highly non-trivial: it describes the effect of conductors with arbitrary potentials φk (Fig. 30b), using the Green’s function calculated for the similar system with grounded conductors, i.e. with all φk = 0 (Fig. 30a). Let me emphasize that since our volume V excludes conductors, the first term on the right-hand side of Eq. (210) includes only the stand-alone charges in the system (in Fig. 30, marked q1, q2, etc.), but not the surface charges of the conductors, which are taken into account, indirectly, by the second term.


In order to illustrate what a powerful tool Eq. (210) is, let us use it to calculate the electrostatic field in two systems. In the first of them, a plane, circular, conducting disk of radius R, separated with a very thin cut from the remaining conducting plane, is biased with potential φ = V, while the rest of the plane is grounded (see Fig. 31).


71 This symmetry, evident for the particular cases (204) and (206), may be readily proved for the general case by applying Eq. (207) to the functions f(r) = G(r, r’) and g(r) = G(r, r’’). With this substitution, the left-hand side of that equality becomes equal to −4π[G(r’’, r’) − G(r’, r’’)], while the right-hand side is zero, due to Eq. (205).

72 In some textbooks, the sign before the surface integral is negative, because their authors use the outer normal to the free-space region V rather than to the regions occupied by conductors, as I do.
Fig. 2.31. A voltage-biased conducting disk separated from the rest of a conducting plane.


If the width of the gap between the disk and the rest of the plane is negligible, we may apply Eq. (210) without stand-alone charges, ρ(r’) = 0, and with the Green’s function for the uncut plane, see Eq. (206).⁷³ In the cylindrical coordinates, with the origin at the disk’s center (Fig. 31), the function is


$$G(\mathbf{r},\mathbf{r}') = \frac{1}{\left[\rho^2+\rho'^2-2\rho\rho'\cos(\varphi-\varphi')+(z-z')^2\right]^{1/2}} - \frac{1}{\left[\rho^2+\rho'^2-2\rho\rho'\cos(\varphi-\varphi')+(z+z')^2\right]^{1/2}}. \qquad (2.211)$$

(The sum of the first three terms under each square root in Eq. (211) is just the squared distance between the horizontal projections ρ and ρ’ of the vectors r and r’ (or r’’), while the last terms are the squares of their vertical displacements.)


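Now we can readily calculate the derivative participating in Eq. (210), at the plane’s surface, z’ = +0:

$$\left.\frac{\partial G}{\partial n'}\right|_S = \left.\frac{\partial G}{\partial z'}\right|_{z'=+0} = \frac{2z}{\left[\rho^2+\rho'^2-2\rho\rho'\cos(\varphi-\varphi')+z^2\right]^{3/2}}. \qquad (2.212)$$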


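Due to the axial symmetry of the system, we may set the azimuthal angle of the observation point to zero. With this, Eqs. (210) and (212) yield

$$\phi(\mathbf{r}) = \frac{V}{4\pi}\int_{\text{disk}}\frac{\partial G(\mathbf{r},\mathbf{r}')}{\partial n'}\,d^2r' = \frac{Vz}{2\pi}\int_0^R \rho'\,d\rho'\int_0^{2\pi}\frac{d\varphi'}{\left(\rho^2+\rho'^2-2\rho\rho'\cos\varphi'+z^2\right)^{3/2}}. \qquad (2.213)$$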
This integral is not overly pleasing, but may be readily worked out at least for points on the symmetry axis (ρ = 0, z > 0):⁷⁴

$$\phi = Vz\int_0^R\frac{\rho'\,d\rho'}{\left(\rho'^2+z^2\right)^{3/2}} = V\left[1-\frac{z}{\left(R^2+z^2\right)^{1/2}}\right]. \qquad (2.214)$$

This result shows that if z → 0, the potential tends to V (as it should), while at z ≫ R,

$$\phi \to V\frac{R^2}{2z^2}. \qquad (2.215)$$
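Eq. (213) can also be evaluated numerically at arbitrary points, which is a simple way to explore the field off the symmetry axis. The minimal sketch below uses a plain midpoint rule over ρ’ and the source-point azimuth φ’ (the grid sizes are arbitrary choices), with V = R = 1 by default, and checks the result on the axis against Eq. (214); the function names are illustrative.

```python
import math


def disk_potential(rho, z, V=1.0, R=1.0, n_rho=300, n_az=300):
    """Midpoint-rule evaluation of Eq. (2.213) at the point (rho, azimuth 0, z > 0)."""
    d_rho = R / n_rho
    d_az = 2 * math.pi / n_az
    total = 0.0
    for i in range(n_rho):
        rp = (i + 0.5) * d_rho  # rho' at the cell center
        for k in range(n_az):
            az = (k + 0.5) * d_az  # azimuth phi' of the source point
            dist2 = rho**2 + rp**2 - 2 * rho * rp * math.cos(az) + z**2
            total += rp / dist2**1.5
    return V * z / (2 * math.pi) * total * d_rho * d_az


def disk_axis_exact(z, V=1.0, R=1.0):
    """Eq. (2.214): the on-axis potential."""
    return V * (1 - z / math.sqrt(R**2 + z**2))


print(disk_potential(0.0, 0.5), disk_axis_exact(0.5))
print(disk_potential(0.5, 0.5))  # an off-axis point, where no closed form was derived
```

Near the plane (z ≪ R) the integrand becomes sharply peaked, so the grid has to be refined accordingly; a uniform grid is used here only for simplicity.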


73 Indeed, if all parts of the cut plane are grounded, a narrow cut does not change the field distribution, and hence 
the Green’s function, significantly. 
74 There is no need to repeat the calculation for z < 0: from the symmetry of the problem, φ(−z) = φ(z).
Now let us use the same Eq. (210) to solve the (in :-)famous problem of the cut sphere (Fig. 32). Again, if the gap between the two conducting semi-spheres is very thin (t ≪ R), we may use the Green’s function for the grounded (and uncut) sphere. For the particular case r’ = dnz, this function follows from Eqs. (197)-(198); generalizing the former relation to an arbitrary direction of the vector r’, we get


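$$G(\mathbf{r},\mathbf{r}') = \frac{1}{\left[r^2+r'^2-2rr'\cos\gamma\right]^{1/2}} - \frac{R/r'}{\left[r^2+(R^2/r')^2-2r(R^2/r')\cos\gamma\right]^{1/2}}, \quad \text{for } r, r' > R, \qquad (2.216)$$

where γ is the angle between the vectors r and r’, and hence r’’ (see Fig. 32).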
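The properties of the Green’s function (216) that the method relies on, namely its vanishing on the sphere’s surface, Eq. (205), and the symmetry G(r, r’) = G(r’, r) discussed in footnote 71, are easy to confirm numerically. A sketch, with vectors represented as plain 3-tuples and a function name chosen for illustration:

```python
import math


def green_sphere(r_vec, rp_vec, R):
    """Eq. (2.216): Green's function outside a grounded sphere of radius R
    centered at the origin; both points must satisfy |r|, |r'| >= R."""
    r = math.sqrt(sum(c * c for c in r_vec))
    rp = math.sqrt(sum(c * c for c in rp_vec))
    cos_g = sum(a * b for a, b in zip(r_vec, rp_vec)) / (r * rp)  # cos(gamma)
    direct = 1 / math.sqrt(r**2 + rp**2 - 2 * r * rp * cos_g)
    d_img = R**2 / rp  # distance of the image point r'' from the center
    image = (R / rp) / math.sqrt(r**2 + d_img**2 - 2 * r * d_img * cos_g)
    return direct - image


R = 1.0
p1, p2 = (0.3, -1.2, 0.8), (2.0, 0.5, -0.4)
print(green_sphere(p1, p2, R), green_sphere(p2, p1, R))  # symmetry check
print(green_sphere((0.0, 0.6, 0.8), p2, R))  # a point on the surface: ~0
```

The two arguments enter asymmetrically in the code (only r’ is "imaged"), so the agreement of the two values in the first printout is a non-trivial check of the symmetry.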


Fig. 2.32. A system of two separated, oppositely 
biased semi-spheres. 


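Now, calculating the Green’s function’s derivative,

$$\left.\frac{\partial G}{\partial n'}\right|_S = \left.\frac{\partial G}{\partial r'}\right|_{r'=R+0} = \frac{r^2-R^2}{R\left[r^2+R^2-2Rr\cos\gamma\right]^{3/2}}, \qquad (2.217)$$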


and plugging it into Eq. (210), we see that the integration is again easy only for the field on the symmetry axis (where r = znz, and γ = θ’), giving:

$$\phi = \frac{V}{2}\left[1-\frac{z^2-R^2}{z\left(z^2+R^2\right)^{1/2}}\right], \quad \text{for } z \geq R. \qquad (2.218)$$

For z → R, this relation yields φ → V/2 (as it should), while for z/R → ∞,

$$\phi \to \frac{3VR^2}{4z^2}. \qquad (2.219)$$


As will be discussed in the next chapter, such a field is typical for an electric dipole. 
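The on-axis result (218) can be reproduced by integrating Eq. (210) numerically with the derivative (217). The sketch below assumes V = R = 1 by default, performs the trivial azimuthal integration analytically (the factor 2π), and applies Simpson’s rule separately to each hemisphere because the surface potential jumps at θ’ = π/2; the names are illustrative.

```python
import math


def dG_dn(z, theta_p, R):
    """Eq. (2.217) on the symmetry axis, where r = z and gamma = theta'."""
    return (z**2 - R**2) / (R * (z**2 + R**2 - 2 * R * z * math.cos(theta_p)) ** 1.5)


def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def cut_sphere_axis(z, V=1.0, R=1.0):
    """Eq. (2.210) on the axis: the upper half at +V/2, the lower half at -V/2."""

    def integrand(t):
        # dG/dn' times the area element R**2 sin(theta') d(theta'), azimuth integrated
        return dG_dn(z, t, R) * R**2 * math.sin(t) * 2 * math.pi

    upper = simpson(integrand, 0.0, math.pi / 2)
    lower = simpson(integrand, math.pi / 2, math.pi)
    return (V / 2 * upper - V / 2 * lower) / (4 * math.pi)


def cut_sphere_axis_exact(z, V=1.0, R=1.0):
    """Eq. (2.218)."""
    return V / 2 * (1 - (z**2 - R**2) / (z * math.sqrt(z**2 + R**2)))


for z in (1.5, 3.0, 100.0):
    print(z, cut_sphere_axis(z), cut_sphere_axis_exact(z))
```

At z = 100R, both values also agree with the asymptote (219), 3VR²/4z², to better than 0.1%.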


2.11. Numerical methods 


Despite the richness of analytical methods, for many boundary problems (especially in 
geometries without a high degree of symmetry), the numerical approach is the only way to the solution. 
Though software packages offering their automatic numerical solution are abundant nowadays,⁷⁵ it is important for every educated physicist to understand “what is under the hood”, at least because most universal programs exhibit mediocre performance in comparison with custom codes written for particular problems, and sometimes do not converge at all, especially for fast-changing (say, exponential) functions. The very brief discussion presented here⁷⁶ is a (hopefully, useful) fast glance under the hood, though it is certainly insufficient for professional numerical research work.


The simplest of the numerical approaches to the solution of partial differential equations, such as the Poisson or the Laplace equations (1.41)-(1.42), is the finite-difference method,⁷⁷ in which the sought continuous scalar function f(r), such as the potential φ(r), is represented by its values at discrete points of a rectangular grid (frequently called a mesh) of the corresponding dimensionality (see Fig. 33).


Fig. 2.33. The general idea of the finite-difference method in (a) one, (b) two, and (c) three dimensions.


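Each partial second derivative of the function is approximated by the formula that readily follows from linear approximations of the function f and then of its partial derivatives (see Fig. 33a):

$$\frac{\partial^2 f}{\partial r_i^2} \equiv \frac{\partial}{\partial r_i}\left(\frac{\partial f}{\partial r_i}\right) \approx \frac{1}{h}\left[\left.\frac{\partial f}{\partial r_i}\right|_{r_i+h/2} - \left.\frac{\partial f}{\partial r_i}\right|_{r_i-h/2}\right] \approx \frac{1}{h}\left[\frac{f_+ - f}{h} - \frac{f - f_-}{h}\right] = \frac{f_+ + f_- - 2f}{h^2}, \qquad (2.220)$$

where f± ≡ f(ri ± h). (The error of this approximation is of the order of h²∂⁴f/∂ri⁴.) As a result, the action of a 2D Laplace operator on the function f may be approximated as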


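$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \approx \frac{f_{+0} + f_{-0} + f_{0+} + f_{0-} - 4f}{h^2}, \qquad (2.221)$$

and of the 3D operator, as

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \approx \frac{f_{+00} + f_{-00} + f_{0+0} + f_{0-0} + f_{00+} + f_{00-} - 6f}{h^2}. \qquad (2.222)$$

(The notation used in Eqs. (221)-(222) should be clear from Figs. 33b and 33c, respectively.)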
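Eqs. (220)-(222) translate directly into code. The sketch below (function names are illustrative) checks that the error of the three-point formula (220) indeed scales as h², so that halving h reduces it about fourfold, and that the stencils (221)-(222) are exact for quadratic functions, whose fourth derivatives vanish.

```python
import math


def second_derivative(f, x, h):
    """Three-point formula (2.220): (f(x + h) + f(x - h) - 2 f(x)) / h**2."""
    return (f(x + h) + f(x - h) - 2 * f(x)) / h**2


def laplacian_2d(f, x, y, h):
    """Five-point stencil (2.221)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 * f(x, y)) / h**2


def laplacian_3d(f, x, y, z, h):
    """Seven-point stencil (2.222)."""
    neighbors = (f(x + h, y, z) + f(x - h, y, z) + f(x, y + h, z)
                 + f(x, y - h, z) + f(x, y, z + h) + f(x, y, z - h))
    return (neighbors - 6 * f(x, y, z)) / h**2


# Error scaling for f = sin(x), whose exact second derivative is -sin(x)
errors = [abs(second_derivative(math.sin, 1.0, h) + math.sin(1.0)) for h in (0.1, 0.05)]
print(errors, errors[0] / errors[1])  # the ratio should be close to 4
```

For example, applied to the harmonic function x² − y², the five-point stencil returns zero (up to rounding) for any h.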


75 See, for example, MA Secs. 16 (iii) and (iv). 

76 It is very similar to the one given in CM Sec. 8.5 and is reproduced here for the reader’s convenience, illustrated with examples from this (EM) course.

77 For more details see, e.g., R. Leveque, Finite Difference Methods for Ordinary and Partial Differential 
Equations, SIAM, 2007. 
As a simple example, let us use this scheme to find the electrostatic potential distribution inside a cylindrical box with conducting walls and a square cross-section a×a, using an extremely coarse mesh with step h = a/2 (Fig. 34). In this case, our function, the electrostatic potential φ(x, y), equals zero at the side and bottom walls, and V0 at the top lid, so that, according to Eq. (221), the 2D Laplace equation may be approximated as

$$\frac{0 + 0 + V_0 + 0 - 4\phi}{(a/2)^2} = 0. \qquad (2.223)$$

The resulting value for the potential in the center of the box is φ = V0/4.


Fig. 2.34. Numerically solving an internal 2D boundary problem for a conducting, cylindrical box with a square cross-section, using a very coarse mesh (with h = a/2).


Surprisingly, this is the exact value! This may be proved either by solving this problem by the variable separation method, just as has been done for a similar 3D problem in Sec. 5, or just from the following Green’s-function argument. If all four walls of our 2D volume were biased to the voltage V0, there would be no electric field in it at all, so that the middle-point potential would be equal to V0 as well. However, from the point of view of Eq. (210) with no bulk charge, ρ(r) = 0, this result may be legitimately viewed as the linear superposition of the four contributions of the potentials φ = V0 of each wall. Since for this symmetric geometry the corresponding geometrical factors are equal, the contribution of one wall, with φ = 0 on all other walls (as in our current problem), has to equal V0/4.
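The finite-difference scheme is easy to try on meshes finer than h = a/2. The minimal sketch below uses plain Gauss-Seidel relaxation (one of many possible iterative solvers, and not the most efficient one; the function name and tolerances are illustrative) to solve the discretized Laplace equation (221) for the box of Fig. 34, and confirms that the central value stays at V0/4 as the mesh is refined. The four-wall symmetry argument above carries over to the discrete equations, which are also linear and share the box’s four-fold symmetry when the number of mesh steps n is even.

```python
def solve_square_box(n, V0=1.0, tol=1e-12, max_sweeps=100_000):
    """Gauss-Seidel relaxation of the five-point Laplace equation (2.221).

    The box cross-section is covered by an (n+1) x (n+1) grid of nodes (h = a/n);
    phi[j][i] is the potential at x = i*h, y = j*h. The top lid (j = n) is at V0,
    the other three walls are grounded.
    """
    phi = [[0.0] * (n + 1) for _ in range(n + 1)]
    phi[n] = [V0] * (n + 1)  # top lid
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(1, n):
            for i in range(1, n):
                new = 0.25 * (phi[j][i + 1] + phi[j][i - 1] + phi[j + 1][i] + phi[j - 1][i])
                max_change = max(max_change, abs(new - phi[j][i]))
                phi[j][i] = new
        if max_change < tol:
            break
    return phi


for n in (2, 4, 16):
    phi = solve_square_box(n)
    print(n, phi[n // 2][n // 2])  # the central value, expected to be V0/4
```

With n = 2, the single interior node reproduces Eq. (223) after one sweep; finer meshes change the potential elsewhere in the box but not at its center.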


For a similar 3D problem (a cubic box), with a similar 3D mesh, Eq. (222) yields

$$\frac{0 + 0 + V_0 + 0 + 0 + 0 - 6\phi}{(a/2)^2} = 0, \qquad (2.224)$$

so that φ = V0/6. Using the same Green’s-function argument, now for the six walls of the cube, we see that this result is also exact! (This fact also follows from our variable-separation result expressed by Eqs. (95) and (99) with a = b = c.)

Though such exact results should be considered a happy coincidence rather than a general law, they still show that numerical methods, even with relatively crude meshes, may be more computationally efficient than some “analytical” approaches, like the variable separation method with its infinite-sum results that, in most cases, require computers anyway, at least for the result’s comprehension and analysis.
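The same check can be made for the cubic box and the seven-point stencil (222). Here is a sketch along the same lines, again assuming Gauss-Seidel relaxation and an illustrative function name, with the top face at V0 and the other five faces grounded:

```python
def solve_cubic_box(n, V0=1.0, tol=1e-12, max_sweeps=100_000):
    """Gauss-Seidel relaxation of the seven-point Laplace equation (2.222).

    phi[k][j][i] is the potential at (x, y, z) = (i, j, k) * h with h = a/n;
    the top face (k = n) is at V0, and the other five faces are grounded.
    """
    phi = [[[0.0] * (n + 1) for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(n + 1):
        phi[n][j] = [V0] * (n + 1)  # top face
    for _ in range(max_sweeps):
        max_change = 0.0
        for k in range(1, n):
            for j in range(1, n):
                for i in range(1, n):
                    new = (phi[k][j][i + 1] + phi[k][j][i - 1]
                           + phi[k][j + 1][i] + phi[k][j - 1][i]
                           + phi[k + 1][j][i] + phi[k - 1][j][i]) / 6
                    max_change = max(max_change, abs(new - phi[k][j][i]))
                    phi[k][j][i] = new
        if max_change < tol:
            break
    return phi


for n in (2, 8):
    phi = solve_cubic_box(n)
    print(n, phi[n // 2][n // 2][n // 2])  # expected to be V0/6
```

As in 2D, the central value does not depend on the (even) number of mesh steps, in agreement with the six-wall superposition argument.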


A more powerful (but also much more complex) approach is the finite-element method in which the discrete point mesh, typically with triangular cells, is (automatically) generated in accordance with the system geometry.⁷⁸ Such mesh generators provide higher point concentration near sharp convex parts of conductor surfaces, where the field concentrates and hence the potential changes faster, thus ensuring a better accuracy-to-speed tradeoff than the finite-difference methods on a uniform grid. The price to pay for this improvement is the algorithm’s complexity, which makes its adjustments much harder. Unfortunately, in this series, I do not have time to go into the details of that method and have to refer the reader to the special literature on this subject.⁷⁹

78 See, e.g., CM Fig. 8.14.


2.12. Exercise problems 


2.1. Calculate the force (per unit area) exerted on a conducting surface by an external electric 
field normal to it. Compare the result with the electric field’s definition given by Eq. (1.6), and 
comment. 




2.2. Electric charges QA and QB have been placed on two conducting concentric spherical shells (see the figure on the right). What is the full charge of each of the surfaces S1-S4?


2.3. Calculate the mutual capacitance between the terminals of the lumped-capacitor circuit shown in the figure on the right. Analyze and interpret the result for major particular cases.

2.4. Calculate the mutual capacitance between the terminals of the semi-infinite lumped-capacitor circuit shown in the figure on the right, and the law of the applied voltage’s decay along the system. Analyze and interpret the result.
2.5. A system of two thin conducting plates is located over a ground plane as shown in the figure on the right, where A1 and A2 are the areas of the indicated plate parts, while d’ and d” are the distances between them. Neglecting the fringe effects, calculate:

(i) the effective capacitance of each plate, and
(ii) their mutual capacitance.


79 See, e.g., either C. Johnson, Numerical Solution of Partial Differential Equations by the Finite Element 
Method, Dover, 2009, or T. Hughes, The Finite Element Method, Dover, 2000. 
2.6. A wide and thin film, carrying a uniform electric charge density σ, is placed inside a plane capacitor whose plates are connected with a wire (see the figure on the right), and were initially electroneutral. Neglecting the fringe effects, calculate the surface charges of the plates, and the net force exerted on the film (per unit area).


2.7. A relatively small conductor (possibly, of an 
irregular shape) with self-capacitance C is located at distance r 
from the center of a conducting sphere of radius R — see the 
figure on the right. In the first approximation in C, find the 
reciprocal capacitance matrix of the system. Use the matrix to 
calculate its potential energy and the force of the conductor 
interaction for two cases: 


(i) the conductor charges Q are equal, and
(ii) the conductors are connected with a thin wire, so that their potentials φ are equal.


2.8. Use the Gauss law to calculate the mutual capacitance of the 
following two-electrode systems, with the cross-section shown in Fig. 7 
(reproduced on the right): 


(i) a conducting sphere inside a concentric spherical cavity inside 
another conductor, and 

(ii) a long conducting cylinder inside a coaxial cavity inside another 
conductor, i.e. a coaxial cable. (In this case, we speak about the capacitance 
per unit length). 


Compare the results with those obtained in Sec. 2.2 using the Laplace 
equation. 


2.9. Calculate the electrostatic potential distribution around two barely separated conductors in the form of coaxial, round cones (see the figure on the right), with voltage V between them. Compare the result with that of a similar 2D problem, with the cones replaced by plane-face wedges. Can you calculate the mutual capacitances between the conductors in these systems? If not, can you estimate them?


2.10. Calculate the mutual capacitance between two rectangular, plane electrodes of area A = a×l, with a small angle between them (see the figure on the right).
2.11. Using the results for a single thin round disk, obtained in Sec. 4, consider a system of two such disks at a small distance d ≪ R from each other (see the figure on the right). In particular, calculate:

(i) the reciprocal capacitance matrix of the system,
(ii) the mutual capacitance between the disks,
(iii) the partial capacitance of one disk, and
(iv) the effective capacitance of one disk

(all in the first nonvanishing approximation in d/R ≪ 1). Compare the results (ii)-(iv) and interpret their similarities and differences.


2.12.* Calculate the mutual capacitance (per unit length) between two cylindrical conductors forming a system with the cross-section shown in the figure on the right, in the limit t ≪ w ≪ R.

Hint: You may like to use the elliptic coordinates mentioned in Sec. 4. They are defined by the following equality:

$$x + iy = c\cosh(u + iv), \qquad (*)$$

where c is a constant.


2.13. Calculate the mutual capacitance (per unit length) between two similar, long, parallel wires, each with a round cross-section of radius R, whose axes are separated by distance d > 2R. Explore and interpret the result in the limits R → 0 and R → d/2.

Hint: You may like to use the 2D orthogonal bipolar coordinates {τ, σ} defined by the following relations with the Cartesian coordinates {x, y}:

$$x = a\,\frac{\sinh\tau}{\cosh\tau - \cos\sigma}, \quad y = a\,\frac{\sin\sigma}{\cosh\tau - \cos\sigma}, \quad \text{with } -\infty < \tau < +\infty,\ -\pi < \sigma < +\pi.$$

In these coordinates, the Laplace operator is

$$\nabla^2 = \frac{1}{a^2}\left(\cosh\tau - \cos\sigma\right)^2\left(\frac{\partial^2}{\partial\tau^2} + \frac{\partial^2}{\partial\sigma^2}\right).$$


2.14. Formulate 2D electrostatic problems that can be solved using each of the following analytic functions of the complex variable z = x + iy:

(iii) w = z + 1/z,

and solve these problems.


2.15. On each side of a cylindrical volume with a rectangular cross-section a×b, with no electric charges inside it, the electric field’s component normal to the side’s plane is constant, and opposite to that on the opposite side. Calculate the distribution of the electric potential inside the volume, provided that the magnitude of the normal components on the sides of length b equals E. Suggest a practicable method to implement such a potential distribution.


2.16. Complete the solution of the problem shown in Fig. 12, by calculating the distribution of the surface charge of the semi-planes. Can you calculate the mutual capacitance between the semi-planes (per unit length of the system)? If not, can you estimate it?




2.17. A straight, long, thin, round-cylindrical metallic pipe has been cut, along its axis, into two equal parts (see the figure on the right).

(i) Use the conformal mapping method to calculate the distributions of the electrostatic potential created by voltage V applied between the two parts, both outside and inside the pipe, and of the surface charge.

(ii) Calculate the mutual capacitance between the pipe’s halves (per unit length), taking into account a small width 2t ≪ R of the cut.

Hints: In Task (i), you may like to use the complex function

$$w = \ln\frac{R+z}{R-z},$$

while in Task (ii), you may use the solution of the previous problem.


2.18. A gap of constant width w between two grounded conducting semi-spaces is closed, from one side, with a conducting plunger biased with voltage V, so that the cross-section of the system looks like the figure on the right shows. Use the variable separation method to calculate the distribution of the electrostatic potential within the gap.
2.19. Use the variable separation method to calculate the electrostatic potential’s distribution inside a very long thin-wall metallic box with a square cross-section, cut and voltage-biased as shown in the figure on the right. (The cut’s width is negligibly small.)


2.20. Solve Problem 17(i) using the variable separation method, and compare the results.


2.21. Use the variable separation method to calculate the potential distribution above the plane 
surface of a conductor, with a strip of width w separated by very thin cuts, and biased with voltage V — 
see the figure below. 


2.22. The previous problem is now modified: the cut-out and voltage-biased part of the 
conducting plane is now not a strip, but a square with side w. Calculate the potential distribution above 
the conductor’s surface. 
2.23. Each electrode of a large plane capacitor is cut into long strips of equal width w, with very narrow gaps between them. These strips are kept at alternating potentials, as shown in the figure on the right. Use the variable separation method to calculate the electrostatic potential distribution in space, and explore the limit w ≪ d.


2.24. Complete the cylinder problem started in Sec. 7 (see Fig. 17), for the cases when the top lid’s voltage is fixed as follows:

(i) V = V0J1(ξ11ρ/R) sin φ, where ξ11 ≈ 3.832 is the first root of the Bessel function J1(ξ);

(ii) V = V0 = const.

For both cases, calculate the electric field at the centers of the lower and upper lids. (For Task (ii), an answer including series and/or integrals is acceptable.)


2.25. For an infinitely long system sketched in Fig. 21:

(i) calculate and sketch the distribution of the electrostatic potential inside the system for various values of the ratio R/h, and
(ii) simplify the results for the limit R/h → 0.


2.26. A long round cylindrical conducting pipe is split, with a 
very narrow cut normal to its axis, into two parts that are voltage- 
biased as the figure on the right shows. Use two different approaches 
to calculate the force exerted by the resulting field upon a charged 
particle flying along the pipe close to its axis. Can the system work as 
an electrostatic lens? 


2.27. Use the variable separation method to find the potential distribution inside and outside of a thin spherical shell of radius R, with a fixed potential distribution: $\phi(R,\theta,\varphi) = V_0\sin\theta\cos\varphi$.


2.28. A thin spherical shell carries an electric charge with areal density σ = σ0 cos θ. Calculate the spatial distribution of the electrostatic potential and the electric field, both inside and outside the shell.


2.29. Use the variable separation method to solve the problem already addressed in Sec. 10: 
calculate the potential distribution both inside and outside of a thin spherical shell of radius R, separated 
with a very thin cut, along the central plane z = 0, into two halves, with voltage V applied between them 
— see Fig. 32. Analyze the solution; compare the field at the axis z, for z > R, with Eq. (218). 


Hint: You may like to use the following integral of a Legendre polynomial with odd index l = 1, 3, 5, … = 2n − 1:⁸⁰

$$\int_0^1 P_l(\xi)\,d\xi = (-1)^{n+1}\,\frac{(2n-3)!!}{(2n)!!}, \quad \text{for } l = 2n-1,\ n = 1, 2, 3, \ldots \quad \left[\text{with } (-1)!! \equiv 1\right].$$

80 As a reminder, the double factorial (also called “semifactorial”) operator (!!) is similar to the usual factorial operator (!), but with the product limited to numbers of the same parity as its argument: in our particular case, the odd numbers in the numerator, and the even numbers in the denominator.
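The integral quoted in the hint is easy to verify in exact rational arithmetic. The sketch below builds the coefficients of Pl from the standard Bonnet recurrence (l + 1)Pl+1 = (2l + 1)ξPl − lPl−1 (taken here as a known property of Legendre polynomials), integrates them from 0 to 1, and compares the result with the right-hand side of the hint; all names are illustrative.

```python
from fractions import Fraction


def legendre_coeffs(l):
    """Coefficients of P_l(xi), lowest power first, from Bonnet's recurrence
    (n + 1) P_{n+1} = (2n + 1) xi P_n - n P_{n-1}, in exact arithmetic."""
    p_prev, p = [Fraction(1)], [Fraction(0), Fraction(1)]  # P_0 and P_1
    if l == 0:
        return p_prev
    for n in range(1, l):
        nxt = [Fraction(0)] * (n + 2)
        for k, c in enumerate(p):  # (2n + 1)/(n + 1) * xi * P_n
            nxt[k + 1] += Fraction(2 * n + 1, n + 1) * c
        for k, c in enumerate(p_prev):  # - n/(n + 1) * P_{n-1}
            nxt[k] -= Fraction(n, n + 1) * c
        p_prev, p = p, nxt
    return p


def integral_0_to_1(coeffs):
    """Integral of the polynomial sum_k c_k xi**k over 0 <= xi <= 1."""
    return sum(c / (k + 1) for k, c in enumerate(coeffs))


def double_factorial(m):
    """m!!, with the convention (-1)!! = 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def hint_rhs(n):
    """Right-hand side of the hint for l = 2n - 1."""
    return Fraction((-1) ** (n + 1) * double_factorial(2 * n - 3), double_factorial(2 * n))


for n in range(1, 6):
    print(2 * n - 1, integral_0_to_1(legendre_coeffs(2 * n - 1)), hint_rhs(n))
```

Because the arithmetic is exact, the two columns coincide as fractions (1/2, −1/8, 1/16, …) rather than just approximately.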


2.30. Calculate, up to the terms O(1/r³), the long-range electric field induced by a split and voltage-biased conducting sphere, similar to that discussed in Sec. 10 (see Fig. 32) and in the previous problem, but with the cut’s plane at an arbitrary distance d < R from the center (see the figure on the right).


2.31. Calculate the field distribution in the simple electrostatic lens that was the subject of 
Problem 1.9. 


Hint: You may like to use the fact that a general axially-symmetric solution of the Laplace equation in the oblate ellipsoidal coordinates (see Eqs. (2.59)-(2.60) of the lecture notes) may be represented in the following variable-separation form:

$$\phi = \sum_{l=0}^{\infty}\left[p_l P_l(i\sinh\alpha) + q_l Q_l(i\sinh\alpha)\right]P_l(\cos\beta),$$

where Pl are the Legendre polynomials (2.169), which are sometimes called the Legendre functions of the first kind, while Ql are the Legendre functions of the second kind (briefly mentioned in Sec. 2.8) that may be defined by the following recurrence relations:

$$Q_0(\xi) = \frac{1}{2}\ln\frac{1+\xi}{1-\xi}, \qquad Q_1(\xi) = P_1(\xi)\,Q_0(\xi) - 1, \qquad Q_{n+1}(\xi) = \frac{2n+1}{n+1}\,\xi\,Q_n(\xi) - \frac{n}{n+1}\,Q_{n-1}(\xi).$$


2.32. A small conductor (in this context, usually called a single-electron island) is placed between two conducting electrodes, with voltage V applied between them. The gap between the island and one of the electrodes is so narrow that electrons may tunnel quantum-mechanically through this “junction” (see the figure on the right). Neglecting thermal excitations, calculate the equilibrium charge of the island as a function of V.

Hint: To solve this problem, you do not need to know much about the quantum-mechanical tunneling between conductors, besides that such tunneling of an electron, followed by energy relaxation of the resulting excitations, may be considered as a single inelastic (energy-dissipating) event.⁸¹ At negligible thermal excitations, such an event takes place only if it decreases the total potential energy of the system.


81 Strictly speaking, this statement, implying negligible quantum-mechanical coherence of the tunneling events, is
correct only if the junction transparency is sufficiently low, so that its effective electric resistance is much higher
than the fundamental quantum unit of resistance, $R_Q = \pi\hbar/2e^2 \approx 6.5$ kΩ (see, e.g., QM Sec. 3.2). However, this
condition is satisfied in most experimental tunnel junctions.
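As a quick arithmetic check of the value quoted in footnote 81, the following sketch evaluates $R_Q = \pi\hbar/2e^2 = h/4e^2$. It uses the exact SI values of $h$ and $e$.

```python
import math

h = 6.62607015e-34       # Planck constant, J*s (exact SI value)
e = 1.602176634e-19      # elementary charge, C (exact SI value)
hbar = h / (2 * math.pi)

R_Q = math.pi * hbar / (2 * e**2)      # identical to h / (4 e^2), in ohms
print(f"R_Q = {R_Q / 1e3:.2f} kOhm")   # prints R_Q = 6.45 kOhm
```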


2.33. The system discussed in the previous problem is now 
generalized as the figure on the right shows. If the voltage V’ applied between 
the two bottom electrodes is sufficiently large, electrons can successively 
tunnel through two junctions of this system (called the single-electron 
transistor), carrying dc current between these electrodes. Neglecting thermal 
excitations, calculate the region of voltages V and V’ where such a current is 
fully suppressed (Coulomb-blocked).


2.34. Use the charge image method to calculate the full surface charges induced in the plates of a
very broad, voltage-unbiased plane capacitor of thickness $D$ by a point charge $q$ separated from one of
the electrodes by distance $d$. Suggest at least one alternative method to obtain the same result.


2.35. Use the charge image method to calculate the potential energy of the electrostatic 
interaction between a point charge placed in the center of a spherical cavity that had been carved inside 
a grounded conductor, and the cavity’s walls. Looking at the result, could it be obtained in a simpler 
way (or ways)? 


2.36. Use the method of charge images to find the Green's
function of the system shown in the figure on the right, where the
bulge on the conducting plane has the shape of a semi-sphere of
radius $R$.


2.37. Use the spherical inversion expressed by Eq. (198) to develop an iterative method for a
more and more precise calculation of the mutual capacitance between two similar metallic spheres of
radius $R$, with their centers separated by distance $d > 2R$.


2.38.* A conducting sphere of radius $R_1$, carrying electric charge $Q$, is
placed inside a spherical cavity of radius $R_2 > R_1$, carved inside another metal.
Calculate the electric force exerted on the sphere if its center is displaced by a
small distance $\delta \ll R_1, R_2 - R_1$ from that of the cavity (see the figure on the
right).


2.39. Within the simple models of the electric field screening in conductors, discussed in Sec.
2.1, analyze the partial screening of the electric field of a point charge $q$ by a plane conducting film of
constant thickness $t \ll \lambda$, where $\lambda$ is (depending on charge carrier statistics) either the Debye or the
Thomas-Fermi screening length (see, respectively, Eqs. (8) or (10)). Assume that the distance $d$ between
the charge and the film is much larger than $t$.


2.40. It is sometimes convenient to use representations of the Green's functions as series of the
Legendre polynomials. Derive such a representation for the function expressed by Eq. (204).
2.41. Use the result of the previous problem to confirm the solution (197)-(198) of the problem
illustrated in Fig. 29: a grounded conducting sphere of radius $R$, and a point charge $q$ located at distance
$d > R$ from its center.


2.42. Suggest a convenient definition of the Green’s function for 2D electrostatic problems, and 
calculate it for: 


(i) the unlimited free space, and 
(ii) the free space above a conducting plane. 


Use the latter result to re-solve Problem 21. 


2.43. A conducting plane located at y = 0 is separated into two parts with a very narrow, straight 
cut along the z-axis, and voltage V is applied between the resulting half-planes, as shown in the figure 
below. Use the Green’s function method to find the distribution of the electrostatic potential and the 
electric field everywhere in the space. Compare the result with Eq. (83). In hindsight, could the problem 
be solved in an even simpler way? 




2.44. Use the last result of Problem 42 and one of the conformal mappings discussed in Sec. 4 to 
find one more solution of Problem 18. 


2.45. Calculate the 2D Green’s functions for the free spaces: 


(i) outside a round conducting cylinder, and 
(ii) inside a round cylindrical hole in a conductor. 


2.46. Solve Problem 17(i) using the Green's function method.


2.47. Solve the 2D boundary problem that was discussed in Sec. 11 (Fig. 34), using: 


(i) the finite difference method with a finer square mesh: h = a/3, and 
(ii) the variable separation method. 


Compare the results at the mesh points, and comment. 
Chapter 3. Dipoles and Dielectrics


In contrast to conductors, the motion of charges in dielectrics is restricted to the atom/molecule 
interiors, so that the electric polarization of these materials by an external field takes a different form. 
This issue is the main subject of this chapter, but in preparation for its analysis, we have to start with a 
general discussion of the electric field induced by spatially-restricted systems of charges. 


3.1. Electric dipole 


Let us consider a localized system of charges, of a linear scale a, and derive a simple 
approximate expression for the electrostatic field induced by the system at a distant point r. For that, let 
us select a reference frame with the origin either somewhere inside the system, or at a distance of the 
order of a from it (Fig. 1). 


Fig. 3.1. Deriving the approximate expression
for the electrostatic field of a localized system
of charges at a distant point ($r \gg r' \sim a$).


Then positions of all charges of the system satisfy the following condition:

$$r' \ll r. \qquad (3.1)$$


Using this condition, we can expand the general expression (1.38) for the electrostatic potential $\phi(\mathbf r)$ of
the system into the Taylor series in the small parameter $r'$. For any function of the type $f(\mathbf r - \mathbf r')$, the expansion
may be represented as¹


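$$f(\mathbf r - \mathbf r') = f(\mathbf r) - \sum_{j=1}^{3} r'_j\,\frac{\partial f}{\partial r_j}(\mathbf r) + \frac{1}{2!}\sum_{j,j'=1}^{3} r'_j r'_{j'}\,\frac{\partial^2 f}{\partial r_j\,\partial r_{j'}}(\mathbf r) - \ldots \qquad (3.2)$$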


Applying this formula to the fraction 1/|r — r’| in Eq. (1.38) (i.e. essentially to the free-space Green’s 
function), we get the so-called multipole expansion of the electrostatic potential: 


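$$\phi(\mathbf r) = \frac{1}{4\pi\epsilon_0}\left(\frac{Q}{r} + \frac{1}{r^3}\sum_{j=1}^{3} r_j p_j + \frac{1}{2r^5}\sum_{j,j'=1}^{3} r_j r_{j'} Q_{jj'} + \ldots\right), \qquad (3.3)$$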
whose r-independent parameters are defined as follows: 


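$$Q = \int \rho(\mathbf r')\,d^3r', \qquad p_j = \int \rho(\mathbf r')\,r'_j\,d^3r', \qquad Q_{jj'} = \int \rho(\mathbf r')\left(3r'_j r'_{j'} - r'^2\delta_{jj'}\right)d^3r'. \qquad (3.4)$$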


1 See, e.g., MA Eq. (2.11b).
(Indeed, the two leading terms of the expansion (2) may be rewritten in the vector form $f(\mathbf r) - \mathbf r' \cdot \nabla f(\mathbf r)$,
and the gradient of such a spherically-symmetric function as $f(r) = 1/r$ is just $\mathbf n_r\,df/dr$, so that


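$$\frac{1}{|\mathbf r - \mathbf r'|} \approx \frac{1}{r} - \mathbf r' \cdot \mathbf n_r\,\frac{d}{dr}\left(\frac{1}{r}\right) = \frac{1}{r} + \frac{\mathbf r' \cdot \mathbf r}{r^3}, \qquad (3.5)$$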


immediately giving the first two terms of Eq. (3). The proof of the third, quadrupole term in Eq. (3) is
similar but a bit longer, and is left for the reader's exercise.)
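The expansion (3)-(4) is easy to test numerically. The sketch below uses Python with NumPy. It computes $Q$, $p_j$, and $Q_{jj'}$ for a few point charges and compares the truncated series with the direct Coulomb sum. It also confirms that, for a neutral pair of charges, $\mathbf p$ does not depend on the choice of origin. The charge values, their positions, and the function names are arbitrary illustrative assumptions.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def multipole_moments(charges, positions):
    """Q, p_j, and Q_jj' of Eq. (4) for rho(r') = sum_k q_k delta(r' - r_k)."""
    q = np.asarray(charges, dtype=float)
    r = np.asarray(positions, dtype=float)            # shape (N, 3)
    Q = q.sum()
    p = q @ r                                          # sum_k q_k r_k
    r2 = np.einsum("ki,ki->k", r, r)                   # |r_k|^2
    quad = 3.0 * np.einsum("k,ki,kj->ij", q, r, r) - np.eye(3) * (q @ r2)
    return Q, p, quad


def phi_multipole(rvec, Q, p, quad):
    """Eq. (3), truncated after the quadrupole term."""
    rvec = np.asarray(rvec, dtype=float)
    r = np.linalg.norm(rvec)
    series = Q / r + (rvec @ p) / r**3 + 0.5 * (rvec @ quad @ rvec) / r**5
    return series / (4 * np.pi * EPS0)


def phi_exact(rvec, charges, positions):
    """Direct Coulomb sum over the point charges."""
    d = np.linalg.norm(np.asarray(rvec, dtype=float) - np.asarray(positions, dtype=float), axis=1)
    return np.sum(np.asarray(charges, dtype=float) / d) / (4 * np.pi * EPS0)


# Hypothetical cluster of three charges (arbitrary units), linear size ~0.5
charges = [1.0, 2.0, -0.5]
positions = [[0.3, 0.0, 0.0], [0.0, -0.2, 0.1], [0.1, 0.4, -0.3]]
moments = multipole_moments(charges, positions)

direction = np.array([1.0, 2.0, 2.0]) / 3.0
for dist in (5.0, 10.0, 20.0):
    exact = phi_exact(dist * direction, charges, positions)
    approx = phi_multipole(dist * direction, *moments)
    print(f"r = {dist:5.1f}: relative error {abs(approx - exact) / abs(exact):.2e}")

# Neutral pair: shifting the origin leaves p unchanged
print(multipole_moments([1.0, -1.0], [[0, 0, 0.5], [0, 0, -0.5]])[1])
print(multipole_moments([1.0, -1.0], [[3, -2, 1.5], [3, -2, 0.5]])[1])
```

With these values, the relative error should shrink roughly eightfold each time the distance is doubled. That is what one expects if the first neglected (octupole) term dominates the error.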


Evidently, the scalar parameter $Q$ in Eqs. (3)-(4) is just the total charge of the system. The
constants $p_j$ may be considered as Cartesian components of the following vector:

$$\mathbf p = \int \rho(\mathbf r')\,\mathbf r'\,d^3r', \qquad (3.6)$$

called the system's electric dipole moment, and $Q_{jj'}$ are Cartesian elements of a tensor, the system's electric
quadrupole moment. If $Q \neq 0$, all higher terms on the right-hand side of Eq. (3), at large distances (1),
are just small corrections to the first one, and in many cases may be ignored. However, the net charge of
many systems is exactly zero, the most important examples being neutral atoms and molecules. For such
neutral systems, the second (dipole) term in Eq. (3) is, most frequently, the leading one. Such systems are
called electric dipoles. Due to their importance, let us rewrite the expression for the dipole term in three
other, mathematically equivalent forms:


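$$\phi_p(\mathbf r) = \frac{1}{4\pi\epsilon_0}\frac{\mathbf p \cdot \mathbf r}{r^3} = \frac{1}{4\pi\epsilon_0}\frac{p\cos\theta}{r^2} = \frac{1}{4\pi\epsilon_0}\frac{p\,z}{(x^2 + y^2 + z^2)^{3/2}}, \qquad (3.7)$$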


that are more convenient for some applications. Here $\theta$ is the angle between the vectors $\mathbf p$ and $\mathbf r$, and in
the last (Cartesian) representation, the z-axis is directed along the vector $\mathbf p$. Fig. 2a shows equipotential
surfaces of the dipole field, or rather their cross-sections by any plane in which the vector $\mathbf p$ resides.




Fig. 3.2. (a) The equipotential surfaces and (b) the electric field lines of a dipole. (Panel (b) 
adapted from http://en.wikipedia.org/wiki/Dipole under the GNU Free Documentation License.) 
The simplest example of a system whose field, at large distances, approaches the dipole field (7),
is two equal but opposite point charges (“poles”), $+q$ and $-q$, with the radius-vectors, respectively, $\mathbf r_+$
and $\mathbf r_-$:


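$$\rho(\mathbf r) = (+q)\,\delta(\mathbf r - \mathbf r_+) + (-q)\,\delta(\mathbf r - \mathbf r_-). \qquad (3.8)$$

For this system (sometimes called the physical dipole), Eq. (4) yields

$$\mathbf p = (+q)\,\mathbf r_+ + (-q)\,\mathbf r_- = q(\mathbf r_+ - \mathbf r_-) \equiv q\mathbf a, \qquad (3.9)$$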


where $\mathbf a$ is the vector connecting the points $\mathbf r_-$ and $\mathbf r_+$. Note that in this case (and indeed for all systems
with $Q = 0$), the dipole moment does not depend on the choice of the reference frame's origin.


A less trivial example of a dipole is a conducting sphere of radius $R$ in a uniform external electric
field $\mathbf E_0$. As a reminder, its field was calculated in Sec. 2.8, and the result is expressed by Eq. (2.176).
The first term in the parentheses of that relation describes just the external field (2.173), so that the field
of the sphere itself (i.e. that of the surface charge induced by $\mathbf E_0$) is given by the second term:

$$\phi_{\text{ind}} = \frac{E_0 R^3}{r^2}\cos\theta. \qquad (3.10)$$


Comparing this expression with the second form of Eq. (7), we see that the sphere has an induced dipole
moment

$$p = 4\pi\epsilon_0 E_0 R^3. \qquad (3.11)$$

This is an interesting example of a virtually pure dipole field: at all points outside the sphere ($r > R$), the
field has neither a quadrupole moment nor any higher moments.


Other examples of dipole fields are given by two more systems discussed in Chapter 2 — see Eqs. 
(2.215) and (2.219). Those systems, however, do have higher-order multipole moments, so for them, Eq. 
(7) gives only the long-distance approximation. 


Now returning to the general properties of the dipole field (7), let us calculate its major 
characteristics. First of all, we may use Eq. (7) to calculate the electric field of a dipole: 


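$$\mathbf E_p = -\nabla\phi_p = -\frac{1}{4\pi\epsilon_0}\nabla\left(\frac{\mathbf p \cdot \mathbf r}{r^3}\right) = -\frac{p}{4\pi\epsilon_0}\nabla\left(\frac{\cos\theta}{r^2}\right). \qquad (3.12)$$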


This differentiation is easiest in the spherical coordinates, using the well-known expression for the
gradient of a scalar function in these coordinates² and taking the z-axis parallel to the dipole moment $\mathbf p$.
From the last form of Eq. (12), we immediately get

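$$\mathbf E_p = \frac{p}{4\pi\epsilon_0 r^3}\left(2\mathbf n_r\cos\theta + \mathbf n_\theta\sin\theta\right) = \frac{1}{4\pi\epsilon_0}\,\frac{3\mathbf r(\mathbf r \cdot \mathbf p) - \mathbf p\,r^2}{r^5}. \qquad (3.13)$$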


Fig. 2b above shows the electric field lines given by Eq. (13). The most important features of this result
are a faster drop of the field's magnitude ($E_p \propto 1/r^3$, rather than $E \propto 1/r^2$ for a point charge), and the
change of the sign of its radial component as a function of the polar angle $\theta \in [0, \pi]$.
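Eq. (13) may be cross-checked against Eq. (7) by differentiating the potential numerically. This minimal sketch compares the central-difference gradient of the first form of Eq. (7) with the Cartesian form of Eq. (13). It also compares the two forms of Eq. (13) with each other for a field point in the x-z plane. The dipole moment, the field point, and the function names are arbitrary illustrative assumptions.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # F/m


def phi_dipole(r, p):
    """First form of Eq. (7)."""
    return (p @ r) / (4 * np.pi * EPS0 * np.linalg.norm(r) ** 3)


def E_dipole(r, p):
    """Last (Cartesian-vector) form of Eq. (13)."""
    rn = np.linalg.norm(r)
    return (3 * r * (r @ p) - p * rn**2) / (4 * np.pi * EPS0 * rn**5)


def E_dipole_spherical(r_mag, theta, p_mag):
    """First form of Eq. (13), with the z-axis along p and the field point in the x-z plane."""
    n_r = np.array([np.sin(theta), 0.0, np.cos(theta)])
    n_theta = np.array([np.cos(theta), 0.0, -np.sin(theta)])
    return p_mag / (4 * np.pi * EPS0 * r_mag**3) * (2 * n_r * np.cos(theta) + n_theta * np.sin(theta))


def minus_gradient(f, r, rel_step=1e-6):
    """-grad f(r) by central finite differences."""
    h = rel_step * np.linalg.norm(r)
    grad = np.zeros(3)
    for i in range(3):
        dr = np.zeros(3)
        dr[i] = h
        grad[i] = (f(r + dr) - f(r - dr)) / (2 * h)
    return -grad


p = np.array([0.0, 0.0, 1.0e-30])          # ~0.3 D along z (illustrative value), C*m
r = np.array([1.0e-9, 2.0e-9, -0.5e-9])    # field point, m
E_fd = minus_gradient(lambda x: phi_dipole(x, p), r)
print(E_fd)
print(E_dipole(r, p))
```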


2 See, e.g., MA Eq. (10.8) with $\partial/\partial\varphi = 0$.
Next, let us use Eq. (1.55) to calculate the potential energy of interaction between a dipole and
an external electric field. Assuming that this field does not change much at distances of the order of $a$
(Fig. 1), we may expand its potential $\phi_{\text{ext}}(\mathbf r)$ into the Taylor series, and keep only the two leading terms:


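$$U = \int \rho(\mathbf r)\,\phi_{\text{ext}}(\mathbf r)\,d^3r \approx \int \rho(\mathbf r)\left[\phi_{\text{ext}}(0) + \mathbf r \cdot \nabla\phi_{\text{ext}}(0)\right]d^3r = Q\,\phi_{\text{ext}}(0) - \mathbf p \cdot \mathbf E_{\text{ext}}(0). \qquad (3.14)$$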


The first term is the potential energy the system would have if it were just a point charge. If the net 
charge Q is zero, that term disappears, and the leading contribution is due to the dipole moment: 
$$U = -\mathbf p \cdot \mathbf E_{\text{ext}}, \quad \text{for } \mathbf p = \text{const}. \qquad (3.15a)$$
Note that this result is only valid for a fixed dipole, with $\mathbf p$ independent of $\mathbf E_{\text{ext}}$. In the opposite limit,
when the dipole is induced by the field, i.e. $\mathbf p \propto \mathbf E_{\text{ext}}$ (you may have one more look at Eq. (11) to see an
example of such a proportionality), we need to start with Eq. (1.60) rather than Eq. (1.55), getting

$$U = -\frac{1}{2}\,\mathbf p \cdot \mathbf E_{\text{ext}}, \quad \text{for } \mathbf p \propto \mathbf E_{\text{ext}}. \qquad (3.15b)$$
In particular, combining Eqs. (13) and (15a), we may get the following important formula for
the interaction of two independent dipoles:


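$$U_{\text{int}} = \frac{1}{4\pi\epsilon_0}\,\frac{\mathbf p_1 \cdot \mathbf p_2\,r^2 - 3(\mathbf r \cdot \mathbf p_1)(\mathbf r \cdot \mathbf p_2)}{r^5} = \frac{1}{4\pi\epsilon_0}\,\frac{p_{1x}p_{2x} + p_{1y}p_{2y} - 2p_{1z}p_{2z}}{r^3}, \qquad (3.16)$$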


where $\mathbf r$ is the vector connecting the dipoles, and the z-axis is directed along this vector. It is easy to
prove (this exercise is left for the reader) that if the magnitude $p$ of each dipole moment is fixed (the
approximation valid, in particular, for weak interaction of so-called polar molecules), this potential
energy reaches its minimum at the parallel orientation of the dipoles along the line connecting them.
Note also that in this case, $U_{\text{int}}$ is proportional to $1/r^3$. On the other hand, if each moment $\mathbf p$ has a random
value plus a component due to its polarization by the electric field of its counterpart, $\Delta\mathbf p_{1,2} \propto \mathbf E_p \propto 1/r^3$,
their average interaction energy (which may be calculated from Eq. (16) with the additional factor $\frac{1}{2}$) is
always negative and is proportional to $1/r^6$. Such a negative potential describes, in particular, the long-range,
attractive part (the so-called London dispersion force) of the interaction between electrically
neutral atoms and molecules.³
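The claim that two fixed-magnitude dipoles have the lowest interaction energy when aligned head-to-tail along the line connecting them can be illustrated, though not proved, by a brute-force search over random orientations. The sketch below evaluates Eq. (16) in units of $p^2/4\pi\epsilon_0 r^3$. The numerical values of $p$ and $r$, the sample size, and the random seed are arbitrary assumptions.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # F/m


def U_int(p1, p2, r):
    """Eq. (16) for arrays of dipole pairs: p1, p2 of shape (N, 3), separation r of shape (3,)."""
    rn = np.linalg.norm(r)
    p1p2 = np.einsum("ij,ij->i", p1, p2)
    return (p1p2 * rn**2 - 3 * (p1 @ r) * (p2 @ r)) / (4 * np.pi * EPS0 * rn**5)


rng = np.random.default_rng(seed=0)
p_mag = 1.0e-30                        # fixed magnitude of both moments, C*m (illustrative)
r = np.array([0.0, 0.0, 1.0e-9])       # separation along z, m
unit = 4 * np.pi * EPS0 * np.linalg.norm(r) ** 3 / p_mag**2   # converts U to units of p^2/(4 pi eps0 r^3)


def random_unit_vectors(n):
    """Directions uniformly distributed over the unit sphere."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


n1, n2 = random_unit_vectors(200_000), random_unit_vectors(200_000)
U = U_int(p_mag * n1, p_mag * n2, r) * unit    # dimensionless energies
best = np.argmin(U)
print(f"lowest sampled U = {U[best]:.3f} (analytic minimum: -2)")
print("best orientations:", n1[best].round(2), n2[best].round(2))
```

The analytic minimum, $U_{\text{int}} = -2p^2/4\pi\epsilon_0 r^3$, is reached at $\mathbf p_1 = \mathbf p_2 = \pm p\,\mathbf n_z$. The antiparallel side-by-side configuration gives only $-p^2/4\pi\epsilon_0 r^3$, so the random search should approach $-2$ from above.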


According to Eqs. (15), the electric field should “try” to reach the minimum of $U$ by aligning the
dipole vector's direction with its own. The direct quantitative description of this effect is the torque $\boldsymbol\tau$
exerted by the field. The simplest way to calculate it is to sum up all the elementary torques $d\boldsymbol\tau = \mathbf r \times d\mathbf F_{\text{ext}}$
$= \mathbf r \times \mathbf E_{\text{ext}}(\mathbf r)\,\rho(\mathbf r)\,d^3r$ exerted on all elementary charges of the system:


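$$\boldsymbol\tau = \int \mathbf r \times \mathbf E_{\text{ext}}(\mathbf r)\,\rho(\mathbf r)\,d^3r \approx \mathbf p \times \mathbf E_{\text{ext}}(0), \qquad (3.17)$$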


where at the last step, the spatial dependence of the external field $\mathbf E_{\text{ext}}(\mathbf r)$ was again neglected. This
dependence cannot, however, be ignored at the calculation of the total force exerted by the field on the
dipole (with $Q = 0$). Indeed, Eqs. (15) show that if the field is constant, the dipole's energy is


3 Several calculations of this force, using various models, are described in the QM and SM parts of this series. 
independent of its spatial location and hence the net force is zero. However, if the field has a non-zero
gradient, a total force does appear; for a field-independent dipole,


$$\mathbf F = -\nabla U = \nabla(\mathbf p \cdot \mathbf E_{\text{ext}}), \qquad (3.18)$$


where the derivative has to be taken at the dipole's position (in our notation, at $\mathbf r = 0$). If the dipole that
is being moved in a field retains its magnitude and orientation, then the last formula is equivalent to⁴


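$$\mathbf F = (\mathbf p \cdot \nabla)\,\mathbf E_{\text{ext}}. \qquad (3.19)$$

Alternatively, the last expression may be obtained similarly to Eq. (14):

$$\mathbf F = \int \rho(\mathbf r)\,\mathbf E_{\text{ext}}(\mathbf r)\,d^3r \approx \int \rho(\mathbf r)\left[\mathbf E_{\text{ext}}(0) + (\mathbf r \cdot \nabla)\,\mathbf E_{\text{ext}}(0)\right]d^3r = Q\,\mathbf E_{\text{ext}}(0) + (\mathbf p \cdot \nabla)\,\mathbf E_{\text{ext}}(0). \qquad (3.20)$$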
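Eqs. (18)-(20) can be checked against the exact force on a physical dipole. In this sketch, the external field is assumed, for illustration only, to be that of a point charge at the origin, which is a field with a nonzero gradient. The exact net Coulomb force on two opposite charges is compared with $(\mathbf p \cdot \nabla)\mathbf E_{\text{ext}}$ and with $\nabla(\mathbf p \cdot \mathbf E_{\text{ext}})$, both obtained by finite differences. The source charge, the geometry, and the function names are hypothetical.

```python
import numpy as np

K = 1.0 / (4 * np.pi * 8.8541878128e-12)   # Coulomb constant, N*m^2/C^2
Q_SRC = 1.0e-18                            # hypothetical source charge at the origin, C


def E_ext(r):
    """External field: that of the point charge Q_SRC at the origin (illustrative choice)."""
    return K * Q_SRC * r / np.linalg.norm(r) ** 3


def force_two_charges(q, a_vec, R0):
    """Exact net force on +q at R0 + a/2 and -q at R0 - a/2."""
    return q * E_ext(R0 + a_vec / 2) - q * E_ext(R0 - a_vec / 2)


def force_p_dot_grad(p, R0, rel_step=1e-4):
    """Eq. (19): (p . grad) E_ext, as |p| times the directional derivative along p."""
    h = rel_step * np.linalg.norm(R0)
    p_mag = np.linalg.norm(p)
    u = p / p_mag
    return p_mag * (E_ext(R0 + h * u) - E_ext(R0 - h * u)) / (2 * h)


def force_grad_energy(p, R0, rel_step=1e-4):
    """Eq. (18): grad (p . E_ext), by central differences."""
    h = rel_step * np.linalg.norm(R0)
    g = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = h
        g[i] = (p @ E_ext(R0 + d) - p @ E_ext(R0 - d)) / (2 * h)
    return g


q = 1.602176634e-19                               # C
a_vec = np.array([1.0e-11, 2.0e-11, -1.0e-11])    # charge separation, m (a << |R0|)
R0 = np.array([1.0e-9, -2.0e-9, 3.0e-9])          # dipole position, m
p = q * a_vec                                     # Eq. (9)
print(force_two_charges(q, a_vec, R0))
print(force_p_dot_grad(p, R0))
print(force_grad_energy(p, R0))
```

The two finite-difference forms should agree to high accuracy, because their equality relies only on $\nabla \times \mathbf E_{\text{ext}} = 0$ (see footnote 4). The exact two-charge force differs from them only by terms of relative order $(a/R_0)^2$.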


Finally, let me add a note on the so-called coarse-grain model of the dipole. The dipole 
approximation explored above is asymptotically correct only at large distances, r >> a. However, for 
some applications (including the forthcoming discussion of the molecular field effects in Sec. 3) it is 
beneficial to have an expression that might be formally used everywhere in space, though maybe 
without exact details at r ~ a, giving the correct result for the space average of the electric field, 


$$\langle\mathbf E\rangle \equiv \frac{1}{V}\int_V \mathbf E(\mathbf r)\,d^3r, \qquad (3.21)$$


where $V$ is a regularly-shaped volume much larger than $a^3$, for example, a sphere of a radius $R \gg a$,
with the dipole at its center. For the field $\mathbf E_p$ given by Eq. (13), such an average is zero. Indeed, let us
consider the Cartesian components of that vector in a reference frame with the z-axis directed along the
vector $\mathbf p$. Due to the axial symmetry of the field, the averages of the components $E_x$ and $E_y$ vanish. Let
us use Eq. (13) to spell out the “vertical” component of the field (parallel to the dipole moment vector):


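$$E_z = \mathbf E_p \cdot \frac{\mathbf p}{p} = \frac{1}{4\pi\epsilon_0 r^3}\left(2\,\mathbf n_r \cdot \mathbf p\,\cos\theta + \mathbf n_\theta \cdot \mathbf p\,\sin\theta\right) = \frac{p}{4\pi\epsilon_0 r^3}\left(2\cos^2\theta - \sin^2\theta\right). \qquad (3.22)$$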


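Integrating this expression over the whole solid angle $\Omega = 4\pi$, at fixed $r$, using a convenient variable
substitution $\cos\theta = \xi$, we get

$$\oint_{4\pi} E_z\,d\Omega = \frac{p}{4\pi\epsilon_0 r^3}\int_0^{2\pi}d\varphi\int_0^{\pi}\left(2\cos^2\theta - \sin^2\theta\right)\sin\theta\,d\theta = \frac{2\pi p}{4\pi\epsilon_0 r^3}\int_{-1}^{+1}\left(3\xi^2 - 1\right)d\xi = 0. \qquad (3.23)$$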
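The vanishing of the angular integral in Eq. (23) is easy to confirm numerically. This sketch evaluates the double integral, without the constant prefactor $p/4\pi\epsilon_0 r^3$, by the midpoint rule in $\theta$. For comparison, it evaluates the $\xi$-form exactly using a polynomial antiderivative. The grid size is an arbitrary choice.

```python
import numpy as np


def angular_integral(n_theta=2000):
    """Midpoint-rule value of the double integral in Eq. (23), without the prefactor."""
    dtheta = np.pi / n_theta
    theta = (np.arange(n_theta) + 0.5) * dtheta
    integrand = (2 * np.cos(theta) ** 2 - np.sin(theta) ** 2) * np.sin(theta)
    return 2 * np.pi * integrand.sum() * dtheta      # the phi-integral just gives 2*pi


# The same integral after the substitution cos(theta) = xi, done exactly
poly = np.polynomial.Polynomial([-1.0, 0.0, 3.0])    # 3 xi^2 - 1
antiderivative = poly.integ()
exact = 2 * np.pi * (antiderivative(1.0) - antiderivative(-1.0))
print(angular_integral(), exact)
```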


On the other hand, the exact electric field of an arbitrary charge distribution, with the total
dipole moment $\mathbf p$, obeys the following equality:

$$\int_V \mathbf E(\mathbf r)\,d^3r = -\frac{1}{4\pi\epsilon_0}\,\frac{4\pi}{3}\,\mathbf p, \qquad (3.24)$$


where the integration is over any sphere containing all the charges. A proof of this formula for the 
general case requires a straightforward, but somewhat tedious integration (which is, therefore, left for 
the reader’s exercise). The origin of Eq. (24) is illustrated in Fig. 3 on the example of the physical 


4 The equivalence may be proved, for example, by using MA Eq. (11.6) with $\mathbf f = \mathbf p$ = const and $\mathbf g = \mathbf E_{\text{ext}}$, taking
into account that according to the general Eq. (1.28), $\nabla \times \mathbf E_{\text{ext}} = 0$.
dipole, i.e. the system of two equal but opposite charges — see Eqs. (8)-(9). The zero average (23) of the
dipole field (13) does not take into account the contribution from the region between the charges (where
Eq. (13) is not valid), where the field is directed mostly against the dipole vector (9).


Fig. 3.3. A sketch illustrating the origin 
of Eq. (24) for a physical dipole. 


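So, in order to be used as a reasonable coarse-grain model, Eq. (13) may be modified as follows:

$$\mathbf E_{\text{cg}}(\mathbf r) = \frac{1}{4\pi\epsilon_0}\left[\frac{3\mathbf r(\mathbf r \cdot \mathbf p) - \mathbf p\,r^2}{r^5} - \frac{4\pi}{3}\,\mathbf p\,\delta(\mathbf r)\right], \qquad (3.25)$$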


with the average (21) satisfying Eq. (24). Evidently, such a modification does not change the field at 
large distances r >> a, i.e. in the region where the expansion (3), and hence Eq. (13), are valid. 


3.2. Dipole media 


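Now let us generalize Eq. (7) to the case of several (possibly, many) dipoles $\mathbf p_k$ located at
arbitrary points $\mathbf r_k$. Using the linear superposition principle, we get

$$\phi(\mathbf r) = \frac{1}{4\pi\epsilon_0}\sum_k \mathbf p_k \cdot \frac{\mathbf r - \mathbf r_k}{|\mathbf r - \mathbf r_k|^3}. \qquad (3.26)$$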


If our system (medium) contains many similar dipoles, distributed in space with density n(r), we may 
approximate the last sum with a macroscopic potential, which is the average of the genuine 
(“microscopic”) potential (26) over a local volume much larger than the distance between the dipoles, 
and as a result, is given by the integral 


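$$\phi(\mathbf r) = \frac{1}{4\pi\epsilon_0}\int \mathbf P(\mathbf r') \cdot \frac{\mathbf r - \mathbf r'}{|\mathbf r - \mathbf r'|^3}\,d^3r', \quad \text{with } \mathbf P(\mathbf r) \equiv n(\mathbf r)\,\mathbf p, \qquad (3.27)$$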


where the vector P(r), called the electric polarization, has the physical meaning of the net dipole 
moment per unit volume. (Note that by its definition, P(r) is also a “macroscopic” field.) 


Now comes a very impressive trick, which is the basis of all the theory of “macroscopic” 
electrostatics (and eventually, “macroscopic” electrodynamics). Just as was done at the derivation of Eq. 
(5), Eq. (27) may be rewritten in the equivalent form 


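$$\phi(\mathbf r) = \frac{1}{4\pi\epsilon_0}\int \mathbf P(\mathbf r') \cdot \nabla'\frac{1}{|\mathbf r - \mathbf r'|}\,d^3r', \qquad (3.28)$$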
where $\nabla'$ means the del operator (in this particular case, the gradient) acting in the “source space” of
vectors $\mathbf r'$. The right-hand side of Eq. (28), applied to any volume $V$ limited by a closed surface $S$, may
be integrated by parts to give⁵


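$$\phi(\mathbf r) = \frac{1}{4\pi\epsilon_0}\oint_S \frac{P_n(\mathbf r')}{|\mathbf r - \mathbf r'|}\,d^2r' + \frac{1}{4\pi\epsilon_0}\int_V \frac{-\nabla' \cdot \mathbf P(\mathbf r')}{|\mathbf r - \mathbf r'|}\,d^3r'. \qquad (3.29)$$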


If the surface does not carry an infinitely dense ($\delta$-functional) sheet of additional dipoles,⁶ or it is just
very distant, the first term on the right-hand side is negligible. Now comparing the second term with the
basic equation (1.38) for the electric potential, we see that this term may be interpreted as the field of
certain effective electric charges with density

$$\rho_{\text{ef}} = -\nabla \cdot \mathbf P. \qquad (3.30)$$


Figure 4 illustrates the physics of this key relation for a cartoon model of a simple multi-dipole
system: a layer of uniformly-distributed two-point-charge units oriented perpendicular to the layer
surface. (In this case, $\nabla \cdot \mathbf P = dP/dx$.) One can see that the $\rho_{\text{ef}}$ defined by Eq. (30) may be interpreted as
the density of the uncompensated surface charges of polarized elementary dipoles.


Fig. 3.4. The spatial distributions of the 
polarization and effective charges in a layer of 
similar elementary dipoles (schematically). 
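Relation (30) and the picture of Fig. 4 may be illustrated with a one-dimensional sketch. Here the layer is assumed, hypothetically, to have a uniform polarization $P_0$ along x. Its edges are smeared over a small width $w$ to mimic the finite size of the dipoles. Integrating $\rho_{\text{ef}} = -dP/dx$ across each edge should then give surface charge densities of $-P_0$ and $+P_0$. All numerical values are arbitrary, in arbitrary units.

```python
import numpy as np


def effective_charge_density(x, P):
    """Eq. (30) in 1D: rho_ef = -dP/dx (central differences)."""
    return -np.gradient(P, x)


def trapezoid(y, x):
    """Plain trapezoidal rule (avoids version-specific NumPy helpers)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


# Hypothetical layer 0 < x < L with uniform polarization P0 along x,
# its edges smeared over a width w
P0, L, w = 1.0, 1.0, 0.01
x = np.linspace(-0.5, 1.5, 20001)
P = 0.5 * P0 * (np.tanh(x / w) - np.tanh((x - L) / w))
rho_ef = effective_charge_density(x, P)

left = x < L / 2
sigma_left = trapezoid(rho_ef[left], x[left])      # expected: -P0
sigma_right = trapezoid(rho_ef[~left], x[~left])   # expected: +P0
print(sigma_left, sigma_right)
```

The negative effective charge appears at the surface where the negative ends of the elementary dipoles are uncompensated, in agreement with Fig. 4.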


Next, from Sec. 1.2, we already know that Eq. (1.38) is equivalent to the inhomogeneous
Maxwell equation (1.27) for the electric field, so that the macroscopic electric field of the dipoles
(defined as $\mathbf E_p = -\nabla\phi_p$, where $\phi_p$ is given by Eq. (27)) obeys a similar equation, with the effective charge
density (30).


Now let us consider a more general case when a system, besides the compensated charges of the 
dipoles, also has certain stand-alone charges — not parts of the dipoles already taken into account in the 
polarization P. As was discussed in Sec. 1.1, if we average this charge over the inter-point-charge 
distances, i.e. approximate it with a continuous “macroscopic” density p(r), then its macroscopic 


5 To prove this (almost evident) formula strictly, it is sufficient to apply the divergence theorem, given by MA Eq.
(12.2), to the vector function $\mathbf f = \mathbf P(\mathbf r')/|\mathbf r - \mathbf r'|$, in the “source space” of radius-vectors $\mathbf r'$.

6 Just like in the case of Eq. (1.9), we may always describe such a dipole sheet using the second term in Eq. (29), 
by including a delta-functional part into the polarization distribution P(r’). 
electric field also obeys Eq. (1.27), but with the stand-alone charge density. Due to the linear
superposition principle, for the total macroscopic field E of these charges and dipoles, we may write


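$$\nabla \cdot \mathbf E = \frac{1}{\epsilon_0}\left(\rho + \rho_{\text{ef}}\right) = \frac{1}{\epsilon_0}\left(\rho - \nabla \cdot \mathbf P\right). \qquad (3.31)$$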
This is already the main result of the “macroscopic” electrostatics. However, it is evidently
tempting (and very convenient for applications) to rewrite Eq. (31) in a different form by carrying the
dipole-related term of this equality over to its left-hand side. The resulting formula is called the
macroscopic Maxwell equation for D:

$$\nabla \cdot \mathbf D = \rho, \qquad (3.32)$$

where $\mathbf D(\mathbf r)$ is a new “macroscopic” field, called the electric displacement (in some older texts, “electric
induction”), defined as⁷

$$\mathbf D \equiv \epsilon_0\mathbf E + \mathbf P. \qquad (3.33)$$


The comparison of Eqs. (32) and (1.27) shows that $\mathbf D$ (or more strictly, the fraction $\mathbf D/\epsilon_0$) may be
interpreted as the “would-be electric field” that would be created by stand-alone charges in the absence
of the dipole medium polarization. It should be distinguished from the $\mathbf E$ participating in Eqs. (31) and
(33), i.e. from the genuine electric field, if averaged over a spatial scale of the order of the distance
between elementary charges and dipoles.


In order to get an even better gut feeling of the fields $\mathbf E$ and $\mathbf D$, let us first rewrite the
macroscopic Maxwell equation (32) in the integral form. Applying the divergence theorem to an
arbitrary volume $V$ limited by surface $S$, we get the following macroscopic Gauss law:

$$\oint_S D_n\,d^2r = \int_V \rho\,d^3r \equiv Q, \qquad (3.34)$$

where $Q$ is the stand-alone charge inside volume $V$.


This general result may be used to find the boundary conditions for D at a sharp interface 
between two different dielectrics. (The analysis is applicable to a dielectric/free-space boundary as 
well.) For that, let us apply Eq. (34) to a flat pillbox formed at the interface (see the solid rectangle in 
Fig. 5), which is sufficiently small on the spatial scales of the dielectric’s nonuniformity and the 
interface’s curvature, but still containing many elementary dipoles. Assuming that the interface does not 
have stand-alone surface charges, we immediately get 


$$(D_n)_1 = (D_n)_2, \qquad (3.35)$$


i.e. the normal component of the electric displacement has to be continuous. Note that a similar 
statement for the macroscopic electric field E is generally not valid, because the polarization vector P 
may have, and typically does have, a leap at a sharp interface (say, due to the different polarizability of


7 Note that according to its definition (33), the dimensionality of $\mathbf D$ in the SI units is different from that of $\mathbf E$. In
contrast, in the Gaussian units, the electric displacement is defined as $\mathbf D = \mathbf E + 4\pi\mathbf P$, so that $\nabla \cdot \mathbf D = 4\pi\rho$ (the
relation $\rho_{\text{ef}} = -\nabla \cdot \mathbf P$ remains the same as in SI units), and the dimensionalities of $\mathbf D$ and $\mathbf E$ coincide. This
coincidence is a certain perceptional handicap because it is frequently convenient to consider the scalar
components of $\mathbf E$ as generalized forces, and those of $\mathbf D$ as generalized coordinates (see Sec. 5 below), and it is
somewhat comforting to have their dimensionalities different, as they are in the SI units.
the two different dielectrics), providing a surface layer of the effective charges (30) — see again the
example shown in Fig. 4.


Fig. 3.5. Deriving the boundary conditions at an interface 
between two dielectrics, using a Gauss pillbox (shown as 
a solid-line rectangle) and a circulation contour (dashed- 
line rectangle). Here n and t are the unit vectors that are, 
respectively, normal and tangential to the interface. Note 
that due to the leap of polarization, the field lines are 
generally “refracted” at the interface — see Fig. 11b for an 
example. 


However, we still can make an important statement about the behavior of $\mathbf E$ at the interface.
Indeed, the macroscopic electric fields defined by Eqs. (29) and (31) are evidently still potential ones,
and hence obey the macroscopic Maxwell equation similar to Eq. (1.28):

$$\nabla \times \mathbf E = 0. \qquad (3.36)$$


Integrating this equality along a narrow contour stretched along the interface (see the dashed rectangle 
in Fig. 5), we get 


$$(E_t)_1 = (E_t)_2. \qquad (3.37)$$


Note that this condition is compatible with (and may be derived from) the continuity of the macroscopic
electrostatic potential $\phi$, related to the macroscopic field $\mathbf E$ by the relation similar to Eq. (1.33), $\mathbf E = -\nabla\phi$,
at each point of the interface: $\phi_1 = \phi_2$.


In order to see how these boundary conditions work, let us consider the simple problem shown in
Fig. 6. A very broad plane capacitor, with zero voltage between its conducting plates (as may be
enforced, for example, by their connection with an external wire), is partly filled with a material with a
uniform polarization $\mathbf P_0$,⁸ oriented normal to the plates. Let us calculate the spatial distribution of the
fields $\mathbf E$ and $\mathbf D$, and also the surface charge density of each conducting plate.


Fig. 3.6. A simple system whose analysis requires Eq. (35).


Due to the symmetry of the system, the vectors $\mathbf E$ and $\mathbf D$ are both normal to the plates and do not
depend on the position in the capacitor's plane, so we can limit the fields' analysis to the calculation of
their z-components $E(z)$ and $D(z)$. In this case, the Maxwell equation (32) is reduced to $dD/dz = 0$ inside
each layer (but not at their border!), so that within each of them, $D$ is constant: say, some $D_1$ in the
layer with $P = P_0$, and a certain $D_2$ in the free-space layer, where $P = 0$. As a result, according to Eq. (33),
the (macroscopic) electric field inside each layer is also constant:


8 As will be discussed in the next section, this is a good approximation for the so-called electrets, and also for 
hard ferroelectrics in not very high electric fields. 
$$D_1 = \epsilon_0 E_1 + P_0, \qquad D_2 = \epsilon_0 E_2. \qquad (3.38)$$


Since the voltage between the plates is zero, we may also require the integral of $E$, taken along a path
connecting the plates, to vanish. This gives us one more relation:


$$E_1 d_1 + E_2 d_2 = 0. \qquad (3.39)$$


Still, the three equations (38)-(39) are insufficient to calculate the four fields in the system ($E_{1,2}$ and
$D_{1,2}$). The decisive help comes from the boundary condition (35):


$$D_1 = D_2. \qquad (3.40)$$


(Note that it is valid because the layer interface does not carry stand-alone electric charges, even though
it has a polarization surface charge, whose areal density may be calculated by integrating Eq. (30)
across the interface: $\sigma_{\text{ef}} = P_0$. Note also that in our simple system, Eq. (37) is identically satisfied due to
the system's symmetry, and hence does not give any additional information.)


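Now solving the resulting system of four equations (38)-(40), we readily get

$$E_1 = -\frac{P_0}{\epsilon_0}\,\frac{d_2}{d_1 + d_2}, \qquad E_2 = \frac{P_0}{\epsilon_0}\,\frac{d_1}{d_1 + d_2}, \qquad D_1 = D_2 = D = P_0\,\frac{d_1}{d_1 + d_2}. \qquad (3.41)$$

The areal densities of the electrode surface charges may now be readily calculated by the integration of
Eq. (32) across each surface:

$$\sigma_1 = -\sigma_2 = D = P_0\,\frac{d_1}{d_1 + d_2}. \qquad (3.42)$$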

Note that due to the spontaneous polarization of the lower layer's material, the capacitor plates
are charged even in the absence of voltage between them, and that this charge is a function of the second
electrode's position ($d_2$).⁹ Also notice a substantial similarity between this system (Fig. 6) and the one
whose analysis was the subject of Problem 2.6.
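The solution (41)-(42) may be double-checked by solving the linear system (38)-(40) numerically. In this sketch, the unknowns are rescaled to $\epsilon_0 E_{1,2}$ and $D_{1,2}$, all in C/m², to keep the matrix well conditioned. The numerical values of $P_0$, $d_1$, and $d_2$, and the function name, are illustrative assumptions.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def solve_layers(P0, d1, d2):
    """Solve Eqs. (38)-(40) for E1, E2, D1, D2.

    Internally the unknowns are x = [eps0*E1, eps0*E2, D1, D2] (all in C/m^2),
    which keeps the matrix well conditioned.
    """
    f1, f2 = d1 / (d1 + d2), d2 / (d1 + d2)
    A = np.array([
        [-1.0, 0.0, 1.0, 0.0],    # Eq. (38), layer 1: D1 - eps0*E1 = P0
        [0.0, -1.0, 0.0, 1.0],    # Eq. (38), layer 2: D2 - eps0*E2 = 0
        [f1, f2, 0.0, 0.0],       # Eq. (39) times eps0/(d1 + d2): zero voltage
        [0.0, 0.0, 1.0, -1.0],    # Eq. (40): D1 - D2 = 0
    ])
    b = np.array([P0, 0.0, 0.0, 0.0])
    e1, e2, D1, D2 = np.linalg.solve(A, b)
    return e1 / EPS0, e2 / EPS0, D1, D2


P0, d1 = 1.0e-3, 1.0e-6          # illustrative values: C/m^2 and m
for d2 in (1.0e-6, 2.0e-6, 4.0e-6):
    E1, E2, D1, D2 = solve_layers(P0, d1, d2)
    print(f"d2 = {d2:.0e} m: E1 = {E1:.3e} V/m, E2 = {E2:.3e} V/m, sigma1 = D = {D1:.3e} C/m^2")
```

The loop illustrates the effect described in footnote 9. At fixed $d_1$, the plate charge density $\sigma_1 = D$ decreases as the gap $d_2$ grows.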


3.3. Polarization of dielectrics 


The general relations derived in the previous section may be used to describe the electrostatics of
any dielectrics, i.e. materials with bound electric charges (and hence with negligible dc electric
conduction). However, to form a full system of equations necessary to solve electrostatics problems,
they have to be complemented by certain constitutive relations between the vectors $\mathbf P$ and $\mathbf E$.¹⁰


In most materials, in the absence of an external electric field, the elementary dipoles p either 
equal zero or have a random orientation in space, so that the net dipole moment of each macroscopic 
volume (still containing many such dipoles) equals zero: P = 0 at E = 0. Moreover, if the field changes 


9 This effect is used in most modern microphones. In such a device, the sensed sound wave's pressure bends a
thin conducting membrane playing the role of one of the capacitor's plates, and thus modulates the thickness (in
Fig. 6, $d_2$) of the air gap adjacent to the electret layer. This modulation produces proportional variations of the
charges (42), and hence some electric current in the wire connecting the plates, which is picked up by readout
electronics.

10 In the problem solved at the end of the previous section, the role of such a relation was played by the equality
$\mathbf P_0$ = const.
are sufficiently slow, most materials may be characterized by a unique dependence of $\mathbf P$ on $\mathbf E$. Then,
using the Taylor expansion of the function $\mathbf P(\mathbf E)$, we may argue that in relatively low electric fields the
function should be well approximated by a linear dependence between these two vectors. Such
dielectrics are called linear (or “simple”). In an isotropic medium, the coefficient of proportionality should
be just a scalar. In the SI units, this scalar is defined by the following relation:


$$\mathbf P = \chi_e\,\epsilon_0\,\mathbf E, \qquad (3.43)$$


with the dimensionless constant $\chi_e$ called the electric susceptibility. However, it is much more common
to use, instead of $\chi_e$, another dimensionless parameter,¹¹

$$\kappa \equiv 1 + \chi_e, \qquad (3.44)$$


which is sometimes called the “relative electric permittivity”, but much more often, the dielectric 
constant. This parameter is very convenient, because combining Eqs. (43) and (44), 


$$\mathbf P = (\kappa - 1)\,\epsilon_0\,\mathbf E, \qquad (3.45)$$
and then plugging the resulting relation into the general Eq. (33), we get simply 


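$$\mathbf D = \kappa\,\epsilon_0\,\mathbf E, \quad \text{or} \quad \mathbf D = \epsilon\,\mathbf E, \qquad (3.46)$$

where another popular parameter,¹²

$$\epsilon \equiv \kappa\,\epsilon_0 = (1 + \chi_e)\,\epsilon_0, \qquad (3.47)$$

is called the electric permittivity of the material.¹³ Table 1 gives the approximate values of the
dielectric constant for several representative materials.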
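Eqs. (43)-(47) and footnote 13 juggle three related parameters and two unit systems, so a small conversion sketch may help keep them apart. The function names are hypothetical. The example uses the room-temperature dielectric constant of water from Table 1.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def si_parameters(chi_e):
    """Eqs. (44) and (47): kappa = 1 + chi_e and eps = kappa * eps0 (SI units)."""
    kappa = 1.0 + chi_e
    return kappa, kappa * EPS0


def chi_gaussian_from_si(chi_e_si):
    """Footnote 13: (chi_e)_Gaussian = (chi_e)_SI / (4 pi)."""
    return chi_e_si / (4 * math.pi)


def fields_in_linear_dielectric(kappa, E):
    """Eqs. (45)-(46) for a scalar field component E (V/m): returns P and D (C/m^2)."""
    P = (kappa - 1.0) * EPS0 * E
    D = kappa * EPS0 * E
    return P, D


chi_water = 80.1 - 1.0           # water at 20 C: kappa = 80.1 (Table 1)
kappa, eps = si_parameters(chi_water)
print(kappa, eps, chi_gaussian_from_si(chi_water))
```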


In order to understand the range of these values, let me discuss (briefly and rather superficially¹⁴)
the two simplest mechanisms of electric polarization. The first of them is typical for liquids and gases of
polar atoms/molecules, which have their own, spontaneous dipole moments $\mathbf p$. (A typical example is the
water molecule H₂O, with the negative oxygen ion offset from the line connecting two positive
hydrogen ions, thus producing a spontaneous dipole moment $p = ea$, with $a \approx 0.38\times10^{-10}$ m $\sim r_B$.) In the
absence of an external electric field, the orientation of such dipoles may be random, with the average
polarization $\mathbf P = n\langle\mathbf p\rangle$ equal to zero (see the top panel of Fig. 7a).
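The numbers quoted for the water molecule translate into a dipole moment of the familiar magnitude. The sketch below uses the debye unit, $1\ \text{D} = 10^{-21}/c$ C·m. That unit is customary in molecular physics but is not used elsewhere in this text, so it is introduced here only as an assumption for comparison.

```python
e = 1.602176634e-19            # elementary charge, C (exact SI value)
c = 299_792_458.0              # speed of light, m/s (exact SI value)
DEBYE = 1.0e-21 / c            # 1 D in C*m (about 3.336e-30)
a = 0.38e-10                   # charge offset quoted in the text, m

p = e * a
print(f"p = {p:.2e} C*m = {p / DEBYE:.2f} D")   # prints p = 6.09e-30 C*m = 1.83 D
```

This rough estimate, about 1.8 D, is close to the measured dipole moment of an isolated water molecule, which is about 1.85 D.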




11 In older physics literature, the dielectric constant is often denoted $\epsilon_r$ (with the index “r” meaning
“relative”), while in electrical engineering publications, its notation is frequently $k$.

12 The reader may be perplexed by the use of three different but uniquely related parameters (χₑ, κ = 1 + χₑ, and
ε = κε₀) for the description of just one scalar property. Unfortunately, such redundancy is typical for physics, whose
different sub-field communities have different, well-entrenched traditions.

13 In the Gaussian units, χₑ is defined by the relation P = χₑE, while ε is defined just as in the SI units,
D = εE. Because of that, in the Gaussian units the constant ε is dimensionless and equals (1 + 4πχₑ). As a result,
ε_Gaussian = (ε/ε₀)_SI = κ, so that (χₑ)_Gaussian = (χₑ)_SI/4π, sometimes creating confusion between the numerical values of
the latter parameter, which is dimensionless in both systems.

14 While I believe this discussion is very useful, it is quantitatively valid only for relatively sparse media, with a low
concentration (n ≪ 1/a³) of elementary atomic/molecular dipoles of size scale a. Indeed, in some condensed
materials, with na³ ~ 1, even the notion of the dipole moment p of a single atomic cell is ambiguous.
Table 3.1. Dielectric constants of a few representative (and/or practically important) dielectrics

Material                                           κ
Air (at ambient conditions)                        1.00054
Teflon (polytetrafluoroethylene, [C₂F₄]ₙ)          2.1
Silicon dioxide (amorphous)                        3.9
Glasses (of various compositions)                  3.7–10
Castor oil                                         4.5
Silicon (a)                                        11.7
Water (at 100°C)                                   55.3
Water (at 20°C)                                    80.1
Barium titanate (BaTiO₃, at 20°C)                  ~1,600

(a) Anisotropic materials, such as silicon crystals, require a susceptibility tensor to give an exact description of the
linear relation between the vectors P and E. However, most important crystals (including Si) are only weakly anisotropic, so
they may be reasonably well characterized with a scalar (angle-averaged) susceptibility.


Fig. 3.7. Crude cartoons of two mechanisms of the induced electrical polarization: (a) a partial ordering of
spontaneous elementary dipoles, and (b) an elementary dipole induction. The upper two panels correspond
to E = 0, and the lower two panels, to E ≠ 0.

A relatively weak external field does not change the magnitude of the dipole moments
significantly, but according to Eqs. (15a) and (17), tries to orient them along the field, creating a nonzero
vector average ⟨p⟩ directed along the vector Eₘ, where Eₘ is the microscopic field at the point of
the dipole's location (cf. the two panels of Fig. 7a). If the field is not too high (pEₘ ≪ k_BT), the induced
average polarization ⟨p⟩ is proportional to Eₘ. If we write this proportionality relation in the following
traditional form,

$$\langle\mathbf{p}\rangle = \alpha\mathbf{E}_m, \qquad (3.48)$$

where α is called the atomic (or, sometimes, "molecular") polarizability, this means that α is positive. If
the concentration n of such elementary dipoles is low, the contribution of their own fields into the
microscopic field acting on each dipole is negligible, and we may identify Eₘ with the macroscopic field
E. As a result, the second of Eqs. (27) yields


$$\mathbf{P} = n\langle\mathbf{p}\rangle = \alpha n\mathbf{E}. \qquad (3.49)$$


Comparing this relation with Eq. (45), we get

$$\kappa = 1 + \frac{n\alpha}{\varepsilon_0}, \qquad (3.50)$$


so that κ > 1 (i.e. χₑ = αn/ε₀ > 0). Note that for this particular polarization mechanism (illustrated in the
lower panel of Fig. 7a), the thermal motion "tries" to randomize the dipole orientation, i.e. to reduce their
ordering by the field, so that we may expect α, and hence χₑ = κ − 1, to increase as the temperature T is
decreased: the so-called paraelectricity. Indeed, basic statistical mechanics¹⁵ shows that in this
case, the electric susceptibility follows the so-called Curie law, χₑ ∝ 1/T.


The materials of the second, much more common class consist of non-polar atoms without
intrinsic spontaneous polarization. A crude classical image of such an atom is an isotropic cloud of
negatively charged electrons surrounding a positively charged nucleus; see the top panel of Fig. 7b.
The external electric field shifts the positive charge in the direction of the vector E, and the negative
charges in the opposite direction, thus creating a similarly directed average dipole moment ⟨p⟩.¹⁶ At
relatively low fields, this average moment is proportional to E, so that we again arrive at Eq. (48), with
α > 0, and if the dipole concentration n is sufficiently low, also at Eq. (50), with κ − 1 > 0. So, the
dielectric constant is larger than 1 for both polarization mechanisms; please have one more look at
Table 1.


In order to make a crude but physically transparent estimate of the difference κ − 1, let us
consider the following toy model of a non-polar dielectric: a set of similar conducting spheres of radius
R, distributed in space with a low density n ≪ 1/R³. At such density, the electrostatic interaction of the
spheres is negligible, and we can use Eq. (11) for the induced dipole moment of a single sphere. Then
the polarizability definition (48) yields α = 4πε₀R³, so that Eq. (50) gives


$$\kappa = 1 + 4\pi R^3 n. \qquad (3.51)$$


Let us use this result for a crude estimate of the dielectric constant of air at the so-called ambient
conditions, meaning the normal atmospheric pressure P = 1.013×10⁵ Pa and temperature T = 300 K. At
these conditions the molecular density n may be found, with a few-percent accuracy, from the well-
known equation of state of an ideal gas:¹⁷ n = P/k_BT ≈ (1.013×10⁵)/(1.38×10⁻²³×300) m⁻³ ≈ 2.45×10²⁵ m⁻³.
The molecule of the air's main component, N₂, has a van-der-Waals radius¹⁸ of 1.55×10⁻¹⁰ m. Taking
this radius for the R of our crude model, we get χₑ = κ − 1 ≈ 1.15×10⁻³. Comparing this number with the


15 See, e.g., SM Chapter 2.

16 Realistically, these effects are governed by quantum mechanics, so the average here should be understood not
only in the statistical-mechanical but also (and mostly) in the quantum-mechanical sense. Because of that, for
non-polar atoms, α is typically a very weak function of temperature, at least on the usual scale T ~ 300 K.

17 If needed, see, e.g., SM Secs. 1.4 and 3.1.

18 Such a radius is defined by the requirement that the volume of the corresponding sphere, if used in the van-der-
Waals equation (see, e.g., SM Sec. 4.1), gives the best fit to the experimental equation of state n = n(P, T).
first line of Table 1, we see that the model gives a surprisingly reasonable result: to get the experimental
value, it is sufficient to decrease the effective R of the sphere by just ~20%, to ~1.2×10⁻¹⁰ m.¹⁹
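The arithmetic of this estimate is easy to reproduce. The following minimal Python sketch assumes the ideal-gas density n = P/k_BT and the conducting-sphere relation (51); the function names are illustrative only. It also inverts Eq. (51) to find the effective radius that reproduces the Table 1 value κ − 1 = 0.00054.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def chi_conducting_spheres(radius_m: float, density_m3: float) -> float:
    """chi_e = kappa - 1 = 4*pi*R^3*n for sparse conducting spheres, Eq. (3.51)."""
    return 4.0 * math.pi * radius_m**3 * density_m3


def radius_for_chi(chi: float, density_m3: float) -> float:
    """Invert Eq. (3.51): the sphere radius R that gives the susceptibility chi."""
    return (chi / (4.0 * math.pi * density_m3)) ** (1.0 / 3.0)


pressure_pa = 1.013e5    # normal atmospheric pressure
temperature_k = 300.0
n_air = pressure_pa / (K_B * temperature_k)   # ideal-gas molecular density, m^-3

r_vdw_n2 = 1.55e-10      # van-der-Waals radius of N2 quoted in the text, m
chi_model = chi_conducting_spheres(r_vdw_n2, n_air)

chi_table = 0.00054      # kappa - 1 for air, Table 3.1
r_fit = radius_for_chi(chi_table, n_air)

print(f"n         = {n_air:.3e} m^-3")
print(f"model chi = {chi_model:.3e}")
print(f"fitted R  = {r_fit:.3e} m ({100 * (1 - r_fit / r_vdw_n2):.0f}% smaller than 1.55e-10 m)")
```

Because χₑ scales as R³, the roughly twofold overestimate of κ − 1 corresponds to only about a one-fifth reduction of the effective radius, which is why the crude model looks so reasonable.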


This result may encourage us to try using Eq. (51) for a larger density n. For example, as a crude
model for a non-polar crystal, let us assume that the conducting spheres form a simple cubic lattice with
the period a = 2R (i.e., the neighboring spheres virtually touch). With this, n = 1/a³ = 1/8R³, and Eq. (51)
yields κ = 1 + 4π/8 ≈ 2.6. This estimate provides a reasonable semi-qualitative explanation for the
values of κ listed in a few middle rows of Table 1. However, at such small distances, the electrostatic
dipole-dipole interaction should already be essential, so that this simple model cannot even
approximately describe the values of κ much larger than 1, listed in the last rows of the table.


Such high values may be explained by the so-called molecular-field effect: each elementary
dipole is polarized not only by the external field, as Eq. (49) assumes, but by the field of neighboring
dipoles as well. Ottaviano-Fabrizio Mossotti in 1850 and (but almost 30 years later) Rudolf Clausius
suggested what is now known, rather unfairly, as the Clausius-Mossotti formula, which describes this
effect reasonably well in many non-polar materials.²⁰ In our notation, it reads²¹

$$\frac{\kappa - 1}{\kappa + 2} = \frac{n\alpha}{3\varepsilon_0}, \quad \text{so that} \quad \kappa = 1 + \frac{n\alpha/\varepsilon_0}{1 - n\alpha/3\varepsilon_0}. \qquad (3.52)$$


If the dipole density is low in the sense n ≪ ε₀/α, this relation is reduced to Eq. (50), corresponding to
independent dipoles. However, at higher dipole density, both κ and χₑ increase faster and tend to infinity
as the density n approaches a critical value n_c, which in the simple Clausius-Mossotti
model equals 3ε₀/α.²² This means that the zero-polarization state becomes unstable even in the absence
of an external electric field.
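To see how strongly the molecular-field correction matters, the sketch below compares the independent-dipole result (50) with the Clausius-Mossotti formula (52), both written in terms of the dimensionless product x = nα/ε₀. Applying Eq. (52) to the touching-sphere cubic lattice considered above (x = π/2) is only an illustration added here, not a claim of the text, and the function names are hypothetical.

```python
import math


def kappa_independent(x: float) -> float:
    """Eq. (3.50): kappa = 1 + x, where x = n*alpha/eps0 (independent dipoles)."""
    return 1.0 + x


def kappa_clausius_mossotti(x: float) -> float:
    """Eq. (3.52): kappa = 1 + x / (1 - x/3), where x = n*alpha/eps0.

    kappa diverges as x -> 3, i.e. as n approaches n_c = 3*eps0/alpha.
    """
    if x >= 3.0:
        raise ValueError("x = n*alpha/eps0 must stay below the critical value 3")
    return 1.0 + x / (1.0 - x / 3.0)


# For conducting spheres, alpha/eps0 = 4*pi*R^3, so a simple cubic lattice with
# period a = 2R (n = 1/(8 R^3)) gives x = 4*pi/8 = pi/2.
x_lattice = math.pi / 2

for x in (0.001, 0.1, 1.0, x_lattice, 2.5, 2.9, 2.99):
    print(f"x = {x:5.3f}:  Eq. (50) -> {kappa_independent(x):6.3f},  "
          f"Eq. (52) -> {kappa_clausius_mossotti(x):9.3f}")
```

At small x the two formulas coincide, while near x = 3 the Clausius-Mossotti value grows without bound, which is the linear-theory signature of the instability discussed next.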


This instability is a linear-theory (i.e. low-field) manifestation of a substantially nonlinear effect:
the formation, in some materials, of spontaneous polarization even in the absence of an external
electric field. Such materials are called ferroelectrics, and may be experimentally recognized by the
hysteretic behavior of their polarization as a function of the applied (external) electric field; see Fig. 8.
As the plots show, the polarization of a ferroelectric depends on the applied field's history. For example,
the direction of its spontaneous remnant polarization P_R may be switched by first applying, and then
removing, a sufficiently high field of the opposite orientation (larger than the so-called coercive field E_C;
see Fig. 8). The physics of this switching is rather involved: the polarization vector P of a
ferroelectric material is typically constant only within each of spontaneously formed spatial regions
(called domains), with a typical size of a few tenths of a micron, and different (frequently, opposite)
directions of the vector P in adjacent domains. The change of the applied electric field results not in the


19 As will be discussed in QM Chapter 6, for a hydrogen atom in its ground state, the low-field polarizability may
be calculated analytically: α = (9/2)×4πε₀r_B³, corresponding to our metallic-ball model with a close value of the
effective radius: R = (9/2)^{1/3} r_B ≈ 1.65 r_B ≈ 0.87×10⁻¹⁰ m.

20 Applied to the high-frequency electric field, with κ replaced by the square of the refraction coefficient n at the
field's frequency (see Chapter 7), this formula is known as the Lorenz-Lorentz relation.

21 I am leaving the approximate proof of Eq. (52), using Eq. (3.24), for the reader's exercise.

22 The Clausius-Mossotti formula does not give quantitatively correct results for most ferroelectric materials. For 
a review of modern approaches to the theory of their polarization, see, e.g., the paper by R. Resta and D. 
Vanderbilt in the review collection by K. Rabe, C. Ahn, and J.-M. Triscone (eds.), Physics of Ferroelectrics: A 
Modern Perspective, Springer, 2010. 
switching of the direction of P inside each domain, but rather in a shift of the domain walls, resulting in
the change of the average polarization of the sample.


Fig. 3.8. The average polarization of soft and hard ferroelectrics as functions of the
applied electric field (schematically).


Depending on the ferroelectric's material, temperature, and the sample's geometry (a solid
crystal, a ceramic material, or a thin film), the hysteretic loops may be rather different, ranging from a
rather smooth form in the so-called soft ferroelectrics (which include most ferroelectric thin films) to an
almost rectangular form in hard ferroelectrics; see Fig. 8. In low fields, soft ferroelectrics behave
essentially as linear paraelectrics, but with a very high average dielectric constant: see the bottom line
of Table 1 for such a classical material as BaTiO₃ (which is a soft ferroelectric at temperatures below T_c
= 120°C, and a paraelectric above this critical temperature). On the other hand, the polarization of a hard
ferroelectric in fields below its coercive field remains virtually constant, and the analysis of its
electrostatics may be based on the condition P = P_R = const, already used in the problem discussed at
the end of the previous section.²³ This condition is even more applicable to the so-called electrets:
synthetic polymers with a spontaneous polarization that remains constant even in very high electric
fields.


Some materials exhibit even more complex polarization effects, for example, antiferroelectricity,
helielectricity, and (practically very valuable) piezoelectricity. Unfortunately, I do not have time for a
discussion of these exotic phenomena in this course;²⁴ the main reason I am mentioning them is to
emphasize again that the constitutive relation P = P(E) is material-specific rather than fundamental.
However, most insulators, in practicable fields, behave as linear dielectrics, so the next section will be
devoted to the discussion of their electrostatics.


23 Due to this property, hard ferroelectrics, such as lead zirconate titanate (PZT) and strontium bismuth
tantalate (SBT), with high remnant polarization P_R (up to ~1 C/m²), may be used in nonvolatile random-access
memories (dubbed either FRAM or FeRAM); see, e.g., J. Scott, Ferroelectric Memories, Springer, 2000. In a
cell of such a memory, binary information is stored in the form of one of two possible directions of spontaneous
polarization at E = 0 (see Fig. 8). Unfortunately, the time of spontaneous depolarization of ferroelectric thin films
is typically well below 10 years, the industrial standard for data retention in nonvolatile memories, and this time
may be decreased even more by "fatigue" from the repeated polarization recycling at information recording. For
these reasons, the industrial production of FRAM is currently just a tiny fraction of the nonvolatile memory
market, which is dominated by floating-gate memories; see, e.g., Sec. 4.2 below.

24 For detailed coverage of ferroelectrics, I can recommend the encyclopedic monograph by M. Lines and A. 
Glass, Principles and Applications of Ferroelectrics and Related Materials, Oxford U. Press, 2001, and the 
recent review collection edited by K. Rabe et al., that was cited above. 
3.4. Electrostatics of linear dielectrics


First, let us consider the simplest but very important problem: how is the electrostatic field of a
set of stand-alone charges of density ρ(r) modified if it is placed into a uniform linear dielectric medium
that obeys Eq. (46), with a dielectric constant κ constant in the whole region we are interested in? In this
case, we may combine Eqs. (32) and (46) to write

$$\nabla\cdot\mathbf{E} = \frac{\rho}{\varepsilon}. \qquad (3.53)$$


As a reminder, in free space we had a similar equation (1.27), but with a different constant, ε₀ = ε/κ.
Hence all the results discussed in Chapter 1 are valid inside a uniform linear dielectric for the
macroscopic field E (and the corresponding macroscopic electrostatic potential φ), if they are
reduced by the factor of κ > 1. Thus, the most straightforward result of the induced polarization of a
dielectric medium is the electric field reduction. This is a very important effect, especially taking into
account the very high values of κ in such common dielectrics as water (see Table 1). Indeed, it is the
reduction of the attraction between positive and negative ions (called, respectively, cations and anions)
in water that enables their substantial dissociation and hence almost all biochemical reactions, which are
the basis of the biological cell functions, and hence of life itself.


Let us apply this general result to the important particular case of the plane capacitor (Fig. 2.3)
filled with a linear, uniform dielectric. Applying the macroscopic Gauss law (34) to a pillbox-shaped
volume on the conductor surface, we get the following relation,

$$\sigma = D_n = \varepsilon E_n \equiv -\varepsilon\frac{\partial\phi}{\partial n}, \qquad (3.54)$$

which differs from Eq. (2.3) only by the replacement ε₀ → ε = κε₀. Hence, for a fixed field E_n, the
charge density calculated for the free-space case should be increased by the factor of κ, and that's it. In
particular, this means that the capacitance (2.28) has to be increased by this factor:

$$C = \frac{\kappa\varepsilon_0 A}{d} \equiv \frac{\varepsilon A}{d}. \qquad (3.55)$$

(As a reminder, this increase of C by κ has already been incorporated, without proof, into some
estimates made in Secs. 2.1 and 2.2, to make them realistic.)


If a linear dielectric is nonuniform, the situation is more complex. For example, let us consider
the case of a sharp interface between two otherwise uniform dielectrics, free of stand-alone charges. In
this case, we still may use Eq. (37) for the tangential component of the macroscopic electric field, and
also Eq. (36), with D_n = εE_n, for its normal component, getting

$$\varepsilon_1 E_{n1} = \varepsilon_2 E_{n2}, \quad \text{i.e.} \quad \varepsilon_1\frac{\partial\phi_1}{\partial n} = \varepsilon_2\frac{\partial\phi_2}{\partial n}. \qquad (3.56)$$


Let us apply these boundary conditions, first of all, to consider how carving a slit of some width
d and a much smaller thickness t ≪ d from inside a dielectric changes an initially uniform electric field
E₀, depending on its orientation (see Fig. 9).
Fig. 3.9. Fields inside two narrow slits cut in a linear dielectric. Far from the slits: E = E₀, D = D₀ = κε₀E₀.
Inside slit A (normal to the field): D = D₀, E = D₀/ε₀ = κE₀. Inside slit B (parallel to the field): E = E₀,
D = ε₀E₀ = D₀/κ.


First of all, intuition tells us that regardless of its orientation, a slit cannot change the field far
from it; moreover, at t → 0, it cannot substantially modify even the field right outside its "major"
(broader) surfaces. This conclusion may be supported either by direct calculations (see, e.g., the problem
illustrated by Fig. 11 below), or by energy arguments: at t ≪ d, any potential energy decrease due to the
field change inside the slit's volume (proportional to td) cannot compensate its increase in the outer
volume, proportional to d². However, the slit may induce some local field changes: inside the slit, and even
outside it, close to its "minor" surfaces.


To calculate the inner field for case A, with the slit's plane normal to the applied field, we may
apply Eq. (56) to its major surfaces (shown horizontal), to show that the vector D should be continuous.
But according to Eq. (46), this means that in the free space inside the slit, the electric field should equal
D/ε₀, and hence be κ times higher than the field E₀ = D/κε₀ far from the slit. This field, and hence D,
may be measured by a sensor placed inside the gap, so that the electric displacement is not an entirely
mathematical construct.²⁵ In contrast, for case B, with the slit's plane parallel to the initial field,
we may apply Eq. (37) to the major (now, vertical) interfaces of the slit, to see that now the electric field
E is continuous, while the electric displacement D = ε₀E inside the gap is a factor of κ lower than its
value in the dielectric. (Similarly to case A, any perturbations of the field uniformity, caused by the
compliance with Eq. (56) at the minor surfaces, settle down at distances ~t from them.)


For other problems with piecewise-constant ε, with more complex geometries, we may need to
apply the methods studied in Chapter 2. In particular, in the simplest cases we can select such a set of
orthogonal coordinates that the electrostatic potential depends on just one of them. Consider, for
example, two types of filling a plane capacitor with two different dielectrics (see Fig. 10).


Fig. 3.10. Plane capacitors filled with two different dielectrics: panels (a) and (b).


In case (a), the voltage V between the electrodes is the same for each part of the capacitor, telling
us that, at least far from the dielectric interface, the electric field is vertical, uniform, and constant
(E = V/d). Hence the boundary condition (37) is satisfied even if such a distribution is valid near the surface
as well, i.e. at any point of the system. The only effect of different values of ε in the two parts is that the
electric displacement D = εE, and hence the electrodes' surface charge density σ = D, are different in them.
Thus we can calculate the electrode charges Q₁,₂ of the two parts independently, and then add up the
results to get the total mutual capacitance

$$C \equiv \frac{Q}{V} = \frac{1}{d}\left(\varepsilon_1 A_1 + \varepsilon_2 A_2\right). \qquad (3.57)$$

25 Superficially, this result violates the boundary condition (37) at the vertical ("minor") surfaces of the gap. This
apparent contradiction is resolved by the fact that the thin slit can deform the field both inside and outside it, at
distances of the order of t around these interfaces, but not far beyond them, so that the above relations for E and D
are valid over most of the slit area.


Note that this formula may be interpreted as the total capacitance of two separate lumped capacitors
connected (by wires) in parallel. This is natural, because we may cut the system along the dielectric
interface, without any effect on the fields in either part, and then connect the corresponding electrodes
by external wires, again without any effect on the system, besides the very close vicinities of the
capacitor's edges, where the fringe fields may be affected.


Case (b) may be analyzed just as in the problem illustrated by Fig. 6, by applying Eq. (34) to a
Gaussian pillbox with one lid inside the (for example) bottom electrode, and the other lid inside any of
the layers. As a result, we see that D anywhere inside the system should be equal to the surface charge
density σ of the electrode, i.e. constant. Hence, according to Eq. (46), the electric field E inside each
dielectric layer is also constant: in the top layer, it is E₁ = D/ε₁ = σ/ε₁, while in the bottom layer,
E₂ = D/ε₂ = σ/ε₂. Integrating the field E across the whole capacitor, we get

$$V = \int_0^{d_1+d_2} E(z)\,dz = E_1 d_1 + E_2 d_2 = \left(\frac{d_1}{\varepsilon_1} + \frac{d_2}{\varepsilon_2}\right)\sigma, \qquad (3.58)$$

so that the mutual capacitance per unit area is

$$\frac{C}{A} \equiv \frac{\sigma}{V} = \left(\frac{d_1}{\varepsilon_1} + \frac{d_2}{\varepsilon_2}\right)^{-1}. \qquad (3.59)$$


Note that this result is similar to the total capacitance of an in-series connection of two plane capacitors 
based on each of the layers. This is also natural because we could insert an uncharged, thin conducting 
sheet (rather than a cut as in the previous case) at the layer interface, which is an equipotential surface, 
without changing the field distribution in any part of the system. Then we could thicken the conducting 
sheet as much as we liked (and possibly shape its internal part into a thin wire), also without changing 
the fields in the dielectric parts of the system, and hence the capacitance. 
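The two filling types of Fig. 10 thus reduce to the familiar parallel and series rules, Eqs. (57) and (59). A minimal Python sketch of both follows; the materials and dimensions in the example (silicon dioxide and Teflon values from Table 1, a 1 mm² plate, 100 nm layers) are arbitrary choices for illustration, and the function names are hypothetical. The helper `layer_fields` also evaluates the two layer fields of case (b), which must add up to the applied voltage as in Eq. (58).

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def capacitance_side_by_side(eps1, area1, eps2, area2, gap):
    """Eq. (3.57): dielectrics filling different parts of the plate area (parallel rule)."""
    return (eps1 * area1 + eps2 * area2) / gap


def capacitance_per_area_layered(eps1, d1, eps2, d2):
    """Eq. (3.59): dielectric layers stacked between the plates (series rule)."""
    return 1.0 / (d1 / eps1 + d2 / eps2)


def layer_fields(eps1, d1, eps2, d2, voltage):
    """Case (b): sigma = D is common to both layers, and E_i = sigma / eps_i."""
    sigma = capacitance_per_area_layered(eps1, d1, eps2, d2) * voltage
    return sigma / eps1, sigma / eps2


eps_sio2 = 3.9 * EPS0
eps_teflon = 2.1 * EPS0
area = 1e-6       # m^2 (1 mm^2)
d_layer = 100e-9  # m

c_a = capacitance_side_by_side(eps_sio2, area / 2, eps_teflon, area / 2, 2 * d_layer)
c_b = capacitance_per_area_layered(eps_sio2, d_layer, eps_teflon, d_layer) * area
e1, e2 = layer_fields(eps_sio2, d_layer, eps_teflon, d_layer, voltage=1.0)

print(f"case (a): C = {c_a:.3e} F;  case (b): C = {c_b:.3e} F")
print(f"case (b) at 1 V: E1 = {e1:.3e} V/m, E2 = {e2:.3e} V/m")
```

Note that in case (b) the field is larger in the layer with the smaller ε, since D, not E, is common to both layers.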


Proceeding to problems with more complex geometry, let us consider the system shown in Fig.
11a: a dielectric sphere placed into an initially uniform external electric field E₀. According to Eq. (53)
for the macroscopic electric field, and the definition of the macroscopic electrostatic potential, E = −∇φ,
the potential satisfies the Laplace equation both inside and outside the sphere, though not at its border.
Due to the spherical symmetry of the dielectric sample, this problem invites the variable separation
method in spherical coordinates, which was discussed in Sec. 2.8. From that discussion, we already
know, in particular, the general solution (2.172) of the Laplace equation outside of the sphere. To satisfy
the uniform-field condition at r → ∞, we have to reduce this solution to

$$\phi_{r\ge R} = -E_0 r\cos\theta + \sum_{l=1}^{\infty}\frac{b_l}{r^{l+1}}P_l(\cos\theta). \qquad (3.60)$$


Inside the sphere, we can also use Eq. (2.172), but keeping only the radial functions finite at r → 0:

$$\phi_{r\le R} = \sum_{l=1}^{\infty} a_l r^l P_l(\cos\theta). \qquad (3.61)$$


Now, spelling out the boundary conditions (37) and (56) at r = R, we see that for all coefficients a_l and
b_l with l ≥ 2, we get homogeneous linear equations (just like for the conducting sphere discussed in Sec.
2.8) that have only trivial solutions. Hence, all these terms may be dropped, while for the only surviving
terms with l = 1, proportional to the Legendre polynomial P₁(cos θ) = cos θ, we get two equations:

$$-E_0 - 2\frac{b_1}{R^3} = \kappa a_1, \qquad -E_0 R + \frac{b_1}{R^2} = a_1 R. \qquad (3.62)$$


Solving this simple system of linear equations for a₁ and b₁, and plugging the result into Eqs. (60) and
(61), we get the final solution of the problem:

$$\phi_{r\ge R} = E_0\left(-r + \frac{\kappa-1}{\kappa+2}\frac{R^3}{r^2}\right)\cos\theta, \qquad \phi_{r\le R} = -\frac{3}{\kappa+2}E_0 r\cos\theta. \qquad (3.63)$$


Fig. 3.11. A dielectric sphere in an initially uniform electric field: (a) the problem, and (b) the
equipotential surfaces, as given by Eq. (63), for κ = 3.


Figure 11b shows the equipotential surfaces given by this solution, for a particular value of the
dielectric constant κ. Note that according to Eq. (63), at r ≥ R the dielectric sphere, just as the
conducting sphere in a similar problem, produces (on top of the uniform external field) a pure dipole
field, with the dipole moment

$$\mathbf{p} = 4\pi R^3\,\frac{\kappa-1}{\kappa+2}\,\varepsilon_0\mathbf{E}_0 \equiv 3\,\frac{\kappa-1}{\kappa+2}\,\varepsilon_0\mathbf{E}_0 V, \quad \text{where } V = \frac{4\pi}{3}R^3. \qquad (3.64)$$


This is an evident generalization of Eq. (11), to which Eq. (64) tends at κ → ∞. By the way, this
property is common: for their electrostatic properties, conductors may be adequately described as
dielectrics with κ → ∞.


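Another remarkable feature of Eqs. (63) is that the electric field and polarization inside the
sphere are uniform, with R-independent values

$$\mathbf{E} = \frac{3}{\kappa+2}\mathbf{E}_0, \quad \mathbf{D} = \kappa\varepsilon_0\mathbf{E} = \varepsilon_0\frac{3\kappa}{\kappa+2}\mathbf{E}_0, \quad \mathbf{P} = \mathbf{D} - \varepsilon_0\mathbf{E} = 3\varepsilon_0\frac{\kappa-1}{\kappa+2}\mathbf{E}_0. \qquad (3.65)$$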


In the limit κ → 1 (for example, the "sphere made of free space", i.e. no sphere at all), the electric field
inside it naturally tends to the external one, and its polarization vanishes. In the opposite limit κ → ∞,
the electric field inside the sphere vanishes. Curiously enough, in this limit the electric displacement
inside the sphere remains finite: D → 3ε₀E₀.
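The 2×2 system (62) is simple enough to solve by hand, but a short numerical sketch makes it easy to confirm the closed-form results (63)-(65) and their limits. It assumes units with ε₀ = 1 and sets E₀ = R = 1 by default; the function names are hypothetical.

```python
import math


def sphere_l1_coefficients(kappa, e0=1.0, radius=1.0):
    """Solve the l = 1 boundary conditions, Eq. (3.62), for a_1 and b_1.

    Rows, rearranged so that the unknowns a_1, b_1 are on the left:
      continuity of phi:   R*a_1 - b_1/R**2 = -E0*R
      continuity of D_r:   kappa*a_1 + 2*b_1/R**3 = -E0
    """
    m11, m12, r1 = radius, -1.0 / radius**2, -e0 * radius
    m21, m22, r2 = kappa, 2.0 / radius**3, -e0
    det = m11 * m22 - m12 * m21          # equals (kappa + 2)/R**2
    a1 = (r1 * m22 - m12 * r2) / det     # Cramer's rule
    b1 = (m11 * r2 - r1 * m21) / det
    return a1, b1


def sphere_fields(kappa, e0=1.0, radius=1.0):
    """Inner E, D/eps0, P/eps0 (all uniform) and the dipole moment p/eps0."""
    a1, b1 = sphere_l1_coefficients(kappa, e0, radius)
    e_in = -a1                 # phi_in = a_1 r cos(theta) = a_1 z, so E_z = -a_1
    d_in = kappa * e_in        # Eq. (3.46)
    p_in = d_in - e_in         # P = D - eps0 E
    dipole = 4 * math.pi * b1  # outer l = 1 term: b_1 = p/(4 pi eps0)
    return e_in, d_in, p_in, dipole


for kappa in (1.0, 3.0, 80.1, 1e9):
    e_in, d_in, p_in, dip = sphere_fields(kappa)
    print(f"kappa = {kappa:g}: E_in = {e_in:.4f}, D_in = {d_in:.4f}, "
          f"P_in = {p_in:.4f}, p = {dip:.4f}")
```

The very large κ in the last case mimics a conductor: the inner field nearly vanishes, D_in approaches 3 (i.e. 3ε₀E₀), and p approaches 4π (i.e. 4πε₀R³E₀), as in Eq. (11).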


More complex problems with piecewise-uniform dielectrics also may be addressed by the
methods discussed in Chapter 2, and hopefully, the reader will be able to use them to solve a few such
problems offered in Sec. 6 on their own. Let me discuss just one such problem, because it exhibits a
new feature of the charge image method that was discussed in Sec. 2.9 (and is the basis of the Green's
function approach; see Sec. 2.10). Consider the system shown in Fig. 12: a point charge near a
dielectric half-space; it obviously parallels the system discussed in Sec. 2.9 (see Fig. 2.26).


Fig. 3.12. Charge images for a dielectric half-space. (A point above the interface "sees" the charges q and q′,
while a point inside the dielectric "sees" the charge q″ alone.)


As for the case of a conducting half-space, the Laplace equation for the electrostatic potential in
the upper half-space z > 0 (besides the charge point ρ = 0, z = d) may be satisfied using a single image
charge q′ at the point with ρ = 0 and z = −d, but now q′ may differ from (−q). In addition, in contrast to
the case analyzed in Sec. 2.9, we should also calculate the field inside the dielectric (at z < 0). This field
cannot be contributed by the image charge q′, because that would give a potential divergence at its
location. Thus, in the dielectric-filled half-space we should try to use the real point source only, but with
a renormalized charge q″ rather than the genuine charge q (see Fig. 12). As a result, we may look for
the potential distribution in the form
the potential distribution in the form 


q q' 

Tra ae es tore 0, 

; lL IL(6? +(2— dy)” (p? +(2+d)’) (3.66) 
va 4 Te for z <0, 

2 2 
(cp? +(z-d)’) 
at this stage of solution, with unknown q’ and qg”’. Plugging this equality into the boundary conditions 
(37) and (56) at z = 0 (with 0/On = 0/0z), we see that they are indeed satisfied (so that Eq. (66) does 


express the solution of the boundary problem), provided that the effective charges q’ and q’’ obey the 
following relations: 


$(p,2) = 
$$q - q' = \kappa q'', \qquad q + q' = q''. \qquad (3.67)$$

Solving this simple system of linear equations, we get

$$q' = -\frac{\kappa-1}{\kappa+1}q, \qquad q'' = \frac{2}{\kappa+1}q. \qquad (3.68)$$
If κ → 1, then q′ → 0 and q″ → q; both facts are very natural because in this limit (no polarization
at all) we have to recover the unperturbed field of the initial point charge in both half-spaces. In the
opposite limit κ → ∞ (which, as was discussed above, may describe a conducting half-space), q′ → −q
(repeating the result we have discussed in detail in Sec. 2.9), and q″ → 0. The last result means that in
this limit, the electric field E in the dielectric tends to zero, as it should.
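The image-charge solution can be checked directly against the boundary conditions. The sketch below, assuming units with 1/4πε₀ = 1 and hypothetical function names, evaluates the two branches of the potential (66) with the charges (68) on the interface z = 0, and compares both φ and the normal component of D (i.e. κ ∂φ/∂z in units of ε₀) across it, using the analytic z-derivatives of the point-charge potentials.

```python
import math


def image_charges(q, kappa):
    """Eq. (3.68): image charge q' (at z = -d) and effective charge q'' (at z = +d)."""
    q1 = -(kappa - 1.0) / (kappa + 1.0) * q
    q2 = 2.0 / (kappa + 1.0) * q
    return q1, q2


def phi_above(rho, z, q, d, kappa):
    """Upper line of Eq. (3.66), valid for z >= 0 (units with 1/(4 pi eps0) = 1)."""
    q1, _ = image_charges(q, kappa)
    return q / math.hypot(rho, z - d) + q1 / math.hypot(rho, z + d)


def phi_below(rho, z, q, d, kappa):
    """Lower line of Eq. (3.66), valid for z <= 0."""
    _, q2 = image_charges(q, kappa)
    return q2 / math.hypot(rho, z - d)


def dphi_dz_above(rho, z, q, d, kappa):
    """Analytic z-derivative of phi_above."""
    q1, _ = image_charges(q, kappa)
    return (-q * (z - d) / math.hypot(rho, z - d) ** 3
            - q1 * (z + d) / math.hypot(rho, z + d) ** 3)


def dphi_dz_below(rho, z, q, d, kappa):
    """Analytic z-derivative of phi_below."""
    _, q2 = image_charges(q, kappa)
    return -q2 * (z - d) / math.hypot(rho, z - d) ** 3


q, d = 1.0, 1.0
for kappa in (2.0, 80.1):
    for rho in (0.0, 0.5, 3.0):
        # continuity of phi along z = 0 gives the tangential-field condition (37)
        jump_phi = phi_above(rho, 0.0, q, d, kappa) - phi_below(rho, 0.0, q, d, kappa)
        # Eq. (3.56): (dphi/dz above) = kappa * (dphi/dz below)
        jump_dz = dphi_dz_above(rho, 0.0, q, d, kappa) - kappa * dphi_dz_below(rho, 0.0, q, d, kappa)
        print(f"kappa={kappa:5.1f} rho={rho:3.1f}: jump in phi = {jump_phi:.1e}, "
              f"jump in D_z = {jump_dz:.1e}")
```

Both jumps vanish to rounding accuracy for any ρ, which is just the content of Eqs. (67).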


In conclusion of this section, please note that if the permittivity ε of a linear dielectric is a
continuous rather than piecewise function of coordinates, the distribution of the electrostatic potential φ
may be found from Eq. (32) with the electric displacement given by Eq. (46): D = ε(r)E = −ε(r)∇φ.
However, analytical solutions of the resulting partial differential equation of the second order may be
found only for rare particular cases; one of them is offered in Sec. 6 for the reader's exercise.
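In the simplest case where ε, and hence φ, depend only on one Cartesian coordinate z (for example, a plane capacitor filled with a graded dielectric), Eq. (32) with D = −ε(z)dφ/dz and no stand-alone charges reduces to d/dz[ε(z)dφ/dz] = 0. The sketch below is a minimal finite-volume solution of that one-dimensional equation, under these assumptions, in units with ε₀ = 1; the function names are hypothetical. Since D is uniform in this geometry, the capacitance per unit area must equal [∫dz/ε(z)]⁻¹, which for two layers reproduces Eq. (59).

```python
import math


def solve_graded_capacitor(eps_of_z, length, voltage, n_cells=1000):
    """Finite-volume solution of d/dz[eps(z) dphi/dz] = 0, phi(0) = 0, phi(length) = voltage.

    Returns node coordinates, node potentials, and eps at the cell midpoints.
    """
    h = length / n_cells
    z = [i * h for i in range(n_cells + 1)]
    eps_mid = [eps_of_z((i + 0.5) * h) for i in range(n_cells)]

    # Flux balance at interior node i (1 <= i <= n_cells - 1):
    #   -eps_{i-1/2} phi_{i-1} + (eps_{i-1/2} + eps_{i+1/2}) phi_i - eps_{i+1/2} phi_{i+1} = 0
    m = n_cells - 1
    lower = [-eps_mid[k] for k in range(m)]
    diag = [eps_mid[k] + eps_mid[k + 1] for k in range(m)]
    upper = [-eps_mid[k + 1] for k in range(m)]
    rhs = [0.0] * m
    rhs[-1] = eps_mid[-1] * voltage  # known phi_N = voltage moved to the right-hand side

    # Thomas algorithm for the tridiagonal system
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for k in range(1, m):
        denom = diag[k] - lower[k] * c_prime[k - 1]
        c_prime[k] = upper[k] / denom
        d_prime[k] = (rhs[k] - lower[k] * d_prime[k - 1]) / denom
    interior = [0.0] * m
    interior[-1] = d_prime[-1]
    for k in range(m - 2, -1, -1):
        interior[k] = d_prime[k] - c_prime[k] * interior[k + 1]

    return z, [0.0] + interior + [voltage], eps_mid


def displacement_profile(z, phi, eps_mid):
    """D_z = -eps dphi/dz in each cell; in this 1D problem it should be uniform."""
    return [-eps_mid[i] * (phi[i + 1] - phi[i]) / (z[i + 1] - z[i]) for i in range(len(eps_mid))]


def layered(zz):
    """Two layers: kappa = 3.9 for z < 0.4, kappa = 11.7 above (total thickness 1)."""
    return 3.9 if zz < 0.4 else 11.7


z, phi, eps_mid = solve_graded_capacitor(layered, 1.0, 1.0)
d_field = displacement_profile(z, phi, eps_mid)
c_numeric = abs(d_field[0])
c_eq59 = 1.0 / (0.4 / 3.9 + 0.6 / 11.7)
print(f"layered: C/A = {c_numeric:.6f}  (Eq. 59: {c_eq59:.6f})")


def graded(zz):
    """Linearly graded permittivity eps(z) = 2 + 8z; then C/A = 8 / ln(5)."""
    return 2.0 + 8.0 * zz


z, phi, eps_mid = solve_graded_capacitor(graded, 1.0, 1.0)
c_graded = abs(displacement_profile(z, phi, eps_mid)[0])
print(f"graded:  C/A = {c_graded:.6f}  (analytic: {8.0 / math.log(5.0):.6f})")
```

For the layered case the cell boundaries coincide with the interface, so the discrete result matches Eq. (59) to rounding accuracy; for the continuous profile the error is set by the grid step.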


3.5. Electric field energy in a dielectric 


In Chapter 1, we obtained two key results for the electrostatic energy: Eq. (1.55) for a
charge interaction with an independent ("external") field, and a similarly structured formula (1.60), but
with an additional factor ½, for the field induced by the charges under consideration. These relations are
universal, i.e. valid for dielectrics as well, provided that the charge density includes all charges,
including those bound into the elementary dipoles. However, for most applications, it is convenient to
recast them into a form where these bound charges participate not explicitly, but only via the
macroscopic polarization effects they create.


If a field is created only by the stand-alone charges under consideration and is proportional to
ρ(r) (requiring that we deal with linear dielectrics), we can repeat all the argumentation of the beginning
of Sec. 1.3, and again arrive at Eq. (1.60), provided that φ is now the macroscopic field's potential. Now
we can recast this result in terms of fields, essentially as this was done in Eqs. (1.62)-(1.64), but
now making a clear difference between the macroscopic electric field E = −∇φ and the electric
displacement field D, which obeys the macroscopic Maxwell equation (32). Plugging ρ(r), expressed
from that equation, into Eq. (1.60), we get


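$$U = \frac{1}{2}\int(\nabla\cdot\mathbf{D})\,\phi\,d^3r. \qquad (3.69)$$

Using the fact²⁶ that for differentiable functions φ and D,

$$(\nabla\cdot\mathbf{D})\,\phi = \nabla\cdot(\phi\mathbf{D}) - (\nabla\phi)\cdot\mathbf{D}, \qquad (3.70)$$

we may rewrite Eq. (69) as

$$U = \frac{1}{2}\int\nabla\cdot(\phi\mathbf{D})\,d^3r - \frac{1}{2}\int(\nabla\phi)\cdot\mathbf{D}\,d^3r. \qquad (3.71)$$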


26 See, e.g., MA Eq. (11.4a). 
The divergence theorem, applied to the first term on the right-hand side, reduces it to a surface integral
of φD_n. (As a reminder, in Eq. (1.63) the integral was of φ(∇φ)_n ∝ φE_n.) If the surface of the volume we
are considering is sufficiently far, this surface integral vanishes. On the other hand, the gradient in the
second term of Eq. (71) is just (minus) field E, so it gives


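$$U = \frac{1}{2}\int\mathbf{E}\cdot\mathbf{D}\,d^3r = \frac{1}{2}\int\mathbf{E}(\mathbf{r})\cdot\varepsilon(\mathbf{r})\mathbf{E}(\mathbf{r})\,d^3r = \frac{1}{2}\int\varepsilon(\mathbf{r})E^2(\mathbf{r})\,d^3r. \qquad (3.72)$$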


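This expression is a natural generalization of Eq. (1.65), and shows that we can, as we did in free space,
represent the electrostatic energy in a local form:²⁷

$$U = \int u(\mathbf{r})\,d^3r, \quad \text{with} \quad u = \frac{\mathbf{E}\cdot\mathbf{D}}{2} = \frac{\varepsilon E^2}{2} = \frac{D^2}{2\varepsilon}. \qquad (3.73)$$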


As a sanity check, in the trivial case ε = ε₀ (i.e. κ = 1), this result is reduced to Eq. (1.65).


Of course, Eq. (73) is valid only for linear dielectrics, because our starting point, Eq. (1.60), is
only valid if φ is proportional to ρ. To make our calculation more general, we should intercept the
calculations of Sec. 1.3 at an earlier stage, at which this proportionality had not yet been used. For
example, the first of Eqs. (1.56) may be rewritten, in the continuous form, as


$$\delta U = \int\phi(\mathbf{r})\,\delta\rho(\mathbf{r})\,d^3r, \qquad (3.74)$$


where the symbol δ means a small variation of the function, e.g., its change in time, sufficiently slow to
ignore the relativistic and magnetic-field effects. Applying such variation to Eq. (32), and plugging the
resulting relation δρ = ∇·δD into Eq. (74), we get

$$\delta U = \int(\nabla\cdot\delta\mathbf{D})\,\phi\,d^3r. \qquad (3.75)$$


(Note that in contrast to Eq. (69), this expression does not have the front factor ½.) Now repeating the
same calculations as in the linear case, for the energy density's variation we get a remarkably simple
(and general!) expression,

$$\delta u = \mathbf{E}\cdot\delta\mathbf{D} \equiv \sum_{j=1}^{3}E_j\,\delta D_j, \qquad (3.76)$$


where the last expression uses the Cartesian components of the vectors E and D. This is as far as we can
go for the general dependence D(E). If the dependence is linear and isotropic, as in Eq. (46), then
δD = ε₀κδE, and

$$\delta u = \varepsilon\,\mathbf{E}\cdot\delta\mathbf{E} \equiv \delta\left(\frac{\varepsilon E^2}{2}\right). \qquad (3.77)$$

The integration of this expression over the whole variation, from the field equal to zero to a certain final
distribution E(r), brings us back to Eq. (73).


An important role of Eq. (76), in its last form, is to indicate that from the point of view of analytical mechanics, the Cartesian components of E may be interpreted as generalized forces, and those of D as generalized coordinates of the field's effect on a unit volume of the dielectric. This allows one, in particular, to form the proper Gibbs potential energy28 of a system with an electric field E(r) fixed, at every point, by some external source:

$$U_G = \int u_G(\mathbf{r})\, d^3r, \qquad \text{with} \quad u_G = u - \mathbf{E}\cdot\mathbf{D}. \tag{3.78}$$

27 In the Gaussian units, each of the last three expressions of Eq. (73) should be divided by 4π.


The essence of this notion is that if the generalized external force (in our case, E) is fixed, the stable 
equilibrium of the system corresponds to the minimum of U_G, rather than of the potential energy U as
such — in our case, that of the field in our system. 


As the simplest illustration of this important concept, let us consider a very long cylinder (with 
an arbitrary cross-section’s shape), made of a uniform linear dielectric, placed into a uniform external 
electric field parallel to the cylinder’s axis — see Fig. 13. 


Fig. 3.13. A cylindrical dielectric sample 
in a longitudinal external electric field. 


For this simple problem, the equilibrium value of D inside the cylinder may be, of course, readily found without any appeal to energies. Indeed, the solution of the Laplace equation inside the cylinder, with the boundary condition (37), is evident: E(r) = E_ext, so that Eq. (46) immediately yields D(r) = εE_ext. One may wonder why the minimum of the potential energy U, given by Eq. (73) in its last form,

$$\frac{U}{V} = \frac{D^2}{2\varepsilon}, \tag{3.79}$$

corresponds to a different (zero) value of D, but let us recall that Eq. (73) was derived for the case when the electric field is created by the stand-alone charges in the system under consideration. If it is created by external sources, we have to use the Gibbs potential energy (78) instead. For our current uniform case, this energy per unit volume of the cylinder is


$$\frac{U_G}{V} = \frac{U}{V} - \mathbf{E}\cdot\mathbf{D} = \frac{D^2}{2\varepsilon} - \mathbf{E}\cdot\mathbf{D} \equiv \sum_{j=1}^{3}\left(\frac{D_j^2}{2\varepsilon} - E_j D_j\right), \tag{3.80}$$

and its minimum as a function of every Cartesian component of D corresponds to the correct value of the displacement: D_j = εE_j, i.e. D = εE = εE_ext. So, the system's equilibrium indeed corresponds to the minimum of the Gibbs potential energy (78) rather than of the energy (73).
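The minimization argument above can be checked numerically. The sketch below is only an illustration, assuming SI units, a single Cartesian component along the cylinder axis, and the illustrative values κ = 4 and E_ext = 10⁵ V/m (not from the text). It minimizes both the field energy density of Eq. (79) and the Gibbs energy density of Eq. (80) with a golden-section search; the function names are hypothetical.

```python
# Numerical check of the Gibbs-energy argument of Eqs. (3.79)-(3.80).
# Illustrative assumptions (not from the text): SI units, a single Cartesian
# component along the cylinder axis, kappa = 4, and E_ext = 1e5 V/m.

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def u_field(D, eps):
    """Field energy density D**2/(2*eps), the last form of Eq. (3.73)."""
    return D**2 / (2 * eps)


def u_gibbs(D, E, eps):
    """Gibbs energy density u - E*D at a fixed external field E, Eq. (3.80)."""
    return u_field(D, eps) - E * D


def golden_min(f, a, b, tol=1e-15):
    """Golden-section search for the minimum of a unimodal f on [a, b]."""
    g = (5 ** 0.5 - 1) / 2          # inverse golden ratio, about 0.618
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):             # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                       # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


eps = 4.0 * EPS0
E_ext = 1.0e5
D_gibbs = golden_min(lambda D: u_gibbs(D, E_ext, eps), -1e-5, 1e-5)
D_field = golden_min(lambda D: u_field(D, eps), -1e-5, 1e-5)

print(D_gibbs / (eps * E_ext))                               # close to 1
print(u_gibbs(D_gibbs, E_ext, eps) / (eps * E_ext**2 / 2))   # close to -1
print(D_field)                                               # close to 0
```

Minimizing the field energy alone drives D to zero, while the Gibbs energy picks out D = εE_ext, with the equilibrium value u_G = −εE_ext²/2.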


28 See, e.g., CM Sec. 1.4, in particular Eq. (1.41). Note that Eq. (78) clearly illustrates, once again, that the difference between the potential energies U_G and U, usually discussed in courses of thermodynamics and statistical physics as the difference between the Gibbs and Helmholtz free energies (see, e.g., SM Sec. 1.4), is more general than the effects of random thermal motion addressed by these disciplines.
Now note that Eq. (80), at this equilibrium point (only!), may be rewritten as

$$\frac{U_G}{V} = \frac{D^2}{2\varepsilon} - \frac{D^2}{\varepsilon} = -\frac{D^2}{2\varepsilon}, \tag{3.81}$$


i.e. formally coincides with Eq. (79), besides the (perhaps, somewhat counter-intuitive) opposite sign. A similar but more general relation (not limited to linear dielectrics and uniform fields) may be obtained by taking the variation of the u_G expressed by Eq. (78), and then using Eq. (76):

$$\delta u_G = \delta u - \delta(\mathbf{E}\cdot\mathbf{D}) = \mathbf{E}\cdot\delta\mathbf{D} - (\delta\mathbf{E}\cdot\mathbf{D} + \mathbf{E}\cdot\delta\mathbf{D}) = -\mathbf{D}\cdot\delta\mathbf{E}. \tag{3.82}$$
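To see how this expression works, let us plug in D from Eq. (33):

$$\delta u_G = -(\varepsilon_0\mathbf{E} + \mathbf{P})\cdot\delta\mathbf{E} = -\delta\!\left(\frac{\varepsilon_0 E^2}{2}\right) - \mathbf{P}\cdot\delta\mathbf{E}. \tag{3.83}$$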


So far, this relation is general. In the particular case when the polarization P is field-independent, we may integrate Eq. (83) over the full electric field's variation, say from 0 to some finite value E, getting

$$u_G = -\frac{\varepsilon_0 E^2}{2} - \mathbf{P}\cdot\mathbf{E}. \tag{3.84}$$
Again, the Gibbs energy is relevant only if E is dominated by an external field E_ext independent of the orientation of P. If, in addition, P(r) ≠ 0 only in some finite volume V, we may integrate Eq. (84) over that volume, getting

$$U_G = -\mathbf{p}\cdot\mathbf{E}_\text{ext} + \text{const}, \qquad \text{with} \quad \mathbf{p} = \int_V \mathbf{P}(\mathbf{r})\, d^3r, \tag{3.85}$$

where the “const” means the terms independent of p. In this expression, we may readily recognize Eq. (15a) for an electric dipole p of a fixed magnitude, which was obtained in Sec. 1 in a different way. This comparison illustrates again that U_G is nothing mysterious; it is just the relevant part of the potential energy of the system in a fixed external field, including the energy of its interaction with the field.


Finally, in the other important case of a linear dielectric, when according to Eqs. (45) and (47), P = (ε − ε₀)E, the similar integration of the general Eq. (83) over the field yields the additional factor 1/2:

$$U_G = -\frac{1}{2}\int_V \mathbf{P}\cdot\mathbf{E}_\text{ext}\, d^3r + \text{const}. \tag{3.86}$$


This expression may be very convenient for analyses of the forces exerted by electric fields on linear 
dielectric media; see, for example, a few exercises on this topic, offered at the end of this chapter.
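The origin of the factor 1/2 in Eq. (86) can be seen by integrating Eq. (83) numerically along a path of increasing field. The sketch below assumes collinear E and P (so a single Cartesian component suffices), SI units, and illustrative values of the final field, of the fixed polarization P₀, and of κ, none of which come from the text. It compares the result with Eq. (84) for a fixed P, and with the density −ε₀E²/2 − P·E/2 for a linear dielectric.

```python
# Integrate du_G = -(EPS0*E + P(E)) dE, Eq. (3.83), from 0 to E_f
# for one Cartesian component (E and P assumed collinear).

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def gibbs_density(E_f, P_of_E, n=1000):
    """Composite Simpson rule for u_G(E_f) = -int_0^E_f (EPS0*E + P(E)) dE.

    n is the number of subintervals and must be even.
    """
    h = E_f / n
    f = lambda E: -(EPS0 * E + P_of_E(E))
    s = f(0.0) + f(E_f)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(k * h)
    return s * h / 3


E_f = 2.0e5           # assumed final field, V/m
P0 = 1.0e-6           # assumed field-independent polarization, C/m^2
eps = 5.0 * EPS0      # assumed linear dielectric, kappa = 5

u_fixed = gibbs_density(E_f, lambda E: P0)
u_linear = gibbs_density(E_f, lambda E: (eps - EPS0) * E)

P_f = (eps - EPS0) * E_f                              # final polarization
print(u_fixed, -EPS0 * E_f**2 / 2 - P0 * E_f)         # Eq. (3.84): no factor 1/2
print(u_linear, -EPS0 * E_f**2 / 2 - P_f * E_f / 2)   # factor 1/2, cf. Eq. (3.86)
```

For a fixed P the polarization term grows linearly with the final field, while for P ∝ E it grows quadratically, which is where the factor 1/2 comes from.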


3.6. Exercise problems 


3.1. Prove Eqs. (3)-(4), starting from Eqs. (1.38) and (3.2). 


3.2. A plane thin ring of radius R is charged with a constant linear density λ. Calculate the exact electrostatic potential distribution along the symmetry axis of the ring, and prove that at large distances, r ≫ R, the three leading terms of its multipole expansion are indeed correctly described by Eqs. (3)-(4).
3.3. In suitable reference frames, calculate the dipole and quadrupole moments of the following
systems (see the figures below): 


(i) four point charges of the same magnitude, but alternating signs, placed in the corners of a 
square; 

(ii) a similar system, but with a pairwise alternation of the charge signs; and

(iii) a point charge in the center of a thin ring carrying a similar but opposite charge, uniformly 
distributed along its circumference. 


[Figure: the three systems (i), (ii), and (iii); in (i) and (ii), charges ±q sit at the corners of a square of side a.]


3.4. Calculate the dipole and quadrupole moments of a thin spherical shell of radius R, carrying an electric charge with areal density σ = σ₀cos θ. Discuss the relation between the results and the solution of Problem 2.28.


3.5. For a regular cubic lattice of similarly oriented identical dipoles, calculate the electric field it 
creates at the location of each dipole. 


3.6. Without carrying out an exact calculation, can you predict the spatial dependence of the 
interaction between various electric multipoles, including point charges (in this context, frequently 
called electric monopoles), dipoles, and quadrupoles? Based on these predictions, what is the functional 
dependence of the interaction between diatomic molecules such as H₂, N₂, O₂, etc., on the distance
between them, if the distance is much larger than the molecular size? 


3.7. Two similar electric dipoles, of fixed magnitude p, located at a fixed distance r from each 
other, are free to change their directions. What stable equilibrium position(s) may they take as a result of
their electrostatic interaction? 


3.8. An electric dipole is located above an infinite, grounded conducting plane (see the figure on the right). Calculate:

(i) the distribution of the induced charge in the conductor,
(ii) the dipole-to-plane interaction energy, and
(iii) the force and the torque exerted on the dipole.


3.9. Calculate the net charge Q induced in a grounded conducting sphere 
of radius R by a dipole p located at point r outside the sphere — see the figure on 
the right. 
3.10. Use two different approaches to calculate the energy of interaction between a grounded
conductor and an electric dipole p placed in the center of a spherical cavity of radius R, carved in the 
conductor. 


3.11. A plane separating two parts of otherwise free space is densely and uniformly (with a 
constant areal density n) covered with electric dipoles, with similar moments p oriented normally to the 
plane. 


(i) Use two different approaches to calculate the electrostatic potential at distances d ≫ n^(−1/2) on both sides of the plane.

(ii) Give a physical interpretation of your result.

(iii) Use the result to calculate the potential distribution created in space by a spherical surface of radius R, densely and uniformly covered with radially-oriented dipoles.


3.12. Prove Eq. (24). 


Hint: You may like to use the basic Eq. (1.9) to spell out the left-hand side of Eq. (24), change
the order of integration over r and r’, and then contemplate the physical sense of the inner integral. 


3.13. A sphere of radius R is made of a material with a uniform spontaneous polarization P₀.
Calculate the electric field everywhere in space — both inside and outside the sphere, and compare the 
result for the internal field with Eq. (24). 


3.14. Calculate the electric field at the center of a cube made of a material with the uniform spontaneous polarization P₀ of arbitrary orientation.


3.15. Derive the Clausius-Mossotti formula (52), using Eq. (24) for an approximate evaluation of 
the field created by an elementary atomic/molecular dipole. 


3.16. Stand-alone charge Q is distributed, in some way, in the volume of a body made of a uniform linear dielectric with a dielectric constant κ. Calculate the total polarization charge Q_ef residing on the surface of the body, provided that it is surrounded by free space.


3.17. In two separate experiments, a thin plane sheet of a linear dielectric with κ = const is placed into a uniform external electric field E₀:

(i) with the sheet's surfaces parallel to the electric field, and
(ii) with the surfaces normal to the field.


For each case, find the electric field E, the electric displacement D, and the polarization P inside the 
dielectric (sufficiently far from the sheet’s edges). 


3.18. A fixed dipole p is placed in the center of a spherical cavity of radius R, carved inside a 
uniform, linear dielectric. Calculate the electric field distribution everywhere in the system (both at r < R and at r > R).

Hint: You may start with the assumption that the field at r > R has a distribution typical for a 
dipole (but be ready for surprises :-). 
3.19. A spherical capacitor (see the figure on the right) is filled with a linear dielectric whose permittivity ε depends on the spherical angles θ and φ, but not on the distance r from the system's center. Derive an explicit expression for its capacitance C.

3.20. A spherical capacitor similar to that considered in the previous problem is now filled with a 
linear dielectric whose permittivity depends only on the distance from the center. Obtain a closed 
expression for its capacitance, and spell it out for the particular case ε(r) = ε(a)(r/a)^ν.


3.21. A uniform electric field E₀ has been created (by distant external sources) inside a uniform linear dielectric. Find the change of the electric field created by carving out a cavity in the shape of a round cylinder of radius R, with its axis normal to the external field (see the figure on the right).


3.22. Similar small spherical particles, made of a linear dielectric, are dispersed in free space with a low concentration n ≪ 1/R³, where R is the particle's radius. Calculate the average dielectric constant of such a medium. Compare the result with the apparent but wrong answer

$$\langle\kappa\rangle - 1 = (\kappa - 1)nV \qquad \text{(WRONG!)}$$

(where κ is the dielectric constant of the particle's material and V = (4π/3)R³ is its volume), and explain the origin of the difference.


3.23. A straight thin filament, uniformly charged with linear density λ, is stretched parallel to the
plane separating two uniform linear dielectrics, at a distance d from it. Calculate the electric potential’s 
distribution everywhere in the system. 


3.24. A point charge q is located at a distance d > R from the center of a sphere of radius R, made of a uniform linear dielectric with permittivity ε.

(i) Calculate the electrostatic potential's distribution in all space for an arbitrary ratio d/R.
(ii) For large d/R, use two different approaches to calculate the interaction force and the energy of interaction between the sphere and the charge, in the first nonzero approximation in R/d ≪ 1.


3.25. Calculate the spatial distribution of the electrostatic potential induced by 
a point charge q placed at distance d from a very wide parallel plate, of thickness D, 
made of a uniform linear dielectric — see the figure on the right. 
3.26. Discuss the physical nature of Eq. (76). Apply your conclusions to a material with a fixed
(field-independent) polarization P₀, and calculate the electric field's energy of a uniformly polarized
sphere (see Problem 13). 


3.27. Use Eqs. (73) and (82) to calculate the force of attraction of plane capacitor’s plates (per 
unit area), for two cases: 


(i) the capacitor is charged to voltage V, and then disconnected from the battery,29 and
(ii) the capacitor remains connected to the battery.


3.28. A slab made of a linear dielectric is partly inserted into a plane capacitor (see the figure on the right). Assuming the simplest (cylindrical) geometry of the system, calculate the force exerted by the field on the slab, for the same two cases as in the previous problem.

[Figure: slab partly inserted into the gap d between the plates; labels x, a − x, and force F.]


3.29. For each of the two capacitors shown in Fig. 10, calculate the electric force exerted on the 
interface between two different dielectrics, in terms of the fields in the system. 


3.30. One half of a conducting sphere of radius R, carrying electric charge Q, is submerged into a half-space filled with a linear dielectric with permittivity ε (see the figure on the right). Calculate the electric force exerted on the sphere.


29 “Battery” is a common if misleading term for what is usually a single galvanic element. (The last term stems 
from the name of Luigi Galvani, a pioneer of electric current studies. Another term derived from his name is the 
galvanic connection, meaning a direct connection of two conductors, enabling a dc current flow — see the next 
chapter.) The term “battery” had to be, in all fairness, reserved for the connection of several galvanic elements in 
series, as was pioneered in 1800 by L. Galvani's friend Alessandro Volta.
Chapter 4. DC Currents


The goal of this chapter is to discuss the distribution of stationary (“dc”) currents in conducting samples 
and their “global” characteristics such as resistance. In the most important case of linear (“Ohmic’’) 
conductivity, the current distribution is governed by the same Laplace and Poisson equations whose 
solution methods were discussed in detail in the previous chapters. Because of that, we can piggyback 
on most approaches discussed earlier, enabling me to keep this chapter rather brief. 


4.1. Continuity equation and the Kirchhoff laws 


Until this point, our discussion of conductors has been limited to the cases when they are 
separated with insulators (meaning either the free space or some dielectric media), preventing any 
continuous motion of charges from one conductor to another, even if there is a non-zero voltage (and 
hence electric field) between them — see Fig. 1a. 


Fig. 4.1. Two oppositely charged conductors: (a) in the electrostatic situation, (b) at the charge relaxation through an additional narrow conductor (“wire”), and (c) in a system sustaining a dc current I.


Now let us connect the two conductors with a wire — a thin, elongated conductor (Fig. 1b). Then 
the electric field causes the motion of charge carriers in the wire, from the conductor with a higher 
electrostatic potential toward that with lower potential, until the potentials equilibrate. Such a process is 
called charge relaxation. The main equation governing this process may be obtained from the 
fundamental experimental fact (already mentioned in Sec. 1.1) that electric charges cannot appear or 
disappear — though opposite charges may recombine with the conservation of the net charge. As a result, 
the charge Q in a conductor may change only due to the electric current I through the wire: 

$$\frac{dQ}{dt} = -I(t); \tag{4.1}$$

this relation may be understood as the definition of the current.1


1 Just as a (hopefully, unnecessary :-) reminder, in the SI units the current is measured in amperes (A). In legal metrology, the ampere (rather than the coulomb, which is defined as 1 C = 1 A × 1 s) is a primary unit. (Its formal definition will be discussed in the next chapter.) In the Gaussian units, Eq. (1) remains the same, so that the current's unit is the statcoulomb per second, the so-called statampere.
Let us express Eq. (1) in a differential form, introducing the notion of the current density j(r). This vector may be defined via the following relation for the elementary current dI crossing an elementary area dA (Fig. 2):

$$dI = j\,dA\cos\theta \equiv (j\cos\theta)\,dA \equiv j_n\,dA, \tag{4.2}$$


where θ is the angle between the direction normal to the surface and the charge carrier motion direction, which is taken for the direction of the vector j.

Fig. 4.2. The current density vector j.


With that definition, Eq. (1) may be rewritten as

$$\frac{d}{dt}\int_V \rho\, d^3r = -\oint_S j_n\, d^2r, \tag{4.3}$$


where V is an arbitrary but stationary volume limited by the closed surface S. Applying to this volume 
the same divergence theorem as was repeatedly used in previous chapters, we get 


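$$\int_V \left(\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j}\right) d^3r = 0. \tag{4.4}$$

Since the volume V is arbitrary, this equation may be true only if

$$\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j} = 0. \tag{4.5}$$

This is the fundamental continuity equation, which is true even for time-dependent phenomena.2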
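The integral form (4.3) is exactly what conservative (“finite-volume”) numerical schemes discretize: each cell's charge changes only by the currents through its faces, so the total charge is conserved to rounding error whatever the currents are. The sketch below is a minimal one-dimensional illustration under assumptions not taken from the text: a periodic grid, and a current j = vρ of carriers drifting with a constant velocity v > 0, evaluated with the upwind rule. The function name is illustrative.

```python
import math


def continuity_step(rho, v, dx, dt):
    """One flux-form update of d(rho)/dt + d(j)/dx = 0 on a periodic 1D grid.

    j_face[i] is the current through the right face of cell i; with the
    (assumed) drift current j = v*rho and v > 0, the upwind rule takes rho
    from the cell on the left of each face.
    """
    n = len(rho)
    j_face = [v * rho[i] for i in range(n)]
    # Cell i gains charge through its left face (j_face[i-1]) and loses it
    # through its right face (j_face[i]); index -1 wraps around (periodic).
    return [rho[i] - dt / dx * (j_face[i] - j_face[i - 1]) for i in range(n)]


n_cells, length = 200, 1.0
dx = length / n_cells
v = 1.0
dt = 0.4 * dx / v                     # Courant number 0.4 keeps upwind stable
rho = [math.exp(-(((i + 0.5) * dx - 0.5) ** 2) / 0.005) for i in range(n_cells)]

Q_start = sum(rho) * dx
for _ in range(500):
    rho = continuity_step(rho, v, dx, dt)
Q_end = sum(rho) * dx

print(Q_start, Q_end)   # equal up to floating-point rounding
```

The telescoping of face currents in the sum over cells is the discrete counterpart of applying the divergence theorem to Eq. (4.5).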


The charge relaxation, illustrated by Fig. 1b, is of course a dynamic, time-dependent process. 
However, electric currents may also exist in stationary situations, when a certain current source, for 
example a battery, drives the current against the electric field, and thus replenishes the conductor 
charges and sustains currents at a certain time-independent level — see Fig. 1c. (This process requires a 
persistent replenishment of the electrostatic energy of the system from either a source or a large storage 
of energy of a different kind — say, the chemical energy of the battery.) Let us discuss the laws 
governing the distribution of such dc currents. In this case (∂/∂t = 0), Eq. (5) reduces to a very simple
equation 

$$\nabla\cdot\mathbf{j} = 0. \tag{4.6}$$


This relation acquires an even simpler form in the particular but important case of dc electric 
circuits (Fig. 3) — the systems that may be fairly represented as direct (“galvanic”) connections of 
components of two types: 


2 Similar differential relations are valid for the density of any conserved quantity, for example for mass in 
classical dynamics (see, e.g., CM Sec. 8.3), and for the probability, as it is defined in statistical physics (SM Sec. 
5.6) and in quantum mechanics (QM Sec. 1.4). 
(i) relatively-small-size (lumped) circuit elements, meaning either a passive resistor, or a current source, etc.; generally, any “black box” with two or more terminals, and

(ii) perfectly conducting wires, with a negligible drop of the electrostatic potential along them, that are galvanically connected at certain points called nodes (or “junctions”).


Fig. 4.3. A typical system obeying the Kirchhoff laws.


In the standard circuit theory, the electric charges of the nodes are considered negligible,3 and we may integrate Eq. (6) over the closed surface drawn around any node to get a simple equality

$$\sum_j I_j = 0, \tag{4.7a}$$

where the summation is over all the wires (numbered with index j) connected in the node. On the other hand, according to its definition (2.25), the voltage V_k across each circuit element may be represented as the difference of the electrostatic potentials of the adjacent nodes, V_k = φ_k − φ_k′. Summing such differences around any closed loop of the circuit (Fig. 3), we get all terms canceled, so that

$$\sum_k V_k = 0. \tag{4.7b}$$


These relations are called, respectively, the 1st and 2nd Kirchhoff laws,4 or sometimes the node rule (7a) and the loop rule (7b). They may seem elementary, but their genuine power is in the mathematical fact that any set of Eqs. (7) covering every node and every circuit element of the system at least once gives a system of equations sufficient for the calculation of all currents and voltages in it, provided that the relation between the current and voltage is known for each circuit element.


It is almost evident that in the absence of current sources, the system of equations (7) has only the trivial solution, I_j = 0 and V_k = 0, with the exotic exception of superconductivity, to be discussed in Sec. 6.3. The current sources that allow non-zero current flows may be described by their electromotive forces (emf) ℰ_k, having the dimensionality of voltage, which have to be taken into account in the corresponding terms V_k of the sum (7b). Let me hope that the reader has some experience of using Eqs. (7) for analyses of simple circuits (say, consisting of several resistors and batteries), so that I can save our time by skipping their discussion. Still, due to their practical importance, I would recommend the reader to carry out a self-test by solving a couple of problems offered at the beginning of Sec. 6.
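For readers who want to see how Eqs. (7) turn into a solvable linear system, here is a minimal nodal-analysis sketch. It assumes, beyond the text, ideal resistors obeying I = V/R and an ideal battery (no internal resistance) that simply fixes the potentials of its two terminal nodes. The node rule (7a) is then written for every other node, while the loop rule (7b) is satisfied automatically, because every voltage is expressed as a potential difference. The helper names and the example network are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def node_potentials(resistors, fixed):
    """Node potentials of a resistor network (hypothetical helper).

    resistors: list of (node_a, node_b, R) for Ohmic resistors, I = V/R;
    fixed: {node: potential} for nodes tied to ideal sources (battery
    terminals). The node rule (4.7a) is imposed at every other node.
    """
    free = sorted({n for a, b, _ in resistors for n in (a, b)} - set(fixed))
    idx = {n: i for i, n in enumerate(free)}
    G = [[0.0] * len(free) for _ in free]
    rhs = [0.0] * len(free)
    for a, b, R in resistors:
        g = 1.0 / R
        for p, q in ((a, b), (b, a)):
            if p not in idx:
                continue
            # Current leaving node p through this resistor: g*(phi_p - phi_q).
            G[idx[p]][idx[p]] += g
            if q in idx:
                G[idx[p]][idx[q]] -= g
            else:
                rhs[idx[p]] += g * fixed[q]
    phi = dict(fixed)
    phi.update(zip(free, solve_linear(G, rhs)))
    return phi


emf = 10.0  # assumed battery emf, V (battery between nodes "g" and "a")
net = [("a", "1", 1.0), ("1", "g", 1.0),
       ("a", "2", 2.0), ("2", "g", 2.0),
       ("1", "2", 5.0)]           # a Wheatstone bridge, resistances in ohms
phi = node_potentials(net, {"a": emf, "g": 0.0})
I_battery = sum((phi["a"] - phi[q]) / R for p, q, R in net if p == "a")
print(phi["1"], phi["2"], I_battery)   # balanced bridge: about 5, 5, 7.5
```

The example bridge is balanced (1:1 and 2:2 arms), so both midpoints sit at ℰ/2 and no current flows through the 5 Ω link, which makes the result easy to check by hand.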


3 In many cases, the charge accumulation/relaxation may be described without an explicit violation of Eq. (7a), just by adding other circuit elements, lumped capacitors (see Fig. 2.5 and its discussion), to the circuit under analysis. The resulting circuit may be used to describe not only the transient processes but also periodic ac currents. However, it is convenient for me to postpone the discussion of such ac circuits until Chapter 6, where one more circuit element type, lumped inductances, will be introduced.

4 Named after Gustav Kirchhoff (1824-1887) — who also suggested the differential form (8) of the Ohm law. 
4.2. The Ohm law


As was mentioned above, the relations spelled out in Sec. 1 are sufficient for forming a closed system of equations for finding electric current and field in a system only if they are complemented with some constitutive relations between the scalars I and V in each lumped circuit element, or alternatively between the macroscopic (atomic-scale-averaged) vectors j and E at each point of the material of such an element. The simplest of such relations is the famous Ohm law, whose differential (or “local”) form is

$$\mathbf{j} = \sigma\mathbf{E}, \tag{4.8}$$

where σ is a constant called the Ohmic conductivity (or just the “conductivity” for short).5 Though the Ohm law (discovered, in its simpler form, by Georg Simon Ohm in 1827) is a constitutive rather than fundamental relation, and is approximate for any conducting medium, we can argue that if:


(i) the medium carries no current at E = 0 (mind superconductors!),

(ii) the medium is isotropic or virtually isotropic (a notable exception: some organic conductors),

(iii) the mean free path l of the current carriers (the notion to be discussed in detail in SM Ch. 6) in this medium is much smaller than the characteristic scale a of the spatial variations of j and E,


then the law may be viewed as the leading, linear term of the Taylor expansion of the local relation j(E), 
and thus is general for relatively low fields. 


Table 1 gives approximate experimental values of σ for some representative (and/or practically important) materials. Note that the range of these values is very broad, even without going to such extremes as very pure metallic crystals at very low temperatures, where σ may reach ~10¹² S/m.


Table 4.1. Ohmic dc conductivities for some materials at 20°C.

Material                                  σ (S/m)
Teflon (PTFE, [C₂F₄]ₙ)                    10⁻²²–10⁻²⁴
Silicon dioxide                           10⁻¹⁶–10⁻¹⁹
Various glasses                           10⁻¹⁰–10⁻¹⁴
Deionized water                           ~10⁻⁶
Seawater                                  5
Silicon n-doped to 10¹⁶ cm⁻³              2.5×10²
Silicon n-doped to 10¹⁹ cm⁻³              1.6×10⁴
Silicon p-doped to 10¹⁹ cm⁻³              1.1×10⁴
Nichrome (alloy 80% Ni + 20% Cr)          0.9×10⁶
Aluminum                                  3.8×10⁷
Copper                                    6.0×10⁷
Zinc crystal along a-axis                 1.65×10⁷
Zinc crystal along c-axis                 1.72×10⁷


5 In SI units, the conductivity is measured in S/m, where one siemens (S) is the reciprocal of the ohm: 1 S = (1 Ω)⁻¹ = 1 A/1 V. The constant reciprocal to conductivity, 1/σ, is called resistivity and is commonly denoted by the letter ρ. I will, however, try to avoid using this notion, because in these notes this letter is already overused.
In order to get a better feeling of what these values mean, let us consider a very simple system (Fig. 4): a plane capacitor of area A ≫ d², filled with a material that has not only a dielectric constant κ, but also some Ohmic conductivity σ, with much more conductive electrodes.


Fig. 4.4. A “leaky” plane capacitor. 


Assuming that these properties are compatible with each other,6 we may take the distribution of the electric potential (not too close to the capacitor's edges) to still obey Eq. (2.39), so that the electric field is normal to the electrode surfaces and uniform, with E = V/d. Then, according to Eq. (8), the current density is also uniform, j = σE = σV/d. From here, the total current between the plates is

$$I = jA = \sigma EA = \sigma\frac{A}{d}V. \tag{4.9}$$


On the other hand, from Eqs. (2.26) and (3.45), the instantaneous value of the total charge of the top electrode is Q = CV = (κε₀A/d)V. Plugging these relations into Eq. (1), we see that the speed of charge (and voltage) relaxation is independent of the geometric parameters A and d of the capacitor:

$$\frac{dV}{dt} = -\frac{V}{\tau_r}, \qquad \text{with} \quad \tau_r = \frac{C}{\sigma A/d} = \frac{\kappa\varepsilon_0}{\sigma}, \tag{4.10}$$

so that the relaxation time constant τ_r may be used to characterize the gap-filling material as such.


As we already know (see Table 3.1), for most practical materials the dielectric constant κ is within one order of magnitude from 10, so that the numerator in the second of Eqs. (10) is of the order of 10⁻¹⁰ (SI units). As a result, according to Table 1, the charge relaxation time ranges from ~10¹⁴ s (more than a million years!) for the best insulators like Teflon (polytetrafluoroethylene, PTFE),7 to ~10⁻¹⁸ s for the least resistive metals. What is the physics behind such a huge range of σ, and why, for some materials, does Table 1 give it with such a large uncertainty? As in Chapters 2 and 3, in this course I have time only for a brief, admittedly superficial discussion of these issues.8
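The geometry independence expressed by Eq. (10), and the range of relaxation times quoted above, can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: κ = 10 for every material (the text's order-of-magnitude estimate, not actual dielectric constants), and the Teflon conductivity taken at the low end of its range in Table 1. The function name is illustrative.

```python
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
YEAR = 3.156e7            # seconds in a year (approximately)


def relaxation_time(kappa, sigma, area, gap):
    """tau_r = C / G for a leaky plane capacitor (Fig. 4.4).

    C = kappa*EPS0*area/gap, and G = I/V = sigma*area/gap from Eq. (4.9).
    """
    C = kappa * EPS0 * area / gap
    G = sigma * area / gap
    return C / G


# The geometry drops out: two very different capacitors, same material.
tau_small = relaxation_time(10, 1e-10, area=1e-4, gap=1e-6)
tau_large = relaxation_time(10, 1e-10, area=1.0, gap=1e-3)

# Extremes of Table 4.1, with the rough estimate kappa ~ 10 used in the text.
tau_teflon = relaxation_time(10, 1e-24, area=1.0, gap=1.0)
tau_copper = relaxation_time(10, 6.0e7, area=1.0, gap=1.0)

print(tau_small, tau_large)          # both about 0.89 s
print(tau_teflon / YEAR, "years")    # a few million years
print(tau_copper, "s")               # of the order of 1e-18 s
```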


If the charge carriers move almost as classical particles (e.g., in plasmas or non-degenerate semiconductors), a very reasonable description of the conductivity is given by the famous Drude formula.9 In this picture, a weak electric field accelerates the charge carriers in its direction (on top of their random motion in all directions, with the average velocity vector equal to zero):

$$\frac{d\mathbf{v}}{dt} = \frac{q}{m}\mathbf{E}, \tag{4.11}$$


and as a result, their velocity acquires the average value 


6 As will be discussed in Chapter 6, this is true only if σ is not too high.

7 This polymer is broadly used in engineering and physical experiment, due to its many remarkable properties. 
8 A more detailed discussion of this issue may be found in SM Chapter 6. 

9 It was suggested by Paul Drude in 1900. 
$$\langle\mathbf{v}\rangle = \frac{d\mathbf{v}}{dt}\,\tau = \frac{q}{m}\mathbf{E}\,\tau, \tag{4.12}$$


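where the phenomenological parameter τ = l/2v (not to be confused with τ_r!) may be understood as half the average time between carrier scattering events. From here, we get the current density:10

$$\mathbf{j} = qn\langle\mathbf{v}\rangle = \frac{q^2 n\tau}{m}\mathbf{E}, \qquad \text{so that} \quad \sigma = \frac{q^2 n\tau}{m}. \tag{4.13a}$$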


(Notice the independence of σ from the charge sign.) Another form of the same result, more popular in the physics of semiconductors, is

$$\sigma = qn\mu, \qquad \text{with} \quad \mu \equiv \frac{q\tau}{m}, \tag{4.13b}$$

where the parameter μ, defined by the relation ⟨v⟩ = μE, is called the charge carrier mobility.
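Inverting Eqs. (13) gives a feel for the numbers. The sketch below is an order-of-magnitude estimate only. It assumes a single carrier type with the free-electron mass (a rough assumption for metals, which, being degenerate Fermi gases, call for the quantum treatment discussed next), a copper carrier density n ~ 10²³ cm⁻³ (the typical metallic value quoted in the next paragraph), and σ values from Table 1. For doped silicon, only the mobility is computed, since it does not require the carrier mass.

```python
Q_E = 1.602176634e-19    # elementary charge, C
M_E = 9.1093837015e-31   # free-electron mass, kg (an assumption for metals)


def drude_tau(sigma, n, q=Q_E, m=M_E):
    """Scattering time from Eq. (4.13a), sigma = q^2 n tau / m."""
    return m * sigma / (q**2 * n)


def mobility(sigma, n, q=Q_E):
    """Mobility from Eq. (4.13b), sigma = q n mu."""
    return sigma / (q * n)


# Copper: sigma from Table 4.1, n ~ 1e23 cm^-3 = 1e29 m^-3 (order of magnitude).
tau_cu = drude_tau(6.0e7, 1e29)
mu_cu = mobility(6.0e7, 1e29)

# Silicon n-doped to 1e16 cm^-3 = 1e22 m^-3, sigma from Table 4.1.
mu_si = mobility(2.5e2, 1e22)

print(f"copper:  tau ~ {tau_cu:.1e} s, mu ~ {mu_cu * 1e4:.0f} cm^2/(V s)")
print(f"silicon: mu ~ {mu_si * 1e4:.0f} cm^2/(V s)")
```

The two functions are consistent by construction: μ = qτ/m, so σ = qnμ and σ = q²nτ/m give the same conductivity.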


Most good conductors (e.g., metals) are essentially degenerate Fermi gases (or liquids), in which the average thermal energy of a particle, k_BT, is much lower than the Fermi energy ε_F. In this case, a quantum theory is needed for the calculation of σ. Such a theory was developed by the quantum physics' godfather A. Sommerfeld in 1927 (and is sometimes called the Drude-Sommerfeld model). I have no time to discuss it in this course,11 and here will only notice that for a nearly-ideal, isotropic Fermi gas the result is reduced to Eq. (13), with a certain effective value of τ, so that it may be used for estimates of σ, with due respect to the quantum theory of scattering. In a typical metal, n is very high (~10²³ cm⁻³) and is fixed by the atomic structure, so that the sample quality may only affect σ via the scattering time τ.


At room temperature, the scattering of electrons by thermally-excited lattice vibrations (phonons) dominates, so that τ and σ are high but finite, and do not change much from one sample to another. (Hence the relatively accurate values given for metals in Table 1.) On the other hand, at T → 0, quantum mechanics says a perfect crystal should not exhibit scattering at all, and its conductivity should be infinite. In practice, this is never true (for one, due to electron scattering from imperfect boundaries
of finite-size samples), and the effective conductivity σ is infinite (or practically infinite, at least above the largest measurable values ~10²⁵ S/m) only in superconductors.12


On the other hand, the conductivity of quasi-insulators (including deionized water) and semiconductors depends mostly on the carrier density n, which is much lower than in metals. From the point of view of quantum mechanics, this happens because the ground-state wavefunctions of charge carriers are localized within an atom (or molecule), and their energies are separated from those of excited states, with space-extended wavefunctions, by a large energy gap, often called the bandgap. For example, in SiO₂ the bandgap approaches 9 eV, equivalent to ~10⁵ K. This is why even at room temperature the density of thermally-excited free charge carriers in good insulators is negligible. In these materials, n is determined by impurities and vacancies, and may depend on a particular chemical synthesis or other fabrication technology, rather than on the fundamental properties of the material. (On the contrary, the carrier mobility μ in these materials is almost technology-independent.)
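A two-line check of the energy scale: the sketch converts the 9 eV gap quoted above into a temperature and evaluates the Boltzmann factor exp(−E_g/k_BT) at T = 300 K. Using this factor as a proxy for how rarely thermal fluctuations reach the gap is a rough, illustrative assumption; the actual carrier density depends on details (band structure, impurities) not covered here.

```python
import math

K_B = 8.617333262e-5     # Boltzmann constant, eV/K

E_gap = 9.0              # SiO2 bandgap quoted in the text, eV
T_equiv = E_gap / K_B    # temperature at which k_B*T equals the gap
boltzmann_300K = math.exp(-E_gap / (K_B * 300.0))

print(f"E_gap/k_B ~ {T_equiv:.2e} K")
print(f"exp(-E_gap/(k_B*300 K)) ~ {boltzmann_300K:.1e}")
```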


10 Note that j in Eq. (8) is defined as an already macroscopic variable, averaged over inter-particle distances, so 
that no additional average sign is necessary in the first of Eqs. (13a). 

11 For such a discussion see, e.g., SM Sec. 6.3.

12 The electrodynamic properties of superconductors are so interesting (and fundamentally important) that I will 
discuss them in more detail in Chapter 6. 
The practical importance of the fabrication technology may be illustrated by the following
example. In the cells of the so-called floating-gate memories, in particular the flash memories, which
currently dominate the nonvolatile digital memory technology, data bits are stored as small electric
charges ($Q \sim 10^{-16}$ C $\sim 10^3 e$) of highly doped silicon islands (so-called floating gates) separated from
the rest of the integrated circuit with ~10-nm-thick layers of silicon dioxide, SiO2. Such layers are
fabricated by high-temperature oxidation of virtually perfect silicon crystals. The conductivity of the
resulting high-quality (though amorphous) material is so low, $\sigma \sim 10^{-19}$ S/m, that the relaxation time $\tau_r$,
defined by Eq. (10), is well above 10 years — the industrial standard for data retention in nonvolatile
memories. To appreciate how good this technology is, the cited value should be compared with the
typical conductivity $\sigma \sim 10^{-16}$ S/m of the usual, bulk SiO2 ceramics.¹³
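The retention estimate can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions. We take the charge relaxation time in the standard form $\tau_r = \varepsilon/\sigma$ (our reading of Eq. (10), which is not reproduced in this section), a dielectric constant $\kappa \approx 3.9$ for thermal SiO2, and the order-of-magnitude conductivity quoted in the text.

```python
EPS0 = 8.8541878128e-12                 # vacuum permittivity, F/m
KAPPA_SIO2 = 3.9                        # assumed dielectric constant of thermal SiO2
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def relaxation_time(kappa, sigma):
    """Charge relaxation time tau_r = kappa*eps0/sigma, in seconds (assumed form)."""
    return kappa * EPS0 / sigma

def max_conductivity(kappa, retention_s):
    """Largest conductivity (S/m) that still gives tau_r >= retention_s."""
    return kappa * EPS0 / retention_s

tau_years = relaxation_time(KAPPA_SIO2, 1e-19) / SECONDS_PER_YEAR
sigma_10yr = max_conductivity(KAPPA_SIO2, 10 * SECONDS_PER_YEAR)
```

Under these assumptions, a conductivity even slightly above roughly $1.1\times10^{-19}$ S/m would bring $\tau_r$ below the ten-year mark, which illustrates how demanding the retention requirement is.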


To conclude this section, let me note that the Ohm law, for all its importance, is not a universal
law of nature. As a reminder of this fact, in Sec. 5 below I describe two very simple systems (leaving
their analysis for the reader’s exercise) whose I-V relation is nonlinear even for very small currents.


4.3. Boundary problems 


For an Ohmic conducting medium, we may combine Eqs. (6) and (8) to get the following
differential equation:
$$\nabla\cdot(\sigma\nabla\phi) = 0. \tag{4.14}$$


For a uniform conductor ($\sigma$ = const), Eq. (14) is reduced to the Laplace equation for the (macroscopic)
electrostatic potential $\phi$. As we already know from Chapters 2 and 3, its solution depends on the
boundary conditions. These conditions, in turn, depend on the interface type.


(i) Conductor-conductor interface. Applying the continuity equation (6) to a Gauss-type pillbox
at the interface of two different conductors (Fig. 5), we get
$$(j_n)_1 = (j_n)_2, \tag{4.15}$$
so that if the Ohm law (8) is valid inside each medium, then
$$\sigma_1\frac{\partial\phi_1}{\partial n} = \sigma_2\frac{\partial\phi_2}{\partial n}. \tag{4.16}$$


Fig. 4.5. DC current’s “refraction” at the interface between 
two different conductors. 


13 This course is not an appropriate platform to discuss details of the floating-gate memory technology. However, 
I think that every educated physicist should know its basics, because such memories are presently the driver of all 
semiconductor integrated circuit technology development, and hence of the whole information technology 
progress. Perhaps the best available general book on this topic is still the relatively old review collection by J. 
Brewer and M. Gill (eds.), Nonvolatile Memory Technologies with Emphasis on Flash, IEEE Press, 2008. 
Also, since the electric field should be finite, its potential $\phi$ has to be continuous across the
interface — the condition that may also be written as
$$\frac{\partial\phi_1}{\partial\tau} = \frac{\partial\phi_2}{\partial\tau}. \tag{4.17}$$
Both these conditions (and hence the solutions of the boundary problems using them) are similar to
those for the interface between two dielectrics — cf. Eqs. (3.46)-(3.47). Note that using the Ohm law, Eq.
(17) may be rewritten as
$$\frac{(j_\tau)_1}{\sigma_1} = \frac{(j_\tau)_2}{\sigma_2}. \tag{4.18}$$
Comparing it with Eq. (15) we see that, generally, the current density’s magnitude changes at the
interface: $j_1 \neq j_2$. It is also curious that if $\sigma_1 \neq \sigma_2$, the current line slope changes at the interface (Fig. 5),
qualitatively similar to the refraction of light rays in optics — see Chapter 7.


(ii) Conductor-electrode interface. An electrode is defined as a body made of a “perfect
conductor”, i.e. of a medium with $\sigma \to \infty$. Then, at a fixed current density at the interface, the electric
field in the electrode tends to zero, and hence it may be described by the equality
$$\phi = \phi_k = \text{const}, \tag{4.19}$$
where constants $\phi_k$ may be different for different electrodes (numbered with index $k$). Note that with
such boundary conditions, the Laplace boundary problem becomes exactly the same as in electrostatics
— see Eq. (2.35) — and hence we can use the methods (and some solutions :-) discussed in Chapter 2 for
finding the dc current distribution.


(iii) Conductor-insulator interface. For the description of a good insulator, we can use the
equality $\sigma = 0$, so that Eq. (16) yields the following boundary condition,
$$\frac{\partial\phi}{\partial n} = 0, \tag{4.20}$$
for the potential derivative inside the conductor. From the Ohm law (8) in the form $\mathbf{j} = -\sigma\nabla\phi$, we see
that this is just the very natural requirement for the dc current not to flow into an insulator. Now note
that this condition makes the Laplace problem inside the conductor completely well-defined, and
independent of the potential distribution in the adjacent insulator. On the contrary, due to the continuity
of the electrostatic potential at the border, its distribution inside the surrounding insulator has to follow
that inside the conductor.
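The refraction of current lines at a conductor-conductor interface, case (i) above, follows directly from Eqs. (15) and (18). Measuring the angle $\theta$ of a current line from the interface normal, $\tan\theta = j_\tau/j_n$, so that $\tan\theta_2/\tan\theta_1 = \sigma_2/\sigma_1$. The minimal sketch below (the function names are ours) evaluates this relation and the resulting change of $|\mathbf{j}|$.

```python
import math

def refracted_angle(theta1_deg, sigma1, sigma2):
    """Angle (degrees, from the normal) of the current line in medium 2.

    Eq. (15) keeps j_n continuous and Eq. (18) keeps j_tau/sigma continuous,
    hence tan(theta2) = (sigma2 / sigma1) * tan(theta1).
    """
    t1 = math.tan(math.radians(theta1_deg))
    return math.degrees(math.atan(sigma2 / sigma1 * t1))

def current_magnitudes(j1, theta1_deg, sigma1, sigma2):
    """Return (|j1|, |j2|) for an incident current density of magnitude j1."""
    th = math.radians(theta1_deg)
    j_n = j1 * math.cos(th)                       # the same on both sides, Eq. (15)
    j_tau2 = j1 * math.sin(th) * sigma2 / sigma1  # rescaled according to Eq. (18)
    return j1, math.hypot(j_n, j_tau2)

theta2 = refracted_angle(45.0, 1.0, 2.0)          # current tilts away from the normal
mags = current_magnitudes(1.0, 45.0, 1.0, 2.0)
```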


Let us discuss this conceptual issue on the following (apparently, trivial) example: dc current in
a uniform wire of length $l$ and a cross-section of area $A$. The reader certainly knows the answer:
$$V = RI, \qquad R = \frac{l}{\sigma A}, \tag{4.21}$$
where the constant $R$ is called the wire’s resistance.¹⁴


14 The first of Eqs. (21) is essentially the (historically, initial) integral form of the Ohm law, and is valid not only
for a uniform wire but also for Ohmic conductors of any geometry in which I and V may be clearly defined.
However, let us derive this result formally from our theoretical framework. For the simple
geometry shown in Fig. 6a, this is easy to do. Here the potential evidently has a linear 1D distribution
$$\phi = \text{const} - \frac{V}{l}z \tag{4.22}$$
both in the conductor and the surrounding free space, with both boundary conditions (17) and (20)
satisfied at the conductor-insulator interfaces, and the condition (19) satisfied at the conductor-electrode
interfaces. As a result, the electric field is constant and has only one Cartesian component, $E_z = V/l$, so
that inside the conductor
$$I = jA = \sigma E_z A = \sigma\frac{V}{l}A, \tag{4.23}$$
giving us the well-known Eq. (21).
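The claim that Eqs. (14), (19) and (20) alone fix the solution inside the conductor can be checked numerically. The sketch below is a plain Gauss-Seidel relaxation written for illustration (the grid size and sweep count are arbitrary choices of ours). It solves the Laplace equation on a rectangular 2D slice of a uniform wire, with electrodes at both ends and insulating side walls, and recovers the linear profile (22) and the uniform field $E_z = V/l$.

```python
def solve_wire(nz=21, ny=6, v=1.0, sweeps=4000):
    """Gauss-Seidel relaxation of Laplace's equation on an nz-by-ny grid.

    Columns k = 0 and k = nz-1 are electrodes held at phi = v and phi = 0 (Eq. 19);
    rows j = 0 and j = ny-1 are insulating walls with d(phi)/dn = 0 (Eq. 20),
    imposed through mirror (ghost) points.
    """
    phi = [[v if k == 0 else 0.0 for _ in range(ny)] for k in range(nz)]
    for _ in range(sweeps):
        for k in range(1, nz - 1):
            for j in range(ny):
                up = phi[k][j + 1] if j + 1 < ny else phi[k][j - 1]
                down = phi[k][j - 1] if j > 0 else phi[k][j + 1]
                phi[k][j] = 0.25 * (phi[k - 1][j] + phi[k + 1][j] + up + down)
    return phi

NZ, NY, LENGTH, V = 21, 6, 1.0, 1.0
phi = solve_wire(nz=NZ, ny=NY, v=V)
h = LENGTH / (NZ - 1)
# Field between the two middle columns, E_z = -d(phi)/dz; expected V / LENGTH
e_mid = (phi[NZ // 2][NY // 2] - phi[NZ // 2 + 1][NY // 2]) / h
max_dev = max(abs(phi[k][j] - V * (1 - k / (NZ - 1)))
              for k in range(NZ) for j in range(NY))
```

Multiplying $E_z$ by $\sigma$ and by the cross-section area then gives $I = \sigma AV/l$, i.e. Eq. (21).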


Fig. 4.6. (a) An elementary 
problem and (b) a (slightly) 
less obvious problem of the 
field distribution at dc 
current flow (schematically). 


However, what about the geometry shown in Fig. 6b? In this case, the field distribution in the 
free space around the conductor is dramatically different, but according to the boundary problem 
defined by Eqs. (14) and (20), inside the conductor, the solution is exactly the same as it was in the 
former case. Now, the Laplace equation in the surrounding insulator has to be solved with the boundary 
values of the electrostatic potential, “dictated” by the distribution of the current (and hence potential) in 
the conductor. Note that as a result, the electric field lines are generally not normal to the conductor’s
surface, because the surface is not equipotential — see Eq. (22) again. 


Let us solve a problem in which this conductivity hierarchy may be followed analytically to the very
end. Consider an empty spherical cavity carved in a conductor with an initially uniform current flow
with a constant density $\mathbf{j}_0 = \mathbf{n}_z j_0$ (Fig. 7a). Following the hierarchy, we have to solve the boundary
problem in the conducting part of the system, i.e. outside the sphere (at $r \geq R$), first. Since the problem is
evidently axially symmetric, we already know the general solution of the Laplace equation — see Eq.
(2.172). Moreover, we know that in order to match the uniform field distribution at $r \to \infty$, all
coefficients $a_l$ but one ($a_1 = -E_0 = -j_0/\sigma$) have to be zero, and that the boundary conditions at $r = R$ will
give zero solutions for all coefficients $b_l$ but one ($b_1$), so that
$$\phi = -\frac{j_0}{\sigma}r\cos\theta + \frac{b_1}{r^2}\cos\theta, \quad \text{for } r \geq R. \tag{4.24}$$


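In order to find the remaining coefficient $b_1$, we have to use the boundary condition (20) at $r = R$:
$$\left.\frac{\partial\phi}{\partial r}\right|_{r=R} = \left(-\frac{j_0}{\sigma} - \frac{2b_1}{R^3}\right)\cos\theta = 0. \tag{4.25}$$
This gives $b_1 = -j_0R^3/2\sigma$, so that, finally,
$$\phi = -\frac{j_0}{\sigma}\left(r + \frac{R^3}{2r^2}\right)\cos\theta, \quad \text{for } r \geq R. \tag{4.26}$$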


(Note that this potential distribution is that of the uniform field plus a dipole field, with the coefficient
$b_1 = -E_0R^3/2$ in Eq. (24). It is straightforward to check that if the spherical cavity was cut in a dielectric
with dielectric constant $\kappa$, the potential distribution outside it would be similar, with $b_1 = -E_0R^3(\kappa - 1)/(2\kappa + 1)$.
In the limit $\kappa \to \infty$, these two results coincide, despite the rather different type of the problem: in the
dielectric case, there is no current at all.)
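The limiting statement can be checked with the standard formula for a sphere of material 1 embedded in material 2 in a uniform external field, $b_1 = E_0R^3(\epsilon_1 - \epsilon_2)/(\epsilon_1 + 2\epsilon_2)$. Because conditions (16)-(17) have the same form as their dielectric counterparts, the formula holds equally for permittivities and for conductivities. The formula is quoted here rather than derived in this section; the sketch simply evaluates it for the two cases.

```python
def b1_sphere(inner, outer, e0=1.0, radius=1.0):
    """Dipole coefficient b_1 of a sphere ('inner' material) in an 'outer' medium.

    'inner' and 'outer' are both (relative) permittivities or both conductivities;
    the outside potential is phi = -E0 r cos(theta) + b_1 cos(theta) / r**2.
    """
    return e0 * radius**3 * (inner - outer) / (inner + 2 * outer)

conductor_cavity = b1_sphere(0.0, 3.0)    # sigma_inner = 0: reproduces -E0 R^3 / 2
dielectric_cavity = b1_sphere(1.0, 1e9)   # kappa = 1e9: approaches the same value
```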




Fig. 4.7. A spherical cavity carved in a uniform conductor: (a) the problem’s geometry, and (b) the 
equipotential surfaces as given by Eqs. (26) and (28). 


Now, as the second step in the conductivity hierarchy, we may find the electrostatic potential
distribution $\phi(r, \theta)$ in the insulator, in this particular case inside the empty cavity (at $r \leq R$). It should also
satisfy the Laplace equation with the boundary values at $r = R$, “dictated” by the distribution (26):
$$\phi(R,\theta) = -\frac{3}{2}\frac{j_0}{\sigma}R\cos\theta. \tag{4.27}$$


We could again solve this problem by the formal variable separation (keeping in the general solution
(2.172) only the term proportional to $a_1$, which does not diverge at $r \to 0$), but if we notice that the
boundary condition (27) depends on just one Cartesian coordinate, $z = R\cos\theta$, the solution may be just
guessed:
$$\phi(r,\theta) = -\frac{3}{2}\frac{j_0}{\sigma}z = -\frac{3}{2}\frac{j_0}{\sigma}r\cos\theta, \quad \text{at } r \leq R. \tag{4.28}$$
Indeed, it evidently satisfies the Laplace equation and the boundary condition (27), and corresponds to a
constant electric field parallel to the vector $\mathbf{j}_0$, and equal to $3j_0/2\sigma$ — see Fig. 7b. Again, the cavity
surface is not equipotential, and the electric field lines at $r < R$ are not normal to it at almost all points.
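Eqs. (26) and (28) can be verified numerically at a few points: the radial derivative of (26) should vanish at $r = R$ (condition (20)), the two potentials should coincide on the sphere, and the field inside should equal $3j_0/2\sigma$. The parameter values below are arbitrary, chosen only for the illustration.

```python
import math

J0, SIGMA, R = 2.0, 0.5, 1.0   # arbitrary illustrative values

def phi_out(r, theta):
    """Eq. (26): potential in the conductor, r >= R."""
    return -(J0 / SIGMA) * (r + R**3 / (2 * r**2)) * math.cos(theta)

def phi_in(r, theta):
    """Eq. (28): potential inside the empty cavity, r <= R."""
    return -1.5 * (J0 / SIGMA) * r * math.cos(theta)

def d_dr(f, r, theta, h=1e-6):
    """Central-difference radial derivative of f(r, theta)."""
    return (f(r + h, theta) - f(r - h, theta)) / (2 * h)

angles = [0.1 * k for k in range(32)]                   # theta from 0 to 3.1
normal_current = max(abs(SIGMA * d_dr(phi_out, R, t)) for t in angles)
mismatch = max(abs(phi_out(R, t) - phi_in(R, t)) for t in angles)
e_inside = -d_dr(phi_in, 0.3, 0.0)                      # E_z on the axis inside the cavity
```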


More generally, the conductivity hierarchy says that static electrical fields and charges outside 
conductors (e.g., electric wires) do not affect currents flowing in the wires, and it is physically very clear 
why. For example, if a charge in the free space is slowly moved close to a wire, it (in accordance with
the linear superposition principle) only induces an additional surface charge (see Sec. 2.1) that screens 
the external charge’s field, without participating in the current flow inside the conductor. 


Besides this conceptual issue, the two examples given above may be considered as applications
of the first two methods discussed in Chapter 2 — the orthogonal coordinates (Fig. 6) and the variable
separation (Fig. 7) — to dc current distribution problems. As the reader may recall, in that chapter we
also discussed the method of charge images. It turns out that its analog may be also used for the solution
of some dc conductivity problems. Indeed, let us consider a spherically-symmetric distribution
of the electrostatic potential, similar to that given by the basic Eq. (1.35):
$$\phi = \frac{c}{r}. \tag{4.29}$$
As we know from Chapter 1, this is a particular solution of the 3D Laplace equation at all points but $r =
0$. In free space, this distribution would correspond to a point charge $q = 4\pi\varepsilon_0 c$; but what about a
uniform Ohmic conductor? Calculating the corresponding electric field and current density,


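$$\mathbf{E} = -\nabla\phi = \frac{c}{r^3}\mathbf{r}, \qquad \mathbf{j} = \sigma\mathbf{E} = \sigma\frac{c}{r^3}\mathbf{r}, \tag{4.30}$$
we see that the total current flowing from the origin through a sphere of an arbitrary radius $r$ does not
depend on the radius:
$$I = Aj = 4\pi r^2 j = 4\pi\sigma c. \tag{4.31}$$
Plugging the resulting coefficient $c$ into Eq. (29), we get
$$\phi = \frac{I}{4\pi\sigma r}. \tag{4.32}$$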


Hence the Coulomb-type distribution of the electric potential in a conductor is possible (at least
at some distance from the singular point $r = 0$), and describes the dc current $I$ flowing out of a small-size
electrode — or into such an electrode if the coefficient $c$ is negative. Such current injection may be
readily implemented experimentally; think for example about an insulated wire with a small bare end,
inserted into a poorly conducting soil — an important method in geophysical research. Such point
injection is even simpler in 2D situations — think about a wire attached, within a small spot, to a thin
resistive layer, such as the thin films used for wiring in microelectronics.¹⁵
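The radius independence in Eq. (31) is a special case of a stronger statement: since $\nabla\cdot\mathbf{j} = 0$ away from the source, the current through any closed surface enclosing the injection point equals $I$. The sketch below, our own illustration using a simple midpoint rule, integrates the current density (30), with $c$ taken from (31), over the faces of a cube that does not even have the source at its center.

```python
import math

def current_through_cube(i_total, sigma, src, half=1.0, n=120):
    """Midpoint-rule flux of j = sigma*E, Eq. (30), out of the cube |x|,|y|,|z| <= half.

    The point source sits at 'src' (inside the cube); c = I / (4 pi sigma) by Eq. (31).
    """
    c = i_total / (4 * math.pi * sigma)
    step = 2 * half / n
    total = 0.0
    for axis in range(3):                 # the face normal is along this axis
        for sign in (-1.0, 1.0):          # the two opposite faces
            for p in range(n):
                for q in range(n):
                    point = [-half + (p + 0.5) * step, -half + (q + 0.5) * step]
                    point.insert(axis, sign * half)
                    d = [point[i] - src[i] for i in range(3)]
                    dist3 = math.sqrt(d[0]**2 + d[1]**2 + d[2]**2) ** 3
                    # outward normal component of j = sigma * c * d / |d|^3
                    total += sigma * c * sign * d[axis] / dist3 * step * step
    return total

flux = current_through_cube(2.0, 0.5, src=(0.2, -0.1, 0.3))
```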


Now let the current injection point $\mathbf{r}'$ be close to a plane interface between the conductor and an
insulator (Fig. 8). In this case, besides the Laplace equation, we should satisfy the boundary condition,
$$\frac{\partial\phi}{\partial n} = 0, \tag{4.33}$$
at the interface. It is clear that this can be done by replacing the insulator with an imaginary similar
conductor with an additional current injection point, at the mirror image point $\mathbf{r}''$. Note, however, that in
contrast to charge images, the sign of the imaginary current has to be similar, not opposite, to the initial
one, so that the total electrostatic potential inside the conducting semi-space is


15 Note that in such layers, the current distribution near the injection point is different, $j \propto 1/r$ rather than $1/r^2$.
$$\phi(\mathbf{r}) = \frac{I}{4\pi\sigma}\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|} + \frac{1}{|\mathbf{r}-\mathbf{r}''|}\right). \tag{4.34}$$


(The image current’s sign would be opposite at the interface between a conductor with a moderate 
conductivity and a perfect conductor (“electrode”), whose potential should be virtually constant.)




Fig. 4.8. Applying the method of images 
for the current injection analysis. 


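This result may be readily used, for example, to calculate the current density at a plane surface of
a uniform conductor, as a function of distance $\rho$ from point 0 — the surface’s point closest to the current
injection site — see Fig. 8. At such a surface, where both $|\mathbf{r}-\mathbf{r}'|$ and $|\mathbf{r}-\mathbf{r}''|$ equal $(\rho^2 + a^2)^{1/2}$,
with $a$ the distance from the injection point to the surface, Eq. (34) yields
$$\phi = \frac{I}{2\pi\sigma(\rho^2 + a^2)^{1/2}}, \tag{4.35}$$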


so that the current density is:
$$j_\rho = \sigma E_\rho = -\sigma\frac{\partial\phi}{\partial\rho} = \frac{I}{2\pi}\frac{\rho}{(\rho^2 + a^2)^{3/2}}. \tag{4.36}$$
Deviations from Eqs. (35) and (36) may be used to find and characterize conductance
inhomogeneities, say, those due to mineral deposits in the Earth’s crust.¹⁶
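The image solution can be checked in the same spirit. In the sketch below, the interface is the plane $z = 0$, the conductor fills $z > 0$, and the injection point sits at height $a$ (called `A` in the code); the values of $I$, $\sigma$ and $a$ are arbitrary. The script confirms that the normal derivative (33) vanishes on the surface, using the symmetric extension of Eq. (34) across the plane for the central difference, and that finite differences of Eq. (34) reproduce Eqs. (35) and (36).

```python
import math

I, SIGMA, A = 1.0, 2.0, 0.5   # arbitrary illustrative values

def phi(x, y, z):
    """Eq. (34): source at (0, 0, A) plus a same-sign image at (0, 0, -A)."""
    r1 = math.sqrt(x * x + y * y + (z - A) ** 2)
    r2 = math.sqrt(x * x + y * y + (z + A) ** 2)
    return I / (4 * math.pi * SIGMA) * (1 / r1 + 1 / r2)

def phi_surface(rho):
    """Eq. (35): potential on the surface z = 0."""
    return I / (2 * math.pi * SIGMA * math.sqrt(rho**2 + A**2))

def j_rho_surface(rho):
    """Eq. (36): radial current density on the surface z = 0."""
    return I * rho / (2 * math.pi * (rho**2 + A**2) ** 1.5)

h, rho = 1e-5, 0.8
dphi_dz = (phi(rho, 0.0, h) - phi(rho, 0.0, -h)) / (2 * h)          # Eq. (33)
j_numeric = -SIGMA * (phi(rho + h, 0.0, 0.0) - phi(rho - h, 0.0, 0.0)) / (2 * h)
```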


So, the methods used in electrostatics to calculate the potential distribution in linear dielectrics 
may be also used to find such distributions in Ohmic conductors. Moreover, some of these methods are 
more valuable in this field. For example, in electrostatics the effective methods of solution of the 2D
Laplace equation, discussed in Secs. 2.3-2.6, could only be applied to cylindrical geometries. In Ohmic
conduction, this equation is also valid in some 3D cases. A practically important example is the current
flow in thin resistive layers where, due to the conductivity hierarchy principle, the 3D-distributed field 
outside a layer, induced by the 2D-distributed current in it, does not affect the flow and in many cases is 
not important. A few problems of this kind, formulated in Sec. 5, are left for the reader’s exercise. 


4.4. Energy dissipation 


Let me conclude this brief chapter with an ultra-short discussion of energy dissipation in 
conductors. In contrast to the electrostatic situations in insulators (vacuum or dielectrics), at dc 


16 The current injection may be also produced, due to electrochemical reactions, by an ore mass itself, so that one
need only measure (and correctly interpret :-) the resulting potential distribution — the so-called self-potential
method — see, e.g., Sec. 6.1 in W. Telford et al., Applied Geophysics, 2nd ed., Cambridge U. Press, 1990.
conduction, the electrostatic energy $U$ is “dissipated” (i.e. transferred to heat) at a certain rate $\mathcal{P} =
-dU/dt$, with the dimensionality of power.¹⁷ This so-called energy dissipation may be evaluated by
calculating the power of the electric field’s work on a single moving charge:


$$\mathcal{P} = \mathbf{F}\cdot\mathbf{v} = q\mathbf{E}\cdot\mathbf{v}. \tag{4.37}$$


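After the summation over all charges, Eq. (37) gives us the average dissipation power. If the
charge carrier density $n$ is uniform, multiplying by it both parts of this relation, and taking into account that $qn\mathbf{v}
= \mathbf{j}$, for the energy dissipation in a unit volume we get the differential form of the Joule law:¹⁸
$$p = \mathbf{j}\cdot\mathbf{E}. \tag{4.38}$$
For an Ohmic conductor, with $\mathbf{j} = \sigma\mathbf{E}$, this gives
$$p = \sigma E^2 = \frac{j^2}{\sigma}. \tag{4.39}$$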


With our electrostatics background, it is also straightforward (and hence left for the reader’s exercise) to
prove that the dc current distribution in a uniform Ohmic conductor, at a fixed voltage distribution along
its borders, corresponds to the minimum of the total dissipation in the sample,
$$\mathcal{P} = \int_V p\,d^3r = \sigma\int_V E^2\,d^3r. \tag{4.40}$$
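Here is a numerical illustration (not a proof, which is left for the reader's exercise in Problem 4.15) of the minimum property of Eq. (40). For a uniform 1D conductor of unit cross-section, discretized into segments with the end potentials fixed, the linear Ohmic profile gives the smallest total dissipation, and random perturbations of the interior potentials only increase it. The discretization and the random trial profiles are our own choices.

```python
import random

def dissipation(phi, sigma=1.0, length=1.0):
    """Discrete Eq. (40) for a 1D conductor of unit cross-section:
    P = sigma * sum_k E_k**2 * h, with E_k = -(phi[k+1] - phi[k]) / h."""
    n = len(phi) - 1
    h = length / n
    return sigma * sum(((phi[k + 1] - phi[k]) / h) ** 2 * h for k in range(n))

N, V = 50, 1.0
ohmic = [V * (1 - k / N) for k in range(N + 1)]   # linear profile, as in Eq. (22)
p_ohmic = dissipation(ohmic)                      # equals sigma * V**2 / length here

rng = random.Random(0)
excess = []
for _ in range(20):
    trial = ohmic[:]
    for k in range(1, N):                         # the end (electrode) potentials stay fixed
        trial[k] += 0.05 * rng.uniform(-1.0, 1.0)
    excess.append(dissipation(trial) - p_ohmic)
```

The cross term between the linear profile and a perturbation telescopes to zero because the end potentials are fixed, so every perturbation adds a positive quadratic contribution.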


4.5. Exercise problems 


4.1. DC voltage $V_0$ is applied to the end of a semi-infinite
chain of lumped Ohmic resistors, shown in the figure on the right.
Calculate the voltage across the $j$-th link of the chain.


4.2. It is well known that properties of many dc current sources (e.g., batteries) may be 
reasonably well represented as a connection in series of a perfect voltage source and an Ohmic internal 
resistance. Discuss the option, and possible advantages, of using a different equivalent circuit that 
would include a perfect current source. 


4.3. Prove the following Rayleigh-Lorentz-Carson reciprocity 
relation: Results of two reciprocal experiments shown schematically in the 
figure on the right, with an arbitrary Ohmic conductor with four 
electrodes/terminals, are related as V2 = hV. 


Hint: Try to apply the same approach as was used to prove Green’s 
reciprocity relation in electrostatics in Problem 1.18, but with proper 
modifications. 


17 If this electric field and hence the electrostatic energy are time-independent, the energy is replenished at the
same rate from the current source(s). 
18 Named after James Prescott Joule, who quantified this effect in 1841. 
4.4. Calculate the resistance between two large, uniform Ohmic
conductors separated by a very thin, plane, insulating partition, with a
circular hole of radius $R$ in it — see the figure on the right.


Hint: You may like to use the oblate spheroidal coordinates which
were discussed in Sec. 2.4.


4.5. A very narrow plane crack inside a round conducting wire of
radius $R$ does not reach its surface by a small distance $w$ — see the figure
on the right. Assuming that the Ohmic conductivity $\sigma$ of the wire’s
material is otherwise constant, calculate the electric resistance of the
obstacle in the first approximation in small $w/R \ll 1$.


Hint: You may like to use the same elliptic coordinates as were
employed at the solution of Problem 2.12.


4.6. Calculate the effective (average) conductivity $\sigma_{ef}$ of a
medium with many empty spherical cavities of radius $R$, carved at
random positions in a uniform Ohmic conductor (see the figure on the
right), in the limit of a low density $n \ll R^{-3}$ of the spheres.


Hint: You may like to use the analogy with a dipole medium —
see, e.g., Sec. 3.2.


4.7. In two separate experiments, a narrow gap, possibly of irregular width, between two close,
perfectly conducting electrodes is filled with some material: in the first case, with a uniform linear
dielectric with an electric permittivity $\varepsilon$, and in the second case, with a uniform conducting material with
an Ohmic conductivity $\sigma$. Neglecting the fringe effects, calculate the relation between the mutual
capacitance $C$ between the electrodes (in the first case) and the dc resistance $R$ between them (in the
second case).


4.8. Calculate the voltage $V$ across a uniform, wide resistive
slab of thickness $t$, at distance $l$ from the points of injection/pickup
of the dc current $I$ passed across the slab — see the figure on the right.


4.9. Calculate the distribution of the dc current’s density in a thin, round, uniform resistive disk, 
if the current is inserted into some point at its rim, and picked up at the center. 


4.10. DC current is passed between two point electrodes
connected to a wide, thin, uniform resistive sheet — see the figure on
the right. Use the model solution of the previous problem to prove,
without much new calculation, that cutting a round hole in the sheet
(outside of the current injection/extraction points) doubles the voltage
between any two points on its border.
4.11. The rim of a hemispherical thin shell, of radius $R$ and
thickness $t \ll R$, made of a uniform Ohmic conductor, is connected to a
plane grounded electrode. Calculate the distribution of the electrostatic
potential created in the shell by a dc current $I$ injected into it through a
small-size electrode located at a polar angle $\theta' < \pi/2$ from the symmetry
axis — see the figure on the right.


Hint: You may like to use the variable substitution $\rho = \tan(\theta/2)$ to map the hemisphere onto a
unit circle.


4.12. A rectangle of area $l \times w$ is cut from a resistive sheet of
thickness $t \ll l, w$. Use two different approaches to calculate the voltage
$V$ between its two adjacent corners, induced by the dc current $I$ passed
between the two other corners — see the figure on the right.


Hint: Besides the charge/current image method, you may like to
consider using the variable separation method, with due respect to the
current injection/extraction points.


4.13. The simplest reasonable model of a vacuum diode consists of two plane, parallel metallic
electrodes of area $A$, separated by a gap of thickness $d \ll A^{1/2}$: a “cathode” that emits electrons into the
gap, and an “anode” that absorbs the electrons arriving from the gap at its surface. Calculate the dc I-V
curve of the diode, i.e. the stationary relation between the current $I$ flowing between the electrodes and
the voltage $V$ applied between them, using the following simplifying assumptions:


(i) due to the effect of the negative space charge of the emitted electrons, the current $I$ is much
lower than the emission ability of the cathode,

(ii) the initial velocity of the emitted electrons is negligible, and

(iii) the direct Coulomb interaction of electrons (besides the space charge effect) is negligible.


4.14.* Calculate the space-charge-limited current in a system with the same geometry as in the
previous problem, and using the same assumptions except that now the emitted charge carriers move
not ballistically, but drift in accordance with the Ohm law, with the conductivity given by Eq. (13): $\sigma =
q\mu n$, with a constant mobility $\mu$.¹⁹


Hint: In order to get a realistic result, assume that the medium in which the charge carriers move
has a certain dielectric constant $\kappa$ unrelated to the carriers.


4.15. Prove that the distribution of dc currents in a uniform Ohmic conductor with a given
voltage distribution along its boundaries, corresponds to the minimum of the total energy dissipation
rate (“Joule heat”).


19 As was mentioned in Sec. 2, the approximation of a constant (in particular, field- and charge-density- 
independent) mobility is most suitable for semiconductors. 
Chapter 5. Magnetism


Even though this chapter addresses a completely new type of electric charge interaction, its discussion 
(for the stationary case) will take not too much time/space, because it recycles many ideas and methods 
of electrostatics, though with a twist or two. 


5.1. Magnetic interaction of currents 


DC currents in conductors usually leave them electroneutral, $\rho(\mathbf{r}) = 0$, with very good precision,
because even a minute imbalance of positive and negative charge density results in extremely strong
Coulomb forces that restore the electroneutrality by a very fast additional shift of free charge carriers.
This is why we start the discussion of magnetism from the simplest case of two spatially-separated,
dc-current-carrying, electroneutral conductors (Fig. 1).


Fig. 5.1. Magnetic 
interaction of two 
currents. 


According to the Coulomb law, there is no electrostatic force between them. However, several
experiments carried out in 1820¹ proved that there is a different, magnetic interaction between the
currents. In the present-day notation, the results of all such experiments may be summarized with just
one formula, in SI units expressed as
$$\mathbf{F} = -\frac{\mu_0}{4\pi}\int d^3r\int d^3r'\,[\mathbf{j}(\mathbf{r})\cdot\mathbf{j}'(\mathbf{r}')]\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}. \tag{5.1}$$


Here the coefficient $\mu_0/4\pi$ (where $\mu_0$ is called either the magnetic constant or the free space
permeability) equals almost exactly $10^{-7}$ SI units, with the product $\varepsilon_0\mu_0$ equal to exactly $1/c^2$.²


Note a close similarity of this expression to the Coulomb law (1.1) rewritten for the interaction
of two continuously distributed charges, with the account of the linear superposition principle (1.4):
$$\mathbf{F} = \frac{1}{4\pi\varepsilon_0}\int d^3r\int d^3r'\,\rho(\mathbf{r})\rho'(\mathbf{r}')\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}. \tag{5.2}$$


1 Most notably, by Hans Christian Ørsted who discovered the effect of electric currents on magnetic needles, and
André-Marie Ampère who extended this work by finding the magnetic interaction between two currents.

2 For details, see Appendix CA: Selected Physical Constants. In the Gaussian units, the coefficient $\mu_0/4\pi$ in Eq.
(1) and beyond is replaced with $1/c^2$.
Besides the different coefficient and a different sign, the “only” difference of Eq. (1) from Eq. (2) is the
scalar product of the current densities, evidently necessary because of their vector character. We will see 
soon that this difference brings certain complications in applying the approaches discussed in the 
previous chapters, to magnetostatics. 


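Before going to their discussion, let us have one more glance at the coefficients in Eqs. (1) and
(2). To compare them, let us consider two objects with uncompensated charge distributions $\rho(\mathbf{r})$ and
$\rho'(\mathbf{r}')$, each moving as a whole, parallel to each other, with certain velocities $\mathbf{v}$ and $\mathbf{v}'$, as measured in the
same inertial (“laboratory”) reference frame. In this case, $\mathbf{j}(\mathbf{r}) = \rho(\mathbf{r})\mathbf{v}$, so that $\mathbf{j}(\mathbf{r})\cdot\mathbf{j}'(\mathbf{r}') = \rho(\mathbf{r})\rho'(\mathbf{r}')vv'$,
and the integrals in Eqs. (1) and (2) become functionally similar, differing only by the factor
$$\frac{F_{\text{magnetic}}}{F_{\text{electric}}} = -\frac{\mu_0 vv'/4\pi}{1/4\pi\varepsilon_0} = -\frac{vv'}{c^2}. \tag{5.3}$$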

(The last expression is valid in any consistent system of units.) We immediately see that magnetism
is an essentially relativistic phenomenon, very weak in comparison with the electrostatic interaction at
human-scale velocities, $v \ll c$, and may dominate only if the latter interaction vanishes — as it does
in electroneutral systems.³ The discovery and initial studies⁴ of such a subtle, relativistic phenomenon as
magnetism were much facilitated by the relative abundance of natural ferromagnets: materials with a
spontaneous magnetic polarization, whose strong magnetic field is due to relativistic effects (such as
spin) inside the constituent atoms — see Sec. 5 below.
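To put numbers on Eq. (3), the sketch below evaluates $vv'/c^2$ for two cases. The first uses an assumed electron drift speed of $10^{-4}$ m/s, a typical order of magnitude for currents in metallic wires (our assumption, not a value from the text). The second is a beam moving at $0.1c$.

```python
C = 299_792_458.0   # speed of light, m/s (exact in SI)

def magnetic_to_electric(v, v_prime):
    """Magnitude of the ratio in Eq. (3), |F_magnetic / F_electric| = v v' / c**2."""
    return v * v_prime / C**2

drift_ratio = magnetic_to_electric(1e-4, 1e-4)       # assumed wire drift speeds
beam_ratio = magnetic_to_electric(0.1 * C, 0.1 * C)  # relativistic electron beams
```

The first ratio, of order $10^{-25}$, shows why magnetic forces between wires become observable only because the electric forces between electroneutral conductors cancel.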


Also, Eq. (3) points to an interesting paradox. Consider two electron beams moving parallel to 
each other, with the same velocity v with respect to a lab reference frame. Then, according to Eq. (3), 
the net force of their total (electric plus magnetic) interaction is proportional to $(1 - v^2/c^2)$, tending to
zero in the limit $v \to c$. However, in the reference frame moving together with the electrons, they are not
moving at all, i.e. v= 0. Hence, from the point of view of such a moving observer, the electron beams 
should interact only electrostatically, with a repulsive force independent of the velocity v. Historically, 
this had been one of several paradoxes that led to the development of special relativity; its resolution 
will be discussed in Chapter 9 devoted to this theory. 


Returning to Eq. (1), in some simple cases the double integration in it may be carried out
analytically. First of all, let us simplify this expression for the case of two thin, long conductors
(“wires”) separated by a distance much larger than their thickness. In this case, we may integrate the
products $\mathbf{j}\,d^3r$ and $\mathbf{j}'d^3r'$ over the wires’ cross-sections first, neglecting the corresponding change of the
factor $(\mathbf{r} - \mathbf{r}')$. Since the integrals of the current density over the cross-sections of the wires are just the
currents $I$ and $I'$ flowing in the wires, and cannot change along their lengths (say, $l$ and $l'$, respectively),
they may be taken out of the remaining integrals, reducing Eq. (1) to
$$\mathbf{F} = -\frac{\mu_0 II'}{4\pi}\int_l\int_{l'}(d\mathbf{r}\cdot d\mathbf{r}')\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}. \tag{5.4}$$


3 An important case when the electroneutrality may not hold is the motion of electrons in vacuum. (However, in 
this case, the electron speed is often comparable with the speed of light, so that the magnetic forces may be 
comparable in strength with electrostatic forces, and hence important.) Local violations of electroneutrality also 
play an important role in some semiconductor devices — see, e.g., SM Chapter 6. 

4 The first detailed book on this subject, De Magnete by William Gilbert (a.k.a. Gilberd), was published as early 
as 1600. 
As the simplest example, consider two straight, parallel wires (Fig. 2) separated by distance $d$,
both with length $l \gg d$.


Fig. 5.2. The magnetic force between 
two straight parallel currents. 




Due to the symmetry of this system, the vector of the magnetic interaction force has to: 


(i) lie in the same plane as the currents, and 
(ii) be normal to the wires — see Fig. 2.


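Hence we may limit our calculations to just one component of the force — normal to the wires. Using the
fact that with the coordinate choice shown in Fig. 2, the scalar product $d\mathbf{r}\cdot d\mathbf{r}'$ is just $dx\,dx'$, we get
$$F = -\frac{\mu_0 II'}{4\pi}\int_{-\infty}^{+\infty}dx\int_{-\infty}^{+\infty}dx'\,\frac{\sin\alpha}{d^2 + (x-x')^2} = -\frac{\mu_0 II'}{4\pi}\int_{-\infty}^{+\infty}dx\int_{-\infty}^{+\infty}\frac{d\,dx'}{[d^2 + (x-x')^2]^{3/2}}, \tag{5.5}$$
where $\alpha$ is the angle between the vector $\mathbf{r} - \mathbf{r}'$ and the wires.
Now introducing, instead of $x'$, a new, dimensionless variable $\xi = (x - x')/d$, we may reduce the internal
integral to a table one, which we have already encountered in this course:
$$F = -\frac{\mu_0 II'}{4\pi d}\int_{-\infty}^{+\infty}dx\int_{-\infty}^{+\infty}\frac{d\xi}{(1+\xi^2)^{3/2}} = -\frac{\mu_0 II'}{2\pi d}\int_{-\infty}^{+\infty}dx. \tag{5.6}$$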


The integral over $x$ formally diverges, but it gives a finite interaction force per unit length of the wires:
$$\frac{F}{l} = -\frac{\mu_0 II'}{2\pi d}. \tag{5.7}$$
Note that the force drops rather slowly (only as 1/d) as the distance d between the wires is increased, 
and is attractive (rather than repulsive as in the Coulomb law) if the currents are of the same sign. 
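The table integral in Eq. (6) equals 2, which is what turns the prefactor $1/4\pi$ into $1/2\pi$ in Eq. (7). The sketch below checks it with a midpoint rule after the substitution $\xi = \tan t$ (our choice, which maps the infinite range onto a finite one) and evaluates Eq. (7) for 1 A currents 1 m apart, using the CODATA 2018 value of $\mu_0$.

```python
import math

MU0 = 1.25663706212e-6   # magnetic constant, N/A^2 (CODATA 2018)

def table_integral(n=2000):
    """Midpoint rule for the integral of (1 + xi**2)**(-3/2) over the whole real axis.

    With xi = tan(t), the integrand times d(xi)/dt becomes cos(t), t in (-pi/2, pi/2).
    """
    step = math.pi / n
    return step * sum(math.cos(-math.pi / 2 + (k + 0.5) * step) for k in range(n))

def force_per_length(i1, i2, d):
    """Eq. (7): F/l = -mu0 I I' / (2 pi d); a negative value means attraction."""
    return -MU0 * i1 * i2 / (2 * math.pi * d)

integral = table_integral()
f_per_m = force_per_length(1.0, 1.0, 1.0)   # close to -2e-7 N/m
```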


This is an important result,⁵ but again, the problems so simply solvable are few and far between,
and it is intuitively clear that we would strongly benefit from the same approach as in electrostatics, i.e.,
from decomposing Eq. (1) into a product of two factors via the introduction of a suitable field. Such
decomposition may be done as follows:
$$\mathbf{F} = \int\mathbf{j}(\mathbf{r})\times\mathbf{B}(\mathbf{r})\,d^3r, \tag{5.8}$$


5 In particular, until very recently (2018), Eq. (7) was used for the legal definition of the SI unit of current, the
ampere (A), via the SI unit of force (the newton, N), with the coefficient $\mu_0/4\pi$ considered exactly fixed. (A brief
description of the recent changes in legal metrology is given in Appendix CA.)
where the vector $\mathbf{B}$ is called the magnetic field.⁶ In the case when it is induced by the current $\mathbf{j}'$:
$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int\mathbf{j}'(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,d^3r'. \tag{5.9}$$


The last relation is called the Biot-Savart law,⁷ while the force $\mathbf{F}$ expressed by Eq. (8) is sometimes
called the Lorentz force.⁸ However, more frequently the latter term is reserved for the full force,
$$\mathbf{F} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B}), \tag{5.10}$$
exerted by electric and magnetic fields on a point charge $q$, moving with velocity $\mathbf{v}$.⁹
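Eq. (9), reduced to a thin wire ($\mathbf{j}'d^3r' \to I\,d\mathbf{r}'$), can be summed numerically. The sketch below, a midpoint-rule discretization of our own, computes $\mathbf{B}$ near the middle of a straight wire of finite length $L$ and compares it with the closed-form finite-wire value $\mu_0IL/[4\pi d(L^2/4 + d^2)^{1/2}]$. That closed form is a standard result quoted here rather than derived. For $L \gg d$ it tends to $\mu_0I/2\pi d$, which, inserted into Eq. (8), gives back the force per unit length (7).

```python
import math

MU0 = 1.25663706212e-6   # magnetic constant, N/A^2

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def b_straight_wire(current, length, point, n=20000):
    """Thin-wire Biot-Savart sum, B = mu0/(4 pi) * sum I dl x (r - r') / |r - r'|**3.

    The wire lies on the x-axis from -length/2 to +length/2, with current along +x.
    """
    dl = length / n
    total = [0.0, 0.0, 0.0]
    for k in range(n):
        src = (-length / 2 + (k + 0.5) * dl, 0.0, 0.0)
        rel = tuple(point[i] - src[i] for i in range(3))
        dist3 = math.sqrt(sum(c * c for c in rel)) ** 3
        db = cross((current * dl, 0.0, 0.0), rel)
        for i in range(3):
            total[i] += db[i] / dist3
    return tuple(MU0 / (4 * math.pi) * t for t in total)

I, L, D = 1.0, 20.0, 0.1
b = b_straight_wire(I, L, (0.0, D, 0.0))
b_exact = MU0 * I * L / (4 * math.pi * D * math.sqrt(L**2 / 4 + D**2))
b_infinite = MU0 * I / (2 * math.pi * D)
```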


Now we have to prove that the new formulation, given by Eqs. (8)-(9), is equivalent to Eq. (1). 
At the first glance, this seems unlikely. Indeed, first of all, Eqs. (8) and (9) involve vector products,
while Eq. (1) is based on a scalar product. More profoundly, in contrast to Eq. (1), Eqs. (8) and (9) do
not satisfy the 3rd Newton law applied to elementary current components $\mathbf{j}\,d^3r$ and $\mathbf{j}'d^3r'$, if these
vectors are not parallel to each other. Indeed, consider the situation shown in Fig. 3.


Fig. 5.3. The apparent violation of the
3rd Newton law in magnetism.


Here the vector $\mathbf{j}'$ is perpendicular to the vector $(\mathbf{r} - \mathbf{r}')$, and hence, according to Eq. (9),
produces a non-zero contribution $d\mathbf{B}'$ to the magnetic field directed (in Fig. 3) normally to the plane of
the drawing, i.e. perpendicular to the vector $\mathbf{j}$. Hence, according to Eq. (8), this field provides a non-zero
contribution to $\mathbf{F}$. On the other hand, if we calculate the reciprocal force $\mathbf{F}'$ by swapping the prime
indices in Eqs. (8) and (9), the latter equation immediately shows that $d\mathbf{B}(\mathbf{r}') \propto \mathbf{j}\times(\mathbf{r}' - \mathbf{r}) = 0$, because
the two operand vectors are parallel — see Fig. 3 again. Hence, the current component $\mathbf{j}'d^3r'$ does exert a
force on its counterpart, while $\mathbf{j}\,d^3r$ does not.
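The asymmetry illustrated by Fig. 3 can be made concrete with two current elements placed as in the figure. The sketch below applies Eqs. (8)-(9) in both directions; the geometry and the unit prefactor are our own choices, with `k` standing for $\mu_0/4\pi$. The element $\mathbf{j}$ feels a force, while the element $\mathbf{j}'$ feels none, so the two forces do not cancel.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def element_force(j_a, r_a, j_b, r_b, k=1.0):
    """Force on element a in the field of element b, per Eqs. (8)-(9):
    dF_a = j_a x dB_b(r_a),  dB_b(r_a) = k * j_b x (r_a - r_b) / |r_a - r_b|**3."""
    rel = tuple(r_a[i] - r_b[i] for i in range(3))
    dist3 = sum(c * c for c in rel) ** 1.5
    db = tuple(k * c / dist3 for c in cross(j_b, rel))
    return cross(j_a, db)

r, j = (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)              # j is parallel to (r - r')
r_prime, j_prime = (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)  # j' is perpendicular to (r - r')

f_on_j = element_force(j, r, j_prime, r_prime)        # non-zero
f_on_jprime = element_force(j_prime, r_prime, j, r)   # zero, since j x (r' - r) = 0
```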


6 The SI unit of the magnetic field is called tesla (T) — after Nikola Tesla, a pioneer of electrical engineering. In
the Gaussian units, the already discussed constant $1/c^2$ in Eq. (1) is equally divided between Eqs. (8) and (9), so
that in them both, the constant before the integral is $1/c$. The resulting Gaussian unit of the field $\mathbf{B}$ is called gauss
(G); taking into account the difference of units of electric charge and length, and hence of the current density, 1 G
equals exactly $10^{-4}$ T. Note also that in some textbooks, especially old ones, $\mathbf{B}$ is called either the magnetic
induction or the magnetic flux density, while the term “magnetic field” is reserved for the field $\mathbf{H}$ that will be
introduced in Sec. 5 below.

7 Named after Jean-Baptiste Biot and Félix Savart who made several key contributions to the theory of magnetic 
interactions — in the same notorious 1820. 

8 Named after Hendrik Antoon Lorentz, famous mostly for his numerous contributions to the development of 
special relativity — see Chapter 9 below. To be fair, the magnetic part of the Lorentz force was implicitly 
described in a much earlier (1865) paper by J. C. Maxwell, and then spelled out by Oliver Heaviside (another 
genius of electrical engineering — and mathematics!) in 1889, i.e. also before the 1895 work by H. Lorentz. 

9 From the magnetic part of Eq. (10), Eq. (8) may be derived by the elementary summation of all forces acting on n ≫ 1 particles in a unit volume, with j = qnv; see the footnote on Eq. (4.13a). On the other hand, the reciprocal derivation of Eq. (10) from Eq. (8) requires certain mathematical care and will be performed in Chapter 9. That derivation uses j = qvδ(r − r₀), where r₀ is the current particle's position (so that dr₀/dt = v).
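Despite this apparent problem, let us still go ahead and plug Eq. (9) into Eq. (8):

$$\mathbf{F} = \frac{\mu_0}{4\pi}\int_V d^3r \int_{V'} d^3r'\, \mathbf{j}(\mathbf{r}) \times \left[\mathbf{j}'(\mathbf{r}') \times \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\right]. \tag{5.11}$$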
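This double vector product may be transformed into two scalar products using the vector-algebra identity called the bac minus cab rule, a × (b × c) = b(a·c) − c(a·b).¹⁰ Applying this relation to Eq. (11), with a = j, b = j', and c = R ≡ r − r', we get

$$\mathbf{F} = \frac{\mu_0}{4\pi}\int_{V'} d^3r'\, \mathbf{j}'(\mathbf{r}') \int_V d^3r\, \frac{\mathbf{j}(\mathbf{r})\cdot\mathbf{R}}{R^3} - \frac{\mu_0}{4\pi}\int_V d^3r \int_{V'} d^3r'\, \left[\mathbf{j}(\mathbf{r})\cdot\mathbf{j}'(\mathbf{r}')\right]\frac{\mathbf{R}}{R^3}. \tag{5.12}$$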
The second term on the right-hand side of this equality coincides with the right-hand side of Eq. (1), while the first term equals zero because its internal integral vanishes. Indeed, we may break the volumes V and V' into narrow current tubes: stretched elementary volumes whose walls are not crossed by current lines (so that on their walls, j_n = 0). As a result, the elementary current in each tube, dI = j dA = j d²r, is the same along its length. Just as in a thin wire, j d³r may then be replaced with dI dr, with the vector dr directed along j. Because of this, each tube's contribution to the internal integral in the first term of Eq. (12) may be represented as

$$dI \oint_l d\mathbf{r}\cdot\frac{\mathbf{R}}{R^3} = -dI \oint_l d\mathbf{r}\cdot\nabla\frac{1}{R} = -dI \oint_l d\!\left(\frac{1}{R}\right), \tag{5.13}$$

where the operator ∇ acts in the r-space, and the integral is taken along the tube's length l. Due to the current continuity expressed by Eq. (4.6), each tube has to form a closed contour. The integral of a full differential of a scalar function (in our case, of 1/R) along such a contour equals zero.
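Both steps of this argument can be spot-checked numerically. In this Python sketch, a circular arc of radius a = 1 in the xy-plane stands in for one current tube, and the source point r' is arbitrary; both are illustrative assumptions. The code checks the bac minus cab rule on random vectors, shows that the integral (13) vanishes over the closed contour, and confirms that over an open arc it equals 1/R(start) − 1/R(end), as expected for a full differential.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bac_cab_residual(a, b, c):
    """Largest component of a x (b x c) - [b(a.c) - c(a.b)]; should be at rounding level."""
    lhs = cross(a, cross(b, c))
    rhs = tuple(b[i] * dot(a, c) - c[i] * dot(a, b) for i in range(3))
    return max(abs(x - y) for x, y in zip(lhs, rhs))


def arc_integral(r_src, phi0, phi1, a=1.0, n=4000):
    """Midpoint-rule value of the integral of dr . R/R^3, with R = r - r_src,
    along a circular arc of radius a in the xy-plane (a stand-in for one current tube)."""
    h = (phi1 - phi0) / n
    total = 0.0
    for k in range(n):
        phi = phi0 + (k + 0.5) * h
        r = (a * math.cos(phi), a * math.sin(phi), 0.0)
        dr = (-a * math.sin(phi) * h, a * math.cos(phi) * h, 0.0)
        R = tuple(x - y for x, y in zip(r, r_src))
        total += dot(dr, R) / dot(R, R) ** 1.5
    return total


def inv_dist(r_src, phi, a=1.0):
    """1/R for the point of the arc at angle phi."""
    r = (a * math.cos(phi), a * math.sin(phi), 0.0)
    return 1.0 / math.dist(r, r_src)


random.seed(0)
a_, b_, c_ = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(3)]
print(bac_cab_residual(a_, b_, c_))

r_src = (0.3, -0.2, 0.5)  # arbitrary source point, off the contour
closed = arc_integral(r_src, 0.0, 2 * math.pi)
half = arc_integral(r_src, 0.0, math.pi)
# Since dr . R/R^3 = -d(1/R), the open-arc value is 1/R(start) - 1/R(end)
print(closed, half, inv_dist(r_src, 0.0) - inv_dist(r_src, math.pi))
```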


So we have recovered Eq. (1). Returning for a minute to the paradox illustrated in Fig. 3, we may conclude that the apparent violation of Newton's third law was an artifact of our interpretation of Eqs. (8) and (9) as sums of independent elementary components. In reality, due to the dc current continuity, these components are not independent. For the whole currents, Eqs. (8)-(9) do obey the third law, as follows from their already proved equivalence to Eq. (1).


Thus it is possible to break the magnetic interaction into two effects: the induction of the magnetic field B by one current (in our notation, j'), and the effect of this field on the other current (j). Now comes an additional experimental fact: other elementary components j d³r' of the current j(r) also contribute to the magnetic field (9) acting on the component j d³r.¹¹ This fact allows us to drop the prime sign after j in Eq. (9), and rewrite Eqs. (8) and (9) as

$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_{V'} \mathbf{j}(\mathbf{r}') \times \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\, d^3r', \tag{5.14}$$

$$\mathbf{F} = \int_V \mathbf{j}(\mathbf{r}) \times \mathbf{B}(\mathbf{r})\, d^3r. \tag{5.15}$$


10 See, e.g., MA Eq. (7.5). 

11 Just as in electrostatics, one needs to exercise due caution when transforming these expressions for the limit of discrete classical particles, and of extended wavefunctions in quantum mechanics, to avoid the (non-existing) magnetic interaction of a charged particle with itself.
Again, the field observation point r and the field source point r' have to be clearly distinguished. We immediately see that these expressions are close to, but still different from, the corresponding relations of electrostatics, namely Eq. (1.9) and the distributed-charge version of Eq. (1.6):


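$$\mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\int_{V'} \rho(\mathbf{r}')\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\, d^3r', \tag{5.16}$$

$$\mathbf{F} = \int_V \rho(\mathbf{r})\mathbf{E}(\mathbf{r})\, d^3r. \tag{5.17}$$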


(Note that the sign difference has disappeared, at the cost of the replacement of scalar-by-vector 
multiplications in electrostatics with cross-products of vectors in magnetostatics.) 


For the frequent particular case of a thin wire of length l', Eq. (14) may be rewritten as

$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0 I}{4\pi}\oint_{l'} d\mathbf{r}' \times \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}. \tag{5.18}$$


Let us see how this formula works for the simplest case of a straight wire (Fig. 4a). The magnetic field contributions dB due to all small fragments dr' of the wire's length are directed along the same line, perpendicular to both the wire and the shortest distance d from the observation point to the wire's line. Their magnitudes are

$$dB = \frac{\mu_0 I}{4\pi}\frac{dx'}{|\mathbf{r}-\mathbf{r}'|^2}\sin\theta = \frac{\mu_0 I}{4\pi}\frac{dx'}{x'^2+d^2}\frac{d}{(x'^2+d^2)^{1/2}}, \tag{5.19}$$

where θ is the angle between the vectors dr' and r − r'. Summing up all such elementary contributions, we get

$$B = \frac{\mu_0 I d}{4\pi}\int_{-\infty}^{+\infty}\frac{dx'}{(x'^2+d^2)^{3/2}} = \frac{\mu_0 I}{2\pi d}. \tag{5.20}$$


Fig. 5.4. Calculating magnetic fields: (a) of a straight current, and (b) of a current loop.


This is a simple but important result. (Note that it is only valid for very long (l ≫ d) straight wires.) It is especially crucial to note the "vortex" character of the field: its lines go around the wire, forming rings centered on the current line. This is in sharp contrast to the electrostatic field lines, which can only begin and end on electric charges and never form closed loops (otherwise the Coulomb force qE would not be conservative). In the magnetic case, the vortex structure of the field may be reconciled with the potential character of the magnetic forces (evident from Eq. (1)) due to the vector products in Eqs. (14)-(15).
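The condition l ≫ d can be made concrete by integrating Eq. (19) numerically over a finite segment. The field is observed at distance d from the segment's midpoint, and the result is compared with Eq. (20). In this Python sketch, the current, the distance d, the ratios l/d, and μ₀ = 4π × 10⁻⁷ H/m are assumed values. The closed-form finite-segment expression is included only as a cross-check of the quadrature.

```python
import math

MU0 = 4e-7 * math.pi  # H/m; classical value, within 1e-9 (relative) of the current SI value


def b_segment(I, d, length, n=100_000):
    """Midpoint-rule integral of Eq. (5.19) over a straight segment of the given
    length, at distance d from the segment's midpoint (on its perpendicular bisector)."""
    h = length / n
    s = 0.0
    for k in range(n):
        x = -length / 2 + (k + 0.5) * h
        s += d * h / (x * x + d * d) ** 1.5
    return MU0 * I / (4 * math.pi) * s


def b_segment_exact(I, d, length):
    """Closed form of the same finite integral, for cross-checking."""
    half = length / 2
    return MU0 * I / (2 * math.pi * d) * half / math.sqrt(half ** 2 + d ** 2)


def b_infinite(I, d):
    """Eq. (5.20)."""
    return MU0 * I / (2 * math.pi * d)


I, d = 1.0, 0.01  # assumed: 1 A, observation point 1 cm from the wire
for ratio in (2, 10, 100, 1000):
    # the ratio to Eq. (5.20) approaches 1 as l/d grows
    print(ratio, b_segment(I, d, ratio * d) / b_infinite(I, d))
```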
Now we may readily use Eq. (15), or rather its thin-wire version

$$\mathbf{F} = I\oint_l d\mathbf{r} \times \mathbf{B}(\mathbf{r}), \tag{5.21}$$


to apply Eq. (20) to the two-wire problem (Fig. 2). Since for the second wire, the vectors dr and B are 
perpendicular to each other, we immediately arrive at our previous result (7), which was obtained 
directly from Eq. (1). 


The next important example of the application of the Biot-Savart law (14) is the magnetic field on the axis of a circular current loop (Fig. 4b). Due to the problem's symmetry, the net field B has to be directed along the axis. However, each of its elementary components dB is tilted by the angle θ = tan⁻¹(z/R) to this axis, so that its axial component is

$$dB_z = dB\cos\theta = \frac{\mu_0 I}{4\pi}\frac{dr'}{R^2+z^2}\frac{R}{(R^2+z^2)^{1/2}}. \tag{5.22}$$


Since the denominator of this expression remains the same for all wire components dr', the integration over r' is easy (∮dr' = 2πR), giving finally

$$B = \frac{\mu_0 I}{2}\frac{R^2}{(R^2+z^2)^{3/2}}. \tag{5.23}$$

Note that the magnetic field in the loop's center (i.e., for z = 0),

$$B = \frac{\mu_0 I}{2R}, \tag{5.24}$$


is π times higher than that due to a similar current in a straight wire at distance d = R from it. This difference is readily understandable. All elementary components of the loop are at the same distance R from the observation point, while in the case of a straight wire, all its points but one are separated from the observation point by distances larger than d.
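Eqs. (23) and (24) can be cross-checked against a direct discretization of the Biot-Savart integral (18) around the loop. In this Python sketch, the loop radius, the current, the observation height, and the segment count are assumed values. The loop lies in the xy-plane with the current flowing counterclockwise as seen from +z, so the axial field points along +z.

```python
import math

MU0 = 4e-7 * math.pi  # H/m (assumed classical value)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def loop_field(I, R, point, n=720):
    """Discretized Eq. (5.18) for a circular loop of radius R in the xy-plane,
    centered at the origin, with the current flowing counterclockwise (seen from +z)."""
    B = [0.0, 0.0, 0.0]
    dphi = 2 * math.pi / n
    for k in range(n):
        phi = (k + 0.5) * dphi
        src = (R * math.cos(phi), R * math.sin(phi), 0.0)
        dl = (-R * math.sin(phi) * dphi, R * math.cos(phi) * dphi, 0.0)
        sep = tuple(p - s for p, s in zip(point, src))
        r3 = math.sqrt(sum(c * c for c in sep)) ** 3
        for i, c in enumerate(cross(dl, sep)):
            B[i] += MU0 * I / (4 * math.pi) * c / r3
    return tuple(B)


def axial_field(I, R, z):
    """Eq. (5.23)."""
    return MU0 * I * R ** 2 / (2 * (R ** 2 + z ** 2) ** 1.5)


I, R = 1.0, 0.1  # assumed: 1 A, 10 cm radius
B_on_axis = loop_field(I, R, (0.0, 0.0, 0.05))
print(B_on_axis[2] / axial_field(I, R, 0.05))     # agrees with Eq. (5.23)
B_center = loop_field(I, R, (0.0, 0.0, 0.0))[2]   # Eq. (5.24)
print(B_center / (MU0 * I / (2 * math.pi * R)))   # ratio to Eq. (5.20) with d = R: close to pi
```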


Another notable fact is that at large distances (z² ≫ R²), the field (23) is proportional to z⁻³:

$$B \approx \frac{\mu_0 I R^2}{2z^3} = \frac{\mu_0}{4\pi}\frac{2m}{z^3}, \quad\text{with } m = IA, \tag{5.25}$$

where A = πR² is the loop area. Compare this expression with Eq. (3.13) for the particular case θ = 0. Apart from the front factor, such a field is similar to that of an electric dipole (at least along its direction), with the electric dipole moment magnitude p replaced by the m so defined. Indeed, such a plane current loop is the simplest example of a system whose field, at distances much larger than R, is that of a magnetic dipole with a dipole moment m. These notions will be discussed in much more detail in Sec. 4 below.
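How quickly the exact on-axis field (23) approaches the dipole form (25) can be seen from their ratio, which is (1 + R²/z²)^(−3/2). This short Python sketch uses assumed values of I and R. The ratio depends only on z/R.

```python
import math

MU0 = 4e-7 * math.pi  # H/m (assumed classical value)


def axial_field(I, R, z):
    """Exact on-axis field of a loop, Eq. (5.23)."""
    return MU0 * I * R ** 2 / (2 * (R ** 2 + z ** 2) ** 1.5)


def dipole_axial_field(I, R, z):
    """Far-field approximation, Eq. (5.25), with m = I * A and A = pi R^2."""
    m = I * math.pi * R ** 2
    return MU0 / (4 * math.pi) * 2 * m / z ** 3


I, R = 1.0, 0.1  # assumed values
for z_over_R in (1, 3, 10, 100):
    z = z_over_R * R
    # the ratio tends to 1; its deviation from 1 is about 1.5 (R/z)^2 at large z
    print(z_over_R, axial_field(I, R, z) / dipole_axial_field(I, R, z))
```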


5.2. Vector potential and the Ampère law


The reader could see that the calculations of the magnetic field using Eq. (14) or (18) are still 
somewhat cumbersome even for the very simple systems we have examined. As we saw in Chapter 1, 
similar calculations in electrostatics, at least for several important highly symmetric systems, could be 
substantially simplified using the Gauss law (1.16). A similar relation exists in magnetostatics as well,
but has a different form, due to the vortex character of the magnetic field. 


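To derive it, let us notice that, by analogy with the scalar case, the vector product under the integral (14) may be transformed as

$$\mathbf{j}(\mathbf{r}') \times \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} = \nabla \times \frac{\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}, \tag{5.26}$$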


where the operator ∇ acts in the r-space. (This equality may be readily verified in Cartesian components, noticing that the current density is a function of r', so its components are independent of r.) Plugging Eq. (26) into Eq. (14), and moving the operator ∇ out of the integral over r', we see that the magnetic field may be represented as the curl of another vector field, the so-called vector potential, defined as¹²

$$\mathbf{B}(\mathbf{r}) = \nabla \times \mathbf{A}(\mathbf{r}), \tag{5.27}$$

and in our current case equal to

$$\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_{V'} \frac{\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d^3r'. \tag{5.28}$$


Please note a beautiful analogy between Eqs. (27)-(28) and, respectively, Eqs. (1.33) and (1.38).¹³ This analogy implies that the vector potential A plays, for the magnetic field, essentially the same role as the scalar potential φ plays for the electric field (hence the name "potential"), with due respect to the vortex character of B. This notion will be discussed in more detail below.
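The identity (26), on which Eqs. (27)-(28) rest, can also be spot-checked numerically. This Python sketch takes the curl of j(r')/|r − r'| with respect to r by central finite differences and compares it with the left-hand side of Eq. (26). The source point r', the value of j(r') there, the observation point, and the step h are arbitrary assumptions.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def curl_fd(field, r, h=1e-5):
    """Central-difference curl of a vector field (a function of a 3-list) at r."""
    def partial(i, k):  # d field_i / d x_k
        rp, rm = list(r), list(r)
        rp[k] += h
        rm[k] -= h
        return (field(rp)[i] - field(rm)[i]) / (2 * h)
    return (partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1))


# Assumed (arbitrary) source point r' and current density j(r') at that point
r_src = (0.2, -0.1, 0.4)
j_src = (0.3, 1.0, -0.5)


def j_over_dist(r):
    """The vector j(r')/|r - r'| as a function of the observation point r."""
    d = math.dist(r, r_src)
    return [c / d for c in j_src]


r_obs = (1.0, 0.5, -0.3)
curl_numeric = curl_fd(j_over_dist, r_obs)                     # right-hand side of Eq. (5.26)
sep = tuple(a - b for a, b in zip(r_obs, r_src))
direct = tuple(c / math.dist(r_obs, r_src) ** 3 for c in cross(j_src, sep))  # left-hand side
print(curl_numeric)
print(direct)
```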


Now let us see what equations we may get for the spatial derivatives of the magnetic field. First, vector algebra says that the divergence of any curl is zero.¹⁴ In application to Eq. (27), this means that

$$\nabla\cdot\mathbf{B} = 0. \tag{5.29}$$


Comparing this equation with Eq. (1.27), we see that Eq. (29) may be interpreted as the absence of a 
magnetic analog of an electric charge, on which magnetic field lines could originate or end. Numerous 
searches for such hypothetical magnetic charges, called magnetic monopoles, using very sensitive and 
sophisticated experimental setups, have not given any reliable evidence of their existence in Nature. 


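Proceeding to the alternative vector derivative of the magnetic field, i.e. to its curl, and using Eq. (28), we obtain

$$\nabla\times\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\nabla\times\left[\nabla\times\int_{V'}\frac{\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d^3r'\right]. \tag{5.30}$$

This expression may be simplified by using the following general vector identity:¹⁵

$$\nabla\times(\nabla\times\mathbf{c}) = \nabla(\nabla\cdot\mathbf{c}) - \nabla^2\mathbf{c}, \tag{5.31}$$

applied to the vector c(r) = j(r')/|r − r'|: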

12 In the Gaussian units, Eq. (27) remains the same, and hence in Eq. (28), μ₀/4π is replaced with 1/c.

13 In Eq. (1.38), there was no real need for the additional clarification provided by the integration volume label V’. 


14 See, e.g., MA Eq. (11.2). 
15 See, e.g., MA Eq. (11.3). 
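$$\nabla\times\mathbf{B} = \frac{\mu_0}{4\pi}\nabla\int_{V'}\mathbf{j}(\mathbf{r}')\cdot\nabla\frac{1}{|\mathbf{r}-\mathbf{r}'|}\, d^3r' - \frac{\mu_0}{4\pi}\int_{V'}\mathbf{j}(\mathbf{r}')\,\nabla^2\frac{1}{|\mathbf{r}-\mathbf{r}'|}\, d^3r'. \tag{5.32}$$

As was already discussed during our study of electrostatics in Sec. 3.1,

$$\nabla^2\frac{1}{|\mathbf{r}-\mathbf{r}'|} = -4\pi\delta(\mathbf{r}-\mathbf{r}'), \tag{5.33}$$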


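so that the last term of Eq. (32) is just μ₀j(r). On the other hand, inside the first integral we can replace ∇ with (−∇'), where the prime means differentiation in the space of the radius vectors r'. Integrating that term by parts, we get

$$\nabla\times\mathbf{B} = -\frac{\mu_0}{4\pi}\nabla\oint_{S'}\frac{j_n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d^2r' + \frac{\mu_0}{4\pi}\nabla\int_{V'}\frac{\nabla'\cdot\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d^3r' + \mu_0\mathbf{j}(\mathbf{r}). \tag{5.34}$$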
y" r-r 


Let us apply this equality to the volume V' limited by a surface S' that is either sufficiently distant from the field concentration or not crossed by any current. Then we may neglect the first term on the right-hand side of Eq. (34), while the second term always equals zero in statics, due to the dc charge continuity (see Eq. (4.6)). As a result, we arrive at a very simple differential equation¹⁶

$$\nabla\times\mathbf{B} = \mu_0\mathbf{j}. \tag{5.35}$$


This is (the dc form of) the inhomogeneous Maxwell equation, which in magnetostatics plays a role similar to that of Eq. (1.27) in electrostatics. Let me display, for the first time in this course, this fundamental system of equations (at this stage, for statics only), and give the reader a minute to stare, in silence, at their beautiful symmetry, which has inspired so much of the later development of physics:

$$\nabla\times\mathbf{E} = 0, \qquad \nabla\times\mathbf{B} = \mu_0\mathbf{j}, \qquad \nabla\cdot\mathbf{E} = \frac{\rho}{\varepsilon_0}, \qquad \nabla\cdot\mathbf{B} = 0. \tag{5.36}$$


Their only asymmetry, the two zeros on the right-hand sides (for the magnetic field's divergence and the electric field's curl), is due to the absence in Nature of magnetic monopoles and their currents. I will discuss these equations in more detail in Sec. 6.7, after the first two equations (for the fields' curls) have been generalized to their full, time-dependent versions.


Returning now to our current, more mundane but important task of calculating the magnetic field induced by simple current configurations, we can benefit from an integral form of Eq. (35). For that, let us integrate this equation over an arbitrary surface S limited by a closed contour C, and apply the Stokes theorem to the result.¹⁷ The resulting expression,

$$\oint_C \mathbf{B}\cdot d\mathbf{r} = \mu_0\int_S j_n\, d^2r \equiv \mu_0 I, \tag{5.37}$$

where I is the net electric current crossing the surface S, is called the Ampère law.
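A direct way to see Eq. (37) at work is to integrate the straight-wire field (20) numerically around circles in the plane normal to the wire. This Python sketch uses an assumed current and assumed circle positions. An off-center circle that encloses the wire should give μ₀I, and a circle that does not enclose it should give zero.

```python
import math

MU0 = 4e-7 * math.pi  # H/m (assumed classical value)


def wire_field(I, x, y):
    """Field (Bx, By) of an infinitely long straight current I along the z-axis,
    Eq. (5.20), directed along the azimuthal unit vector."""
    rho2 = x * x + y * y
    k = MU0 * I / (2 * math.pi * rho2)
    return -k * y, k * x


def circulation(I, cx, cy, a, n=4000):
    """Midpoint-rule value of the contour integral of B . dr over a circle of radius a
    centered at (cx, cy) in the plane z = 0 (counterclockwise)."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        bx, by = wire_field(I, cx + a * math.cos(t), cy + a * math.sin(t))
        total += (bx * (-math.sin(t)) + by * math.cos(t)) * a * h
    return total


I = 2.0  # assumed current, A
enclosing = circulation(I, 0.3, -0.2, 1.0)  # off-center circle around the wire
outside = circulation(I, 3.0, 0.0, 1.0)     # circle that does not enclose the wire
print(enclosing / (MU0 * I), outside / (MU0 * I))  # close to 1 and 0, as Eq. (5.37) predicts
```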


16 As in all earlier formulas for the magnetic field, in the Gaussian units, the coefficient μ₀ in this relation is replaced with 4π/c.
17 See, e.g., MA Eq. (12.1) with f = B.
As the first example of its application, let us return to the current in a straight wire (Fig. 4a). With the Ampère law in our arsenal, we can readily pursue an even more ambitious goal than the one achieved in the previous section. Namely, we can calculate the magnetic field both outside and inside a wire of an arbitrary radius R, with an arbitrary (albeit axially symmetric) current distribution j(ρ); see Fig. 5.


Fig. 5.5. The simplest application of the Ampère law: the magnetic field of a straight current.


Let us select the Ampère-law contour C in the form of a ring of some radius ρ in the plane normal to the wire's axis z. Then B·dr = Bρ dφ, where φ is the azimuthal angle, so that Eq. (37) yields:

$$2\pi\rho B(\rho) = \mu_0 \times \begin{cases} 2\pi\int_0^{\rho} j(\rho')\rho'\, d\rho', & \text{for } \rho \le R, \\ 2\pi\int_0^{R} j(\rho')\rho'\, d\rho' \equiv I, & \text{for } \rho \ge R. \end{cases} \tag{5.38}$$

Thus we have not only recovered our previous result (20), with the notation replacement d → ρ, in a much simpler way, but have also been able to calculate the magnetic field's distribution inside the wire. Consider the most common particular case, in which the current is uniformly distributed over the wire's cross-section, j(ρ) = const. The first of Eqs. (38) then immediately yields B ∝ ρ for ρ ≤ R.
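Eq. (38) translates directly into a short routine that accepts any axially symmetric j(ρ). In this Python sketch, the enclosed current is computed by the midpoint rule. The wire radius, the total current, and the uniform profile used in the example are assumptions. For that profile, the result reproduces B ∝ ρ inside the wire and Eq. (20) outside.

```python
import math

MU0 = 4e-7 * math.pi  # H/m (assumed classical value)


def b_of_rho(j, rho, R, n=2000):
    """Eq. (5.38): B(rho) = mu0 * I_enc / (2 pi rho), where the enclosed current
    I_enc = 2 pi * integral_0^min(rho, R) j(rho') rho' drho' is found by the midpoint rule."""
    upper = min(rho, R)
    h = upper / n
    i_enc = 2 * math.pi * sum(j((k + 0.5) * h) * (k + 0.5) * h * h for k in range(n))
    return MU0 * i_enc / (2 * math.pi * rho)


# Assumed example: a wire of radius 1 mm carrying 10 A with a uniform current density
R, I = 1e-3, 10.0


def j_uniform(rho):
    """Uniform current density I / (pi R^2) across the wire's cross-section."""
    return I / (math.pi * R ** 2)


for rho in (0.25 * R, 0.5 * R, R, 2 * R):
    print(rho / R, b_of_rho(j_uniform, rho, R))  # grows linearly up to rho = R, then falls as 1/rho
```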


Another important system is a straight, long solenoid (Fig. 6a), with dense winding: n²A ≫ 1, where n is the number of wire turns per unit length, and A is the area of the solenoid's cross-section.


Fig. 5.6. Calculating magnetic fields of (a) straight and (b) toroidal solenoids.


From the symmetry of this problem, the longitudinal (in Fig. 6a, vertical) component B_z of the magnetic field may only depend on the distance ρ of the observation point from the solenoid's axis. First taking a plane Ampère contour C₁ with both long sides outside the solenoid, we get B_z(ρ₂) − B_z(ρ₁) = 0, because the total current piercing the contour equals zero. Since the field has to vanish far from the solenoid, this is only possible if B_z = 0 at any ρ outside of the solenoid, provided that it is infinitely long.¹⁸ With this result on hand, the Ampère law applied to the contour C₂ gives the following relation for the only (z-) component of the internal field:

$$Bl = \mu_0 NI, \tag{5.39}$$

where N is the number of wire turns passing through the contour of length l. This means that, regardless of the exact position of the internal side of the contour, the result is the same:

$$B = \mu_0\frac{N}{l}I \equiv \mu_0 nI. \tag{5.40}$$


Thus, the field inside an infinitely long solenoid (with an arbitrary shape of its cross-section) is uniform; 
in this sense, a long solenoid is a magnetic analog of a wide plane capacitor, explaining why this system 
is so widely used in physical experiment. 
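To see how closely a finite solenoid approaches Eq. (40), one can model it as a stack of equally spaced coaxial loops and add up their on-axis fields (23). Both the stack model and the dimensions in this Python sketch (radius 1 cm, length 1 m, 10⁴ turns, 1 A) are illustrative assumptions. The closed-form continuum limit of the same sum is included as a cross-check.

```python
import math

MU0 = 4e-7 * math.pi  # H/m (assumed classical value)


def loop_axial_field(I, a, z):
    """On-axis field of one loop of radius a, Eq. (5.23), at axial distance z."""
    return MU0 * I * a ** 2 / (2 * (a ** 2 + z ** 2) ** 1.5)


def solenoid_axial_field(I, a, length, turns, z):
    """Axial field of a finite solenoid modeled (an assumption) as `turns` equally
    spaced coaxial loops centered at z = 0; the sum uses Eq. (5.23) for each loop."""
    dz = length / turns
    return sum(loop_axial_field(I, a, z - (-length / 2 + (k + 0.5) * dz))
               for k in range(turns))


def solenoid_axial_continuum(I, a, length, n, z):
    """Continuum limit of the same sum (closed form), for comparison."""
    up, down = length / 2 - z, length / 2 + z
    return MU0 * n * I / 2 * (up / math.hypot(up, a) + down / math.hypot(down, a))


# Assumed example: radius 1 cm, length 1 m, 10^4 turns, 1 A
a, length, turns, I = 0.01, 1.0, 10_000, 1.0
n = turns / length
b_ideal = MU0 * n * I  # Eq. (5.40)
for z in (0.0, 0.25, 0.45):
    # close to 1 in the middle; the deviation grows only near the ends (z -> 0.5 m)
    print(z, solenoid_axial_field(I, a, length, turns, z) / b_ideal)
```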


As should be clear from its derivation, the obtained results, especially that the field outside of the solenoid equals zero, are conditional on the solenoid length being very large in comparison with its lateral size. (From Eq. (25), we may predict that for a solenoid of a finite length l, the close-range external field is a factor of ~A/l² lower than the internal one.) A much better suppression of such "fringe" fields may be obtained using toroidal solenoids (Fig. 6b). The application of the Ampère law to this geometry shows that in the limit of dense winding (N ≫ 1), there is no fringe field at all (for any relation between the two radii of the torus), while inside the solenoid, at distance ρ from the system's axis,

$$B = \frac{\mu_0 NI}{2\pi\rho}. \tag{5.41}$$

We see that a possible drawback of this system for practical applications is that the internal field does depend on ρ, i.e. is not quite uniform; however, if the torus is relatively thin, this deficiency is minor.
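The size of this non-uniformity follows directly from Eq. (41): the ratio of the fields at the inner and outer radii equals the ratio of the radii. This Python sketch uses an assumed turn number, current, and cross-section (9 cm < ρ < 11 cm). It also checks that, locally, Eq. (41) coincides with Eq. (40) for the local turn density n = N/2πρ.

```python
import math

MU0 = 4e-7 * math.pi  # H/m (assumed classical value)


def toroid_field(N, I, rho):
    """Eq. (5.41): field inside a densely wound toroidal solenoid at distance rho from its axis."""
    return MU0 * N * I / (2 * math.pi * rho)


# Assumed example: 1000 turns, 1 A, torus cross-section spanning 9 cm < rho < 11 cm
N, I, rho_in, rho_out = 1000, 1.0, 0.09, 0.11
b_in, b_out = toroid_field(N, I, rho_in), toroid_field(N, I, rho_out)
spread = (b_in - b_out) / toroid_field(N, I, 0.5 * (rho_in + rho_out))
print(b_in, b_out, spread)  # the relative spread (about 0.2 here) shrinks as the torus gets thinner

# Locally, the winding looks like a straight solenoid with n = N / (2 pi rho), recovering Eq. (5.40)
rho = 0.1
print(toroid_field(N, I, rho) / (MU0 * (N / (2 * math.pi * rho)) * I))
```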


Next let us discuss a very important question: how can we solve the problems of magnetostatics for systems whose low symmetry does not allow getting easy results from the Ampère law? (The examples are too numerous to list; we cannot use this approach even to reproduce Eq. (23) for a round current loop.) From the deep analogy with electrostatics, we may expect that in this case we could calculate the magnetic field by solving a certain boundary problem for the field's potential, in our current case the vector potential A defined by Eq. (28). However, despite the similarity of this formula and Eq. (1.38) for φ, which was noticed above, there is an additional issue we should tackle in the magnetic case. This is besides the obvious fact that calculating the vector potential distribution means determining three scalar functions (say, A_x, A_y, and A_z), rather than just one (φ).

To reveal the issue, let us plug Eq. (27) into Eq. (35):

$$\nabla\times(\nabla\times\mathbf{A}) = \mu_0\mathbf{j}, \tag{5.42}$$


and then apply to the left-hand side of this equation the same identity (31). The result is 


18 Applying the Ampère law to a circular contour of radius ρ, coaxial with the solenoid, we see that the field outside (but not inside!) it has an azimuthal component B_φ, similar to that of the straight wire (see Eq. (38) above). Hence (at N ≫ 1), this component is much weaker than the longitudinal field inside the solenoid; see Eq. (40).
$$\nabla(\nabla\cdot\mathbf{A}) - \nabla^2\mathbf{A} = \mu_0\mathbf{j}. \tag{5.43}$$

On the other hand, as we know from electrostatics (please compare Eqs. (1.38) and (1.41)), the vector potential A(r) given by Eq. (28) has to satisfy a simpler ("vector-Poisson") equation

$$\nabla^2\mathbf{A} = -\mu_0\mathbf{j}, \tag{5.44}$$


which is just a set of three usual Poisson equations for each Cartesian component of A. 


To resolve the difference between these results, let us note that Eq. (43) is reduced to Eq. (44) if ∇·A = 0. In this context, let us discuss what discretion we have in the choice of the potential. In electrostatics, we may add to the scalar function φ' that satisfies Eq. (1.33) for the given field E not only an arbitrary constant but even an arbitrary function of time:

$$-\nabla[\phi' + f(t)] = -\nabla\phi' = \mathbf{E}. \tag{5.45}$$


Similarly, using the fact that the curl of the gradient of any scalar function equals zero,¹⁹ we may add to any vector function A' that satisfies Eq. (27) for the given field B not only any constant but even a gradient of an arbitrary scalar function χ(r, t), because

$$\nabla\times(\mathbf{A}' + \nabla\chi) = \nabla\times\mathbf{A}' + \nabla\times(\nabla\chi) = \nabla\times\mathbf{A}' = \mathbf{B}. \tag{5.46}$$

Such additions, which keep the fields intact, are called gauge transformations.²⁰ Let us see what such a transformation does to ∇·A':

$$\nabla\cdot(\mathbf{A}' + \nabla\chi) = \nabla\cdot\mathbf{A}' + \nabla^2\chi. \tag{5.47}$$

For any choice of such a function A', we can always choose the function χ so that it satisfies the Poisson equation ∇²χ = −∇·A'. This makes the divergence of the transformed vector potential, A = A' + ∇χ, equal to zero everywhere,

$$\nabla\cdot\mathbf{A} = 0, \tag{5.48}$$

thus reducing Eq. (43) to Eq. (44).
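Here is a small numerical illustration of Eqs. (46)-(48). It starts from a hypothetical potential A' = (x², B₀x, 0), chosen for this example only, whose curl is the uniform field B₀n_z but whose divergence is 2x. The hypothetical gauge function χ = −x³/3 solves ∇²χ = −∇·A'. This Python sketch computes the curl and divergence by central finite differences before and after the transformation. The field magnitude and the sample point are assumed values.

```python
def curl_fd(field, r, h=1e-4):
    """Central-difference curl of a vector field (a function of a 3-list) at r."""
    def partial(i, k):  # d field_i / d x_k
        rp, rm = list(r), list(r)
        rp[k] += h
        rm[k] -= h
        return (field(rp)[i] - field(rm)[i]) / (2 * h)
    return (partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1))


def div_fd(field, r, h=1e-4):
    """Central-difference divergence of a vector field at r."""
    total = 0.0
    for k in range(3):
        rp, rm = list(r), list(r)
        rp[k] += h
        rm[k] -= h
        total += (field(rp)[k] - field(rm)[k]) / (2 * h)
    return total


B0 = 0.5  # assumed uniform field along z, T


def a_prime(r):
    """Hypothetical A' with curl B0 n_z but non-zero divergence 2x."""
    x, _, _ = r
    return [x * x, B0 * x, 0.0]


def grad_chi(r):
    """Gradient of the hypothetical gauge function chi = -x^3/3, which solves lap(chi) = -div A'."""
    x, _, _ = r
    return [-x * x, 0.0, 0.0]


def a_gauged(r):
    """A = A' + grad(chi), as in Eq. (5.46)."""
    return [p + g for p, g in zip(a_prime(r), grad_chi(r))]


r0 = (0.7, -0.4, 1.2)
print(curl_fd(a_prime, r0), curl_fd(a_gauged, r0))  # both close to (0, 0, B0)
print(div_fd(a_prime, r0), div_fd(a_gauged, r0))    # 2x = 1.4 versus 0 (Coulomb gauge)
```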


To summarize, the set of distributions A'(r) that satisfy Eq. (27) for a given field B(r) is not limited to the vector potential A(r) given by Eq. (44), but is reduced to it upon the additional Coulomb gauge condition (48). However, as we will see in a minute, even this condition still leaves some degrees of freedom in the choice of the vector potential. To illustrate this fact, and also to get a better gut feeling for the vector potential's distribution in space, let us calculate A(r) for two very basic cases.

First, let us revisit the straight wire problem shown in Fig. 5. As Eq. (28) shows, in this case the vector potential A has just one component (along the axis z). Moreover, due to the problem's axial symmetry, its magnitude may only depend on the distance from the axis: A = n_z A(ρ). Hence, A does not change along its own direction z, so that ∇·A = ∂A/∂z = 0, and Eq. (48) is satisfied at all points. For our symmetry (∂/∂φ = ∂/∂z = 0), the Laplace operator, written in cylindrical coordinates, has just one term,²¹ reducing Eq. (44) to


19 See, e.g., MA Eq. (11.1). 

20 The use of the term "gauge" (originally meaning "a measure" or "a scale") in this context is purely historic, so the reader should not try to find too much hidden sense in it.

21 See, e.g., MA Eq. (10.3). 
$$\frac{1}{\rho}\frac{d}{d\rho}\left(\rho\frac{dA}{d\rho}\right) = -\mu_0 j(\rho). \tag{5.49}$$

Multiplying both sides of this equation by ρ and integrating them once over this coordinate, we get

$$\rho\frac{dA}{d\rho} = -\mu_0\int_0^{\rho} j(\rho')\rho'\, d\rho' + \text{const}. \tag{5.50}$$

Since in the cylindrical coordinates, for our symmetry, B = −dA/dρ,²² Eq. (50) is nothing else than our old result (38) for the magnetic field.²³ However, let us continue the integration, at least for the region outside the wire, where the function A(ρ) depends only on the full current I rather than on the current distribution. Dividing both parts of Eq. (50) by ρ and integrating them over this argument again, we get

$$A(\rho) = -\frac{\mu_0 I}{2\pi}\ln\rho + \text{const}, \quad\text{where } I = 2\pi\int_0^R j(\rho)\rho\, d\rho, \quad\text{for } \rho \ge R. \tag{5.51}$$


As a reminder, we had similar logarithmic behavior for the electrostatic potential outside a uniformly 
charged straight line. This is natural because the Poisson equations for both cases are similar. 
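As a consistency check of Eq. (51), differentiating A(ρ) numerically should return the field (20) through B = −dA/dρ. The arbitrary additive constant drops out of the derivative, so it is omitted here. This Python sketch uses an assumed current, assumed radii, and an assumed finite-difference step.

```python
import math

MU0 = 4e-7 * math.pi  # H/m (assumed classical value)


def a_outside(I, rho):
    """Eq. (5.51) outside the wire, with the arbitrary constant set to zero."""
    return -MU0 * I / (2 * math.pi) * math.log(rho)


def b_from_a(I, rho, h=1e-6):
    """B = -dA/drho, evaluated by a central difference."""
    return -(a_outside(I, rho + h) - a_outside(I, rho - h)) / (2 * h)


I = 5.0  # assumed current, A
for rho in (0.01, 0.1, 1.0):
    print(rho, b_from_a(I, rho), MU0 * I / (2 * math.pi * rho))  # matches Eq. (5.20)
```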


Now let us find the vector potential for the long solenoid (Fig. 6a), with its uniform magnetic field. Since Eq. (28) tells us that the vector A should follow the direction of the inducing current, we may start by looking for it in the form A = n_φ A(ρ). (This is especially natural if the solenoid's cross-section is circular.) With this orientation of A, the same general expression for the curl operator in cylindrical coordinates yields ∇×A = n_z (1/ρ) d(ρA)/dρ. According to Eq. (27), this expression should be equal to B, in our current case to n_z B with a constant B (see Eq. (40)). Integrating this equality, and selecting the integration constant so that A(0) is finite, we get

$$A(\rho) = \frac{B\rho}{2}, \quad\text{i.e.}\quad \mathbf{A} = \frac{B\rho}{2}\mathbf{n}_\varphi. \tag{5.52}$$

Plugging this result into the general expression for the Laplace operator in the cylindrical coordinates,²⁴ we see that the Poisson equation (44) with j = 0 (i.e. the Laplace equation) is satisfied again. This is natural since, for this distribution, the Coulomb gauge condition (48) is satisfied: ∇·A = 0.


However, Eq. (52) is not the unique (or even the simplest) vector potential that gives the same uniform field B = n_z B. Indeed, using the well-known expression for the curl operator in Cartesian coordinates,²⁵ it is straightforward to check that each of the vector functions A' = n_y Bx and A'' = −n_x By also has the same curl, and also satisfies the Coulomb gauge condition (48).²⁶ These solutions may not look very natural because of their anisotropy in the [x, y] plane. However, please consider the fact that they represent the uniform magnetic field regardless of its source, for example, regardless of the shape of the long solenoid's cross-section. Such choices of the vector potential may be very convenient for some


22 See, e.g., MA Eq. (10.5) with ∂/∂φ = ∂/∂z = 0.

23 Since the magnetic field at the wire’s axis has to be zero (otherwise, being normal to the axis, where would it 
be directed?), the integration constant in Eq. (50) has to equal zero. 

24 See, e.g., MA Eq. (10.6). 

25 See, e.g., MA Eq. (8.5). 

26 The axially-symmetric vector potential (52) is just a weighted sum of these two functions: A= (A’+ A”)/2. 
problems, for example for the quantum-mechanical analysis of the 2D motion of a charged particle in a perpendicular magnetic field, giving the famous Landau energy levels.²⁷
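The three vector potentials for the uniform field can be compared side by side. This Python sketch writes Eq. (52) in Cartesian form, (B/2)(−y, x, 0), and evaluates the curl and divergence of each candidate by central finite differences. The field value and the sample point are assumptions. It also checks two facts: footnote 26, and the observation that A' − A'' = ∇(Bxy). Since ∇²(Bxy) = 0, the gauge function Bxy illustrates the freedom that survives the Coulomb gauge condition (48).

```python
def curl_fd(field, r, h=1e-4):
    """Central-difference curl of a vector field (a function of a 3-list) at r."""
    def partial(i, k):  # d field_i / d x_k
        rp, rm = list(r), list(r)
        rp[k] += h
        rm[k] -= h
        return (field(rp)[i] - field(rm)[i]) / (2 * h)
    return (partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1))


def div_fd(field, r, h=1e-4):
    """Central-difference divergence of a vector field at r."""
    total = 0.0
    for k in range(3):
        rp, rm = list(r), list(r)
        rp[k] += h
        rm[k] -= h
        total += (field(rp)[k] - field(rm)[k]) / (2 * h)
    return total


B0 = 1.0  # assumed field magnitude, T


def a_symmetric(r):
    """Eq. (5.52) in Cartesian form: (B0 rho / 2) n_phi = (B0/2)(-y, x, 0)."""
    x, y, _ = r
    return [-0.5 * B0 * y, 0.5 * B0 * x, 0.0]


def a_prime(r):
    """A' = n_y B0 x."""
    return [0.0, B0 * r[0], 0.0]


def a_double_prime(r):
    """A'' = -n_x B0 y."""
    return [-B0 * r[1], 0.0, 0.0]


r0 = (0.3, -1.1, 0.8)
for A in (a_symmetric, a_prime, a_double_prime):
    print(A.__name__, curl_fd(A, r0), div_fd(A, r0))  # curl close to (0, 0, B0), div close to 0

# A' - A'' equals grad(B0 x y) = (B0 y, B0 x, 0), a harmonic gauge function
diff = [p - q for p, q in zip(a_prime(r0), a_double_prime(r0))]
grad_chi = [B0 * r0[1], B0 * r0[0], 0.0]
print(diff, grad_chi)
```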


5.3. Magnetic energy, flux, and inductance 


Considering the currents flowing in a system as generalized coordinates, the magnetic forces (1) 
between them are their unique functions, and in this sense, the energy U of their magnetic interaction 
may be considered the potential energy of the system. The apparent (but somewhat deceptive) way to 
derive an expression for this energy is to use the analogy between Eq. (1) and its electrostatic analog, 
Eq. (2). Indeed, Eq. (2) may be transformed into Eq. (1) with just three replacements: 


(i) ρ(r)ρ'(r') should be replaced with [j(r)·j'(r')],
(ii) ε₀ should be replaced with 1/μ₀, and
(iii) the sign before the double integral has to be replaced with the opposite one.


Hence we may avoid repeating the calculation made in Chapter 1 by making these replacements in Eq. (1.59). That equation gives the electrostatic potential energy of the system with ρ(r) and ρ'(r') describing the same charge distribution, i.e. with ρ'(r) = ρ(r). The replacements give the following expression for the magnetic potential energy of a system with, similarly, j'(r) = j(r):²⁸

$$U_j = -\frac{\mu_0}{8\pi}\int d^3r\int d^3r'\,\frac{\mathbf{j}(\mathbf{r})\cdot\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}. \tag{5.53}$$


But this is not the unique answer! Indeed, Eq. (53) describes the proper potential energy of the system (in particular, giving the correct result for the current interaction forces) only when the interacting currents are fixed, just as Eq. (1.59) is adequate when the interacting charges are fixed. Here comes a substantial difference between electrostatics and magnetostatics. Due to the fundamental fact of electric charge conservation (already discussed in Secs. 1.1 and 4.1), keeping electric charges fixed does not require external work, while the maintenance of currents generally does. As a result, Eq. (53) describes the energy of the magnetic interaction plus that of the system keeping the currents constant, or rather the part of the latter that depends on the system under our consideration. In this situation, using the terminology already used in Sec. 3.5 (see also a general discussion in CM Sec. 1.4), U_j may be called the Gibbs potential energy of our magnetic system.


Now, to exclude from U_j the contribution due to the interaction with the current-supporting system(s), i.e. to calculate the potential energy U of our system as such, we need to know this contribution. The simplest way to find it is to use the Faraday induction law, which describes this interaction and will be discussed at the beginning of the next chapter. This is why I will postpone the derivation until that point. For now, I ask the reader to believe me that the removal of the interaction leads to an expression similar to Eq. (53), but with the opposite sign:

$$U = \frac{\mu_0}{8\pi}\int d^3r\int d^3r'\,\frac{\mathbf{j}(\mathbf{r})\cdot\mathbf{j}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}. \tag{5.54}$$


27 See, e.g., QM Sec. 3.2. 
28 Just as in electrostatics, for the interaction of two independent current distributions j(r) and j'(r'), the factor ½ should be dropped.
I will prove this result in Sec. 6.2, but actually, this sign dichotomy should not be quite surprising to the
attentive reader, in the context of a similar duality of Eqs. (3.73) and (3.81) for the electrostatic energies 
including and excluding the interaction with the field source. 


Due to the importance of Eq. (54), let us rewrite it in several other forms, convenient for various 
applications. First of all, just as in electrostatics, it may be recast into a potential-based form. Indeed, 
with the definition (28) of the vector potential A(r), Eq. (54) becomes 


$$U = \frac{1}{2}\int_V \mathbf{j}(\mathbf{r})\cdot\mathbf{A}(\mathbf{r})\, d^3r. \tag{5.55}$$


This formula, which is a clear magnetic analog of Eq. (1.60) of electrostatics, is very popular among 
field theorists, because it is very handy for their manipulations; it is also useful for some practical 
applications. However, for many calculations, it is more convenient to have a direct expression of the 
energy via the magnetic field. Again, this may be done very similarly to what had been done for 
electrostatics in Sec. 1.3, i.e. by plugging into Eq. (55) the current density expressed from Eq. (35) and 
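then transforming it as²⁹

$$U = \frac{1}{2}\int \mathbf{j}\cdot\mathbf{A}\, d^3r = \frac{1}{2\mu_0}\int \mathbf{A}\cdot(\nabla\times\mathbf{B})\, d^3r = \frac{1}{2\mu_0}\int \mathbf{B}\cdot(\nabla\times\mathbf{A})\, d^3r - \frac{1}{2\mu_0}\int \nabla\cdot(\mathbf{A}\times\mathbf{B})\, d^3r. \tag{5.56}$$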


Now, using the divergence theorem, the second integral may be transformed into a surface integral of (A×B)_n. According to Eqs. (27)-(28), if the current distribution j(r) is localized, this vector product drops at large distances faster than 1/r². Hence, if the integration volume is large enough, the surface integral is negligible. In the remaining first integral in Eq. (56), we may use Eq. (27) to rewrite ∇×A as B. As a result, we get a very simple and fundamental formula:

$$U = \frac{1}{2\mu_0}\int B^2\, d^3r. \tag{5.57a}$$

Just as with the electric field, this expression may be interpreted as a volume integral of the magnetic energy density u:

$$U = \int u(\mathbf{r})\, d^3r, \quad\text{with}\quad u(\mathbf{r}) = \frac{1}{2\mu_0}B^2(\mathbf{r}), \tag{5.57b}$$


clearly similar to Eq. (1.65).³⁰ Again, there is a conceptual choice between two spatial localizations of magnetic energy: either at the location of electric currents only, as implied by Eqs. (54) and (55), or in all regions where the magnetic field exists, as apparent from Eq. (57b). This choice cannot be made within the framework of magnetostatics; only electrodynamics gives a decisive preference to the latter option.
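For a feel of the magnitudes in Eq. (57), this Python sketch evaluates the energy density at B = 1 T, where it is about 4 × 10⁵ J/m³. It then evaluates the total energy inside a long solenoid, taking the field as uniform per Eq. (40) and neglecting fringe fields. The solenoid parameters are arbitrary assumptions. The result reduces algebraically to μ₀n²I²Al/2.

```python
import math

MU0 = 4e-7 * math.pi  # H/m (assumed classical value)


def energy_density(B):
    """Magnetic energy density, Eq. (5.57b), in J/m^3."""
    return B ** 2 / (2 * MU0)


# Assumed example: a long solenoid, n = 1000 turns/m, I = 2 A, cross-section 10 cm^2, length 0.5 m
n, I, area, length = 1000.0, 2.0, 1e-3, 0.5
B = MU0 * n * I                        # Eq. (5.40); the field outside is neglected
U = energy_density(B) * area * length  # Eq. (5.57a) for a uniform field inside the volume
print(energy_density(1.0))             # about 4e5 J/m^3 for B = 1 T
print(B, U)
```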


For the practically important case of currents flowing in several thin wires, Eq. (54) may first be integrated over the cross-section of each wire, just as was done in the derivation of Eq. (4). As before, the integral of the current density over the k-th wire's cross-section is just the current I_k in the wire, which cannot change along its length. Hence it may be taken out of the remaining integrals, giving


29 For that, we may use MA Eq. (11.7) with f= A and g = B, giving A-(VxB) = B-(VxA) — V-(AxB). 
30 The transfer to the Gaussian units in Eqs. (57) may be accomplished by the usual replacement μ₀ → 4π, thus giving, in particular, u = B²/8π.
$$U = \frac{\mu_0}{8\pi}\sum_{k,k'} I_k I_{k'}\oint_{l_k}\oint_{l_{k'}}\frac{d\mathbf{r}_k\cdot d\mathbf{r}_{k'}}{|\mathbf{r}_k-\mathbf{r}_{k'}|}, \tag{5.58}$$

where l_k is the full length of the k-th wire loop. Note that Eq. (58) is valid if all currents I_k are independent of each other. The double sum counts each current pair twice, compensating the factor ½ in the front coefficient μ₀/8π = ½·μ₀/4π. It is useful to decompose this relation as

$$U = \frac{1}{2}\sum_{k,k'} L_{kk'} I_k I_{k'}, \tag{5.59}$$


where the coefficients L;: are independent of the currents: 


Mutual 
(5 .60) inductance 


coefficients 


The coefficient $L_{kk'}$ with $k \neq k'$ is called the mutual inductance between the $k$-th and $k'$-th current loops, while the diagonal coefficient $L_k \equiv L_{kk}$ is called the self-inductance (or just inductance) of the $k$-th loop.³¹ From the symmetry of Eq. (60) with respect to the index swap $k \leftrightarrow k'$, it is evident that the matrix of coefficients $L_{kk'}$ is symmetric:³²

$$L_{kk'} = L_{k'k}, \qquad (5.61)$$


so that for the practically most important case of two interacting currents $I_1$ and $I_2$, Eq. (59) reads

$$U = \frac{L_1}{2}I_1^2 + MI_1I_2 + \frac{L_2}{2}I_2^2, \qquad (5.62)$$


where $M \equiv L_{12} = L_{21}$ is the mutual inductance coefficient.
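As a rough numerical illustration of the Neumann formula (60), the following Python sketch evaluates the mutual inductance of two coaxial circular loops of radii a and b, separated by a distance d, and compares it with the far-field estimate $\mu_0\pi a^2 b^2/2d^3$ (the flux of a small loop's on-axis field $\mu_0 I a^2/2d^3$ through the other loop). The function names, the midpoint quadrature, and the sample geometry are illustrative assumptions, not part of the text; SI units are assumed.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m (classical SI value)


def mutual_inductance_coaxial(a, b, d, n=2000):
    """Neumann formula (5.60) for two coaxial circular loops (hypothetical helper).

    Loop 1 has radius a in the plane z = 0, loop 2 has radius b in the plane z = d.
    The integrand depends only on psi = phi_1 - phi_2, so one of the two
    angular integrations simply gives a factor of 2*pi.
    """
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        psi = (i + 0.5) * h  # midpoint rule, very accurate for periodic integrands
        dot = a * b * math.cos(psi)  # dr_1 . dr_2 per d(phi_1) d(phi_2)
        dist = math.sqrt(a * a + b * b + d * d - 2 * a * b * math.cos(psi))
        total += dot / dist
    return MU0 / (4 * math.pi) * 2 * math.pi * total * h


def mutual_inductance_far_field(a, b, d):
    """Leading term for d >> a, b: small loop's on-axis field times the other loop's area."""
    return MU0 * math.pi * a**2 * b**2 / (2 * d**3)


a, b, d = 0.01, 0.02, 1.0  # meters (illustrative values)
print(mutual_inductance_coaxial(a, b, d), mutual_inductance_far_field(a, b, d))
```

The symmetry $L_{12} = L_{21}$ of Eq. (61) shows up here as the invariance of the result under the swap a ↔ b.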


These formulas clearly show the importance of the self- and mutual inductances, so I will 
demonstrate their calculation for at least a few basic geometries. Before doing that, however, let me 
recast Eq. (58) into one more form that may facilitate such calculations. Namely, let us notice that for 
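the vector potential of the magnetic field induced by current $I_k$ in a thin wire, Eq. (28) is reduced to

$$\mathbf{A}_k(\mathbf{r}) = \frac{\mu_0}{4\pi} I_k \oint_{l_k} \frac{d\mathbf{r}_k}{|\mathbf{r} - \mathbf{r}_k|}, \qquad (5.63)$$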


so that Eq. (58) may be rewritten as

$$U = \frac{1}{2}\sum_{k,k'} I_k \oint_{l_k} \mathbf{A}_{k'}(\mathbf{r}_k)\cdot d\mathbf{r}_k. \qquad (5.64)$$


But according to the same Stokes theorem that was used earlier in this chapter to derive the Ampère law, and Eq. (27), the integral in Eq. (64) is nothing else than the magnetic field's flux Φ (more frequently called just the magnetic flux) through a surface S limited by the contour $l$:


31 As evident from Eq. (60), these coefficients depend only on the geometry of the system. Moreover, in the Gaussian units, in which Eq. (60) is valid without the factor $\mu_0/4\pi$, the inductance coefficients have the dimension of length (centimeters). The SI unit of inductance is called the henry, abbreviated H, after Joseph Henry, who in particular discovered the effect of electromagnetic induction (see Sec. 6.1) independently of Michael Faraday.

32 Note that the matrix of the mutual inductances $L_{kk'}$ is very much similar to the matrix of reciprocal capacitance coefficients $p_{kk'}$; for example, compare Eq. (62) with Eq. (2.21).
$$\oint_l \mathbf{A}(\mathbf{r})\cdot d\mathbf{r} = \int_S (\nabla\times\mathbf{A})_n\,d^2r = \int_S B_n\,d^2r \equiv \Phi, \qquad (5.65)$$


which, in the particular case of Eq. (64), is the flux $\Phi_{kk'}$ of the field induced by the $k'$-th current through the loop of the $k$-th current.³³ As a result, Eq. (64) may be rewritten as


$$U = \frac{1}{2}\sum_{k,k'} I_k \Phi_{kk'}. \qquad (5.66)$$

Comparing this expression with Eq. (59), we see that

$$\Phi_{kk'} \equiv \int_{S_k} (B_{k'})_n\,d^2r = L_{kk'} I_{k'}. \qquad (5.67)$$


This expression not only gives us one more means for calculating the coefficients $L_{kk'}$, but also shows their physical sense: the mutual inductance characterizes what part of the magnetic flux (colloquially, "what fraction of field lines") induced by the current $I_{k'}$ pierces the $k$-th loop's area $S_k$; see Fig. 7.


Fig. 5.7. The physical sense of the mutual inductance coefficient $L_{kk'} \equiv \Phi_{kk'}/I_{k'}$, schematically.


Due to the linear superposition principle, the total flux piercing the $k$-th loop may be represented as

$$\Phi_k \equiv \sum_{k'} \Phi_{kk'} = \sum_{k'} L_{kk'} I_{k'}. \qquad (5.68)$$


For example, for the system of two currents, this expression is reduced to a clear analog of Eqs. (2.19):

$$\begin{aligned} \Phi_1 &= L_1 I_1 + M I_2, \\ \Phi_2 &= M I_1 + L_2 I_2. \end{aligned} \qquad (5.69)$$
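The bookkeeping of Eqs. (62), (66), and (69) can be checked with a few lines of Python: the energy computed from the currents directly must coincide with the one computed from the flux linkages. The function names and numerical values below are hypothetical and chosen only for illustration (henries and amperes).

```python
def fluxes(L1, L2, M, I1, I2):
    """Eq. (5.69): flux linkages of two inductively coupled loops."""
    phi1 = L1 * I1 + M * I2
    phi2 = M * I1 + L2 * I2
    return phi1, phi2


def energy_from_currents(L1, L2, M, I1, I2):
    """Eq. (5.62): U = L1*I1^2/2 + M*I1*I2 + L2*I2^2/2."""
    return 0.5 * L1 * I1**2 + M * I1 * I2 + 0.5 * L2 * I2**2


def energy_from_fluxes(L1, L2, M, I1, I2):
    """Eq. (5.66) for two loops: U = (I1*Phi1 + I2*Phi2)/2."""
    phi1, phi2 = fluxes(L1, L2, M, I1, I2)
    return 0.5 * (I1 * phi1 + I2 * phi2)


# Hypothetical example: note that the current I2 flows "backwards".
L1, L2, M = 2e-6, 5e-6, 1e-6
I1, I2 = 3.0, -1.0
print(fluxes(L1, L2, M, I1, I2))
print(energy_from_currents(L1, L2, M, I1, I2), energy_from_fluxes(L1, L2, M, I1, I2))
```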


For the even simpler case of a single current,

$$\Phi = LI, \qquad (5.70)$$

so that the magnetic energy of the current may be represented in several equivalent forms:


33 The SI unit of magnetic flux is called the weber, abbreviated Wb, after Wilhelm Eduard Weber (1804-1891), who in particular co-invented (with Carl Gauss) the electromagnetic telegraph. More importantly for this course, in 1856 he was the first (together with Rudolf Kohlrausch) to notice that the value of (in modern terms) $1/(\mu_0\varepsilon_0)^{1/2}$, derived from electrostatic and magnetostatic measurements, coincides with the independently measured speed of light c. This observation gave an important motivation for Maxwell's theory.
$$U = \frac{L}{2}I^2 = \frac{1}{2}I\Phi = \frac{1}{2L}\Phi^2. \qquad (5.71)$$


These relations, similar to Eqs. (2.14)-(2.15) of electrostatics, show that the self-inductance L of a current loop may be considered a measure of the system's magnetic energy. However, as we will see in Sec. 6.1, this measure is adequate only if the flux Φ, rather than the current I, is fixed.


Now we are well equipped for the calculation of inductance coefficients for particular systems, having three options. The first one is to use Eq. (60) directly.³⁴ The second one is to calculate the magnetic field energy from Eq. (57) as a function of all currents $I_k$ in the system, and then use Eq. (59) to find all coefficients $L_{kk'}$. For example, for a system with just one current, Eq. (71) yields

$$L = \frac{2U}{I^2}. \qquad (5.72)$$

Finally, if the system consists of thin wires, so that the loop areas $S_k$ and hence the fluxes $\Phi_{kk'}$ are well defined, we may calculate them from Eq. (65), and then use Eq. (67) to find the inductances.


Usually, the third option is the simplest, but the first two may be very useful even for thin-wire systems, if the notion of magnetic flux in them is not quite apparent. As an important example, let us find the self-inductance of a long solenoid (see Fig. 6a again). We have already calculated the magnetic field inside it (see Eq. (40)), so that, due to the field's uniformity, the magnetic flux piercing each turn of the wire is just

$$\Phi_1 = BA = \mu_0 nIA, \qquad (5.73)$$


where A is the area of the solenoid's cross-section (for example, $\pi R^2$ for a round solenoid), though Eq. (40), and hence Eq. (73), are valid for cross-sections of any shape. Comparing Eq. (73) with Eq. (70), one might wrongly conclude that $L = \Phi_1/I = \mu_0 nA$ (WRONG!), i.e. that the solenoid's inductance is independent of its length. Actually, the magnetic flux $\Phi_1$ pierces each wire turn, so that the total flux through the whole current loop, consisting of $N = nl$ turns, is

$$\Phi = N\Phi_1 = \mu_0 n^2 lAI, \qquad (5.74)$$


and the correct expression for the long solenoid's self-inductance is

$$L = \frac{\Phi}{I} = \mu_0 n^2 lA \equiv \frac{\mu_0 N^2 A}{l}, \qquad (5.75)$$


i.e. at fixed A and l, the inductance scales as $N^2$, not as N. Since this reasoning may seem not quite evident, it is prudent to verify this result by using Eq. (72), with the full magnetic energy inside the solenoid (neglecting minor fringe-field contributions) given by Eq. (57) with B = const within the internal volume V = lA, and zero outside of it:

$$U = \int \frac{B^2}{2\mu_0}\,d^3r = \frac{\mu_0}{2}(nI)^2 Al \equiv \frac{\mu_0 n^2 lA}{2}I^2. \qquad (5.76)$$


Plugging this relation into Eq. (72) immediately confirms the result (75). 
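The two routes to Eq. (75), via the flux linkage (73)-(74) and via the energy (76) plugged into Eq. (72), can be compared numerically. The sketch below assumes SI units, neglects fringe fields as in the text, and uses hypothetical function names and solenoid dimensions.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m (classical SI value)


def solenoid_L_from_flux(n, A, l):
    """Eqs. (5.73)-(5.75): L = N * Phi_1 / I, with N = n*l turns."""
    N = n * l
    phi1_per_ampere = MU0 * n * A  # flux through one turn per unit current, Eq. (5.73)
    return N * phi1_per_ampere


def solenoid_L_from_energy(n, A, l, I=1.0):
    """Eqs. (5.72) and (5.76): L = 2U / I^2, fringe fields neglected."""
    B = MU0 * n * I                  # uniform field inside a long solenoid, Eq. (5.40)
    U = B**2 / (2 * MU0) * (l * A)   # energy density times the internal volume V = lA
    return 2 * U / I**2


# Hypothetical round solenoid: 1000 turns per meter, radius 1 cm, length 0.5 m.
n, R, l = 1000.0, 0.01, 0.5
A = math.pi * R**2
print(solenoid_L_from_flux(n, A, l), solenoid_L_from_energy(n, A, l))  # both about 2e-4 H
```

Doubling n at fixed l doubles N and quadruples L, in accordance with the $N^2$ scaling of Eq. (75).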


34 Numerous applications of that Neumann formula (derived in 1845 by F. Neumann) to electrical engineering 
problems may be found, for example, in the classical text by F. Grover, Inductance Calculations, Dover, 1946. 
This energy-based approach becomes virtually inevitable for continuously distributed currents.
As an example, let us calculate the self-inductance L of a long coaxial cable with the cross-section shown in Fig. 8,³⁵ and the full current in the outer conductor equal and opposite to that (I) in the inner conductor.


Fig. 5.8. The cross-section of a coaxial cable. 


Let us assume that the current is uniformly distributed over the cross-sections of both conductors. (As we know from the previous chapter, this is indeed the case if both the internal and external conductors are made of a uniform resistive material.) First, we should calculate the radial distribution of the magnetic field, which has only one, azimuthal component because of the axial symmetry of the problem. This distribution may be immediately found by applying the Ampère law (37) to circular contours of radii ρ within four different ranges:


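$$2\pi\rho B = \mu_0 I_{\text{piercing the contour}} = \mu_0 I \times \begin{cases} \rho^2/a^2, & \text{for } \rho < a, \\ 1, & \text{for } a < \rho < b, \\ (c^2 - \rho^2)/(c^2 - b^2), & \text{for } b < \rho < c, \\ 0, & \text{for } c < \rho. \end{cases} \qquad (5.77)$$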
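Now, an easy integration yields the magnetic energy per unit length of the cable:

$$\begin{aligned} \frac{U}{l} &= \frac{1}{2\mu_0}\int_0^\infty B^2(\rho)\,2\pi\rho\,d\rho = \frac{\mu_0 I^2}{4\pi}\left[\frac{1}{a^4}\int_0^a \rho^3\,d\rho + \int_a^b \frac{d\rho}{\rho} + \frac{1}{(c^2-b^2)^2}\int_b^c \frac{(c^2-\rho^2)^2}{\rho}\,d\rho\right] \\ &= \frac{\mu_0 I^2}{4\pi}\left[\ln\frac{b}{a} + \frac{c^2}{c^2-b^2}\left(\frac{c^2}{c^2-b^2}\ln\frac{c}{b} - \frac{1}{2}\right)\right]. \end{aligned} \qquad (5.78)$$

From here, and Eq. (72), we get the final answer:

$$\frac{L}{l} = \frac{\mu_0}{2\pi}\left[\ln\frac{b}{a} + \frac{c^2}{c^2-b^2}\left(\frac{c^2}{c^2-b^2}\ln\frac{c}{b} - \frac{1}{2}\right)\right]. \qquad (5.79)$$

Note that for the particular case of a thin outer conductor, $c - b \ll b$, this expression reduces to

$$\frac{L}{l} \approx \frac{\mu_0}{2\pi}\left(\ln\frac{b}{a} + \frac{1}{4}\right), \qquad (5.80)$$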


where the first term in the parentheses is due to the contribution of the magnetic field energy in the free space between the conductors, and the second one to that inside the inner conductor. This distinction is important for some applications because in superconductor cables, as well as in normal-metal cables at high frequencies (to be discussed in the next chapter), the field does not penetrate the conductors' bulk, so that Eq. (80) is valid without the last term (1/4) in the parentheses, for any b < c.

35 As a reminder, the mutual capacitance C between the conductors of such a system was calculated in Sec. 2.3.
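The closed form (79) can be cross-checked by integrating the energy density of the piecewise field (77) numerically, which is a direct implementation of the second option (Eq. (72)) discussed above. The following sketch assumes SI units, a uniform current distribution as in the text, and a hypothetical cable geometry; the midpoint rule is just one convenient choice of quadrature.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m (classical SI value)


def coax_L_per_length(a, b, c):
    """Closed form, Eq. (5.79), for uniform current in both conductors."""
    r = c**2 / (c**2 - b**2)
    return MU0 / (2 * math.pi) * (math.log(b / a) + r * (r * math.log(c / b) - 0.5))


def enclosed_fraction(rho, a, b, c):
    """Fraction of the current I enclosed by a circle of radius rho, Eq. (5.77)."""
    if rho < a:
        return (rho / a) ** 2
    if rho < b:
        return 1.0
    if rho < c:
        return (c**2 - rho**2) / (c**2 - b**2)
    return 0.0


def coax_L_per_length_numeric(a, b, c, n=20000):
    """L/l = (mu0/2pi) * integral of f(rho)^2 / rho: Eq. (5.72) applied to Eq. (5.78)."""
    def midpoint(lo, hi):
        h = (hi - lo) / n
        s = 0.0
        for i in range(n):
            rho = lo + (i + 0.5) * h
            s += enclosed_fraction(rho, a, b, c) ** 2 / rho
        return s * h

    total = midpoint(0.0, a) + midpoint(a, b) + midpoint(b, c)
    return MU0 / (2 * math.pi) * total


# Hypothetical cable geometry (meters): a = 1 mm, b = 3 mm, c = 3.5 mm.
print(coax_L_per_length(1e-3, 3e-3, 3.5e-3), coax_L_per_length_numeric(1e-3, 3e-3, 3.5e-3))
```

Making the outer conductor thin (for example, c = 1.001 b) brings Eq. (79) within a fraction of a percent of the limit (80).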


As the last example, let us calculate the mutual inductance between a long straight wire and a 
round wire loop adjacent to it (Fig. 9), neglecting the thickness of both wires. 


Fig. 5.9. An example of the 
mutual inductance calculation. 


Here there is no problem with using the last approach discussed above, based on the direct calculation of the magnetic flux. Indeed, as was discussed in Sec. 1, the field $\mathbf{B}_1$ induced by the current $I_1$ at any point of the round loop is normal to its plane, i.e. to the plane of drawing of Fig. 9. In the Cartesian coordinates shown in that figure, Eq. (20) reads $B_1 = \mu_0 I_1/2\pi y$, giving the following magnetic flux through the loop:

$$\Phi_{21} = \int_S B_1\,d^2r = \frac{\mu_0 I_1}{2\pi}\int_0^{2R}\frac{dy}{y}\int_{-[R^2-(y-R)^2]^{1/2}}^{+[R^2-(y-R)^2]^{1/2}} dx = \frac{\mu_0 I_1 R}{\pi}\int_0^2 \frac{[1-(\xi-1)^2]^{1/2}}{\xi}\,d\xi. \qquad (5.81)$$


This is a table integral equal to π,³⁶ so that $\Phi_{21} = \mu_0 I_1 R$, and the final answer for the mutual inductance $M = L_{12} = L_{21} = \Phi_{21}/I_1$ is finite (and very simple):


$$M = \mu_0 R, \qquad (5.82)$$


despite the magnetic field's divergence at the lowest point of the loop (y = 0). 
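The table integral in Eq. (81) is easy to confirm numerically. The sketch below (with hypothetical function names) uses the substitution $\xi = s^2$, which removes the integrable $1/\sqrt{\xi}$ singularity at the loop's lowest point and leaves the bounded integrand $2(2 - s^2)^{1/2}$ on $[0, \sqrt{2}]$; SI units are assumed.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m (classical SI value)


def table_integral(n=200000):
    """Integral in Eq. (5.81): int_0^2 [1 - (xi - 1)^2]^(1/2) dxi / xi.

    With xi = s**2 the integrand becomes 2*sqrt(2 - s**2) on [0, sqrt(2)],
    which is bounded, so the plain midpoint rule converges well.
    """
    top = math.sqrt(2.0)
    h = top / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += 2.0 * math.sqrt(2.0 - s * s)
    return total * h


def mutual_inductance_wire_loop(R):
    """M = Phi_21 / I_1 = (mu0 * R / pi) * table_integral(), cf. Eqs. (5.81)-(5.82)."""
    return MU0 * R / math.pi * table_integral()


print(table_integral(), math.pi)                    # the integral should be close to pi
print(mutual_inductance_wire_loop(0.1), MU0 * 0.1)  # M versus mu0 * R for R = 10 cm
```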


Note that in contrast with the finite mutual inductance of this system, the self-inductances of both its wires are formally infinite in the thin-wire limit; see, e.g., Eq. (80), which in the limit $b/a \gg 1$ describes a thin straight wire. However, since this divergence is very weak (logarithmic), it is quenched by any deviation from this perfectly axial geometry. For example, a fair estimate of the inductance of a wire of a large but finite length $l \gg a$ may be obtained from Eq. (80) by the replacement of b with l:

$$L \approx \frac{\mu_0 l}{2\pi}\ln\frac{l}{a}. \qquad (5.83)$$
(Note, however, that the exact result depends on where the current flows to and from beyond that segment.) It turns out that a similar approximate result, with l replaced with $2\pi R$ in the front factor, and with R under the logarithm, is valid for the self-inductance of a round loop with $a \ll R$. (A proof of this fact is a very useful exercise, highly recommended to the reader.)


36 See, e.g., MA Eq. (6.13), with a = 1. 
5.4. Magnetic dipole moment, and magnetic dipole media


The most natural way of the magnetic media description parallels that for dielectrics in Chapter 3, and is based on the properties of magnetic dipoles, a notion close (but not identical!) to that of the electric dipoles discussed in Sec. 3.1. To introduce this notion quantitatively, let us consider, just as in Sec. 3.1, a spatially-localized system with a current distribution $\mathbf{j}(\mathbf{r}')$, whose magnetic field is measured at relatively large distances $r \gg r'$ (Fig. 10).


Fig. 5.10. Calculating the magnetic field of localized 
currents at a distant point (r ≫ a).


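Applying the truncated Taylor expansion (3.5) of the fraction $1/|\mathbf{r} - \mathbf{r}'|$ to the vector potential given by Eq. (28), we get

$$\mathbf{A}(\mathbf{r}) \approx \frac{\mu_0}{4\pi}\left[\frac{1}{r}\int_V \mathbf{j}(\mathbf{r}')\,d^3r' + \frac{1}{r^3}\int_V (\mathbf{r}\cdot\mathbf{r}')\,\mathbf{j}(\mathbf{r}')\,d^3r'\right]. \qquad (5.84)$$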


Now, due to the vector character of this potential, we have to depart somewhat from the approach of Sec. 3.1 and use the following vector algebra identity:³⁷

$$\int_V \left[f\,(\mathbf{j}\cdot\nabla g) + g\,(\mathbf{j}\cdot\nabla f)\right] d^3r = 0, \qquad (5.85)$$


that is valid for any pair of smooth (differentiable) scalar functions $f(\mathbf{r})$ and $g(\mathbf{r})$, and any vector function $\mathbf{j}(\mathbf{r})$ that, as the dc current density, satisfies the continuity condition $\nabla\cdot\mathbf{j} = 0$ and whose normal component vanishes on the surface of the volume V. First, let us use Eq. (85) with f equal to 1, and g equal to any Cartesian component of the radius-vector $\mathbf{r}$: $g = r_l$ ($l = 1, 2, 3$). Then it yields

$$\int_V (\mathbf{j}\cdot\mathbf{n}_l)\,d^3r = \int_V j_l\,d^3r = 0, \qquad (5.86)$$

so that for the vector as a whole

$$\int_V \mathbf{j}(\mathbf{r})\,d^3r = 0, \qquad (5.87)$$


showing that the first term on the right-hand side of Eq. (84) equals zero. Next, let us use Eq. (85) again, but now with $f = r_l$, $g = r_{l'}$ (where $l, l' = 1, 2, 3$); then it yields

$$\int_V (r_l j_{l'} + r_{l'} j_l)\,d^3r = 0, \qquad (5.88)$$

so that the $l$-th Cartesian component of the second integral in Eq. (84) may be transformed as


37 See, e.g., MA Eq. (12.3) with the additional condition $j_n|_S = 0$, pertinent to space-restricted currents.


Chapter 5 Page 21 of 42 


Essential Graduate Physics EM: Classical Electrodynamics 


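$$\begin{aligned} \int_V (\mathbf{r}\cdot\mathbf{r}')\,j_l\,d^3r' &= \sum_{l'=1}^{3} r_{l'}\int_V r'_{l'} j_l\,d^3r' = \frac{1}{2}\sum_{l'=1}^{3} r_{l'}\int_V \left(r'_{l'} j_l - r'_l j_{l'}\right) d^3r' \\ &= -\frac{1}{2}\left[\mathbf{r}\times\int_V (\mathbf{r}'\times\mathbf{j})\,d^3r'\right]_l. \end{aligned} \qquad (5.89)$$

As a result, Eq. (84) may be rewritten as

$$\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi}\,\frac{\mathbf{m}\times\mathbf{r}}{r^3}, \qquad (5.90)$$

where the vector $\mathbf{m}$, defined as³⁸

$$\mathbf{m} \equiv \frac{1}{2}\int_V \mathbf{r}\times\mathbf{j}(\mathbf{r})\,d^3r, \qquad (5.91)$$

is called the magnetic dipole moment of a field source, which itself, within the long-range approximation (90), is called the magnetic dipole.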
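Note a close analogy between the vector $\mathbf{m}$ defined by Eq. (91) and the orbital³⁹ angular momentum of a non-relativistic particle with mass $m_k$:

$$\mathbf{L}_k \equiv \mathbf{r}_k\times\mathbf{p}_k = \mathbf{r}_k\times m_k\mathbf{v}_k, \qquad (5.92)$$

where $\mathbf{p}_k = m_k\mathbf{v}_k$ is its linear momentum. Indeed, for a continuum of such particles with equal electric charges q, distributed with spatial density n, we have $\mathbf{j} = qn\mathbf{v}$, and Eq. (91) yields

$$\mathbf{m} = \frac{1}{2}\int_V \mathbf{r}\times\mathbf{j}\,d^3r = \frac{q}{2}\int_V n\,\mathbf{r}\times\mathbf{v}\,d^3r, \qquad (5.93)$$

while the total angular momentum of such a system of particles of equal masses $m_0$ is

$$\mathbf{L} = \int_V n\,m_0\,\mathbf{r}\times\mathbf{v}\,d^3r, \qquad (5.94)$$

so that we get a very straightforward relation

$$\mathbf{m} = \frac{q}{2m_0}\mathbf{L}. \qquad (5.95)$$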
For the orbital motion, this classical relation survives in quantum mechanics for linear operators, and hence for the eigenvalues of the observables. Since the orbital angular momentum is quantized in units of the Planck constant ħ, the orbital magnetic moment of an electron is always a multiple of the so-called Bohr magneton

$$\mu_B \equiv \frac{e\hbar}{2m_e}, \qquad (5.96)$$

where $m_e$ is the free electron mass.⁴⁰ However, for particles with spin, such a universal relation between the vectors m and L is no longer valid. For example, the electron's spin s = 1/2 gives a contribution of ħ/2 to its mechanical angular momentum, but a contribution very close to $\mu_B$ to its magnetic moment.


38 In the Gaussian units, the definition (91) is kept valid "as is", so that Eq. (90) is stripped of the factor $\mu_0/4\pi$.

39 This adjective is used, especially in quantum mechanics, to distinguish the motion of a particle as a whole (not 
necessarily along a closed orbit!) from its intrinsic angular momentum, the spin — see, e.g., QM Chapters 3-6. 

40 In the SI units, $m_e \approx 0.91\times10^{-30}$ kg, so that $\mu_B \approx 0.93\times10^{-23}$ J/T.
The next important example of a magnetic dipole is a planar thin-wire loop, limiting area A (of arbitrary shape), and carrying current I, for which m has a surprisingly simple form,

$$\mathbf{m} = I\mathbf{A}, \qquad (5.97)$$


where the modulus of the vector $\mathbf{A}$ equals the loop's area A, and its direction is normal to the loop's plane. This formula may be readily proved by noticing that if we select the coordinate frame origin on the plane of the loop (Fig. 11), then the elementary component of the magnitude of the integral (91),

$$\left|\frac{1}{2}\int_V \mathbf{r}\times\mathbf{j}\,d^3r\right| = \frac{I}{2}\left|\oint_l \mathbf{r}\times d\mathbf{r}\right| = I\oint_l \frac{r^2}{2}\,d\varphi, \qquad (5.98)$$

is just the elementary area $dA = (1/2)\,r\,(r\,d\varphi) = r^2 d\varphi/2$, the equality already used in CM Eq. (3.40).
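For a polygonal loop the contour integral in Eq. (98) can be evaluated exactly: along a straight segment from $\mathbf{r}_k$ to $\mathbf{r}_{k+1}$, the integral of $\mathbf{r}\times d\mathbf{r}$ equals $\mathbf{r}_k\times\mathbf{r}_{k+1}$. The Python sketch below (function names hypothetical) uses this to confirm Eq. (97) for a rectangle, and also shows that moving the origin, even out of the loop's plane, does not change m, because $\oint d\mathbf{r} = 0$ for any closed loop.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as sequences."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dipole_moment_polygon(vertices, current):
    """m = (I/2) * closed-loop integral of r x dr, i.e. Eq. (5.91) for a thin wire.

    For a straight segment r_k -> r_{k+1}, the integral of r x dr is exactly
    r_k x r_{k+1}, so this sum is exact for any polygonal loop.
    """
    m = [0.0, 0.0, 0.0]
    count = len(vertices)
    for k in range(count):
        c = cross(vertices[k], vertices[(k + 1) % count])
        for i in range(3):
            m[i] += 0.5 * current * c[i]
    return tuple(m)


# A 2 m x 3 m rectangle in the plane z = 0, traversed counterclockwise as seen from +z,
# carrying I = 2 A: Eq. (5.97) predicts m = I*A = (0, 0, 12) A*m^2.
rect = [(0, 0, 0), (2, 0, 0), (2, 3, 0), (0, 3, 0)]
print(dipole_moment_polygon(rect, 2.0))

# The same loop described from a shifted origin gives the same m.
shifted = [(x + 5, y - 1, z + 7) for (x, y, z) in rect]
print(dipole_moment_polygon(shifted, 2.0))
```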


Fig. 5.11. Calculating the 
magnetic dipole moment 
of a planar current loop. 


The comparison of Eqs. (96) and (97) allows a useful estimate of the scale of atomic currents, by finding what current I should flow in a circular loop of the atomic size scale (the Bohr radius) $r_B \approx 0.5\times10^{-10}$ m, i.e. of an area $A \sim 10^{-20}$ m², to produce a magnetic moment of the order of $\mu_B$.⁴¹ The result is surprisingly macroscopic: I ~ 1 mA, quite comparable to the currents driving the sound in your phone's earbuds. Though due to the quantum-mechanical spread of electron wavefunctions this estimate should not be taken too literally, it is very useful for getting a gut feeling of how significant atomic magnetism is, and hence why ferromagnets may provide such strong magnetic fields.
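The arithmetic of this estimate takes only a few lines. The sketch below uses CODATA 2018 recommended values of the constants (an assumption about which data set to use; any recent set gives the same three significant digits) and the loop area $\pi r_B^2$.

```python
import math

# CODATA 2018 recommended values, SI units.
E_CHARGE = 1.602176634e-19       # elementary charge, C
HBAR = 1.054571817e-34           # reduced Planck constant, J*s
M_E = 9.1093837015e-31           # free electron mass, kg
BOHR_RADIUS = 5.29177210903e-11  # m

mu_B = E_CHARGE * HBAR / (2 * M_E)   # Bohr magneton, Eq. (5.96)
area = math.pi * BOHR_RADIUS**2      # circular loop of the Bohr radius
current = mu_B / area                # from m = I*A, Eq. (5.97)

print(f"mu_B = {mu_B:.4e} J/T")      # about 0.93e-23 J/T, cf. footnote 40
print(f"A = {area:.2e} m^2, I = {current * 1e3:.2f} mA")  # current of the order of 1 mA
```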


After these illustrations, let us return to the discussion of the general Eq. (90). Plugging it into the (also general) Eq. (27), we may calculate the magnetic field of a magnetic dipole:⁴²


41 Another way to arrive at the same estimate is to take I ~ ef = eω/2π, with ω ~ 10¹⁶ s⁻¹ being the typical frequency of radiation due to atomic interlevel quantum transitions.

42 Similarly to the situation with the electric dipoles (see Eq. (3.24) and its discussion), it may be shown that the magnetic field of any closed current loop (or any system of such loops) satisfies the following equality:

$$\int_V \mathbf{B}(\mathbf{r})\,d^3r = \frac{2}{3}\mu_0\mathbf{m},$$

where the integral is over any sphere confining all the currents. On the other hand, as we know from Sec. 3.1, for a field with the structure (99), derived from the long-range approximation (90), such an integral vanishes. As a result, to get a coarse-grain description of the magnetic field of a small system located at r = 0, one that would give the correct average value of the magnetic field, Eq. (99) should be modified as follows:

$$\mathbf{B}_d(\mathbf{r}) = \frac{\mu_0}{4\pi}\,\frac{3\mathbf{r}(\mathbf{r}\cdot\mathbf{m}) - \mathbf{m}r^2}{r^5} + \frac{2}{3}\mu_0\mathbf{m}\,\delta(\mathbf{r}),$$

in a conceptual (though not quantitative) similarity to Eq. (3.25).
$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\,\frac{3\mathbf{r}(\mathbf{r}\cdot\mathbf{m}) - \mathbf{m}r^2}{r^5}. \qquad (5.99)$$
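A quick way to gain confidence in Eq. (99) is to take the curl of the vector potential (90) numerically and compare. In the Python sketch below, the dipole moment, the observation point, and the finite-difference step are arbitrary illustrative choices, and $\mu_0/4\pi = 10^{-7}$ (SI) is assumed; central differences make the check approximate, not exact.

```python
K = 1e-7  # mu0 / (4*pi) in SI units, T*m/A (classical value)


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def A_dipole(m, r):
    """Vector potential of a point magnetic dipole at the origin, Eq. (5.90)."""
    r3 = dot(r, r) ** 1.5
    return tuple(K * c / r3 for c in cross(m, r))


def B_dipole(m, r):
    """Magnetic dipole field, Eq. (5.99)."""
    r2 = dot(r, r)
    mr = dot(m, r)
    return tuple(K * (3 * ri * mr - mi * r2) / r2**2.5 for ri, mi in zip(r, m))


def curl(field, r, h=1e-5):
    """Curl of field(r) by central finite differences (a numerical check only)."""
    def d(i, j):  # partial derivative of component i with respect to coordinate j
        rp, rm = list(r), list(r)
        rp[j] += h
        rm[j] -= h
        return (field(rp)[i] - field(rm)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))


m = (0.1, 0.4, -0.2)  # hypothetical dipole moment, A*m^2
r = (0.3, -0.2, 0.5)  # observation point, m
print(B_dipole(m, r))
print(curl(lambda x: A_dipole(m, x), r))
```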
The structure of this formula exactly replicates that of Eq. (3.13) for the electric dipole field, including the sign. Because of this similarity, the energy of a dipole of a fixed magnitude m in an external field, and hence the torque and the force exerted on it by a fixed external field, are given by expressions fully similar to those for an electric dipole (see Eqs. (3.15)-(3.19)):⁴³

$$U = -\mathbf{m}\cdot\mathbf{B}_{\text{ext}}, \qquad (5.100)$$

and as a result,

$$\boldsymbol{\tau} = \mathbf{m}\times\mathbf{B}_{\text{ext}}, \qquad (5.101)$$

$$\mathbf{F} = \nabla(\mathbf{m}\cdot\mathbf{B}_{\text{ext}}). \qquad (5.102)$$
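As an illustration of Eqs. (99) and (102), the Python sketch below computes, by numerical differentiation, the force exerted by one point dipole on another coaxial, parallel dipole, and compares it with the result $F_z = -3\mu_0 m_1 m_2/2\pi z^4$ that follows from differentiating the on-axis field $B_z = (\mu_0/4\pi)\,2m_1/z^3$ (the minus sign means attraction). The function names and numerical values are hypothetical; the gradient acts on the external field only, with m held fixed, as in Eq. (102).

```python
K = 1e-7  # mu0 / (4*pi), SI units


def B_dipole(m, r):
    """Eq. (5.99): field at point r of a point dipole m located at the origin."""
    r2 = sum(x * x for x in r)
    mr = sum(a * b for a, b in zip(m, r))
    return [K * (3 * ri * mr - mi * r2) / r2**2.5 for ri, mi in zip(r, m)]


def force_on_dipole(m2, m1, r, h=1e-6):
    """Eq. (5.102): F = grad(m2 . B_ext), with m2 fixed and B_ext the field of m1."""
    def projection(x):
        return sum(a * b for a, b in zip(m2, B_dipole(m1, x)))

    force = []
    for j in range(3):
        rp, rm = list(r), list(r)
        rp[j] += h
        rm[j] -= h
        force.append((projection(rp) - projection(rm)) / (2 * h))
    return force


# Two identical dipoles aligned along z, separated by z (hypothetical values, SI units).
m1 = m2 = (0.0, 0.0, 1.0)  # A*m^2
z = 0.1                    # m
F = force_on_dipole(m2, m1, (0.0, 0.0, z))
F_expected = -3 * 2e-7 * 1.0 * 1.0 / z**4  # -3 mu0 m1 m2 / (2 pi z^4), using mu0/(2 pi) = 2e-7
print(F, F_expected)
```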


Now let us consider a system of many magnetic dipoles (e.g., atoms or molecules), distributed in space with an atomic-scale-averaged density n. Then we can use Eq. (90), generalized in an evident way for an arbitrary position $\mathbf{r}'$ of the dipole, and the linear superposition principle, to calculate the macroscopic vector potential $\mathbf{A}$:

$$\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int \frac{\mathbf{M}(\mathbf{r}')\times(\mathbf{r}-\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|^3}\,d^3r', \qquad (5.103)$$

where $\mathbf{M} \equiv n\mathbf{m}$ is the magnetization: the average magnetic moment per unit volume. Transforming this integral absolutely similarly to how Eq. (3.27) had been transformed into Eq. (3.29), we get:

$$\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int \frac{\nabla'\times\mathbf{M}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'. \qquad (5.104)$$


Comparing this result with Eq. (28), we see that $\nabla\times\mathbf{M}$ is equivalent, in its magnetic effect, to the density $\mathbf{j}_{\text{ef}}$ of a certain effective "magnetization current". Just as the electric-polarization charge $\rho_{\text{ef}}$ discussed in Sec. 3.2 (see Fig. 3.4), the vector $\mathbf{j}_{\text{ef}} = \nabla\times\mathbf{M}$ may be interpreted as the uncompensated part of the loop currents representing single magnetic dipoles m; see Fig. 12. Note, however, that since the atomic magnetic dipoles may be due to particles' spins, rather than to the actual electric currents of the orbital motion, the nature of the magnetization current is not as direct as that of the polarization charge.


Fig. 5.12. A cartoon illustrating the physical nature of the effective magnetization current $\mathbf{j}_{\text{ef}} = \nabla\times\mathbf{M}$.


43 Note that the fixation of m and $\mathbf{B}_{\text{ext}}$ effectively means that the currents producing them are fixed; please have one more look at Eqs. (35) and (97). As a result, Eq. (100) is a particular case of Eq. (53) rather than (54), hence the minus sign.
Now, using Eq. (28) to add the possible contribution from the stand-alone currents j not included in the currents of microscopic magnetic dipoles, we get the general expression for the vector potential of the macroscopic field:

$$\mathbf{A}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int \frac{\mathbf{j}(\mathbf{r}') + \nabla'\times\mathbf{M}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d^3r'. \qquad (5.105)$$


Repeating the calculations that have led us from Eq. (28) to the Maxwell equation (35), now taking into account the magnetization current term, for the macroscopic magnetic field B we get

$$\nabla\times\mathbf{B} = \mu_0(\mathbf{j} + \nabla\times\mathbf{M}). \qquad (5.106)$$


Following the same reasoning as in Sec. 3.2, we may recast this equation as 


$$\nabla\times\mathbf{H} = \mathbf{j}, \qquad (5.107)$$

where the field defined as

$$\mathbf{H} \equiv \frac{\mathbf{B}}{\mu_0} - \mathbf{M}, \qquad (5.108)$$


for historic reasons (and very unfortunately) is also called the magnetic field.⁴⁴ This is why it is crucial to remember that the physical sense of the field H is very much different from that of the field B. To understand this difference better, let us bring Eqs. (3.32), (3.36), (29), and (107) together, writing them as the following system of macroscopic Maxwell equations (again, so far for the stationary case $\partial/\partial t = 0$):⁴⁵

$$\begin{aligned} \nabla\times\mathbf{E} &= 0, & \qquad \nabla\times\mathbf{H} &= \mathbf{j}, \\ \nabla\cdot\mathbf{D} &= \rho, & \qquad \nabla\cdot\mathbf{B} &= 0. \end{aligned} \qquad (5.109)$$


These equations clearly show that the roles of the vector fields D and H are very similar: they both may 
be called “would-be fields” — meaning the fields that would be induced by the stand-alone charges p and 
currents j, if the medium had not modified them by its dielectric and magnetic polarization. 


Despite this similarity, let me note an important difference of signs in the relation (3.33) between E, D, and P, on one hand, and the relation (108) between B, H, and M, on the other hand. This is not just a matter of definition. Indeed, due to the similarity of Eqs. (3.15) and (100), including similar signs, the electric and magnetic fields both try to orient the corresponding dipole moments along the field. Hence, in media that allow such an orientation (and, as we will see momentarily, for magnetic media this is not always the case), the induced polarizations P and M are directed along, respectively, the vectors E and B of the genuine (though macroscopic, i.e. atomic-scale-averaged) fields. According to Eq. (3.33), if the would-be field D is fixed (say, by a fixed stand-alone charge distribution ρ(r)), such a polarization reduces the electric field $\mathbf{E} = (\mathbf{D} - \mathbf{P})/\varepsilon_0$. On the other hand, Eq. (108) shows that in a magnetic medium with a fixed would-be field H, the magnetic polarization making M parallel to B,


44 This confusion is exacerbated by the fact that in Gaussian units, Eq. (108) has the form $\mathbf{H} = \mathbf{B} - 4\pi\mathbf{M}$, and hence the fields B and H have the same dimensionality (and are formally equal in free space!), though the unit of H has a different name (oersted, abbreviated as Oe). Mercifully, in the SI units the dimensionalities of B and H are different, with the unit of H called the ampere per meter.

45 Let me remind the reader once again that in contrast with the system (36) of the Maxwell equations for the 
genuine (microscopic) fields, the right-hand sides of Eqs. (109) represent only the stand-alone charges and 
currents, not included in the microscopic electric and magnetic dipoles. 
enhances the magnetic field $\mathbf{B} = \mu_0(\mathbf{H} + \mathbf{M})$. This difference may be traced back to the sign difference in the basic relations (1) and (2), i.e. to the fundamental fact that electric charges of the same sign repel, while currents of the same direction attract each other.


5.5. Magnetic materials 


In order to form a complete system, sufficient for the calculation of all fields from given $\rho(\mathbf{r})$ and $\mathbf{j}(\mathbf{r})$, the macroscopic Maxwell equations (109) have to be complemented with the constitutive relations describing the medium: $\mathbf{D} \leftrightarrow \mathbf{E}$, $\mathbf{j} \leftrightarrow \mathbf{E}$, and $\mathbf{B} \leftrightarrow \mathbf{H}$. The first two of them were discussed, in brief, in the last two chapters; let us proceed to the last one.


A major difference between the dielectric and magnetic constitutive relations D(E) and B(H) is that while a dielectric medium always reduces the external field, magnetic media may either reduce or enhance it. To quantify this fact, let us consider the most common case: linear magnetic materials, in which M (and hence H) is proportional to B. For isotropic materials, this proportionality is characterized by a scalar, either the magnetic permeability μ defined by the following relation:


$$\mathbf{B} = \mu\mathbf{H}, \qquad (5.110)$$

or the magnetic susceptibility⁴⁶ defined as

$$\mathbf{M} = \chi_m\mathbf{H}. \qquad (5.111)$$

Plugging these relations into Eq. (108), we see that these two parameters are not independent, but are related as

$$\mu = (1 + \chi_m)\mu_0. \qquad (5.112)$$
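The constitutive relations (110)-(112) can be exercised on the simplest system mentioned below, a long solenoid, here assumed to be uniformly filled with a linear isotropic magnet. The sketch takes H = nI inside (the Ampère law for H, Eq. (107), with fringe effects neglected), a hypothetical winding density and current, and susceptibility values from Table 5.1; it shows that paramagnets slightly enhance B while diamagnets slightly reduce it, with Eq. (108) satisfied in both cases.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m (classical SI value)


def filled_solenoid_fields(n, current, chi_m):
    """H, M, B inside a long solenoid uniformly filled with a linear isotropic magnet.

    Assumes H = n*I (Ampere law for H, Eq. (5.107), fringe effects neglected);
    Eqs. (5.110)-(5.112) then give M and B.
    """
    H = n * current
    M = chi_m * H             # Eq. (5.111)
    mu = (1 + chi_m) * MU0    # Eq. (5.112)
    B = mu * H                # Eq. (5.110)
    return H, M, B


# Hypothetical winding: n = 1000 turns/m, I = 1 A; chi_m values as in Table 5.1.
for name, chi in [("vacuum", 0.0), ("aluminum", 2e-5), ("bismuth", -17e-5)]:
    H, M, B = filled_solenoid_fields(1000.0, 1.0, chi)
    print(f"{name:9s} B = {B:.8e} T, B/(mu0*H) - 1 = {B / (MU0 * H) - 1:+.1e}")
```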


Note that despite the superficial similarity between Eqs. (110)-(112) and the corresponding 
relations (3.43)-(3.47) for linear dielectrics: 


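$$\mathbf{D} = \varepsilon\mathbf{E}, \qquad \mathbf{P} = \chi_e\varepsilon_0\mathbf{E}, \qquad \varepsilon = (1 + \chi_e)\varepsilon_0, \qquad (5.113)$$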


there is an important conceptual difference between them. Namely, while the vector E on the right-hand 
sides of Eqs. (113) is the actual (though macroscopic) electric field, the vector H on the right-hand side 
of Eqs. (110)-(111) represents a “would-be” magnetic field, in most aspects similar to D rather than E — 
see, for example, Eqs. (109). This historic difference in the traditional form of the constitutive relations 
for the electric and magnetic fields is not without its physical reasons. Most experiments with electric 
and magnetic materials are performed by placing their samples into nearly-uniform electric and 
magnetic fields, and the simplest systems for their implementation are, respectively, plane capacitors 
(Fig. 2.3) and long solenoids (Fig. 6). The field in the former system may be most conveniently 


46 According to Eqs. (110) and (112), i.e. in the SI units, $\chi_m$ is dimensionless, while μ has the same dimensionality as $\mu_0$. In the Gaussian units, μ is dimensionless: $(\mu)_{\text{Gaussian}} = (\mu)_{\text{SI}}/\mu_0$, and $\chi_m$ is also introduced differently, as $\mu = 1 + 4\pi\chi_m$. Hence, just as for the electric susceptibilities, these dimensionless coefficients are different in the two systems: $(\chi_m)_{\text{SI}} = 4\pi(\chi_m)_{\text{Gaussian}}$. Note also that $\chi_m$ is formally called the volumic magnetic susceptibility, to distinguish it from the atomic (or "molecular") susceptibility χ defined by a similar relation, $\langle\mathbf{m}\rangle = \chi\mathbf{H}$, where m is the induced magnetic moment of a single dipole, e.g., an atom. (χ is an analog of the electric atomic polarizability α; see Eq. (3.48) and its discussion.) In a dilute medium, i.e. in the absence of substantial dipole-dipole interactions, $\chi_m = n\chi$, where n is the dipole density.
controlled by fixing the voltage V between its plates, which is proportional to the electric field E. On the other hand, the field provided by the solenoid may be fixed by the current I in it, and according to Eq. (107), the field proportional to this stand-alone current is H, rather than B.⁴⁷


Table 1 lists the approximate magnetic susceptibility values for several materials. It shows that in contrast to linear dielectrics, whose susceptibility $\chi_e$ is always positive, i.e. the dielectric constant $\kappa \equiv \chi_e + 1$ is always larger than 1 (see Table 3.1), linear magnetic materials may be either paramagnets (with $\chi_m > 0$, i.e. $\mu > \mu_0$) or diamagnets (with $\chi_m < 0$, i.e. $\mu < \mu_0$).


Table 5.1. Susceptibility $(\chi_m)_{\text{SI}}$ of a few representative and/or important magnetic materials

Material                                                        (χ_m)_SI
"Mu-metal" (75% Ni + 15% Fe + a few % of Cu and Mo)             ~20,000 (a)
Permalloy (80% Ni + 20% Fe)                                     ~8,000 (a)
"Electrical" (or "transformer") steel (Fe + a few % of Si)      ~4,000 (a)
Nickel                                                          ~100
Aluminum                                                        +2×10⁻⁵
Oxygen (at ambient conditions)                                  +0.2×10⁻⁵
Water                                                           −0.9×10⁻⁵
Diamond                                                         −2×10⁻⁵
Copper                                                          −7×10⁻⁵
Bismuth (the strongest non-superconducting diamagnet)           −17×10⁻⁵


The table does not include bulk superconductors, which may be described, in a so-called coarse-scale approximation, as perfect diamagnets (with B = 0, i.e. formally with $\chi_m = -1$ and $\mu = 0$), though the actual physics of this phenomenon is different; see Sec. 6.3 below.

(a) The exact values of $\chi_m \gg 1$ for soft ferromagnetic materials (see, e.g., the upper three rows of the table) depend not only on their composition but also on their thermal processing ("annealing"). Moreover, due to unintentional vibrations, the extremely high values of $\chi_m$ of such materials may decay with time, though they may be restored to the original values by new annealing. The reason for such behavior is discussed in the text below.


The reason for this difference is that in dielectrics, two different polarization mechanisms (schematically illustrated by Fig. 3.7) lead to the same sign of the average polarization; see the discussion in Sec. 3.3. One of these mechanisms, illustrated by Fig. 3.7b, i.e. the ordering of spontaneous dipoles by the applied field, is also possible for magnetization: for atoms and molecules with spontaneous internal magnetic dipoles of magnitude $m_0 \sim \mu_B$, due to their net spins. Again, in the absence of an external magnetic field the spins, and hence the dipole moments $\mathbf{m}_0$, may be disordered, but according to Eq. (100), the external magnetic field tends to align the dipoles along its direction. As a result, the average direction of the spontaneous elementary moments $\mathbf{m}_0$, and hence the direction of the arising magnetization M, is the same as that of the microscopic field B at the points of the dipole location (i.e., for a dilute medium, of $\mathbf{H} \approx \mathbf{B}/\mu_0$), resulting in a positive susceptibility $\chi_m$, i.e. in paramagnetism, such as that of oxygen and aluminum (see Table 1).


47 This fact also explains the misleading term “magnetic field” for H. 
However, in contrast to the electric polarization of atoms/molecules with no spontaneous electric dipoles, which gives the same sign of $\chi_e = \kappa - 1$ (see Fig. 3.7a and its discussion), the magnetic materials with no spontaneous atomic magnetic dipole moments have $\chi_m < 0$: the effect called the orbital (or "Larmor"⁴⁸) diamagnetism. As the simplest model of this effect, let us consider the orbital motion of an atomic electron about an atomic nucleus as that of a classical particle of mass $m_0$, with an electric charge q, about an immobile attracting center. As classical mechanics tells us, the central attractive force does not change the particle's angular momentum $\mathbf{L} = m_0\mathbf{r}\times\mathbf{v}$, but the applied magnetic field B (that may be taken uniform on the atomic scale) does, due to the torque (101) it exerts on the magnetic moment (95):

$$\frac{d\mathbf{L}}{dt} = \boldsymbol{\tau} = \mathbf{m}\times\mathbf{B} = \frac{q}{2m_0}\mathbf{L}\times\mathbf{B}. \qquad (5.114)$$

The diagram in Fig. 13 shows that in the limit of a relatively weak field, when the magnitude of the angular momentum L may be considered constant, this equation describes the rotation (called the torque-induced precession⁴⁹) of the vector L about the direction of the vector B, with the angular velocity $\boldsymbol{\Omega} = -q\mathbf{B}/2m_0$, independent of the angle θ. According to Eqs. (91) and (114), the resulting additional (field-induced) magnetic moment $\Delta\mathbf{m} \propto q\boldsymbol{\Omega} \propto -q^2\mathbf{B}/m_0$ has, irrespective of the sign of q, a direction opposite to the field. Hence, according to Eq. (111) with $\mathbf{H} \approx \mathbf{B}/\mu_0$, the susceptibility $\chi_m \propto \chi = \Delta m/H$ is indeed negative. (Let me leave its quantitative estimate within this classical model for the reader's exercise.) The quantum-mechanical treatment confirms this qualitative picture of the Larmor diamagnetism, giving only quantitative corrections to the classical result for $\chi_m$.⁵⁰
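The precession described by Eq. (114) can be reproduced by integrating it numerically. The Python sketch below uses the classical fourth-order Runge-Kutta scheme (a convenient choice, not the method of the text) in hypothetical dimensionless units with $\gamma \equiv q/2m_0 = 1$ and B = 2 along z, so that the text's result $\boldsymbol{\Omega} = -q\mathbf{B}/2m_0$ predicts a rotation of L by −2 rad about the z-axis per unit time, with $|\mathbf{L}|$ and $L_z$ conserved. The last line gives the physical scale $|\Omega| = eB/2m_e$ for an electron in a 1 T field.

```python
import math


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def precess(L, B, gamma, t_end, steps=1000):
    """Integrate Eq. (5.114), dL/dt = gamma * L x B with gamma = q/(2 m0), by classical RK4."""
    dt = t_end / steps

    def rhs(v):
        return tuple(gamma * c for c in cross(v, B))

    for _ in range(steps):
        k1 = rhs(L)
        k2 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(L, k1)))
        k3 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(L, k2)))
        k4 = rhs(tuple(x + dt * k for x, k in zip(L, k3)))
        L = tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(L, k1, k2, k3, k4))
    return L


# Dimensionless illustration: gamma = 1, B = (0, 0, 2), so Omega = -gamma*B = (0, 0, -2).
L_start = (1.0, 0.0, 1.0)
L_end = precess(L_start, (0.0, 0.0, 2.0), gamma=1.0, t_end=1.0)
print(L_end, math.atan2(L_end[1], L_end[0]))  # azimuth should be close to -2 rad

# Physical scale: |Omega| = e*B/(2*m_e) for an electron in B = 1 T, in rad/s.
print(1.602176634e-19 * 1.0 / (2 * 9.1093837015e-31))
```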


Fig. 5.13. The torque-induced precession of a 
classical charged particle in a magnetic field. 


A simple estimate (also left for the reader's exercise) shows that in atoms with spontaneous nonzero net spins, the magnetic dipole orientation mechanism prevails over the orbital diamagnetism, so that the materials incorporating such atoms usually exhibit net paramagnetism; see Table 1. Due to possible strong quantum interaction between the spin dipole moments, the magnetism of such materials is rather complex, with numerous interesting phenomena and elaborate theories. Unfortunately, all this physics is well outside the framework of this course, and I have to refer the interested reader to special literature,⁵¹ but will still mention some key facts.


48 Named after Sir Joseph Larmor who was the first (in 1897) to describe this effect mathematically. 

49 For a detailed discussion of this effect see, e.g., CM Sec. 4.5. 

50 See, e.g., QM Sec. 6.4. Quantum mechanics also explains why in most common (s-) ground states, the average 
contribution (95) of the orbital angular momentum L to the net vector m vanishes. 

51 See, e.g., D. J. Jiles, Introduction to Magnetism and Magnetic Materials, 2nd ed., CRC Press, 1998, or R. C. O'Handley, Modern Magnetic Materials, Wiley, 1999.
Most importantly, a sufficiently strong magnetic dipole-dipole interaction may lead to spontaneous ordering of the dipoles, even in the absence of the applied field. This ordering may correspond to either parallel alignment of the dipoles (ferromagnetism) or anti-parallel alignment of the adjacent dipoles (antiferromagnetism). Evidently, the external effects of ferromagnetism are stronger, because this phase corresponds to a substantial spontaneous magnetization M even in the absence of an external magnetic field. (The corresponding magnitude of B = μ_0M is called the remanence field, B_R.) The direction of the vector B_R may be switched by the application of an external magnetic field with a magnitude above a certain value H_C, called coercivity, leading to the well-known hysteretic loops on the [H, B] plane (see Fig. 14 for a typical example), similar to those in ferroelectrics, already discussed in Sec. 3.3.


Just like ferroelectrics, ferromagnets may also be hard or soft, in the magnetic rather than mechanical sense. In hard ferromagnets (also called permanent magnets), the dipole interaction is so strong that B stays close to B_R in all applied fields below H_C, so that the hysteretic loops are virtually rectangular. Hence, in lower fields, the magnetization M of a permanent magnet may be considered constant, with the magnitude B_R/μ_0. Such hard ferromagnetic materials (notably, rare-earth compounds such as SmCo₅, Sm₂Co₁₇, and especially Nd₂Fe₁₄B), with high remanence fields (~1 T) and high coercivity (~10⁶ A/m), have numerous practical applications.⁵² Let me give just two of the most important examples.


First, permanent magnets are the core components of most electric motors. By the way, this
venerable (~150-year-old) technology is currently experiencing a quiet revolution, driven mostly by the
electric car development. In the most advanced type of motors, called permanent-magnet synchronous 
machines (PMSM), the remanence magnetic field Br of a permanent-magnet rotating part of the 
machine (called the rotor) interacts with the magnetic field of ac currents passed through wire windings 
in the external, static part of the motor (called the stator). The resulting torque may drive the rotor to 
extremely high speeds, exceeding 10,000 rotations per minute, enabling the motor to deliver several 
kilowatts of mechanical power from each kilogram of its mass. 


As the second important example, despite decades of exponential (Moore's-law) progress
of semiconductor electronics, most computer data storage systems (e.g., in data centers) are still based


52 Currently, the neodymium-iron-boron compound holds nearly 95% of the world's permanent-magnet application market, due to its combination of high B_R and H_C with lower fabrication costs.
on hard disk drives whose active media are submicron-thin layers of hard ferromagnets, with the data bits stored in the form of the direction of the remanent magnetization of small film spots. This technology has reached fantastic sophistication, with the recorded data density of the order of 10¹² bits per square inch.⁵³ (Only recently has it started to be seriously challenged by solid-state drives based on the floating-gate semiconductor memories already mentioned in Chapter 3.)⁵⁴


In contrast, in soft ferromagnets, with their lower magnetic dipole interactions, the magnetization is constant only inside each of the spontaneously formed magnetic domains, while the volume and shape of the domains are affected by the applied magnetic field. As a result, the shape of the hysteresis loop of a soft ferromagnet depends on the cycled field's amplitude and cycling history; see Fig. 14. At high fields, their B (and hence M) is driven into saturation, with B ~ B_R, but at low fields, they behave essentially as linear magnetics with very high values of χ_m and hence μ; see the top rows of Table 1. (The magnetic domain interaction, and hence the low-field susceptibility of such soft ferromagnets, are highly dependent on the material's fabrication technology and its post-fabrication thermal and mechanical treatments.) Due to these high values of μ, soft ferromagnets, especially iron and its alloys (e.g., various special steels), are extensively used in electrical engineering, for example in the cores of transformers; see the next section.


Due to the relative weakness of the magnetic dipole interaction in some materials, their ferromagnetic ordering may be destroyed by thermal fluctuations if the temperature is increased above a certain value called the Curie temperature T_C, specific for each material. The transition between the ferromagnetic and paramagnetic phases at T = T_C is a classical example of a continuous phase transition, with the average magnetization M playing the role of the so-called order parameter that (in the absence of external fields) becomes different from zero only at T < T_C, increasing gradually with further temperature reduction.⁵⁵


5.6. Systems with magnetic materials 
Just like the electrostatics of linear dielectrics, magnetostatics is very simple in the particular case when all essential stand-alone currents are embedded into a linear magnetic medium with a constant permeability μ. Indeed, let us assume that we know the solution B_0(r) of the magnetic pair of


53 “A magnetic head slider [the read/write head – KKL] flying over a disk surface with a flying height of 25 nm with a relative speed of 20 meters/second [all realistic parameters – KKL] is equivalent to an aircraft flying at a physical spacing of 0.2 μm at 900 kilometers/hour.” B. Bhushan, as quoted in the (generally good) book by G.
Hadjipanayis, Magnetic Storage Systems Beyond 2000, Springer, 2001. 

54 The high-frequency properties of hard ferromagnets are also very non-trivial. For example, according to Eq. 
(101), an external magnetic field B_ext exerts the torque τ = M × B_ext on the spontaneous magnetic moment M of a unit volume of a ferromagnet. In some nearly-isotropic, mechanically fixed ferromagnetic samples, this torque causes the precession around the direction of B_ext (very similar to that illustrated in Fig. 13), not of the sample as such, but of the magnetization M inside it, with a certain frequency ω_r. If the frequency ω of an additional ac field becomes very close to ω_r, its absorption sharply increases: the so-called ferromagnetic resonance. Moreover, if ω is somewhat higher than ω_r, the effective magnetic permeability μ(ω) of the material for the ac field may become
negative, enabling a series of interesting effects and practical applications. Very unfortunately, I could not find 
time for their discussion in this series and have to refer the interested reader to literature, for example the 
monograph by A. Gurevich and G. Melkov, Magnetization Oscillations and Waves, CRC Press, 1996. 

55 In this series, a quantitative discussion of such transitions is given in SM Chapter 4. 
the genuine (“microscopic”) Maxwell equations (36) in free space, i.e. when the genuine current density
j coincides with that of stand-alone currents. Then the macroscopic Maxwell equations (109) and the 
linear constitutive equation (110) are satisfied with the pair of functions 


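$$\mathbf{H}(\mathbf{r}) = \frac{\mathbf{B}_0(\mathbf{r})}{\mu_0}, \qquad \mathbf{B}(\mathbf{r}) = \mu\,\mathbf{H}(\mathbf{r}) = \frac{\mu}{\mu_0}\,\mathbf{B}_0(\mathbf{r}). \tag{5.115}$$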
Hence the only effect of the complete filling of a fixed-current system with a uniform, linear magnetic medium is the change of the magnetic field B at all points by the same constant factor μ/μ_0 = 1 + χ_m, which may be either larger or smaller than 1. (As a reminder, a similar filling of a system of fixed stand-alone charges with a uniform, linear dielectric always leads to a reduction of the electric field E by a factor of ε/ε_0 = 1 + χ_e, a difference whose physics was already discussed at the end of Sec. 4.)
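A one-line check that the pair (5.115) indeed satisfies the macroscopic equations may be helpful here. The sketch below assumes only what the text states: B_0 obeys the free-space equations ∇ × B_0 = μ_0 j and ∇·B_0 = 0, with j the stand-alone current density, and μ is constant, so that it can be taken outside the derivatives:

```latex
\nabla\times\mathbf{H} = \frac{1}{\mu_0}\,\nabla\times\mathbf{B}_0 = \mathbf{j},
\qquad
\nabla\cdot\mathbf{B} = \frac{\mu}{\mu_0}\,\nabla\cdot\mathbf{B}_0 = 0,
\qquad
\mathbf{B} = \mu\mathbf{H}\ \ \text{(by construction)}.
```

The constancy of μ is essential: for μ = μ(r), the same ansatz would give ∇·(μH) = H·∇μ, which is generally nonzero, so the nonuniform case requires a separate analysis.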


However, this simple result is generally invalid in the case of nonuniform (or piecewise-uniform) magnetic samples. To analyze this case, let us first integrate the macroscopic Maxwell equation (107) along a closed contour C limiting a smooth surface S. Now using the Stokes theorem just as in the derivation of Eq. (37), we get the macroscopic version of the Ampère law (37):

$$\oint_C \mathbf{H}\cdot d\mathbf{l} = I, \tag{5.116}$$

where I is the net stand-alone current piercing the surface S.


Let us apply this relation to a sharp boundary between two regions with different magnetic 
materials, with no stand-alone currents on the interface, similarly to how this was done for the field E in 
Sec. 3.4 — see Fig. 3.5. The result is similar as well: 


$$H_\tau = \text{const}. \tag{5.117}$$


On the other hand, the integration of the Maxwell equation (29) over a Gaussian pillbox enclosing a 
border fragment (again just as shown in Fig. 3.5 for the field D) yields a result similar to Eq. (3.35): 


$$B_n = \text{const}. \tag{5.118}$$
For linear magnetic media, with B = μH, the latter boundary condition is reduced to


$$\mu H_n = \text{const}. \tag{5.119}$$


Let us use these boundary conditions, first of all, to see what happens with a long cylindrical sample of a uniform magnetic material, placed parallel to a uniform external magnetic field B_0; see Fig. 15. Such a sample cannot noticeably disturb the field in the free space outside it, along most of its length: B_ext = B_0, H_ext = B_ext/μ_0 = B_0/μ_0 ≡ H_0. Now applying Eq. (117) to the dominating (lateral) surfaces of the sample, we get H_in = H_0.⁵⁶ For a linear magnetic material, these relations yield B_in = μH_in = (μ/μ_0)B_0.⁵⁷ For high-μ media, this means that B_in ≫ B_0. This effect may be vividly represented as the concentration of the magnetic field lines in high-μ samples; see Fig. 15 again. (The concentration affects the external field


56 The independence of H from the magnetic properties of the sample in this geometry explains why this field's magnitude is commonly used as the argument in plots like Fig. 14: such measurements are typically carried out by placing an elongated sample of the material under study into a long solenoid with a controllable current I, so that according to Eq. (116), H_0 = nI, regardless of the sample.

57 The reader is highly encouraged to carry out a similar analysis of the fields inside narrow gaps cut in a linear magnetic material, similar to that carried out in Sec. 3.3 for linear dielectrics; see Fig. 3.6 and its discussion.
distribution only at distances of the order of (μ/μ_0)t ≪ l near the sample's ends, where t is the linear scale of the sample's cross-section and l is its length.) Such concentration is widely used in such practically important devices as transformers, in which two multi-turn coils are wound on a ring-shaped (e.g., toroidal, see Fig. 6b) core made of a soft ferromagnetic material (such as the transformer steel, see Table 1) with μ ≫ μ_0. This minimizes the number of “stray” field lines, and makes the magnetic flux Φ piercing each wire turn of either coil virtually the same, an equality important for the secondary voltage induction; see the next chapter.


Fig. 5.15. Magnetic field concentration in long, high-μ magnetic samples (schematically).


Samples of other geometries may create strong perturbations of the external field, extended to 
distances of the order of the sample’s dimensions. To analyze such problems, we may benefit from a 
simple partial differential equation for a scalar function, e.g., the Laplace equation, because in Chapter 2 we learned how to solve it for many simple geometries. In magnetostatics, the introduction of a scalar potential is generally impossible due to the vortex-like magnetic field lines. However, if there are no stand-alone currents within the region we are interested in, then the macroscopic Maxwell equation (107) for the field H is reduced to ∇ × H = 0, similar to Eq. (1.28) for the electric field, showing that we may introduce the scalar potential of the magnetic field, φ_m, using a relation similar to Eq. (1.33):

$$\mathbf{H} = -\nabla\phi_m. \tag{5.120}$$


Combining it with the homogeneous Maxwell equation (29) for the magnetic field, ∇·B = 0, and Eq. (110) for a linear magnetic material, we arrive at a single differential equation, ∇·(μ∇φ_m) = 0. For a uniform medium (μ(r) = const), it is reduced to our beloved Laplace equation:


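$$\nabla^2\phi_m = 0. \tag{5.121}$$
Moreover, Eqs. (117) and (119) give us very familiar boundary conditions: the first of them,
$$\frac{\partial\phi_m}{\partial\tau} = \text{const}, \tag{5.122a}$$
being equivalent to
$$\phi_m = \text{const}, \tag{5.122b}$$
while the second one gives
$$\mu\,\frac{\partial\phi_m}{\partial n} = \text{const}. \tag{5.123}$$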


Indeed, these boundary conditions are completely similar to Eqs. (3.37) and (3.56) of electrostatics, with the replacement ε → μ.⁵⁸


58 This similarity may seem strange because earlier we have seen that the parameter μ is physically more similar to 1/ε. The reason for this paradox is that in magnetostatics, the magnetic potential φ_m is traditionally used to
Let us analyze the geometric effects on magnetization, first using the (too?) familiar structure: a sphere, made of a linear magnetic material, placed into a uniform external field H_0 = B_0/μ_0. Since the differential equation and the boundary conditions are similar to those of the corresponding electrostatics problem (see Fig. 3.11 and its discussion), we can use the above analogy to reuse the solution we already have; see Eqs. (3.63). Just as in the electric case, the field outside the sphere, with


$$(\phi_m)_{r\ge R} = H_0\left(-r + \frac{\mu-\mu_0}{\mu+2\mu_0}\,\frac{R^3}{r^2}\right)\cos\theta, \tag{5.124}$$
is a sum of the uniform external field H_0, with the potential −H_0 r cosθ = −H_0 z, and the dipole field (99) with the following induced magnetic dipole moment of the sphere:⁵⁹


$$\mathbf{m} = 4\pi\,\frac{\mu-\mu_0}{\mu+2\mu_0}\,R^3\,\mathbf{H}_0. \tag{5.125}$$

On the contrary, the internal field is perfectly uniform, and directed along the external one:


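$$(\phi_m)_{r\le R} = -H_0\,\frac{3\mu_0}{\mu+2\mu_0}\,r\cos\theta, \quad\text{so that}\quad \frac{H_\text{in}}{H_0} = \frac{3\mu_0}{\mu+2\mu_0}, \quad \frac{B_\text{in}}{B_0} \equiv \frac{\mu H_\text{in}}{\mu_0 H_0} = \frac{3\mu}{\mu+2\mu_0}. \tag{5.126}$$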


Note that the field H_in inside the sphere is not equal to the applied external field H_0. This example shows that the interpretation of H as the “would-be” magnetic field generated by external stand-alone currents j should not be exaggerated by saying that its distribution is independent of the magnetic bodies in the system. In the limit μ ≫ μ_0, Eqs. (126) yield H_in/H_0 ≪ 1 and B_in/H_0 → 3μ_0, the factor 3 being specific for the particular geometry of the sphere. If a sample is strongly stretched along the applied field, with its length l much larger than the scale t of its cross-section, this geometric effect is gradually reduced, and B_in tends to its value μH_0 ≫ B_0, as was discussed above; see Fig. 15.
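Eqs. (124)-(126) are easy to evaluate and to cross-check against the boundary conditions (122b) and (123). The Python sketch below is an illustration rather than part of the original text: it parametrizes the material by the relative permeability μ_r ≡ μ/μ_0 (a name introduced here), verifies that φ_m and μ∂φ_m/∂r are continuous at r = R, and tabulates the approach of B_in/B_0 to 3 at μ ≫ μ_0. The values of μ_r used are arbitrary illustrative choices, not tabulated material constants.

```python
import math


def phi_out(mu_r, h0, radius, r, theta):
    """phi_m outside the sphere (r >= R), Eq. (5.124); mu_r = mu/mu_0."""
    return h0 * (-r + (mu_r - 1) / (mu_r + 2) * radius**3 / r**2) * math.cos(theta)


def phi_in(mu_r, h0, r, theta):
    """phi_m inside the sphere (r <= R), Eq. (5.126)."""
    return -h0 * 3 / (mu_r + 2) * r * math.cos(theta)


def dphi_out_dr(mu_r, h0, radius, r, theta):
    """Radial derivative of phi_out, taken analytically."""
    return h0 * (-1 - 2 * (mu_r - 1) / (mu_r + 2) * radius**3 / r**3) * math.cos(theta)


def dphi_in_dr(mu_r, h0, theta):
    """Radial derivative of phi_in (the internal field is uniform)."""
    return -h0 * 3 / (mu_r + 2) * math.cos(theta)


def sphere_ratios(mu_r):
    """(H_in/H_0, B_in/B_0) from Eq. (5.126)."""
    return 3 / (mu_r + 2), 3 * mu_r / (mu_r + 2)


def sphere_moment(mu_r, h0, radius):
    """Induced magnetic dipole moment m, Eq. (5.125)."""
    return 4 * math.pi * (mu_r - 1) / (mu_r + 2) * radius**3 * h0


# Boundary conditions (5.122b) and (5.123) at r = R, for one arbitrary angle:
mu_r, h0, radius, theta = 50.0, 1.0, 1.0, 0.7
jump_phi = phi_out(mu_r, h0, radius, radius, theta) - phi_in(mu_r, h0, radius, theta)
# (mu_0 * dphi/dr outside - mu * dphi/dr inside) / mu_0:
jump_flux = dphi_out_dr(mu_r, h0, radius, radius, theta) - mu_r * dphi_in_dr(mu_r, h0, theta)
print(f"jumps at r = R: phi_m {jump_phi:.1e}, mu*dphi_m/dr {jump_flux:.1e}")

for m_r in (1.0, 10.0, 1e4):
    h_ratio, b_ratio = sphere_ratios(m_r)
    print(f"mu/mu0 = {m_r:g}: H_in/H_0 = {h_ratio:.3e}, B_in/B_0 = {b_ratio:.4f}")
```

The last loop reproduces the trend discussed above: at large μ, H_in/H_0 falls off as about 3μ_0/μ, while B_in/B_0 saturates at 3.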


Now let us calculate the field distribution in a similar, but slightly more complex (and practically 
important) system: a round cylindrical shell, made of a linear magnetic material, placed into a uniform 
external field H_0 normal to its axis; see Fig. 16.


Fig. 5.16. Cylindrical magnetic shield.


describe the “would-be field” H, while in electrostatics, the potential φ describes the actual electric field E. (This tradition persists from the days when H was perceived as a genuine magnetic field.)
59 To derive Eq. (125), we may either calculate the gradient of φ_m given by Eq. (124), or use the similarity of Eqs. (3.13) and (99) to derive from Eq. (3.17) a similar expression for the magnetic dipole's potential:
$$\phi_m = \frac{1}{4\pi}\,\frac{m\cos\theta}{r^2}.$$
Now comparing this formula with the second term of Eq. (124), we immediately get Eq. (125).
Since there are no stand-alone currents in the region of our interest, we can again represent the field H(r) by the gradient of the magnetic potential φ_m; see Eq. (120). Inside each of the three constant-μ regions, i.e. at ρ < a, a < ρ < b, and b < ρ (where ρ is the 2D distance from the cylinder's axis), the potential obeys the Laplace equation (121). In the convenient polar coordinates (see Fig. 16), we may, guided by the general solution (2.112) of the Laplace equation and our experience in its application to axially-symmetric geometries, look for φ_m in the following form:

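$$\phi_m = \begin{cases} \left(-H_0\rho + b'/\rho\right)\cos\varphi, & \text{for } b < \rho,\\ \left(a_1\rho + b_1/\rho\right)\cos\varphi, & \text{for } a < \rho < b,\\ -H_\text{in}\,\rho\cos\varphi, & \text{for } \rho < a. \end{cases} \tag{5.127}$$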
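Plugging this solution into the boundary conditions (122)-(123) at both interfaces (ρ = b and ρ = a), we get the following system of four equations:
$$\begin{aligned} -H_0 b + \frac{b'}{b} &= a_1 b + \frac{b_1}{b}, & a_1 a + \frac{b_1}{a} &= -H_\text{in}\,a,\\ \mu_0\left(-H_0 - \frac{b'}{b^2}\right) &= \mu\left(a_1 - \frac{b_1}{b^2}\right), & \mu\left(a_1 - \frac{b_1}{a^2}\right) &= -\mu_0 H_\text{in}, \end{aligned} \tag{5.128}$$
for four unknown coefficients a_1, b_1, b', and H_in. Solving the system, we get, in particular: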


$$\frac{H_\text{in}}{H_0} = \frac{1-\alpha^2}{1-\alpha^2(a/b)^2}, \qquad\text{where}\quad \alpha \equiv \frac{\mu-\mu_0}{\mu+\mu_0}. \tag{5.129}$$


According to these formulas, at μ > μ_0, the field in the free space inside the cylinder is lower than the external field. This fact allows using such structures, made of high-μ materials such as permalloy (see Table 1), for “passive shielding”⁶⁰ from unintentional magnetic fields (e.g., the Earth's field), a task very important for the design of many physical experiments. As Eq. (129) shows, the larger μ is, the closer α is to 1, and the smaller is the ratio H_in/H_0, i.e. the better is the shielding, for the same a/b ratio. For a given magnetic material, i.e. for a fixed parameter α, the shielding is improved by making the ratio a/b < 1 smaller, i.e. the shield thicker. However, as Fig. 16 shows, a smaller a leaves less space for the shielded samples, calling for a compromise.
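The closed form (129) can be cross-checked by solving the linear system (128) directly. The sketch below is an illustration only: it uses μ_r ≡ μ/μ_0 (a parametrization introduced here), a small hand-written Gaussian elimination to stay dependency-free, and arbitrary illustrative values of μ_r and a/b rather than data for any specific shield.

```python
def solve_linear(mat, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def shield_ratio_closed(mu_r, a, b):
    """H_in/H_0 from Eq. (5.129), with alpha = (mu - mu_0)/(mu + mu_0)."""
    alpha = (mu_r - 1.0) / (mu_r + 1.0)
    return (1.0 - alpha**2) / (1.0 - alpha**2 * (a / b) ** 2)


def shield_ratio_numeric(mu_r, a, b, h0=1.0):
    """H_in/H_0 from solving Eqs. (5.128) for the unknowns (a_1, b_1, b', H_in)."""
    mat = [
        [b, 1 / b, -1 / b, 0.0],              # phi_m continuous at rho = b
        [a, 1 / a, 0.0, a],                   # phi_m continuous at rho = a
        [mu_r, -mu_r / b**2, 1 / b**2, 0.0],  # mu * dphi_m/drho continuous at rho = b
        [mu_r, -mu_r / a**2, 0.0, 1.0],       # mu * dphi_m/drho continuous at rho = a
    ]
    rhs = [-h0 * b, 0.0, -h0, 0.0]
    return solve_linear(mat, rhs)[3] / h0


for mu_r in (100.0, 1e4):
    for a_over_b in (0.5, 0.9):
        closed = shield_ratio_closed(mu_r, a_over_b, 1.0)
        numeric = shield_ratio_numeric(mu_r, a_over_b, 1.0)
        print(f"mu/mu0={mu_r:g}, a/b={a_over_b}: H_in/H_0 = {closed:.4e} (numeric {numeric:.4e})")
```

Both trends noted in the text show up in the printout: at fixed a/b, a larger μ shields better, and at fixed μ, a thicker shield (smaller a/b) does.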


Note that in the limit μ/μ_0 → ∞, both Eq. (126) and Eq. (129), describing different geometries, yield H_in/H_0 → 0. Indeed, as follows from Eq. (119), in this limit the field H tends to zero inside magnetic samples of virtually any geometry. (The formal exception is the longitudinal cylindrical geometry shown in Fig. 15, with t/l → 0, where H_in = H_0 for any finite μ; but even in it, the last equality holds only if t/l ≪ μ_0/μ.)


Now let us discuss a curious (and practically important) approach to systems with relatively thin, closed magnetic cores made of several sections of high-μ magnetic materials, with the cross-section areas A_k much smaller than the squared lengths l_k² of the sections; see Fig. 17. If all μ_k ≫ μ_0, virtually all field lines are confined to the interior of the core. Then, applying the macroscopic Ampère law (116) to a contour C that follows a magnetic field line inside the core (see, for example, the dashed line in Fig. 17), we get the following approximate expression (exactly valid only in the limit μ_k/μ_0, l_k²/A_k → ∞):


60 Another approach to the undesirable magnetic fields' reduction is the "active shielding" — the external field’s 
compensation with the counter-field induced by controlled currents in specially designed wire coils. 
$$\oint_C \mathbf{H}\cdot d\mathbf{l} \approx \sum_k H_k l_k = \sum_k \frac{B_k}{\mu_k}\,l_k = NI. \tag{5.130}$$


However, since the magnetic field lines stay in the core, the magnetic flux Φ_k ≈ B_kA_k should be the same (= Φ) for each section, so that B_k = Φ/A_k. Plugging this condition into Eq. (130), we get


$$\Phi\sum_k \mathcal{R}_k = NI, \qquad \mathcal{R}_k \equiv \frac{l_k}{\mu_k A_k}. \tag{5.131}$$

Fig. 5.17. Deriving the “magnetic Ohm law” (131).


Note a close analogy of the first of these equations with the usual Ohm law for several resistors connected in series, with the magnetic flux playing the role of electric current, while the product NI plays the role of the voltage applied to the chain of resistors. This analogy is fortified by the fact that the second of Eqs. (131) is similar to the expression for the resistance R = l/σA of a long, uniform conductor, with the magnetic permeability μ playing the role of the electric conductivity σ. (To sound similar to, but still different from, the resistance R, the parameter ℛ_k is called reluctance.) This is why Eq. (131) is called the magnetic Ohm law; it is very useful for approximate analyses of systems like ac transformers,
magnetic energy storage systems, etc. 
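A minimal numerical sketch of the magnetic Ohm law (131), assuming a hypothetical two-section core driven by an N-turn coil. The section lengths, areas, relative permeabilities μ_k/μ_0, and coil parameters below are made-up illustrative numbers, and μ_0 is taken as 4π×10⁻⁷ H/m (within about 10⁻⁹ of the current SI value). As in the text, the result is only trustworthy when all μ_k ≫ μ_0 and l_k² ≫ A_k.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability, H/m (classical defined value)


def reluctance(length, mu_r, area):
    """Reluctance R_k = l_k / (mu_k A_k) of one core section, Eq. (5.131)."""
    return length / (mu_r * MU_0 * area)


def core_flux(n_turns, current, sections):
    """Magnetic Ohm law (5.131): Phi = N I / sum_k R_k.

    `sections` is a list of (l_k [m], mu_k/mu_0, A_k [m^2]) tuples.
    """
    total = sum(reluctance(l, mu_r, area) for l, mu_r, area in sections)
    return n_turns * current / total


# Hypothetical core: two sections in series, driven by N = 100 turns carrying I = 0.5 A.
sections = [(0.20, 2000.0, 1.0e-4),
            (0.05, 500.0, 2.0e-4)]
phi = core_flux(100, 0.5, sections)
for l, mu_r, area in sections:
    b_k = phi / area                  # B_k = Phi / A_k: the same flux in every section
    print(f"section l = {l} m: B = {b_k:.3f} T, H = {b_k / (mu_r * MU_0):.1f} A/m")
print(f"Phi = {phi:.4e} Wb")
```

In this example the second section contributes half as much reluctance as the first; as with resistors in series, the section with the largest l_k/μ_kA_k dominates the total.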


Now let me proceed to a brief discussion of systems with permanent magnets. First of all, using 
the definition (108) of the field H, we may rewrite the Maxwell equation (29) for the field B as 


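$$\nabla\cdot\mathbf{B} = \mu_0\nabla\cdot(\mathbf{H}+\mathbf{M}) = 0, \quad\text{i.e. as}\quad \nabla\cdot\mathbf{H} = -\nabla\cdot\mathbf{M}. \tag{5.132}$$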


While this relation is general, it is especially convenient in permanent magnets, where the magnetization vector M may be approximately considered field-independent.⁶¹ In this case, Eq. (132) for H is an exact analog of Eq. (1.27) for E, with the fixed term −∇·M playing the role of the fixed charge density (more exactly, of ρ/ε_0). For the scalar potential φ_m, defined by Eq. (120), this gives the Poisson equation


$$\nabla^2\phi_m = \nabla\cdot\mathbf{M}, \tag{5.133}$$


similar to those solved, for quite a few geometries, in the previous chapters. 


61 Note that in this approximation, there is no difference between the remanence magnetization M_R = B_R/μ_0 of the magnet and its saturation magnetization M_S ≡ lim_{H→∞}[B(H)/μ_0 − H].
If M is not only field-independent but also uniform inside a permanent magnet's volume, then the right-hand sides of Eqs. (132) and (133) vanish both inside the volume and in the surrounding free space, and give a non-zero effective charge only on the magnet's surface. Integrating Eq. (132) along a short path normal to the surface and crossing it, we get the following boundary condition:

$$\Delta H_n \equiv (H_n)_\text{free space} - (H_n)_\text{magnet} = M_n = M\cos\theta, \tag{5.134}$$

where θ is the angle between the magnetization vector and the outer normal to the magnet's surface. This relation is an exact analog of Eq. (1.24) for the normal component of the field E, with the effective surface charge density (or rather σ/ε_0) equal to M cos θ.


This analogy between the magnetic field induced by a fixed, constant magnetization and the 
electric field induced by surface electric charges enables one to reuse the solutions of quite a few 
problems considered in Chapters 1-3. Leaving a few such problems for the reader's exercise (see Sec. 7), 
let me demonstrate the power of this analogy on just two examples specific to magnetic systems. First, 
let us calculate the force necessary to detach the flat ends of two long, uniform rod magnets, of length l and cross-section area A ≪ l², with the saturated remanent magnetization M_0 directed along their length; see Fig. 18.


Fig. 5.18. Detaching two magnets. 


Let us assume we have succeeded in detaching the magnets by an infinitesimal distance z ≪ A^(1/2), l. Then, according to Eqs. (133)-(134), the distribution of the magnetic field near this small gap should be similar to that of the electric field in a system of two equal but opposite surface charges with the surface density σ proportional to M_0. From Chapters 1-3, we know the properties of such a system very well: within the gap, the field is virtually constant, uniform, proportional to σ, and independent of z. For its magnitude in the magnetic case, Eq. (134) gives simply H = M_0, and hence B = μ_0M_0. (Just outside of the gap, the field is very low, because due to the condition A ≪ l², the effect of the similar effective charges at the “outer” ends of the rods on the field near the gap z is negligible.)


From here, we could readily calculate the attraction force as the force exerted by this field on the effective surface “charges”. However, it is even easier to find it from the following energy argument. Since the magnetic field energy localized inside the magnets and near their outer ends cannot depend on z, this small detachment may only alter the energy inside the gap. For this part of the energy, Eq. (57) yields:

$$\Delta U = \frac{B^2}{2\mu_0}\,Az = \frac{\mu_0 M_0^2}{2}\,Az. \tag{5.135}$$
The attraction force F = −∇(ΔU), which tends to reduce ΔU by decreasing the gap, has the following magnitude:
$$|F| = \frac{\partial(\Delta U)}{\partial z} = \frac{\mu_0 M_0^2 A}{2}. \tag{5.136}$$
The magnet detachment requires an equal and opposite external force. For a typical permanent magnet, with μ_0M_0 ≈ B_R ~ 1 T, the force corresponds to a ratio |F|/A close to 4×10⁵ Pa, a few times the normal atmospheric pressure.
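The estimate above is a direct evaluation of Eq. (136). A short sketch, assuming μ_0M_0 = B_R = 1 T as in the text and a hypothetical contact area of 1 cm² (an illustrative value, not from the text); it also checks the force against a finite-difference derivative of ΔU(z) from Eq. (135):

```python
import math

MU_0 = 4e-7 * math.pi   # vacuum permeability, H/m
ATM = 101325.0          # standard atmospheric pressure, Pa


def gap_energy(b_r, area, z):
    """Field energy in the gap, Eq. (5.135): B^2 A z / (2 mu_0), with B = mu_0 M_0 = B_R."""
    return b_r**2 / (2 * MU_0) * area * z


def detach_force(b_r, area):
    """|F| = mu_0 M_0^2 A / 2 = B_R^2 A / (2 mu_0), Eq. (5.136)."""
    return b_r**2 * area / (2 * MU_0)


b_r, area = 1.0, 1e-4                       # 1 T and 1 cm^2 (illustrative values)
force = detach_force(b_r, area)
dz = 1e-9
force_fd = (gap_energy(b_r, area, 2 * dz) - gap_energy(b_r, area, dz)) / dz
print(f"|F|/A = {force / area:.3e} Pa = {force / area / ATM:.2f} atm; |F| = {force:.1f} N")
```

Because ΔU is exactly linear in z in this approximation, the finite difference reproduces Eq. (136) up to rounding.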


Now let us consider the situation when similar long permanent magnets (such as the magnetic needles used in magnetic compasses) are separated, in otherwise free space, by a larger distance d ≫ A^(1/2); see Fig. 19. For each needle (Fig. 19a), of a length l ≫ A^(1/2), the right-hand side of Eq. (133) is substantially different from zero only in two relatively small areas at the needle's ends. Integrating the equation over each area, we see that at distances r ≫ A^(1/2) from each end, we may reduce Eq. (132) to


$$\nabla\cdot\mathbf{H} = q_m\,\delta(\mathbf{r}-\mathbf{r}_+) - q_m\,\delta(\mathbf{r}-\mathbf{r}_-), \tag{5.137}$$


where r_± are the ends' positions, and q_m = M_0A, with A being the needle's cross-section area. This equation is completely similar to Eq. (3.32) for the electric displacement D, for the particular case of two equal and opposite point charges, i.e. with ρ = qδ(r − r_+) − qδ(r − r_−), with the only replacement q → q_m. Since we know the resulting electric field all too well (see, e.g., Eq. (1.7) for E = D/ε_0), we may immediately write a similar expression for the field H:


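$$\mathbf{H}(\mathbf{r}) = \frac{q_m}{4\pi}\left(\frac{\mathbf{r}-\mathbf{r}_+}{|\mathbf{r}-\mathbf{r}_+|^3} - \frac{\mathbf{r}-\mathbf{r}_-}{|\mathbf{r}-\mathbf{r}_-|^3}\right). \tag{5.138}$$

Fig. 5.19. (a) “Magnetic charges” at the ends of a thin permanent-magnet needle and (b) the result of its breaking into two parts (schematically).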


The resulting magnetic field B(r) = μ_0H(r) exerts on another “magnetic charge” q'_m, located at some point r', the force F = q'_mB(r').⁶² Hence if two ends of different needles are separated by an intermediate distance R (A^(1/2) ≪ R ≪ l; see Fig. 19b), we may neglect one term in Eq. (138), and get the following “magnetic Coulomb law” for the interaction of the nearest ends:

$$F = \frac{\mu_0}{4\pi}\,\frac{q_m q'_m}{R^2}. \tag{5.139}$$

The “only” (but conceptually crucial!) difference between this interaction and that of the electric point charges is that the two “magnetic charges” (quasi-monopoles) of a magnetic needle cannot be fully


62 The simplest way to verify this (perhaps, obvious) expression is to check that for a system of two “charges” ±q'_m, separated by vector a, placed into a uniform external magnetic field B_ext, it yields the potential energy (100) with the correct magnetic dipole moment m = q'_m a; cf. Eq. (3.9) for an electric dipole.
separated. For example, if we break a needle in the middle in an attempt to bring its two ends further
apart, two new “point charges” appear — see Fig. 19b. 
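Eq. (138) can be checked numerically against the point-dipole field, in the spirit of footnote 62: far from a needle whose ends are separated by the vector a, its two “magnetic charges” ±q_m should look like a dipole with m = q_m a, whose field outside the source is H = B/μ_0 = [3(m·n)n − m]/4πr³. The Python sketch below does this comparison and also evaluates the “magnetic Coulomb law” (139). The needle's magnetization, cross-section, and geometry are hypothetical round numbers chosen for illustration, and the small vector helpers are written out to keep the example dependency-free.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability, H/m


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def needle_h(r, r_plus, r_minus, q_m):
    """H(r) of 'magnetic charges' +q_m at r_plus and -q_m at r_minus, Eq. (5.138)."""
    d1, d2 = sub(r, r_plus), sub(r, r_minus)
    n1, n2 = norm(d1), norm(d2)
    return tuple(q_m / (4 * math.pi) * (x / n1**3 - y / n2**3) for x, y in zip(d1, d2))


def dipole_h(r, m):
    """Point-dipole field H = [3 (m.n) n - m] / (4 pi r^3), i.e. B/mu_0 outside the source."""
    rr = norm(r)
    n = tuple(x / rr for x in r)
    return tuple((3 * dot(m, n) * ni - mi) / (4 * math.pi * rr**3) for ni, mi in zip(n, m))


def coulomb_force(q_m, q_m2, dist):
    """'Magnetic Coulomb law' (5.139): F = mu_0 q_m q_m' / (4 pi R^2)."""
    return MU_0 / (4 * math.pi) * q_m * q_m2 / dist**2


# Hypothetical needle: M_0 = 8e5 A/m, A = 1 mm^2, length 0.1 m along z, centred at the origin.
q_m = 8e5 * 1e-6                                  # q_m = M_0 A, in A*m
r_plus, r_minus = (0.0, 0.0, 0.05), (0.0, 0.0, -0.05)
m = tuple(q_m * c for c in sub(r_plus, r_minus))  # m = q_m a

point = (3.0, 0.0, 4.0)                           # 5 m away, i.e. 50 needle lengths
h_exact, h_dip = needle_h(point, r_plus, r_minus, q_m), dipole_h(point, m)
rel_err = norm(sub(h_exact, h_dip)) / norm(h_dip)
print(f"relative deviation from the dipole field: {rel_err:.1e}")
print(f"force between two such ends 1 cm apart: {coulomb_force(q_m, q_m, 0.01):.2e} N")
```

The deviation from the dipole field falls off roughly as the square of (needle length)/(distance), which is why the two-charge picture and the dipole picture agree at large distances.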


There are several solid-state systems where more flexible structures, similar in their 
magnetostatics to the needles, may be implemented. First of all, certain (“type-II”) superconductors may carry so-called Abrikosov vortices: flexible tubes with field-suppressed superconductivity inside, each carrying one quantum Φ_0 = πħ/e ≈ 2×10⁻¹⁵ Wb of the magnetic flux. Ending on the superconductor's surfaces, these tubes let their magnetic field lines spread into the surrounding free space, essentially forming magnetic monopole analogs (of course, with equal and opposite “magnetic charges” q_m on each end of the tube, just as Fig. 19a shows). Such flux tubes are not only flexible but also stretchable, resulting in several peculiar effects; see Sec. 6.4 for more detail. Another recently found example of such paired quasi-monopoles is spin chains in the so-called spin ices, crystals with paramagnetic ions arranged into a specific (pyrochlore) lattice, such as dysprosium titanate Dy₂Ti₂O₇.⁶³ Let me emphasize again that any reference to magnetic monopoles in such systems should not be taken literally.


In order to complete this section (and this chapter), let me briefly discuss the magnetic field
energy U for the simplest case of systems with linear magnetic materials. In this case, we may still use
Eq. (55), but if we want to operate only with macroscopic fields, and hence only stand-alone currents,
we should repeat the manipulations that have led us to Eq. (57), using j not from Eq. (35), but from Eq. 
(107). As a result, instead of Eq. (57) we get 


$$U = \int_V u(\mathbf{r})\,d^3r, \qquad\text{with}\quad u = \frac{\mathbf{H}\cdot\mathbf{B}}{2} = \frac{B^2}{2\mu}. \tag{5.140}$$


This result is evidently similar to Eq. (3.73) of electrostatics. 


As a simple but important example of its application, let us again consider a long solenoid (Fig. 
6a), but now filled with a linear magnetic material with permeability μ. Using the macroscopic Ampère
law (116), just as we used Eq. (37) for the derivation of Eq. (40), we get


$$H = nI, \qquad\text{and hence}\qquad B = \mu nI, \tag{5.141}$$


where n = N/l, just as in Eq. (40), is the winding density, i.e. the number of wire turns per unit length.
(At μ = μ_0, we immediately return to that old result.) Now we may plug Eq. (141) into Eq. (140) to
calculate the magnetic energy stored in the solenoid: 


$$U = uV = \frac{\mu H^2}{2}\,lA = \frac{\mu n^2 I^2}{2}\,lA, \tag{5.142}$$
and then use Eq. (72) to calculate its self-inductance:⁶⁴
$$L = \frac{U}{I^2/2} = \mu n^2 lA. \tag{5.143}$$


We see that L ∝ μV, so filling a solenoid with a high-μ material may allow making it more
compact while preserving the same value of inductance. In addition, as the discussion of Fig. 15 has


63 See, e.g., L. Jaubert and P. Holdsworth, J. Phys.: Condens. Matter 23, 164222 (2011), and references therein.

64 Admittedly, we could get the same result more simply, just by arguing that since the magnetic material fills the whole volume of a substantial magnetic field in this system, the filling simply increases the vector B at all points, and hence its flux Φ, and hence L = Φ/I, by the factor μ/μ_0 in comparison with the free-space value (75).
shown, such filling reduces the fringe fields near the solenoid's ends, which may be detrimental for some
applications, especially in physical experiments striving for high measurement precision. 
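Eqs. (141)-(143) combine into a few lines of arithmetic. The sketch below assumes a hypothetical solenoid (N = 1000 turns, l = 10 cm, A = 1 cm², I = 1 A; illustrative values only) and compares an air core with a core of μ/μ_0 = 1000 (also an assumed value). It computes the energy from the density (140) and the inductance from Eq. (143), so that the two can be checked against U = LI²/2:

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability, H/m


def solenoid(n_turns, length, area, current, mu_r=1.0):
    """Return (B, U, L) for a long solenoid filled with a linear magnetic material.

    Uses H = nI and B = mu n I (5.141), U = (mu H^2 / 2) l A (5.142),
    and L = mu n^2 l A (5.143), with mu = mu_r * mu_0.
    """
    mu = mu_r * MU_0
    n = n_turns / length              # winding density, turns per metre
    h = n * current
    b = mu * h
    energy = mu * h**2 / 2 * length * area
    inductance = mu * n**2 * length * area
    return b, energy, inductance


for mu_r in (1.0, 1000.0):
    b, u, ind = solenoid(1000, 0.10, 1e-4, 1.0, mu_r)
    print(f"mu/mu0 = {mu_r:g}: B = {b:.3e} T, U = {u:.3e} J, L = {ind:.3e} H")
```

Because L ∝ μV at a fixed winding density, the filled solenoid reaches the air-core inductance with a volume about μ/μ_0 times smaller, which is the compactness argument made above.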


However, we still need to explore the issue of magnetic energy beyond Eq. (140), not only to get 
a general expression for it in materials with an arbitrary dependence B(H), but also to finally prove Eq. 
(54) and explore its relation with Eq. (53). I will do this at the beginning of the next chapter. 


5.7. Exercise problems 


5.1. DC current I flows around a thin wire loop bent into the form of a plane equilateral triangle
with side a. Calculate the magnetic field in the center of the loop. 


5.2. A circular wire loop, carrying a fixed dc current, has been placed inside a similar but larger loop, carrying a fixed current in the same direction; see the figure on the right. Use semi-quantitative arguments to analyze the mechanical stability of the coaxial and coplanar position of the inner loop with respect to its possible angular, axial, and lateral displacements relative to the outer loop.


5.3. Two straight, plane, parallel, long, thin conducting strips of width w, separated by distance d, carry equal but oppositely directed currents I; see the figure on the right. Calculate the magnetic field in the plane located in the middle between the strips, assuming that the flowing currents are uniformly distributed across the strip widths.


5.4. For the system studied in the previous problem, but now only in the limit d << w, calculate: 
(i) the distribution of the magnetic field in space,

(ii) the vector potential of the field,

(iii) the magnetic force (per unit length) exerted on each strip, and

(iv) the magnetic energy and self-inductance of the loop formed by the strips (per unit length). 


5.5. Calculate the magnetic field distribution near the center of the system of two similar, plane, round, coaxial wire coils, carrying equal but oppositely directed currents; see the figure on the right.


5.6. The two-coil system considered in the previous problem now carries equal and similarly directed currents; see the figure on the right.⁶⁵ Calculate what the ratio d/R should be for the second derivative ∂²B_z/∂z² to equal zero at z = 0.


65 This system (called the Helmholtz coils), producing a highly uniform field near its center, is broadly used in 
physical experiment. 
5.7. DC current of a constant density j flows along a round cylindrical wire of radius R, with a round cylindrical cavity of radius r cut in it. The cavity’s axis is parallel to that of the wire but offset from it by a distance d < R − r (see the figure on the right). Calculate the magnetic field inside the cavity.


5.8. Calculate the magnetic field’s distribution along the axis of a straight solenoid (see Fig. 6a, partly reproduced on the right) with a finite length l and a round cross-section of radius R. Assume that the solenoid has many (N >> 1, l/R) wire turns, uniformly distributed along its length.


5.9. A thin round disk of radius R, carrying an electric charge of a constant areal density σ, rotates around its axis with a constant angular velocity ω. Calculate:
(i) the induced magnetic field on the disk’s axis,
(ii) the magnetic moment of the disk,
and relate these results.


5.10. A thin spherical shell of radius R, with charge Q uniformly distributed over its surface, rotates about its diameter with a constant angular velocity ω. Calculate the distribution of the magnetic field everywhere in space.


5.11. A sphere of radius R, made of an insulating material with a uniform electric charge density ρ, rotates about its diameter with a constant angular velocity ω. Calculate the magnetic field distribution inside the sphere and outside it.


5.12. The reader is hopefully familiar with the classical Hall effect in the usual rectangular Hall bar geometry (see the left panel of the figure below). However, the effect takes a different form in the so-called Corbino disk (see the right panel of the figure below). (Dark shading shows electrodes, with no appreciable resistance.) Analyze the effect in both geometries, assuming that in both cases, the conductors are thin and planar, have a constant Ohmic conductivity σ and charge carrier density n, and that the applied magnetic field B is uniform and normal to the conductors’ planes.


5.13.* The simplest version of the famous homopolar (or “unipolar”) motor is a thin, round conducting disk, placed into a uniform magnetic field normal to its plane, with dc current passed between the disk’s center and a sliding electrode (“brush”) on its rim (see the figure below).

(i) Express the torque rotating the disk via its radius R, the magnetic field B, and the current J.

(ii) If the disk is allowed to rotate about its axis, and the motor is driven by a battery with e.m.f. 𝒱, calculate its stationary angular velocity ω, neglecting friction and the electric circuit’s resistance.

(iii) Now assuming that the current circuit (battery + wires + contacts + disk itself) has a non-zero resistance ℛ, derive and solve the equation for the time evolution of ω, and analyze the solution.


5.14. Current J flows in a thin wire bent into a plane round loop of radius R. Calculate the net 
magnetic flux through the plane in which the loop is located. 


5.15. A wire with a round cross-section of radius a has been bent into a round loop of radius R >> a. Prove the formula for its self-inductance, mentioned at the end of Sec. 5.3 of the lecture notes: L = μ0R ln(cR/a), with c ~ 1.


5.16. Prove that:
(i) the self-inductance L of a current loop cannot be negative, and
(ii) any inductance coefficient L_kk’, defined by Eq. (60), cannot be larger than (L_kk L_k’k’)^{1/2}.


5.17. Calculate the mutual inductance of two similar thin-wire square-shaped loops, offset by distance h in the direction normal to their planes (see the figure on the right).


5.18.* Estimate the values of magnetic susceptibility due to
(i) orbital diamagnetism, and
(ii) spin paramagnetism,
for a medium with negligible interaction between the induced molecular dipoles. Compare the results.


Hints: For Task (i), you may use the classical model described by Eq. (114) (see Fig. 13). For Task (ii), assume the mechanism of ordering of spontaneous magnetic dipoles m0, with a magnitude m0 of the order of the Bohr magneton μ_B, similar to the one sketched for electric dipoles in Fig. 3.7a.


5.19.* Use the classical picture of the orbital (“Larmor”) diamagnetism, discussed in Sec. 5, to calculate its (small) contribution ΔB(0) to the magnetic field B felt by an atomic nucleus, treating the electrons of the atom as a spherically-symmetric cloud with an electric charge density ρ(r). Express the result via the value φ(0) of the electrostatic potential of the electron cloud, and use this expression for a crude numerical estimate of the ratio ΔB(0)/B for the hydrogen atom.


5.20. Calculate the self-inductance of a toroidal solenoid (see Fig. 6b) with the round cross-section of radius r ~ R (see the figure on the right), with many (N >> 1, R/r) wire turns uniformly distributed along the perimeter, and filled with a linear magnetic material of permeability μ. Check your results by analyzing the limit r << R.
5.21. A long straight, thin wire carrying current J passes parallel to the plane boundary between two uniform, linear magnetic media (see the figure on the right). Calculate the magnetic field everywhere in the system, and the force (per unit length) exerted on the wire.


5.22. Solve the magnetic shielding problem similar to that discussed in Sec. 5.6 of the lecture notes, but for a spherical rather than cylindrical shell, with the same central cross-section as shown in Fig. 16. Compare the efficiency of those two shields, for the same shell’s permeability μ and the same b/a ratio.


5.23. Calculate the magnetic field distribution around a spherical permanent magnet with a uniform magnetization M0 = const.


5.24. A limited volume V is filled with a magnetic material with a fixed (field-independent) magnetization M(r). Write explicit expressions for the magnetic field induced by the magnetization, and its potential, and recast these expressions into the forms more convenient when M(r) = M0 = const inside the volume V.


5.25. Use the results of the previous problem to calculate the distribution of the magnetic field H along the axis of a straight permanent magnet of length 2l, with a round cross-section of radius R, and a uniform magnetization M0 parallel to the axis (see the figure on the right).


5.26. A flat end of a long straight permanent magnet, similar to that considered in the previous problem but of an arbitrary cross-section of area A, is stuck to a flat surface of a large sample of a linear magnetic material with a very high permeability μ >> μ0. Calculate the normally-directed force needed to detach them.


5.27. A permanent magnet with a uniform magnetization M0 has the form of a spherical shell with an internal radius R1 and an external radius R2 > R1. Calculate the magnetic field inside the shell.


5.28. A very broad film of thickness 2t is permanently magnetized normally to its plane, with a periodic checkerboard pattern, with squares of area a×a:

$$\mathbf{M} = \mathbf{n}_z M(x,y), \quad \text{with} \quad M(x,y) = M_0 \,\mathrm{sgn}\left\{\cos\frac{\pi x}{a}\cos\frac{\pi y}{a}\right\}, \quad \text{for } |z| < t.$$

Calculate the magnetic field’s distribution in space.


5.29.* Based on the discussion of the quadrupole electrostatic lens in Sec. 2.4, suggest the
permanent-magnet systems that may similarly focus particles moving close to the system’s axis, for the 
cases when each particle carries: 


(i) an electric charge, 
(ii) no net electric charge, but a spontaneous magnetic dipole moment m of a certain orientation. 
Chapter 6. Electromagnetism


This chapter discusses two major effects that arise when electric and magnetic fields change over time: the “electromagnetic induction” of an additional electric field by a changing magnetic field, and the reciprocal effect of the “displacement currents”, which is actually the induction of an additional magnetic field by a changing electric field. These two phenomena, which make time-dependent electric and magnetic fields inseparable (hence the term “electromagnetism”¹), are reflected in the full system of Maxwell equations, valid for an arbitrary electromagnetic process. On the way toward this system, I will make a brief detour to review the electrodynamics of superconductivity, which (besides its own significance) provides a perfect platform for discussing the important general issue of gauge invariance.


6.1. Electromagnetic induction 


As Eqs. (5.36) show, in static situations (∂/∂t = 0) the Maxwell equations describing the electric and magnetic fields are independent; more exactly, they are coupled only implicitly, via the continuity equation (4.5) relating their right-hand sides ρ and j. In dynamics, when the fields change in time, the situation is different.


Historically, the first discovered explicit coupling between the electric and magnetic fields was 
the effect of electromagnetic induction. Although this effect was discovered independently by Joseph 
Henry, it was a brilliant series of experiments by Michael Faraday, carried out mostly in 1831, that 
resulted in the first general formulation of the induction law. The summary of Faraday’s numerous 
experiments has turned out to be very simple: if the magnetic flux defined by Eq. (5.65), 


$$\Phi = \int_S B_n\, d^2r, \tag{6.1}$$


through a surface S limited by a closed contour C, changes in time for whatever reason (e.g., due to a change of the magnetic field B (as in Fig. 1), or the contour’s motion, or its deformation, or any combination of the above), it induces an additional, vortex-like electric field E_ind directed along the contour (see Fig. 1).


Fig. 6.1. The two simplest ways to observe the Faraday electromagnetic induction.


The exact distribution of E_ind in space depends on the system’s details, but its integral along the contour C, called the inductive electromotive force (e.m.f.), obeys a very simple Faraday induction law:


1 It was coined by H. Ørsted in 1820 in the context of his experiments (see the previous chapter).
$$\mathcal{V}_{\rm ind} \equiv \oint_C \mathbf{E}_{\rm ind}\cdot d\mathbf{r} = -\frac{d\Phi}{dt}. \tag{6.2}$$


(In the Gaussian units, the right-hand side of this formula has an additional coefficient of 1/c.) 


It is straightforward (and hence left for the reader’s exercise) to show that this e.m.f. may be measured, for example, either by inserting a voltmeter into a conducting loop following the contour C or by measuring the small current J = 𝒱_ind/R it induces in a thin wire with a sufficiently large Ohmic resistance R,² whose shape follows that contour (see Fig. 1). (Actually, these methods are not entirely different, because a typical voltmeter measures voltage by the small Ohmic current it drives through the pre-calibrated high internal resistance of the device.) In the context of the latter approach, the minus sign in Eq. (2) may be described by the following Lenz rule: the magnetic field of the induced current J provides a partial compensation of the change of the original flux Φ(t) with time.³
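As a quick numerical illustration of Eq. (2) and of the Lenz rule, the Python sketch below considers a toy case that is not part of the original notes: a fixed planar loop of area A in a uniform field B_n(t) = B0 sin ωt normal to it. It computes 𝒱_ind = −dΦ/dt by a central finite difference and compares it with the analytic value −AB0ω cos ωt. All parameter values (AREA, B0, the 50 Hz frequency, and the loop resistance R_LOOP) are hypothetical. With the usual right-hand-rule convention linking the direction of C to the normal n, the negative e.m.f. at t = 0, when the flux grows, drives a current whose own field opposes that growth.

```python
import math

# Hypothetical parameters, chosen only for illustration.
AREA = 1e-2                   # loop area A, m^2
B0 = 0.05                     # field amplitude, T
OMEGA = 2 * math.pi * 50.0    # angular frequency, 1/s
R_LOOP = 2.0                  # Ohmic resistance R of the wire loop, Ohm


def flux(t):
    """Eq. (6.1) for a uniform normal field: Phi(t) = B_n(t) * A."""
    return B0 * math.sin(OMEGA * t) * AREA


def emf(t, dt=1e-7):
    """Eq. (6.2): V_ind = -dPhi/dt, approximated by a central difference."""
    return -(flux(t + dt) - flux(t - dt)) / (2 * dt)


for t in (0.0, 0.002, 0.005):
    v_num = emf(t)
    v_exact = -AREA * B0 * OMEGA * math.cos(OMEGA * t)
    print(f"t = {t:.3f} s: V_ind = {v_num:+.5f} V (exact {v_exact:+.5f} V), "
          f"J = V_ind/R = {v_num / R_LOOP:+.5f} A")
```

A finite difference is used on purpose here: the same `emf` function would work unchanged for any other assumed `flux(t)`, including one produced by a moving or deforming contour.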


In order to recast Eq. (2) in a differential form, more convenient in many cases, let us apply to 
the contour integral in it the Stokes theorem, which was repeatedly used in Chapter 5. The result is 


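$$\oint_C \mathbf{E}_{\rm ind}\cdot d\mathbf{r} = \int_S \left(\nabla\times\mathbf{E}_{\rm ind}\right)_n d^2r. \tag{6.3}$$

Now combining Eqs. (1)-(3), for a contour C whose shape does not change in time (so that the integration over the surface S is interchangeable with the time derivative), we get

$$\int_S \left(\nabla\times\mathbf{E}_{\rm ind} + \frac{\partial\mathbf{B}}{\partial t}\right)_n d^2r = 0. \tag{6.4}$$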


Since the induced electric field is an addition to the gradient field (1.33) created by electric charges, for the net field we may write E = E_ind − ∇φ. However, since the curl of any gradient field is zero,⁴ ∇×(∇φ) = 0, Eq. (4) remains valid even for the net field E. Since this equation should be correct for any surface S, we may conclude that


$$\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0, \tag{6.5}$$


at any point. This is the final (time-dependent) form of this Maxwell equation. Superficially, it may seem that Eq. (5) is less general than Eq. (2); for example, it does not describe any electric field, and hence any e.m.f., in a moving loop if the field B is constant in time, even if the magnetic flux (1) through the loop does change in time. However, this is not true; in Chapter 9 we will see that in the reference frame moving with the loop, such an e.m.f. does appear.⁵


2 Such induced current is sometimes called the eddy current, though most often this term is reserved for the 
distributed currents induced by changing magnetic fields in bulk conductors — see Sec. 3 below. 

3 Let me also hope that the reader is familiar with the paradox arising at attempts to measure 𝒱_ind with a voltmeter without its insertion into the wire loop; if not, I would highly recommend solving the offered Problem 2.

4 See, e.g., MA Eq. (11.1). 

5 I have to admit that from the beginning of the course, I was carefully sweeping under the rug a very important question: in exactly which reference frame(s) are all the equations of electrodynamics valid? I promise to discuss this issue in detail later in the course (in Chapter 9), and for now would like to get away with a very short answer: all the formulas discussed so far are valid in any inertial reference frame, as defined in classical mechanics (see, e.g., CM Sec. 1.3); however, the fields E and B have to be measured in the same reference frame.
Now let us reformulate Eq. (5) in terms of the vector potential A. Since the induction effect does not alter the fundamental relation ∇·B = 0, we still may represent the magnetic field as prescribed by Eq. (5.27), i.e. as B = ∇×A. Plugging this expression into Eq. (5), and changing the order of the temporal and spatial differentiation, we get


$$\nabla\times\left(\mathbf{E} + \frac{\partial\mathbf{A}}{\partial t}\right) = 0. \tag{6.6}$$


Hence we can use the same argumentation as in Sec. 1.3 (there applied to the vector E alone) to represent the expression in the parentheses as −∇φ, so that we get

$$\mathbf{E} = -\frac{\partial\mathbf{A}}{\partial t} - \nabla\phi, \qquad \mathbf{B} = \nabla\times\mathbf{A}. \tag{6.7}$$
It is very tempting to interpret the first term of the right-hand side of the expression for E as the one describing the electromagnetic induction alone, and the second term as representing a purely electrostatic field induced by electric charges. However, the separation of these two terms is, to a certain extent, conditional. Indeed, let us consider the gauge transformation already mentioned in Sec. 5.2,


$$\mathbf{A} \to \mathbf{A} + \nabla\chi, \tag{6.8}$$


that, as we already know, does not change the magnetic field. According to Eq. (7), to keep the full 
electric field intact (gauge-invariant) as well, the scalar electric potential has to be transformed 
simultaneously, as 


$$\phi \to \phi - \frac{\partial\chi}{\partial t}, \tag{6.9}$$

leaving the choice of an addition to φ restricted only by the Laplace equation, since the full φ should satisfy the Poisson equation (1.41) with a gauge-invariant right-hand side. We will return to the discussion of gauge invariance in Sec. 4.
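The gauge invariance expressed by Eqs. (8)-(9) may also be checked numerically. The Python sketch below is only an illustration: the potentials A and φ and the gauge function χ are arbitrary smooth functions chosen for the example (they are hypothetical, not taken from the text), and all derivatives are approximated by central finite differences with an assumed step of 10⁻⁴. It evaluates E and B from Eq. (7) before and after the transformation at one space-time point; the changes should stay at the level of the finite-difference error, even though A itself changes noticeably.

```python
import math

STEP = 1e-4  # assumed finite-difference step for these smooth test functions


def partial(f, point, i, h=STEP):
    """Central-difference derivative of f(t, x, y, z) w.r.t. argument i (0 = t, 1..3 = x, y, z)."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


def fields(A, phi, point):
    """E = -dA/dt - grad(phi) and B = curl(A), as in Eq. (6.7)."""
    E = [-partial(A[k], point, 0) - partial(phi, point, k + 1) for k in range(3)]
    B = [partial(A[2], point, 2) - partial(A[1], point, 3),
         partial(A[0], point, 3) - partial(A[2], point, 1),
         partial(A[1], point, 1) - partial(A[0], point, 2)]
    return E, B


def gauge_transform(A, phi, chi):
    """Eqs. (6.8)-(6.9): A -> A + grad(chi), phi -> phi - d(chi)/dt."""
    def shifted(k):
        return lambda *p: A[k](*p) + partial(chi, p, k + 1)

    def phi_new(*p):
        return phi(*p) - partial(chi, p, 0)

    return [shifted(k) for k in range(3)], phi_new


# Hypothetical potentials and gauge function (any smooth choice would do).
A = [lambda t, x, y, z: y * t,
     lambda t, x, y, z: -x * z,
     lambda t, x, y, z: math.sin(x) * t]
phi = lambda t, x, y, z: x * y * z + t
chi = lambda t, x, y, z: math.sin(x * y) * math.exp(-t) + z ** 2 * t

point = (0.3, 0.5, -0.7, 1.1)  # (t, x, y, z)
E1, B1 = fields(A, phi, point)
A2, phi2 = gauge_transform(A, phi, chi)
E2, B2 = fields(A2, phi2, point)

max_diff = max(abs(a - b) for a, b in zip(E1 + B1, E2 + B2))
shift_Ax = A2[0](*point) - A[0](*point)
print(f"change of A_x: {shift_Ax:.4f}, largest change of E and B: {max_diff:.2e}")
```

The mixed second derivatives of χ that enter E and B cancel in pairs (for example, ∂²χ/∂t∂x against ∂²χ/∂x∂t), which is exactly the mechanism behind the invariance.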


6.2. Magnetic energy revisited 


Now we are sufficiently equipped to revisit the issue of magnetic energy, in particular, to finally prove Eqs. (5.57) and (5.140), and discuss the dichotomy of the signs in Eqs. (5.53) and (5.54). For that, let us consider a sufficiently slow and small magnetic field variation δB. If we want to neglect the kinetic energy of the system of electric currents under consideration, as well as the wave radiation effects, we need to prevent its significant acceleration by the arising induction field E_ind. Let us suppose that we do this by virtual balancing of this field by an external electric field E_ext = −E_ind. According to Eq. (4.38), the work of that field⁶ on the stand-alone currents of the system during a small time interval δt, and hence the change of the potential energy of the system, is


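$$\delta U = \delta t\int_V \mathbf{j}\cdot\mathbf{E}_{\rm ext}\, d^3r, \quad\text{so that}\quad \delta U = -\delta t\int_V \mathbf{j}\cdot\mathbf{E}_{\rm ind}\, d^3r, \tag{6.10}$$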


6 As a reminder, the magnetic component of the Lorentz force (5.10), vxB, is always perpendicular to the particle 
velocity v, so the magnetic field B itself cannot perform any work on moving charges, i.e. on currents. 
where the integral is over the volume of the system. Now expressing the current density j from the macroscopic Maxwell equation (5.107), j = ∇×H, and then applying the vector algebra identity⁷


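$$(\nabla\times\mathbf{H})\cdot\mathbf{E}_{\rm ind} = \mathbf{H}\cdot(\nabla\times\mathbf{E}_{\rm ind}) - \nabla\cdot(\mathbf{E}_{\rm ind}\times\mathbf{H}), \tag{6.11}$$

we get

$$\delta U = -\delta t\int_V \mathbf{H}\cdot(\nabla\times\mathbf{E})\, d^3r + \delta t\int_V \nabla\cdot(\mathbf{E}\times\mathbf{H})\, d^3r. \tag{6.12}$$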


According to the divergence theorem, the second integral on the right-hand side of this equality is equal to the flux of the so-called Poynting vector S = E × H through the surface limiting the considered volume V. Later in the course we will see that this flux represents, in particular, the power of electromagnetic radiation through the surface. If such radiation is negligible (as it always is if the field variation is sufficiently slow), the surface may be selected sufficiently far, so that the flux of S vanishes. In this case, we may express ∇×E from the Faraday induction law (5) to get


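$$\delta U = -\delta t\int_V \mathbf{H}\cdot\left(-\frac{\partial\mathbf{B}}{\partial t}\right) d^3r = \int_V \mathbf{H}\cdot\delta\mathbf{B}\, d^3r. \tag{6.13}$$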


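Just as in electrostatics (see Eqs. (1.65) and (3.73), and their discussion), this relation may be interpreted as the variation of the magnetic field energy U of the system, and represented in the form

$$\delta U = \int_V \delta u(\mathbf{r})\, d^3r, \quad\text{with}\quad \delta u(\mathbf{r}) \equiv \mathbf{H}(\mathbf{r})\cdot\delta\mathbf{B}(\mathbf{r}). \tag{6.14}$$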


This is a keystone result; let us discuss it in some detail. 


First of all, for a system filled with a linear and isotropic magnetic material, we may use Eq. (14) together with Eq. (5.110): B = μH, so that H·δB = δ(B²/2μ). Integrating the result over the variation of the field from 0 to a certain final value B, we get Eq. (5.140), which is so important that it deserves to be rewritten here:

$$U = \int_V u(\mathbf{r})\, d^3r, \quad\text{with}\quad u = \frac{B^2}{2\mu}. \tag{6.15}$$

In the simplest case of free space (no magnetics at all, so that j above is the complete current density), we may take μ = μ0 and reduce Eq. (15) to Eq. (5.57). Now reversing the transformations that led us, in Sec. 5.3, from Eq. (5.54) to that relation, we finally have the latter formula proved, as was promised in the last chapter.


It is very important, however, to understand the limitations of Eq. (15). For example, let us try to apply it to a very simple problem, which was already analyzed in Sec. 5.6 (see Fig. 5.15): a very long cylindrical sample of a linear magnetic material placed into a fixed external field H_ext parallel to the sample’s length. It is evident that in this simple geometry, the fields H and B = μH have to be uniform inside the sample, besides negligible regions near its ends, so that Eq. (15) is reduced to


$$U = \frac{B^2}{2\mu}V, \tag{6.16}$$


7 See, e.g., MA Eq. (11.7) with f = E_ind and g = H.
where V = Al is the cylinder’s volume (A being its cross-section area, and l its length). Now if we try to calculate the static (equilibrium) value of the field from the minimum of this potential energy, we get evident nonsense: B = 0 (WRONG!).⁸


The situation may be readily rectified by using the notion of the Gibbs potential energy, just as it was done for the electric field in Sec. 3.5 (and implicitly at the end of Sec. 1.3). According to Eq. (14), in magnetostatics, the Cartesian components of the field H(r) play the role of the generalized forces, while those of the field B(r) play the role of the generalized coordinates (per unit volume).⁹ As a result, the Gibbs potential energy, whose minimum corresponds to the stable equilibrium of the system under the effect of a fixed generalized force (in our current case, of the fixed external field H_ext), is


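$$U_G = \int_V u_G(\mathbf{r})\, d^3r, \quad\text{with}\quad u_G(\mathbf{r}) \equiv u(\mathbf{r}) - \mathbf{H}_{\rm ext}\cdot\mathbf{B}(\mathbf{r}). \tag{6.17}$$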


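This expression parallels Eq. (3.78). For a system with linear magnetics, we may use for u our result (15), getting the following Gibbs energy density:

$$u_G(\mathbf{r}) = \frac{B^2}{2\mu} - \mathbf{H}_{\rm ext}\cdot\mathbf{B} = \frac{1}{2\mu}\left(\mathbf{B} - \mu\mathbf{H}_{\rm ext}\right)^2 + \text{const}, \tag{6.18}$$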


where “const” means a term independent of the field B inside the sample. For our simple cylindrical system, with its uniform fields, Eqs. (17)-(18) give the following full Gibbs energy of the sample:


$$U_G = \frac{\left(B_{\rm int} - \mu H_{\rm ext}\right)^2}{2\mu}V + \text{const}, \tag{6.19}$$

whose minimum immediately gives the correct stationary value B_int = μH_ext, i.e. H_int = B_int/μ = H_ext, which was already obtained in Sec. 5.6 in a different way, from the boundary condition (5.117).
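The contrast between the minima of U and U_G can be made concrete with a brute-force scan. The Python sketch below is an illustration under stated assumptions (a uniform field inside a long cylinder, and hypothetical values of μ, H_ext, and the volume V): it evaluates Eq. (16) and the Gibbs energy of Eqs. (18)-(19) on a grid of trial values B_int and reports where each function is smallest. U bottoms out at the meaningless B_int = 0, whereas U_G gives B_int = μH_ext.

```python
import math

MU0 = 4e-7 * math.pi          # magnetic constant, H/m

mu = 3.0 * MU0                # hypothetical permeability of the sample
H_ext = 1.0e3                 # hypothetical external field, A/m
V = 1.0e-6                    # hypothetical sample volume, m^3


def energy_U(B):
    """Eq. (6.16): U = (B^2 / 2 mu) V for a uniform field inside the cylinder."""
    return B ** 2 / (2 * mu) * V


def energy_UG(B):
    """Gibbs energy, Eqs. (6.17)-(6.18), without the B-independent constant."""
    return (B ** 2 / (2 * mu) - H_ext * B) * V


def argmin_on_grid(f, lo, hi, n=40001):
    """Return the grid point in [lo, hi] where f is smallest (simple brute force)."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(grid, key=f)


B_max = 2 * mu * H_ext
B_from_U = argmin_on_grid(energy_U, -B_max, B_max)
B_from_UG = argmin_on_grid(energy_UG, -B_max, B_max)
print(f"min of U  at B_int = {B_from_U:.3e} T")
print(f"min of UG at B_int = {B_from_UG:.3e} T (mu*H_ext = {mu * H_ext:.3e} T)")
```

A grid scan is deliberately crude; since both functions are quadratic in B_int, any one-dimensional minimizer would give the same answer.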


Now notice that with this result on hand (so that H_ext·B = B²/μ at equilibrium), Eq. (18) may be rewritten in a different form:

$$u_G = \frac{B^2}{2\mu} - \frac{B^2}{\mu} = -\frac{B^2}{2\mu}, \tag{6.20}$$

similar to Eq. (15) for u(r), but with the opposite sign. This sign dichotomy explains that of Eqs. (5.53) and (5.54); indeed, as was already noted in Sec. 5.3, the former of these expressions gives the potential energy whose minimum corresponds to the equilibrium of a system with fixed currents. (In our current example, these are the external stand-alone currents inducing the field H_ext.) So, the energy U_j given by Eq. (5.53) is essentially the Gibbs energy U_G defined by Eq. (17) and (for the equilibrium state of linear magnetic media) by Eq. (20), while Eq. (5.54) is just another form of Eq. (15), as was explicitly shown in Sec. 5.3.¹⁰


8 This erroneous result cannot be corrected by just adding the energy of the field outside the cylinder, because in the limit A → 0, this field is not affected by the internal field B.

9 Note an aspect in which the analogy with electrostatics is not quite complete. Indeed, according to Eq. (3.76), in electrostatics, the role of a generalized coordinate is played by the “would-be” field D, and that of the generalized force, by the actual (if macroscopic) electric field E. This difference may be traced back to the fact that the electric field E may perform work on a moving charged particle, while the magnetic field cannot. However, this difference does not affect the full analogy of the expressions (3.73) and (15) for the field energy density in linear media.

10 As was already noted in Sec. 5.4, one more example of the energy U_j (i.e. U_G) is given by Eq. (5.100).
Let me complete this section by stating that the difference between the energies U and U_G is not properly emphasized (or even left obscure) in some textbooks, so that the reader is advised to get additional clarity by solving a few additional simple problems, for example, by spelling out these energies for a long straight solenoid (Fig. 5.6a), and then using the results to calculate the pressure exerted by the magnetic field on the solenoid’s walls (windings) and the longitudinal forces exerted on its ends.
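For the long-solenoid exercise suggested above, one possible numerical sketch in Python is shown below. It assumes free space inside the winding, neglects end effects, and uses hypothetical values of the radius R, the turn density n, and the current I. With L′ = μ0n²πR² being the self-inductance per unit length, the sketch differentiates the energy per unit length in two ways: as U = Ψ²/2L′ at fixed flux linkage Ψ = L′I, and as the Legendre-transformed energy U − ΨI = −L′I²/2 at fixed current. Both routes should give the same outward force on the windings, and hence the magnetic pressure B²/2μ0 with B = μ0nI.

```python
import math

MU0 = 4e-7 * math.pi  # magnetic constant, H/m

# Hypothetical solenoid parameters (free space inside, end effects neglected).
R = 0.02      # winding radius, m
n = 1000.0    # turns per unit length, 1/m
I = 2.0       # current, A


def inductance_per_length(radius):
    """Self-inductance per unit length of a long solenoid: L' = mu0 n^2 pi R^2."""
    return MU0 * n ** 2 * math.pi * radius ** 2


def derivative(f, x, dx=1e-7):
    """Central finite difference df/dx."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)


psi = inductance_per_length(R) * I  # flux linkage per unit length, held fixed below


def energy_fixed_flux(radius):
    """U = psi^2 / (2 L') per unit length, at fixed flux linkage psi."""
    return psi ** 2 / (2 * inductance_per_length(radius))


def gibbs_fixed_current(radius):
    """U - psi*I = -L' I^2 / 2 per unit length, at fixed current I."""
    return -inductance_per_length(radius) * I ** 2 / 2


F_flux = -derivative(energy_fixed_flux, R)       # outward force per unit length, N/m
F_current = -derivative(gibbs_fixed_current, R)  # the same force via the fixed-current energy
pressure = F_flux / (2 * math.pi * R)            # force per unit area of the winding
B = MU0 * n * I
print(f"F (fixed flux) = {F_flux:.5f} N/m, F (fixed current) = {F_current:.5f} N/m")
print(f"pressure = {pressure:.4f} Pa, B^2/(2 mu0) = {B ** 2 / (2 * MU0):.4f} Pa")
```

Note that minimizing U at fixed current, instead of the transformed energy, would give the force with the wrong sign, which is the same kind of pitfall as the B = 0 result discussed above.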


6.3. Quasistatic approximation and skin effect 


Perhaps the most surprising experimental fact concerning the time-dependent electromagnetic 
phenomena is that unless they are so fast that one more new effect of the displacement currents (to be 
discussed in Sec. 7 below) becomes noticeable, all formulas of electrostatics and magnetostatics remain 
valid, with the only exception: the generalization of Eq. (3.36) to Eq. (5), describing the Faraday 
induction. As a result, the system of macroscopic Maxwell equations (5.109) is generalized to 


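$$\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0, \qquad \nabla\times\mathbf{H} = \mathbf{j}, \qquad \nabla\cdot\mathbf{D} = \rho, \qquad \nabla\cdot\mathbf{B} = 0. \tag{6.21}$$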


(As follows from the discussions in Chapters 3 and 5, the corresponding system of microscopic Maxwell equations for the genuine, “microscopic” fields E and B may be obtained from Eq. (21) by the formal substitutions D = ε0E and H = B/μ0, and the replacement of the stand-alone charge and current densities ρ and j with their full densities.¹¹) These equations, whose range of validity will be quantified in Sec. 7, define the so-called quasistatic approximation of electromagnetism and are sufficient for an adequate description of a broad range of physical effects.


In order to form a complete system of equations, Eqs. (21) should be augmented by constituent equations describing the medium under consideration. For a linear isotropic material, they may be taken in the simplest (and, simultaneously, the most common) forms already discussed in Chapters 4 and 5:

$$\mathbf{j} = \sigma\mathbf{E}, \qquad \mathbf{B} = \mu\mathbf{H}. \tag{6.22}$$


If the conductor is uniform, i.e. the coefficients σ and μ are constant inside it, the whole system of Eqs. (21)-(22) may be reduced to just one simple equation. Indeed, a sequential substitution of these equations into each other, using a well-known vector-algebra identity¹² in the middle, yields:


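$$\frac{\partial\mathbf{B}}{\partial t} = -\nabla\times\mathbf{E} = -\frac{1}{\sigma}\nabla\times\mathbf{j} = -\frac{1}{\sigma}\nabla\times(\nabla\times\mathbf{H}) = -\frac{1}{\sigma\mu}\nabla\times(\nabla\times\mathbf{B}) = -\frac{1}{\sigma\mu}\left[\nabla(\nabla\cdot\mathbf{B}) - \nabla^2\mathbf{B}\right] = \frac{1}{\sigma\mu}\nabla^2\mathbf{B}. \tag{6.23}$$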


Thus we have arrived, without any further assumptions, at a rather simple partial differential 
equation. Let us use it for an analysis of the so-called skin effect, the phenomenon of an Ohmic 


11 Obviously, in free space, the last replacement is unnecessary, because all charges and currents may be treated as “stand-alone” ones.
12 See, e.g., MA Eq. (11.3). 
conductor’s self-shielding from the alternating (ac) magnetic field. In its simplest geometry (Fig. 2a), an external source (which, at this point, does not need to be specified) produces, near a plane surface of a bulk conductor, a spatially-uniform ac magnetic field H(t) parallel to the surface.¹³


Fig. 6.2. (a) The skin effect in the simplest, planar geometry, and (b) two Ampère contours, C1 and C2, for deriving the “macroscopic” (C1) and the “coarse-grain” (C2) boundary conditions for H.


Selecting the coordinate system as shown in Fig. 2a, we may express this condition as 


$$\mathbf{H}\big|_{x=-0} = H(t)\,\mathbf{n}_z. \tag{6.24}$$


The translational symmetry of our simple problem within the surface plane [y, z] implies that inside the conductor, ∂/∂y = ∂/∂z = 0 as well, and H = H(x, t)n_z, even at x ≥ 0, so that Eq. (23) for the conductor’s interior is reduced to a differential equation for just one scalar function H(x, t) = B(x, t)/μ:

$$\frac{\partial H}{\partial t} = \frac{1}{\sigma\mu}\frac{\partial^2 H}{\partial x^2}, \quad \text{for } x \geq 0. \tag{6.25}$$


This equation may be further simplified by noticing that due to its linearity, we may use the linear superposition principle for the time dependence of the field,¹⁴ expanding it, as well as the external field (24), into the Fourier series:


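$$H(x,t) = \sum_\omega H_\omega(x)\, e^{-i\omega t}, \quad \text{for } x \geq 0, \qquad H(t) = \sum_\omega H_\omega\, e^{-i\omega t}, \quad \text{for } x = -0, \tag{6.26}$$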


and arguing that if we know the solution for each frequency component of the series, the whole field 
may be found through the straightforward summation (26) of these solutions. 


For each single-frequency component, Eq. (25) is immediately reduced to an ordinary differential equation for the complex amplitude H_ω(x):¹⁵


13 Due to the simple linear relation B = μH between the fields B and H, it does not matter too much which of them is used for the solution of this problem, with a slight preference for H, due to the simplicity of Eq. (5.117), the only boundary condition relevant for this simple geometry.

14 Another way to exploit the linearity of Eq. (6.25) is to use the spatial-temporal Green’s function approach to 
explore the dependence of its solutions on various initial conditions. Unfortunately, because of a lack of time, I 
have to leave an analysis of this opportunity for the reader’s exercise. 
$$-i\omega H_\omega = \frac{1}{\sigma\mu}\frac{d^2 H_\omega}{dx^2}. \tag{6.27}$$


From the theory of linear ordinary differential equations, we know that Eq. (27) has the following general solution:

$$H_\omega(x) = H_+ e^{\kappa_+ x} + H_- e^{\kappa_- x}, \tag{6.28}$$
where the constants κ± are the roots of the characteristic equation that may be obtained by the substitution of either of these two exponents into the initial differential equation. For our particular case, the characteristic equation following from Eq. (27) is


$$-i\omega = \frac{\kappa^2}{\sigma\mu}, \tag{6.29}$$


and its roots are 


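$$\kappa_\pm = \pm(-i\omega\sigma\mu)^{1/2} = \pm(1 - i)\left(\frac{\omega\sigma\mu}{2}\right)^{1/2}. \tag{6.30}$$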
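The roots (30) can be double-checked with complex arithmetic. This short Python sketch takes the principal square root of −iωσμ with `cmath.sqrt` and compares both roots with ±(1 − i)/δ_s, where δ_s = (2/μσω)^{1/2}; it also confirms that Re κ₋ < 0, so that the κ₋ exponent is the decaying one. The copper conductivity is an assumed, typical room-temperature value rather than a number given in the text, and μ = μ0 is assumed.

```python
import cmath
import math

MU0 = 4e-7 * math.pi     # magnetic constant, H/m
SIGMA_CU = 5.96e7        # assumed room-temperature conductivity of copper, S/m


def characteristic_roots(omega, sigma, mu):
    """Roots of kappa^2 = -i*omega*sigma*mu, Eq. (6.29); returns (kappa_plus, kappa_minus)."""
    k = cmath.sqrt(-1j * omega * sigma * mu)  # principal root, with Re k > 0
    return k, -k


omega = 2 * math.pi * 60.0   # 60 Hz, one of the frequencies discussed in the text
kappa_plus, kappa_minus = characteristic_roots(omega, SIGMA_CU, MU0)
delta_s = math.sqrt(2.0 / (MU0 * SIGMA_CU * omega))
print(kappa_plus, (1 - 1j) / delta_s)
print(kappa_minus, -(1 - 1j) / delta_s)
```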


For our problem, the field cannot grow exponentially at x → +∞, so only one of the coefficients, namely H₋, corresponding to the decaying exponent with Re κ₋ < 0, may be different from zero, i.e. H_ω(x) = H_ω(0) exp{κ₋x}. To find the constant factor H_ω(0), we can integrate the macroscopic Maxwell equation ∇×H = j along a pre-surface contour, say, the contour C1 shown in Fig. 2b. The integral of the right-hand side is negligible because the stand-alone current density j does not include the “genuinely-surface” currents responsible for the magnetic permeability μ (see Fig. 5.12). As a result, we get a boundary condition similar to Eq. (5.117) for the stationary magnetic field: the tangential component H_z is continuous at x = 0, giving us


$$H(0,t) = H(t), \quad\text{i.e.}\quad H_\omega(0) = H_\omega, \tag{6.31}$$


so that the final solution of our boundary problem may be represented as 
$$H_\omega(x) = H_\omega\exp\{\kappa_- x\} = H_\omega\exp\left\{-\frac{x}{\delta_s}\right\}\exp\left\{\frac{ix}{\delta_s}\right\}, \tag{6.32}$$


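where the constant δ_s, with the dimension of length, is called the skin depth:

$$\delta_s \equiv \frac{1}{|\mathrm{Re}\,\kappa_\pm|} = \left(\frac{2}{\mu\sigma\omega}\right)^{1/2}. \tag{6.33}$$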


This solution describes the skin effect: the penetration of the ac magnetic field, and the eddy currents j, into a conductor only to a finite depth of the order of δ_s. Let me give a few numerical examples of this depth: for copper at room temperature, δ_s ~ 1 cm at the usual ac power distribution frequency of 60 Hz, and is of the order of just 1 μm at a few GHz, i.e. at typical frequencies of cell


15 Let me hope that the reader is not intimidated by the (very convenient) use of such complex variables for describing real fields; their imaginary parts always disappear at the final summation (26). For example, if the external field is purely sinusoidal, with the actual (positive) frequency ω, each sum in Eq. (26) has just two terms, with complex amplitudes H_ω and H_−ω = H_ω*, so that their sum is always real. (For a more detailed discussion of this issue, see, e.g., CM Sec. 5.1.)
phone signals and kitchen microwave magnetrons. On the other hand, for lightly salted water, δ_s is close to 250 m at just 1 Hz (with significant implications for radio communications with submarines), and of the order of 1 cm at a few GHz (explaining, in particular, the nonuniform heating of a soup bowl in a microwave oven).¹⁶
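Several of the numbers quoted above follow directly from Eq. (33). The minimal Python sketch below assumes typical conductivities of about 5.96×10⁷ S/m for room-temperature copper and about 4 S/m for salt (sea-like) water, with μ = μ0 in both cases; these values are assumptions, not data from the text. It reproduces three of the quoted estimates; the GHz estimate for water is not included in the sketch.

```python
import math

MU0 = 4e-7 * math.pi   # magnetic constant, H/m
SIGMA_CU = 5.96e7      # assumed conductivity of room-temperature copper, S/m
SIGMA_SALT = 4.0       # assumed conductivity of salt (sea-like) water, S/m


def skin_depth(freq_hz, sigma, mu=MU0):
    """Eq. (6.33): delta_s = (2 / (mu * sigma * omega))**0.5 with omega = 2*pi*f."""
    omega = 2 * math.pi * freq_hz
    return math.sqrt(2.0 / (mu * sigma * omega))


cases = [("copper, 60 Hz", 60.0, SIGMA_CU),
         ("copper, 3 GHz", 3e9, SIGMA_CU),
         ("salt water, 1 Hz", 1.0, SIGMA_SALT)]
for label, freq, sigma in cases:
    print(f"{label:17s} delta_s = {skin_depth(freq, sigma):.3g} m")
```

With these assumed conductivities, the script gives roughly 8.4 mm, 1.2 μm, and 250 m, in line with the estimates above.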


Let me hope that the equality chain (23) makes the physics of this effect very clear: the electric field E, which is Faraday-induced by the external ac magnetic field, drives the eddy currents j, which in turn induce their own magnetic field that eventually (at x ~ δ_s) compensates the external one. Let us quantify these E and j. Since we have used, in particular, the relations j = ∇×H = ∇×B/μ and E = j/σ, and spatial differentiation of an exponent yields a similar exponent, the electric field and current density have the same spatial dependence as the magnetic field, i.e. penetrate the conductor only by distances of the order of δ_s(ω). Their vectors are directed normally to B, while still being parallel to the conductor’s surface:¹⁷


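$$\mathbf{j}_\omega(x) = -\kappa_- H_\omega(x)\,\mathbf{n}_y, \qquad \mathbf{E}_\omega(x) = -\frac{\kappa_-}{\sigma}H_\omega(x)\,\mathbf{n}_y. \tag{6.34}$$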


We may use these expressions, in particular, to calculate the time-averaged power density (4.39) of the energy dissipation, for the important case of a sinusoidal (“monochromatic”) field H(x, t) = |H(x)| cos(ωt + φ), and hence sinusoidal eddy currents: j(x, t) = |j(x)| cos(ωt + φ′):


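$$\bar{p}(x) = \frac{\overline{j^2(x,t)}}{\sigma} = \frac{|j(x)|^2\,\overline{\cos^2(\omega t + \varphi')}}{\sigma} = \frac{|j(x)|^2}{2\sigma} = \frac{|\kappa_-|^2|H(x)|^2}{2\sigma} = \frac{|H(0)|^2}{\sigma\delta_s^2}\exp\left\{-\frac{2x}{\delta_s}\right\}. \tag{6.35}$$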


Now the (elementary) integration of this expression along the x-axis (through all the skin depth), using 
the exponential law (6.32), gives us the following average power of the energy loss per unit area: 


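$$\frac{\bar{\mathcal{P}}}{A} = \int_0^\infty \bar{p}(x)\, dx = \frac{|H(0)|^2}{2\sigma\delta_s} \equiv \frac{\mu\omega\delta_s}{4}|H(0)|^2. \tag{6.36}$$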


We will extensively use this expression in the next chapter to calculate the energy losses in microwave waveguides and resonators with conducting (practically, metallic) walls, and for now let me note only that according to Eqs. (33) and (36), for a fixed magnetic field amplitude, the losses grow with frequency as ω^{1/2}.
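Eqs. (32), (33), and (36) can be cross-checked without using the analytic solution. The Python sketch below works in assumed dimensionless units σ = μ = 1 (so that δ_s = (2/ω)^{1/2}), discretizes Eq. (27) on a uniform grid, imposes H_ω(0) = 1 as in Eq. (31), and replaces the condition at x → +∞ by H_ω = 0 at a depth of 12 units. That cutoff is an assumption, harmless as long as it greatly exceeds δ_s. The resulting tridiagonal system is solved with the standard Thomas algorithm. The script then compares the profile with exp{κ₋x}, evaluates the time-averaged loss per unit area as the integral of |j_ω|²/2σ with j_ω = −dH_ω/dx, and checks that quadrupling ω doubles the loss, as the ω^{1/2} law requires.

```python
import cmath
import math


def solve_profile(omega, sigma=1.0, mu=1.0, h0=1.0, x_max=12.0, n=2400):
    """Finite-difference solution of Eq. (6.27), H'' = -i*omega*sigma*mu*H,
    with H(0) = h0 and H(x_max) = 0 standing in for the decay at x -> +infinity."""
    dx = x_max / n
    diag = -2.0 + 1j * omega * sigma * mu * dx ** 2  # off-diagonal entries are all 1
    m = n - 1                                        # interior unknowns H_1 .. H_{n-1}
    rhs = [0j] * m
    rhs[0] = -h0                                     # known boundary value moved to the RHS
    c_prime, d_prime = [0j] * m, [0j] * m
    c_prime[0], d_prime[0] = 1.0 / diag, rhs[0] / diag
    for i in range(1, m):                            # Thomas algorithm: forward sweep
        denom = diag - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (rhs[i] - d_prime[i - 1]) / denom
    interior = [0j] * m
    interior[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):                   # back substitution
        interior[i] = d_prime[i] - c_prime[i] * interior[i + 1]
    xs = [k * dx for k in range(n + 1)]
    return xs, [complex(h0)] + interior + [0j]


def loss_per_area(xs, H, sigma=1.0):
    """Integral of |j|^2 / (2 sigma) dx, with j = -dH/dx evaluated at cell midpoints."""
    total = 0.0
    for k in range(len(xs) - 1):
        dx = xs[k + 1] - xs[k]
        j_mid = -(H[k + 1] - H[k]) / dx
        total += abs(j_mid) ** 2 / (2 * sigma) * dx
    return total


omega = 2.0                          # with sigma = mu = 1 this gives delta_s = 1
delta_s = math.sqrt(2.0 / omega)
xs, H = solve_profile(omega)
kappa_minus = -(1 - 1j) / delta_s
profile_error = max(abs(h - cmath.exp(kappa_minus * x)) for x, h in zip(xs, H))

p_low = loss_per_area(xs, H)
p_high = loss_per_area(*solve_profile(4 * omega))
print(f"max |H_num - H_exact| = {profile_error:.2e}")
print(f"loss: {p_low:.5f} (Eq. 6.36 gives {1 / (2 * delta_s):.5f}); "
      f"ratio for 4x omega: {p_high / p_low:.4f}")
```

The tridiagonal solver is used only because it keeps the sketch dependency-free; any linear-algebra routine would serve equally well here.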

One more important remark concerning Eqs. (34): integrating the first of them over x, with the help of Eq. (32), we may see that the linear density J of the surface currents (measured in A/m) is simply and fundamentally related to the applied magnetic field:


$$ \mathbf{J}_\omega \equiv \int_0^\infty \mathbf{j}_\omega(x)\,dx = H_\omega(0)\,\mathbf{n}_z. \tag{6.37} $$

Since this relation does not have any frequency-dependent factors, we may sum it up for all frequency components, and get a universal relation


16 Let me hope that the reader’s physical intuition makes it evident that the skin effect remains conceptually the 
same for samples of any shape, besides possibly some quantitative details of the field distribution. 

17 The loop (vortex) character of the induced current lines, responsible for the term “eddy”, is not very apparent in the 1D geometry explored above, with the near-surface currents (Fig. 2b) looping only implicitly, at $z \to \pm\infty$.
$$ \mathbf{J}(t) = H(t)\,\mathbf{n}_z = H(t)\,\big(\mathbf{n}_y \times (-\mathbf{n}_x)\big) = \mathbf{H}(t)\times(-\mathbf{n}_x) = \mathbf{H}(t)\times\mathbf{n} \tag{6.38a} $$
(where $\mathbf{n} = -\mathbf{n}_x$ is the outer normal to the surface; see Fig. 2b) or, in a different form,
$$ \Delta\mathbf{H}(t) = \mathbf{n}\times\mathbf{J}(t), \tag{6.38b} $$


where ΔH is the full change of the field through the skin layer. This simple coarse-grain relation (independent of the choice of coordinate axes) is also independent of the used constituent relations (22), and is by no means accidental. Indeed, it may be readily obtained from the macroscopic Ampère law (5.116), applied to a contour drawn around a fragment of the surface and extending under it substantially deeper than the skin depth (see the contour C in Fig. 2b). Hence, Eq. (38) is valid regardless of the exact law of the field penetration.
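The statement that Eq. (37) contains no frequency-dependent factors can be checked numerically. This sketch assumes the decaying solution $H_\omega(x) = H_\omega(0)\exp\{-\kappa x\}$ with $\kappa = (1-i)/\delta_s$ (our reading of Eqs. (32)-(33)) and integrates $j_\omega(x) = \kappa H_\omega(x)$ by the trapezoidal rule; `sheet_current` and its grid parameters are hypothetical names chosen for illustration. Whatever the skin depth, and hence the frequency, the result should reproduce $H_\omega(0)$.

```python
import cmath


def sheet_current(h0, delta_s, n_steps=20_000, depth_factor=40.0):
    """Trapezoidal integral of j(x) = kappa*H(0)*exp(-kappa*x), kappa = (1 - 1j)/delta_s,
    over 0 <= x <= depth_factor*delta_s (the neglected tail is ~exp(-depth_factor))."""
    kappa = (1 - 1j) / delta_s
    x_max = depth_factor * delta_s
    dx = x_max / n_steps

    def j(x):
        return kappa * h0 * cmath.exp(-kappa * x)

    inner = sum(j(k * dx) for k in range(1, n_steps))
    return (0.5 * (j(0.0) + j(x_max)) + inner) * dx


for delta in (1e-6, 1e-3, 0.1):  # skin depths corresponding to very different frequencies
    print(f"delta_s = {delta:.0e} m: J = {sheet_current(1.0, delta):.6f}")
```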


For the skin effect, this fundamental relationship between the linear current density and the external magnetic field implies that the effect's implementation does not necessarily require a dedicated ac magnetic field source. For example, the effect takes place in any wire that carries an ac current, concentrating the current in a surface sheet of thickness ~$\delta_s$. (Of course, the quantitative analysis of this problem in a wire with an arbitrary cross-section may be technically complicated, because it requires solving Eq. (23) for the corresponding 2D geometry; even for a round cross-section, the solution involves the Bessel functions.) In this case, the ac magnetic field outside the conductor, which still obeys Eq. (38), may be better interpreted as the effect, rather than the cause, of the ac current flow.


Finally, please mind the limited validity of all the above results. First, for the quasistatic approximation to be valid, the field frequency ω should not be too high, so that the displacement current effects are negligible. (Again, this condition will be quantified in Sec. 7 below; it will show that for metals, the condition is violated only at extremely high frequencies above ~$10^{18}$ s$^{-1}$.) A more practical upper limit on ω is that the skin depth $\delta_s$ should stay much larger than the mean free path l of charge carriers,¹⁸ because beyond this point, the constituent relation between the vectors j(r) and E(r) becomes essentially non-local. Both theory and experiment show that at $\delta_s$ below l, the skin effect persists, but acquires a frequency dependence slightly different from Eq. (33): $\delta \propto \omega^{-1/3}$ rather than $\omega^{-1/2}$. Historically, this anomalous skin effect has been very useful for the measurements of the Fermi surfaces of metals.¹⁹


6.4. Electrodynamics of superconductivity, and the gauge invariance 


The effect of superconductivity²⁰ takes place (in certain materials only, mostly metals) when the temperature T is reduced below a certain critical temperature $T_c$, specific for each material. For most metallic superconductors, $T_c$ is typically of the order of a few kelvins, though several compounds (the so-called high-temperature superconductors) with $T_c$ above 100 K have been found since 1987. The most notable property of superconductors is the absence, at $T < T_c$, of measurable resistance to (not very high) dc currents. However, the electromagnetic properties of superconductors cannot be described by


18 A discussion of the mean free path may be found, for example, in SM Chapter 6. In very clean metals at very low temperatures, $\delta_s$ may approach l at frequencies as low as ~1 GHz, but at room temperature, the crossover from the normal to the anomalous skin effect takes place only at ~100 GHz.

19 See, e.g., A. Abrikosov, Introduction to the Theory of Normal Metals, Academic Press, 1972.

20 Discovered experimentally in 1911 by Heike Kamerlingh Onnes. 
just taking $\sigma = \infty$ in our previous results. Indeed, for this case, Eq. (33) would give $\delta_s = 0$, i.e., no ac magnetic field penetration at all. Experiment shows something substantially different: weak magnetic fields do penetrate into superconductors by a material-specific distance $\delta_L \sim 10^{-7}$–$10^{-6}$ m, the so-called London's penetration depth,²¹ which is virtually frequency-independent until the skin depth $\delta_s$ of the same material in its “normal” state (i.e. in the absence of superconductivity) becomes less than $\delta_L$. (This crossover happens typically at frequencies $\omega \sim 10^{12}$–$10^{13}$ s$^{-1}$.) The smallness of $\delta_L$ on the human scale means that the magnetic field is pushed out of macroscopic samples at their transition into the superconducting state.


This Meissner-Ochsenfeld effect, discovered experimentally in 1933,²² may be partly understood using the following classical reasoning. Our discussion of Ohm's law in Sec. 4.2 implied that the current's (and hence the electric field's) frequency ω is either zero or sufficiently low. In the classical Drude reasoning, this is acceptable while $\omega\tau \ll 1$, where τ is the effective carrier scattering time participating in Eqs. (4.12)-(4.13). If this condition is not satisfied, we should take into account the charge carrier inertia; moreover, in the opposite limit $\omega\tau \gg 1$, we may neglect the scattering altogether. Classically, we can describe the charge carriers in such a “perfect conductor” as particles with a non-zero mass m, which are accelerated by the electric field following the 2nd Newton law (4.11),


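$$ m\dot{\mathbf{v}} = \mathbf{F} = q\mathbf{E}, \tag{6.39} $$
so that the current density $\mathbf{j} = qn\mathbf{v}$ that they create changes in time as
$$ \dot{\mathbf{j}} = qn\dot{\mathbf{v}} = \frac{q^2 n}{m}\mathbf{E}. \tag{6.40} $$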


In terms of the Fourier amplitudes of the functions j(t) and E(t), this means
$$ -i\omega\,\mathbf{j}_\omega = \frac{q^2 n}{m}\mathbf{E}_\omega. \tag{6.41} $$
Comparing this formula with the relation $\mathbf{j}_\omega = \sigma\mathbf{E}_\omega$, implied in the last section, we see that we can use all its results with the following replacement:
$$ \sigma \to i\frac{q^2 n}{m\omega}. \tag{6.42} $$


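This change replaces the characteristic equation (29) with
$$ -i\omega = \frac{m\omega}{iq^2 n\mu}\kappa^2, \quad\text{i.e.}\quad \kappa^2 = \frac{\mu q^2 n}{m}, \tag{6.43} $$
and hence replaces the skin effect with the field penetration by the following frequency-independent depth:
$$ \delta \equiv \frac{1}{\kappa} = \left(\frac{m}{\mu q^2 n}\right)^{1/2}. \tag{6.44} $$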


Superficially, this means that the field decay into the superconductor does not depend on frequency: 


21 Named so to acknowledge the pioneering theoretical work of brothers Fritz and Heinz London — see below. 
22 It is hardly fair to shorten this name to just the “Meissner effect”, as is frequently done, because of the reportedly crucial contribution to the discovery by Robert Ochsenfeld, then a student of Walther Meissner.
$$ H(x,t) = H(0,t)\,e^{-x/\delta}, \tag{6.45} $$
thus explaining the Meissner-Ochsenfeld effect. 


However, there are two problems with this result. First, for the parameters typical for good metals ($q = -e$, $n \sim 10^{29}$ m$^{-3}$, $m \sim m_e$, $\mu \approx \mu_0$), Eq. (44) gives $\delta \sim 10^{-8}$ m, one or two orders of magnitude lower than the experimental values of $\delta_L$. Experiment also shows that the penetration depth diverges at $T \to T_c$, which is not predicted by Eq. (44).
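The order-of-magnitude estimate of δ from Eq. (44) is easy to reproduce. The sketch below evaluates $\delta = (m/\mu q^2 n)^{1/2}$ with CODATA values of e and $m_e$, taking $\mu = \mu_0$ and $n = 10^{29}$ m$^{-3}$ as in the text; the function name is hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi        # vacuum permeability, H/m
E_CHARGE = 1.602176634e-19   # elementary charge, C
M_E = 9.1093837015e-31       # electron mass, kg


def inertial_depth(n, m=M_E, q=E_CHARGE, mu=MU_0):
    """Eq. (6.44): field penetration depth of a scattering-free ("perfect") conductor."""
    return math.sqrt(m / (mu * q**2 * n))


print(f"delta ~ {inertial_depth(1e29):.2e} m")  # prints: delta ~ 1.68e-08 m
```

Note that δ scales as $n^{-1/2}$, so a much lower carrier density is needed to bring it up to larger values.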


The second, much more fundamental problem with Eq. (44) is that it has been derived for $\omega\tau \gg 1$. Even if we assume that somehow there is no scattering at all, i.e. $\tau = \infty$, at $\omega \to 0$ both parts of the characteristic equation (43) vanish, and we cannot make any conclusion about κ. This is not just a mathematical artifact we could ignore. For example, let us place a non-magnetic metal into a static external magnetic field at $T > T_c$. The field would completely penetrate the sample. Now let us cool it. As soon as the temperature is decreased below $T_c$, the above calculations would become valid, forbidding the penetration into the superconductor of any change of the field, so that the initial field would be “frozen” inside the sample. The Meissner-Ochsenfeld experiments have shown something completely different: as T is lowered below $T_c$, the initial field is expelled from the sample.


The resolution of these contradictions is provided by quantum mechanics. As was explained in 1957 in a seminal work by J. Bardeen, L. Cooper, and J. Schrieffer (commonly referred to as the BCS theory), superconductivity is due to the correlated motion of electron pairs, with opposite spins and nearly opposite momenta. Such Cooper pairs, each with the electric charge $q = -2e$ and zero spin, may form only in a narrow energy layer near the Fermi surface, of a certain thickness Δ(T). This parameter Δ(T), which may also be interpreted as the binding energy of the pair, tends to zero at $T \to T_c$, while at $T \ll T_c$ it has a virtually constant value $\Delta(0) \sim 3.5\,k_BT_c$, of the order of a few meV for most superconductors. This fact readily explains the relatively low spatial density of the Cooper pairs: $n_p \sim n\Delta(T)/\varepsilon_F \sim 10^{26}$ m$^{-3}$. With the correction $n \to n_p$, Eq. (44) for the penetration depth becomes

$$ \delta_L = \left(\frac{m}{\mu q^2 n_p}\right)^{1/2}. \tag{6.46} $$


This result diverges at $T \to T_c$, and generally fits the experimental data reasonably well, at least for the so-called “clean” superconductors with the mean free path $l = v_F\tau$ (where $v_F \sim (2\varepsilon_F/m_e)^{1/2}$ is the r.m.s. velocity of electrons on the Fermi surface) much longer than the Cooper pair size ξ; see below.


The smallness of the coupling energy Δ(T) is also a key factor in the explanation of the Meissner-Ochsenfeld effect. Because of Heisenberg's quantum uncertainty relation $\delta r\,\delta p \sim \hbar$, the spatial extension of the Cooper pair's wavefunction (the so-called coherence length of the superconductor) is relatively large: $\xi \sim \delta r \sim \hbar/\delta p \sim \hbar v_F/\Delta(T) \sim 10^{-6}$ m. As a result, $n_p\xi^3 \gg 1$, meaning that the wavefunctions of the pairs are strongly overlapped in space. Due to their integer spin, Cooper pairs behave like bosons, which means in particular that at low temperatures they exhibit the so-called Bose-Einstein condensation onto the same ground energy level $\varepsilon_g$.²³ This means that the quantum frequency ω


23 A quantitative discussion of the Bose-Einstein condensation of bosons may be found in SM Sec. 3.4, though 
the full theory of superconductivity is more complicated because it has to describe the condensation taking place 
simultaneously with the formation of effective bosons (Cooper pairs) from fermions (single electrons). For a 
$= \varepsilon_g/\hbar$ of the time evolution of each pair's wavefunction $\Psi = \psi\exp\{-i\omega t\}$ is exactly the same, and that the phases φ of the wavefunctions, defined by the relation
$$ \psi = |\psi|e^{i\varphi}, \tag{6.47} $$


coincide, so that the electric current is carried not by individual Cooper pairs but rather by their Bose-Einstein condensate described by a single wavefunction (47). Due to this coherence, the quantum effects (which, in the usual Fermi gases of single electrons, are masked by the statistical spread of their energies, and hence of their phases) become very explicit, i.e. “macroscopic”.


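To illustrate this, let us write the well-known quantum-mechanical formula for the probability current density of a free, non-relativistic particle,²⁴
$$ \mathbf{j}_w = \frac{\hbar}{2mi}\left(\psi^*\nabla\psi - \text{c.c.}\right) \equiv \frac{1}{2m}\left[\psi^*(-i\hbar\nabla)\psi + \text{c.c.}\right], \tag{6.48} $$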
where c.c. means the complex conjugate of the previous expression. Now let me borrow one result that will be proved later in this course (in Sec. 9.7) when we discuss the analytical mechanics of a charged particle moving in an electromagnetic field. Namely, to account for the magnetic field effects, the particle's kinetic momentum $\mathbf{p} = m\mathbf{v}$ (where $\mathbf{v} = d\mathbf{r}/dt$ is the particle's velocity) has to be distinguished from its canonical momentum,²⁵
$$ \mathbf{P} = \mathbf{p} + q\mathbf{A}, \tag{6.49} $$


where A is the field's vector potential defined by Eq. (5.27). In contrast with the Cartesian components $p_j = mv_j$ of the kinetic momentum p, the canonical momentum's components are the generalized momenta corresponding to the Cartesian components $r_j$ of the radius-vector r, considered as generalized coordinates of the particle: $P_j = \partial L/\partial v_j$, where L is the particle's Lagrangian function. According to the general rules of transfer from classical to quantum mechanics,²⁶ it is the vector P whose operator (in the coordinate representation) equals $-i\hbar\nabla$, so that the operator of the kinetic momentum $\mathbf{p} = \mathbf{P} - q\mathbf{A}$ is $-i\hbar\nabla - q\mathbf{A}$. Hence, to account for the magnetic field²⁷ effects, we should make the following replacement,


$$ -i\hbar\nabla \to -i\hbar\nabla - q\mathbf{A}, \tag{6.50} $$


in all quantum-mechanical relations. In particular, Eq. (48) has to be generalized as 


$$ \mathbf{j}_w = \frac{1}{2m}\left[\psi^*(-i\hbar\nabla - q\mathbf{A})\psi + \text{c.c.}\right]. \tag{6.51} $$


This expression becomes more transparent if we take the wavefunction in form (47); then 


detailed, but still very readable coverage of the physics of superconductors, I can recommend to the reader the monograph by M. Tinkham, Introduction to Superconductivity, 2nd ed., McGraw-Hill, 1996.

24 See, e.g., QM Sec. 1.4, in particular Eq. (1.47). 

25 I am sorry to use the traditional notations p and P for the momenta, the same symbols that were used for the electric dipole moment and polarization in Chapter 3. I hope there will be no confusion, because the latter notions are not used in this section.

26 See, e.g., CM Sec. 10.1, in particular Eq. (10.26). 

27 The account of the electric field is easier, because the related energy $q\phi$ of the particle may be directly included in the potential energy operator.
$$ \mathbf{j}_w = \frac{\hbar}{m}|\psi|^2\left(\nabla\varphi - \frac{q}{\hbar}\mathbf{A}\right). \tag{6.52} $$


This relation means, in particular, that in order to keep $\mathbf{j}_w$ gauge-invariant, the transformation (8)-(9) has to be accompanied by a simultaneous transformation of the wavefunction's phase:
$$ \varphi \to \varphi + \frac{q}{\hbar}\chi. \tag{6.53} $$


It is fascinating that the quantum-mechanical wavefunction (or more exactly, its phase) is not gauge-invariant, meaning that you may change it in your mind, at your free will! Again, this does not change any observable (such as $\mathbf{j}_w$ or the probability density $\psi\psi^*$), i.e. any experimental results.
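The gauge argument of Eqs. (51)-(53) can be tested on a grid. This sketch works in 1D with hypothetical natural units ħ = m = 1 and q = −2, evaluates the current (51) by central finite differences (taking the bracket as $X + X^*$, so that the current comes out real), and compares it with the closed form (52) both before and after the transformation $A \to A + d\chi/dx$, $\varphi \to \varphi + (q/\hbar)\chi$. That form of the gauge transformation, and all the test profiles for $|\psi|$, φ, A and χ, are assumptions made only for illustration.

```python
import cmath
import math

HBAR, MASS, Q = 1.0, 1.0, -2.0  # natural units; q = -2 mimics a Cooper pair (assumption)


def current_eq51(psi, a_vec, dx):
    """1D version of Eq. (6.51): j = (1/2m)[psi*(-i*hbar d/dx - q*A)psi + c.c.],
    with d/dx taken by central differences (interior grid points only)."""
    out = []
    for k in range(1, len(psi) - 1):
        dpsi = (psi[k + 1] - psi[k - 1]) / (2 * dx)
        x_term = psi[k].conjugate() * (-1j * HBAR * dpsi - Q * a_vec[k] * psi[k])
        out.append((x_term + x_term.conjugate()).real / (2 * MASS))
    return out


def current_eq52(amp, dphase, a_vec):
    """Eq. (6.52): j = (hbar/m)|psi|^2 (dphi/dx - (q/hbar) A)."""
    return [(HBAR / MASS) * r**2 * (g - (Q / HBAR) * a)
            for r, g, a in zip(amp, dphase, a_vec)]


def gauge_demo(n=4001, length=2 * math.pi):
    dx = length / (n - 1)
    xs = [k * dx for k in range(n)]
    amp = [1.0 + 0.3 * math.sin(x) for x in xs]          # |psi|
    phase = [0.7 * x + 0.2 * math.sin(x) for x in xs]    # phi
    dphase = [0.7 + 0.2 * math.cos(x) for x in xs]       # dphi/dx
    a_vec = [0.5 * math.cos(x) for x in xs]              # A(x)
    chi = [0.4 * math.sin(2 * x) for x in xs]            # gauge function chi(x)
    dchi = [0.8 * math.cos(2 * x) for x in xs]           # dchi/dx

    psi = [r * cmath.exp(1j * p) for r, p in zip(amp, phase)]
    # Gauge transformation: A -> A + dchi/dx together with phi -> phi + (q/hbar)*chi
    a_new = [a + d for a, d in zip(a_vec, dchi)]
    psi_new = [p * cmath.exp(1j * (Q / HBAR) * c) for p, c in zip(psi, chi)]

    j_before = current_eq51(psi, a_vec, dx)
    j_after = current_eq51(psi_new, a_new, dx)
    j_closed = current_eq52(amp, dphase, a_vec)[1:-1]
    return j_before, j_after, j_closed


j_before, j_after, j_closed = gauge_demo()
print(max(abs(u - v) for u, v in zip(j_before, j_closed)))
print(max(abs(u - v) for u, v in zip(j_after, j_closed)))
```

Both printed discrepancies are finite-difference errors that shrink as the grid is refined; the gauge transformation itself leaves the current unchanged.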


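Now for the electric current density of the whole superconducting condensate, Eq. (52) yields the following constitutive relation:

$$ \mathbf{j} = qn_p\mathbf{j}_w = \frac{\hbar q n_p}{m}|\psi|^2\left(\nabla\varphi - \frac{q}{\hbar}\mathbf{A}\right). \tag{6.54} $$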


The formula shows that this supercurrent may be induced by the dc magnetic field alone and does not require any electric field. Indeed, for the simple 1D geometry shown in Fig. 2a, $\mathbf{j}(\mathbf{r}) = j(x)\mathbf{n}_z$, $\mathbf{A}(\mathbf{r}) = A(x)\mathbf{n}_z$, and $\partial/\partial z = 0$, so that the Coulomb gauge condition (5.48) is satisfied for any choice of the gauge function χ(x). For the sake of simplicity we can choose this function to provide $\varphi(\mathbf{r}) = \text{const}$,²⁸ so that


$$ \mathbf{j} = -\frac{1}{\mu\delta_L^2}\mathbf{A}, \tag{6.55} $$


where $\delta_L$ is given by Eq. (46), and the field is assumed to be small and hence not affecting the probability $|\psi|^2$ (here normalized to 1 in the absence of the field). This is the so-called London equation, proposed (in a different form) by F. and H. London in 1935 for the Meissner-Ochsenfeld effect's explanation. Combining it with Eq. (5.44), generalized for a linear magnetic medium by the replacement $\mu_0 \to \mu$, we get


$$ \nabla^2\mathbf{A} = \frac{1}{\delta_L^2}\mathbf{A}. \tag{6.56} $$


This simple differential equation, similar to Eq. (23), for our 1D geometry has an exponential solution 
similar to Eq. (32): 


$$ A(x) = A(0)\exp\left\{-\frac{x}{\delta_L}\right\}, \quad B(x) = B(0)\exp\left\{-\frac{x}{\delta_L}\right\}, \quad j(x) = j(0)\exp\left\{-\frac{x}{\delta_L}\right\}, \tag{6.57} $$
which shows that the magnetic field and the supercurrent penetrate into a superconductor only by London's penetration depth $\delta_L$, regardless of frequency.²⁹ By the way, integrating the last result through the penetration layer, and using the vector potential's definition, $\mathbf{B} = \nabla\times\mathbf{A}$ (for our geometry, giving

28 This is the so-called London gauge; for our simple geometry, it is also the Coulomb gauge (5.48). 

29 Since at T > 0 not all electrons in a superconductor form Cooper pairs, at any frequency ω ≠ 0 the unpaired electrons provide energy-dissipating Ohmic currents, which are not described by Eq. (54). These losses become very substantial when the frequency ω becomes so high that the skin-effect length $\delta_s$ of the material becomes less than $\delta_L$. For typical metallic superconductors, this crossover takes place at frequencies of a few hundred GHz, so that even for microwaves, Eq. (57) still gives a fairly accurate description of the field penetration.
$B(x) = dA(x)/dx = -A(x)/\delta_L$) we may readily verify that the linear density J of the surface supercurrent
still satisfies the universal coarse-grain relation (38). 
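The claim that the London layer obeys the coarse-grain relation (38) can be verified numerically. The sketch integrates the supercurrent $j = -A/\mu\delta_L^2$ of Eq. (55) over the profile $A(x) = A(0)\exp\{-x/\delta_L\}$ of Eq. (57) and compares the sheet current with $H(0) = B(0)/\mu$, using the text's 1D convention $B(x) = dA(x)/dx$. The amplitude A(0), the values of $\delta_L$, $\mu = \mu_0$, and the function name are illustrative assumptions.

```python
import math

MU_0 = 4e-7 * math.pi  # H/m; mu = mu_0 assumed for a non-magnetic superconductor


def london_sheet_current(a0, delta_l, mu=MU_0, n_steps=20_000, depth_factor=40.0):
    """Return (J, H0) for the London layer.

    J  = integral_0^inf j(x) dx, with j = -A/(mu*delta_l**2) (Eq. 6.55) and
         A(x) = a0*exp(-x/delta_l) (Eq. 6.57), by the trapezoidal rule;
    H0 = B(0)/mu, with B(x) = dA/dx = -A(x)/delta_l (the text's 1D convention)."""
    def j(x):
        return -a0 * math.exp(-x / delta_l) / (mu * delta_l**2)

    x_max = depth_factor * delta_l
    dx = x_max / n_steps
    inner = sum(j(k * dx) for k in range(1, n_steps))
    sheet = (0.5 * (j(0.0) + j(x_max)) + inner) * dx
    b0 = -a0 / delta_l
    return sheet, b0 / mu


J, H0 = london_sheet_current(1e-9, 1e-7)
print(f"J = {J:.6e} A/m, H(0) = {H0:.6e} A/m")
```

As with Eq. (37), the penetration depth drops out of the result: only the surface field matters.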


This universality should bring to our attention the following common feature of the skin effect (in “normal” conductors) and the Meissner-Ochsenfeld effect (in superconductors): if the linear size of a bulk sample is much larger than, respectively, $\delta_s$ or $\delta_L$, then B = 0 in the dominating part of its interior. According to Eq. (5.110), a formal description of such conductors (valid only on a coarse-grain scale much larger than either $\delta_s$ or $\delta_L$) may be achieved by formally treating the sample as an ideal diamagnet, with μ = 0. In particular, we can use this description and Eq. (5.124) to immediately obtain the magnetic field's distribution outside of a bulk sphere:

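$$ \mathbf{B} = \mu_0\mathbf{H} = -\mu_0\nabla\phi_m, \quad\text{with}\quad \phi_m = -H_0\left(r + \frac{R^3}{2r^2}\right)\cos\theta, \quad\text{for } r \ge R. \tag{6.58} $$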


Figure 3 shows the corresponding surfaces of equal potential $\phi_m$. It is evident that the magnetic field lines (which are normal to the equipotential surfaces) bend to become parallel to the surface near it.
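Equation (58) can be checked against the boundary condition it was built to satisfy. Differentiating (58) gives, at the sphere's surface, $H_r = 0$ and $H_\theta = -\tfrac{3}{2}H_0\sin\theta$, so the field there is indeed tangential and enhanced by a factor of 3/2 at the equator. The sketch below, with hypothetical helper names and an arbitrary unit field and radius, recovers these values by numerically differentiating $\phi_m$.

```python
import math


def phi_m(r, theta, h0, radius):
    """Eq. (6.58): magnetic scalar potential outside an ideal-diamagnet sphere."""
    return -h0 * (r + radius**3 / (2.0 * r**2)) * math.cos(theta)


def field_components(r, theta, h0, radius, eps=1e-6):
    """H = -grad(phi_m) in spherical coordinates, by central differences.

    Returns (H_r, H_theta)."""
    h_r = -(phi_m(r + eps, theta, h0, radius) - phi_m(r - eps, theta, h0, radius)) / (2 * eps)
    h_theta = -(phi_m(r, theta + eps, h0, radius)
                - phi_m(r, theta - eps, h0, radius)) / (2 * eps * r)
    return h_r, h_theta


for theta in (0.3, 0.7, 1.2):
    h_r, h_t = field_components(1.0, theta, h0=1.0, radius=1.0)
    print(f"theta = {theta}: H_r = {h_r:+.2e}, H_theta = {h_t:+.4f}")
```

Far from the sphere ($r \gg R$) the same function returns the unperturbed uniform field, $H_r \to H_0\cos\theta$.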


Fig. 6.3. Equipotential surfaces $\phi_m$ = const around a conducting sphere of radius $R \gg \delta_s$ (or $\delta_L$), placed into a uniform magnetic field, calculated within the coarse-grain (ideal-diamagnet) approximation μ = 0.


This pattern also helps to answer the question that might arise when making the assumption (24): what happens to bulk conductors placed into a normal (i.e. surface-perpendicular) ac magnetic field, and to superconductors in a normal dc magnetic field as well? The answer is: the field is deformed outside of the conductor to sustain the following coarse-grain boundary condition:³⁰

$$ B_n\big|_{\text{surface}} = 0, \tag{6.59} $$

which follows from Eq. (5.118) and the coarse-grain requirement $\mathbf{B}|_{\text{inside}} = 0$.


This answer should be taken with reservations. For normal conductors it is only valid at sufficiently high frequencies where the skin depth (33) is sufficiently small: $\delta_s \ll a$, where a is the scale of the conductor's linear size (for a sphere, a ~ R). In superconductors, this simple picture requires not only that $\delta_L \ll a$, but also that the magnetic field is relatively low, because strong fields do penetrate

30 Sometimes this boundary condition, as well as the (compatible) Eq. (38), are called “macroscopic”. However, this term may lead to confusion with the genuine macroscopic boundary conditions (5.117)-(5.118), which also ignore the atomic-scale microstructure of the “effective currents” $\mathbf{j}_{\text{ef}} = \nabla\times\mathbf{M}$, but (as was shown earlier in this section) still allow explicit, detailed accounts of the skin-current (34) and supercurrent (55) distributions.
superconductors, destroying superconductivity (either completely or partly), and as a result violating the
Meissner-Ochsenfeld effect — see the next section. 


6.5. Electrodynamics of macroscopic quantum phenomena³¹


Despite the superficial similarity of the skin effect and the Meissner-Ochsenfeld effect, the electrodynamics of superconductors is much richer. For example, let us use Eq. (54) to describe the fascinating effect of magnetic flux quantization. Consider a closed ring/loop (not necessarily a round one) made of a superconducting “wire” with a cross-section much larger than $\delta_L^2$ (Fig. 4a).


Fig. 6.4. (a) A closed, flux-quantizing superconducting ring, (b) a ring with a narrow slit, and (c) a Superconducting QUantum Interference Device (SQUID).


From the last section's discussion, we know that deep inside the wire the supercurrent is exponentially small. Integrating Eq. (54) along any closed contour C that does not approach the surface closer than a few $\delta_L$ at any point (see the dashed line in Fig. 4), so that j = 0 at all its points, we get


$$ \oint_C \nabla\varphi\cdot d\mathbf{r} - \frac{q}{\hbar}\oint_C \mathbf{A}\cdot d\mathbf{r} = 0. \tag{6.60} $$


The first integral, i.e. the difference of φ in the initial and final points, has to be equal to either zero or an integer multiple of 2π, because the change $\varphi \to \varphi + 2\pi n$ does not change the Cooper pair condensate's wavefunction:
$$ \psi' = |\psi|e^{i(\varphi + 2\pi n)} = |\psi|e^{i\varphi} = \psi. \tag{6.61} $$


On the other hand, according to Eq. (5.65), the second integral in Eq. (60) is just the magnetic flux Φ through the contour.³² As a result, we arrive at a wonderful conclusion:


31 The material of this section is not covered in most E&M textbooks, and will not be used in later sections of this 
course. Thus the “only” loss due to the reader’s skipping this section would be the lack of familiarity with one of 
the most fascinating fields of physics. Note also that we already have virtually all formal tools necessary for its 
discussion, so reading this section should not require much effort. 

32 Due to the Meissner-Ochsenfeld effect, the exact path of the contour is not important, and we may discuss Φ just as the magnetic flux through the ring.
$$ \Phi = n\Phi_0, \quad\text{where}\quad \Phi_0 \equiv \frac{2\pi\hbar}{|q|}, \quad\text{with } n = 0, \pm1, \pm2, \ldots, \tag{6.62} $$


saying that the magnetic flux inside any superconducting loop can only take values multiple of the flux quantum $\Phi_0$. This effect, predicted in 1950 by the same Fritz London (who expected q to be equal to the electron charge $-e$), was observed experimentally in 1961,³³ but with $|q| = 2e$, so that $\Phi_0 \approx 2.07\times10^{-15}$ Wb. Historically, this observation gave decisive support to the BCS theory of superconductivity (implying Cooper pairs with charge $q = -2e$) that had been put forward just four years earlier.
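The numerical value of the flux quantum follows directly from Eq. (62). This short sketch, using CODATA values of ħ and e, compares the Cooper-pair value with the one London originally expected for $q = -e$; the function name is hypothetical.

```python
import math

HBAR = 1.054571817e-34       # reduced Planck constant, J*s
E_CHARGE = 1.602176634e-19   # elementary charge, C


def flux_quantum(q):
    """Eq. (6.62): Phi_0 = 2*pi*hbar / |q|."""
    return 2 * math.pi * HBAR / abs(q)


print(f"q = -2e: Phi_0 = {flux_quantum(-2 * E_CHARGE):.4e} Wb")  # 2.0678e-15 Wb
print(f"q = -e:  Phi_0 = {flux_quantum(-E_CHARGE):.4e} Wb")      # 4.1357e-15 Wb
```

The factor of two between the two values is exactly what the 1961 experiments resolved in favor of $|q| = 2e$.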


Note the truly macroscopic character of this quantum effect: it has been repeatedly observed in human-scale superconducting loops, and from what is known about superconductors, there is no doubt that if we had made a giant superconducting wire loop extending, say, over the Earth's equator, the magnetic flux through it would still be quantized, though with a very large number n of flux quanta. This means that the quantum coherence of Bose-Einstein condensates may extend over, using H. Casimir's famous expression, “miles of dirty lead wire”. (Lead is a typical superconductor, with $T_c \approx 7.2$ K, and indeed retains its superconductivity even when highly contaminated by impurities.)


Moreover, hollow rings are not entirely necessary for flux quantization. In 1957, A. Abrikosov explained the counter-intuitive high-field behavior of superconductors with $\delta_L > \xi/\sqrt{2}$, known experimentally as their mixed (or “Shubnikov”) phase since the 1930s. He showed that a sufficiently high magnetic field may penetrate such superconductors in the form of self-formed magnetic field “threads” (or “tubes”) surrounded by vortex-shaped supercurrents, the so-called Abrikosov vortices. In the simplest case, the core of such a vortex is a straight line, on which the superconductivity is completely suppressed ($|\psi| = 0$), surrounded by circular, axially-symmetric, persistent supercurrents j(ρ), where ρ is the distance from the vortex axis (see Fig. 5a). At the axis, the current vanishes, and with the growth of ρ, it first rises and then falls (with $j(\infty) = 0$), reaching its maximum at ρ ~ ξ, while the magnetic field B(ρ), directed along the vortex axis, is largest at ρ = 0, and drops monotonically at distances of the order of $\delta_L$ (Fig. 5b).


Fig. 6.5. The Abrikosov vortex: (a) a 3D structure's sketch, and (b) the main variables as functions of the distance ρ from the axis (schematically).


The total flux of the field equals exactly one flux quantum $\Phi_0$, given by Eq. (62). Correspondingly, the wavefunction's phase φ performs just one ±2π revolution along any contour drawn around the vortex's axis, so that $\nabla\varphi = \pm\mathbf{n}_\phi/\rho$, where $\mathbf{n}_\phi$ is the azimuthal unit vector.³⁴ This topological feature of the wavefunction's phase is sometimes called the fluxoid quantization, to


33 Independently and virtually simultaneously by two groups: B. Deaver and W. Fairbank, and R. Doll and M. 
Nabauer; their reports were published back-to-back in the same issue of the Physical Review Letters. 
34 The last (perhaps, evident) expression for $\nabla\varphi$ follows from MA Eq. (10.2) with $f = \pm\phi + \text{const}$.
distinguish it from the magnetic flux quantization, which is valid only for relatively large contours, not approaching the axis by distances ~$\delta_L$.


A quantitative analysis of Abrikosov vortices requires, besides the equations we have discussed, one more constituent relation that would describe the suppression of the number of Cooper pairs (quantified by $|\psi|^2$) by the magnetic field, or rather by the field-induced supercurrent. In his original work, Abrikosov used for this purpose the famous Ginzburg-Landau equation,³⁵ which is quantitatively valid only at $T \approx T_c$. The equation may be conveniently represented in either of the following two forms:


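$$ \frac{1}{2m}\left(-i\hbar\nabla - q\mathbf{A}\right)^2\psi = a\psi - b|\psi|^2\psi, \qquad \xi^2\left(\nabla - i\frac{q}{\hbar}\mathbf{A}\right)^2\psi = \left(|\psi|^2 - 1\right)\psi, \tag{6.63} $$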

where a and b are certain temperature-dependent coefficients, with $a \to 0$ at $T \to T_c$. The first of these forms clearly shows that the Ginzburg-Landau equation (as well as the similar Gross-Pitaevskii equation describing electrically-neutral Bose-Einstein condensates) belongs to a broader class of nonlinear Schrödinger equations, differing from the usual Schrödinger equation, which is linear in ψ, only by the additional nonlinear term. The equivalent, second form of Eq. (63) is more convenient for applications and shows more clearly that if the superconductor's condensate density, proportional to $|\psi|^2$, is suppressed only locally, it self-restores to its unperturbed value (with $|\psi|^2 = 1$) at distances of the order of the coherence length $\xi = \hbar/(2ma)^{1/2}$.


This fact enables a simple quantitative analysis of the Abrikosov vortex in the most important limit $\xi \ll \delta_L$. Indeed, as Fig. 5 shows, in this case $|\psi|^2 \approx 1$ at most distances ($\rho \sim \delta_L$) where the field and current are distributed, so that these distributions may be readily calculated without any further involvement of Eq. (63), just from Eq. (54) with $\nabla\varphi = \pm\mathbf{n}_\phi/\rho$, and the Maxwell equations (21) for the magnetic field, giving $\nabla\times\mathbf{B} = \mu\mathbf{j}$ and $\nabla\cdot\mathbf{B} = 0$. Indeed, combining these equations just as was done at the derivation of Eq. (23), for the only Cartesian component of the vector $\mathbf{B}(\mathbf{r}) = B(\rho)\mathbf{n}_z$ (where the z-axis is directed along the vortex's symmetry axis), we get a simple equation


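$$ \delta_L^2\nabla^2 B - B = -\frac{\hbar}{q}\left[\nabla\times(\nabla\varphi)\right]_z = \pm\Phi_0\,\delta_2(\boldsymbol{\rho}), \quad\text{at } \rho \gg \xi, \tag{6.64} $$
which coincides with Eq. (56) at all regular points $\rho \neq 0$. Spelling out the Laplace operator for our current case of axial symmetry,³⁶ we get an ordinary differential equation,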


$$ \frac{\delta_L^2}{\rho}\frac{d}{d\rho}\left(\rho\frac{dB}{d\rho}\right) - B = 0, \quad\text{for } \rho \neq 0. \tag{6.65} $$


Comparing this equation with Eq. (2.155) with ν = 0, and taking into account that we need the solution decreasing at $\rho \to \infty$, which makes any contribution proportional to the function $I_0$ unacceptable, we get


35 This equation was derived by Vitaly Lazarevich Ginzburg and Lev Davidovich Landau from phenomenological arguments in 1950, i.e. before the advent of the “microscopic” BCS theory, and may be used for simple analyses of a broad range of nonlinear effects in superconductors. The Ginzburg-Landau and Gross-Pitaevskii equations will be further discussed in SM Sec. 4.3.

36 See, e.g., MA Eq. (10.3) with $\partial/\partial\phi = \partial/\partial z = 0$.
$$ B = C\,K_0\!\left(\frac{\rho}{\delta_L}\right). \tag{6.66} $$


See the plot of this Bessel function on the right panel of Fig. 2.22 (black line). The constant C should be found by matching the 2D delta function on the right-hand side of Eq. (64), i.e. by requiring
$$ \int_{\text{vortex}} B(\rho)\,d^2\rho = 2\pi\int_0^\infty B(\rho)\,\rho\,d\rho = 2\pi\delta_L^2 C\int_0^\infty K_0(\zeta)\,\zeta\,d\zeta = \mp\Phi_0. \tag{6.67} $$


The last, dimensionless integral equals 1,³⁷ so that finally


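$$ B(\rho) = \mp\frac{\Phi_0}{2\pi\delta_L^2}K_0\!\left(\frac{\rho}{\delta_L}\right), \quad\text{at } \rho \gg \xi. \tag{6.68} $$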

So the magnetic field of the vortex drops exponentially at distances ρ much larger than $\delta_L$, and diverges at $\rho \to 0$; see, e.g., the second of Eqs. (2.157). However, this divergence is very slow (logarithmic), and, as was repeatedly discussed in this series, is avoided by the account of virtually any other factor. In our current case, this factor is the decrease of $|\psi|^2$ to zero at ρ ~ ξ (see Fig. 5), not taken into account in Eq. (68). As a result, we may estimate the field on the axis of the vortex as


$$ B(0) \approx \mp\frac{\Phi_0}{2\pi\delta_L^2}\ln\frac{\delta_L}{\xi}; \tag{6.69} $$


the exact (and much more involved) solution of the problem confirms this estimate with a minor correction: $\ln(\delta_L/\xi) \to \ln(\delta_L/\xi) - 0.28$, i.e. $\xi \to 1.3\xi$.
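The statement in Eq. (67) that the last integral, $\int_0^\infty K_0(\zeta)\,\zeta\,d\zeta$, equals 1 can be checked numerically without special-function libraries (in practice one would simply call a library routine such as `scipy.special.k0`). The hypothetical helpers below evaluate $K_0$ from its standard integral representation $K_0(x) = \int_0^\infty e^{-x\cosh t}\,dt$ and then integrate $\zeta K_0(\zeta)$ after the substitution $\zeta = e^s$, which tames both the logarithmic behavior at ζ → 0 and the exponential tail. With this normalization, Eq. (67) gives $|C| = \Phi_0/2\pi\delta_L^2$.

```python
import math


def bessel_k0(x, n_steps=400):
    """Modified Bessel function K_0(x) for x > 0, from the integral representation
    K_0(x) = integral_0^inf exp(-x*cosh t) dt, evaluated by the trapezoidal rule
    on [0, t_max], beyond which the integrand is below ~exp(-60)."""
    t_max = math.acosh(max(1.0, 60.0 / x)) + 1.0
    dt = t_max / n_steps
    total = 0.5 * (math.exp(-x) + math.exp(-x * math.cosh(t_max)))
    total += sum(math.exp(-x * math.cosh(k * dt)) for k in range(1, n_steps))
    return total * dt


def zeta_k0_integral(s_min=-15.0, s_max=4.0, n_steps=1900):
    """Integral of zeta*K_0(zeta) over (0, inf) after the substitution
    zeta = exp(s), d(zeta) = zeta ds; trapezoidal rule in s."""
    ds = (s_max - s_min) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        z = math.exp(s_min + k * ds)
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * z * z * bessel_k0(z)
    return total * ds


print(f"K_0(1) = {bessel_k0(1.0):.10f}")
print(f"integral of zeta*K_0(zeta) = {zeta_k0_integral():.8f}")
```

The trapezoidal rule is a deliberate choice here: for smooth integrands that decay at both ends of the interval it converges much faster than its textbook error estimate suggests.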


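The current density distribution may now be calculated from the Maxwell equation $\nabla\times\mathbf{B} = \mu\mathbf{j}$, giving $\mathbf{j} = j(\rho)\mathbf{n}_\phi$, with³⁸
$$ j(\rho) = -\frac{1}{\mu}\frac{\partial B}{\partial\rho} = \pm\frac{\Phi_0}{2\pi\mu\delta_L^2}\frac{\partial}{\partial\rho}K_0\!\left(\frac{\rho}{\delta_L}\right) = \mp\frac{\Phi_0}{2\pi\mu\delta_L^3}K_1\!\left(\frac{\rho}{\delta_L}\right), \quad\text{at } \rho \gg \xi, \tag{6.70} $$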


where the same identity (2.158), with $J_n \to K_n$ and n = 1, was used. Now looking at Eqs. (2.157) and (2.158), with n = 1, we see that the supercurrent's density is exponentially low at $\rho \gg \delta_L$ (thus outlining the vortex's periphery), and is proportional to 1/ρ within the broad range $\xi \ll \rho \ll \delta_L$. This rise of the current at $\rho \to 0$ (which could be readily predicted directly from Eq. (54) with $\nabla\varphi = \pm\mathbf{n}_\phi/\rho$, and the A-term negligible at $\rho \ll \delta_L$) is quenched at ρ ~ ξ by a rapid drop of the factor $|\psi|^2$ in the same Eq. (54), i.e. by the suppression of the superconductivity near the axis (by the same supercurrent!); see Fig. 5 again.


This structure of the Abrikosov vortex may be used to calculate, in a straightforward way, its energy per unit length (i.e. its linear tension)
$$ \mathscr{T} = \frac{\Phi_0^2}{4\pi\mu\delta_L^2}\ln\frac{\delta_L}{\xi}, \tag{6.71} $$


37 This fact follows, for example, from the integration of both sides of Eq. (2.143) (which is valid for any Bessel functions, including $K_n$) with n = 1, from 0 to ∞, and then using the asymptotic values given by Eqs. (2.157)-(2.158): $K_1(\infty) = 0$, and $K_1(\zeta) \to 1/\zeta$ at $\zeta \to 0$.

38 See, e.g., MA Eq. (10.5), with $f_\rho = f_\phi = 0$, and $f_z = B(\rho)$.
and hence the so-called “first critical” value $H_{c1}$ of the external magnetic field,³⁹ at which the vortex formation becomes possible (in a long cylindrical sample parallel to the field):
$$ H_{c1} = \frac{\Phi_0}{4\pi\mu\delta_L^2}\ln\frac{\delta_L}{\xi}. \tag{6.72} $$


Let me leave the proof of these two formulas for the reader’s exercise. 


The flux quantization and the Abrikosov vortices discussed above are just two of several macroscopic quantum effects in superconductivity. Let me discuss just one more, perhaps the most interesting of such effects. Let us consider a superconducting ring/loop interrupted with a very narrow slit (Fig. 4b). Integrating Eq. (54) along any current-free path from point 1 to point 2 (see, e.g., the dashed line in Fig. 4b), we get


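$$ 0 = \int_1^2\left(\nabla\varphi - \frac{q}{\hbar}\mathbf{A}\right)\cdot d\mathbf{r} = \varphi_2 - \varphi_1 - \frac{q}{\hbar}\Phi. \tag{6.73} $$
Using the flux quantum definition (62) with $q = -2e$, this result may be rewritten as
$$ \varphi \equiv \varphi_1 - \varphi_2 = 2\pi\frac{\Phi}{\Phi_0}, \tag{6.74} $$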


where φ is called the Josephson phase difference. Note that in contrast to each of the phases $\varphi_{1,2}$, their difference φ is gauge-invariant: Eq. (74) directly relates it to the gauge-invariant magnetic flux Φ.
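For $q = -2e$, Eqs. (62) and (73) combine into $\varphi = 2\pi\Phi/\Phi_0$ (mod 2π). The sketch below, with a hypothetical function name and $\Phi_0 \approx 2.0678\times10^{-15}$ Wb, shows how the phase difference across the slit follows the flux, and that adding one flux quantum leaves it unchanged, in line with the 2π-periodicity argued in the flux-quantization discussion.

```python
import math

PHI_0 = 2.067833848e-15  # flux quantum h/(2e), Wb


def josephson_phase(flux, phi0=PHI_0):
    """Phase difference phi = 2*pi*Phi/Phi_0, reduced to the interval [0, 2*pi)."""
    return (2 * math.pi * flux / phi0) % (2 * math.pi)


for fraction in (0.0, 0.25, 0.5, 1.25):
    print(f"Phi = {fraction:4.2f} Phi_0  ->  phi = {josephson_phase(fraction * PHI_0):.4f} rad")
```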


Can this φ be measured? Yes, for example, using the Josephson effect.⁴⁰ Let us consider two (for the argument's simplicity, similar) superconductors, connected with some sort of weak link, for example, a small tunnel junction, or a point contact, or a narrow thin-film bridge, through which a weak Cooper-pair supercurrent can flow. (Such a system of two weakly coupled superconductors is called a Josephson junction.) Let us think about what this supercurrent I may be a function of. For that, reverse thinking is helpful: let us imagine that we change the current; what parameter of the superconducting condensate can it affect? If the current is very weak, it cannot perturb the superconducting condensate's density, proportional to $|\psi|^2$; hence it may only change the Cooper condensate phases $\varphi_{1,2}$. However, according to Eq. (53), the phases are not gauge-invariant, while the current should be. Hence the current may affect (or, if you like, may be affected by) only the phase difference φ defined by Eq. (74). Moreover, just as has already been argued during the flux quantization discussion, a change of any of $\varphi_{1,2}$ (and hence of φ) by 2π or any of its multiples should not change the current. Also, if the wavefunction is the same in both superconductors (φ = 0), the supercurrent should vanish due to the system's symmetry. Hence the function I(φ) should satisfy the following conditions:


39 This term is used to distinguish $H_{c1}$ from the higher "second critical field" $H_{c2}$, at which the Abrikosov vortices
are pressed to each other so tightly (to distances $d \sim \xi$) that they merge, and the remains of superconductivity
vanish: $\psi \to 0$. Unfortunately, I do not have time/space to discuss these effects; the interested reader may be
referred, for example, to Chapter 5 of M. Tinkham's monograph cited above.

40 It was predicted in 1961 by Brian David Josephson (then a PhD student!) and observed experimentally by 
several groups soon after that. 
$$I(0) = 0, \qquad I(\varphi + 2\pi) = I(\varphi). \qquad (6.75)$$


With these conditions on hand, we should not be terribly surprised by Josephson's result
that for the weak link provided by tunneling,$^{41}$


$$I(\varphi) = I_c\sin\varphi, \qquad (6.76)$$


where the constant $I_c$, which depends on the weak link's strength and temperature, is called the critical
current. Actually, Eqs. (54) and (63) enable not only a straightforward calculation of this relation but
even obtaining a simple expression of the critical current $I_c$ via the link's normal-state resistance: a
task left for the (creative :-) reader's exercise.


Now let us see what happens if a Josephson junction is placed into the gap in a superconductor 
loop — see Fig. 4c. In this case, we may combine Eqs. (74) and (76), getting 


$$I = I_c\sin\left(2\pi\frac{\Phi}{\Phi_0}\right). \qquad (6.77)$$


This effect of a periodic dependence of the current on the magnetic flux is called macroscopic quantum
interference,$^{42}$ while the system shown in Fig. 4c is called the superconducting quantum interference device,
or SQUID (with all letters capital, please :-). The low value of the magnetic flux quantum $\Phi_0$, and hence
the high sensitivity of $\varphi$ to external magnetic fields, allows using such SQUIDs as ultrasensitive
magnetometers. Indeed, for a superconducting ring of area ~1 cm², one period of the change of the
supercurrent (77) is produced by a magnetic field change of the order of $10^{-11}$ T ($10^{-7}$ Gs), while
sensitive electronics allows measuring a tiny fraction of this period, limited by thermal noise at a level
of the order of a few fT. Such sensitivity allows measurements, for example, of the minuscule magnetic
fields induced outside of the body by the beating human heart, and even by brain activity.$^{43}$
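The magnetometer estimate above is plain arithmetic: one period of Eq. (6.77) corresponds to a flux change of one $\Phi_0 = \pi\hbar/e = h/2e$, so for a uniform field normal to a ring of area $A$ the field period is $\Delta B = \Phi_0/A$. This sketch assumes that uniform-field geometry and neglects the self-field effects discussed below; the 3 fT noise floor is a hypothetical figure for illustration.

```python
H_PLANCK = 6.62607015e-34    # J*s (exact SI value)
E_CHARGE = 1.602176634e-19   # C (exact SI value)
PHI_0 = H_PLANCK / (2 * E_CHARGE)   # flux quantum pi*hbar/e = h/2e, Wb

def field_period(area_m2):
    """Change of a uniform normal field B_ext that shifts Phi by one Phi_0."""
    return PHI_0 / area_m2

dB = field_period(1e-4)      # ring area 1 cm^2 = 1e-4 m^2
print(f"Phi_0 = {PHI_0:.4e} Wb")
print(f"one period: {dB:.2e} T = {dB * 1e4:.2e} Gs")
# fraction of a period corresponding to a hypothetical 3 fT noise floor
print(f"3 fT / period = {3e-15 / dB:.1e}")
```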


An important aspect of quantum interference is the so-called Aharonov-Bohm (AB) effect,
which actually takes place for single quantum particles as well.$^{44}$ Let the magnetic field lines be limited
to the central, hollow part of the SQUID loop so that no appreciable magnetic field ever touches the ring
itself. (This may be done experimentally with very good accuracy, for example using high-$\mu$ magnetic
cores; see their discussion in Sec. 5.6.) As predicted by Eq. (77), and confirmed by several careful
experiments carried out in the mid-1960s,$^{45}$ this restriction does not matter: the interference is observed


41 For some other types of weak links, the function $I(\varphi)$ may deviate from the sinusoidal form (76) rather
considerably, while still satisfying the general conditions (75).

42 The name is due to a deep analogy between this phenomenon and the interference between two coherent waves, 
to be discussed in detail in Sec. 8.4. 

43 Other practical uses of SQUIDs include MRI signal detectors, high-sensitivity measurements of magnetic
properties of materials, and weak field detection in a broad variety of physical experiments; see, e.g., J. Clarke
and A. Braginski (eds.), The SQUID Handbook, vol. II, Wiley, 2006. For a comparison of these devices with
other sensitive magnetometers see, e.g., the review collection by A. Grosz et al. (eds.), High Sensitivity
Magnetometers, Springer, 2017.

44 For a more detailed discussion of the AB effect see, e.g., QM Sec. 3.2. 

45 Similar experiments have been carried out with single (unpaired) electrons — moving either ballistically, in 
vacuum, or in “normal” (non-superconducting) conducting rings. In the last case, the effect is much harder to 
observe than in SQUIDs: the ring size has to be very small, and temperature very low, to avoid the so-called 
anyway. This means that not only the magnetic field B but also the vector potential A represents
physical reality, albeit in a quite peculiar way. (Remember the gauge transformation (5.46), which you
may carry out in your head without changing any physical reality? Fortunately, this transformation
does not change the contour integral participating in Eq. (5.65), and hence changes neither the magnetic flux $\Phi$ nor
the interference pattern.)


Actually, the magnetic flux quantization (62) and the macroscopic quantum interference (77) are
not completely different effects, but just two manifestations of interrelated macroscopic quantum
phenomena. To show that, one should note that if the critical current $I_c$ (or rather its product by the
loop's self-inductance $L$) is high enough, the flux $\Phi$ in the SQUID loop is due not only to the external
magnetic field flux $\Phi_{\rm ext}$, but also has a self-field component; cf. Eq. (5.68):$^{46}$


$$\Phi = \Phi_{\rm ext} - LI, \qquad \text{where } \Phi_{\rm ext} \equiv \int_S (B_{\rm ext})_n\, d^2r. \qquad (6.78)$$


Now the relation between $\Phi$ and $\Phi_{\rm ext}$ may be readily found by solving this equation together with Eq.
(77). Figure 6 shows this relation for several values of the dimensionless parameter $\lambda \equiv 2\pi LI_c/\Phi_0$.


Fig. 6.6. The function $\Phi(\Phi_{\rm ext})$ for SQUIDs with various values of the normalized $LI_c$ product. Dashed arrows show the flux leaps as the external field is changed. (The branches with $d\Phi/d\Phi_{\rm ext} < 0$ are unstable.) [Horizontal axis: $\Phi_{\rm ext}/\Phi_0$.]


These plots show that if the critical current (and/or the inductance) is low, $\lambda \ll 1$, the self-field
effects are negligible, and the total flux follows the external field (i.e., $\Phi_{\rm ext}$) faithfully. However, at $\lambda >
1$, the function $\Phi(\Phi_{\rm ext})$ becomes hysteretic, and at $\lambda \gg 1$, its stable (positive-slope) branches are nearly
flat, with the total flux values corresponding to Eq. (62). Thus, a superconducting ring closed with a
high-$I_c$ Josephson junction exhibits nearly perfect flux quantization.
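The curves of Fig. 6 can be reproduced by solving Eqs. (6.77) and (6.78) together. Assuming the parameter is $\lambda = 2\pi LI_c/\Phi_0$ and writing $x = 2\pi\Phi/\Phi_0$, $x_{\rm ext} = 2\pi\Phi_{\rm ext}/\Phi_0$, the system reduces to $x + \lambda\sin x = x_{\rm ext}$. Since $dx_{\rm ext}/dx = 1 + \lambda\cos x$, a branch with positive slope (the stable one, per the figure caption) has $1 + \lambda\cos x > 0$, which can fail only for $\lambda > 1$. The sketch below finds all roots by a grid scan plus bisection; the function name and grid size are our own illustrative choices.

```python
import math

def flux_states(x_ext, lam, n=4000):
    """All solutions x of x + lam*sin(x) = x_ext, flagged as stable or not.

    x = 2*pi*Phi/Phi_0 and x_ext = 2*pi*Phi_ext/Phi_0 (assumed normalization).
    A branch counts as stable when dPhi/dPhi_ext > 0, i.e. 1 + lam*cos(x) > 0.
    """
    g = lambda x: x + lam * math.sin(x) - x_ext
    # Every root lies in [x_ext - lam, x_ext + lam] because |lam*sin(x)| <= lam;
    # the small padding makes g strictly negative/positive at the two ends.
    lo, hi = x_ext - abs(lam) - 1e-9, x_ext + abs(lam) + 1e-9
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    roots = []
    for a, b in zip(xs, xs[1:]):
        ga, gb = g(a), g(b)
        if ga == 0.0:
            roots.append(a)
        elif ga * gb < 0:
            for _ in range(60):          # plain bisection on the bracket [a, b]
                m = 0.5 * (a + b)
                if g(a) * g(m) <= 0:
                    b = m
                else:
                    a = m
            roots.append(0.5 * (a + b))
    return [(x, 1 + lam * math.cos(x) > 0) for x in roots]

# lambda = 3 at half a flux quantum of external field: two stable states
# with an unstable one between them
for x, stable in flux_states(math.pi, 3.0):
    print(f"Phi/Phi_0 = {x / (2 * math.pi):.3f}, stable = {stable}")
```

For $\lambda = 3$ the two stable states found this way differ by roughly $0.7\,\Phi_0$, in line with the order-of-magnitude $\Delta\Phi \sim \Phi_0$ quoted below; for $\lambda < 1$ the scan always returns a single stable root.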


The self-field effects described by Eq. (78) create certain technical problems for SQUID 
magnetometry, but they are the basis for one more useful application of these devices: ultrafast 


dephasing effects due to unavoidable interactions of the electrons with their environment — see, e.g., QM Chapter 
46 The sign before $LI$ would be positive, as in Eq. (5.70), if $I$ were the current flowing into the inductance.
However, in order to keep the sign in Eq. (76) intact, $I$ should mean the current flowing into the Josephson
junction, i.e. out of the inductance, thus changing the sign of the $LI$ term in Eq. (78).
computing. Indeed, Fig. 6 shows that at values of $\lambda$ modestly above 1 (e.g., $\lambda \sim 3$), and within a
certain range of applied field, the SQUID has two stable flux states, which differ by $\Delta\Phi \sim \Phi_0$ and may
be used for coding binary 0 and 1. For practical superconductors (like Nb), the time of switching
between these states (see the dashed arrows in Fig. 6) is of the order of a picosecond, while the energy
dissipated at such an event may be as low as ~$10^{-19}$ J. (This bound is determined not by the device's physics,
but by the fundamental requirement for the energy barrier between the two states to be much higher than the
thermal fluctuation energy scale $k_BT$, ensuring a sufficiently long information retention time.) While the
picosecond switching speed may also be achieved with some semiconductor devices, the power
consumption of the SQUID-based digital devices may be 5 to 6 orders of magnitude lower, enabling
large-scale digital integrated circuits with 100-GHz-scale clock frequencies. Unfortunately, the range of
practical applications of these Rapid Single-Flux-Quantum (RSFQ) digital circuits is still very narrow,
due to the inconvenience of their deep refrigeration to temperatures below $T_c$.$^{47}$


Since we have already derived the basic relations (74) and (76) describing the macroscopic quantum
phenomena in superconductivity, let me briefly mention two other prominent members of this group,
called the dc and ac Josephson effects. Differentiating Eq. (74) over time, and using the Faraday
induction law (2), we get$^{48}$

$$\frac{d\varphi}{dt} = \frac{2e}{\hbar}V. \qquad (6.79)$$


This famous Josephson phase-to-voltage relation should be valid regardless of how the voltage
$V$ has been created,$^{49}$ so let us apply Eqs. (76) and (79) to the simplest circuit with a
non-superconducting source of dc voltage; see Fig. 7.


Fig. 6.7. DC-voltage-biased Josephson junction.


If the current's magnitude is below the critical value, Eq. (76) allows the phase $\varphi$ to have the
time-independent value
$$\varphi = \sin^{-1}\frac{I}{I_c}, \qquad \text{if } -I_c < I < I_c, \qquad (6.80)$$
and hence, according to Eq. (79), a vanishing voltage drop across the junction: $V = 0$. This dc Josephson
effect is not very surprising: indeed, we have postulated from the very beginning that the Josephson
junction may pass a certain supercurrent. Much more fascinating is the so-called ac Josephson effect that
occurs if the voltage across the junction has a non-zero average (dc) component $V_0$. For simplicity, let us


47 For more on that technology, see, e.g., the review paper by P. Bunyk et al., Int. J. High Speed Electron. Syst. 
11, 257 (2001), and references therein. 

48 Since the induced e.m.f. $V_{\rm ind}$ cannot drop on the superconducting path between the Josephson junction
electrodes 1 and 2 (see Fig. 4c), it should be equal to $(-V)$, where $V$ is the voltage across the junction.

49 Indeed, it may also be obtained from simple Schrödinger-equation-based arguments; see, e.g., QM Sec. 1.6.
assume that this is the only voltage component: $V(t) = V_0 =$ const;$^{50}$ then Eq. (79) may be easily
integrated to give $\varphi = \omega_J t + \varphi_0$, where
$$\omega_J = \frac{2e}{\hbar}V_0. \qquad (6.81)$$


This result, plugged into Eq. (76), shows that the supercurrent oscillates,
$$I = I_c\sin(\omega_J t + \varphi_0), \qquad (6.82)$$
with the so-called Josephson frequency $\omega_J$ (81) proportional to the applied dc voltage. For practicable
voltages (above the typical noise level), the frequency $f_J = \omega_J/2\pi$ corresponds to the GHz or even THz
ranges, because the proportionality coefficient in Eq. (81) is very high: $f_J/V_0 = e/\pi\hbar \approx 483$ MHz/μV.$^{51}$
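The coefficient quoted above follows from Eq. (6.81): $f_J/V_0 = (2e/\hbar)/2\pi = 2e/h$. A small sketch using the exact SI values of $h$ and $e$; the two voltages are arbitrary illustrative choices.

```python
H_PLANCK = 6.62607015e-34    # J*s (exact SI value)
E_CHARGE = 1.602176634e-19   # C (exact SI value)
K_J = 2 * E_CHARGE / H_PLANCK    # f_J / V_0 = e/(pi*hbar) = 2e/h, in Hz/V

def josephson_frequency(v0):
    """f_J = omega_J/(2*pi) for a dc voltage v0 (volts), from Eq. (6.81)."""
    return K_J * v0

print(f"f_J/V_0 = {K_J * 1e-12:.4f} MHz per microvolt")
for v0 in (1e-6, 1e-3):      # 1 microvolt and 1 millivolt, for illustration
    print(f"V_0 = {v0:.0e} V -> f_J = {josephson_frequency(v0):.4e} Hz")
```

A millivolt-scale bias therefore already puts the oscillations in the sub-THz range.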


An important experimental fact is the universality of this coefficient. For example, in the mid- 
1980s, a Stony Brook group led by J. Lukens proved that this factor is material-independent with a 
relative accuracy of at least 10°. Very few experiments, especially in solid-state physics, have ever 
reached such precision. This fundamental nature of the Josephson voltage-to-frequency relation (81) 
allows an important application of the ac Josephson effect in metrology. Namely, phase-locking$^{52}$ the
Josephson oscillations with an external microwave signal from an atomic frequency standard, one can 
get a more precise dc voltage than from any other source. In NIST and other metrological institutions 
around the globe, this effect is used for the calibration of simpler “secondary” voltage standards that can 
operate at room temperature. 


6.6. Inductors, transformers, and ac Kirchhoff laws 


Let a wire coil (meaning either a single loop illustrated in Fig. 5.4b or a series of such loops,
such as one of the solenoids shown in Fig. 5.6) have a self-inductance $L$ much larger than that of the
wires connecting it to other components of our system: ac voltage sources, voltmeters, etc. (Since,
according to Eq. (5.75), $L$ scales as the square of the number $N$ of wire turns, this condition is easier to
satisfy at $N \gg 1$.) Then in a quasistatic system consisting of such lumped induction coils, external
wires, and other lumped circuit elements such as resistors, capacitors, etc., we may neglect the
electromagnetic induction effects everywhere outside the coil, so that the electric field in those external
regions is potential. Then the voltage $V$ between the coil's terminals may be defined, just as in
electrostatics, as the difference of the values of $\phi$ between the terminals, i.e. as the integral


$$V = \int\mathbf{E}\cdot d\mathbf{r} \qquad (6.83)$$


between the coil terminals along any path outside the coil. This voltage has to be balanced by the 
induction e.m.f. (2) in the coil, so that if the Ohmic resistance of the coil is negligible, we may write 


50 In experiment, this condition is hard to implement, due to relatively high inductances of the current leads 
providing the dc voltage supply. However, this technical complication does not affect the main conclusion of the 
simple analysis described here. 

51 This 1962 prediction (by the same B. Josephson) was confirmed experimentally: in 1963 indirectly, by
phase-locking of the oscillations (82) with an external microwave signal, and in 1967 explicitly, by the direct detection
of the emitted microwave radiation.

52 For a discussion of this very important (and general) effect, see, e.g., CM Sec. 5.4. 
$$V = \frac{d\Phi}{dt}, \qquad (6.84)$$


where $\Phi$ is the magnetic flux in the coil.$^{53}$ If the flux is due to the current $I$ in the same coil only (i.e. if
it is magnetically uncoupled from other coils), we may use Eq. (5.70) to get the well-known relation


$$V = L\frac{dI}{dt}, \qquad (6.85)$$
where compliance with the Lenz sign rule is achieved by selecting the relations between the assumed 
voltage polarity and the current direction as shown in Fig. 8a. 


Fig. 6.8. Some lumped ac circuit elements: (a) an induction coil, (b) two inductively coupled coils, and (c) an ac transformer.


If similar conditions are satisfied for two magnetically coupled coils (Fig. 8b), then, in Eq. (84),
we need to use Eqs. (5.69) instead, getting
$$V_1 = L_1\frac{dI_1}{dt} + M\frac{dI_2}{dt}, \qquad V_2 = L_2\frac{dI_2}{dt} + M\frac{dI_1}{dt}. \qquad (6.86)$$
Such systems of inductively coupled coils have numerous applications in electrical engineering and
physical experiment. Perhaps the most important of them is the ac transformer, in which the coils share
a common soft-ferromagnetic core of the toroidal ("doughnut") topology; see Fig. 8c.$^{54}$ As we already
know from the discussion in Sec. 5.6, such cores, with $\mu \gg \mu_0$, "try" to absorb all magnetic field lines,
so that the magnetic flux $\Phi(t)$ in the core is nearly the same in each of its cross-sections. With this, Eq.
(84) yields


$$V_1 \approx N_1\frac{d\Phi}{dt}, \qquad V_2 \approx N_2\frac{d\Phi}{dt}, \qquad (6.87)$$


so that the voltage ratio is completely determined by the ratio $N_1/N_2$ of the numbers of wire turns.
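One way to see that Eq. (6.87) is a special case of Eq. (6.86): suppose each self-inductance scales as $L_k = aN_k^2$ with a common factor $a$ (cf. the $N^2$ scaling mentioned above), and the core provides perfect coupling, $M = (L_1L_2)^{1/2} = aN_1N_2$ (this perfect-coupling value is an assumption of the sketch). Then both voltages in Eq. (6.86) are proportional to the same combination $N_1\,dI_1/dt + N_2\,dI_2/dt$, so $V_1/V_2 = N_1/N_2$ for any currents. The numbers below are hypothetical.

```python
import math

def coupled_coil_voltages(L1, L2, M, dI1_dt, dI2_dt):
    """Eq. (6.86): V1 = L1 dI1/dt + M dI2/dt,  V2 = L2 dI2/dt + M dI1/dt."""
    return L1 * dI1_dt + M * dI2_dt, L2 * dI2_dt + M * dI1_dt

a = 1e-6                      # hypothetical inductance per squared turn, H
N1, N2 = 200, 50              # hypothetical turn numbers
L1, L2 = a * N1**2, a * N2**2         # L scales as N^2 (cf. Eq. (5.75))
M = math.sqrt(L1 * L2)                # assumed perfect coupling: M = a*N1*N2
V1, V2 = coupled_coil_voltages(L1, L2, M, dI1_dt=3.0, dI2_dt=2.0)
print(V1 / V2, N1 / N2)       # both ratios equal 4 (up to rounding)
```

With imperfect coupling ($M < (L_1L_2)^{1/2}$) the ratio would depend on the currents, which is why the shared high-$\mu$ core matters.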


Now we may generalize, to the ac current case, the Kirchhoff laws already discussed in Sec. 4.1;
see Fig. 4.3 reproduced in Fig. 9a below. Let not only inductances but also capacitances and
resistances of the wires be negligible in comparison with those of the lumped (compact) circuit
elements, whose list now would include not only resistors and current sources (as in the dc case), but
also the induction coils (including magnetically coupled ones) and capacitors; see Fig. 9b. In the
quasistatic approximation, the current flowing in each wire is conserved, so that the "node rule", i.e. the
1st Kirchhoff law (4.7a),


53 If the resistance is substantial, it may be represented by a separate lumped circuit element (resistor) connected 
in series with the coil. 

54 The first practically acceptable form of this device, called the Stanley transformer, was invented in 1886. In it, 
multi-turn windings could be easily mounted onto a toroidal ferromagnetic (at that time, silicon-steel-plate) core. 
$$\sum_j I_j = 0, \qquad (6.88a)$$


remains valid. Also, if the electromagnetic induction effect is restricted to the interior of lumped
induction coils as discussed above, the voltage drops $V_k$ across each circuit element may still be
represented, just as in dc circuits, as differences between the adjacent node potentials. As a result, the
"loop rule", i.e. the 2nd Kirchhoff law (4.7b),
$$\sum_k V_k = 0, \qquad (6.88b)$$
is also valid. Now, in contrast to the dc case, Eqs. (88) may be (ordinary) differential equations.
However, if all circuit elements are linear (as in the examples presented in Fig. 9b), these equations may 


be readily reduced to linear algebraic equations, using the Fourier expansion. (In the common case of 
sinusoidal ac sources, the final stage of the Fourier series summation is unnecessary.) 


Fig. 6.9. (a) A typical quasistatic ac circuit obeying the Kirchhoff laws, and (b) the simplest lumped circuit elements. [Panel (a) marks a "circuit node"; panel (b) element laws: $V = L\,dI/dt$, $V = RI$, $V = C^{-1}\int I\,dt$, $V = V(t)$.]


My teaching experience shows that the potential readers of these notes are well familiar with the 
application of Eqs. (88) to such problems from their undergraduate studies, so I will save time/space by 
skipping discussions of even the simplest examples of such circuits, such as LC, LR, RC, and LRC loops 
and periodic structures.$^{55}$ However, since such problems are very important for practice, my sincere
advice to the reader is to carry out a self-test by solving a few problems of this type, provided in Sec. 9 
below, and if they cause any difficulty, pursue some remedial reading. 


6.7. Displacement currents 


Electromagnetic induction is not the only new effect arising in non-stationary electrodynamics. 
Indeed, though Eqs. (21) are adequate for the description of quasistatic phenomena, a deeper analysis 
shows that one of these equations, namely V xH = j, cannot be exact. To see that, let us take the 
divergence of its both sides: 


$$\nabla\cdot(\nabla\times\mathbf{H}) = \nabla\cdot\mathbf{j}. \qquad (6.89)$$


But, as the divergence of any curl,$^{56}$ the left-hand side should equal zero. Hence we get


55 Curiously enough, these effects include wave propagation in periodic LC circuits, even within the quasistatic
approximation! However, the speed $1/(LC)^{1/2}$ of these waves in lumped circuits is much lower than the speed
$1/(\varepsilon\mu)^{1/2}$ of electromagnetic waves in the surrounding medium; see Sec. 8 below.

56 Again, see MA Eq. (11.2) — if you need it. 
$$\nabla\cdot\mathbf{j} = 0. \qquad (6.90)$$


This is fine in statics, but in dynamics, this equation forbids any charge accumulation, because
according to the continuity relation (4.5),
$$\nabla\cdot\mathbf{j} = -\frac{\partial\rho}{\partial t}. \qquad (6.91)$$
This discrepancy had been recognized by James Clerk Maxwell who suggested, in the 1860s, a
way out of this contradiction. If we generalize the equation for $\nabla\times\mathbf{H}$ by adding to the term $\mathbf{j}$ (that
describes the density of real electric currents) the so-called displacement current density term,


$$\mathbf{j}_d \equiv \frac{\partial\mathbf{D}}{\partial t} \qquad (6.92)$$
(which of course vanishes in statics), then the equation takes the form
$$\nabla\times\mathbf{H} = \mathbf{j} + \frac{\partial\mathbf{D}}{\partial t}. \qquad (6.93)$$
In this case, due to Eq. (3.22), $\nabla\cdot\mathbf{D} = \rho$, the divergence of the right-hand side equals zero due
to the continuity equation (91), and the discrepancy is removed. This incredible theoretical feat,$^{57}$
confirmed by the 1886 experiments carried out by Heinrich Hertz (see below), was perhaps the main
triumph of theoretical physics of the 19th century.


Maxwell’s displacement current concept, expressed by Eq. (93), is so important that it is 
worthwhile to have one more look at its derivation using a particular model shown in Fig. 10.$^{58}$


Fig. 6.10. The Ampère law applied to capacitor recharging.


Neglecting the fringe field effects, we may use Eq. (4.1) to describe the relationship between the
current $I$ flowing through the wires and the electric charge $Q$ of the capacitor:$^{59}$
$$I = \frac{dQ}{dt}. \qquad (6.94)$$


57 It looks deceivingly simple now — after the fact, and with the current mathematical tools (especially the del 
operator), which are much superior to those that were available to J. Maxwell. 

58 No physicist should be ashamed of doing this. For example, J. Maxwell's main book, A Treatise on Electricity
and Magnetism, is full of drawings of plane capacitors, inductance coils, and voltmeters. More generally, the
whole history of science teaches us that snobbery regarding particular examples and practical systems is a 
virtually certain path toward producing nothing of either practical value or fundamental importance. 

59 This is of course just the integral form of the continuity equation (91). 
Now let us consider a closed contour C drawn around the wire. (Solid points in Fig. 10 show the places
where the contour intercepts the plane of the drawing.) This contour may be seen as the line limiting
either surface $S_1$ (crossed by the wire) or surface $S_2$ (avoiding such crossing by passing through the
capacitor's gap). Applying the macroscopic Ampère law (5.116) to the former surface, we get


$$\oint_C \mathbf{H}\cdot d\mathbf{r} = \int_{S_1} j_n\, d^2r = I, \qquad (6.95)$$


while for the latter surface the same law gives a different result,
$$\oint_C \mathbf{H}\cdot d\mathbf{r} = \int_{S_2} j_n\, d^2r = 0, \qquad \text{[WRONG!]} \qquad (6.96)$$
for the same integral. This is just an integral-form manifestation of the discrepancy outlined above, but it 
shows clearly how serious the problem is (or rather it was — before Maxwell). 


Now let us see how the introduction of the displacement currents saves the day, considering for
the sake of simplicity a plane capacitor of area $A$, with a small and constant electrode spacing. In this
case, as we already know, the field inside it is uniform, with $D = \sigma$, so that the total capacitor's charge is
$Q = A\sigma = AD$, and the current (94) may be represented as


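$$I = \frac{dQ}{dt} = A\frac{dD}{dt}. \qquad (6.97)$$
So, instead of the wrong Eq. (96), the Ampère law modified following Eq. (93) gives
$$\oint_C \mathbf{H}\cdot d\mathbf{r} = \int_{S_2}(j_d)_n\, d^2r = \int_{S_2}\frac{\partial D_n}{\partial t}\, d^2r = A\frac{dD}{dt} = I, \qquad (6.98)$$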


i.e. the Ampére integral becomes independent of the choice of the surface limited by the contour C — as 
it has to be. 


6.8. Finally, the full Maxwell equation system 


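This is a very special moment in this course: with the displacement currents in, i.e. with the
replacement of Eq. (5.107) with Eq. (93), we have finally arrived at the full set of macroscopic Maxwell
equations for time-dependent fields,$^{60}$
$$\nabla\times\mathbf{H} - \frac{\partial\mathbf{D}}{\partial t} = \mathbf{j}, \qquad \nabla\cdot\mathbf{D} = \rho, \qquad (6.99a)$$
$$\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0, \qquad \nabla\cdot\mathbf{B} = 0, \qquad (6.99b)$$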


whose validity has been confirmed by an enormous body of experimental data. Indeed, despite 
numerous efforts, no other corrections (e.g., additional terms) to the Maxwell equations have been ever 
found, and these equations are still considered exact within the range of their validity, i.e. while the 
electric and magnetic fields may be considered classically. Moreover, even in quantum theory, these 


60 This vector form of the Maxwell equations, magnificent in its symmetry and simplicity, was developed in 
1884-85 by Oliver Heaviside, with substantial contributions by H. Lorentz. (The original Maxwell’s result circa 
1864 looked like a system of 20 equations for Cartesian components of the vector and scalar potentials.) 
equations are believed to be strictly valid as relations between the Heisenberg operators of the electric
and magnetic fields.$^{61}$ (Note that the microscopic Maxwell equations for the genuine fields $\mathbf{E}$ and $\mathbf{B}$ may
be formally obtained from Eqs. (99) by the substitutions $\mathbf{D} = \varepsilon_0\mathbf{E}$ and $\mathbf{H} = \mathbf{B}/\mu_0$, and the simultaneous
replacement of the stand-alone charge and current densities on their right-hand sides with the full ones.)


Perhaps the most striking feature of these equations is that, even in the absence of stand-alone 
charges and currents inside the region of our interest, when the equations become fully homogeneous, 


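$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \qquad \nabla\times\mathbf{H} = \frac{\partial\mathbf{D}}{\partial t}, \qquad (6.100a)$$
$$\nabla\cdot\mathbf{D} = 0, \qquad \nabla\cdot\mathbf{B} = 0, \qquad (6.100b)$$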


they still describe something very non-trivial: electromagnetic waves, including light. The physics of the
waves may be clearly seen from Eqs. (100a): according to the first of them, the change of the magnetic
field in time creates a vortex-like (divergence-free) electric field. On the other hand, the second of Eqs.
(100a) describes how the changing electric field, in turn, creates a vortex-like magnetic field. Coupled in this way,
the electric and magnetic fields may propagate as waves, even very far from their sources.


We will carry out a detailed quantitative analysis of the waves in the next chapter, and here I will
only use this notion to make good on the promise given in Sec. 3, namely to establish the condition of
validity of the quasistatic approximation (21). For simplicity, let us consider an electromagnetic wave
with a time period $\tau$, velocity $v$, and hence the wavelength $\lambda = v\tau$,$^{62}$ in a linear medium with $\mathbf{D} = \varepsilon\mathbf{E}$,
$\mathbf{B} = \mu\mathbf{H}$. Then the magnitude of the left-hand side of the first of Eqs. (100a) is of the order of $E/\lambda = E/v\tau$,
while that of its right-hand side may be estimated as $B/\tau \sim \mu H/\tau$. Using similar estimates for the second
of Eqs. (100a), we arrive at the following two requirements:$^{63}$


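$$\frac{E}{H} \sim \mu v \sim \frac{1}{\varepsilon v}. \qquad (6.101)$$
To ensure the compatibility of these two relations, the wave's speed should satisfy the estimate
$$v \sim \frac{1}{(\varepsilon\mu)^{1/2}}, \qquad (6.102)$$
reduced to $v \sim 1/(\varepsilon_0\mu_0)^{1/2} = c$ in free space, while the ratio of the electric and magnetic field amplitudes
should be of the following order:
$$\frac{E}{H} \sim \mu v \sim \frac{1}{\varepsilon v} \sim \left(\frac{\mu}{\varepsilon}\right)^{1/2}. \qquad (6.103)$$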


(In the next chapter we will see that for plane electromagnetic waves, these results are exact.) 
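Taking Eqs. (6.102) and (6.103) at face value (as the text notes, they turn out to be exact for plane waves), the free-space values are the speed of light and the ratio $E/H = (\mu_0/\varepsilon_0)^{1/2} \approx 377\ \Omega$. A small sketch with CODATA 2018 values of $\varepsilon_0$ and $\mu_0$, which also checks that the two estimates of Eq. (6.101) coincide.

```python
import math

EPS_0 = 8.8541878128e-12   # F/m (CODATA 2018)
MU_0 = 1.25663706212e-6    # H/m (CODATA 2018)

def wave_speed(eps, mu):
    """Eq. (6.102), taken as exact: v = 1/sqrt(eps*mu)."""
    return 1 / math.sqrt(eps * mu)

def field_ratio(eps, mu):
    """Eq. (6.103), taken as exact: E/H = sqrt(mu/eps)."""
    return math.sqrt(mu / eps)

v = wave_speed(EPS_0, MU_0)
print(f"c = {v:.6e} m/s, E/H = {field_ratio(EPS_0, MU_0):.3f} ohm")
# The two estimates in Eq. (6.101), mu*v and 1/(eps*v), coincide:
print(MU_0 * v, 1 / (EPS_0 * v))
```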


Now, let a system of a linear size ~$a$ carry currents producing a certain magnetic field $H$. Then,
according to Eqs. (100a), their magnetic field Faraday-induces an electric field of magnitude
$E \sim \mu Ha/\tau$, whose displacement currents, in turn, produce an additional magnetic field with magnitude


6! See, e.g., QM Chapter 9. 
62 Let me hope the reader knows that the relation $\lambda = v\tau$ is universal, valid for waves of any nature; see, e.g.,
CM Chapter 6. (In the case of substantial dispersion, $v$ means the phase velocity.)
63 The fact that $\tau$ has canceled shows that these estimates are valid for waves of any frequency.
$$H' \sim \varepsilon\frac{E}{\tau}a \sim \frac{\varepsilon\mu a^2}{\tau^2}H \sim \left(\frac{a}{v\tau}\right)^2 H = \left(\frac{a}{\lambda}\right)^2 H. \qquad (6.104)$$


Hence, the displacement current effects are negligible for a system of size $a \ll \lambda$.$^{64}$
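To get a feel for the criterion $a \ll \lambda$, the sketch below evaluates the order-of-magnitude ratio $H'/H \sim (a/\lambda)^2$ of Eq. (6.104) in vacuum, dropping all numerical factors. The (size, frequency) pairs are hypothetical examples only.

```python
C_LIGHT = 299_792_458.0  # m/s

def displacement_correction(a, f, v=C_LIGHT):
    """Order-of-magnitude ratio H'/H ~ (a/lambda)**2, per Eq. (6.104)."""
    wavelength = v / f    # lambda = v*tau with tau = 1/f
    return (a / wavelength) ** 2

# hypothetical (size in m, frequency in Hz) pairs
for a, f in [(1.0, 50.0), (1.0, 1e8), (0.01, 1e10)]:
    ratio = displacement_correction(a, f)
    print(f"a = {a} m, f = {f:.0e} Hz: (a/lambda)^2 ~ {ratio:.1e}")
```

A meter-size system at power-line frequency is deep in the quasistatic regime, while the same size at 100 MHz (or a centimeter-size one at 10 GHz) is not.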


In particular, the quasistatic picture of the skin effect, discussed in Sec. 3, is valid while the skin
depth (33) remains much smaller than the corresponding wavelength,
$$\lambda = v\tau = \frac{2\pi}{(\varepsilon\mu)^{1/2}\omega} \gg \delta_s = \left(\frac{2}{\mu\sigma\omega}\right)^{1/2}. \qquad (6.105)$$
The wavelength decreases with frequency as $1/\omega$, i.e. faster than $\delta_s \propto 1/\omega^{1/2}$, so that they become
comparable at the crossover frequency
$$\omega \sim \frac{\sigma}{\varepsilon}, \qquad (6.106)$$


which is nothing else than the reciprocal charge relaxation time (4.10). As was discussed in Sec. 4.2, for
good metals this frequency is extremely high (about $10^{18}$ s$^{-1}$), so the validity of Eq. (33) is typically
limited by the anomalous skin effect (which was briefly discussed in Sec. 3), rather than by the wave
effects.
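The crossover estimate can be checked numerically: with $\lambda = 2\pi/[\omega(\varepsilon\mu)^{1/2}]$ and a skin depth of the form $\delta_s = (2/\mu\sigma\omega)^{1/2}$, the two lengths are equal exactly at $\omega = 2\pi^2\sigma/\varepsilon$, i.e. at $\sigma/\varepsilon$ up to a numerical factor of about 20. The sketch assumes $\varepsilon \approx \varepsilon_0$, $\mu \approx \mu_0$ and a copper-like conductivity; at such frequencies the normal skin-effect formula is not physically valid anyway (as noted above), so this only illustrates the scaling.

```python
import math

EPS_0 = 8.8541878128e-12   # F/m
MU_0 = 1.25663706212e-6    # H/m

def skin_depth(omega, sigma, mu=MU_0):
    """Normal skin depth, assumed in the form (2/(mu*sigma*omega))**0.5."""
    return math.sqrt(2 / (mu * sigma * omega))

def wavelength(omega, eps=EPS_0, mu=MU_0):
    """lambda = v*tau = 2*pi/(omega*sqrt(eps*mu))."""
    return 2 * math.pi / (omega * math.sqrt(eps * mu))

def crossover(sigma, eps=EPS_0, mu=MU_0):
    """Frequency where lambda = delta_s, found by bisection on a log scale."""
    lo, hi = 1.0, 1e30            # lambda > delta_s at lo, lambda < delta_s at hi
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if wavelength(mid, eps, mu) > skin_depth(mid, sigma, mu):
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

sigma = 6e7                       # S/m, a copper-like value (illustrative)
w_x = crossover(sigma)
print(f"crossover ~ {w_x:.2e} s^-1; sigma/eps_0 = {sigma / EPS_0:.2e} s^-1")
print(f"ratio = {w_x / (sigma / EPS_0):.2f}")   # analytically 2*pi**2 ~ 19.74
```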


Before proceeding to the analysis of the full Maxwell equations for particular situations (that will
be the main goal of the next chapters of this course), let us have a look at the energy balance they yield
for a certain volume $V$, which may include both some charged particles and the electromagnetic field.
Since, according to Eq. (5.10), the magnetic field performs no work on charged particles even if they
move, the total power $\mathscr{P}$ being transferred from the field to the particles inside the volume is due to the
electric field alone; see Eq. (4.38):


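$$\mathscr{P} = \int_V p\, d^3r, \qquad \text{with } p = \mathbf{j}\cdot\mathbf{E}. \qquad (6.107)$$
Expressing $\mathbf{j}$ from the corresponding Maxwell equation of the system (99), we get
$$\mathscr{P} = \int_V\left[\mathbf{E}\cdot(\nabla\times\mathbf{H}) - \mathbf{E}\cdot\frac{\partial\mathbf{D}}{\partial t}\right]d^3r. \qquad (6.108)$$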


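Let us pause here for a second, and transform the divergence of $\mathbf{E}\times\mathbf{H}$, using the well-known vector
algebra identity:$^{65}$
$$\nabla\cdot(\mathbf{E}\times\mathbf{H}) = \mathbf{H}\cdot(\nabla\times\mathbf{E}) - \mathbf{E}\cdot(\nabla\times\mathbf{H}). \qquad (6.109)$$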


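The last term on the right-hand side of this equality is exactly the first term in the square brackets of Eq.
(108), taken with the opposite sign, so we may rewrite that formula as
$$\mathscr{P} = \int_V\left[-\nabla\cdot(\mathbf{E}\times\mathbf{H}) + \mathbf{H}\cdot(\nabla\times\mathbf{E}) - \mathbf{E}\cdot\frac{\partial\mathbf{D}}{\partial t}\right]d^3r. \qquad (6.110)$$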


64 Let me emphasize that if this condition is not fulfilled, the lumped-circuit representation of the system (see Fig. 
9 and its discussion) is typically inadequate — besides some special cases, to be discussed in the next chapter. 
65 See, e.g., MA Eq. (11.7) with f = E and g = H. 


Chapter 6 Page 30 of 38 




Essential Graduate Physics EM: Classical Electrodynamics 


However, according to the Maxwell equation for $\nabla\times\mathbf{E}$, this curl is equal to $-\partial\mathbf{B}/\partial t$, so that the second
term in the square brackets of Eq. (110) equals $-\mathbf{H}\cdot\partial\mathbf{B}/\partial t$ and, according to Eq. (14), is just the (minus)
time derivative of the magnetic energy per unit volume. Similarly, according to Eq. (3.76), the third term
under the integral is the (minus) time derivative of the electric energy per unit volume. Finally, we can
use the divergence theorem to transform the integral of the first term in the square brackets to a 2D
integral over the surface $S$ limiting the volume $V$. As a result, we get the so-called Poynting theorem
for the power balance in the system:


$$\int_V\left(p + \frac{\partial u}{\partial t}\right)d^3r + \oint_S S_n\, d^2r = 0. \qquad (6.111)$$


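Here $u$ is the density of the total (electric plus magnetic) energy of the electromagnetic field, with
$$\frac{\partial u}{\partial t} \equiv \mathbf{E}\cdot\frac{\partial\mathbf{D}}{\partial t} + \mathbf{H}\cdot\frac{\partial\mathbf{B}}{\partial t}, \qquad (6.112)$$
just the sum of the expressions given by Eqs. (3.76) and (14). For the particular case of an isotropic,
linear, and dispersion-free medium, with $\mathbf{D}(t) = \varepsilon\mathbf{E}(t)$, $\mathbf{B}(t) = \mu\mathbf{H}(t)$, Eq. (112) yields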


$$u = \frac{\varepsilon E^2}{2} + \frac{\mu H^2}{2}. \qquad (6.113)$$


Another key notion participating in Eq. (111) is the Poynting vector, defined as$^{67}$
$$\mathbf{S} \equiv \mathbf{E}\times\mathbf{H}. \qquad (6.114)$$


The first integral in Eq. (111) is evidently the net change of the energy of the system (particles + field) 
per unit time, so that the second (surface) integral has to be the power flowing out from the system 
through the surface. As a result, it is tempting to interpret the Poynting vector S locally, as the power 
flow density at the given point. In many cases, such a local interpretation of vector S is legitimate; 
however, in other cases, it may lead to wrong conclusions. Indeed, let us consider the simple system 
shown in Fig. 11: a charged plane capacitor placed into a static and uniform external magnetic field, so 
that the electric and magnetic fields are mutually perpendicular. 


Fig. 6.11. The Poynting vector paradox. 


In this static situation, with no charges moving, both $p$ and $\partial u/\partial t$ are equal to zero, and there
should be no power flow in the system. However, Eq. (114) shows that the Poynting vector is not equal


66 It is named after John Henry Poynting for his work published in 1884, though this fact was independently 
discovered by O. Heaviside in 1885 in a simpler form, while a similar result for the intensity of mechanical elastic 
waves had been obtained earlier (in 1874) by Nikolay Alekseevich Umov — see, e.g., CM Sec. 7.7. 
67 Actually, an addition to $\mathbf{S}$ of the curl of an arbitrary vector function $\mathbf{f}(\mathbf{r}, t)$ does not change Eq. (111). Indeed,
we may use the divergence theorem to transform the corresponding change of the surface integral in Eq. (111) to a
volume integral of the scalar function $\nabla\cdot(\nabla\times\mathbf{f})$, which equals zero at any point; see, e.g., MA Eq. (11.2).
to zero inside the capacitor, being directed as the red arrows in Fig. 11 show. From the point of view of
the only unambiguous corollary of the Maxwell equations, Eq. (111), there is no contradiction here, 
because the fluxes of the vector S through the side boundaries of the volume shaded in Fig. 11 are equal 
and opposite (and they are zero for other faces of this rectilinear volume), so that the total flux of the 
Poynting vector through the volume boundary equals zero, as it should. It is, however, useful to recall 
this example each time before giving a local interpretation of the vector S. 
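As a quick numerical illustration of this point (a minimal sketch; the field values and the box size are arbitrary assumptions), the following snippet computes the uniform S = E × H inside the capacitor and its flux through each face of a rectangular box lying entirely within the gap. Only the two faces normal to S carry flux, and their contributions cancel, so the net outflow constrained by Eq. (111) is zero even though S itself is not.

```python
# Net Poynting-vector flux through a closed box inside the crossed-field capacitor of Fig. 6.11.
# All numerical values are illustrative assumptions, not taken from the text.

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

E = (0.0, 0.0, 1.0e5)   # V/m, electric field between the plates (along z)
H = (1.0e4, 0.0, 0.0)   # A/m, external magnetic field (along x)
S = cross(E, H)         # Eq. (6.114): uniform S, here directed along +y

Lx, Ly, Lz = 0.02, 0.03, 0.001   # box dimensions (m); the box lies inside the gap
faces = [((+1, 0, 0), Ly * Lz), ((-1, 0, 0), Ly * Lz),
         ((0, +1, 0), Lx * Lz), ((0, -1, 0), Lx * Lz),
         ((0, 0, +1), Lx * Ly), ((0, 0, -1), Lx * Ly)]  # (outward normal, face area)

face_fluxes = [dot(S, n) * area for n, area in faces]
net_flux = sum(face_fluxes)

print("S =", S, "W/m^2")
print("fluxes through the six faces (W):", face_fluxes)
print("net outward flux (W):", net_flux)
```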


The paradox illustrated in Fig. 11 is closely related to the radiation recoil effects, due to the
electromagnetic field’s momentum — more exactly, its linear momentum. Indeed, acting as in the
Poynting theorem derivation, it is straightforward to use the microscopic Maxwell equations⁶⁸ to prove
that, neglecting the boundary effects, the vector sum of the mechanical linear momentum of the particles
in an arbitrary volume and the integral of the following vector,

$$\mathbf{g} \equiv \frac{\mathbf{S}}{c^2}, \qquad (6.115)$$

over the same volume, is conserved, enabling an interpretation of g as the density of the linear 
momentum of the electromagnetic field. (It will be more convenient for me to prove this relation, and 
discuss the related issues, in Sec. 9.8, using the 4-vector formalism of the special relativity.) Due to this 
conservation, if some static fields coupled to mechanical bodies are suddenly decoupled from them and 
are allowed to propagate in space, i.e. to change their local integral of g, they give the bodies an equal 
and opposite impulse of force. 
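For a feeling of the magnitude of g in the static configuration of Fig. 11, here is a back-of-the-envelope sketch. The field values and the gap volume are purely illustrative assumptions, and the fields are taken uniform inside a vacuum gap, so that g = S/c² reduces to ε₀E × B.

```python
import math

EPS0 = 8.8541878128e-12  # F/m (CODATA 2018)
MU0 = 1.25663706212e-6   # N/A^2 (CODATA 2018)
C = 1.0 / math.sqrt(EPS0 * MU0)

# Illustrative (assumed) parameters of a capacitor placed in a perpendicular magnetic field:
E = 1.0e6        # V/m, e.g. 1 kV across a 1-mm gap
B = 1.0          # T, external field, perpendicular to E
volume = 1.0e-5  # m^3, e.g. 100-cm^2 plates times the 1-mm gap

H = B / MU0           # vacuum gap assumed, so that B = mu0 * H
S = E * H             # |S| = |E x H| for perpendicular fields, Eq. (6.114)
g = S / C**2          # momentum density, Eq. (6.115)
P_field = g * volume  # total field momentum stored in the uniform-field gap

print(f"S = {S:.2e} W/m^2, g = {g:.2e} kg/(m^2 s), field momentum = {P_field:.2e} kg m/s")
```

The result, of order 10⁻¹⁰ kg·m/s for these assumed laboratory-scale fields, indicates how small the impulse discussed above would be in such a setup.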


Finally, to complete our initial discussion of the Maxwell equations,⁶⁹ let us rewrite them in terms of
potentials A and φ, because this is more convenient for the solution of some (though not all!) problems.
Even when dealing with the system (99) of Maxwell equations, more general than those discussed before,
Eqs. (7) are still used for the definition of the potentials. It is straightforward to verify that with these
definitions, the two homogeneous Maxwell equations (99b) are satisfied automatically. Plugging Eqs.
(7) into the inhomogeneous equations (99a), and considering, for simplicity, a linear, uniform medium
with frequency-independent ε and μ, we get

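$$\nabla^2\phi + \frac{\partial}{\partial t}(\nabla\cdot\mathbf{A}) = -\frac{\rho}{\varepsilon},$$

$$\nabla^2\mathbf{A} - \varepsilon\mu\frac{\partial^2\mathbf{A}}{\partial t^2} - \nabla\left(\nabla\cdot\mathbf{A} + \varepsilon\mu\frac{\partial\phi}{\partial t}\right) = -\mu\mathbf{j}. \qquad (6.116)$$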

This is a more complex result than we would like to get. However, let us select a special
gauge, which is frequently called (especially for the free-space case, when v = c) the Lorenz gauge
condition⁷⁰

$$\nabla\cdot\mathbf{A} + \varepsilon\mu\frac{\partial\phi}{\partial t} = 0, \qquad (6.117)$$


68 The situation with the macroscopic Maxwell equations is more complex, and is still a subject of some lingering 
discussions (usually called the Abraham-Minkowski controversy, despite contributions by many other scientists 
including A. Einstein), because of the ambiguity of the momentum’s division between its field and particle 
components — see, e.g., the review paper by R. Pfeiffer et al., Rev. Mod. Phys. 79, 1197 (2007). 

69 We will return to their general discussion (in particular, to the analytical mechanics of the electromagnetic 
field, and its stress tensor) in Sec. 9.8, after we have got equipped with the special relativity theory. 

70 This condition, named after Ludwig Lorenz, should not be confused with the so-called Lorentz invariance 
condition of relativity, due to Hendrik Lorentz, to be discussed in Sec. 9.4. (Note the last names’ spelling.) 
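which is a natural generalization of the Coulomb gauge (5.48) to time-dependent phenomena. With this
condition, Eqs. (116) are reduced to a simpler, beautifully symmetric form:

$$\nabla^2\phi - \frac{1}{v^2}\frac{\partial^2\phi}{\partial t^2} = -\frac{\rho}{\varepsilon}, \qquad \nabla^2\mathbf{A} - \frac{1}{v^2}\frac{\partial^2\mathbf{A}}{\partial t^2} = -\mu\mathbf{j}, \qquad (6.118)$$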


where v² ≡ 1/εμ. Note that these equations are essentially a set of 4 similar equations for 4 scalar
functions (namely, φ and three Cartesian components of A) and thus clearly invite the 4-component
vector formalism of the relativity theory, which will be discussed in Chapter 9.⁷¹


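If φ and A depend on just one spatial coordinate, say z, then in a region without field sources (ρ = 0,
j = 0), Eqs. (118) are reduced to the following 1D wave equations:

$$\frac{\partial^2\phi}{\partial z^2} - \frac{1}{v^2}\frac{\partial^2\phi}{\partial t^2} = 0, \qquad \frac{\partial^2\mathbf{A}}{\partial z^2} - \frac{1}{v^2}\frac{\partial^2\mathbf{A}}{\partial t^2} = 0. \qquad (6.119)$$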


It is well known⁷² that these equations describe waves, with arbitrary waveforms (including sinusoidal
waves of any frequency), propagating with the same speed v in either of the z-axis directions.
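The statement that any waveform f(z ∓ vt) satisfies Eqs. (119) can be checked numerically. The sketch below is an illustration only: the Gaussian shape, the speed v = 2, and the finite-difference step are arbitrary choices. It evaluates the left-hand side of the 1D wave equation by central differences for pulses running in both directions and, as a control, for a pattern moving at the wrong speed.

```python
import math

def pulse(s):
    """An arbitrary smooth waveform of one argument (a Gaussian, chosen for illustration)."""
    return math.exp(-s * s / 2.0)

def residual(u, z, t, v, h=1e-3):
    """Central-difference estimate of d2u/dz2 - (1/v^2) d2u/dt2 at the point (z, t)."""
    d2z = (u(z + h, t) - 2.0 * u(z, t) + u(z - h, t)) / h**2
    d2t = (u(z, t + h) - 2.0 * u(z, t) + u(z, t - h)) / h**2
    return d2z - d2t / v**2

v = 2.0  # assumed propagation speed (arbitrary units)
forward = lambda z, t: pulse(z - v * t)       # pattern moving toward +z
backward = lambda z, t: pulse(z + v * t)      # pattern moving toward -z
too_fast = lambda z, t: pulse(z - 2 * v * t)  # moves at 2v, so it should violate Eq. (119)

grid = [(0.25 * i, 0.25 * j) for i in range(-8, 9) for j in range(-4, 5)]

def max_residual(u):
    """Largest |residual| over the sample grid of (z, t) points."""
    return max(abs(residual(u, z, t, v)) for z, t in grid)

for name, u in [("forward", forward), ("backward", backward), ("too fast", too_fast)]:
    print(f"{name:9s} max |residual| = {max_residual(u):.1e}")
```

The two genuine solutions leave only the small truncation error of the difference scheme, while the wrong-speed pattern gives a residual of order unity.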
According to the definitions of the constants ε₀ and μ₀, in free space, v is just the speed of light:

$$v = \frac{1}{(\varepsilon_0\mu_0)^{1/2}} \equiv c. \qquad (6.120)$$
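As a one-line numerical check of Eq. (6.120), assuming the CODATA 2018 recommended values of ε₀ and μ₀:

```python
import math

EPS0 = 8.8541878128e-12  # F/m (CODATA 2018 recommended value)
MU0 = 1.25663706212e-6   # N/A^2 (CODATA 2018 recommended value)

v_free_space = 1.0 / math.sqrt(EPS0 * MU0)   # Eq. (6.120)
print(f"1/sqrt(eps0*mu0) = {v_free_space:,.0f} m/s")
```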


Historically, the experimental observation of relatively low-frequency (GHz-scale) electromagnetic 
waves, with their speed equal to that of light, was the decisive proof (actually, a real triumph!) of the 
Maxwell theory and his prediction of such waves.73 This was first accomplished in 1886 by Heinrich 
Rudolf Hertz, using the electronic circuits and antennas he had invented for this purpose. 


Before proceeding to the detailed analysis of these waves in the following chapters, let me
mention that the invariance of Eqs. (119) with respect to the wave propagation direction is not
accidental; it is just a manifestation of one more general property of the Maxwell equations (99), called
the Lorentz reciprocity. We have already met its simplest example, for time-independent electrostatic
fields, in one of the problems of Chapter 1. In a much more general case when two monochromatic
electromagnetic fields of the same frequency, with complex amplitudes, say, {E₁(r), H₁(r)} and {E₂(r),


71 Here I have to mention in passing the so-called Hertz vector potentials Π_e and Π_m (whose introduction may be
traced back at least to the 1904 work by E. Whittaker). They may be defined by the following relations:

$$\mathbf{A} = \mu\frac{\partial\boldsymbol{\Pi}_e}{\partial t} + \mu\nabla\times\boldsymbol{\Pi}_m, \qquad \phi = -\frac{1}{\varepsilon}\nabla\cdot\boldsymbol{\Pi}_e,$$

which make the Lorenz gauge condition (117) automatically satisfied. These potentials are especially convenient
for the solution of problems in which the electromagnetic field is induced by sources characterized by
field-independent electric and magnetic polarizations P and M — rather than by field-independent charge and current
densities ρ and j. Indeed, it is straightforward to check that both Π_e and Π_m satisfy equations similar to Eqs.
(118), but with their right-hand sides equal to, respectively, −P and −M. Unfortunately, I do not have
time/space to discuss such problems and have to refer interested readers elsewhere — for example, to the classic
text by J. Stratton, Electromagnetic Theory, Adams Press, 2008.
72 See, e.g., CM Secs. 6.3-6.4 and 7.7-7.8. 
73 By that time, the speed of light (estimated very reasonably by Ole Rømer as early as 1676) had been
experimentally measured, by Hippolyte Fizeau and then Léon Foucault, with an accuracy better than 1%.
H₂(r)} are induced, separately, by stand-alone currents with complex amplitudes j₁(r) and j₂(r) of their
densities. Then it may be proved⁷⁴ that if the medium is linear and either isotropic or even anisotropic
but with symmetric tensors ε_jj' and μ_jj', then for any volume V limited by a closed surface S,


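$$\int_V (\mathbf{j}_1\cdot\mathbf{E}_2 - \mathbf{j}_2\cdot\mathbf{E}_1)\,d^3r = \oint_S (\mathbf{E}_1\times\mathbf{H}_2 - \mathbf{E}_2\times\mathbf{H}_1)_n\,d^2r. \qquad (6.121)$$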

This property implies, in particular, that the waves propagate similarly in two reciprocal 
directions even in situations much more general than the 1D case described by Eqs. (119). For some 
important practical applications (e.g., for low-noise amplifiers and detectors) such reciprocity is rather 
inconvenient. Fortunately, Eq. (121) may be violated in anisotropic media with asymmetric tensors ε_jj'
and/or μ_jj'. The simplest case of such an anisotropy, the Faraday rotation of the wave polarization in
plasma, will be discussed in the next chapter. 


6.9. Exercise problems 


6.1. Prove that the electromagnetic induction e.m.f. V_ind in a conducting loop may be measured
as shown on two panels of Fig. 1:

(i) by measuring the current I = V_ind/R induced in the loop closed with an Ohmic resistor R, or

(ii) using a voltmeter inserted into the loop.


6.2. The flux Φ of the magnetic field that pierces a resistive ring
is being changed in time, while the field outside of the ring is negligibly
low. A voltmeter is connected to a part of the ring, as shown in the
figure on the right. What would the voltmeter show?


6.3. A weak, uniform, time-independent magnetic field B is applied to an axially-symmetric 
permanent magnet with the dipole magnetic moment m directed along its axis, rapidly rotating about the 
same axis, with an angular momentum L. Calculate the electric field resulting from the magnetic field’s 
application, and formulate the conditions of your result’s validity. 


6.4. The similarity of Eq. (5.53) obtained in Sec. 5.3 without any use of the Faraday induction 
law, and Eq. (5.54) proved in Sec. 2 of this chapter using the law, implies that the law may be derived 
from magnetostatics. Prove that this is indeed true for a particular case of a current loop, being slowly 
deformed in a fixed magnetic field B. 


6.5. Could Problem 5.2 (i.e. the analysis of the mechanical stability of
the system shown in the figure on the right) be solved using potential energy
arguments?


74 It will be more convenient for me to give this proof (or rather offer it for the reader’s exercise :-) in the next
chapter, after we have discussed the Fourier expansion of the fields in linear media. 
6.6. Use energy arguments to calculate the pressure exerted by the magnetic field B inside a long
uniform solenoid of length l, and a cross-section of area A << l², with N >> l/A^{1/2} >> 1 turns, on its
“walls” (windings), and the forces exerted by the field on the solenoid’s ends, for two cases:


(i) the current through the solenoid is fixed by an external source, and
(ii) after the initial current setting, the ends of the solenoid’s wire, with negligible resistance, are
connected, so that it continues to carry a non-zero current.


Compare the results, and give a physical interpretation of the direction of these forces. 


6.7. The electromagnetic railgun is a projectile launch system consisting of two
long, parallel conducting rails and a sliding conducting projectile, shorting the current I
fed into the system by a powerful source — see panel (a) in the figure on the right.
Calculate the force exerted on the projectile, using two approaches:


(i) by a direct calculation, assuming that the cross-section of the system has the simple shape
shown on panel (b) of the figure above, with t << w, l, and
(ii) using the energy balance (for simplicity, neglecting the Ohmic resistances in the system),


and compare the results.


6.8. A uniform, static magnetic field B is applied along the axis of a
long round pipe of a radius R and a very small thickness τ, made of a
material with Ohmic conductivity σ. A sphere of mass M and radius R’ <<
R, made of a linear magnetic material with permeability μ >> μ₀, is
launched, with an initial velocity v₀, to fly ballistically along the pipe’s axis
— see the figure on the right. Use the quasistatic approximation to calculate
the distance the sphere would pass before it stops. Formulate the conditions of validity of your result.


6.9. A plane thin-wire loop with inductance L, resistance R, and area A is launched to fly 
ballistically into a region where the magnetic field B is constant. Calculate the final change of the 
kinetic energy of the loop, assuming that the time of its entry into the field region is much shorter than 
the relaxation time constant L/R, and that the loop cannot rotate. 


6.10. An ac current of frequency ω is being passed through a long uniform wire with a round
cross-section of radius R comparable with the skin depth δ_s. In the quasistatic approximation, find the
current’s distribution across the cross-section, and analyze it in the limits R << δ_s and δ_s << R. Calculate
the effective ac resistance of the wire (per unit length) in these two limits.


6.11. A very long, round cylinder of radius R, made of a uniform conductor with an Ohmic
conductivity σ and magnetic permeability μ, has been placed into a uniform ac magnetic field H_ext(t) =
H₀cos ωt, directed along its symmetry axis. Calculate the spatial distribution of the magnetic field’s
amplitude, and in particular its value on the cylinder’s axis. Spell out the last result in the limits of
relatively small and large R.
6.12.* Define and calculate an appropriate spatial-temporal Green’s function for Eq. (25), and
then use this function to analyze the dynamics of propagation of the external magnetic field that is
suddenly turned on at t = 0 and then kept constant:

$$H(0, t) = \begin{cases} 0, & \text{at } t < 0, \\ H_0, & \text{at } t > 0, \end{cases}$$

into an Ohmic conductor occupying the semi-space x > 0 — see Fig. 2.


Hint: Try to use a function proportional to exp{−(x − x’)²/2(δx)²}, with a suitable time
dependence of the parameter δx, and a properly selected pre-exponential factor.


6.13. Solve the previous problem using the variable separation method, and compare the results. 


6.14. Calculate the average force exerted by ac current I(t) of
amplitude I₀, flowing in a planar round coil of radius R, on a conducting
sphere with a much smaller radius R’ (which is still much larger than the
skin depth δ_s at the ac current’s frequency), located on the loop’s axis, at
distance z from its center — see the figure on the right.

6.15. A small, planar wire loop carrying current I is located far from
a plane surface of a superconductor. Within the coarse-grain (ideal-diamagnetic) description of the
Meissner-Ochsenfeld effect, calculate:


(i) the energy of the loop-superconductor interaction,
(ii) the force and torque acting on the loop, and
(iii) the distribution of supercurrents on the superconductor surface.


6.16. A straight, uniform magnet of length 2l, cross-section area A
<< l², and mass m, with a permanent longitudinal magnetization M₀, is
placed over a horizontal surface of a superconductor — see the figure on the
right. Within the ideal-diamagnet description of the Meissner-Ochsenfeld
effect, find the stable equilibrium position of the magnet.


6.17. A plane superconducting wire loop of area A and
inductance L may rotate, without friction, about a horizontal axis O (in
the figure on the right, normal to the plane of the drawing) passing
through its center of mass. Initially, the loop had been horizontal (with
θ = 0), and carried supercurrent I₀ in such a direction that its magnetic
dipole vector had been directed down. Then a uniform magnetic field B,
directed vertically up, was applied. Using the ideal-diamagnet description of the Meissner-Ochsenfeld
effect, find all possible equilibrium positions of the loop, analyze their stability, and give a physical
interpretation of the results.


6.18. Use the London equation to analyze the penetration of a uniform external magnetic field
into a thin (t ~ δ_L) planar superconducting film, whose plane is parallel to the field.
6.19. Use the London equation to calculate the distribution of supercurrent density j inside a
long, straight superconducting wire, with a circular cross-section of radius R ~ δ_L, carrying dc current I.


6.20. Use the London equation to calculate the
inductance (per unit length) of a long, uniform superconducting
strip placed close to the surface of a similar superconductor —
see the figure on the right, which shows the structure’s cross-section.


6.21. Calculate the inductance (per unit length) of a superconducting 
cable with the round cross-section shown in the figure on the right, in the 
following limits: 


(i) δ_L << a, b, c − b, and
(ii) a << δ_L << b, c − b.


6.22. Use the London equation to analyze the magnetic field shielding by a superconducting thin
film of thickness t << δ_L, by calculating the penetration of the field induced by current I in a thin wire
that runs parallel to a wide planar thin film, at a distance d >> t from it, into the space behind the film.


6.23. Use the Ginzburg-Landau equations (54) and (63) to calculate the largest (“critical”) value
of supercurrent in a uniform, long superconducting wire of a small cross-section A << δ_L².


6.24. Use the discussion of a long, straight Abrikosov vortex, in the limit ξ << δ_L, in Sec. 5 to
prove Eqs. (71)-(72) for its energy per unit length, and the first critical field.


6.25. Use the Ginzburg-Landau equations (54) and (63) to prove the Josephson relation (76) for
a small superconducting weak link, and express its critical current I_c via the Ohmic resistance R_n of the
same weak link in its normal state.


6.26. Use Eqs. (76) and (79) to calculate the coupling energy of a Josephson junction and the full 
potential energy of the SQUID shown in Fig. 4c. 


6.27. Analyze the possibility of wave propagation in a long,
uniform chain of lumped inductances and capacitances — see the figure
on the right.


Hint: Readers without prior experience with electromagnetic
wave analysis may like to use a substantial analogy between this effect and mechanical waves in a 1D
chain of elastically coupled particles.⁷⁵


6.28. A sinusoidal e.m.f. of amplitude V₀ and frequency ω
is applied to an end of a long chain of similar lumped resistors
and capacitors, shown in the figure on the right. Calculate the
law of decay of the ac voltage amplitude along the chain.


75 See, e.g., CM Sec. 6.3. 
6.29. As was discussed in Sec. 7, the displacement current concept allows one to generalize the
Ampère law to time-dependent processes as

$$\oint_C \mathbf{H}\cdot d\mathbf{r} = I_S + \frac{\partial}{\partial t}\int_S D_n\,d^2r.$$

We also have seen that such generalization makes the integral ∮H·dr over an external contour, such as
the one shown in Fig. 10, independent of the choice of the surface S limited by the
contour. However, it may look like the situation is different for a contour drawn
inside the capacitor — see the figure on the right. Indeed, if the contour’s size is
much larger than the capacitor’s thickness, the magnetic field H created by the
linear current I on the contour’s line is virtually the same as that of a continuous
wire, and hence the integral ∮H·dr along the contour apparently does not depend
on its area, while the flux ∫D_n d²r of the electric displacement does, so that the equation displayed
above seems invalid. (The current I_S piercing this contour evidently equals zero.)
Resolve the paradox, for simplicity considering an axially-symmetric system.


6.30. A straight, uniform, long wire with a circular cross-section of radius R is made of an Ohmic
conductor with conductivity σ, and carries dc current I. Calculate the flux of the Poynting vector
through its surface, and compare it with the Joule rate of energy dissipation.
Chapter 7. Electromagnetic Wave Propagation


This (rather extensive) chapter focuses on the most important effect that follows from the time- 
dependent Maxwell equations, namely the electromagnetic waves, at this stage avoiding the issue of 
their origin, i.e. of the wave radiation process — which will be the subject of Chapters 8 and 10. We will 
start from the simplest, plane waves in uniform and isotropic media, and then proceed to a discussion of 
nonuniform systems, bringing up such effects as reflection and refraction. Then we will discuss the so- 
called guided waves, propagating along various transmission lines — such as cables, waveguides, and 
optical fibers. Finally, the end of the chapter is devoted to finite-length fragments of such lines, serving
as resonant cavities, and to the effects of energy dissipation in transmission lines and cavities.


7.1. Plane waves 


Let us start by considering a spatial region that does not contain field sources (ρ = 0, j = 0), and
is filled with a linear, uniform, isotropic medium, which obeys Eqs. (3.46) and (5.110):

$$\mathbf{D} = \varepsilon\mathbf{E}, \qquad \mathbf{B} = \mu\mathbf{H}. \qquad (7.1)$$


Moreover, let us assume for a while that these constitutive equations hold for all frequencies of interest.
(Of course, these relations are exactly valid for the very important particular case of free space, where
we may formally use the macroscopic Maxwell equations (6.100), but with ε = ε₀ and μ = μ₀.) As was
already shown in Sec. 6.8, in this case, the Lorenz gauge condition (6.117) allows the Maxwell
equations to be recast into the wave equations (6.118) for the scalar and vector potentials. However, for
most purposes, it is more convenient to use the homogeneous Maxwell equations (6.100) for the electric
and magnetic fields — which are independent of the gauge choice. After an elementary elimination of D
and B using Eqs. (1),¹ these equations take a simple, very symmetric form:

$$\nabla\times\mathbf{E} + \mu\frac{\partial\mathbf{H}}{\partial t} = 0, \qquad \nabla\times\mathbf{H} - \varepsilon\frac{\partial\mathbf{E}}{\partial t} = 0, \qquad (7.2a)$$

$$\nabla\cdot\mathbf{E} = 0, \qquad \nabla\cdot\mathbf{H} = 0. \qquad (7.2b)$$


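Now, acting by operator ∇× on each of Eqs. (2a), i.e. taking their curl, and then using the vector algebra
identity (5.31), whose first term, for both E and H, vanishes due to Eqs. (2b), we get fully similar wave
equations for the electric and magnetic fields:²

$$\nabla^2\mathbf{E} - \frac{1}{v^2}\frac{\partial^2\mathbf{E}}{\partial t^2} = 0, \qquad \nabla^2\mathbf{H} - \frac{1}{v^2}\frac{\partial^2\mathbf{H}}{\partial t^2} = 0, \qquad (7.3)$$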


where the parameter v is defined as 


1 Though B rather than H is the actual magnetic field, mathematically it is a bit more convenient (just as it was in
Sec. 6.2) to use the vector pair {E, H} in the following discussion, because at sharp media boundaries, it is H that 
obeys the boundary condition (5.117) similar to that for E — cf. Eq. (3.37). 

2 The two vector equations (3) are of course just a shorthand for six similar equations for three Cartesian 
components of E and H, and hence for their magnitudes E and H.
$$v^2 \equiv \frac{1}{\varepsilon\mu}, \qquad (7.4)$$

with v² = 1/ε₀μ₀ = c² in free space — see Eq. (6.120) again.


These equations allow, in particular, solutions of the following type:

$$\mathbf{E}, \mathbf{H} = \mathbf{f}(z - vt), \qquad (7.5)$$

where z is the Cartesian coordinate along a certain (arbitrary) direction n, and f is an arbitrary function
of one argument. Note that this solution, first of all, describes a traveling wave — meaning a certain field
pattern moving, without deformation, along the z-axis, with the constant velocity v. Second, according
to Eq. (5), both E and H have the same values at all points of each plane perpendicular to the direction
n = n_z of the wave propagation; hence the second name — plane wave.


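According to Eqs. (2), the independence of the wave equations (3) for vectors E and H does not
mean that their plane-wave solutions are independent. Indeed, plugging any solution of the type (5) into
Eqs. (2a), we get

$$\mathbf{H} = \frac{\mathbf{n}\times\mathbf{E}}{Z}, \quad \text{i.e.} \quad \mathbf{E} = Z\,\mathbf{H}\times\mathbf{n}, \qquad (7.6)$$

where

$$Z \equiv \frac{E}{H} = \left(\frac{\mu}{\varepsilon}\right)^{1/2}. \qquad (7.7)$$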


The vector relationship (6) means, first of all, that at any point of space and at any time instant, 
the vectors E and H are perpendicular not only to the propagation vector n (such waves are called 
transverse) but also to each other — see Fig. 1. 
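The vector relations (6)-(7) are easy to verify numerically. The sketch below assumes, purely for illustration, a non-magnetic medium with ε = 2.25ε₀ and an arbitrary transverse field E; it builds H = n × E/Z and checks that E, H, and n are mutually perpendicular, that |E|/|H| = Z, and that E = ZH × n is recovered.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

EPS0, MU0 = 8.8541878128e-12, 1.25663706212e-6
eps, mu = 2.25 * EPS0, MU0   # assumed medium: relative permittivity 2.25, non-magnetic
Z = math.sqrt(mu / eps)      # wave impedance, Eq. (7.7)

n = (0.0, 0.0, 1.0)          # propagation direction
E = (3.0, 4.0, 0.0)          # arbitrary transverse field, V/m
H = tuple(c / Z for c in cross(n, E))       # H = n x E / Z, Eq. (7.6)
E_back = tuple(Z * c for c in cross(H, n))  # should reproduce E = Z H x n

print("Z =", Z, "ohm; H =", H)
print("E.H =", dot(E, H), " H.n =", dot(H, n), " |E|/|H| =", math.hypot(*E) / math.hypot(*H))
```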


Fig. 7.1. Field vectors in a plane electromagnetic 
wave propagating along direction n. 


Second, this equality does not depend on the function f, meaning that the electric and magnetic 
fields increase and decrease simultaneously. Finally, the field magnitudes are related by the constant Z 
called the wave impedance of the medium. Very soon we will see that this impedance plays a pivotal 
role in many problems, in particular in wave reflection at the interface between two media. Since
the dimensionality of E, in SI units, is V/m, and that of H is A/m, Eq. (7) shows that Z has the
dimensionality of V/A, i.e. ohms (Ω).³ In particular, in free space,


3 In the Gaussian units, E and H have a similar dimensionality (in particular, in a free-space wave, E = H), making 
the (very useful) notion of the wave impedance less manifestly exposed — so that in some older physics textbooks 
it is not mentioned at all! 
$$Z_0 \equiv \left(\frac{\mu_0}{\varepsilon_0}\right)^{1/2} = 4\pi\times 10^{-7}\,c \approx 377\ \Omega. \qquad (7.8)$$
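A quick numerical check of Eq. (7.8), assuming the CODATA 2018 values of ε₀ and μ₀, compares (μ₀/ε₀)^{1/2} with the classic form 4π×10⁻⁷c (which relied on the pre-2019 exact value μ₀ = 4π×10⁻⁷ N/A²):

```python
import math

EPS0 = 8.8541878128e-12  # F/m (CODATA 2018)
MU0 = 1.25663706212e-6   # N/A^2 (CODATA 2018)
C = 299_792_458.0        # m/s (exact by the definition of the metre)

Z0 = math.sqrt(MU0 / EPS0)            # first form of Eq. (7.8)
Z0_classic = 4 * math.pi * 1e-7 * C   # second form, with mu0 = 4*pi*1e-7 N/A^2
print(f"Z0 = {Z0:.6f} ohm; 4*pi*1e-7*c = {Z0_classic:.6f} ohm")
```

The two values agree to better than a micro-ohm, so the difference between the old and new SI definitions of μ₀ is irrelevant at this level.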


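Next, plugging Eq. (6) into Eqs. (6.113) and (6.114), we get:

$$u = \varepsilon E^2 = \mu H^2, \qquad (7.9a)$$

$$\mathbf{S} = \mathbf{E}\times\mathbf{H} = \mathbf{n}\frac{E^2}{Z} = \mathbf{n}ZH^2, \qquad (7.9b)$$

so that, according to Eqs. (4) and (7), the wave’s energy and power densities are universally related as

$$\mathbf{S} = \mathbf{n}uv. \qquad (7.9c)$$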
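Relations (9a) and (9c) follow from Eqs. (6.113), (6.114), (4), and (7) by simple algebra. The sketch below checks them numerically for an assumed non-magnetic medium with ε = 4ε₀ and an arbitrary field magnitude, confirming that the electric and magnetic energy densities are equal and that S = uv.

```python
import math

EPS0 = 8.8541878128e-12  # F/m
MU0 = 1.25663706212e-6   # N/A^2

eps, mu = 4.0 * EPS0, MU0      # assumed medium: eps_r = 4, non-magnetic
v = 1.0 / math.sqrt(eps * mu)  # Eq. (7.4)
Z = math.sqrt(mu / eps)        # Eq. (7.7)

E = 10.0   # instantaneous |E| at some point of the wave, V/m (arbitrary)
H = E / Z  # Eq. (6): |H| = |E| / Z

u_electric = eps * E**2 / 2
u_magnetic = mu * H**2 / 2
u = u_electric + u_magnetic    # Eq. (6.113)
S = E * H                      # |E x H| for perpendicular E and H, Eq. (6.114)

print(f"u_E = {u_electric:.4e}, u_H = {u_magnetic:.4e} J/m^3")
print(f"S = {S:.4e} W/m^2, u*v = {u * v:.4e} W/m^2")
```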


In view of the Poynting vector paradox discussed in Sec. 6.8 (see Fig. 6.11), one may wonder
whether the vector S in the last equality may be interpreted as the actual power flow density. In contrast to the static
situation shown in Fig. 6.11, which limits the electric and magnetic fields to the vicinity of their sources, 
waves may travel far from them. As a result, they can form wave packets of a finite length in free space 
— see Fig. 2. 




Fig. 7.2. Interpreting the Poynting 
vector in a plane electromagnetic 
wave. (Horizontal lines show 
equal-field planes.) 


Let us apply the Poynting theorem (6.111) to the cylinder shown with dashed lines in Fig. 2, with 
one lid inside the wave packet, and another lid in the region already passed by the wave. Then, 
according to Eq. (6.111), the rate of change of the full field energy ℰ inside the volume is dℰ/dt = −SA
(where A is the lid area), so that S may be indeed interpreted as the power flow (per unit area) from the 
volume. Making a reasonable assumption that the finite length of a sufficiently long wave packet does 
not affect the physics inside it, we may indeed interpret the S given by Eqs. (9b-c) as the power flow 
density inside a plane electromagnetic wave. 


As we will see later in this chapter, the free-space value Z₀ of the wave impedance, given by Eq.
(8), establishes the scale of Z of virtually all wave transmission lines, so we may use it, together with
Eq. (9), to get a better feeling of how different the electric and magnetic field amplitudes in the
waves are, on the scale of typical electrostatics and magnetostatics experiments. For example, according to
Eqs. (9), a wave of a modest intensity S = 1 W/m² (this is what we get from a usual electric bulb a few
meters away from it) has E ~ (SZ₀)^{1/2} ~ 20 V/m, quite comparable with the dc field created by a standard
AA battery right outside it. On the other hand, the wave’s magnetic field H = (S/Z₀)^{1/2} ≈ 0.05 A/m. For
this particular case, the relation following from Eqs. (1), (4), and (7), 
$$B = \mu H = \mu\frac{E}{Z} = \frac{\mu E}{(\mu/\varepsilon)^{1/2}} = (\varepsilon\mu)^{1/2}E = \frac{E}{v}, \qquad (7.10)$$

gives B = μ₀H = E/c ~ 7×10⁻⁸ T, i.e. a magnetic field a thousand times lower than the Earth’s field, and
about 7 orders of magnitude lower than the field of a typical permanent magnet. This huge difference 
may be interpreted as follows: the scale B ~ E/c of magnetic fields in the waves is “normal” for 
electromagnetism, while the permanent magnet fields are abnormally high because they are due to the 
ferromagnetic alignment of electron spins, essentially relativistic objects — see the discussion in Sec. 5.5. 
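The estimates above can be reproduced as follows. The intensity S = 1 W/m² is taken from the text; the Earth's field value of 5×10⁻⁵ T is an assumed typical magnitude used only for the comparison.

```python
import math

EPS0 = 8.8541878128e-12          # F/m
MU0 = 1.25663706212e-6           # N/A^2
Z0 = math.sqrt(MU0 / EPS0)       # Eq. (7.8), about 377 ohm
C = 1.0 / math.sqrt(EPS0 * MU0)  # Eq. (6.120)

S = 1.0                 # W/m^2, the "modest intensity" of the text
E = math.sqrt(S * Z0)   # from S = E^2 / Z0, Eq. (7.9b)
H = math.sqrt(S / Z0)   # from S = Z0 H^2, Eq. (7.9b)
B = MU0 * H             # Eq. (7.10) in free space; equals E/c

B_EARTH = 5e-5          # T, assumed typical magnitude of the geomagnetic field
print(f"E = {E:.1f} V/m, H = {H:.4f} A/m, B = {B:.2e} T (E/c = {E / C:.2e} T)")
print(f"Earth's field / wave's B ~ {B_EARTH / B:.0f}")
```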


The fact that Eq. (5) is valid for an arbitrary function f means, in the standard terminology, that a
medium with frequency-independent ε and μ supports the propagation of plane waves without either
decay (attenuation) or waveform deformation (dispersion). However, for any real medium other than pure
vacuum, this approximation is valid only within limited frequency intervals. We will discuss the effects
of attenuation and dispersion in the next section and will see that all our prior formulas remain valid
even for an arbitrary linear medium, provided that we limit them to single-frequency (i.e. sinusoidal,
frequently called monochromatic) waves. Such waves may be most conveniently represented as⁴


$$f = \mathrm{Re}\left[f_\omega e^{i(kz - \omega t)}\right], \qquad (7.11)$$


where f_ω is the complex amplitude of the wave, and k is its wave number (the magnitude of the wave
vector k = nk), sometimes called the spatial frequency. The last term is justified by the fact, evident
from Eq. (11), that k is related to the wavelength λ exactly as the usual (“temporal”) frequency ω is
related to the time period T:

$$k = \frac{2\pi}{\lambda}, \qquad \omega = \frac{2\pi}{T}. \qquad (7.12)$$

In the dispersion-free case (5), the compatibility of Eq. (11) with Eq. (5) requires the argument
(kz − ωt) = k[z − (ω/k)t] to be proportional to (z − vt), so that ω/k = v, i.e.

$$k = \frac{\omega}{v} = (\varepsilon\mu)^{1/2}\omega, \qquad (7.13)$$

so that in that particular case, the dispersion relation ω(k) is linear.
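A small helper that turns Eqs. (7.12)-(7.13) into numbers, as a sketch: it assumes a dispersion-free medium characterized by frequency-independent relative constants, with eps_r and mu_r being hypothetical parameter names introduced here. With linear dispersion, doubling ω doubles k, and λ/T always equals v.

```python
import math

C = 299_792_458.0  # m/s, speed of light in free space

def plane_wave_scales(freq_hz, eps_r=1.0, mu_r=1.0):
    """Return (omega, k, wavelength, period) of a monochromatic wave in a dispersion-free medium.

    eps_r and mu_r are the assumed frequency-independent relative permittivity and permeability.
    """
    omega = 2 * math.pi * freq_hz
    v = C / math.sqrt(eps_r * mu_r)   # Eq. (7.4): v = 1/(eps*mu)^(1/2)
    k = omega / v                     # Eq. (7.13): linear dispersion relation
    wavelength = 2 * math.pi / k      # Eq. (7.12)
    period = 2 * math.pi / omega      # Eq. (7.12)
    return omega, k, wavelength, period

for f, eps_r in [(1e9, 1.0), (2e9, 1.0), (1e9, 4.0)]:
    omega, k, lam, T = plane_wave_scales(f, eps_r)
    print(f"f = {f:.0e} Hz, eps_r = {eps_r}: k = {k:.3f} 1/m, "
          f"lambda = {lam:.4f} m, lambda/T = {lam / T:.6e} m/s")
```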


Now note that Eq. (6) does not mean that the vectors E and H retain their direction in space.
(A wave in which they do is called linearly polarized.⁵) Indeed, nothing in the Maxwell equations
prevents, for example, a joint rotation of this vector pair around the fixed vector n, while still keeping all
these three vectors perpendicular to each other at any instant — see Fig. 1. However, an arbitrary rotation
law or even an arbitrary constant frequency of such rotation would violate the single-frequency
(monochromatic) character of the elementary sinusoidal wave (11). To find the most
general type of polarization the wave may have without violating that condition, let us represent two


4 As we have already seen in the previous chapter (see also CM Sec. 1), such complex-exponential representation 
of sinusoidally-changing variables is more convenient for mathematical manipulation than sine and
cosine functions, especially because in all linear relations, the operator Re may be omitted (implied) until the very
end of the calculation. Note, however, that this is not valid for the quadratic forms such as Eqs. (9). 

5 The possibility of different polarizations of electromagnetic waves was discovered (for light) in 1669 by Rasmus
Bartholin, a.k.a. Erasmus Bartholinus.
Cartesian components of one of these vectors (say, E) along any two fixed axes x and y, perpendicular to
each other and the z-axis (i.e. to the vector n), in the same form as used in Eq. (11):


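$$E_x = \mathrm{Re}\left[E_{\omega x}e^{i(kz - \omega t)}\right], \qquad E_y = \mathrm{Re}\left[E_{\omega y}e^{i(kz - \omega t)}\right]. \qquad (7.14)$$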


To keep the wave monochromatic, the complex amplitudes E_ωx and E_ωy have to be constant in time;
however, they may have different magnitudes and an arbitrary phase shift between them.
In the simplest case when the arguments of these complex amplitudes are equal,

$$E_{\omega x,y} = |E_{\omega x,y}|\,e^{i\varphi}, \qquad (7.15)$$

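the real field components have the same phase:

$$E_x = |E_{\omega x}|\cos(kz - \omega t + \varphi), \qquad E_y = |E_{\omega y}|\cos(kz - \omega t + \varphi), \qquad (7.16)$$

so that their ratio is constant in time — see Fig. 3a. This means that the wave is linearly polarized, with
the polarization plane defined by the relation (with the angle θ measured from the x-axis)

$$\tan\theta = \frac{|E_{\omega y}|}{|E_{\omega x}|}. \qquad (7.17)$$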


Fig. 7.3. Time evolution of the instantaneous electric field vector in monochromatic waves with: 
(a) a linear polarization, (b) a circular polarization, and (c) an elliptical polarization. 


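Another simple case is when the moduli of the complex amplitudes E_ωx and E_ωy are equal, but
their phases are shifted by +π/2 or −π/2:

$$E_{\omega x} = |E_\omega|\,e^{i\varphi}, \qquad E_{\omega y} = |E_\omega|\,e^{i(\varphi \mp \pi/2)}. \qquad (7.18)$$

In this case

$$E_x = |E_\omega|\cos(kz - \omega t + \varphi), \qquad E_y = |E_\omega|\cos(kz - \omega t + \varphi \mp \pi/2) = \pm|E_\omega|\sin(kz - \omega t + \varphi). \qquad (7.19)$$

This means that on the wave’s plane (normal to n), the end of the vector E moves, with the wave’s
frequency ω, either clockwise or counterclockwise around a circle — see Fig. 3b; for example, at z = 0,

$$\theta(t) = \mp(\omega t - \varphi). \qquad (7.20)$$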




Such waves are called circularly polarized. In the dominant convention, the wave is called
right-polarized (RP) if it is described by the lower sign in Eqs. (18)-(20), i.e. if the vector ω of the angular
frequency of the field vector’s rotation coincides with the wave propagation’s direction n, and
left-polarized (LP) in the opposite case. These particular solutions of the Maxwell equations are very
convenient for quantum electrodynamics, because single electromagnetic field quanta with a certain
(positive or negative) spin direction may be considered as elementary excitations of the corresponding
circularly-polarized wave.⁶ (This fact does not exclude, from the quantization scheme, waves of other
polarizations, because any monochromatic wave may be represented as a linear combination of two
opposite circularly-polarized waves — just as Eqs. (14) represent it as a linear combination of two
linearly-polarized waves.)


Finally, in the general case of arbitrary complex amplitudes $E_{\omega x}$ and $E_{\omega y}$, the field vector's end moves along an ellipse (Fig. 3c); such a wave is called elliptically polarized. The elongation ("eccentricity") and orientation of the ellipse are completely described by one complex number, the ratio $E_{\omega x}/E_{\omega y}$, i.e. by two real numbers, for example, $|E_{\omega x}/E_{\omega y}|$ and $\varphi = \arg(E_{\omega x}/E_{\omega y})$.⁷
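The classification above can be checked numerically. The sketch below is a minimal Python illustration, not part of the original notes; the function name `classify_polarization` and the tolerance `rel_tol` are hypothetical choices. It assumes a wave propagating along $\mathbf{n} = +\hat{z}$ with the time dependence $e^{-i\omega t}$, $\omega > 0$, and uses the fact that at fixed $z$ the combination $E_x\dot{E}_y - E_y\dot{E}_x = \omega\,\mathrm{Im}(E_{\omega x}^* E_{\omega y})$ does not depend on time, so its sign gives the sense of rotation of $\mathbf{E}$. The RP/LP labels follow the convention stated above.

```python
import math


def classify_polarization(Ex: complex, Ey: complex, rel_tol: float = 1e-9) -> str:
    """Classify a monochromatic wave travelling along n = +z.

    Ex, Ey are the complex amplitudes E_wx, E_wy of Eqs. (7.14), with the
    time dependence exp(-i w t) and w > 0 (assumed).
    """
    scale = abs(Ex) ** 2 + abs(Ey) ** 2
    if scale == 0.0:
        raise ValueError("both amplitudes vanish")
    # At fixed z, E_x dE_y/dt - E_y dE_x/dt = w * Im(conj(Ex) * Ey) at all times,
    # so the sign of this imaginary part gives the sense of rotation of E.
    cross = Ex.conjugate() * Ey
    if abs(cross.imag) <= rel_tol * scale:
        return "linear"
    # Counterclockwise rotation in the (x, y) plane means an angular velocity
    # along +z = n, i.e. right polarization (RP) in the convention of the text.
    sense = "RP" if cross.imag > 0 else "LP"
    equal_moduli = abs(abs(Ex) ** 2 - abs(Ey) ** 2) <= rel_tol * scale
    quarter_shift = abs(cross.real) <= rel_tol * scale  # phase difference of +-pi/2
    return ("circular " if equal_moduli and quarter_shift else "elliptical ") + sense


print(classify_polarization(1 + 0j, 0.5j))  # elliptical RP
```

The same sign test covers all three cases of Fig. 3, so no separate treatment of the circular case is needed beyond checking the moduli and the quarter-period phase shift.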


7.2. Attenuation and dispersion 


Let me start the discussion of the dispersion and attenuation effects by considering a particular case of the time evolution of the electric polarization $P(t)$ of a dilute, non-polar medium, with negligible interaction between its elementary dipoles $p(t)$. As was discussed in Sec. 3.3, in this case, the local electric field acting on each elementary dipole may be taken equal to the macroscopic field $E(t)$. Then, the dipole moment $p(t)$ may be caused not only by the values of the field $E$ at the same moment of time ($t$), but also by those at the earlier moments $t' < t$. Due to the linear superposition principle, the macroscopic polarization $P(t) = np(t)$ should be a sum (practically, an integral) of the values of $E(t')$ at all moments $t' < t$, weighted by some function of $t$ and $t'$:⁸


$$P(t) = \int_{-\infty}^{t} E(t')\, G(t, t')\, dt'. \tag{7.21}$$


6 This issue is closely related to that of the wave’s angular momentum; it will be more convenient for me to 
discuss it later in this chapter (in Sec. 7). 

7 Note that the same information may be expressed via four so-called Stokes parameters $s_0$, $s_1$, $s_2$, and $s_3$, which are popular in practical optics, because they may be used for the description of not only completely coherent waves that are discussed here, but also of partly coherent or even fully incoherent waves, including the natural light emitted by thermal sources such as our Sun. (In contrast to the coherent waves (14), whose complex amplitudes are deterministic numbers, the amplitudes of incoherent waves should be treated as random variables.) For more on the Stokes parameters, as well as many other optics topics I will not have time to cover, I can recommend the classical text by M. Born et al., Principles of Optics, 7th ed., Cambridge U. Press, 1999.

8 In an isotropic medium, the vectors $\mathbf{E}$, $\mathbf{P}$, and hence $\mathbf{D} = \varepsilon_0\mathbf{E} + \mathbf{P}$, are all parallel, and for notation simplicity, I will drop the vector sign in the following formulas of this section. I am also assuming that $P$ at any point $\mathbf{r}$ is only dependent on the electric field at the same point, and hence drop the factor $\exp\{ikz\}$, the same for all variables. This last assumption is valid if the wavelength $\lambda$ is much larger than the elementary dipole's size $a$. In most systems of interest, the scale of $a$ is atomic ($\sim 10^{-10}$ m), so that this approximation is valid up to extremely high frequencies, $\omega \sim c/a \sim 10^{18}\ \mathrm{s}^{-1}$, corresponding to hard X-rays.
The condition $t' < t$, which is implied by this relation, expresses a keystone principle of all science, the causal relation between a cause (in our case, the electric field $E(t')$ applied to each dipole) and its effect (the polarization $P(t)$ it creates). The function $G(t, t')$ is called the temporal Green's function for the electric polarization.⁹ To reveal its physical sense, let us consider the case when the applied field $E(t)$ is a very short pulse at the moment $t'' < t$, which may be well approximated with Dirac's delta function:

$$E(t) = \delta(t - t''). \tag{7.22}$$


Then Eq. (21) yields just $P(t) = G(t, t'')$, so that the Green's function $G(t, t')$ is just the polarization at moment $t$, created by a unit $\delta$-functional pulse of the applied field at moment $t'$ (Fig. 4).


Fig. 7.4. An example of the temporal Green's function for the electric polarization (schematically).


What are the general properties of the temporal Green's function? First, for systems without infinite internal "memory", $G$ should tend to zero at $t - t' \to +\infty$, although the type of this approach (e.g., whether the function $G$ oscillates approaching zero, as in Fig. 4, or not) depends on the medium's properties. Second, if parameters of the medium do not change in time, the polarization response to an electric field pulse should be dependent not on its absolute timing, but only on the time difference $\theta \equiv t - t'$ between the pulse and observation instants, when Eq. (21) is reduced to


$$P(t) = \int_{-\infty}^{t} E(t')\, G(t - t')\, dt' = \int_{0}^{\infty} E(t - \theta)\, G(\theta)\, d\theta. \tag{7.23}$$
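For a sinusoidal waveform, $E(t) = \mathrm{Re}\left[E_\omega e^{-i\omega t}\right]$, this equation yields

$$P(t) = \mathrm{Re}\left[E_\omega \int_0^\infty e^{-i\omega(t-\theta)}\, G(\theta)\, d\theta\right] = \mathrm{Re}\left[\left(E_\omega \int_0^\infty G(\theta)\, e^{i\omega\theta}\, d\theta\right) e^{-i\omega t}\right]. \tag{7.24}$$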


The expression in the last parentheses is of course nothing else than the complex amplitude $P_\omega$ of the polarization. This means that even if the static linear relation (3.43), $P = \chi_e\varepsilon_0 E$, is invalid for an arbitrary time-dependent process, we may still keep its Fourier analog,


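$$P_\omega = \chi_e(\omega)\, \varepsilon_0 E_\omega, \qquad \text{with} \quad \chi_e(\omega) \equiv \frac{1}{\varepsilon_0}\int_0^\infty G(\theta)\, e^{i\omega\theta}\, d\theta, \tag{7.25}$$

for each sinusoidal component of the process, using it as the definition of the frequency-dependent electric susceptibility $\chi_e(\omega)$. Similarly, the frequency-dependent electric permittivity may be defined using the Fourier analog of Eq. (3.46):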


9 The idea of these functions is very similar to that of the spatial Green's functions (see Sec. 2.10), but with a new
twist, due to the causality principle. A discussion of the temporal Green’s functions in application to classical 
mechanics (which to some extent overlaps with our current discussion) may be found in CM Sec. 5.1. 
$$D_\omega = \varepsilon(\omega)\, E_\omega. \tag{7.26a}$$

Then, according to Eq. (3.47), the complex permittivity is related to the temporal Green's function by the usual Fourier transform:

$$\varepsilon(\omega) \equiv \varepsilon_0 + \frac{P_\omega}{E_\omega} = \varepsilon_0 + \int_0^\infty G(\theta)\, e^{i\omega\theta}\, d\theta. \tag{7.26b}$$
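This relation shows that the function $\varepsilon(\omega)$ may be complex,

$$\varepsilon(\omega) = \varepsilon'(\omega) + i\varepsilon''(\omega), \quad \text{with} \quad \varepsilon'(\omega) = \varepsilon_0 + \int_0^\infty G(\theta)\cos\omega\theta\, d\theta, \quad \varepsilon''(\omega) = \int_0^\infty G(\theta)\sin\omega\theta\, d\theta, \tag{7.27}$$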


and that its real part $\varepsilon'(\omega)$ is always an even function of frequency, while the imaginary part $\varepsilon''(\omega)$ is an odd function of $\omega$. Note that though the particular causal relationship (21) between $P(t)$ and $E(t)$ is conditioned by the elementary dipole independence, the frequency-dependent complex electric permittivity $\varepsilon(\omega)$ may be introduced, in a similar way, if any two linear combinations of these variables are related by a similar formula.
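As a numerical illustration of Eqs. (27), the Python sketch below evaluates $\varepsilon'(\omega)$ and $\varepsilon''(\omega)$ by direct quadrature of a Green's function. The damped-oscillatory $G(\theta) = e^{-\delta\theta}\sin(\omega_d\theta)/\omega_d$, with $\omega_d = (\omega_0^2 - \delta^2)^{1/2}$, is only an assumed example of the kind of response sketched in Fig. 4; the parameters `W0` and `DELTA`, the units with $\varepsilon_0 = 1$, and the Simpson-rule settings are hypothetical. For this particular $G$, the integral can also be taken analytically, giving $\varepsilon(\omega) = \varepsilon_0 + 1/(\omega_0^2 - \omega^2 - 2i\omega\delta)$, which serves as a reference.

```python
import math


def permittivity_from_green(G, omega, eps0=1.0, theta_max=100.0, n=20000):
    """Return (eps', eps'') of Eqs. (7.27), integrating over [0, theta_max] by Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = theta_max / n
    re = im = 0.0
    for k in range(n + 1):
        theta = k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        g = G(theta)
        re += weight * g * math.cos(omega * theta)
        im += weight * g * math.sin(omega * theta)
    return eps0 + re * h / 3, im * h / 3


# Hypothetical Green's function: a damped oscillatory response (units with eps0 = 1).
W0, DELTA = 1.0, 0.25
WD = math.sqrt(W0**2 - DELTA**2)


def G(theta):
    return math.exp(-DELTA * theta) * math.sin(WD * theta) / WD


eps_re, eps_im = permittivity_from_green(G, 1.5)
print(eps_re, eps_im)  # approximately 0.41176 and 0.35294
```

Evaluating the same function at $-\omega$ returns the same $\varepsilon'$ and the opposite $\varepsilon''$, because the cosine and sine kernels in Eqs. (27) are even and odd in $\omega$.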

Fully analogous arguments show that magnetic properties of a linear, isotropic medium may be characterized by a frequency-dependent, complex permeability $\mu(\omega)$. Now rewriting Eqs. (1) for the complex amplitudes of the fields at a particular frequency, we may readily repeat all calculations of Sec. 1, and verify that all its results are valid for monochromatic waves even in a dispersive (but necessarily linear!) medium. In particular, Eqs. (7) and (13) now become


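$$Z(\omega) = \left[\frac{\mu(\omega)}{\varepsilon(\omega)}\right]^{1/2}, \qquad k(\omega) = \omega\left[\varepsilon(\omega)\mu(\omega)\right]^{1/2}, \tag{7.28}$$

so that the wave impedance and the wave number may both be complex functions of frequency.¹⁰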


This fact has important consequences for electromagnetic wave propagation. First, plugging the representation of the complex wave number as the sum of its real and imaginary parts, $k(\omega) = k'(\omega) + ik''(\omega)$, into Eq. (11):

$$f = \mathrm{Re}\left[f_\omega e^{i[k(\omega)z - \omega t]}\right] = e^{-k''(\omega)z}\, \mathrm{Re}\left[f_\omega e^{i[k'(\omega)z - \omega t]}\right], \tag{7.29}$$

we see that $k''(\omega)$ describes the rate of wave attenuation in the medium at frequency $\omega$.¹¹ Second, if the waveform is not sinusoidal (and hence should be represented as a sum of several or many sinusoidal components), the frequency dependence of $k'(\omega)$ provides for wave dispersion, i.e. the waveform deformation during propagation, because the propagation velocities (4) of the component waves are now different.¹²
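The Python sketch below (standard library only) computes the complex $k(\omega)$ and $Z(\omega)$ of Eqs. (28) for given complex $\varepsilon$ and $\mu$ at a single frequency, and checks the factorization (29) of the attenuated wave. The function names and the sample values (units with $\varepsilon_0 = \mu_0 = c = 1$) are illustrative assumptions. Note that `cmath.sqrt` returns the principal root; for $\varepsilon'' > 0$ and a real, positive $\mu$, this choice gives $k'' > 0$, i.e. a wave decaying along $+z$, but in other cases the physically correct branch has to be selected separately.

```python
import cmath
import math


def wave_parameters(omega: float, eps: complex, mu: complex):
    """Complex wave number k(omega) and wave impedance Z(omega), Eqs. (7.28)."""
    # Principal square roots: for Im(eps) > 0 and real mu > 0 this yields Im(k) > 0 (decay along +z).
    k = omega * cmath.sqrt(eps * mu)
    Z = cmath.sqrt(mu / eps)
    return k, Z


def field(z: float, t: float, f_omega: complex, k: complex, omega: float) -> float:
    """Left-hand form of Eq. (7.29): Re[f_omega exp{i(k z - omega t)}]."""
    return (f_omega * cmath.exp(1j * (k * z - omega * t))).real


omega, eps, mu = 2.0, 2.0 + 0.5j, 1.0  # hypothetical lossy medium
k, Z = wave_parameters(omega, eps, mu)
print(k, Z)
```

Comparing `field(z, t, ...)` with $e^{-k''z}\,\mathrm{Re}[f_\omega e^{i(k'z - \omega t)}]$ at a few points reproduces the right-hand side of Eq. (29).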


10 The first unambiguous observations of dispersion (for the case of light refraction) were described by Sir Isaac Newton in his Optics (1704), even though this genius never recognized the wave nature of light!

11 It may be tempting to attribute this effect to wave absorption, i.e. the dissipation of the wave's energy, but we will see very soon that wave attenuation may be due to different effects as well.

12 The reader is probably familiar with the most noticeable effect of the dispersion: the difference between the group velocity $v_{\rm gr} \equiv d\omega/dk'$, giving the speed of the envelope of a wave packet with a narrow frequency spectrum, and the phase velocity $v_{\rm ph} \equiv \omega/k'$ of the component waves. The second-order dispersion effect, proportional to $d^2\omega/dk'^2$, leads to the deformation (gradual broadening) of the envelope itself. Following tradition, these effects
As an example of such a dispersive medium, let us consider a simple but very representative Lorentz oscillator model.¹³ In dilute atomic or molecular systems (e.g., gases), electrons respond to the external electric field especially strongly when its frequency $\omega$ is close to certain frequencies $\omega_j$ corresponding to the spectrum of quantum interstate transitions of a single atom/molecule. A phenomenological description of this behavior may be obtained from a classical model of several externally-driven harmonic oscillators, generally with non-zero damping. For a single oscillator, driven by the electric field's force $F(t) = qE(t)$, we can write the 2nd Newton law as


$$m\left(\ddot{x} + 2\delta_0\dot{x} + \omega_0^2 x\right) = qE(t), \tag{7.30}$$


where $\omega_0$ is the own frequency of the oscillator, and $\delta_0$ is its damping coefficient. For the electric field of a monochromatic wave, $E(t) = \mathrm{Re}\left[E_\omega \exp\{-i\omega t\}\right]$, we may look for a particular, forced-oscillation solution of this equation in a similar form, $x(t) = \mathrm{Re}\left[x_\omega \exp\{-i\omega t\}\right]$.¹⁴ Plugging this solution into Eq. (30), we readily find the complex amplitude of these oscillations:

$$x_\omega = \frac{q}{m}\,\frac{E_\omega}{\left(\omega_0^2 - \omega^2\right) - 2i\omega\delta_0}. \tag{7.31}$$


Using this result to calculate the complex amplitude of the dipole moment as $p_\omega = qx_\omega$, and then the electric polarization $P_\omega = np_\omega$ of a dilute medium with $n$ independent oscillators per unit volume, for its frequency-dependent permittivity (26) we get


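$$\varepsilon(\omega) = \varepsilon_0 + \frac{nq^2}{m}\,\frac{1}{\left(\omega_0^2 - \omega^2\right) - 2i\omega\delta_0}. \tag{7.32}$$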


This result may be readily generalized to the case when the system has several types of 
oscillators with different masses and frequencies: 


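$$\varepsilon(\omega) = \varepsilon_0 + nq^2\sum_j \frac{f_j}{m_j}\,\frac{1}{\left(\omega_j^2 - \omega^2\right) - 2i\omega\delta_j}, \tag{7.33}$$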


where $f_j = n_j/n$ is the fraction of oscillators with frequency $\omega_j$, so that the sum of all $f_j$ equals 1. Figure 5 shows a typical behavior of the real and imaginary parts of the complex dielectric constant, described by Eq. (33), as functions of frequency. The oscillator resonances' effect is clearly visible, and dominates the medium's response at $\omega \approx \omega_j$, especially in the case of low damping, $\delta_j \ll \omega_j$. Note that in the low-damping limit, the imaginary part of the dielectric constant, $\varepsilon''$, and hence the wave attenuation $k''$, are negligibly small at all frequencies besides small vicinities of $\omega_j$, where the derivative $d\varepsilon'(\omega)/d\omega$ is


are discussed in more detail in the quantum-mechanics part of this series (QM Sec. 2.2), because they are a crucial factor of Schrödinger's wave mechanics. (See also a brief discussion in CM Sec. 6.3.)

13 This example is focused on the frequency dependence of $\varepsilon$ rather than $\mu$, because electromagnetic waves interact with "usual" media via their electric field much more than via the magnetic field. Indeed, according to Eq. (7), the magnetic field of the wave is of the order of $E/c$, so that the magnetic component of the Lorentz force (5.10) acting on a non-relativistic particle, $F_m \sim quB \sim (u/c)qE$, is much smaller than its electric component, $F_e = qE$, and may be neglected. However, as will be discussed in Sec. 6, forgetting about the possible dispersion of $\mu(\omega)$ may result in missing some remarkable opportunities for manipulating the waves.

14 If this point and Eq. (30) are not absolutely clear, please see CM Sec. 5.1 for a more detailed discussion.
negative.¹⁵ Thus, for a system of weakly-damped oscillators, Eq. (33) may be well approximated by a sum of singularities ("poles"):

$$\varepsilon(\omega) \approx \varepsilon_0 + \frac{nq^2}{2\omega}\sum_j \frac{f_j}{m_j}\,\frac{1}{\omega_j - \omega}, \qquad \text{for } \delta_j \ll |\omega - \omega_j| \ll |\omega_j - \omega_{j'}|. \tag{7.34}$$


Fig. 7.5. Typical frequency dependence of the real and imaginary parts of the complex electric permittivity, according to the generalized Lorentz oscillator model.


This result is especially important because according to quantum mechanics,¹⁶ Eq. (34) (with all $m_j$ equal) is also valid for a set of non-interacting, similar quantum systems (whose dynamics may be completely different from that of a harmonic oscillator!), provided that $\omega_j$ are replaced with frequencies of possible quantum interstate transitions, and coefficients $f_j$ are replaced with the so-called oscillator strengths of the transitions, which obey the same sum rule, $\Sigma_j f_j = 1$.
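A short Python sketch of the generalized Lorentz model, Eq. (33), makes the behavior shown in Fig. 5 concrete. The two-oscillator parameter set below (units with $\varepsilon_0 = nq^2 = 1$, equal masses, $\delta_j = 0.05$) is a hypothetical example, not data from the text; the accompanying checks confirm that $\varepsilon''(\omega) > 0$ at positive frequencies, and that $d\varepsilon'/d\omega$ is negative at each resonance but positive between the resonances.

```python
def eps_lorentz(omega, oscillators, eps0=1.0, nq2=1.0):
    """Eq. (7.33); oscillators is a list of (f_j, m_j, omega_j, delta_j) tuples."""
    return eps0 + nq2 * sum(
        f / m / (wj**2 - omega**2 - 2j * omega * dj) for f, m, wj, dj in oscillators
    )


def d_eps_real(omega, oscillators, h=1e-6):
    """Central-difference estimate of d(eps')/d(omega)."""
    up = eps_lorentz(omega + h, oscillators).real
    down = eps_lorentz(omega - h, oscillators).real
    return (up - down) / (2 * h)


# Hypothetical medium: two resonances, with oscillator fractions f_j summing to 1.
OSC = [(0.6, 1.0, 1.0, 0.05), (0.4, 1.0, 3.0, 0.05)]

for w in (0.5, 1.0, 2.0, 3.0):
    e = eps_lorentz(w, OSC)
    print(f"omega={w}: eps'={e.real:.4f}, eps''={e.imag:.4f}, deps'/domega={d_eps_real(w, OSC):.3f}")
```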


At $\omega \to 0$, the imaginary part of the complex permittivity (33) also vanishes (for any $\delta_j$), while its real part approaches its electrostatic ("dc") value

$$\varepsilon(0) = \varepsilon_0 + nq^2\sum_j \frac{f_j}{m_j\omega_j^2}. \tag{7.35}$$

Note that according to Eq. (30), the denominator of the fraction in Eq. (35) is just the effective spring constant $\kappa_j = m_j\omega_j^2$ of the $j$-th oscillator, so that the oscillator masses $m_j$ as such are actually (and quite naturally) not involved in the static dielectric response.
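The mass independence of the static response can be checked with a few lines of Python. In this sketch (hypothetical parameter values, units with $\varepsilon_0 = nq^2 = 1$, damping omitted), rescaling $m_j \to 4m_j$ and $\omega_j \to \omega_j/2$ keeps $\kappa_j = m_j\omega_j^2$, and hence $\varepsilon(0)$, unchanged, while the response at a non-zero frequency does change.

```python
def eps_lorentz_undamped(omega, oscillators, eps0=1.0, nq2=1.0):
    """Eq. (7.33) with all delta_j = 0; oscillators is a list of (f_j, m_j, omega_j)."""
    return eps0 + nq2 * sum(f / (m * (wj**2 - omega**2)) for f, m, wj in oscillators)


def eps_static(oscillators, eps0=1.0, nq2=1.0):
    """Eq. (7.35), written via the effective spring constants kappa_j = m_j * omega_j**2."""
    return eps0 + nq2 * sum(f / (m * wj**2) for f, m, wj in oscillators)


light = [(1.0, 1.0, 2.0)]  # kappa = 1 * 2**2 = 4
heavy = [(1.0, 4.0, 1.0)]  # kappa = 4 * 1**2 = 4

print(eps_static(light), eps_static(heavy))  # 1.25 1.25
print(eps_lorentz_undamped(0.5, light), eps_lorentz_undamped(0.5, heavy))  # different values
```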


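In the opposite limit of very high frequencies, $\omega \gg \omega_j, \delta_j$, the permittivity also becomes real and may be represented as

$$\varepsilon(\omega) \approx \varepsilon_0 - \frac{nq^2}{\omega^2}\sum_j \frac{f_j}{m_j} \equiv \varepsilon_0\left(1 - \frac{\omega_p^2}{\omega^2}\right), \qquad \text{where} \quad \omega_p^2 \equiv \frac{nq^2}{\varepsilon_0}\sum_j \frac{f_j}{m_j}. \tag{7.36}$$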


15 In optics, such behavior is called anomalous dispersion.
16 See, e.g., QM Chapters 5-6. 
This result is very important because it is also valid at all frequencies if all $\omega_j$ and $\delta_j$ vanish, for example for gases of free charged particles, in particular for plasmas (ionized atomic gases), provided that the ion collision effects are negligible. (This is why the parameter $\omega_p$ defined by Eq. (36) is called the plasma frequency.) Typically, the plasma as a whole is neutral, i.e. the density $n$ of positive atomic ions is equal to that of the free electrons. Since the ratio $n_j/m_j$ for electrons is much higher than that for ions, the general formula (36) for the plasma frequency is usually well approximated by the following simple expression:
$$\omega_p^2 \approx \frac{ne^2}{\varepsilon_0 m_e}. \tag{7.37}$$


This expression has a simple physical sense: the effective spring constant $\kappa_{\rm ef} = m_e\omega_p^2 = ne^2/\varepsilon_0$ describes the Coulomb force that appears when the electron subsystem of the plasma is shifted, as a whole, from its positive-ion subsystem, thus violating the electroneutrality.¹⁷ Hence, there is no surprise that the function $\varepsilon(\omega)$ given by Eq. (36) vanishes at $\omega = \omega_p$: at this resonance frequency, the polarization electric field $E$ may oscillate, i.e. have a non-zero amplitude $E_\omega = D_\omega/\varepsilon(\omega)$, even in the absence of external forces induced by external (stand-alone) charges, i.e. in the absence of the field $D$ these charges induce (see Eq. (3.32)).


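The behavior of electromagnetic waves in a medium that obeys Eq. (36) is very remarkable. If the wave frequency $\omega$ is above $\omega_p$, the dielectric constant $\varepsilon(\omega)$, and hence the wave number (28), are positive and real, and waves propagate without attenuation, following the dispersion relation

$$k(\omega) = \omega\left[\varepsilon(\omega)\mu_0\right]^{1/2} = \frac{\omega}{c}\left(1 - \frac{\omega_p^2}{\omega^2}\right)^{1/2} = \frac{1}{c}\left(\omega^2 - \omega_p^2\right)^{1/2}, \quad \text{i.e.} \quad \omega^2 = \omega_p^2 + c^2k^2. \tag{7.38}$$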


Fig. 7.6. The plasma dispersion law (solid line) in comparison with the linear dispersion in the free space (dashed line); the horizontal axis shows $k$ in units of $\omega_p/c$.


At $\omega \to \omega_p$, the wave number $k$ tends to zero. Beyond that point (i.e. at $\omega < \omega_p$), we still can use Eq. (38), but it is instructive to rewrite it in the mathematically equivalent form


17 Indeed, let us consider such a small shift $\Delta x$, perpendicular to the plane surface of a broad, plane slab filled with plasma. The uncompensated ion charges, with equal and opposite surface densities $\sigma = \pm en\Delta x$, that appear at the slab surfaces, create inside it, according to Eq. (2.3), a uniform electric field with $E_x = en\Delta x/\varepsilon_0$. This field exerts the force $-eE_x = -(ne^2/\varepsilon_0)\Delta x \equiv -\kappa_{\rm ef}\Delta x$ on each electron, pulling it back to its equilibrium position.
$$k(\omega) = \frac{1}{c}\left(\omega^2 - \omega_p^2\right)^{1/2} \equiv \frac{i}{\delta}, \qquad \text{where} \quad \delta \equiv \frac{c}{\left(\omega_p^2 - \omega^2\right)^{1/2}}. \tag{7.39}$$


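At $\omega < \omega_p$, the so-defined parameter $\delta$ is real, and Eq. (29) shows that the electromagnetic field exponentially decreases with distance:

$$f = \mathrm{Re}\left[f_\omega e^{i(kz - \omega t)}\right] = \exp\left\{-\frac{z}{\delta}\right\}\mathrm{Re}\left[f_\omega e^{-i\omega t}\right]. \tag{7.40}$$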
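The plasma dispersion law (38)-(39) is easy to explore numerically. The Python sketch below (the function names and the value of `omega_p` are illustrative assumptions) returns $k(\omega)$ as a complex number, real above $\omega_p$ and purely imaginary, $k = i/\delta$, below it. It also checks a direct consequence of Eq. (38): since $d\omega/dk = c^2k/\omega$, the phase and group velocities satisfy $v_{\rm ph}v_{\rm gr} = c^2$, so that $v_{\rm gr} < c$ even though $v_{\rm ph} > c$.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def plasma_k(omega: float, omega_p: float) -> complex:
    """Eqs. (7.38)-(7.39): k = (omega**2 - omega_p**2)**0.5 / c, taken as i/delta below omega_p."""
    d = omega**2 - omega_p**2
    if d >= 0:
        return complex(math.sqrt(d) / C, 0.0)
    return complex(0.0, math.sqrt(-d) / C)


def decay_length(omega: float, omega_p: float) -> float:
    """delta of Eq. (7.39), meaningful for omega < omega_p."""
    return 1.0 / plasma_k(omega, omega_p).imag


omega_p = 1.0e7  # hypothetical plasma frequency, s^-1
w = 2.0e7
v_ph = w / plasma_k(w, omega_p).real
h = 1.0e3
v_gr = 2 * h / (plasma_k(w + h, omega_p).real - plasma_k(w - h, omega_p).real)  # d(omega)/dk
print(v_ph / C, v_gr / C, v_ph * v_gr / C**2)
```

At $\omega \ll \omega_p$, `decay_length` tends to $c/\omega_p$, in agreement with Eq. (43) below.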


Does this mean that the wave is being absorbed in the plasma? Answering this question is a good 
pretext to calculate the time average of the Poynting vector S = ExH of a monochromatic 
electromagnetic wave in an arbitrary dispersive (but still linear and isotropic) medium. First, let us spell 
out the real fields’ time dependences: 


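$$E(t) = \mathrm{Re}\left[E_\omega e^{-i\omega t}\right] = \frac{1}{2}\left[E_\omega e^{-i\omega t} + \text{c.c.}\right], \qquad H(t) = \mathrm{Re}\left[H_\omega e^{-i\omega t}\right] = \frac{1}{2}\left[\frac{E_\omega}{Z(\omega)}e^{-i\omega t} + \text{c.c.}\right]. \tag{7.41}$$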


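Now, a straightforward calculation yields¹⁸

$$\bar{S} \equiv \overline{E(t)H(t)} = \frac{E_\omega E_\omega^*}{4}\left[\frac{1}{Z(\omega)} + \frac{1}{Z^*(\omega)}\right] = \frac{|E_\omega|^2}{2}\,\mathrm{Re}\,\frac{1}{Z(\omega)} = \frac{|E_\omega|^2}{2}\,\mathrm{Re}\left[\frac{\varepsilon(\omega)}{\mu(\omega)}\right]^{1/2}. \tag{7.42}$$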


Let us apply this important general formula to our simple model of plasma at $\omega < \omega_p$. In this case, the magnetic permeability equals $\mu_0$, i.e. $\mu(\omega) = \mu_0$ is positive and real, while $\varepsilon(\omega)$ is real and negative, so that $1/Z(\omega) = [\varepsilon(\omega)/\mu(\omega)]^{1/2}$ is purely imaginary, and the average Poynting vector (42) vanishes. This means that the energy, on average, does not flow along the $z$-axis. So, the waves with $\omega < \omega_p$ are not absorbed in plasma. (Indeed, the Lorentz model with $\delta_j = 0$ does not describe any energy dissipation mechanism.) Instead, as we will see in the next section, the waves are rather reflected from the plasma's boundary, more exactly from its surface layer of a thickness $\sim\delta$.
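To check the algebra leading to Eq. (42), the Python sketch below compares the formula with a brute-force average of $E(t)H(t)$ over one period, using Eqs. (41). The sample amplitude and material parameters (units with $\varepsilon_0 = \mu_0 = 1$) are hypothetical; the last case mimics the collision-free plasma below $\omega_p$, with real negative $\varepsilon$ and real positive $\mu$, where the average power flow vanishes.

```python
import cmath


def average_poynting(E_omega: complex, eps: complex, mu: complex) -> float:
    """Eq. (7.42): (|E_omega|^2 / 2) * Re[(eps/mu)^(1/2)]."""
    return 0.5 * abs(E_omega) ** 2 * cmath.sqrt(eps / mu).real


def average_poynting_brute(E_omega: complex, eps: complex, mu: complex, n_samples: int = 64) -> float:
    """Average of E(t)H(t), Eqs. (7.41), over n_samples equally spaced instants of one period."""
    inv_Z = cmath.sqrt(eps / mu)  # 1/Z(omega) = [eps(omega)/mu(omega)]^(1/2)
    H_omega = E_omega * inv_Z
    total = 0.0
    for k in range(n_samples):
        phase = cmath.exp(-2j * cmath.pi * k / n_samples)  # exp(-i omega t) at t = k T / n_samples
        total += (E_omega * phase).real * (H_omega * phase).real
    return total / n_samples


cases = [(4.0, 1.0), (2.0 + 0.5j, 1.0), (-3.0, 1.0)]  # transparent, lossy, plasma-like (hypothetical)
for eps, mu in cases:
    print(eps, average_poynting(1 + 2j, eps, mu), average_poynting_brute(1 + 2j, eps, mu))
```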


Note also that in the limit $\omega \ll \omega_p$, Eq. (39) yields

$$\delta \approx \frac{c}{\omega_p} = \left(\frac{\varepsilon_0 m_e c^2}{ne^2}\right)^{1/2} \equiv \left(\frac{m_e}{\mu_0 ne^2}\right)^{1/2}. \tag{7.43}$$

But this is just a particular case (for $q = e$, $m = m_e$, and $\mu = \mu_0$) of Eq. (6.44) that was derived in Sec. 6.4 for the depth of the magnetic field's penetration into a lossless (collision-free) conductor in the quasistatic approximation. This fact shows again that, as was already discussed in Sec. 6.7, this


18 For an arbitrary plane wave, the total average power flow may be calculated as an integral of Eq. (42) over all frequencies. By the way, combining this integral and the Poynting theorem (6.111), it is straightforward to prove the following interesting expression for the average electromagnetic energy density of a narrow ($\Delta\omega \ll \omega$) wave packet propagating in an arbitrary dispersive (but linear and isotropic) medium:

$$\bar{u} = \frac{1}{4}\left[\frac{d(\omega\varepsilon')}{d\omega}E_\omega E_\omega^* + \frac{d(\omega\mu')}{d\omega}H_\omega H_\omega^*\right]_{\rm packet}.$$
approximation (in which the displacement currents are neglected) gives an adequate description of the time-dependent phenomena at $\omega \ll \omega_p$, i.e. at $\delta \ll c/\omega = 1/k = \lambda/2\pi$, where $k$ and $\lambda$ are the free-space values.¹⁹


There are two most important examples of natural plasmas. For the Earth's ionosphere, i.e. the upper part of its atmosphere, which is partly ionized by the ultraviolet and X-ray components of the Sun's radiation, the maximum value of $n$, reached about 300 km over the Earth's surface, is between $10^{10}$ and $10^{12}$ m$^{-3}$ (depending on the time of the day and the Sun's activity phase), so that the maximum plasma frequency (37) is between 1 and 10 MHz. This is much higher than the particles' typical reciprocal collision time $\tau^{-1}$, so that Eq. (38) gives a good description of wave dispersion in this plasma. The effect of reflection of electromagnetic waves with $\omega < \omega_p$ from the ionosphere enables long-range (over-the-globe) radio communications and broadcasting at the so-called short waves, with cyclic frequencies of the order of 10 MHz:²⁰ they may propagate in the flat channel formed by the Earth's surface and the ionosphere, being reflected repeatedly by these parallel "walls". Unfortunately, due to the random variations of the Sun's activity, and hence of $\omega_p$, this natural radio communication channel is not too reliable, and in our age of transworld optical-fiber cables (see Sec. 7 below), its practical importance has diminished.


Another important example of plasmas is free electrons in metals and other conductors. For a typical metal, $n$ is of the order of $10^{23}$ cm$^{-3}$ = $10^{29}$ m$^{-3}$, so that Eq. (37) yields $\omega_p \sim 10^{16}$ s$^{-1}$. This value of $\omega_p$ is somewhat higher than the mid-optical frequencies ($\omega \sim 3\times10^{15}$ s$^{-1}$), explaining why planar, clean metallic surfaces, such as the aluminum and silver films used in mirrors, are so shiny: at these frequencies, their complex permittivity $\varepsilon(\omega)$ is almost exactly real and negative, leading to light reflection, with very little absorption.
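The order-of-magnitude estimates above follow directly from Eq. (37). In the Python sketch below, the physical constants are CODATA 2018 values in SI units, the electron densities are the representative values quoted in the text, and the function name is illustrative. It also evaluates the penetration depth $\delta \approx c/\omega_p$ of Eq. (43) for each density.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
M_E = 9.1093837015e-31      # electron mass, kg
EPS0 = 8.8541878128e-12     # electric constant, F/m
C = 299_792_458.0           # speed of light, m/s


def plasma_frequency(n: float) -> float:
    """Eq. (7.37): omega_p = (n e^2 / (eps0 m_e))^(1/2) in s^-1, for n in m^-3."""
    return math.sqrt(n * E_CHARGE**2 / (EPS0 * M_E))


for label, n in [("ionosphere, low", 1e10), ("ionosphere, high", 1e12), ("typical metal", 1e29)]:
    w_p = plasma_frequency(n)
    print(f"{label}: omega_p = {w_p:.3e} s^-1, f_p = {w_p / (2 * math.pi):.3e} Hz, c/omega_p = {C / w_p:.3e} m")
```

For the metal, $c/\omega_p$ comes out in the range of tens of nanometers, i.e. much smaller than optical wavelengths.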


The simple model (36), which neglects electron scattering, becomes inadequate at lower frequencies, $\omega\tau \sim 1$. A good phenomenological way of extending the model to account for scattering is to take, in Eq. (33), the lowest frequency $\omega_0$ equal to zero (to describe the free electrons), while keeping the damping coefficient $\delta_0$ of this mode larger than zero, to account for the energy dissipation due to their scattering. Then Eq. (33) is reduced to


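$$\varepsilon(\omega) = \varepsilon_{\rm opt}(\omega) + \frac{nq^2}{m}\,\frac{1}{-\omega^2 - 2i\omega\delta_0} \equiv \varepsilon_{\rm opt}(\omega) + i\,\frac{nq^2}{2\delta_0 m\omega}\,\frac{1}{1 - i\omega/2\delta_0}, \tag{7.44}$$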


where the response $\varepsilon_{\rm opt}(\omega)$ at high (in practice, optical) frequencies is still given by Eq. (33), but now with the sum taken over $j > 0$ only. The result (44) allows for a simple interpretation. To show that, let us incorporate into our calculations the Ohmic conduction of the medium, generalizing Eq. (4.7) as $j_\omega = \sigma(\omega)E_\omega$, to account for the possible frequency dependence of the Ohmic conductivity. Plugging this relation into the Fourier image of the relevant macroscopic Maxwell equation, $\nabla\times\mathbf{H}_\omega = \mathbf{j}_\omega - i\omega\mathbf{D}_\omega = \mathbf{j}_\omega - i\omega\varepsilon(\omega)\mathbf{E}_\omega$, we get


$$\nabla\times\mathbf{H}_\omega = \left[\sigma(\omega) - i\omega\varepsilon(\omega)\right]\mathbf{E}_\omega. \tag{7.45}$$


19 One more convenience of the simple model of a collision-free plasma, which has led us to Eq. (36), is that it may be readily generalized to the case of an additional strong dc magnetic field $\mathbf{B}_0$ (much higher than that of the wave) applied in the direction $\mathbf{n}$ of wave propagation. It is straightforward (and hence left for the reader) to show that such plasma exhibits the Faraday effect of the polarization plane's rotation, and hence gives an example of an anisotropic medium that violates the Lorentz reciprocity relation (6.121).

20 These frequencies are an order of magnitude lower than those used for TV and FM-radio broadcasting. 
This relation shows that for a monochromatic wave, the addition of the Ohmic current density $j_\omega$ to the displacement current density is equivalent to the addition of $\sigma(\omega)$ to $-i\omega\varepsilon(\omega)$, i.e. to the following change of the ac electric permittivity:²¹

$$\varepsilon(\omega) \to \varepsilon_{\rm ef}(\omega) \equiv \varepsilon_{\rm opt}(\omega) + i\,\frac{\sigma(\omega)}{\omega}. \tag{7.46}$$


Now the comparison of Eqs. (44) and (46) shows that they coincide if we take

$$\sigma(\omega) = \frac{nq^2\tau}{m_e}\,\frac{1}{1 - i\omega\tau} = \sigma(0)\,\frac{1}{1 - i\omega\tau}, \tag{7.47}$$

where the dc conductivity $\sigma(0)$ is described by the Drude formula (4.13), and the phenomenologically introduced coefficient $\delta_0$ is associated with $1/2\tau$. Eq. (47), which is frequently called the generalized (or "ac", or "rf") Drude formula,²² gives a very reasonable (semi-quantitative) description of the ac conductivity of many metals almost up to optical frequencies.
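The equivalence of Eq. (44) with Eqs. (46)-(47) can be confirmed numerically. In the Python sketch below (units with $nq^2/m = 1$; the value of $\delta_0$ and the test frequencies are arbitrary assumptions), the free-electron term of Eq. (44) is compared with $i\sigma(\omega)/\omega$, where $\sigma(\omega)$ is given by the generalized Drude formula with $\tau = 1/2\delta_0$.

```python
def free_electron_term(omega: float, nq2_over_m: float, delta0: float) -> complex:
    """The omega_0 = 0 term of Eq. (7.44): (nq^2/m) / (-omega^2 - 2i omega delta0)."""
    return nq2_over_m / (-omega**2 - 2j * omega * delta0)


def drude_sigma(omega: float, nq2_over_m: float, tau: float) -> complex:
    """Generalized Drude formula, Eq. (7.47): sigma(0) / (1 - i omega tau)."""
    return nq2_over_m * tau / (1 - 1j * omega * tau)


delta0 = 0.5
tau = 1 / (2 * delta0)
for w in (0.1, 1.0, 10.0):
    lhs = free_electron_term(w, 1.0, delta0)
    rhs = 1j * drude_sigma(w, 1.0, tau) / w  # the conductivity contribution in Eq. (7.46)
    print(w, lhs, rhs)
```

At $\omega\tau \gg 1$, $\sigma(\omega)$ becomes nearly imaginary (reactive), which is how the model recovers the collision-free plasma response (36) at high frequencies.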


Now returning to our discussion of the generalized Lorentz model (33), we see that the frequency dependences of the real ($\varepsilon'$) and imaginary ($\varepsilon''$) parts of the complex permittivity it yields are not quite independent. For example, let us have one more look at the resonance peaks in Fig. 5. Each time the real part drops with frequency, $d\varepsilon'/d\omega < 0$, its imaginary part $\varepsilon''$ has a positive peak. Ralph Kronig (in 1926) and Hendrik ("Hans") Kramers (in 1927) independently showed that this is not an accidental coincidence pertinent only to this particular model. Moreover, the full knowledge of the function $\varepsilon'(\omega)$ enables the calculation of the function $\varepsilon''(\omega)$, and vice versa. The mathematical reason for this fact is that both these functions are always related to a single real function $G(\theta)$; see Eqs. (27).

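To derive these relations, let us consider Eq. (26b) on the complex frequency plane, $\omega \to \omega = \omega' + i\omega''$:

$$f(\omega) \equiv \varepsilon(\omega) - \varepsilon_0 = \int_0^\infty G(\theta)\, e^{i\omega\theta}\, d\theta = \int_0^\infty G(\theta)\, e^{i\omega'\theta}\, e^{-\omega''\theta}\, d\theta. \tag{7.48}$$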


For all stable physical systems, $G(\theta)$ has to be finite for all important values of the real integration variable ($\theta > 0$), and tend to zero at $\theta \to 0$ and $\theta \to \infty$. (Indeed, according to Eq. (23), a non-zero $G(0)$ would mean an instantaneous response of the medium to the external force, while $G(\infty) \neq 0$ would mean that it has an infinitely long memory.) Because of that, and thanks to the factor $e^{-\omega''\theta}$, the expression under the integral in Eq. (48) tends to zero at $|\omega| \to \infty$ in the whole upper half-plane ($\omega'' \geq 0$). As a result, we may claim that the complex function $f(\omega)$ given by this relation is analytic in that half-plane. This fact allows us to apply to it the general Cauchy integral formula²³


$$f(\omega) = \frac{1}{2\pi i}\oint_C f(\Omega)\,\frac{d\Omega}{\Omega - \omega}, \tag{7.49}$$


21 Alternatively, according to Eq. (45), it is possible (and in the field of infrared spectroscopy, conventional) to attribute the ac response of a medium at all frequencies to its effective complex conductivity: $\sigma_{\rm ef}(\omega) \equiv \sigma(\omega) - i\omega\varepsilon(\omega) \equiv -i\omega\varepsilon_{\rm ef}(\omega)$.

22 It may be also derived from the Boltzmann kinetic equation in the so-called relaxation-time approximation 
(RTA) — see, e.g., SM Sec. 6.2. 

23 See, e.g., MA Eq. (15.2). 
where $\Omega = \Omega' + i\Omega''$ is also a complex variable. Let us take the integration contour $C$ of the form shown in Fig. 7, with the radius $R$ of the larger semicircle tending to infinity, and the radius $r$ of the smaller semicircle (around the singular point $\Omega = \omega$) tending to zero. Due to the exponential decay of $|f(\Omega)|$ at $|\Omega| \to \infty$, the contribution to the right-hand side of Eq. (49) from the larger semicircle vanishes,²⁴ while the contribution from the small semicircle, where $\Omega = \omega + r\exp\{i\varphi\}$, with $-\pi \leq \varphi \leq 0$, is


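$$\lim_{r\to 0}\frac{1}{2\pi i}\int_{\text{small semicircle}} f(\Omega)\,\frac{d\Omega}{\Omega - \omega} = \frac{f(\omega)}{2\pi i}\lim_{r\to 0}\int_{-\pi}^{0}\frac{ir\exp\{i\varphi\}\,d\varphi}{r\exp\{i\varphi\}} = \frac{f(\omega)}{2\pi}\int_{-\pi}^{0}d\varphi = \frac{1}{2}f(\omega). \tag{7.50}$$

Fig. 7.7. Deriving the Kramers-Kronig dispersion relations: the integration contour $C$ on the complex plane ($\Omega'$, $\Omega''$), formed by the real axis, the large semicircle $|\Omega| = R \to \infty$, and the small semicircle of radius $r \to 0$ around the point $\Omega = \omega$.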


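As a result, for our contour $C$, Eq. (49) yields

$$f(\omega) = \lim_{r\to 0}\frac{1}{2\pi i}\left(\int_{-\infty}^{\omega - r} + \int_{\omega + r}^{+\infty}\right) f(\Omega')\,\frac{d\Omega'}{\Omega' - \omega} + \frac{1}{2}f(\omega), \tag{7.51}$$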


where $\Omega = \Omega'$ on the real axis (where $\Omega'' = 0$). Such an integral, excluding a symmetric infinitesimal vicinity of a pole singularity, is called the principal value of the (formally, diverging) integral from $-\infty$ to $+\infty$, and is denoted by the letter P before it.²⁵ Using this notation, subtracting $f(\omega)/2$ from both parts of Eq. (51), and multiplying them by 2, we get


$$f(\omega) = \frac{1}{\pi i}\,\mathrm{P}\int_{-\infty}^{+\infty} f(\Omega)\,\frac{d\Omega}{\Omega - \omega}. \tag{7.52}$$

Now plugging into this complex equality the polarization-related difference $f(\omega) = \varepsilon(\omega) - \varepsilon_0$ in the form $[\varepsilon'(\omega) - \varepsilon_0] + i[\varepsilon''(\omega)]$, and requiring both real and imaginary components of the two sides of Eq. (52) to be equal separately, we get the famous Kramers-Kronig dispersion relations


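$$\varepsilon'(\omega) = \varepsilon_0 + \frac{1}{\pi}\,\mathrm{P}\int_{-\infty}^{+\infty}\varepsilon''(\Omega)\,\frac{d\Omega}{\Omega - \omega}, \qquad \varepsilon''(\omega) = -\frac{1}{\pi}\,\mathrm{P}\int_{-\infty}^{+\infty}\left[\varepsilon'(\Omega) - \varepsilon_0\right]\frac{d\Omega}{\Omega - \omega}. \tag{7.53}$$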


We may use the already mentioned fact that $\varepsilon'(\omega)$ is always an even function, while $\varepsilon''(\omega)$ is an odd function of frequency, to rewrite these relations in the following equivalent form,


24 Strictly speaking, this also requires $|f(\Omega)|$ to decrease faster than $\Omega^{-1}$ at the real axis (at $\Omega'' = 0$), but due to the inertia of charged particles, this requirement is fulfilled for all realistic models of dispersion; see, e.g., Eq. (36).
25 I am typesetting this symbol in a Roman (upright) font, to avoid any possibility of confusion with the medium's polarization.
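$$\varepsilon'(\omega) = \varepsilon_0 + \frac{2}{\pi}\,\mathrm{P}\int_0^{+\infty}\varepsilon''(\Omega)\,\frac{\Omega\,d\Omega}{\Omega^2 - \omega^2}, \qquad \varepsilon''(\omega) = -\frac{2\omega}{\pi}\,\mathrm{P}\int_0^{+\infty}\left[\varepsilon'(\Omega) - \varepsilon_0\right]\frac{d\Omega}{\Omega^2 - \omega^2}, \tag{7.54}$$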


which is more convenient for most applications, because it involves only physical (positive) frequencies. 
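The first of Eqs. (54) can be tested numerically on the single-oscillator permittivity (32). In the Python sketch below (units with $\varepsilon_0 = nq^2/m = 1$, and hypothetical $\omega_0 = 1$, $\delta_0 = 0.2$), the principal value is handled by subtracting the pole, using the identity $\mathrm{P}\int_0^\infty d\Omega/(\Omega^2 - \omega^2) = 0$ for $\omega > 0$; the integral is then truncated at a finite $\Omega_{\max}$, with the slowly decaying part of the tail added analytically. The reconstructed $\varepsilon'(\omega)$ is compared with the exact real part of Eq. (32).

```python
import math

EPS0 = 1.0            # units with eps0 = 1 (assumption)
W0, DELTA = 1.0, 0.2  # hypothetical oscillator frequency and damping; nq^2/m = 1 in Eq. (7.32)


def eps_lorentz(w: float) -> complex:
    """Single-oscillator permittivity, Eq. (7.32)."""
    return EPS0 + 1.0 / (W0**2 - w**2 - 2j * w * DELTA)


def eps_real_from_kk(w: float, w_max: float = 60.0, h: float = 1e-3) -> float:
    """eps'(w) from eps''(W) via the first of Eqs. (7.54), for w > 0.

    Since P-integral_0^inf dW / (W^2 - w^2) = 0, the principal value equals the regular
    integral of [g(W) - g(w)] / (W^2 - w^2) over (0, inf), with g(W) = eps''(W) * W.
    """
    g_w = eps_lorentz(w).imag * w
    n = int(round(w_max / h))
    total = 0.0
    for j in range(n):
        W = (j + 0.5) * h  # midpoint rule on [0, w_max]
        total += (eps_lorentz(W).imag * W - g_w) / (W * W - w * w)
    total *= h
    # Tail W > w_max: the g(W) part decays as W**-4 and is dropped; the -g(w) part is exact.
    total -= g_w * math.log((w_max + w) / (w_max - w)) / (2 * w)
    return EPS0 + 2.0 / math.pi * total


for w in (0.5, 1.0, 1.5):
    print(w, eps_real_from_kk(w), eps_lorentz(w).real)
```

Subtracting $g(\omega)$ turns the singular integrand into a smooth one, so a plain midpoint rule suffices; a grid point could only hit $\Omega = \omega$ exactly if $\omega/h$ were a half-integer, which the sample frequencies avoid.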


Though the Kramers-Kronig relations are "global" in frequency, in certain cases they allow an approximate calculation of dispersion from experimental data for absorption, collected even within a limited ("local") frequency range. Most importantly, if a medium has a sharp absorption peak at some frequency $\omega_j$, we may describe it as

$$\varepsilon''(\omega) \approx c\,\delta(\omega - \omega_j) + \text{a more smooth function of } \omega, \tag{7.55}$$

and the first of Eqs. (54) immediately gives


$$\varepsilon'(\omega) \approx \varepsilon_0 + \frac{2c}{\pi}\,\frac{\omega_j}{\omega_j^2 - \omega^2} + \text{another smooth function of } \omega, \tag{7.56}$$


thus predicting the anomalous dispersion near such a point. This calculation shows that such behavior observed in the Lorentz oscillator model (see Fig. 5) is by no means accidental or model-specific.
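As a consistency check (a short derivation added here for illustration, not part of the original argument), Eq. (56) may be compared with the single-oscillator result (32) in the limit $\delta_0 \to 0$. Near the resonance, the imaginary part of Eq. (32) becomes a narrow Lorentzian peak whose area fixes the constant $c$ of Eq. (55):

$$\varepsilon''(\omega) = \frac{nq^2}{m}\,\frac{2\omega\delta_0}{\left(\omega_0^2 - \omega^2\right)^2 + 4\omega^2\delta_0^2} \approx \frac{nq^2}{2m\omega_0}\,\frac{\delta_0}{(\omega - \omega_0)^2 + \delta_0^2} \;\xrightarrow[\delta_0 \to 0]{}\; \frac{\pi nq^2}{2m\omega_0}\,\delta(\omega - \omega_0), \qquad \text{so} \quad c = \frac{\pi nq^2}{2m\omega_0},$$

$$\varepsilon'(\omega) \approx \varepsilon_0 + \frac{2c}{\pi}\,\frac{\omega_0}{\omega_0^2 - \omega^2} = \varepsilon_0 + \frac{nq^2}{m}\,\frac{1}{\omega_0^2 - \omega^2},$$

which is indeed the $\delta_0 \to 0$ limit of the real part of Eq. (32).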


Let me emphasize again that the Kramers-Kronig relations (53)-(54) are much more general than the Lorentz model (33), and require only a causal linear relation (21) between the polarization $P(t)$ and the electric field $E(t')$.²⁶ Hence, these relations are also valid for the complex functions relating Fourier images of any cause/effect-related pair of variables. In particular, at a measurement of any linear response $r(t)$ of any experimental sample to any external field $f(t')$, whatever the nature of this response and physics behind it, we may be confident that there is a causal relationship between the variables $r$ and $f$, so that the corresponding complex function $r(\omega) \equiv r_\omega/f_\omega$ does obey the Kramers-Kronig relations. However, it is still important to remember that a linear relationship between the Fourier amplitudes of two variables does not necessarily imply a causal relationship between them.²⁷


7.3. Reflection 


The most important new effect arising in nonuniform media is wave reflection. Let us start its 
discussion from the simplest case of a plane electromagnetic wave that is normally incident on a sharp 
interface between two uniform, linear, isotropic media. 


Moreover, let us first consider an even simpler sub-case when one of the two media (say, that 
located at z > 0, see Fig. 8) cannot sustain any electric field at all — as implied, in particular, by the 
macroscopic model of a perfect conductor — see Eq. (2.1): 


26 Actually, in mathematics, relations even somewhat more general than Eqs. (53), valid for an arbitrary analytic function of complex argument, have been known at least since 1868: the Sokhotski-Plemelj theorem.

27 For example, the function $g(\omega) = E_\omega/P_\omega$ in the Lorentz oscillator model does not obey the Kramers-Kronig relations. This is evident not only physically, from the fact that $E(t)$ is not a causal function of $P(t)$, but even mathematically. Indeed, Green's function describing a causal relationship has to tend to zero at small time delays $\theta \equiv t - t'$, so that its Fourier image has to tend to zero at $\omega \to \pm\infty$. This is certainly true for the function $f(\omega)$ given by Eq. (32), but not for the reciprocal function $g(\omega) = 1/f(\omega) \propto (\omega_0^2 - \omega^2) - 2i\omega\delta_0$, which diverges at large frequencies.
$$E\big|_{z=0} = 0. \tag{7.57}$$


This condition is evidently incompatible with the single traveling wave (5). However, this solution may 
be readily generalized using the fact that the dispersion-free 1D wave equation, 


$$\frac{\partial^2 f}{\partial z^2} - \frac{1}{v^2}\frac{\partial^2 f}{\partial t^2} = 0, \tag{7.58}$$
supports waves propagating, with the same speed, in either of two opposite directions. As a result, the following linear superposition of two such waves,

$$E = f(z - vt) - f(-z - vt), \qquad \text{for } z \leq 0, \tag{7.59}$$


satisfies both the equation and the boundary condition (57), for an arbitrary function $f$. The second term on the right-hand side of Eq. (59) may be interpreted as a result of total reflection of the incident wave (described by its first term), in this particular case with the change of the electric field's sign. This means, in particular, that within the macroscopic model, a conductor acts as a perfect mirror. By the way, since the vector $\mathbf{n}$ of the reflected wave is opposite to that of the incident one (see the arrows in Fig. 8), Eq. (6) shows that the magnetic field of the wave does not change its sign at the reflection:


$$Z H\big|_{z\le 0} = f(z - vt) + f(-z - vt). \qquad (7.60)$$


Fig. 7.8. A snapshot of the electric field at the 
reflection of a sinusoidal wave from a perfect 
conductor: a realistic pattern (red lines) and its 
macroscopic, ideal-mirror approximation (blue 
lines). Dashed lines show the snapshots after a 
half-period time delay (ωΔt = π).


The blue lines in Fig. 8 show the resulting pattern (59) for the simplest, monochromatic wave: 


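$$E\big|_{z\le 0} = \mathrm{Re}\Big[\underbrace{E_\omega e^{i(kz-\omega t)}}_{\text{incident wave}} - \underbrace{E_\omega e^{i(-kz-\omega t)}}_{\text{reflected wave}}\Big]. \qquad (7.61a)$$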


Depending on convenience in a particular context, this pattern may be legitimately represented and 
interpreted either as the linear superposition (61a) of two traveling waves or as a single standing wave: 


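$$E\big|_{z\le 0} = \mathrm{Re}\left[E_\omega\left(e^{ikz} - e^{-ikz}\right)e^{-i\omega t}\right] = -2\,\mathrm{Im}\left(E_\omega e^{-i\omega t}\right)\sin kz \equiv 2\,\mathrm{Re}\left(iE_\omega e^{-i\omega t}\right)\sin kz, \qquad (7.61b)$$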


in which the electric and magnetic fields oscillate with phase shifts of π/2, both in time and in space:

$$H\big|_{z\le 0} = \mathrm{Re}\left[\frac{E_\omega}{Z}e^{i(kz-\omega t)} + \frac{E_\omega}{Z}e^{i(-kz-\omega t)}\right] = 2\,\mathrm{Re}\left[\frac{E_\omega}{Z}e^{-i\omega t}\right]\cos kz. \qquad (7.62)$$


As a result of this shift, the time average of the Poynting vector's magnitude,

$$S(z,t) = EH = \frac{1}{Z}\,\mathrm{Re}\left[iE_\omega^2 e^{-2i\omega t}\right]\sin 2kz, \qquad (7.63)$$


equals zero, showing that at the total reflection there is no average power flow. (This is natural because 
the perfect mirror can neither transmit the wave nor absorb it.) However, Eq. (63) shows that the 
standing wave features local oscillations of energy, transferring it periodically between the 
concentrations of the electric and magnetic fields, separated by the distance Δz = π/2k = λ/4.
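A quick numerical check of Eqs. (61b)-(63) may be helpful here. The Python sketch below uses arbitrary illustrative values of Z, k, ω, and of the complex amplitude $E_\omega$ (none of them come from the text). It confirms that the product EH of the fields (61b) and (62) reproduces Eq. (63), and it averages that product over one period.

```python
import cmath
import math

# Illustrative (assumed) parameters, not taken from the text
Z = 376.73                      # wave impedance, ohms
k = 2 * math.pi                 # wave number for a 1 m wavelength, 1/m
omega = 1.0                     # angular frequency (arbitrary units)
E_w = 2.0 * cmath.exp(0.3j)     # complex amplitude E_omega of the incident wave

def E_field(z, t):
    """Eq. (7.61b): E = 2 Re(i E_w e^{-i w t}) sin kz, valid at z <= 0."""
    return 2 * (1j * E_w * cmath.exp(-1j * omega * t)).real * math.sin(k * z)

def H_field(z, t):
    """Eq. (7.62): H = 2 Re[(E_w / Z) e^{-i w t}] cos kz, valid at z <= 0."""
    return 2 * (E_w / Z * cmath.exp(-1j * omega * t)).real * math.cos(k * z)

def S_formula(z, t):
    """Eq. (7.63): S = (1/Z) Re[i E_w^2 e^{-2 i w t}] sin 2kz."""
    return (1j * E_w**2 * cmath.exp(-2j * omega * t)).real * math.sin(2 * k * z) / Z

def time_average_S(z, n=1000):
    """Average of E*H over one period, using n equally spaced time samples."""
    period = 2 * math.pi / omega
    return sum(E_field(z, i * period / n) * H_field(z, i * period / n) for i in range(n)) / n

for z in (-0.05, -0.125, -0.3):
    print(f"z = {z:+.3f} m: <S> = {time_average_S(z):.2e} W/m^2")
```

The averaged quantity oscillates at frequency 2ω, so uniform sampling over a full period gives zero up to rounding. The instantaneous S, however, alternates in sign on the λ/4 scale discussed above.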


In the case of the sinusoidal waves, the reflection effects may be readily explored even for the 
more general case of dispersive and/or lossy (but still linear) media in which ε(ω) and μ(ω), and hence
the wave vector k(@) and the wave impedance Z(@), defined by Eqs. (28), are certain complex functions 
of frequency. The “only” new factor we have to account for is that in this case, the reflection may not be 
total, so that inside the second medium we have to use the traveling-wave solution as well. This factor 
may be taken care of by looking for the solution to our boundary problem in the form 


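$$E\big|_{z\le 0} = \mathrm{Re}\left[E_\omega\left(e^{ik_-z} + Re^{-ik_-z}\right)e^{-i\omega t}\right], \qquad E\big|_{z\ge 0} = \mathrm{Re}\left[E_\omega Te^{ik_+z}e^{-i\omega t}\right], \qquad (7.64)$$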


and hence, according to Eq. (6), 

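$$H\big|_{z\le 0} = \mathrm{Re}\left[\frac{E_\omega}{Z_-(\omega)}\left(e^{ik_-z} - Re^{-ik_-z}\right)e^{-i\omega t}\right], \qquad H\big|_{z\ge 0} = \mathrm{Re}\left[\frac{E_\omega}{Z_+(\omega)}Te^{ik_+z}e^{-i\omega t}\right]. \qquad (7.65)$$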


(The indices + and − correspond to the media located at z > 0 and z < 0, respectively.) Please note the
following important features of Eqs. (64)-(65): 




(i) They satisfy the Maxwell equations in both media. (Historically, the fact that at z > 0 these 
solutions do not include any components proportional to $\exp\{ik_-z\}$ looked surprising and was called the 
wave extinction paradox.)


(ii) Due to the problem's linearity, we could (and did :-) take the complex amplitudes of the 
reflected and transmitted waves proportional to that ($E_\omega$) of the incident wave, while scaling them with 
dimensionless, generally complex coefficients R and T. As a comparison of Eqs. (64)-(65) with Eqs. 
(61)-(62) shows, the total reflection from an ideal mirror corresponds to R = −1 and T = 0.


(iii) Since in our current problem the incident wave arrives from one side only (from z = −∞), 
there is no need to include a term proportional to $\exp\{-ik_+z\}$ into Eqs. (64)-(65), even though this term 
is also a legitimate solution of our wave equation. However, we would need to add such a term if the 
medium at z > 0 were nonuniform (e.g., had at least one more interface or any other inhomogeneity), 
because the wave reflected from that additional inhomogeneity would be incident on our interface 
(located at z = 0) from the right.


(iv) Eqs. (64)-(65) may be used even to describe the cases when waves cannot 
propagate in the medium at z > 0, for example, a conductor or a plasma with $\omega_p > \omega$. Indeed, the exponential drop of 
the field amplitude at z > 0 in such cases is automatically described by the imaginary part of the wave 
number $k_+$ (see Eq. (29)).
In order to calculate the coefficients R and T, we need to use boundary conditions at z = 0. Since 
in our current case of normal incidence the reflection does not change the transverse character of the 
partial waves, both vectors E and H remain tangential to the interface plane (in our notation, z = 0). 
Reviewing the arguments that have led us, in statics, to the boundary conditions (3.37) and (5.117) for 
these components, we see that they remain valid for the time-dependent situation as well,28 so that for 
our current case of normal incidence we may write:


$$E\big|_{z=-0} = E\big|_{z=+0}, \qquad H\big|_{z=-0} = H\big|_{z=+0}. \qquad (7.66)$$

Plugging Eqs. (64)-(65) into these conditions, we readily get two equations for the coefficients R and T:

$$1 + R = T, \qquad \frac{1}{Z_-}(1 - R) = \frac{1}{Z_+}T. \qquad (7.67)$$


Solving this simple system of linear equations, we get29

$$R = \frac{Z_+ - Z_-}{Z_+ + Z_-}, \qquad T = \frac{2Z_+}{Z_+ + Z_-}. \qquad (7.68)$$
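Eqs. (68) are simple enough to verify directly. The Python sketch below computes R and T, checks both boundary conditions (67), and recovers the ideal-mirror limit. It uses an arbitrarily assumed complex impedance $Z_+ = (100 - 50i)\ \Omega$ (written 100 - 50j in Python) as a stand-in for a lossy medium.

```python
def normal_incidence_coefficients(Z_minus: complex, Z_plus: complex) -> tuple[complex, complex]:
    """Eq. (7.68): reflection and transmission coefficients for a wave incident from z < 0."""
    R = (Z_plus - Z_minus) / (Z_plus + Z_minus)
    T = 2 * Z_plus / (Z_plus + Z_minus)
    return R, T

Z0 = 376.73                              # free-space impedance, ohms (rounded)
Z_medium = 100 - 50j                     # assumed complex impedance of a lossy medium
R, T = normal_incidence_coefficients(Z0, Z_medium)

# Both boundary conditions (7.67) should hold to rounding accuracy
print(abs(1 + R - T), abs((1 - R) / Z0 - T / Z_medium))
# An ideal mirror (Z_+ = 0) gives R = -1 and T = 0, as in Eq. (7.61a)
print(normal_incidence_coefficients(Z0, 0))
```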


These formulas are very important, and much more general than one might think, because they 
are applicable to virtually any 1D waves, electromagnetic or not, provided that the impedance Z is 
defined properly.30 Since in the general case the wave impedances $Z_\pm$, defined by Eq. (28) with the 
corresponding indices, are complex functions of frequency, Eqs. (68) show that R and T may have 
imaginary parts as well. This fact has important consequences at z < 0, where the reflected wave, 
proportional to R, combines ("interferes") with the incident wave. Indeed, with $R = |R|e^{i\varphi}$ (where $\varphi \equiv$ 
arg R is a real phase shift), the expression in the parentheses in the first of Eqs. (64) becomes


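$$e^{ik_-z} + Re^{-ik_-z} = (1 - |R|)e^{ik_-z} + |R|\left(e^{ik_-z} + e^{i\varphi}e^{-ik_-z}\right) = (1 - |R|)e^{ik_-z} + 2|R|e^{i\varphi/2}\sin[k_-(z - \delta_-)], \quad \text{where } \delta_- = \frac{\varphi - \pi}{2k_-}. \qquad (7.69)$$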
This means that the field may be represented as a sum of a traveling wave and a standing wave, the latter with an 
amplitude proportional to |R| and shifted by the distance $\delta_-$ toward the interface, relative to the ideal-mirror 
pattern (61b) (see Fig. 8). This effect is frequently used for experimental measurements of an 
unknown impedance $Z_+$ of some medium, provided that $Z_-$ is known; most often, it is the free space, 
where $Z_- = Z_0$. For that, a small antenna (the probe), not disturbing the fields' distribution too much, is 
placed into the wave field, and the amplitude of the ac voltage induced in it by the wave is measured 
with a detector (e.g., a semiconductor diode with a nearly-quadratic I-V curve), as a function of z (Fig.


28 For example, the first of Eqs. (66) may be obtained by integrating the full (time-dependent) Maxwell equation 
∇×E + ∂B/∂t = 0 over a narrow and long rectangular contour with dimensions l and d (d << l) stretched along the 
interface. At the application of the Stokes theorem to this integral, the first term gives $l\,\Delta E_\tau$, where $\Delta E_\tau$ is the 
jump of the tangential component of E across the interface, while the contribution of the second term is proportional 
to the product ld, so that at d/l → 0 it is negligible. The proof of the second boundary condition is similar, as was 
already discussed in Sec. 6.2.

29 Please note that only the media impedances (rather than their wave velocities) are important for reflection in 
this case! Unfortunately, this fact is not clearly emphasized in some textbooks that discuss only the case μ = μ0, 
when $Z = (\mu/\varepsilon)^{1/2}$ and $v = 1/(\varepsilon\mu)^{1/2}$ are proportional to each other.

30 See, e.g., the discussion of elastic waves of mechanical deformations in CM Secs. 6.3, 6.4, 7.7, and 7.8. 
9). From the results of such a measurement, it is straightforward to find both |R| and $\delta_-$, and hence 
restore the complex R, and then use Eq. (68) to calculate both the modulus and the argument of $Z_+$. 
(Before computers became ubiquitous, a specially lined paper called the Smith chart had been 
frequently used for performing this recalculation graphically; even nowadays, it is still used for result 
presentation.)
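The measurement procedure described above can be emulated numerically. The following sketch is only an illustration, and it rests on several assumptions:

- It generates a synthetic probe reading from Eqs. (64) and (69) for an assumed impedance and multiplies it by an arbitrary detector gain.
- It assumes the reading has already been converted to field amplitude. A square-law detector would require taking a square root first.

The sketch then extracts |R| from the ratio of the maximum and minimum readings. It gets the phase φ from the position of the minimum, which by Eq. (69) sits at $z = \delta_-$ (modulo π/k). Finally, it inverts Eq. (68) as $Z_+ = Z_-(1 + R)/(1 - R)$. The function names are hypothetical.

```python
import cmath
import math

def probe_signal(z, R, k):
    """Relative field amplitude |E(z)/E_w| at z <= 0, from Eqs. (7.64) and (7.69)."""
    return abs(cmath.exp(1j * k * z) + R * cmath.exp(-1j * k * z))

def recover_impedance(samples, k, Z_minus):
    """Hypothetical helper: estimate Z_+ from (z, reading) pairs covering one half-wavelength.

    Only ratios of readings and the position of the minimum are used, so the
    unknown detector gain cancels out.
    """
    z_min, a_min = min(samples, key=lambda s: s[1])
    a_max = max(a for _, a in samples)
    R_abs = (a_max - a_min) / (a_max + a_min)   # a_max ~ 1 + |R|, a_min ~ 1 - |R|
    phi = math.pi + 2 * k * z_min               # minima sit at z = delta_- = (phi - pi)/2k (mod pi/k)
    R = R_abs * cmath.exp(1j * phi)
    return Z_minus * (1 + R) / (1 - R)          # Eq. (7.68) solved for Z_+

# Synthetic measurement, all values assumed: lambda_0 = 1 m, Z_- = Z_0, Z_+ = (100 - 50i) ohm
k = 2 * math.pi
Z0 = 376.73
Z_true = 100 - 50j
R_true = (Z_true - Z0) / (Z_true + Z0)
N = 100_000
gain = 3.7                                       # arbitrary detector gain
samples = [(z, gain * probe_signal(z, R_true, k))
           for z in (-math.pi / k + i * math.pi / (k * N) for i in range(N))]
print("recovered Z_+ =", recover_impedance(samples, k, Z0))
```

The accuracy of this crude grid search is limited by the sampling step in z. A real measurement would also have to deal with probe perturbation and detector nonlinearity.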




Now let us discuss what these results give for waves incident from the free space ($Z_-(\omega) = Z_0 =$ 
const, $k_- = k_0 = \omega/c$) onto the surfaces of two particularly important media.


Fig. 7.9. Measurement of the complex 
impedance of a medium (schematically).


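(i) For a collision-free plasma (with negligible magnetization), we may use Eq. (36) with μ(ω) = 
μ0 to represent the impedance (28) in either of two equivalent forms:

$$Z_+ = Z_0\frac{\omega}{(\omega^2 - \omega_p^2)^{1/2}} \equiv -iZ_0\frac{\omega}{(\omega_p^2 - \omega^2)^{1/2}}. \qquad (7.70)$$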


The first of these forms is more convenient in the case $\omega > \omega_p$, when the wave vector $k_+$ and the wave 
impedance $Z_+$ of the plasma are real, so that a part of the incident wave does propagate into it. Plugging 
this expression into the latter of Eqs. (68), we see that T is real as well:


$$T = \frac{2\omega}{\omega + (\omega^2 - \omega_p^2)^{1/2}}. \qquad (7.71)$$


Note that according to this formula, and somewhat counter-intuitively, T > 1 at any frequency above 
$\omega_p$, inviting the question: how can the transmitted wave be more intensive than the incident one that has 
induced it? To answer this question, we need to compare the powers (rather than the electric field 
amplitudes) of these two waves, i.e. their average Poynting vectors (42):

$$\bar S_{\text{incident}} = \frac{|E_\omega|^2}{2Z_0}, \qquad \bar S_{\text{transmitted}} = \frac{|TE_\omega|^2}{2Z_+} = \frac{|E_\omega|^2}{2Z_0}\,\frac{4\omega(\omega^2 - \omega_p^2)^{1/2}}{\left[\omega + (\omega^2 - \omega_p^2)^{1/2}\right]^2}. \qquad (7.72)$$


The ratio of these two values31 is always below 1 (and tends to zero at $\omega \to \omega_p$), so that only a fraction 
of the incident wave power may be transmitted. Hence the result T > 1 may be interpreted as follows: an 
interface between two media may act as an impedance transformer. It can never transmit more power than 
the incident wave provides, i.e. can only decrease the product S = EH, but since the ratio Z = E/H 
changes at the interface, the amplitude of one of the fields may increase at the transmission.
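The counter-intuitive result T > 1 and the accompanying power balance can be checked numerically. This sketch evaluates Eqs. (70)-(72), with frequencies normalized to $\omega_p$ (an arbitrary choice). It also confirms that $|R|^2$ plus the power transmission coefficient equals 1, as it must for a lossless interface.

```python
import math

Z0 = 376.73  # free-space impedance, ohms (rounded)

def plasma_transmission(omega, omega_p):
    """Collision-free plasma at omega > omega_p, wave incident from free space."""
    root = math.sqrt(omega**2 - omega_p**2)
    Z_plus = Z0 * omega / root                            # Eq. (7.70), first form
    R = (Z_plus - Z0) / (Z_plus + Z0)                     # Eq. (7.68)
    T = 2 * omega / (omega + root)                        # Eq. (7.71)
    power_ratio = 4 * omega * root / (omega + root)**2    # S_transmitted / S_incident, Eq. (7.72)
    return Z_plus, R, T, power_ratio

omega_p = 1.0  # frequencies below are in units of omega_p (an arbitrary normalization)
for omega in (1.01, 1.5, 3.0, 10.0):
    Z_plus, R, T, ratio = plasma_transmission(omega, omega_p)
    print(f"omega/omega_p = {omega:5.2f}: T = {T:.4f}, power ratio = {ratio:.4f}, "
          f"R^2 + ratio = {R**2 + ratio:.12f}")
```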


Now let us proceed to the case $\omega < \omega_p$, when the waves cannot propagate in the plasma. In this case, 
the second of the expressions (70) is more convenient, because it immediately shows that $Z_+$ is purely


31 This ratio is sometimes also called the “wave transmission coefficient”, but to avoid its confusion with the T 
defined by Eq. (64), it is better to call it the power transmission coefficient. 
imaginary, while $Z_- = Z_0$ is purely real. This means that $(Z_+ - Z_-) = -(Z_+ + Z_-)^*$, i.e. according to the first of 
Eqs. (68), |R| = 1, so that the reflection is total, i.e. no incident power (on average) is transferred into 
the plasma, as was already discussed in Sec. 2. However, the complex R has a finite argument,

$$\varphi \equiv \arg R = \pi + 2\tan^{-1}\frac{\omega}{(\omega_p^2 - \omega^2)^{1/2}}, \qquad (7.73)$$

and hence provides a finite spatial shift (69) of the standing wave toward the plasma surface:

$$\delta_- = \frac{\varphi - \pi}{2k_-} = \frac{c}{\omega}\tan^{-1}\frac{\omega}{(\omega_p^2 - \omega^2)^{1/2}}. \qquad (7.74)$$


On the other hand, we already know from Eq. (40) that the solution at z > 0 is exponential, with 
the decay length δ described by Eq. (39). Calculating, from the coefficient T, the exact coefficient 
before this exponent, it is straightforward to verify that the electric and magnetic fields are indeed 
continuous at the interface, completing the pattern shown with red lines in Fig. 8. This wave penetration 
into a fully reflecting material may be experimentally observed, for example, by thinning its sample. 
Even without solving this problem exactly, it is evident that if the sample's thickness d becomes 
comparable to δ, a part of the exponential "tail" of the field reaches its second interface, and induces a 
propagating wave. This is a classical-electromagnetic analog of the quantum-mechanical tunneling 
through a potential barrier.32


Note that at low frequencies, both $\delta_-$ and δ tend to the same frequency-independent value,

$$\delta_-,\ \delta \to \frac{c}{\omega_p} = \left(\frac{\varepsilon_0 m_e c^2}{ne^2}\right)^{1/2} \equiv \left(\frac{m_e}{\mu_0 ne^2}\right)^{1/2}, \qquad (7.75)$$

which is just the field penetration depth (6.44) calculated for a perfect conductor model (assuming m = 
$m_e$ and μ = μ0) in the quasistatic limit. This is natural, because the condition $\omega \ll \omega_p$ may be recast as $\lambda_0$ 
$= 2\pi c/\omega \gg 2\pi c/\omega_p = 2\pi\delta$, i.e. as the quasistatic approximation's validity condition.
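For $\omega < \omega_p$, the sketch below evaluates R from Eqs. (68) and (70) and confirms that |R| = 1 and that its phase agrees with Eq. (73). It also shows that both $\delta_-$ from Eq. (74) and the decay length δ approach $c/\omega_p$ at low frequencies, as Eq. (75) states. The plasma frequency $\omega_p = 10^{15}\ \mathrm{s^{-1}}$ is an assumed illustrative value.

```python
import cmath
import math

c = 299_792_458.0      # speed of light, m/s
Z0 = 376.730313        # free-space impedance, ohms

def plasma_total_reflection(omega, omega_p):
    """Reflection from a collision-free plasma at omega < omega_p (wave from free space)."""
    s = math.sqrt(omega_p**2 - omega**2)
    Z_plus = -1j * Z0 * omega / s                     # Eq. (7.70), second form
    R = (Z_plus - Z0) / (Z_plus + Z0)                 # Eq. (7.68)
    phi = math.pi + 2 * math.atan(omega / s)          # Eq. (7.73)
    delta_minus = (phi - math.pi) / (2 * omega / c)   # Eq. (7.74), with k_- = omega / c
    delta = c / s                                     # decay length at z > 0
    return R, phi, delta_minus, delta

omega_p = 1e15  # assumed plasma frequency, 1/s
for omega in (1e12, 5e14, 9e14):
    R, phi, d_minus, d = plasma_total_reflection(omega, omega_p)
    print(f"omega = {omega:.0e}: |R| = {abs(R):.12f}, delta_- = {d_minus:.3e} m, delta = {d:.3e} m")
print("c / omega_p =", c / omega_p)
```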

(ii) Now let us consider the reflection of an electromagnetic wave from an Ohmic, non-magnetic 
conductor. In the simplest low-frequency limit, when ωτ << 1, the conductor may be 
described by a frequency-independent conductivity σ.33 According to Eq. (46), in this case we can take


$$Z_+ = \left[\frac{\mu_0}{\varepsilon_{\rm opt}(\omega) + i\sigma/\omega}\right]^{1/2}. \qquad (7.76)$$


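With this substitution, Eqs. (68) immediately give us all the results of interest. In particular, in the most 
important quasistatic limit, when $\delta_s = (2/\mu_0\sigma\omega)^{1/2} \ll \lambda_0 = 2\pi c/\omega$, i.e. $\sigma/\omega \gg \varepsilon_{\rm opt} \sim \varepsilon_0$, the conductor's 
impedance is low:

$$Z_+ \approx \left(\frac{\mu_0\omega}{i\sigma}\right)^{1/2} = (1 - i)\left(\frac{\mu_0\omega}{2\sigma}\right)^{1/2} = (1 - i)\,\pi\frac{\delta_s}{\lambda_0}Z_0, \qquad \left|\frac{Z_+}{Z_0}\right| \ll 1. \qquad (7.77)$$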


32 See, e.g., QM Sec. 2.3. 
33 In a typical metal, τ ~ 10⁻¹³ s, so that this approximation works well up to ω ~ 10¹³ s⁻¹, i.e. up to the far-infrared 
frequencies.
This impedance is complex, and hence some fraction ℘ of the incident wave is absorbed by the 
conductor. The fraction may be found as the ratio of the dissipated power (either calculated, as was done 
above, from Eqs. (68), or just taken from Eq. (6.36), with the magnetic field amplitude $|H_\omega| = 2|E_\omega|/Z_0$, 
see Eq. (62)) to the incident wave's power given by the first of Eqs. (72). The result,

$$\wp = \frac{2\omega\delta_s}{c} \equiv 4\pi\frac{\delta_s}{\lambda_0} \ll 1, \qquad (7.78)$$

is used for crude estimates of the energy dissipation in metallic-wall waveguides and resonators. It 
shows that to keep the energy losses low, the characteristic size of such systems (which gives a scale of 
the free-space wavelengths $\lambda_0$ at which they are used) should be much larger than $\delta_s$. A more detailed 
theory of these structures, and of the energy loss in them, will be discussed later in this chapter.
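To see how accurate Eq. (78) is, one may compare it with the exact absorbed fraction $1 - |R|^2$, computed from Eqs. (76) and (68). The sketch below assumes σ = 6×10⁷ S/m, a frequency of 10 GHz, and $\varepsilon_{\rm opt} = \varepsilon_0$. These are illustrative choices, not values from the text.

```python
import cmath
import math

eps0 = 8.8541878128e-12         # F/m
mu0 = 1.25663706212e-6          # H/m
c = 1 / math.sqrt(eps0 * mu0)
Z0 = math.sqrt(mu0 / eps0)

def ohmic_absorption(sigma, omega, eps_opt=eps0):
    """Absorbed fraction of a normally incident wave: exact Eqs. (7.76), (7.68) vs. Eq. (7.78)."""
    Z_plus = cmath.sqrt(mu0 / (eps_opt + 1j * sigma / omega))   # Eq. (7.76); principal root has Re > 0
    R = (Z_plus - Z0) / (Z_plus + Z0)                             # Eq. (7.68)
    exact = 1 - abs(R)**2                                         # power not reflected is absorbed
    delta_s = math.sqrt(2 / (mu0 * sigma * omega))                # normal skin depth
    estimate = 2 * omega * delta_s / c                            # Eq. (7.78)
    return exact, estimate, delta_s, Z_plus

# Assumed example: sigma = 6e7 S/m (a good metal) at f = 10 GHz, with eps_opt = eps0
exact, estimate, delta_s, Z_plus = ohmic_absorption(6e7, 2 * math.pi * 1e10)
print(f"delta_s = {delta_s:.2e} m, exact = {exact:.4e}, Eq. (7.78) = {estimate:.4e}")
```

The two numbers differ only in relative order $|Z_+/Z_0|$, which is the small parameter of Eq. (77).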


7.4. Refraction 


Now let us consider the effects arising at a plane interface between two uniform media when the 
wave's incidence angle θ (Fig. 10) is arbitrary rather than equal to zero as in our previous analysis, for 
the simplest case of fully transparent media, with real ε(ω) and μ(ω). (For the sake of notation 
simplicity, in most formulas below, the argument of these functions will be dropped, i.e. just implied.)


Fig. 7.10. Plane wave's reflection, transmission, and 
refraction at a plane interface. The plane of the drawing is 
selected to contain all three wave vectors: $\mathbf{k}_-$, $\mathbf{k}_+$, and $\mathbf{k}'_-$.


In contrast with the case of normal incidence, here the wave vectors $\mathbf{k}_-$, $\mathbf{k}'_-$, and $\mathbf{k}_+$ of the three 
component waves (incident, reflected, and transmitted) may have different directions. (Such a change of 
the transmitted wave's direction is called refraction.) Hence let us start our analysis by writing a general 
expression for a single plane, monochromatic wave for the case when its wave vector k has all three 
Cartesian components, rather than one. An evident generalization of Eq. (11) for this case is


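$$f(\mathbf{r},t) = \mathrm{Re}\left[f_\omega e^{i(k_xx + k_yy + k_zz) - i\omega t}\right] \equiv \mathrm{Re}\left[f_\omega e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\right]. \qquad (7.79)$$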


This expression enables a ready analysis of “kinematic” relations, which are independent of the 
media impedances. Indeed, it is sufficient to notice that to satisfy any linear, homogeneous boundary 
conditions at the interface (z = 0), all partial plane waves must have the same temporal and spatial 
dependence on this plane. Hence if we select the x-z plane so that the vector $\mathbf{k}_-$ lies in it, then $(k_-)_y = 0$, 
and $\mathbf{k}_+$ and $\mathbf{k}'_-$ cannot have any y-component either, i.e. all three wave vectors lie in the same plane, 
which is selected as the plane of the drawing in Fig. 10. Moreover, for the same reason, their x-components 
should be equal:


$$k_-\sin\theta = k_-\sin\theta' = k_+\sin r. \qquad (7.80)$$


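From here we immediately get two well-known laws: of reflection

$$\theta' = \theta, \qquad (7.81)$$

and of refraction:34

$$\frac{\sin r}{\sin\theta} = \frac{k_-}{k_+}. \qquad (7.82)$$

In this form, the laws are valid for plane waves of any nature. In optics, the Snell law (82) is frequently 
represented in the form

$$\frac{\sin r}{\sin\theta} = \frac{n_-}{n_+}, \qquad (7.83)$$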


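where $n_\pm$ is the index of refraction (also called the "refractive index") of the corresponding medium, 
defined as its wave number normalized to that of the free space (at the particular wave's frequency):

$$n_\pm \equiv \frac{k_\pm}{k_0} = \left(\frac{\varepsilon_\pm\mu_\pm}{\varepsilon_0\mu_0}\right)^{1/2}. \qquad (7.84)$$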


Perhaps the most famous corollary of the Snell law is that if a wave propagates from a medium 
with a higher index of refraction to that with a lower one (i.e. if $n_- > n_+$ in Fig. 10), for example from 
water to air, there is always a certain critical value $\theta_c$ of the incidence angle,

$$\theta_c = \sin^{-1}\frac{n_+}{n_-} = \sin^{-1}\frac{k_+}{k_-}, \qquad (7.85)$$


at which the refraction angle r (see Fig. 10 again) reaches π/2. At a larger θ, i.e. within the range $\theta_c < \theta$ 
$< \pi/2$, the boundary conditions (80) cannot be satisfied by a refracted wave with a real wave vector, so 
that the wave experiences the so-called total internal reflection. This effect is very important for 
practice because it means that dielectric surfaces may be used as optical mirrors, in particular in optical 
fibers, to be discussed in more detail in Sec. 7 below. This is very fortunate for telecommunication 
technology because light's reflection from metals is rather imperfect. Indeed, according to Eq. (78), in 
the optical range ($\lambda_0 \sim 0.5\ \mu$m, i.e. ω ~ 10¹⁶ s⁻¹), even the best conductors (with σ ~ 6×10⁷ S/m, and 
hence the normal skin depth $\delta_s$ ~ 1.5 nm) provide a power loss of at least a few percent at each reflection.
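The two numbers quoted in this paragraph can be reproduced with a few lines of Python:

- The refractive index 1.33 for water is an assumption.
- For the metal mirror, the sketch takes $\omega = 2\pi c/\lambda_0 \approx 4\times10^{15}\ \mathrm{s^{-1}}$. This gives $\delta_s \approx 2.7$ nm, the same order of magnitude as the estimate in the text.

As in the text, Eq. (78) serves here only as a crude estimate, because the condition ωτ << 1 behind it does not hold at optical frequencies.

```python
import math

def critical_angle(n_minus, n_plus):
    """Eq. (7.85); total internal reflection requires n_minus > n_plus."""
    if n_minus <= n_plus:
        raise ValueError("no critical angle: the wave goes into an optically denser medium")
    return math.asin(n_plus / n_minus)

# Water -> air, assuming n = 1.33 for water
theta_c = critical_angle(1.33, 1.0)
print(f"theta_c = {math.degrees(theta_c):.1f} deg")

# Metal-mirror loss from Eq. (7.78), with the order-of-magnitude parameters quoted above
c = 299_792_458.0
mu0 = 4e-7 * math.pi
lambda0 = 0.5e-6                          # m
omega = 2 * math.pi * c / lambda0         # ~4e15 1/s
sigma = 6e7                               # S/m
delta_s = math.sqrt(2 / (mu0 * sigma * omega))
loss = 4 * math.pi * delta_s / lambda0    # Eq. (7.78)
print(f"delta_s = {delta_s * 1e9:.1f} nm, loss per reflection = {100 * loss:.1f} %")
```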


Note, however, that even within the range $\theta_c < \theta < \pi/2$, the field at z > 0 is not identically equal 
to zero: it penetrates into the lower-n medium by a distance of the order of $\lambda_0$, exponentially decaying 
inside it, just as it does at normal incidence (see Fig. 8). However, at θ ≠ 0 the penetrating field still 
propagates, with the wave number (80), along the interface. Such a field, exponentially dropping in one 
direction but still propagating as a wave in another direction, is commonly called the evanescent wave.
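The evanescent wave follows directly from Eq. (80). The tangential wave number $k_x = k_-\sin\theta$ is fixed, so $k_z = (k_+^2 - k_x^2)^{1/2}$ becomes imaginary once $\theta > \theta_c$. The sketch below assumes a glass-to-air interface with n = 1.5 (an illustrative value). It picks the branch with $\mathrm{Im}\,k_z > 0$, which describes decay into the medium at z > 0.

```python
import cmath
import math

def transmitted_wave_vector(k_minus, k_plus, theta):
    """Components (k_x, k_z) of the transmitted wave vector at z > 0.

    k_x is fixed by Eq. (7.80); k_z = (k_+^2 - k_x^2)^{1/2} becomes imaginary beyond
    the critical angle, and the principal branch of cmath.sqrt (Im >= 0) gives decay at z > 0.
    """
    kx = k_minus * math.sin(theta)
    kz = cmath.sqrt(k_plus**2 - kx**2)
    return kx, kz

# Assumed example: glass (n = 1.5) -> air; lengths in units of lambda_0
k0 = 2 * math.pi
for theta_deg in (20, 60):
    kx, kz = transmitted_wave_vector(1.5 * k0, k0, math.radians(theta_deg))
    if kz.imag > 0:
        print(f"{theta_deg} deg: evanescent, decay length = {1 / kz.imag:.3f} lambda_0, k_x = {kx:.3f}")
    else:
        print(f"{theta_deg} deg: propagating, k_z = {kz.real:.3f}")
```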


34 The latter relation is traditionally called the Snell law, after the 17th-century astronomer Willebrord Snellius, but 
it has been traced all the way back to a circa 984 work by Abu Saad al-Ala ibn Sahl. (Claudius Ptolemy, who 
performed pioneering experiments on light refraction in the 2nd century AD, was just one step from this result.)
One more remark: just as at normal incidence, the field's penetration into another medium 
causes a phase shift of the reflected wave (see, e.g., Eq. (69) and its discussion). A new feature of this 
phase shift, arising at θ ≠ 0, is that it also has a component parallel to the interface: the so-called Goos-Hänchen 
effect. In geometric optics, this effect leads to an image shift (relative to its position in a 
perfect mirror) with components both normal and parallel to the interface.


Now let us carry out an analysis of “dynamic” relations that determine amplitudes of the 
refracted and reflected waves. For this, we need to write explicitly the boundary conditions at the 
interface (i.e. the plane z = 0). Since now the electric and/or magnetic fields may have components 
normal to the plane, in addition to the continuity of their tangential components, which were repeatedly 
discussed above, 

$$E_{x,y}\big|_{z=-0} = E_{x,y}\big|_{z=+0}, \qquad H_{x,y}\big|_{z=-0} = H_{x,y}\big|_{z=+0}, \qquad (7.86)$$


we also need relations for the normal components. As follows from the homogeneous macroscopic 
Maxwell equations (6.99b), they are also the same as in statics, i.e. $D_n$ = const and $B_n$ = const, which for our 
coordinate choice (Fig. 10) gives

$$\varepsilon_-E_z\big|_{z=-0} = \varepsilon_+E_z\big|_{z=+0}, \qquad \mu_-H_z\big|_{z=-0} = \mu_+H_z\big|_{z=+0}. \qquad (7.87)$$


The expressions of these components via the amplitudes $E_\omega$, $RE_\omega$, and $TE_\omega$ of the incident, 
reflected, and transmitted waves depend on the incident wave's polarization. For example, for a linearly 
polarized wave with the electric field vector normal to the plane of incidence, i.e. parallel to the 
interface plane, the reflected and refracted waves are similarly polarized (see Fig. 11a).




Fig. 7.11. Reflection and refraction at two different linear polarizations of the incident wave. 


As a result, all $E_z$ are equal to zero (so that the first of Eqs. (87) is inconsequential), while the 
tangential components of the electric field are equal to their full amplitudes, just as at normal 
incidence, so we can still use Eqs. (64) expressing these components via the coefficients R and T. 
However, at θ ≠ 0 the magnetic fields have not only tangential components


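$$H_x\big|_{z=-0} = -\mathrm{Re}\left[\frac{E_\omega}{Z_-}(1 - R)\cos\theta\,e^{i(k_xx-\omega t)}\right], \qquad H_x\big|_{z=+0} = -\mathrm{Re}\left[\frac{E_\omega}{Z_+}T\cos r\,e^{i(k_xx-\omega t)}\right], \qquad (7.88)$$

but also normal components (see Fig. 11a):

$$H_z\big|_{z=-0} = \mathrm{Re}\left[\frac{E_\omega}{Z_-}(1 + R)\sin\theta\,e^{i(k_xx-\omega t)}\right], \qquad H_z\big|_{z=+0} = \mathrm{Re}\left[\frac{E_\omega}{Z_+}T\sin r\,e^{i(k_xx-\omega t)}\right], \qquad (7.89)$$

where $k_x = k_-\sin\theta$, and the signs correspond to the electric field directed along the +y axis.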


Plugging these expressions into the boundary conditions expressed by Eqs. (86) (in this case, for 
the tangential components $E_y$ and $H_x$ only) and the second of Eqs. (87), we get three equations for two unknown 
coefficients R and T. However, two of these equations duplicate each other because of the Snell law, and 
we get just two independent equations,

$$1 + R = T, \qquad \frac{1}{Z_-}(1 - R)\cos\theta = \frac{1}{Z_+}T\cos r, \qquad (7.90)$$


which are a very natural generalization of Eqs. (67), with the replacements $Z_- \to Z_-\cos r$, $Z_+ \to Z_+\cos\theta$. 
As a result, we can immediately use Eq. (68) to write the solution of the system (90):35

$$R = \frac{Z_+\cos\theta - Z_-\cos r}{Z_+\cos\theta + Z_-\cos r}, \qquad T = \frac{2Z_+\cos\theta}{Z_+\cos\theta + Z_-\cos r}. \qquad (7.91a)$$


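If we want to express these coefficients via the angle of incidence alone, we should use the Snell 
law (82) to eliminate the angle r, getting the frequently quoted bulkier expressions:

$$R = \frac{Z_+\cos\theta - Z_-\left[1 - (k_-/k_+)^2\sin^2\theta\right]^{1/2}}{Z_+\cos\theta + Z_-\left[1 - (k_-/k_+)^2\sin^2\theta\right]^{1/2}}, \qquad T = \frac{2Z_+\cos\theta}{Z_+\cos\theta + Z_-\left[1 - (k_-/k_+)^2\sin^2\theta\right]^{1/2}}. \qquad (7.91b)$$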


However, conceptually it is preferable to use the kinematic relation (82) and the dynamic relations (91a) 
separately, because Eq. (91b) obscures the very important physical fact that the ratio $k_+/k_-$, which is set by the 
wave velocities of the two media, is only involved in the Snell law, while Eqs. (91a) explicitly include 
only the wave impedances, just as in the case of normal incidence.
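The sketch below implements Eqs. (91a) for the polarization of Fig. 11a, with cos r obtained from the Snell law. It checks the boundary conditions (90), the normal-incidence limit (68), and the Stokes relations of footnote 35. The media parameters (an air-to-glass interface with μ = μ0 and n = 1.5) are assumptions chosen only for illustration.

```python
import cmath
import math

def te_coefficients(Z_minus, Z_plus, k_minus, k_plus, theta):
    """Eq. (7.91a) for E normal to the plane of incidence; cos r from the Snell law (7.82)."""
    cos_t = math.cos(theta)
    cos_r = cmath.sqrt(1 - (k_minus / k_plus * math.sin(theta))**2)
    D = Z_plus * cos_t + Z_minus * cos_r
    return (Z_plus * cos_t - Z_minus * cos_r) / D, 2 * Z_plus * cos_t / D, cos_r

# Assumed example: air -> glass with mu = mu_0 and n = 1.5 (so Z_+ = Z_- / 1.5, k_+ = 1.5 k_-)
Z_m, Z_p, k_m, k_p = 376.73, 376.73 / 1.5, 1.0, 1.5
theta = 0.6
R, T, cos_r = te_coefficients(Z_m, Z_p, k_m, k_p, theta)

# Boundary conditions (7.90)
print(abs(1 + R - T), abs((1 - R) * math.cos(theta) / Z_m - T * cos_r / Z_p))

# Stokes relations of footnote 35: reverse the wave by swapping Z_+ <-> Z_-, theta <-> r
r = math.acos(cos_r.real)
R_rev, T_rev, _ = te_coefficients(Z_p, Z_m, k_p, k_m, r)
print(abs(R_rev + R), abs(R**2 + T * T_rev - 1))
```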


In the opposite case of the linear polarization of the electric field within the plane of incidence 
(Fig. 11b), it is the magnetic field that does not have a normal component, so it is now the second of 
Eqs. (87) that does not participate in the solution. However, now the electric fields in the two media 
have not only tangential components, 


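$$E_x\big|_{z=-0} = \mathrm{Re}\left[E_\omega(1 + R)\cos\theta\,e^{i(k_xx-\omega t)}\right], \qquad E_x\big|_{z=+0} = \mathrm{Re}\left[E_\omega T\cos r\,e^{i(k_xx-\omega t)}\right], \qquad (7.92)$$

but also normal components (Fig. 11b):

$$E_z\big|_{z=-0} = \mathrm{Re}\left[E_\omega(-1 + R)\sin\theta\,e^{i(k_xx-\omega t)}\right], \qquad E_z\big|_{z=+0} = -\mathrm{Re}\left[E_\omega T\sin r\,e^{i(k_xx-\omega t)}\right]. \qquad (7.93)$$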


As a result, instead of Eqs. (90), the reflection and transmission coefficients are related as

$$(1 + R)\cos\theta = T\cos r, \qquad \frac{1}{Z_-}(1 - R) = \frac{1}{Z_+}T. \qquad (7.94)$$


Again, the solution of this system may be immediately written using the analogy with Eq. (67): 


35 Note that we may calculate the reflection and transmission coefficients R' and T' for the wave traveling in the 
opposite direction just by making the following parameter swaps: $Z_+ \leftrightarrow Z_-$ and $\theta \leftrightarrow r$, and that the resulting 
coefficients satisfy the following Stokes relations: R' = −R, and R² + TT' = 1, for any $Z_\pm$.
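$$R = \frac{Z_+\cos r - Z_-\cos\theta}{Z_+\cos r + Z_-\cos\theta}, \qquad T = \frac{2Z_+\cos\theta}{Z_+\cos r + Z_-\cos\theta}, \qquad (7.95a)$$

or, alternatively, using the Snell law, in a bulkier form:

$$R = \frac{Z_+\left[1 - (k_-/k_+)^2\sin^2\theta\right]^{1/2} - Z_-\cos\theta}{Z_+\left[1 - (k_-/k_+)^2\sin^2\theta\right]^{1/2} + Z_-\cos\theta}, \qquad T = \frac{2Z_+\cos\theta}{Z_+\left[1 - (k_-/k_+)^2\sin^2\theta\right]^{1/2} + Z_-\cos\theta}. \qquad (7.95b)$$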


For the particular case $\mu_+ = \mu_- = \mu_0$, when $Z_+/Z_- = (\varepsilon_-/\varepsilon_+)^{1/2} = k_-/k_+ = n_-/n_+$ (which is 
approximately correct for traditional optical media), Eqs. (91b) and (95b) are called the Fresnel 
formulas.36 Most textbooks are quick to point out that there is a major difference between them: for the 
electric field polarization within the plane of incidence (Fig. 11b), the reflected wave's amplitude 
(proportional to the coefficient R) turns to zero37 at a special value of θ (called the Brewster angle):38


$$\theta_B = \tan^{-1}\frac{n_+}{n_-}, \qquad (7.96)$$


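while there is no such angle in the opposite case (shown in Fig. 11a). However, note that this statement, 
as well as Eq. (96), is true only for the case $\mu_+ = \mu_-$. In the general case of different ε and μ, Eqs. (91) 
and (95) show that the reflected wave vanishes at $\theta = \theta_B$ with

$$\tan^2\theta_B = \frac{\varepsilon_-\mu_+ - \varepsilon_+\mu_-}{\varepsilon_+\mu_+ - \varepsilon_-\mu_-}\times\begin{cases}\mu_+/\mu_-, & \text{for E normal to the plane of incidence (Fig. 11a)},\\ -\varepsilon_+/\varepsilon_-, & \text{for H normal to the plane of incidence (Fig. 11b)}.\end{cases} \qquad (7.97)$$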


Note the natural ε ↔ μ symmetry of these relations, resulting from the E ↔ H symmetry for 
these two polarization cases (Fig. 11). These formulas also show that for any set of parameters of the 
two media (with $\varepsilon_\pm, \mu_\pm > 0$), $\tan^2\theta_B$ is positive (and hence a real Brewster angle $\theta_B$ exists) only for one of 
these two polarizations. In particular, if the interface is due to the change of μ alone (i.e. if $\varepsilon_+ = \varepsilon_-$), the 
first of Eqs. (97) is reduced to the simple form (96) again, while for the polarization shown in Fig. 11b, 
there is no Brewster angle, i.e. the reflected wave has a non-zero amplitude for any θ.
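Eq. (97) may be checked against Eqs. (91a) and (95a): at the predicted $\theta_B$, the corresponding R should vanish. The sketch below works in relative units, with ε and μ in units of ε0 and μ0 and ω = 1, so Z and k are relative as well. The three sets of media parameters are illustrative assumptions.

```python
import cmath
import math

def reflection_coefficients(eps_m, mu_m, eps_p, mu_p, theta):
    """R for both linear polarizations, Eqs. (7.91a) and (7.95a); omega set to 1."""
    Z_m, Z_p = math.sqrt(mu_m / eps_m), math.sqrt(mu_p / eps_p)   # Z = (mu/eps)^{1/2}
    k_m, k_p = math.sqrt(eps_m * mu_m), math.sqrt(eps_p * mu_p)   # k = omega (eps mu)^{1/2}
    cos_t = math.cos(theta)
    cos_r = cmath.sqrt(1 - (k_m / k_p * math.sin(theta))**2)      # Snell law (7.82)
    R_te = (Z_p * cos_t - Z_m * cos_r) / (Z_p * cos_t + Z_m * cos_r)   # Fig. 11a
    R_tm = (Z_p * cos_r - Z_m * cos_t) / (Z_p * cos_r + Z_m * cos_t)   # Fig. 11b
    return R_te, R_tm

def brewster_tan2(eps_m, mu_m, eps_p, mu_p):
    """Eq. (7.97): tan^2(theta_B) for the polarizations of Fig. 11a and Fig. 11b."""
    common = (eps_m * mu_p - eps_p * mu_m) / (eps_p * mu_p - eps_m * mu_m)
    return common * mu_p / mu_m, common * (-eps_p / eps_m)

# Assumed media parameters (relative units): (eps_-, mu_-, eps_+, mu_+)
cases = [(1.0, 1.0, 2.25, 1.0),    # only eps changes: Brewster angle for Fig. 11b only
         (1.0, 1.0, 1.0, 2.0),     # only mu changes: Brewster angle for Fig. 11a only
         (1.0, 1.0, 3.0, 1.5)]     # both change
for params in cases:
    tan2_te, tan2_tm = brewster_tan2(*params)
    for label, tan2, index in (("Fig. 11a", tan2_te, 0), ("Fig. 11b", tan2_tm, 1)):
        if tan2 > 0:
            theta_B = math.atan(math.sqrt(tan2))
            R = reflection_coefficients(*params, theta_B)[index]
            print(f"{params}: {label}, theta_B = {math.degrees(theta_B):.2f} deg, |R| = {abs(R):.1e}")
```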


Such an account of both media parameters, ε and μ, on an equal footing is necessary to describe 
several interesting effects. The first of them is the so-called negative refraction.39 As was shown in Sec.


36 Named after Augustin-Jean Fresnel (1788-1827), one of the wave optics pioneers, who is credited, among 
many other contributions (see, in particular, discussions in Ch. 8), for the concept of light as a purely transverse 
wave. 

37 This effect is used in practice to obtain linearly polarized light, with the electric field vector perpendicular to 
the plane of incidence, from the natural light with its random polarization. An even more widespread application 
of the effect is a partial reduction of undesirable glare from wet pavement (for the water/air interface, $n_+/n_-$ ≈ 1.33, 
giving $\theta_B$ ≈ 53°) by covering glasses and car headlights with thin vertically-polarizing layers.

38 A very simple interpretation of Eq. (96) is based on the fact that, together with the Snell law (82), it gives r + θ 
= π/2. As a result, the vector $\mathbf{E}_+$ is parallel to the vector $\mathbf{k}'_-$, and hence the oscillating electric dipoles of the 
medium do not have the Cartesian component that could induce the transverse electric field $\mathbf{E}'_-$ of the potential 
reflected wave.

39 Despite some important background theoretical work by A. Schuster (1904), L. Mandelstam (1945), D. 
Sivukhin (1957), and especially V. Veselago (1966-67), the negative refractivity effects became a subject of 
intensive scientific research and engineering development only in the 2000s.
2, in a medium with electric-field-driven resonances, the function ε(ω) may be almost real and negative, 
at least within limited frequency intervals (see, in particular, Eq. (34) and Fig. 5). As has already been 
discussed, if, at these frequencies, the function μ(ω) is real and positive, then $k^2(\omega) = \omega^2\varepsilon(\omega)\mu(\omega) < 0$, 
and k may be represented as i/δ with a real δ, meaning the exponential field decay into the medium. 
However, let us consider the case when both ε(ω) < 0 and μ(ω) < 0 at a certain frequency. (This is 
possible in a medium with both E-driven and H-driven resonances, at a proper choice of their resonance 
frequencies.) Since in this case $k^2(\omega) = \omega^2\varepsilon(\omega)\mu(\omega) > 0$, the wave vector is real, so that Eq. (79) 
describes a traveling wave, and one could think that there is nothing new in this case. Not so!


First of all, for a sinusoidal plane wave (79), the operator ∇ is equivalent to multiplication by 
ik. As the Maxwell equations (2a) show, this means that at a fixed direction of vectors E and k, the 
simultaneous reversal of the signs of ε and μ means the reversal of the direction of the vector H. Namely, if 
both ε and μ are positive, these equations are satisfied with mutually orthogonal vectors {E, H, k} 
forming the usual, right-hand system (see Fig. 1 and Fig. 12a), the name stemming from the popular 
"right-hand rule" used to determine the vector product's direction. However, if both ε and μ are 
negative, the vectors form a left-hand system (see Fig. 12b). (Due to this fact, the media with ε < 0 and 
μ < 0 are frequently called the left-handed materials, LHM for short.) According to the basic relation 
(6.114), which does not involve media parameters, this means that for a plane wave in a left-hand 
material, the Poynting vector S = E×H, i.e. the energy flow, is directed opposite to the wave vector k.
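The sign argument above can be illustrated with explicit vectors. In this sketch, E along x and k along +z are illustrative choices, and all quantities are in relative units. H is computed from k × E = ωμH, which follows from Faraday's law ∇×E = −∂B/∂t for the plane wave (79). The time-averaged Poynting vector is then evaluated for ε, μ > 0 and for ε, μ < 0.

```python
import math

def cross(a, b):
    """Vector product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_wave_triad(eps, mu, omega=1.0):
    """Real amplitudes of k, H, and the time-averaged S for a wave with E along x, k along +z.

    With nabla -> ik and d/dt -> -i omega, the Maxwell equation curl E = -dB/dt gives
    k x E = omega * mu * H, so H flips when mu < 0 (eps and mu of the same sign assumed).
    """
    k_vec = (0.0, 0.0, omega * math.sqrt(eps * mu))
    E = (1.0, 0.0, 0.0)
    H = tuple(comp / (omega * mu) for comp in cross(k_vec, E))
    S = tuple(0.5 * comp for comp in cross(E, H))   # time average, since E and H amplitudes are real
    return k_vec, H, S

for eps, mu in ((1.0, 1.0), (-1.0, -1.0)):
    k_vec, H, S = plane_wave_triad(eps, mu)
    s_dot_k = sum(s * kc for s, kc in zip(S, k_vec))
    print(f"eps = {eps:+}, mu = {mu:+}: H = {H}, S = {S}, S.k = {s_dot_k:+}")
```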


Fig. 7.12. Directions of the main vectors of a plane wave inside a medium with (a) positive and (b) negative values of ε and μ.


This fact may look strange but does not contradict any fundamental principle. Let me 
remind you that, according to the definition of the vector k, its direction shows the direction of the phase 
velocity $v_{\rm ph} = \omega/k$ of a sinusoidal (and hence infinitely long) wave, which cannot be used, for example, 
for signaling. Such signaling (by sending wave packets; see Fig. 13) is possible only with the group 
velocity $v_{\rm gr} = d\omega/dk$. This velocity in left-hand materials is always directed (as in the right-hand 
materials) along the vector S, i.e. along the wave's energy flow.


Fig. 7.13. An example of a wave 
packet moving along axis z with a 
negative phase velocity, but positive 
group velocity. Blue lines show a 
packet’s snapshot a short time interval 
after the first snapshot (red lines). 
Perhaps the most fascinating effect possible with left-hand materials is the wave refraction at 
their interfaces with the usual, right-handed materials, first predicted by V. Veselago in 1967. Consider 
the example shown in Fig. 14a. In the incident wave, arriving from a usual material, the directions of 
the vectors $\mathbf{k}_-$ and $\mathbf{S}_-$ coincide, and so do those of the vectors $\mathbf{k}'_-$ and $\mathbf{S}'_-$ in the reflected wave. This 
means that the electric and magnetic fields in the interface plane (z = 0) are, at our choice of the 
coordinate axes, proportional to $\exp\{ik_xx\}$, with a positive component $k_x = k_-\sin\theta$. To satisfy any linear 
boundary conditions, the refracted wave, propagating into the left-handed material, has to match that 
dependence, i.e. have a positive x-component of its wave vector $\mathbf{k}_+$. But in this medium, this vector has 
to be antiparallel to the vector $\mathbf{S}_+$, which in turn should be directed away from the interface, because it 
represents the power flow from the interface into the material's bulk. These conditions cannot be 
reconciled by the refracted wave propagating along the usual Snell-law direction (shown with the 
dashed line in Fig. 14a), but are all satisfied at refraction in the direction given by Snell's angle with the 
opposite sign. (Hence the term "negative refraction".)40
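The kinematic argument above reduces to a few lines of code:

- The tangential component $k_x$ is fixed by the incident wave.
- The energy flow $\mathbf{S}_+$ is required to point into the second medium.
- For a left-handed medium, $\mathbf{k}_+$ is taken antiparallel to $\mathbf{S}_+$.

The wave numbers and the incidence angle are illustrative assumptions. In this sketch, the angle of $\mathbf{S}_+$ comes out with the sign opposite to the usual Snell-law one.

```python
import math

def refraction_directions(k_minus, k_plus, theta, left_handed):
    """Directions (x, z components) of k_+ and of the energy flow S_+ of the refracted wave.

    k_x is fixed by matching the incident wave at z = 0, Eq. (7.80); S_+ must point into
    the second medium (z > 0); in a left-handed medium k_+ is antiparallel to S_+.
    Assumes theta is below any critical angle.
    """
    kx = k_minus * math.sin(theta)
    kz_abs = math.sqrt(k_plus**2 - kx**2)
    if left_handed:
        k_vec = (kx, -kz_abs)          # k_z < 0 so that S ~ -k points into z > 0
        s_dir = (-kx, kz_abs)
    else:
        k_vec = (kx, kz_abs)
        s_dir = (kx, kz_abs)
    angle = math.atan2(s_dir[0], s_dir[1])  # angle of S_+ measured from the +z axis
    return k_vec, s_dir, angle

theta = 0.4
for lhm in (False, True):
    k_vec, s_dir, angle = refraction_directions(1.0, 1.0, theta, lhm)
    print(f"left-handed = {lhm}: k_+ = {k_vec}, refraction angle of S_+ = {angle:+.3f} rad")
```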


Fig. 7.14. Negative refraction: (a) waves at the interface between media with positive and negative values 
of ε and μ, and (b) the hypothetical perfect lens: a parallel plate made of a material with ε = −ε0 and μ = −μ0.


In order to understand how unusual the results of the negative refraction may be, let us consider 
a parallel slab of thickness d, made of a hypothetical left-handed material with exactly selected values ε 
= −ε0 and μ = −μ0 (see Fig. 14b). For such a material, placed in free space, the refraction angle r = −θ, 
so that the rays from a point source, located in free space at a distance a < d from the slab's surface, 
propagate as shown on that panel, i.e. all meet again at the distance a beyond the surface, and then 
continue to propagate to the second surface of the slab. Repeating our discussion for this surface, we see 
that the point's image is also formed beyond the slab, at the distance 2a + 2b = 2a + 2(d − a) = 2d from the 
object, where b = d − a is the distance from the internal image to the second surface.


Superficially, this system looks like a usual lens, but the well-known lens formula, which relates
a and b with the focal length f, is not satisfied. (In particular, a parallel beam is not focused into a point
at any finite distance.) As an additional difference from the usual lens, the system shown in Fig. 14b
does not reflect any part of the incident light. Indeed, it is straightforward to check that for all the above
formulas for R and T to be valid, the sign of the wave impedance Z in left-handed materials has to be
kept positive. Thus, for our particular choice of parameters ($\varepsilon = -\varepsilon_0$, $\mu = -\mu_0$), Eqs. (91a) and (95a) are
valid with $Z_1 = Z_2 = Z_0$ and $\cos r = \cos\theta$, giving R = 0 for any linear polarization, and hence for any
other wave polarization: circular, elliptic, natural, etc.

40 In some publications inspired by this fact, left-handed materials are prescribed a negative index of refraction
n. However, this prescription should be treated with care. For example, it complies with the first form of Eq. (84),
but not with its second form, and the sign of n, in contrast to that of the wave vector k, is a matter of convention.


The perfect lens suggestion has triggered a wave of efforts to implement left-handed materials
experimentally. (Attempts to find such materials in nature have failed so far.) Most progress in this
direction has been achieved using the so-called metamaterials, which are essentially quasi-periodic
arrays of specially designed electromagnetic resonators, ideally with a high density $n \gg \lambda^{-3}$. For example,
Fig. 15 shows the metamaterial that was used for the first demonstration of negative refractivity in the
microwave region, for ~10-GHz waves.41 It combines straight strips of a metallic film, working as
lumped resonators with a large electric dipole moment (and hence strongly coupled to the wave’s
electric field E), and several almost-closed film loops (so-called split rings), working as lumped
resonators with large magnetic dipole moments, strongly coupled to the field H. The negative
refractivity is achieved by designing the two resonance frequencies close to each other, so that $\varepsilon(\omega)$ and
$\mu(\omega)$ are both negative within a common frequency band. More recently, metamaterials with negative
refractivity have been demonstrated in the optical range as well,42 although, to the best of my knowledge,
their relatively large absorption still prevents practical applications.


Fig. 7.15. An artificial left-handed material
providing negative refraction at microwave 
frequencies ~10 GHz. The original by Jeffrey 
D. Wilson (in the public domain) is available 
at https://en.wikipedia.org/wiki/Metamaterial. 


This progress has stimulated the development of other potential uses of metamaterials (not
necessarily the left-handed ones), in particular, designs of nonuniform systems with handcrafted
distributions $\varepsilon(\mathbf{r}, \omega)$ and $\mu(\mathbf{r}, \omega)$ that may provide electromagnetic wave propagation along the desired
paths, e.g., around a certain region of space, making it virtually invisible to an external observer, so
far within very limited frequency ranges.43


As was mentioned in Sec. 5.5, another way to reach negative values of $\mu(\omega)$ is to place a
ferromagnetic material into such an external dc magnetic field that the frequency $\omega_r$ of the ferromagnetic
resonance is somewhat lower than $\omega$. If thin layers of such a material (e.g., nickel) are interleaved with
layers of a non-magnetic good conductor (such as copper), the average value of $\mu(\omega)$ of the resulting
metamaterial may be positive but substantially below $\mu_0$. According to Eq. (6.33), the skin depth $\delta_s$ of
such a material may be larger than that of the good conductor alone, enforcing a more uniform
distribution of the ac current flowing along the layers, and hence making the energy losses lower than in
the good conductor alone. This effect may be useful, in particular, for electronic circuit interconnects.44
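To put a number on this argument, the sketch below evaluates the usual good-conductor expression $\delta_s = (2/\mu\sigma\omega)^{1/2}$ (assumed here to be the form of Eq. (6.33)) for copper at 1 GHz, first with $\mu = \mu_0$ and then with a hypothetical average $\mu = 0.25\mu_0$. The conductivity value is approximate, and keeping σ unchanged is a simplifying assumption: in a real layered structure the average conductivity is reduced as well.

```python
import math

MU0 = 4e-7 * math.pi          # vacuum permeability (H/m), classical SI value

def skin_depth(sigma: float, mu: float, f: float) -> float:
    """Good-conductor skin depth delta_s = (2 / (mu * sigma * omega))**0.5, in meters."""
    omega = 2 * math.pi * f
    return math.sqrt(2.0 / (mu * sigma * omega))

sigma_cu = 5.8e7              # approximate conductivity of copper (S/m), assumed
f = 1e9                       # 1 GHz, an illustrative interconnect frequency
d_plain = skin_depth(sigma_cu, MU0, f)
d_meta = skin_depth(sigma_cu, 0.25 * MU0, f)   # hypothetical average mu = 0.25 mu0
print(f"{d_plain * 1e6:.2f} um vs {d_meta * 1e6:.2f} um, ratio {d_meta / d_plain:.2f}")
```

Since $\delta_s \propto \mu^{-1/2}$, reducing the average μ by a factor of 4 doubles the skin depth, from about 2.1 µm to about 4.2 µm in this example.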


41 R. Shelby et al., Science 292, 77 (2001); J. Wilson and Z. Schwartz, Appl. Phys. Lett. 86, 021113 (2005). 
42 See, e.g., J. Valentine et al., Nature 455, 376 (2008). 

43 For a review of such “invisibility cloaks”, see, e.g., B. Wood, Comptes Rendus Physique 10, 379 (2009). 
44 See, for example, N. Sato et al., J. Appl. Phys. 111, 07A501 (2012), and references therein. 
7.5. Transmission lines: TEM waves


So far, we have analyzed plane electromagnetic waves, implying that their cross-section is
infinite, which is evidently an unrealistic assumption. The cross-section may be limited, while still sustaining wave
propagation, by using wave transmission lines:45 long, uniform structures made of either good conductors
or dielectrics. Let us first discuss the first option, using the following simplifying assumptions:


(i) the structure is a cylinder (not necessarily with a round cross-section, see Fig. 16) filled with a
usual (right-handed), uniform dielectric material with negligible energy losses ($\varepsilon'' = \mu'' = 0$), and

(ii) the wave attenuation due to the skin effect is also negligibly low. (As Eq. (78) indicates, for
this, the characteristic size a of the line’s cross-section has to be much larger than the skin depth $\delta_s$ of its
wall material. The energy dissipation effects will be analyzed in Sec. 9 below.)


Fig. 7.16. Electric field’s 
decomposition in a transmission 
line (in particular, a waveguide). 


With such exclusion of energy losses, we may look for a particular solution of the macroscopic 
Maxwell equations in the form of a monochromatic wave traveling along the line: 


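$$\mathbf{E}(\mathbf{r},t) = \mathrm{Re}\left[\mathbf{E}_\omega(x,y)\,e^{i(k_z z - \omega t)}\right], \qquad \mathbf{H}(\mathbf{r},t) = \mathrm{Re}\left[\mathbf{H}_\omega(x,y)\,e^{i(k_z z - \omega t)}\right], \tag{7.98}$$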


with real $k_z$, where the z-axis is directed along the transmission line (see Fig. 16). Note that this form
allows a substantial coordinate dependence of the electric and magnetic fields within the plane [x, y] of
the transmission line’s cross-section, as well as nonvanishing longitudinal components $E_z$ and/or $H_z$ of
the fields, so that the solution (98) is substantially more general than the plane waves discussed above.
We will see in a minute that, as a result, the parameter $k_z$ may be very much different from its plane-wave
value (13), $k = \omega(\varepsilon\mu)^{1/2}$, in the same material, at the same frequency.


In order to describe these effects quantitatively, let us decompose the complex amplitudes of the
wave’s fields into their longitudinal and transverse components (Fig. 16):46


$$\mathbf{E}_\omega = E_z\mathbf{n}_z + \mathbf{E}_t, \qquad \mathbf{H}_\omega = H_z\mathbf{n}_z + \mathbf{H}_t. \tag{7.99}$$


45 Another popular term is the waveguide, but it is typically reserved for the transmission lines with singly-connected
cross-sections, to be analyzed in the next section. The first structure for guiding waves was proposed
by J. J. Thomson in 1893, and experimentally tested by O. Lodge in 1894.

46 For notational simplicity, I am dropping the index ω in the complex amplitudes of the field components, and
have also dropped the argument ω in $k_z$ and Z, even though these parameters may depend on the wave’s frequency
rather substantially (see below).
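Plugging Eqs. (98)-(99) into the source-free Maxwell equations (2), and requiring the longitudinal and
transverse components to be balanced separately, we get

$$\begin{aligned}
&ik_z\mathbf{n}_z\times\mathbf{E}_t - i\omega\mu\mathbf{H}_t = -\nabla_t\times(E_z\mathbf{n}_z), && ik_z\mathbf{n}_z\times\mathbf{H}_t + i\omega\varepsilon\mathbf{E}_t = -\nabla_t\times(H_z\mathbf{n}_z),\\
&\nabla_t\times\mathbf{E}_t = i\omega\mu H_z\mathbf{n}_z, && \nabla_t\times\mathbf{H}_t = -i\omega\varepsilon E_z\mathbf{n}_z,\\
&\nabla_t\cdot\mathbf{E}_t = -ik_z E_z, && \nabla_t\cdot\mathbf{H}_t = -ik_z H_z,
\end{aligned}\tag{7.100}$$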


where $\nabla_t$ is the 2D del operator acting in the transverse plane [x, y] only, i.e. the usual $\nabla$ but with $\partial/\partial z$
= 0. The system (100) looks even bulkier than the original equations (2), but it is much simpler for
analysis. Indeed, by eliminating the transverse components from these equations (or, even more simply, just
by plugging Eq. (99) into Eqs. (3) and keeping only their z-components), we get a pair of self-consistent
equations for the longitudinal components of the fields,47


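$$\left(\nabla_t^2 + k_t^2\right)E_z = 0, \qquad \left(\nabla_t^2 + k_t^2\right)H_z = 0, \tag{7.101}$$

where

$$k_t^2 \equiv k^2 - k_z^2 = \omega^2\varepsilon\mu - k_z^2. \tag{7.102}$$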


After the distributions $E_z(x,y)$ and $H_z(x,y)$ have been found from these equations, they provide right-hand
sides for the rather simple, closed system of equations (100) for the transverse components of the field
vectors. Moreover, as we will see below, each of the following three types of solutions:

(i) with $E_z = 0$ and $H_z = 0$ (called the transverse electromagnetic, or TEM waves),
(ii) with $E_z = 0$, but $H_z \neq 0$ (called either the TE waves or, more frequently, H-modes), and
(iii) with $E_z \neq 0$, but $H_z = 0$ (the so-called TM waves or E-modes),

has its own dispersion law and hence its own wave propagation velocity; as a result, these modes (i.e.
the field distribution patterns) may be considered separately.


In the balance of this section, we will focus on the simplest, TEM waves (i), with no longitudinal
components of either field. For them, the top two equations of the system (100) immediately give Eqs.
(6) and (13), and $k_z = k$. In plain English, this means that $\mathbf{E} = \mathbf{E}_t$ and $\mathbf{H} = \mathbf{H}_t$ are proportional to each
other and are mutually perpendicular (just as in the plane wave) at each point of the cross-section, and
that the TEM wave’s impedance $Z = E/H$ and dispersion law $\omega(k)$, and hence the propagation speed, are
the same as in a plane wave in the same material. In particular, if $\varepsilon$ and $\mu$ are frequency-independent
within a certain frequency range, the dispersion law within this range is linear, $\omega = k/(\varepsilon\mu)^{1/2}$, and the
wave’s speed does not depend on its frequency. For practical applications to telecommunications, this is
a very important advantage of the TEM waves over their TE and TM counterparts, to be discussed in
the next sections.


Unfortunately for practice, such waves cannot propagate in every transmission line. To show
this, let us have a look at the last two lines of Eqs. (100). For the TEM waves ($E_z = 0$, $H_z = 0$, $k_z = k$),
they are reduced to merely


47 The wave equation represented in the form (101), even with the 3D Laplace operator, is called the Helmholtz 
equation, named after Hermann von Helmholtz (1821-1894) — the mentor of H. Hertz and M. Planck, among 
many others. 
$$\begin{aligned}
&\nabla_t\times\mathbf{E}_t = 0, && \nabla_t\times\mathbf{H}_t = 0,\\
&\nabla_t\cdot\mathbf{E}_t = 0, && \nabla_t\cdot\mathbf{H}_t = 0.
\end{aligned}\tag{7.103}$$


Within the coarse-grain description of the conducting walls of the line (i.e., neglecting not only the
screening depth but also the skin depth in comparison with the cross-section dimensions), we have to
require that inside them, E = H = 0. Close to a wall but outside it, the normal component $E_n$ of the
electric field may be different from zero, because surface charges may sustain its jump (see Sec. 2.1, in
particular Eq. (2.3)). Similarly, the tangential component $H_\tau$ of the magnetic field may have a finite jump
at the surface due to skin currents (see Sec. 6.3, in particular Eq. (6.38)). However, the tangential
component of the electric field and the normal component of the magnetic field cannot experience such
jumps, and to have them equal to zero inside the walls they have to equal zero just outside the walls as
well:

$$E_\tau = 0, \qquad H_n = 0. \tag{7.104}$$


But the left columns of Eqs. (103)-(104) coincide with the formulation of the 2D boundary
problem of electrostatics for the electric field induced by electric charges of the conducting walls, with
the only difference that in our current case the value of $\varepsilon$ actually means $\varepsilon(\omega)$. Similarly, the right
columns of those relations coincide with the formulation of the 2D boundary problem of magnetostatics
for the magnetic field induced by currents in the walls, with $\mu \to \mu(\omega)$, the only other difference being that in
our current coarse-grain approximation, the magnetic fields cannot penetrate into the conductors.


Now we immediately see that in waveguides with a singly-connected wall, for example, a hollow
conducting tube (see, e.g., Fig. 16), the TEM waves are impossible, because there is no way to create a
non-zero electrostatic field inside a conductor with such a cross-section: the whole wall is an equipotential,
and the only solution of the 2D Laplace equation with a constant boundary value is a constant potential.
However, such fields (and hence the TEM waves) are possible in structures with cross-sections consisting
of two or more disconnected (galvanically-insulated) parts; see, e.g., Fig. 17.


Fig. 7.17. An example of the cross- 
section of a transmission line that may 
support the TEM wave propagation. 


In order to derive “global” relations for such a transmission line, let us consider the contour C
drawn very close to the surface of one of its conductors (see, e.g., the red dashed line in Fig. 17). We
can consider it, on the one hand, as the cross-section of a cylindrically-shaped Gaussian volume of a certain
elementary length $dz \ll \lambda = 2\pi/k$. Using the generalized Gauss law (3.34), we get

$$\oint_C (E_t)_n\,dr = \frac{\lambda_\omega}{\varepsilon}, \tag{7.105}$$


where $\lambda_\omega$ (not to be confused with the wavelength λ!) is the complex amplitude of the linear density of
the electric charge of the conductor. On the other hand, the same contour C may be used in the
generalized Ampère law (5.116) to write
$$\oint_C (H_t)_\tau\,dr = I_\omega, \tag{7.106}$$


where $I_\omega$ is the total current flowing along the conductor (or rather its complex amplitude). But, as was
mentioned above, in the TEM wave the ratio $E_n/H_\tau$ of the field components participating in these two
integrals is constant and equal to $Z = (\mu/\varepsilon)^{1/2}$, so that Eqs. (105)-(106) give the following simple relation
between the “global” variables of the conductor:

$$\lambda_\omega = \varepsilon Z I_\omega = (\varepsilon\mu)^{1/2} I_\omega \equiv \frac{k}{\omega}I_\omega. \tag{7.107}$$
This important relation may also be obtained in a different way; let me describe it as well,
because (as we will see below) it has an independent heuristic value. Let us consider a small segment $dz
\ll \lambda = 2\pi/k$ of the line’s conductor, and apply the electric charge conservation law (4.1) to the instantaneous
values of the linear charge density and current. The cancellation of dz in both parts yields

$$\frac{\partial\lambda(z,t)}{\partial t} = -\frac{\partial I(z,t)}{\partial z}. \tag{7.108}$$


If we accept the sinusoidal waveform, $\exp\{i(kz - \omega t)\}$, for both these variables, we immediately recover
Eq. (107) for their complex amplitudes, showing that this relation expresses just the charge continuity
law.


The global equation (108) may be made more specific in the case when the frequency
dependence of $\varepsilon$ and $\mu$ is negligible, and the transmission line consists of just two isolated conductors
(see, e.g., Fig. 17). In this case, to have the wave localized in the space near the two conductors, we need
a sufficiently fast decrease of its electric field at large distances. For that, their linear charge densities for
each value of z should be equal and opposite, and we can simply relate them to the potential difference V
between the conductors:

$$\lambda(z,t) = C_0 V(z,t), \tag{7.109}$$


where $C_0$ is the mutual capacitance of the conductors (per unit length), which was repeatedly discussed
in Chapter 2. Then Eq. (108) takes the following form:

$$C_0\frac{\partial V(z,t)}{\partial t} = -\frac{\partial I(z,t)}{\partial z}. \tag{7.110}$$

Next, let us consider the contour shown with the red dashed line in Fig. 18 (which shows a
different cross-section of the transmission line, by a plane containing the wave propagation axis z), and
apply to it the Faraday induction law (6.3).

Fig. 7.18. Electric current, magnetic flux, and voltage in a two-conductor transmission line.
Since, in the coarse-grain approximation, the electric field inside the conductors (in Fig. 18, on
the horizontal segments of the contour) vanishes, the total e.m.f. equals the difference of the voltages V
at the ends of the segment dz. Because the only sources of the magnetic flux through the area limited by the
contour are the (equal and opposite) currents ±I in the conductors, we can use Eq. (5.70) to express the
flux as $d\Phi = L_0 I(z,t)\,dz$. As a result, canceling dz in both parts of the equation, we get

$$L_0\frac{\partial I(z,t)}{\partial t} = -\frac{\partial V(z,t)}{\partial z}, \tag{7.111}$$

where $L_0$ is the mutual inductance of the conductors per unit length. The only difference between this $L_0$
and the dc mutual inductances discussed in Chapter 5 is that at the high frequencies we are analyzing
now, $L_0$ should be calculated neglecting the magnetic field penetration into the conductors. (In the dc
case, we had the same situation for superconductor electrodes within their coarse-grain, ideal-diamagnet
description.)


The system of Eqs. (110) and (111) is frequently called the telegrapher’s equations. Combined,
they give for any “global” variable f (either V, or I, or λ) the usual 1D wave equation,

$$\left(\frac{\partial^2}{\partial z^2} - L_0C_0\frac{\partial^2}{\partial t^2}\right)f = 0, \tag{7.112}$$


which describes dispersion-free TEM wave propagation. Again, this equation is only valid within the
frequency range where the frequency dependence of both $\varepsilon$ and $\mu$ is negligible. If this is not so, the
global approach may still be used for sinusoidal waves $f = \mathrm{Re}[f_\omega\exp\{i(kz - \omega t)\}]$. Repeating the above
arguments, instead of Eqs. (110)-(111) we get a more general system of two algebraic equations

$$\omega C_0 V_\omega = k I_\omega, \qquad \omega L_0 I_\omega = k V_\omega, \tag{7.113}$$

in which $L_0 \propto \mu$ and $C_0 \propto \varepsilon$ may now depend on frequency. These equations are consistent only if

$$L_0 C_0 = \frac{k^2}{\omega^2} = \varepsilon\mu. \tag{7.114}$$


Besides the fact we already knew (that the TEM wave’s speed is the same as that of the plane
wave), Eq. (114) gives us a result that, I confess, was not emphasized enough in Chapter 5: the product
$L_0C_0$ does not depend on the shape or size of the line’s cross-section, provided that the magnetic field’s
penetration into the conductors is negligible. Hence, if we have calculated the mutual capacitance $C_0$ of
a system of two cylindrical conductors, the result immediately gives us their mutual inductance: $L_0 =
\varepsilon\mu/C_0$. This relationship stems from the fact that both the electric and magnetic fields may be expressed
via the solution of the same 2D Laplace equation for the system’s cross-section.
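The telegrapher’s equations (110)-(111) can also be integrated numerically, which gives a direct check that a signal travels at $v = (L_0C_0)^{-1/2}$ without changing its shape. The Python sketch below uses the standard leapfrog scheme on a staggered grid (the 1D analog of the Yee finite-difference time-domain method); this choice of method, the per-unit-length values $L_0$ = 0.25 µH/m and $C_0$ = 100 pF/m (giving $v$ = 2×10⁸ m/s and $(L_0/C_0)^{1/2}$ = 50 Ω), and the pulse parameters are all illustrative assumptions.

```python
import math

def simulate_pulse(L0=2.5e-7, C0=1.0e-10, length=10.0, n=1000, t_end=2.0e-8):
    """Integrate the telegrapher's equations
           C0 dV/dt = -dI/dz,    L0 dI/dt = -dV/dz
    for a lossless line on a staggered (leapfrog) grid, starting from a Gaussian
    pulse launched toward +z. Returns (simulated peak position, z0 + v*t)."""
    dz = length / n
    v = 1.0 / math.sqrt(L0 * C0)       # TEM wave speed predicted by the wave equation
    zw = math.sqrt(L0 / C0)            # ratio V/I of a wave traveling toward +z
    dt = 0.5 * dz / v                  # Courant number 0.5 keeps the scheme stable
    steps = round(t_end / dt)
    z0, w = 2.0, 0.1                   # initial pulse center and width (m)
    # V is stored at nodes z_j = j*dz, I at half-nodes z_{j+1/2}
    V = [math.exp(-((j * dz - z0) / w) ** 2) for j in range(n + 1)]
    I = [math.exp(-(((j + 0.5) * dz - z0) / w) ** 2) / zw for j in range(n)]
    for _ in range(steps):
        for j in range(n):             # L0 dI/dt = -dV/dz
            I[j] -= dt / (L0 * dz) * (V[j + 1] - V[j])
        for j in range(1, n):          # C0 dV/dt = -dI/dz (end voltages are not updated, stay ~0)
            V[j] -= dt / (C0 * dz) * (I[j] - I[j - 1])
    j_peak = max(range(n + 1), key=lambda j: V[j])
    return j_peak * dz, z0 + v * steps * dt

peak, expected = simulate_pulse()
print(f"simulated peak at {peak:.2f} m, expected {expected:.2f} m")
```

The simulated peak lands close to $z_0 + vt$ = 6 m, with a small discrepancy due to the finite grid; starting the current with $I = V/(L_0/C_0)^{1/2}$ is what makes the pulse travel in one direction only.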


With Eq. (114) satisfied, any of Eqs. (113) gives the same result for the following ratio:

$$Z_W \equiv \frac{V_\omega}{I_\omega} = \frac{k}{\omega C_0} = \frac{\omega L_0}{k} = \left(\frac{L_0}{C_0}\right)^{1/2}, \tag{7.115}$$

which is called the transmission line’s impedance. This parameter has the same dimensionality (in SI
units, ohms, denoted Ω) as the wave impedance (7),

$$Z \equiv \frac{E_\omega}{H_\omega} = \left(\frac{\mu}{\varepsilon}\right)^{1/2}, \tag{7.116}$$


but these parameters should not be confused, because $Z_W$ depends on the cross-section’s geometry, while
Z does not. In particular, $Z_W$ is the only important parameter of a transmission line for its matching with
a lumped load circuit (Fig. 19) in the important case when both the cable cross-section’s size and the
load’s linear dimensions are much smaller than the wavelength.48

Fig. 7.19. Passive, lumped termination of a TEM transmission line.


Indeed, in this case, we may consider the load in the quasistatic limit and write

$$V_\omega(z_0) = Z_L(\omega)\,I_\omega(z_0), \tag{7.117}$$

where $Z_L(\omega)$ is the (generally complex) impedance of the load. Taking V(z,t) and I(z,t) in the form
similar to Eqs. (61) and (62), and writing the two Kirchhoff’s circuit laws for the point $z = z_0$, we get for
the reflection coefficient a result similar to Eq. (68):

$$R = \frac{Z_L(\omega) - Z_W}{Z_L(\omega) + Z_W}. \tag{7.118}$$


This formula shows that for perfect matching (i.e. the total wave absorption in the load), the load’s
impedance $Z_L(\omega)$ should be real and equal to $Z_W$, but not necessarily to Z.
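Eq. (118) is easy to explore numerically. The sketch below evaluates it for a few loads terminating a line with $Z_W$ = 50 Ω at 1 GHz; the load values, including a 1-nH series inductance standing for a short lead, are hypothetical examples chosen only for illustration.

```python
import math

def reflection(z_load: complex, z_line: float) -> complex:
    """Amplitude reflection coefficient R = (Z_L - Z_W) / (Z_L + Z_W), Eq. (118)."""
    return (z_load - z_line) / (z_load + z_line)

ZW = 50.0                               # line impedance (ohms), illustrative
omega = 2 * math.pi * 1e9               # 1 GHz, illustrative
loads = {
    "matched resistor, 50 ohm": 50.0,
    "resistor, 75 ohm": 75.0,
    "short circuit": 0.0,
    "50 ohm + 1 nH lead (hypothetical)": 50.0 + 1j * omega * 1e-9,
}
for name, zl in loads.items():
    r = reflection(complex(zl), ZW)
    # |R|^2 is the fraction of the incident power sent back toward the source
    print(f"{name:>34}: |R| = {abs(r):.3f}, |R|^2 = {abs(r) ** 2:.4f}")
```

A 75-Ω resistor reflects 4% of the incident power (R = 0.2), and even a nominally matched resistor with 1 nH of lead inductance gives |R| ≈ 0.06, illustrating why the load impedance has to be both real and equal to $Z_W$.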


As an example, let us consider one of the simplest (and most practically important) transmission
lines: the coaxial cable (Fig. 20).49


Fig. 7.20. The cross-section of a coaxial cable
with (possibly, dispersive) dielectric filling.


For this geometry, we already know the expressions for both $C_0$ and $L_0$,50 though they have to be
modified to account for arbitrary dielectric and magnetic constants, and for the magnetic field’s
non-penetration into the conductors. As a result of this (elementary) modification, we get the formulas


48 The ability of TEM lines to have such a small cross-section is another important practical advantage. 
49 It was invented by the same O. Heaviside in 1880. 
50 See, respectively, Eqs. (2.49) and (5.79). 
$$C_0 = \frac{2\pi\varepsilon}{\ln(b/a)}, \qquad L_0 = \frac{\mu}{2\pi}\ln(b/a), \tag{7.119}$$

illustrating that the universal relationship (114) is indeed valid. For the cable’s impedance (115), Eqs.
(119) yield a geometry-dependent value

$$Z_W = \left(\frac{L_0}{C_0}\right)^{1/2} = \left(\frac{\mu}{\varepsilon}\right)^{1/2}\frac{\ln(b/a)}{2\pi} \equiv Z\,\frac{\ln(b/a)}{2\pi}. \tag{7.120}$$


For the standard TV antenna cables (such as RG-6/U), $Z_W$ = 75 Ω, while for
most computer component connections, coaxial cables with $Z_W$ = 50 Ω (such as RG-58/U, with $b/a \approx 3.5$ and
$\varepsilon/\varepsilon_0 \approx 2.2$) are prescribed by electronic engineering standards. Such cables are broadly used for the transmission of
electromagnetic waves with frequencies up to 1 GHz over distances of a few km, and up to ~20 GHz on
the tabletop scale (a few meters), limited by wave attenuation (see Sec. 9 below).
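The following sketch evaluates Eqs. (119)-(120) and also inverts Eq. (120) to find the ratio b/a needed for a prescribed $Z_W$. The dielectric constant $\varepsilon/\varepsilon_0$ = 2.2 and the ratios used are illustrative inputs, and the helper names are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12          # vacuum permittivity (F/m), CODATA 2018
MU0 = 1.25663706212e-6           # vacuum permeability (H/m), CODATA 2018

def coax_parameters(b_over_a: float, eps_r: float, mu_r: float = 1.0):
    """Return (C0, L0, Z_W) per unit length from Eqs. (119) and (120)."""
    eps, mu = eps_r * EPS0, mu_r * MU0
    log_ratio = math.log(b_over_a)
    c0 = 2 * math.pi * eps / log_ratio
    l0 = mu / (2 * math.pi) * log_ratio
    return c0, l0, math.sqrt(l0 / c0)

def ratio_for_impedance(z_target: float, eps_r: float, mu_r: float = 1.0) -> float:
    """Invert Eq. (120): the ratio b/a that gives the prescribed Z_W."""
    z = math.sqrt(mu_r * MU0 / (eps_r * EPS0))   # wave impedance of the filling, Eq. (116)
    return math.exp(2 * math.pi * z_target / z)

eps_r = 2.2                                      # illustrative dielectric constant
for ratio in (2.0, 3.5, 10.0):
    c0, l0, zw = coax_parameters(ratio, eps_r)
    check = l0 * c0 / (eps_r * EPS0 * MU0)       # should be 1 for any b/a, Eq. (114)
    print(f"b/a = {ratio:4.1f}: C0 = {c0 * 1e12:5.1f} pF/m, "
          f"L0 = {l0 * 1e9:5.1f} nH/m, Z_W = {zw:5.1f} ohm, L0*C0/(eps*mu) = {check:.12f}")
print(f"b/a needed for 75 ohm: {ratio_for_impedance(75.0, eps_r):.2f}")
```

With $\varepsilon/\varepsilon_0$ = 2.2, the sketch gives $Z_W$ ≈ 50.6 Ω for b/a = 3.5 and shows that 75 Ω would require b/a ≈ 6.4 with the same filling; in all cases the printed ratio $L_0C_0/\varepsilon\mu$ equals 1 to rounding accuracy, independently of b/a, in agreement with Eq. (114).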


Moreover, the following two facts enable a wide application of coaxial-cable-like systems in electrical
engineering and physical experiment. First, as Eq. (5.78) shows, in a cable with $a \ll b$,
most energy of the wave is localized near the internal conductor. Second, the theory to be discussed in
the next section shows that excitation of other (H- and E-) waves in the cable is impossible until the
wavelength λ becomes smaller than ~π(a + b). As a result, the TEM mode propagation in a cable with $a
\ll b < \lambda/\pi$ is not much affected even if the internal conductor is not straight but bent, for example,
into a helix (see, e.g., Fig. 21).


Fig. 7.21. A typical traveling-wave tube:
(1) electron gun, (2) ac input, (3) beam-focusing magnets, (4) wave attenuator,
(5) helix coil, (6) ac output, (7) vacuum tube, (8) electron collector. Adapted from
https://en.wikipedia.org/wiki/Traveling-wave_tube under the Creative Commons
BY-SA 3.0 license.


In such a system, called the traveling-wave tube (TWT), a quasi-TEM wave propagates with
velocity $v \approx c$ along the helix’s length, so that the velocity’s component along the cable’s axis may be
made close to the velocity $u \ll c$ of the electron beam moving ballistically along the tube’s axis,
enabling their effective interaction and, as a result, a length-accumulating amplification of the wave.51


Another important example of a TEM transmission line is a set of two parallel wires. In the form
of twisted pairs,52 they allow communications, in particular long-range telephone and DSL Internet
connections, at frequencies up to a few hundred kHz, as well as relatively short, multi-line Ethernet and
TV cables at frequencies up to ~1 GHz, limited mostly by the mutual interference (“crosstalk”) between
the individual lines of the same cable, and by the unintentional radiation of the wave into the environment.

51 Despite the current prevalence of semiconductor devices in electronics, TWTs are still used in satellite
TV and radio systems, because they may work at very high microwave power, e.g., up to 200 W at 20
GHz and pulsed 50 W at 200 GHz. Very unfortunately, in this course, I will not have time/space to discuss
even the (rather elegant) basic theory of such devices. The reader interested in this field may be referred, for
example, to the detailed monograph by J. Whitaker, Power Vacuum Tubes Handbook, 3rd ed., CRC Press, 2017.
52 Such twisting, around the line’s direction axis, reduces the crosstalk between adjacent lines, and the parasitic
radiation at their bends.


7.6. Waveguides: H and E waves


Let us now return to Eqs. (100) and explore the H- and E-waves, with, respectively, either $H_z$ or
$E_z$ different from zero. At first sight, they may seem more complex. However, Eqs. (101), which
determine the distribution of these longitudinal components over the cross-section, are just the 2D
Helmholtz equations for scalar functions. For simple cross-section geometries, they may be readily
solved using the methods discussed for the Laplace equation in Chapter 2, in particular the variable
separation. After the solution of such an equation has been found, the transverse components of the
fields may be calculated by differentiation, using the simple formulas

$$\mathbf{E}_t = \frac{i}{k_t^2}\left[k_z\nabla_t E_z - kZ\left(\mathbf{n}_z\times\nabla_t H_z\right)\right], \qquad \mathbf{H}_t = \frac{i}{k_t^2}\left[k_z\nabla_t H_z + \frac{k}{Z}\left(\mathbf{n}_z\times\nabla_t E_z\right)\right], \tag{7.121}$$

which follow from the first line of Eqs. (100).53


In comparison with the boundary problems of electro- and magnetostatics, the only conceptually
new feature of Eqs. (101) is that they form the so-called eigenproblems, with typically many solutions
(eigenfunctions), each describing a specific wave mode, and corresponding to a specific eigenvalue of
the parameter $k_t$. The good news here is that these values of $k_t$ are determined by this 2D boundary
problem and hence do not depend on $k_z$. As a result, the dispersion law $\omega(k_z)$ of any mode, which
follows from the last form of Eq. (102),

$$\omega = v\left(k_t^2 + k_z^2\right)^{1/2}, \tag{7.122}$$

is functionally similar for all modes. It is also similar to that of plane waves in plasma (see Eq. (38), Fig.
6, and their discussion in Sec. 2), with the only differences that the speed of light c is generally replaced
with $v = 1/(\varepsilon\mu)^{1/2}$, the speed of the plane or TEM waves in the medium filling the waveguide, and that
$\omega_p$ is replaced with the so-called cutoff frequency

$$\omega_c = v k_t, \tag{7.123}$$


specific for each mode. (As Eq. (101) implies, and as we will see from several examples below, $k_t$ is of
the order of $1/a$, where a is the characteristic dimension of the waveguide’s cross-section, so that the
critical value of the free-space wavelength, $\lambda = 2\pi c/\omega_c$, is of the order of a.) Below the cutoff frequency of
each particular mode, such a wave cannot propagate in the waveguide.54 As a result, the modes with the
lowest values of $\omega_c$ present special practical interest, because the choice of the signal frequency ω
between the two lowest values of the cutoff frequency (123) guarantees that the waves propagate in the
form of only one mode, with the lowest $k_t$. Such a choice enables engineers to simplify the excitation of
the desired mode by wave generators and to avoid the unintentional transfer of electromagnetic wave
energy to undesirable modes by (virtually unavoidable) small inhomogeneities of the system.

53 For the derivation of Eqs. (121), one of these two linear equations should be first vector-multiplied by $\mathbf{n}_z$. Note
also that this approach could not be used to analyze the TEM waves, because for them $k_t = 0$, $E_z = 0$, $H_z = 0$, and
Eqs. (121) yield an indeterminate form.

54 An interesting twist in the ideas of electromagnetic metamaterials (mentioned in Sec. 5 above) is the so-called
ε-near-zero materials, designed to have the effective product εμ much lower than $\varepsilon_0\mu_0$ within certain frequency
ranges. Since at these frequencies the speed v becomes much lower than c, the cutoff frequency (123) virtually
vanishes. As a result, the waves may “tunnel” through very narrow sections of metallic waveguides filled with
such materials; see, e.g., M. Silveirinha and N. Engheta, Phys. Rev. Lett. 97, 157403 (2006).


The boundary conditions for the Helmholtz equations (101) depend on the propagating wave
type. For the E-modes, with $H_z = 0$ but $E_z \neq 0$, the condition $E_\tau = 0$ immediately gives

$$E_z\big|_C = 0, \tag{7.124}$$

where C is the inner contour limiting the conducting wall’s cross-section. For the H-modes, with $E_z = 0$
but $H_z \neq 0$, the boundary condition is slightly less obvious and may be obtained using, for example, the
second equation of the system (100), vector-multiplied by $\mathbf{n}_z$. Indeed, for the component normal to the
conductor surface, the result of such multiplication is

$$ik_z(\mathbf{H}_t)_n - i\frac{k}{Z}\left(\mathbf{n}_z\times\mathbf{E}_t\right)_n = \frac{\partial H_z}{\partial n}. \tag{7.125}$$

But the first term on the left-hand side of this relation must be zero on the wall surface, because of the
second of Eqs. (104), while according to the first of Eqs. (104), the vector $\mathbf{E}_t$ in the second term cannot
have a component tangential to the wall. As a result, the vector product in that term cannot have a
normal component, so that the term should equal zero as well, and Eq. (125) is reduced to

$$\frac{\partial H_z}{\partial n}\bigg|_C = 0. \tag{7.126}$$

Let us see how all this machinery works for a simple but practically important case of a metallic-wall
waveguide with a rectangular cross-section (see Fig. 22).

Fig. 7.22. A rectangular waveguide, and the transverse field distribution in its
fundamental mode $H_{10}$ (schematically).

In the natural Cartesian coordinates shown in this figure, both Eqs. (101) take the simple form

$$\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + k_t^2\right)f = 0, \qquad \text{where } f = \begin{cases} E_z, & \text{for E-modes},\\ H_z, & \text{for H-modes}. \end{cases} \tag{7.127}$$

From Chapter 2, we know that the most effective way of solving such equations in a rectangular
region is the variable separation, in which the general solution is represented as a sum of partial
solutions of the type

$$f = X(x)\,Y(y). \tag{7.128}$$
Plugging this expression into Eq. (127), and dividing each term by XY, we get the equation

$$\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + k_t^2 = 0, \tag{7.129}$$


which should be satisfied for all values of x and y within the waveguide’s interior. This is only possible
if each term of the sum equals a constant. Taking the X-term and Y-term constants in the form $(-k_x^2)$ and
$(-k_y^2)$, respectively, and solving the corresponding ordinary differential equations,55 for the
eigenfunction (128) we get

$$f = \left(c_x\cos k_x x + s_x\sin k_x x\right)\left(c_y\cos k_y y + s_y\sin k_y y\right), \qquad \text{with } k_x^2 + k_y^2 = k_t^2, \tag{7.130}$$

where the constants c and s should be found from the boundary conditions. Here the difference between
the H-modes and E-modes kicks in.


For the H-modes, Eq. (130) is valid for $H_z$, and we should use the boundary condition (126) on
all metallic walls of the waveguide, i.e. at x = 0 and a, and at y = 0 and b (see Fig. 22). As a result, we get
very simple expressions for the eigenfunctions and eigenvalues:

$$(H_z)_{nm} = H_l\cos\frac{\pi n x}{a}\cos\frac{\pi m y}{b}, \tag{7.131}$$

$$k_x = \frac{\pi n}{a}, \qquad k_y = \frac{\pi m}{b}, \qquad \text{so that} \qquad (k_t)_{nm} = \left(k_x^2 + k_y^2\right)^{1/2} = \pi\left[\left(\frac{n}{a}\right)^2 + \left(\frac{m}{b}\right)^2\right]^{1/2}, \tag{7.132}$$

where $H_l$ is the longitudinal field’s amplitude, and n and m are two integers, each of them
arbitrary except that they cannot be equal to zero simultaneously.56 Assuming, just for definiteness, that a
> b (as shown in Fig. 22), we see that the lowest eigenvalue of $k_t$, and hence the lowest cutoff frequency
(123), is achieved for the so-called $H_{10}$ mode with n = 1 and m = 0, and hence with

$$(k_t)_{10} = \frac{\pi}{a}, \tag{7.133}$$

thus confirming our prior estimate of $k_t$. The lowest H-mode with both indices nonzero is $H_{11}$, with n = 1 and m = 1:

$$(k_t)_{11} = \pi\left(\frac{1}{a^2} + \frac{1}{b^2}\right)^{1/2} = \left[1 + \left(\frac{a}{b}\right)^2\right]^{1/2}(k_t)_{10}. \tag{7.134}$$

However, this value is always higher than $(k_t)_{01} = \pi/b = (a/b)(k_t)_{10}$ of the $H_{01}$ mode (n = 0, m = 1), so that,
depending on the a/b ratio, the second-lowest $k_t$ (and hence $\omega_c$) belongs either to the $H_{01}$ mode or to the
$H_{20}$ mode with n = 2 and m = 0:

$$(k_t)_{20} = \frac{2\pi}{a} = 2(k_t)_{10}. \tag{7.135}$$

55 Let me hope that the solution of equations of the type $d^2X/dx^2 + k_x^2X = 0$ does not present any problem for the
reader, if only due to their prior experience with problems such as standing waves on a guitar string,
wavefunctions in a flat 1D quantum well, or (with the replacement x → t) a classical harmonic oscillator.

56 Otherwise, the function $H_z(x,y)$ would be constant, so that, according to Eq. (121), the transverse components of
the electric and magnetic fields would equal zero. As a result, as the last two lines of Eqs. (100) show, the whole
field would be zero for any $k_z \neq 0$.

These values become equal at a/b = 2; in practical waveguides, the a/b ratio is not too far from
this value. For example, in the standard X-band (~10-GHz) waveguide WR90, a ≈ 2.3 cm ($f_c = \omega_c/2\pi \approx$
6.5 GHz), and b ≈ 1.0 cm.
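The WR90 numbers follow directly from Eqs. (122), (123), and (132). The Python sketch below lists the lowest cutoff frequencies of a rectangular waveguide and evaluates $k_z(\omega)$ of the fundamental mode. The inner dimensions a = 22.86 mm and b = 10.16 mm are the nominal WR90 values, assumed here rather than taken from the text, and $v = c$ assumes an empty (air- or vacuum-filled) guide.

```python
import math

C = 299_792_458.0                        # speed of light in vacuum (m/s)

def cutoff_frequency(n: int, m: int, a: float, b: float, v: float = C) -> float:
    """f_c = omega_c / (2 pi) of the H_nm (or E_nm) mode, from Eqs. (123) and (132)."""
    kt = math.pi * math.hypot(n / a, m / b)
    return v * kt / (2 * math.pi)

def kz(f: float, fc: float, v: float = C) -> float:
    """k_z from Eq. (122): (omega^2 - omega_c^2)^(1/2) / v; 0.0 at or below cutoff."""
    omega, omega_c = 2 * math.pi * f, 2 * math.pi * fc
    return math.sqrt(max(omega ** 2 - omega_c ** 2, 0.0)) / v

a, b = 22.86e-3, 10.16e-3                # nominal WR90 inner dimensions (m), assumed
modes = []
for n in range(3):
    for m in range(3):
        if n == 0 and m == 0:
            continue                     # there is no H_00 mode (footnote 56)
        # E-modes exist only if both n and m are nonzero
        label = f"H_{n}{m}" + (f" and E_{n}{m}" if n and m else "")
        modes.append((cutoff_frequency(n, m, a, b), label))
modes.sort()
for fc, label in modes[:4]:
    print(f"{label:>14}: f_c = {fc / 1e9:6.2f} GHz")

fc10 = cutoff_frequency(1, 0, a, b)
print(f"H_10 guide wavelength at 10 GHz: {2 * math.pi / kz(10e9, fc10) * 1e3:.1f} mm")
```

For these dimensions the listing puts $H_{20}$ (about 13.1 GHz) below $H_{01}$ (about 14.8 GHz) and $H_{11}$/$E_{11}$ (about 16.1 GHz), so only the $H_{10}$ mode propagates between roughly 6.6 and 13.1 GHz; at 10 GHz its guide wavelength, about 40 mm, noticeably exceeds the free-space value of 30 mm, as Eq. (122) implies.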


Now let us have a look at the alternative E-modes. For them, we still should use the general
solution (130) with $f = E_z$, but now with the boundary condition (124). This gives us the eigenfunctions

$$(E_z)_{nm} = E_l\sin\frac{\pi n x}{a}\sin\frac{\pi m y}{b}, \tag{7.136}$$

and the same eigenvalue spectrum (132) as for the H-modes. However, now neither n nor m can be equal
to zero; otherwise, Eq. (136) would give the trivial solution $E_z(x,y) = 0$. Hence the lowest cutoff
frequency of TM waves is achieved at the so-called $E_{11}$ mode with n = 1, m = 1, and with the eigenvalue
given by Eq. (134), always higher than $(k_t)_{10}$.


Thus the fundamental $H_{10}$ mode is certainly the most important wave in rectangular waveguides;
let us have a better look at its field distribution. Plugging the corresponding solution (131) with n = 1
and m = 0 into the general relation (121), we easily get

$$(H_x)_{10} = -i\frac{k_z a}{\pi}H_l\sin\frac{\pi x}{a}, \qquad (H_y)_{10} = 0, \tag{7.137}$$

$$(E_x)_{10} = 0, \qquad (E_y)_{10} = i\frac{ka}{\pi}Z H_l\sin\frac{\pi x}{a}. \tag{7.138}$$


This field distribution is (schematically) shown in Fig. 22. Neither of the fields depends on the
coordinate y, a feature that is very convenient, in particular, for microwave experiments with small samples.
The electric field has only one (in Fig. 22, vertical) component that vanishes at the side walls and
reaches its maximum at the waveguide’s center; its field lines are straight, starting and ending on wall
surface charges (whose distribution propagates along the waveguide together with the wave). In
contrast, the magnetic field has two non-zero components (H_x and H_z), and its field lines are shaped as
horizontal loops wrapped around the electric field maxima.
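As a sanity check of Eqs. (137) and (138), the sketch below (an illustration, not part of the original derivation) builds the H10 fields of an air-filled guide, adds the longitudinal field H_z = H_0 cos(πx/a) that Eq. (131) should give for n = 1, m = 0, and verifies Faraday's and Ampère's laws, ∇×E = iωμ_0H and ∇×H = −iωε_0E, by central finite differences. It assumes the time dependence exp{−iωt}, fields ∝ exp{ik_z z}, and the dispersion relation k_z² = k² − (π/a)²; the values a = 2.3 cm and f = 10 GHz are illustrative.

```python
import cmath
import math

EPS0 = 8.8541878128e-12                 # vacuum permittivity, F/m
MU0 = 4e-7 * math.pi                     # vacuum permeability, H/m (classic value)
a = 2.3e-2                               # broad-wall width, m (illustrative)
omega = 2 * math.pi * 10e9               # 10 GHz, above the ~6.5-GHz H10 cutoff
k = omega * math.sqrt(EPS0 * MU0)        # free-space wave number
Z = math.sqrt(MU0 / EPS0)                # free-space impedance
k_z = math.sqrt(k**2 - (math.pi / a) ** 2)   # assumed dispersion: k_z^2 = k^2 - k_t^2
H0 = 1.0                                 # arbitrary amplitude, A/m

def fields(x, z):
    """Complex amplitudes (E, H) of the H10 mode at (x, z); there is no y-dependence."""
    ph = cmath.exp(1j * k_z * z)
    s, c = math.sin(math.pi * x / a), math.cos(math.pi * x / a)
    E = (0.0, 1j * k * a / math.pi * Z * H0 * s * ph, 0.0)           # Eq. (138)
    H = (-1j * k_z * a / math.pi * H0 * s * ph, 0.0, H0 * c * ph)    # Eq. (137) plus H_z
    return E, H

def curl(which, x, z, d=1e-6):
    """Curl of E (which=0) or H (which=1) by central differences; d/dy = 0 here."""
    dx = [(p - q) / (2 * d) for p, q in zip(fields(x + d, z)[which], fields(x - d, z)[which])]
    dz = [(p - q) / (2 * d) for p, q in zip(fields(x, z + d)[which], fields(x, z - d)[which])]
    return (-dz[1], dz[0] - dx[2], dx[1])

def relative_residuals(x, z):
    """Relative mismatch in curl E = i omega mu0 H and curl H = -i omega eps0 E."""
    E, H = fields(x, z)
    cE, cH = curl(0, x, z), curl(1, x, z)
    faraday = max(abs(c - 1j * omega * MU0 * h) for c, h in zip(cE, H))
    ampere = max(abs(c + 1j * omega * EPS0 * e) for c, e in zip(cH, E))
    return (faraday / max(abs(omega * MU0 * h) for h in H),
            ampere / max(abs(omega * EPS0 * e) for e in E))

print(relative_residuals(0.3 * a, 0.01))
print(abs(fields(0.0, 0.0)[0][1]), abs(fields(a, 0.0)[0][1]))  # E_y at the side walls
```

Both residuals come out at the level of the finite-difference truncation error, and E_y vanishes at x = 0 and x = a, as the boundary conditions require.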


An important question is whether the H10 wave may be usefully characterized by a unique
impedance introduced similarly to Z_W of the TEM modes (see Eq. (115)). The answer is no, because the
main value of Z_W is a convenient description of the impedance matching of a transmission line with a
lumped load (see Fig. 19 and Eq. (118)). As was discussed above, such a simple description is possible
(i.e., does not depend on the exact geometry of the connection) only if both dimensions of the line’s
cross-section are much less than λ. But for the H10 wave (and more generally, any non-TEM mode) this
is impossible (see, e.g., Eq. (129)): its lowest frequency corresponds to the TEM wavelength λ_max =
2π/(k_t)_min = 2π/(k_t)_10 = 2a. (The reader is challenged to find a simple interpretation of this equality.)


Now let us consider metallic-wall waveguides with a round cross-section (Fig. 23a). In this
single-connected geometry, the TEM waves are impossible again, while for the analysis of H-modes and
E-modes, the polar coordinates {ρ, φ} are most natural. In these coordinates, the 2D Helmholtz equation
(101) takes the following form:
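$$\left[\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2}{\partial\varphi^2} + k_t^2\right]f = 0, \qquad \text{where } f = \begin{cases} E_z, & \text{for E-modes},\\ H_z, & \text{for H-modes}.\end{cases} \tag{7.139}$$
Separating the variables as f = ℛ(ρ)ℱ(φ), we get
$$\frac{1}{\rho\mathcal{R}}\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) + \frac{1}{\rho^2\mathcal{F}}\frac{d^2\mathcal{F}}{d\varphi^2} + k_t^2 = 0. \tag{7.140}$$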


But this is exactly the Eq. (2.127) that was studied in Sec. 2.7 in the context of electrostatics, just with a
replacement of notation: γ → k_t. So we already know that to have 2π-periodic functions ℱ(φ) and finite
values ℛ(0) (which are evidently necessary for our current case; see Fig. 23a), the general solution
must have the form given by Eq. (2.136), i.e. the eigenfunctions are expressed via integer-order Bessel
functions of the first kind:


$$f_{nm} = J_n(k_{nm}\rho)\left(c_n\cos n\varphi + s_n\sin n\varphi\right) = \text{const}\times J_n(k_{nm}\rho)\cos n(\varphi - \varphi_0), \tag{7.141}$$


with the eigenvalues k_nm of the transverse wave number k_t to be determined from appropriate boundary
conditions, and an arbitrary constant φ_0.


Fig. 7.23. (a) Metallic and (b) dielectric
waveguides with circular cross-sections.


As for the rectangular waveguide, let us start from the H-modes (f = H_z). Then the boundary
condition on the wall surface (ρ = R) is given by Eq. (126), which, for the solution (141), takes the form
$$\left.\frac{dJ_n(\xi)}{d\xi}\right|_{\xi = k_{nm}R} = 0. \tag{7.142}$$
This means that the eigenvalues of Eq. (139) are
$$k_t = k_{nm} = \frac{\xi'_{nm}}{R}, \tag{7.143}$$


where ξ'_nm is the m-th zero of the function dJ_n(ξ)/dξ. Approximate values of these zeros for several
lowest n and m may be read out from Fig. 2.18; their more accurate values are given in Table 1 below.


Table 7.1. Zeros ξ'_nm of the function dJ_n(ξ)/dξ for a few lowest
values of the Bessel function’s index n and the root’s number m.

          m = 1      m = 2      m = 3
n = 0     3.83171    7.01559    10.17347
n = 1     1.84118    5.33144     8.53632
n = 2     3.05424    6.70613     9.96947
n = 3     4.20119    8.01524    11.34592
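The entries of Table 7.1 can be regenerated without special-function libraries. This sketch (standard library only; the helper names are my own) evaluates J_n(x) from Bessel's integral J_n(x) = (1/π)∫_0^π cos(nt − x sin t)dt with the trapezoid rule, forms dJ_n/dx = [J_{n−1}(x) − J_{n+1}(x)]/2, brackets the sign changes of this derivative on a coarse grid, and refines each one by bisection.

```python
import math

def bessel_j(n, x, steps=200):
    """J_n(x) for integer n from Bessel's integral (1/pi) * int_0^pi cos(n t - x sin t) dt.
    The trapezoid rule converges very fast here: the integrand is smooth and periodic."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(n * math.pi))     # end points t = 0 and t = pi
    for i in range(1, steps):
        t = i * h
        total += math.cos(n * t - x * math.sin(t))
    return total * h / math.pi

def dbessel_j(n, x):
    """dJ_n/dx = [J_{n-1}(x) - J_{n+1}(x)] / 2; also valid for n = 0 since J_{-1} = -J_1."""
    return 0.5 * (bessel_j(n - 1, x) - bessel_j(n + 1, x))

def first_zeros(func, count, start=0.1, step=0.05):
    """First `count` sign changes of func on (start, inf), refined by bisection."""
    roots, x0, f0 = [], start, func(start)
    while len(roots) < count:
        x1 = x0 + step
        f1 = func(x1)
        if f0 * f1 < 0:
            lo, hi, f_lo = x0, x1, f0
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                f_mid = func(mid)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
        x0, f0 = x1, f1
    return roots

table = {n: first_zeros(lambda x, n=n: dbessel_j(n, x), 3) for n in range(4)}
for n, zs in table.items():
    print(f"n = {n}: " + "  ".join(f"{z:9.5f}" for z in zs))
```

Starting the scan at x = 0.1 skips the trivial zero of dJ_n/dξ at ξ = 0 (present for n = 0 and n ≥ 2), which does not correspond to a propagating mode.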
The table shows, in particular, that the lowest of the zeros is ξ'_11 ≈ 1.84.57 Thus, perhaps a bit
counter-intuitively, the fundamental mode, providing the lowest cutoff frequency ω_c = v k_nm, is H11,
corresponding to n = 1 rather than n = 0:


$$H_z = H_0 J_1\left(\xi'_{11}\frac{\rho}{R}\right)\cos(\varphi - \varphi_0). \tag{7.144}$$


It has the transverse wave number k_t = k_11 = ξ'_11/R ≈ 1.84/R, and hence the cutoff frequency
corresponding to the TEM wavelength λ_max = 2π/k_11 ≈ 3.41R. Thus the ratio of λ_max to the waveguide’s
diameter 2R is about 1.7, i.e. close to the ratio λ_max/a = 2 for the rectangular waveguide. The origin of
this proximity is clear from Fig. 24, which shows the transverse field distribution in the H11 mode. (It
may be readily calculated from Eqs. (121) with E_z = 0, and H_z given by Eq. (144).)


Fig. 7.24. Transverse field components in the
fundamental H11 mode of a metallic, circular
waveguide (schematically).


One can see that the field structure is actually very similar to that of the fundamental mode in the 
rectangular waveguide, shown in Fig. 22, despite the different nomenclature (which is due to the 
different coordinate system used for the solution). However, note the arbitrary constant angle φ_0,
indicating that in circular waveguides, the transverse field’s polarization is arbitrary. For some practical 
applications, such degeneracy of these “quasi-linearly-polarized” waves creates problems; some of them 
may be avoided by using waves with circular polarization. 


As Table 1 shows, the next lowest H-mode is H21, for which k_t = k_21 = ξ'_21/R ≈ 3.05/R, about 1.7
times larger than that of the fundamental mode, and only then comes the first mode with no angular
dependence of any field, H01, with k_t = k_01 = ξ'_01/R ≈ 3.83/R,58 followed by several angle-dependent
modes: H31, H12, etc.


For the E modes, we may still use Eq. (141) (with f = E_z), but with the boundary condition (124)
at ρ = R. This gives the following equation for the problem eigenvalues:
$$J_n(k_{nm}R) = 0, \qquad \text{i.e.} \qquad k_{nm} = \frac{\xi_{nm}}{R}, \tag{7.145}$$


where ξ_nm is the m-th zero of the function J_n(ξ); see Table 2.1. That table shows that the lowest k_t is equal to
ξ_01/R ≈ 2.405/R. Hence the corresponding mode (E01), with no angular dependence of its fields, e.g.


57 Mathematically, the lowest root of Eq. (142) with n = 0 equals 0. However, it would yield k_t = 0 and hence a
constant field H_z, which, according to the first of Eqs. (121), would give a vanishing electric field.

58 The electric field lines in the H01 mode (as well as all higher H0m modes) are directed straight from the
symmetry axis to the walls, reminding those of the TEM waves in the coaxial cable. Due to this property, these
modes provide, at ω >> ω_c, much lower energy losses (see Sec. 9 below) than the fundamental H11 mode, and are
sometimes used in practice, despite the inconvenience of working in the multimode frequency range.
$$E_z = E_0 J_0\left(\xi_{01}\frac{\rho}{R}\right), \tag{7.146}$$
has the second-lowest cutoff frequency, ~30% higher than that of the fundamental mode H11.
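A short arithmetic check of the circular-waveguide numbers quoted above, using the zeros ξ'_nm from Table 7.1 and ξ_01 ≈ 2.405 from Table 2.1 (entered here with a couple more digits): the H11 cutoff wavelength in units of R, its ratio to the diameter 2R, and the E01-to-H11 and H21-to-H11 cutoff-frequency ratios. The constant and function names are illustrative.

```python
import math

# Zeros quoted in Table 7.1 (of dJ_n/dxi) and Table 2.1 (of J_n)
XI_PRIME = {(1, 1): 1.84118, (2, 1): 3.05424, (0, 1): 3.83171}
XI_01 = 2.40483   # first zero of J_0

def cutoff_wavelength(xi, R=1.0):
    """TEM wavelength 2 pi / k_t at cutoff, with k_t = xi / R (Eqs. 143 and 145)."""
    return 2 * math.pi * R / xi

lam_h11 = cutoff_wavelength(XI_PRIME[(1, 1)])          # in units of R
ratio_e01 = XI_01 / XI_PRIME[(1, 1)]                   # f_c(E01) / f_c(H11)
ratio_h21 = XI_PRIME[(2, 1)] / XI_PRIME[(1, 1)]        # f_c(H21) / f_c(H11)

print(f"H11: lambda_max = {lam_h11:.3f} R, lambda_max / 2R = {lam_h11 / 2:.3f}")
print(f"f_c(E01) / f_c(H11) = {ratio_e01:.3f}")
print(f"f_c(H21) / f_c(H11) = {ratio_h21:.3f}")
```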


Finally, let us discuss one more topic of general importance: the number N of electromagnetic
modes that may propagate in a waveguide within a certain range of relatively large frequencies ω >> ω_c.
It is easy to calculate for a rectangular waveguide, with its simple expressions (132) for the eigenvalues
of {k_x, k_y}. Indeed, these expressions describe a rectangular mesh on the [k_x, k_y] plane, so that each point
corresponds to the plane area ΔA_k = (π/a)(π/b), and the number of modes in a large k-plane area A_k >>
ΔA_k is N = A_k/ΔA_k = abA_k/π² = AA_k/π², where A is the waveguide’s cross-section area.59 However, it is
frequently more convenient to discuss transverse wave vectors k_t of arbitrary direction, i.e. with an
arbitrary sign of their components k_x and k_y. Taking into account that the opposite values of each
component actually give the same wave, the actual number of different modes of each type (E- or H-) is
a factor of 2² = 4 lower than was calculated above. This means that the number of modes of both types is
$$N = 2\frac{AA_k}{(2\pi)^2}. \tag{7.147}$$
Let me leave it for the reader to find hand-waving (but convincing :-) arguments that this mode
counting rule is valid for waveguides with cross-sections of any shape, and any boundary conditions on
the walls, provided that N >> 1.
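The counting argument behind Eq. (147) can be tested on a rectangular guide by brute force. The sketch below (hypothetical cross-section a = 1 m, b = 0.7 m; the function names are my own) counts every H_nm mode (n, m ≥ 0, not both zero) and every E_nm mode (n, m ≥ 1) with (k_t)_nm < k_max, using Eq. (132), and compares the total with 2AA_k/(2π)² for the disk-shaped area A_k = πk_max².

```python
import math

def count_modes(a, b, k_max):
    """Brute-force count of modes with k_t < k_max in an a-by-b guide:
    H_nm needs n, m >= 0 (not both zero), E_nm needs n, m >= 1; k_t from Eq. (132)."""
    total = 0
    for n in range(int(k_max * a / math.pi) + 2):
        for m in range(int(k_max * b / math.pi) + 2):
            if (n, m) == (0, 0):
                continue
            if (math.pi * n / a) ** 2 + (math.pi * m / b) ** 2 < k_max ** 2:
                total += 1                      # the H_nm mode
                if n >= 1 and m >= 1:
                    total += 1                  # the E_nm mode
    return total

def mode_estimate(area, k_max):
    """Eq. (147): N = 2 A A_k / (2 pi)^2, with A_k = pi k_max^2 (a disk in the k_t plane)."""
    return 2 * area * math.pi * k_max ** 2 / (2 * math.pi) ** 2

a, b = 1.0, 0.7            # hypothetical cross-section, m
for k_max in (30.0, 100.0, 300.0):
    exact = count_modes(a, b, k_max)
    approx = mode_estimate(a * b, k_max)
    print(f"k_max = {k_max:5.0f} 1/m: counted {exact}, Eq. (147) gives {approx:.0f}")
```

The relative difference shrinks as N grows, in line with the condition N >> 1; the forbidden modes mentioned in footnote 59 only affect the lattice points on the axes, whose share of the total falls off as 1/k_max.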


7.7. Dielectric waveguides, optical fibers, and paraxial beams 


Now let us discuss electromagnetic wave propagation in dielectric waveguides. The simplest, 
step-index waveguide (see Figs. 23b and 25) consists of an inner core and an outer shell (in the optical 
fiber technology lingo, called cladding) with a higher wave propagation speed, i.e. a lower index of 
refraction: 


$$v_+ > v_-, \qquad \text{i.e.} \qquad n_+ < n_-, \qquad k_+ < k_-, \tag{7.148}$$


at the same frequency. (In most cases the difference is achieved due to that in the electric permittivity, ε_+
< ε_-, while magnetically both materials are virtually passive: μ_- ≈ μ_+ ≈ μ_0, so that their refraction indices
n_±, defined by Eq. (84), are very close to (ε_±/ε_0)^{1/2}; I will limit my discussion to this approximation.)


The basic idea of the waveguide’s operation may be readily understood in the limit when the
wavelength λ is much smaller than the characteristic size R of the core’s cross-section. In this
“geometric-optics” limit, at distances of the order of λ from the core-cladding interface, which provides
the wave reflection, we can neglect the interface’s curvature and approximate its geometry with a plane.
As we know from Sec. 4, if the angle θ of the wave’s incidence on such a plane interface is larger than
the critical value θ_c specified by Eq. (85), the wave is totally reflected. As a result, the waves launched
into the fiber core at such “grazing” angles propagate inside the core, being repeatedly reflected from
the cladding (see Fig. 25).


59 This formula ignores the fact that, according to the above analysis, some modes (with n = 0 and m = 0 for the H
modes, and n = 0 or m = 0 for the E modes) are forbidden. However, for N >> 1, the associated corrections of Eq.
(147) are negligible.
Fig. 7.25. Wave propagation in a thick optical fiber at ω >> ω_c
(core: ε_-, μ_-; cladding: ε_+, μ_+).


The most important type of dielectric waveguides is optical fibers.60 Due to a heroic
technological effort over three decades starting from the mid-1960s, the attenuation of such fibers has
been decreased from values of the order of 20 dB/km (typical for window glass) to the fantastically
low values of ~0.2 dB/km (meaning virtually perfect transparency of 10-km-long fiber segments!),
combined with the extremely low plane-wave (“chromatic”) dispersion below 10 ps/km·nm.61 In
conjunction with the development of inexpensive erbium-based quantum amplifiers, this breakthrough
has enabled inter-city and inter-continental (undersea), broadband62 optical cables, which are the
backbone of all modern telecommunication infrastructure.


The only bad news is that these breakthroughs were achieved for just one kind of material
(silica-based glasses)63 within a very narrow range of their chemical composition. As a result, the
dielectric constants κ ≡ ε/ε_0 of the cladding and core of practical optical fibers are both close to 2.2 (n
≈ 1.5) and hence very close to each other, so that the relative difference of the refraction indices,

$$\Delta \equiv \frac{n_- - n_+}{n_-} = \frac{\varepsilon_-^{1/2} - \varepsilon_+^{1/2}}{\varepsilon_-^{1/2}}, \tag{7.149}$$


is typically below 0.5%. This factor limits the fiber bandwidth. Indeed, let us use the geometric-optics 
picture to calculate the number of quasi-plane-wave modes that may propagate in the fiber. For the 
complementary angle (Fig. 25) 


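$$\varphi \equiv \frac{\pi}{2} - \theta, \qquad \text{so that} \qquad \sin\theta = \cos\varphi, \tag{7.150}$$
Eq. (85) gives the following propagation condition:
$$\cos\varphi > \frac{n_+}{n_-} \equiv 1 - \Delta. \tag{7.151}$$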


60 For a comprehensive discussion of this vital technology see, e.g., A. Yariv and P. Yeh, Photonics, 6th ed.,
Oxford U. Press, 2007.

61 Both these parameters have their best values not in the visible light range (with wavelengths from 380 to 740
nm), but in the near-infrared, with the attenuation lowest between approximately 1,500 and 1,630 nm. As a result,
most modern communication systems use two spectral windows within that range: the so-called C-band
(1,530-1,565 nm) and L-band (1,570-1,610 nm).

62 Each of the spectral bands mentioned above, at a typical signal-to-noise ratio S/N > 10°, corresponds to the 
Shannon bandwidth Δf log2(S/N) exceeding 10^14 bits per second, some five orders of magnitude (!) higher than
that of a modern Ethernet cable. The practically usable bandwidth of each fiber is somewhat lower, but a typical 
optical cable, with many fibers in parallel, has a proportionately higher aggregate bandwidth. A relatively recent 
(circa 2017) example is the C-band transatlantic (6,600-km-long) cable Marea, with eight fiber pairs and an
aggregate usable bandwidth of 160 terabits per second.

63 The silica-based fibers were developed in 1966 by an industrial research group led by Charles Kao (who shared 
the 2009 Nobel Prize in physics), but the very idea of using optical fibers for long-range communications may be 
traced back at least to the 1963 work by Jun-ichi Nishizawa — who also invented semiconductor lasers. 
In the limit Δ << 1, when the incidence angles θ > θ_c of all propagating waves are very close to π/2, and
hence the complementary angles φ are small, we may keep only the first two terms in the Taylor expansion
of the left-hand side of Eq. (151) and get
$$\varphi_{\max} \approx (2\Delta)^{1/2}. \tag{7.152}$$
(Even for the higher-end value Δ = 0.005, this critical angle is only ~0.1 radian, i.e. about 5.7°.) Due
to this smallness, we may approximate the maximum transverse component of the wave vector as
$$(k_t)_{\max} = k(\sin\varphi)_{\max} \approx k\varphi_{\max} \approx (2\Delta)^{1/2}k, \tag{7.153}$$


and use Eq. (147) to calculate the number N of propagating modes:
$$N \approx 2\,\frac{\pi R^2\cdot\pi(k_t)^2_{\max}}{(2\pi)^2} = (kR)^2\Delta. \tag{7.154}$$


For typical values k = 0.73×10^7 m^-1 (corresponding to the free-space wavelength λ_0 = nλ = 2πn/k ≈ 1.3
μm), R = 25 μm, and Δ = 0.005, this formula gives N ~ 150.
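The following sketch evaluates Eqs. (152)-(154) for the example parameters above, with n ≈ 1.5 assumed for silica. It also confirms that Eq. (154) is just Eq. (147) with A = πR² and A_k = π(k_t)²_max. With these inputs the estimate comes out at about 1.7×10², the same order as the rounded value N ~ 150 quoted in the text.

```python
import math

k = 0.73e7        # wave number in the core, 1/m (the text's example)
R = 25e-6         # core radius, m
delta = 0.005     # relative index difference, Eq. (149)
n_core = 1.5      # approximate index of silica, as in the text

phi_max = math.sqrt(2 * delta)                                             # Eq. (152)
k_t_max = phi_max * k                                                      # Eq. (153)
N_147 = 2 * (math.pi * R**2) * (math.pi * k_t_max**2) / (2 * math.pi) ** 2  # Eq. (147)
N = (k * R) ** 2 * delta                                                   # Eq. (154)
lambda_0 = n_core * 2 * math.pi / k                                        # free-space wavelength

print(f"phi_max = {phi_max:.3f} rad = {math.degrees(phi_max):.1f} deg")
print(f"N = {N:.0f} (Eq. 147 directly: {N_147:.0f})")
print(f"lambda_0 = {lambda_0 * 1e6:.2f} um")
```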


Now we can calculate the geometric dispersion of such a fiber, i.e. the difference in the mode
propagation speed, which is commonly characterized in terms of the difference between the wave delay
times (traditionally measured in picoseconds per kilometer) of the fastest and slowest modes. Within the
geometric optics approximation, the difference in time delays of the fastest mode (with k_z = k) and the
slowest mode (with k_z = k sin θ_c) at distance l is
$$\Delta t = \frac{l}{(v_z)_{\min}} - \frac{l}{(v_z)_{\max}} = \frac{l}{v\sin\theta_c} - \frac{l}{v} = \frac{l}{v}\left(\frac{n_-}{n_+} - 1\right) \approx \frac{l}{v}\Delta. \tag{7.155}$$


For the example considered above, the TEM wave’s speed in the glass is v = c/n ≈ 2×10^8 m/s, and the
geometric dispersion Δt/l is close to 25 ps/m, i.e. 25,000 ps/km. (This means, for example, that a 1-ns
pulse, being distributed between the modes, would spread to a ~25-ns pulse after passing just a 1-km
fiber segment.) This result should be compared with the chromatic dispersion mentioned above, below
10 ps/km·nm, which gives dt/l of the order of only 1,000 ps/km in the whole communication band dλ
~ 100 nm. Due to this high geometric dispersion, such relatively thick (2R ~ 50 μm) multi-mode fibers
are used for the transfer of signals over only short distances below ~100 m. (As compensation, they
may carry relatively large power, beyond 10 mW, without being damaged by the field.)
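A quick evaluation of the small-Δ result Δt/l ≈ Δ/v of Eq. (155) for the numbers used in the text, together with the unexpanded form (n_-/n_+ − 1)/v and the comparison with the chromatic-dispersion budget of ~10 ps/km·nm over a ~100-nm band. The variable names are illustrative.

```python
import math

v = 2e8          # TEM wave speed in the glass, m/s (c/n with n ~ 1.5)
delta = 0.005    # relative index difference, Eq. (149)

# Eq. (155): dt/l = (1/v)(n_-/n_+ - 1), with n_+/n_- = 1 - delta from Eq. (151)
dt_per_m_exact = (1 / (1 - delta) - 1) / v
dt_per_m = delta / v                              # the small-delta form
geometric_ps_per_km = dt_per_m * 1e12 * 1e3
chromatic_ps_per_km = 10 * 100                    # 10 ps/(km nm) times a ~100-nm band

print(f"geometric: {dt_per_m * 1e12:.1f} ps/m = {geometric_ps_per_km:,.0f} ps/km "
      f"(unexpanded form: {dt_per_m_exact * 1e12:.2f} ps/m)")
print(f"1-ns pulse after 1 km: ~{1 + geometric_ps_per_km / 1e3:.0f} ns")
print(f"geometric / chromatic ~ {geometric_ps_per_km / chromatic_ps_per_km:.0f}")
```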


Long-range telecommunications are based on single-mode fibers, with thin cores (typically with
diameters 2R ~ 5 μm, i.e. of the order of λ/Δ^{1/2}). For such structures, Eq. (154) yields N ~ 1, but in this
case, the geometric optics approximation is not quantitatively valid, and for the fiber analysis, we should
get back to the Maxwell equations. In particular, this analysis should take into explicit account the
evanescent wave in the cladding, because its penetration depth may be comparable with R.64


64 The following quantitative analysis of the single-mode fibers is very valuable — both for practice and as a very 
good example of Maxwell equations’ solution. However, I have to confess that its results will not be used in the 
following parts of the course. So, if the reader is not interested in this topic, they may safely jump to the text 
following Eq. (181). (I believe that the discussion of the angular momentum of electromagnetic radiation, starting 
at that point, is compulsory for every professional physicist.) 
Since the cross-section of an optical fiber lacks metallic walls, the Maxwell equations describing
the fiber cannot be exactly satisfied with either TEM-wave, or H-mode, or E-mode solutions. Instead, the
fibers can carry the so-called HE and EH modes, with both vectors H and E having longitudinal
components simultaneously. In such modes, both E_z and H_z inside the core (ρ < R) have a form similar
to Eq. (141):


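$$f_z = f_0J_n(k_t\rho)\cos n(\varphi - \varphi_0), \qquad \text{where } k_t^2 \equiv k_-^2 - k_z^2 > 0, \quad \text{and} \quad k_-^2 \equiv \omega^2\varepsilon_-\mu_-, \tag{7.156}$$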


where the constant angles φ_0 may be different for each field. On the other hand, for the evanescent wave
in the cladding, we may rewrite Eqs. (101) as


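$$\left(\nabla_t^2 - \kappa_t^2\right)f_z = 0, \qquad \text{where } \kappa_t^2 \equiv k_z^2 - k_+^2 > 0, \quad \text{and} \quad k_+^2 \equiv \omega^2\varepsilon_+\mu_+. \tag{7.157}$$
Figure 26 illustrates these relations between k_z, κ_t, k_t, and k_±; note that the following sum,
$$k_t^2 + \kappa_t^2 = k_-^2 - k_+^2 = \omega^2(\varepsilon_- - \varepsilon_+)\mu_0, \tag{7.158}$$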


is fixed (at a given frequency) and, for typical fibers, is very small (<< k²). In particular, Fig. 26 shows
that neither k_t nor κ_t can be larger than ω[(ε_- − ε_+)μ_0]^{1/2} ≈ (2Δ)^{1/2}k. This means that the depth δ ≡ 1/κ_t of
the wave penetration into the cladding is at least 1/k(2Δ)^{1/2} = λ/2π(2Δ)^{1/2} >> λ/2π. This is why the
cladding layers in practical optical fibers are made as thick as ~50 μm, so that only a negligibly small
tail of this evanescent wave field reaches their outer surfaces.
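The bound on the penetration depth is a one-line estimate. This sketch evaluates it for the example values k = 0.73×10^7 m^-1 and Δ = 0.005 used earlier in this section, taking the largest possible κ_t ≈ (2Δ)^{1/2}k implied by Eq. (158); the variable names are illustrative.

```python
import math

k = 0.73e7                    # wave number in the core, 1/m (the text's example)
delta = 0.005                 # relative index difference
kappa_max = math.sqrt(2 * delta) * k      # largest possible kappa_t, from Eq. (158)
depth_min = 1 / kappa_max                 # the depth 1/kappa_t cannot be smaller than this
reduced_wavelength = 1 / k                # lambda / 2 pi inside the glass

print(f"minimum penetration depth = {depth_min * 1e6:.2f} um "
      f"= {depth_min / reduced_wavelength:.0f} x (lambda / 2 pi)")
```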


Fig. 7.26. The relation between the transverse
exponents k_t and κ_t for waves in optical fibers.


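In the polar coordinates, Eq. (157) becomes
$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial f_z}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2 f_z}{\partial\varphi^2} - \kappa_t^2f_z = 0, \tag{7.159}$$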


which is the equation to be compared with Eq. (139) for the circular metallic-wall waveguide. From Sec. 2.7,
we know that the eigenfunctions of Eq. (159) are the products of the sine and cosine functions of nφ by
a linear combination of the modified Bessel functions I_n and K_n, shown in Fig. 2.22, now of the
argument κ_tρ. The fields have to vanish at ρ → ∞, so that only the latter functions (of the second kind)
can participate in the solution:


$$f_z \propto K_n(\kappa_t\rho)\cos n(\varphi - \varphi_0). \tag{7.160}$$


Now we have to reconcile Eqs. (156) and (160), using the boundary conditions at ρ = R for both
longitudinal and transverse components of both fields, with the latter components first calculated using
Eqs. (121). Such a conceptually simple, but a bit bulky calculation (which I am leaving for the reader’s
exercise), yields a system of two linear, homogeneous equations for the complex amplitudes E_0 and H_0,
which are compatible if
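$$\left(\frac{\varepsilon_-}{k_t}\frac{J_n'(k_tR)}{J_n(k_tR)} + \frac{\varepsilon_+}{\kappa_t}\frac{K_n'(\kappa_tR)}{K_n(\kappa_tR)}\right)\left(\frac{\mu_-}{k_t}\frac{J_n'(k_tR)}{J_n(k_tR)} + \frac{\mu_+}{\kappa_t}\frac{K_n'(\kappa_tR)}{K_n(\kappa_tR)}\right) = \left(\frac{nk_z}{\omega R}\right)^2\left(\frac{1}{k_t^2} + \frac{1}{\kappa_t^2}\right)^2, \tag{7.161}$$
where the prime signs denote (as a rare exception in this series) the derivatives of each function over its
full argument: k_tρ for J_n, and κ_tρ for K_n.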


For any given frequency ω, the system of equations (158) and (161) determines the values of k_t
and κ_t, and hence k_z. Actually, for any n > 0, this system provides two different solutions: one
corresponding to the so-called HE wave, with a larger ratio E_z/H_z, and the EH wave, with a smaller
value of that ratio. For angular-symmetric modes with n = 0 (for which we might naively expect the
lowest cutoff frequency), the equations may be satisfied by the fields having just one non-zero
longitudinal component (either E_z or H_z), so that the HE waves are the usual E-modes, while the EH
modes are the H-waves. For the H-modes, the characteristic equation is reduced to the requirement that
the expression in the second parentheses on the left-hand side of Eq. (161) is equal to zero. Using the
Bessel function identities J_0' = −J_1 and K_0' = −K_1, this equation may be rewritten in a simpler form:


$$\frac{1}{k_t}\frac{J_1(k_tR)}{J_0(k_tR)} = -\frac{1}{\kappa_t}\frac{K_1(\kappa_tR)}{K_0(\kappa_tR)}. \tag{7.162}$$


Using the universal relation between k_t and κ_t given by Eq. (158), we may plot both sides of Eq.
(162) as functions of the same argument, say, ξ ≡ k_tR (see Fig. 27).


Fig. 7.27. Two sides of the characteristic
equation (162), plotted as functions of k_tR,
for two values of its dimensionless
parameter: V = 8 (blue line) and V = 3 (red
line). Note that according to Eq. (158), the
argument of the functions K_0 and K_1 is
κ_tR = (V² − k_t²R²)^{1/2} ≡ (V² − ξ²)^{1/2}.


The right-hand side of Eq. (162) depends not only on ξ, but also on the dimensionless parameter
V defined as the normalized right-hand side of Eq. (158):
$$V^2 \equiv \omega^2(\varepsilon_- - \varepsilon_+)\mu_0R^2 \approx 2\Delta k^2R^2. \tag{7.163}$$


(According to Eq. (154), if V >> 1, V² is twice the number N of the fiber modes, a conclusion
confirmed by Fig. 27, taking into account that it describes only the H-modes.) Since the ratio K_1/K_0 is
positive for all values of the functions’ argument (see, e.g., the right panel of Fig. 2.22), the right-hand
side of Eq. (162) is always negative, so that the equation may have solutions only in the intervals where
the ratio J_1/J_0 is negative, i.e. at
$$\xi_{01} < k_tR < \xi_{11}, \qquad \xi_{02} < k_tR < \xi_{12}, \qquad \text{etc.}, \tag{7.164}$$
where ξ_nm is the m-th zero of the function J_n(ξ); see Table 2.1. The right-hand side of the characteristic
equation (162) diverges at κ_tR → 0, i.e. at k_tR → V, so that no solutions are possible if V is below the
critical value V_c ≡ ξ_01 ≈ 2.405. At this cutoff point, Eq. (163) yields k_c ≈ ξ_01/[R(2Δ)^{1/2}]. Hence, the cutoff
frequency of the lowest H mode corresponds to the TEM wavelength
$$\lambda_{\max} = \frac{2\pi}{k_c} \approx \frac{2\pi R}{\xi_{01}}(2\Delta)^{1/2} \approx 3.7R\Delta^{1/2}. \tag{7.165}$$
For typical parameters Δ = 0.005 and R = 2.5 μm, this result yields λ_max ≈ 0.65 μm, corresponding to the
free-space wavelength λ_0 ≈ 1 μm. A similar analysis of the first parentheses on the left-hand side of Eq.
(161) shows that at Δ → 0, the cutoff frequency for the E modes is similar.
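Fig. 27 can be reproduced numerically. The sketch below is a minimal stand-alone solver under stated assumptions: μ_+ = μ_- = μ_0 (so that Eq. (162) applies); J_n evaluated from Bessel's integral and K_n from K_n(x) = ∫_0^∞ exp(−x cosh t) cosh(nt)dt, both by the trapezoid rule; and the zeros of J_0 and J_1 entered by hand. In each interval (7.164) lying below V, it bisects on the sign of (1/ξ)J_1(ξ)/J_0(ξ) + (1/w)K_1(w)/K_0(w), where ξ = k_tR and w = κ_tR = (V² − ξ²)^{1/2}. It should find two H_0m modes for V = 8, one for V = 3, and none below V_c ≈ 2.405. It also evaluates λ_max of Eq. (165). The function names are my own.

```python
import math

def bessel_j(n, x, steps=200):
    """J_n(x) for integer n from Bessel's integral (1/pi) int_0^pi cos(n t - x sin t) dt."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(n * math.pi))       # trapezoid end points t = 0, pi
    for i in range(1, steps):
        t = i * h
        total += math.cos(n * t - x * math.sin(t))
    return total * h / math.pi

def bessel_k(n, x, h=0.02):
    """K_n(x), x > 0, from K_n(x) = int_0^inf exp(-x cosh t) cosh(n t) dt."""
    t_max = math.acosh(1.0 + 60.0 / x)                 # integrand < e^-60 beyond this
    total = 0.5 * math.exp(-x)
    for i in range(1, int(t_max / h) + 1):
        t = i * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(n * t)
    return total * h

J0_ZEROS = (2.404825557695773, 5.520078110286311, 8.653727912911013)
J1_ZEROS = (3.831705970207512, 7.015586669815619, 10.173468135062722)

def g(xi, V):
    """Eq. (162) multiplied by R and moved to one side, with w = kappa_t R from Eq. (158)."""
    w = math.sqrt(V * V - xi * xi)
    return bessel_j(1, xi) / (xi * bessel_j(0, xi)) + bessel_k(1, w) / (w * bessel_k(0, w))

def h_mode_roots(V, eps=1e-6):
    """Values of k_t R solving Eq. (162): at most one per interval (7.164) lying below V."""
    roots = []
    for z0, z1 in zip(J0_ZEROS, J1_ZEROS):
        lo, hi = z0 + eps, min(z1, V) - eps
        if lo >= hi:
            continue
        g_lo = g(lo, V)                                # large and negative just past z0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            g_mid = g(mid, V)
            if g_lo * g_mid <= 0:
                hi = mid
            else:
                lo, g_lo = mid, g_mid
        roots.append(0.5 * (lo + hi))
    return roots

def lambda_max(R, delta):
    """Eq. (165): TEM wavelength at the cutoff of the lowest H mode."""
    return 2 * math.pi * R * math.sqrt(2 * delta) / J0_ZEROS[0]

for V in (8.0, 3.0, 2.0):
    print(f"V = {V}: k_t R = {[round(r, 4) for r in h_mode_roots(V)]}")
print(f"lambda_max = {lambda_max(2.5e-6, 0.005) * 1e6:.2f} um")
```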


This situation may look exactly like that in metallic-wall waveguides, with no waves possible at
frequencies below ω_c, but this is not so. The basic reason for the difference is that in the metallic
waveguides, the approach to ω_c results in the divergence of the longitudinal wavelength λ_z ≡ 2π/k_z. On
the other hand, in dielectric waveguides, the approach leaves λ_z finite (k_z → k_+). Due to this difference, a
certain linear superposition of HE and EH waves with n = 1 can propagate at frequencies well below the
cutoff frequency for n = 0, which we have just calculated.65 This mode, in the limit ε_+ ≈ ε_- (i.e. Δ << 1),
allows a very interesting and simple description using the Cartesian (rather than polar) components of
the fields, but still expressed as functions of the polar coordinates ρ and φ. The reason is that this mode
is very close to a linearly polarized TEM wave. (For this reason, this mode is referred to as LP01.)


Let us select the x-axis parallel to the transverse component of the magnetic field vector at ρ = 0,
so that E_x|_{ρ=0} = 0, but E_y|_{ρ=0} ≠ 0, and H_x|_{ρ=0} ≠ 0, but H_y|_{ρ=0} = 0. The only suitable solutions of the 2D
Helmholtz equation (that should be obeyed not only by the z-components of the fields but also by their x-
and y-components) are proportional to J_0(k_tρ), with zero coefficients for E_x and H_y:
$$E_x = 0, \qquad E_y = E_0J_0(k_t\rho), \qquad H_x = H_0J_0(k_t\rho), \qquad H_y = 0, \qquad \text{for } \rho \le R. \tag{7.166}$$


Now we can use the last two equations of Eqs. (100) to calculate the longitudinal components of the
fields:
$$E_z = \frac{i}{k_z}\frac{\partial E_y}{\partial y} = -i\frac{k_t}{k_z}E_0J_1(k_t\rho)\sin\varphi, \qquad H_z = \frac{i}{k_z}\frac{\partial H_x}{\partial x} = -i\frac{k_t}{k_z}H_0J_1(k_t\rho)\cos\varphi, \tag{7.167}$$


where I have used the following mathematical identities: J_0' = −J_1, ∂ρ/∂x = x/ρ = cos φ, and ∂ρ/∂y = y/ρ
= sin φ. As a sanity check, we see that the longitudinal component of each field is a (legitimate!)
eigenfunction of the type (141), with n = 1. Note also that if k_t << k_z (this relation is always true if Δ <<
1; see either Eq. (158) or Fig. 26), the longitudinal components of the fields are much smaller than their
transverse counterparts, so that the wave is indeed very close to the TEM one. Because of that, the ratio
of the electric and magnetic field amplitudes is also close to that in the TEM wave: |E_0/H_0| ≈ Z_- ≈ Z_+.


Now to satisfy the boundary conditions at the core-to-cladding interface (ρ = R), we need to have
a similar angular dependence of these components at ρ ≥ R. The longitudinal components of the fields


65 This fact becomes less surprising if we recall that in the circular metallic waveguide, discussed in Sec. 6, the
fundamental mode (H11, see Fig. 23) also corresponded to n = 1 rather than n = 0.
are tangential to the interface and thus should be continuous. Using the solutions similar to Eq. (160)
with n = 1, we get
$$E_z = -i\frac{k_t}{k_z}\frac{J_1(k_tR)}{K_1(\kappa_tR)}E_0K_1(\kappa_t\rho)\sin\varphi, \qquad H_z = -i\frac{k_t}{k_z}\frac{J_1(k_tR)}{K_1(\kappa_tR)}H_0K_1(\kappa_t\rho)\cos\varphi, \qquad \text{for } \rho \ge R. \tag{7.168}$$


For the transverse components, we should require the continuity of the normal magnetic field μH_ρ, for
our simple field structure equal to just μH_x cos φ, of the tangential electric field E_φ = E_y cos φ, and of the
normal component of D, D_ρ = εE_ρ = εE_y sin φ. Assuming that μ_+ = μ_- = μ_0, and ε_+ ≈ ε_-, we can satisfy these
conditions with the following solutions:
$$E_x = 0, \qquad E_y = E_0\frac{J_0(k_tR)}{K_0(\kappa_tR)}K_0(\kappa_t\rho), \qquad H_x = H_0\frac{J_0(k_tR)}{K_0(\kappa_tR)}K_0(\kappa_t\rho), \qquad H_y = 0, \qquad \text{for } \rho \ge R. \tag{7.169}$$


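From here, we can calculate the components E_z and H_z, using the same approach as for ρ ≤ R:
$$E_z = \frac{i}{k_z}\frac{\partial E_y}{\partial y} = -i\frac{\kappa_t}{k_z}\frac{J_0(k_tR)}{K_0(\kappa_tR)}E_0K_1(\kappa_t\rho)\sin\varphi, \qquad H_z = \frac{i}{k_z}\frac{\partial H_x}{\partial x} = -i\frac{\kappa_t}{k_z}\frac{J_0(k_tR)}{K_0(\kappa_tR)}H_0K_1(\kappa_t\rho)\cos\varphi, \qquad \text{for } \rho \ge R. \tag{7.170}$$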


These relations provide the same functional dependence of the fields as Eqs. (167), i.e. the internal and 
external fields are compatible, but their amplitudes at the interface coincide only if 


$$k_t\frac{J_1(k_tR)}{J_0(k_tR)} = \kappa_t\frac{K_1(\kappa_tR)}{K_0(\kappa_tR)}. \tag{7.171}$$


This characteristic equation (which may also be derived from Eq. (161) with n = 1 in the limit Δ
→ 0) looks close to Eq. (162), but functionally is much different from it (see Fig. 28). Indeed, its right-
hand side is always positive, and the left-hand side tends to zero at k_tR → 0. As a result, Eq. (171) may
have a solution for arbitrarily small values of the parameter V defined by Eq. (163), i.e. for arbitrarily low
frequencies (long wavelengths). This is why this mode is used in practical single-mode fibers: there are
no other modes with wavelengths larger than the λ_max given by Eq. (165), so that they cannot be
unintentionally excited on small inhomogeneities of the fiber.
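The LP01 characteristic equation (171) can also be solved numerically; this sketch is an illustration under the same assumptions as the text (Δ → 0, μ_+ = μ_- = μ_0). J_n is evaluated from Bessel's integral and K_n from K_n(x) = ∫_0^∞ exp(−x cosh t) cosh(nt)dt, both by the trapezoid rule. The unknown is y = κ_tR, with ξ = k_tR = (V² − y²)^{1/2} from Eq. (158), and the bisection runs on ln y because y becomes extremely small at small V. The function names are my own. It checks that a root with k_tR below the first zero of J_0 exists for every V tried, from V = 8 down to V = 0.5.

```python
import math

def bessel_j(n, x, steps=200):
    """J_n(x) for integer n from Bessel's integral (trapezoid rule)."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(n * math.pi))
    for i in range(1, steps):
        t = i * h
        total += math.cos(n * t - x * math.sin(t))
    return total * h / math.pi

def bessel_k(n, x, h=0.02):
    """K_n(x), x > 0, from K_n(x) = int_0^inf exp(-x cosh t) cosh(n t) dt."""
    t_max = math.acosh(1.0 + 60.0 / x)
    total = 0.5 * math.exp(-x)
    for i in range(1, int(t_max / h) + 1):
        t = i * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(n * t)
    return total * h

XI_01 = 2.404825557695773   # first zero of J_0

def lp01_root(V, iterations=70):
    """Solve xi J_1(xi)/J_0(xi) = y K_1(y)/K_0(y) with xi^2 + y^2 = V^2 (Eq. 171 times R).
    Returns (xi, y) = (k_t R, kappa_t R), bisecting in log(y)."""
    def F(log_y):
        y = math.exp(log_y)
        xi = math.sqrt(V * V - y * y)
        return xi * bessel_j(1, xi) / bessel_j(0, xi) - y * bessel_k(1, y) / bessel_k(0, y)
    # keep xi below the first zero of J_0, where the left-hand side is positive
    y_lo = math.sqrt(V * V - XI_01 ** 2) * (1 + 1e-9) if V > XI_01 else 1e-30
    lo, hi = math.log(y_lo), math.log(V * (1 - 1e-9))
    f_lo = F(lo)                                   # positive at this end
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = F(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    y = math.exp(0.5 * (lo + hi))
    return math.sqrt(V * V - y * y), y

for V in (8.0, 2.0, 1.0, 0.5):
    xi, y = lp01_root(V)
    print(f"V = {V}: k_t R = {xi:.4f}, kappa_t R = {y:.3e}")
```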


It is easy to use the Bessel function approximations by the first terms of the Taylor expansions
(2.132) and (2.157) to show that in the limit V → 0, κ_tR tends to zero much faster than k_tR ≈ V: κ_tR ~
2exp{−2/V²} << V. This means that the scale ρ_c ≡ 1/κ_t of the radial distribution of the LP01 wave’s fields
in the cladding becomes very large. In this limit, this mode may be interpreted as a virtually TEM wave
propagating in the cladding, just slightly deformed (and guided) by the fiber’s core. The drawback of
this feature is that it requires very thick cladding, to avoid energy losses in its outer (“buffer” and
“jacket”) layers that protect the silica layers from the elements and mechanical damage, but lack their

66 This is the core assumption of this approximate theory, which accounts only for the most important effect of the
small difference of the dielectric constants ε_+ and ε_-: the difference between (k_-² − k_z²) ≡ k_t² > 0 and (k_+² − k_z²) ≡
−κ_t² < 0. For more discussion of the accuracy of this approximation and some exact results, the interested reader
may be referred either to the monograph by A. Snyder and D. Love, Optical Waveguide Theory, Chapman and
Hall, 1983, or to Chapter 3 and Appendix B in the monograph by Yariv and Yeh, that was cited above.
low optical absorption. For this reason, the core radius is usually selected so that the parameter V is
just slightly less than the critical value V_c = ξ_01 ≈ 2.4 for higher modes, thus ensuring single-mode
operation.
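A numerical check of the small-V behavior, written as a stand-alone sketch that solves Eq. (171) for y = κ_tR by bisection in ln y. J_n is evaluated from Bessel's integral and K_n from its cosh-integral representation. Expanding J_0 and J_1 to one order beyond the leading terms, and using K_0(y) ≈ ln(2/y) − γ and K_1(y) ≈ 1/y, I get ln(κ_tR) ≈ −2/V² + ln 2 + 1/4 − γ, where γ ≈ 0.5772 is Euler's constant, i.e. κ_tR ≈ 1.44 exp{−2/V²}. This is my own extension of the leading-order estimate, stated here as an assumption for the code to test. The code checks that ln(κ_tR) + 2/V² approaches ln 2 + 1/4 − γ ≈ 0.366 and prints the resulting cladding scale ρ_c/R = 1/κ_tR.

```python
import math

def bessel_j(n, x, steps=200):
    """J_n(x) for integer n from Bessel's integral (trapezoid rule)."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(n * math.pi))
    for i in range(1, steps):
        t = i * h
        total += math.cos(n * t - x * math.sin(t))
    return total * h / math.pi

def bessel_k(n, x, h=0.02):
    """K_n(x), x > 0, from K_n(x) = int_0^inf exp(-x cosh t) cosh(n t) dt."""
    t_max = math.acosh(1.0 + 60.0 / x)
    total = 0.5 * math.exp(-x)
    for i in range(1, int(t_max / h) + 1):
        t = i * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(n * t)
    return total * h

def kappa_R(V, iterations=70):
    """kappa_t R of the LP01 mode from Eq. (171), for V below the first zero of J_0."""
    def F(log_y):
        y = math.exp(log_y)
        xi = math.sqrt(V * V - y * y)
        return xi * bessel_j(1, xi) / bessel_j(0, xi) - y * bessel_k(1, y) / bessel_k(0, y)
    lo, hi = math.log(1e-12), math.log(V * (1 - 1e-9))
    f_lo = F(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = F(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return math.exp(0.5 * (lo + hi))

EULER_GAMMA = 0.5772156649015329
predicted = math.log(2) + 0.25 - EULER_GAMMA      # small-V limit of ln(kappa_t R) + 2/V^2

for V in (0.8, 0.7, 0.6, 0.5):
    y = kappa_R(V)
    print(f"V = {V}: kappa_t R = {y:.3e}, ln(kappa_t R) + 2/V^2 = {math.log(y) + 2 / V**2:.4f}, "
          f"rho_c / R = {1 / y:.0f}")
print(f"predicted limit: {predicted:.4f}")
```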


Fig. 7.28. Two sides of the
characteristic equation (171) for the
LP01 mode, plotted as a function of
k_tR, for two values of the
dimensionless parameter: V = 8
(blue line) and V = 1 (red line).


In order to reduce the field spread into the cladding, the step-index fibers discussed above may
be replaced with graded-index fibers whose dielectric constant ε is gradually and slowly decreased from
the center to the periphery.67 Keeping only the main two terms in the Taylor expansion of the function
ε(ρ) at ρ = 0, we may approximate such reduction as68


$$\varepsilon(\rho) \approx \varepsilon(0)\left(1 - \zeta\rho^2\right), \tag{7.172}$$


where ζ ≡ −[(d²ε/dρ²)/2ε]_{ρ=0} is a positive constant characterizing the fiber composition gradient.
Moreover, if this constant is sufficiently small (ζ << k²), the field distribution across the fiber’s cross-
section may be described by the same 2D Helmholtz equation (101), but with a space-dependent
transverse wave vector:69
$$\left[\nabla_t^2 + k_t^2(\rho)\right]f = 0, \qquad \text{where } k_t^2(\rho) \equiv k^2(\rho) - k_z^2 = k_t^2(0) - k^2(0)\zeta\rho^2, \quad \text{and} \quad k^2(0) \equiv \omega^2\varepsilon(0)\mu. \tag{7.173}$$

Surprisingly for such an axially-symmetric problem, because of its special dependence on the radius,
this equation may be most readily solved in the Cartesian coordinates. Indeed, rewriting it as
$$\frac{\partial^2f}{\partial x^2} + \frac{\partial^2f}{\partial y^2} + \left[k_t^2(0) - k^2(0)\zeta\left(x^2 + y^2\right)\right]f = 0, \tag{7.174}$$
and separating the variables as f = X(x)Y(y), we get


67 Due to the difficulty of fabrication of graded-index fibers with wave attenuation below a few dB/km, they are
not used as broadly as the step-index ones.

68 For an axially-symmetric smooth function ε(ρ), the first derivative dε/dρ always vanishes at ρ = 0, so that Eq.
(172) does not have a term linear in ρ.

69 This approach is invalid at arbitrary (large) ζ because in the macroscopic Maxwell equations, ε(r) is under the
differentiation sign, and the exact Helmholtz-type equations for fields have additional terms containing ∇ε.
$$\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + k_t^2(0) - k^2(0)\zeta\left(x^2 + y^2\right) = 0, \tag{7.175}$$
so that the functions X and Y obey similar differential equations, for example
$$\frac{d^2X}{dx^2} + \left[k_x^2 - k^2(0)\zeta x^2\right]X = 0, \tag{7.176}$$
with the separation constants satisfying the following condition:
$$k_x^2 + k_y^2 = k_t^2(0) \equiv k^2(0) - k_z^2. \tag{7.177}$$


The ordinary differential equation (176) is well known from quantum mechanics, because the
stationary Schrödinger equation for one of the most important basic quantum systems, a 1D harmonic
oscillator, may be rewritten in this form. Its eigenvalues are very simple:
$$(k_x^2)_n = k(0)\zeta^{1/2}(2n+1), \qquad (k_y^2)_m = k(0)\zeta^{1/2}(2m+1), \qquad \text{with } n, m = 0, 1, 2, \ldots, \tag{7.178}$$


but the corresponding eigenfunctions X_n(x) and Y_m(y) are expressed via not quite elementary functions:
the Hermite polynomials.70 For most practical purposes, however, the lowest eigenfunctions X_0(x) and
Y_0(y) are sufficient, because they correspond to the lowest k_x and k_y, and hence the lowest
$$(k_t^2)_{\min} = (k_x^2)_0 + (k_y^2)_0 = 2k(0)\zeta^{1/2}, \tag{7.179}$$


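and the lowest cutoff frequency. As may be readily verified by the substitution into Eq. (176), the
eigenfunctions corresponding to this fundamental mode are also simple:
$$X_0(x) = \text{const}\times\exp\left\{-\frac{k(0)\zeta^{1/2}}{2}x^2\right\}, \tag{7.180}$$
and similarly for Y_0(y), so that the field distribution follows the Gaussian function
$$f_z(\rho) = f_z(0)\exp\left\{-\frac{k(0)\zeta^{1/2}}{2}\left(x^2 + y^2\right)\right\} \equiv f_z(0)\exp\left\{-\frac{\rho^2}{2a^2}\right\}, \qquad \text{with } a \equiv \frac{1}{k^{1/2}(0)\zeta^{1/4}}, \tag{7.181}$$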


where a >> 1/k(0) has the sense of the effective width of the field’s extension in the radial direction, 
normal to the wave’s propagation axis z. This is the so-called Gaussian beam, very convenient for some 
applications. 
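A sketch that checks Eq. (178) numerically instead of quoting the oscillator result. In the dimensionless variable s = x[k(0)ζ^{1/2}]^{1/2} = x/a (my choice of scaling), Eq. (176) becomes d²X/ds² + (E − s²)X = 0 with E = k_x²/k(0)ζ^{1/2}, so Eq. (178) predicts E = 1, 3, 5, .... The code finds the two lowest E by integrating from s = 0 with a fourth-order Runge-Kutta step and bisecting on the sign of X at s = 6. The fiber parameters k(0) and ζ at the end are hypothetical, used only to evaluate (k_x²)_0 and the beam width a of Eq. (181).

```python
import math

def shoot(E, odd, L=6.0, h=0.005):
    """Integrate X'' = (s^2 - E) X from s = 0 to s = L with RK4 and return X(L).
    Even states start with X = 1, X' = 0; odd states with X = 0, X' = 1."""
    X, P = (0.0, 1.0) if odd else (1.0, 0.0)

    def rhs(s, X, P):
        return P, (s * s - E) * X

    s = 0.0
    for _ in range(int(round(L / h))):
        k1 = rhs(s, X, P)
        k2 = rhs(s + h / 2, X + h / 2 * k1[0], P + h / 2 * k1[1])
        k3 = rhs(s + h / 2, X + h / 2 * k2[0], P + h / 2 * k2[1])
        k4 = rhs(s + h, X + h * k3[0], P + h * k3[1])
        X += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        P += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        s += h
    return X

def eigenvalue(lo, hi, odd, iterations=50):
    """Bisection on E: just below an eigenvalue the solution blows up with X(L) > 0,
    just above it X(L) < 0 (one more node)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if shoot(mid, odd) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

E0 = eigenvalue(0.5, 1.5, odd=False)    # Eq. (178) predicts 1
E1 = eigenvalue(2.5, 3.5, odd=True)     # Eq. (178) predicts 3
print(f"dimensionless eigenvalues: {E0:.8f}, {E1:.8f}")

# Hypothetical graded-index parameters, only to illustrate Eqs. (178) and (181)
k0, zeta = 1.0e7, 3.0e7                 # k(0) in 1/m, zeta in 1/m^2
kx2_0 = E0 * k0 * math.sqrt(zeta)       # (k_x^2)_0 in 1/m^2
a = 1 / (math.sqrt(k0) * zeta ** 0.25)  # beam width of Eq. (181), m
print(f"(k_x^2)_0 = {kx2_0:.3e} 1/m^2, a = {a * 1e6:.2f} um")
```

Shooting from s = 0 with a fixed parity halves the integration range; the sign of X at the far end flips each time E crosses an eigenvalue of that parity, which is what makes plain bisection sufficient here.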


The Gaussian beam (181) is just one example of the so-called paraxial beams, which may be
represented as a result of modulation of a plane wave with a wave number k_z by an axially-symmetric
envelope function f(ρ), where ρ ≡ {x, y}, with a relatively large effective radius a >> 1/k.71 Such beams
give me a convenient opportunity to deliver on the promise made in Sec. 1: calculate the angular
momentum L of a circularly polarized wave propagating in free space, and prove its fundamental
relation to the wave’s energy U. Let us start from the calculation of U for a paraxial beam (with an


70 See, e.g., QM Sec. 2.9. 
71 Note that when propagating in a uniform medium, i.e. outside of graded-index fibers or other focusing systems, such
beams gradually increase their width a due to diffraction, the effect to be analyzed in the next chapter.
arbitrary, but spatially-localized envelope $f(\rho)$) of a circularly polarized wave, with the transverse electric field components given by Eq. (19):


$$E_x = E_0 f(\rho)\cos\psi, \qquad E_y = \mp E_0 f(\rho)\sin\psi, \qquad (7.182a)$$


where $E_0$ is the real amplitude of the wave's electric field at the propagation axis, $\psi \equiv kz - \omega t + \varphi$ is its total phase, and the two signs correspond to two possible directions of the circular polarization.72

According to Eq. (6), the corresponding transverse components of the magnetic field are

$$H_x = \pm\frac{E_0}{Z_0}f(\rho)\sin\psi, \qquad H_y = \frac{E_0}{Z_0}f(\rho)\cos\psi. \qquad (7.182b)$$


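These expressions are sufficient to calculate the energy density (6.113) of the wave,73

$$u = \frac{\varepsilon_0}{2}\left(E_x^2 + E_y^2\right) + \frac{\mu_0}{2}\left(H_x^2 + H_y^2\right) = \frac{\varepsilon_0}{2}E_0^2f^2 + \frac{\mu_0}{2Z_0^2}E_0^2f^2 = \varepsilon_0E_0^2f^2, \qquad (7.183)$$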


and hence the full energy (per unit length in the direction $z$ of the wave's propagation) of the beam:

$$U = \int u\,d^2r = 2\pi\int_0^\infty u\,\rho\,d\rho = 2\pi\varepsilon_0E_0^2\int_0^\infty f^2\rho\,d\rho. \qquad (7.184)$$


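However, the transverse fields (182) are insufficient to calculate a non-zero average of $\mathbf{L}$. Indeed, following the angular momentum's definition in mechanics,74 $\mathbf{L} = \mathbf{r}\times\mathbf{p}$, where $\mathbf{p}$ is the particle's (linear) momentum, we may use Eq. (6.115) for the electromagnetic field momentum's density $\mathbf{g}$ in free space, to define the field's angular momentum's density as

$$\mathbf{l} \equiv \mathbf{r}\times\mathbf{g} = \frac{1}{c^2}\,\mathbf{r}\times(\mathbf{E}\times\mathbf{H}), \qquad (7.185)$$

and then, expanding the double vector product,75

$$\mathbf{l} = \frac{1}{c^2}\Big\{\mathbf{n}_z\left[E_z(\mathbf{r}\cdot\mathbf{H}) - H_z(\mathbf{r}\cdot\mathbf{E})\right] + \left[\mathbf{E}_t(\mathbf{r}\cdot\mathbf{H}) - \mathbf{H}_t(\mathbf{r}\cdot\mathbf{E})\right]\Big\}. \qquad (7.186)$$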

If the field is purely transverse ($E_z = H_z = 0$), as it is in a strictly plane wave, the first square brackets in the last expression vanish, while the second bracket gives an azimuthal component of $\mathbf{l}$, which oscillates in time and vanishes at its time averaging. (This is exactly the reason why I have not tried to calculate $\mathbf{L}$ during our first discussion of the circularly polarized waves in Sec. 1.)


72 For our task of calculating two quadratic forms of the fields ($\mathbf{L}$ and $U$), their real representation (182) is more convenient than the complex-exponent one. However, for linear manipulations, the latter representation of the circularly-polarized waves, $\mathbf{E}_t = E_0f(\rho)\,\mathrm{Re}[(\mathbf{n}_x \pm i\mathbf{n}_y)\exp\{i\psi\}]$, $\mathbf{H}_t = (E_0/Z_0)f(\rho)\,\mathrm{Re}[(\mp i\mathbf{n}_x + \mathbf{n}_y)\exp\{i\psi\}]$, is usually more convenient, and is broadly used.

73 Note that, in contrast to a linearly-polarized wave (16), the energy density of a circularly-polarized wave does not depend on the full phase $\psi$, in particular on $t$ at fixed $z$, or vice versa. This is natural because its field vectors rotate (keeping their magnitude) rather than oscillate; see Fig. 3b.

74 See, e.g., CM Eq. (1.31). 

75 See, e.g., MA Eq. (7.5). 
Fortunately, our discussion of optical fibers, in particular the derivation of Eqs. (167), (168), and (170), gives us a clear clue on how to resolve this paradox. If the envelope function $f(\rho)$ differs from a constant, the transverse wave components (182) alone do not satisfy the Maxwell equations (2b), which necessitates longitudinal components $E_z$ and $H_z$ of the fields, with76
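$$\frac{\partial E_z}{\partial z} = -\left(\frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y}\right), \qquad \frac{\partial H_z}{\partial z} = -\left(\frac{\partial H_x}{\partial x} + \frac{\partial H_y}{\partial y}\right). \qquad (7.187)$$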


However, as these expressions show, if the envelope function $f$ changes very slowly, in the sense $\partial f/\partial\rho \sim f/a \ll kf$, the longitudinal components are very small and do not have a back effect on the transverse components. Hence, the above calculation of $U$ is still valid (asymptotically, at $ka \to \infty$), and we may still use Eqs. (182) on the right-hand side of Eqs. (187):


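$$\frac{\partial E_z}{\partial z} = E_0\left[-\frac{\partial f}{\partial x}\cos\psi \pm \frac{\partial f}{\partial y}\sin\psi\right], \qquad \frac{\partial H_z}{\partial z} = \frac{E_0}{Z_0}\left[\mp\frac{\partial f}{\partial x}\sin\psi - \frac{\partial f}{\partial y}\cos\psi\right], \qquad (7.188)$$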


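and integrate them over $z$ as

$$E_z = E_0\int\left[-\frac{\partial f}{\partial x}\cos\psi \pm \frac{\partial f}{\partial y}\sin\psi\right]dz = E_0\left(-\frac{\partial f}{\partial x}\int\cos\psi\,dz \pm \frac{\partial f}{\partial y}\int\sin\psi\,dz\right) = \frac{E_0}{k}\left(-\frac{\partial f}{\partial x}\sin\psi \mp \frac{\partial f}{\partial y}\cos\psi\right). \qquad (7.189a)$$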


Here the integration constant is taken to be zero because no wave field component may have a time-independent part. Integrating, absolutely similarly, the second of Eqs. (188), we get


$$H_z = \frac{E_0}{Z_0k}\left(\pm\frac{\partial f}{\partial x}\cos\psi - \frac{\partial f}{\partial y}\sin\psi\right). \qquad (7.189b)$$


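With the same approximation, we may calculate the longitudinal ($z$-) component of $\mathbf{l}$, given by the first term of Eq. (186), keeping only the dominating, transverse fields (182) in the scalar products:

$$l_z = \frac{1}{c^2}\left[E_z(\mathbf{r}\cdot\mathbf{H}_t) - H_z(\mathbf{r}\cdot\mathbf{E}_t)\right] = \frac{1}{c^2}\left[E_z(xH_x + yH_y) - H_z(xE_x + yE_y)\right]. \qquad (7.190)$$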


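Plugging in Eqs. (182) and (189), and taking into account that in free space $k = \omega/c$, and hence $1/Z_0c^2k = \varepsilon_0/\omega$, we get:

$$l_z = \mp\frac{E_0^2}{Z_0c^2k}\,f\left(x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y}\right) = \mp\frac{\varepsilon_0E_0^2}{2\omega}\,\boldsymbol{\rho}\cdot\nabla(f^2) = \mp\frac{\varepsilon_0E_0^2}{2\omega}\,\rho\frac{d(f^2)}{d\rho}. \qquad (7.191)$$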


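Hence the total angular momentum of the beam (per unit length) is

$$L_z = \int l_z\,d^2r = 2\pi\int_0^\infty l_z\,\rho\,d\rho = \mp\frac{\pi\varepsilon_0E_0^2}{\omega}\int_0^\infty\rho^2\frac{d(f^2)}{d\rho}\,d\rho = \mp\frac{\pi\varepsilon_0E_0^2}{\omega}\int_{\rho=0}^{\rho=\infty}\rho^2\,d(f^2). \qquad (7.192)$$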


Taking this integral by parts, with the assumption that $\rho f \to 0$ at $\rho \to 0$ and $\rho \to \infty$ (as is true for the Gaussian beam (181) and all realistic paraxial beams), we finally get


76 The complex-exponential versions of these equalities are given by the bottom line of Eq. (100). 
$$L_z = \pm\frac{\pi\varepsilon_0E_0^2}{\omega}\int_{\rho=0}^{\rho=\infty}f^2\,d(\rho^2) = \pm\frac{2\pi\varepsilon_0E_0^2}{\omega}\int_0^\infty f^2\rho\,d\rho. \qquad (7.193)$$


Now comparing this expression with Eq. (184), we see that, remarkably, the ratio $L_z/U$ does not depend on the shape and the width of the beam (and of course on the wave's amplitude $E_0$), so these parameters are very simply and universally related:

$$L_z = \pm\frac{U}{\omega}. \qquad (7.194)$$


Since this relation is valid in the plane-wave limit $a \to \infty$, it may be attributed to plane waves as well, with the understanding that in real life they always have some width ("aperture") restriction.
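The shape-independence of the ratio (194) can be illustrated numerically. The sketch takes the upper sign in Eqs. (182). It evaluates $L_z$ from Eq. (192) *before* the integration by parts, and $U$ from Eq. (184); the common factor $\pi\varepsilon_0E_0^2$ cancels in the ratio. Besides the Gaussian (181), it uses two hypothetical envelopes, a super-Gaussian and a ring-shaped profile. Both are chosen only because they satisfy the assumption $\rho f \to 0$ at both ends.

```python
import math

def trapezoid(ys, d):
    """Composite trapezoidal rule on a uniform grid with step d."""
    return d * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def omega_L_over_U(f, rho_max, n=20000, h=1e-6):
    """omega * L_z / U (upper sign) from Eq. (192), before integrating
    by parts, and Eq. (184); pi * eps0 * E0**2 cancels in the ratio."""
    d = rho_max / n
    rhos = [i * d for i in range(n + 1)]
    f2 = lambda r: f(r) ** 2
    df2 = [(f2(r + h) - f2(r - h)) / (2 * h) for r in rhos]   # d(f^2)/d(rho)
    L = -trapezoid([r * r * g for r, g in zip(rhos, df2)], d)  # omega L_z / (pi eps0 E0^2)
    U = 2.0 * trapezoid([f2(r) * r for r in rhos], d)          # U / (pi eps0 E0^2)
    return L / U

envelopes = {
    "Gaussian, Eq. (181), a = 1": lambda r: math.exp(-r * r / 2),
    "super-Gaussian (hypothetical)": lambda r: math.exp(-r ** 4),
    "ring-shaped (hypothetical)": lambda r: r * math.exp(-r * r / 2),
}

for name, f in envelopes.items():
    print(f"{name:32s} omega*L_z/U = {omega_L_over_U(f, 10.0):.6f}")
```

All three ratios come out equal to 1 within the discretization error, as Eq. (194) predicts.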


As the reader certainly knows, in quantum mechanics the energy excitations of any harmonic oscillator of frequency $\omega$ are quantized in units of $\hbar\omega$, while the internal angular momentum of a particle is quantized in units of $s\hbar$, where $s$ is its spin. In this context, the classical relation (194) is used in quantum electrodynamics as the basis for treating the electromagnetic field excitation quanta (photons) as some sort of quantum particles with spin $s = 1$. (Such integer spin also fits the Bose-Einstein statistics of the electromagnetic radiation.)


Unfortunately, I do not have time/space for a further discussion of the (very interesting) physics of paraxial beams, but cannot help noticing, at least in passing, the very curious effect of helical waves: the beams carrying not only the "spin" momentum (194), but also an additional "orbital" angular momentum. The distribution of their energy in space is not a monotonic function of $\rho$, as it is in the Gaussian beam (181), but resembles several threads twisted around the propagation axis; hence the term "helical".77 Mathematically, their field structure is described by the associated Laguerre polynomials, the same special functions that are used for the quantum-mechanical description of hydrogen-like atoms.78 Presently, there are efforts to use such beams for the so-called orbital angular momentum (OAM) multiplexing for high-rate information transmission.79


7.8. Resonant cavities 


Generally, resonators are structures that may sustain oscillations (in electrodynamics, of the electromagnetic field) even without an external source, though the oscillation amplitude slowly decreases in time due to unavoidable energy losses. If the resonator quality (described by the so-called $Q$-factor, which will be defined and discussed in the next section) is high, $Q \gg 1$, this decay takes many oscillation periods. Alternatively, high-$Q$ resonators may sustain high oscillating fields permanently, if driven by relatively weak incident waves. In contrast to lumped-element resonators, say, the well-known $LC$ tank circuit, the subject of this section is resonant cavities (or "distributed resonators") limited by either conducting or dielectric walls that contain distributed standing waves inside them.


77 Noticing such solutions of the Maxwell equations may be traced back to at least a 1943 theoretical work by J. 
Humblet; however, this issue had not been discussed in literature too much until experiments carried out in 1992 — 
see, e.g. L. Allen et al., Optical Angular Momentum, IOP, 2003. 

78 See, e.g., QM Sec. 3.7. 

79 See, e.g., J. Wang et al., Nature Photonics 6, 488 (2012). 
Conceptually the simplest resonant cavity is the Fabry-Pérot interferometer80 that may be obtained by placing two well-conducting planes parallel to each other.81 Indeed, in Sec. 3 we have seen that if a plane wave is normally incident on such a "perfect mirror", located at $z = 0$, its reflection, at negligible skin depth, results in a standing wave described by Eq. (61b):


$$\mathbf{E}(z,t) = \mathrm{Re}\left(2i\mathbf{E}_\omega e^{-i\omega t}\right)\sin kz. \qquad (7.195)$$


This wave would not change if we place the second mirror (isolating the segment of length $l$ from the external wave source) at any position $z = l$ with $\sin kl = 0$, i.e. with


$$kl = p\pi, \qquad \text{where } p = 1, 2, \ldots \qquad (7.196)$$


This condition, which determines the spectrum of own (or resonance, or eigen-) frequencies of the resonator of fixed length $l$,

$$\omega_p = vk_p = \frac{\pi v}{l}\,p, \qquad (7.197)$$

has a simple physical sense: the resonator's length $l$ equals exactly $p$ half-waves of the frequency $\omega_p$. Though this is all very simple, please note a considerable change of philosophy from what we have been doing in the previous sections: the main task of the resonator's analysis is finding its own frequencies $\omega_p$, which are now determined by the system's geometry rather than by an external wave source.


Before we move to cavities of more complex shapes, let us use Eq. (62) to represent the 
magnetic field in the Fabry-Pérot interferometer: 


$$\mathbf{H}(z,t) = \mathrm{Re}\left(\frac{2\,\mathbf{n}_z\times\mathbf{E}_\omega}{Z}\,e^{-i\omega t}\right)\cos kz. \qquad (7.198)$$


Expressions (195) and (198) show that, in contrast to traveling waves, each field of the standing wave changes simultaneously (proportionately) at all points of the Fabry-Pérot resonator, turning to zero everywhere twice a period. At these instants, the energy of the corresponding field vanishes, but the total energy of the two fields stays constant because the counterpart field oscillates with the phase shift $\pi/2$. Such behavior is typical for all electromagnetic resonators.
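The constancy of the total energy can be checked numerically. For a real amplitude $E_\omega$, Eqs. (195) and (198) give $E = 2E_\omega\sin\omega t\,\sin kz$ and $H = (2E_\omega/Z)\cos\omega t\,\cos kz$. The sketch below integrates $\varepsilon E^2/2$ and $\mu H^2/2$ over the resonator length at several instants, assuming free space between the mirrors. The mode number $p = 2$, the length $l = 1$ m, and the unit amplitude are arbitrary illustrative choices.

```python
import math

EPS0 = 8.8541878128e-12        # F/m
MU0 = 4e-7 * math.pi           # H/m (classical value, adequate here)
V = 1 / math.sqrt(EPS0 * MU0)  # wave speed in free space
Z = math.sqrt(MU0 / EPS0)      # wave impedance of free space

def omega_p(p, l, v=V):
    """Eigenfrequencies of the Fabry-Perot resonator, Eq. (197)."""
    return math.pi * v * p / l

def field_energies(t, p=2, l=1.0, E_amp=1.0, n=4000):
    """Electric and magnetic energies per unit mirror area of the standing
    wave (195), (198) with a real amplitude E_amp (trapezoidal rule)."""
    k = p * math.pi / l          # Eq. (196)
    w = V * k
    d = l / n
    zs = [i * d for i in range(n + 1)]
    e = [EPS0 / 2 * (2 * E_amp * math.sin(w * t) * math.sin(k * z)) ** 2 for z in zs]
    h = [MU0 / 2 * (2 * E_amp / Z * math.cos(w * t) * math.cos(k * z)) ** 2 for z in zs]
    trap = lambda ys: d * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return trap(e), trap(h)

T = 2 * math.pi / omega_p(2, 1.0)   # oscillation period of the p = 2 mode
for j in range(5):
    ue, uh = field_energies(j * T / 8)
    print(f"t = {j}T/8: U_E = {ue:.4e}, U_H = {uh:.4e}, sum = {ue + uh:.4e} J/m^2")
```

The two energies trade places every quarter period. Their sum stays at $\varepsilon_0E_\omega^2 l$, because $\mu_0/Z^2 = \varepsilon_0$.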


A more technical remark is that we can readily get the same results (195)-(198) by solving the Maxwell equations from scratch. For example, we already know that in the absence of dispersion, losses, and sources, they are reduced to wave equations (3) for any field components. For the Fabry-Pérot resonator's analysis, we can use the 1D form of these equations, say, for the transverse component of the electric field:

$$\left(\frac{\partial^2}{\partial z^2} - \frac{1}{v^2}\frac{\partial^2}{\partial t^2}\right)E = 0, \qquad (7.199)$$


and solve it as a part of an eigenvalue problem with the corresponding boundary conditions. Indeed, by separating time and space variables as $E(z, t) = Z(z)T(t)$, we obtain


80 This device, named after its inventors, Charles Fabry and Alfred Pérot, is also called the Fabry-Pérot etalon (meaning "gauge"), because of its initial usage for light wavelength measurements.
81 The resonators formed by well-conducting (usually, metallic) walls are frequently called resonant cavities.
$$\frac{1}{Z}\frac{d^2Z}{dz^2} - \frac{1}{v^2T}\frac{d^2T}{dt^2} = 0. \qquad (7.200)$$

Calling the separation constant $k^2$, we get two similar ordinary differential equations,

$$\frac{d^2Z}{dz^2} + k^2Z = 0, \qquad \frac{d^2T}{dt^2} + k^2v^2T = 0, \qquad (7.201)$$


both with sinusoidal solutions, so that the product $Z(z)T(t)$ is a standing wave with the wave vector $k$ and frequency $\omega = kv$. (In this form, the equations are valid even in the presence of dispersion, but with a frequency-dependent wave speed: $v = 1/[\varepsilon(\omega)\mu(\omega)]^{1/2}$.) Now using the boundary conditions $E(0, t) = E(l, t) = 0$,82 we get the eigenvalue spectrum for $k$, and hence for $\omega_p = vk_p$, given by Eqs. (196) and (197).
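Treating Eq. (201) as an eigenvalue problem can be mimicked numerically by the shooting method, a standard technique that is not specific to this text. We integrate $d^2Z/dz^2 = -k^2Z$ from $Z(0) = 0$, $Z'(0) = 1$ with a Runge-Kutta scheme and look for the values of $k$ at which $Z(l) = 0$. The scan step `dk` and the range `k_max` are arbitrary choices, and the function names are hypothetical. The roots should reproduce Eq. (196), $k_p = p\pi/l$.

```python
import math

def shoot(k, l=1.0, steps=1000):
    """RK4 integration of Z'' = -k**2 Z on [0, l] with Z(0) = 0, Z'(0) = 1;
    returns Z(l). Eigenvalues k_p are the roots of Z(l; k) = 0."""
    h = l / steps
    y, dy = 0.0, 1.0
    rhs = lambda y_, dy_: (dy_, -k * k * y_)
    for _ in range(steps):
        a1 = rhs(y, dy)
        a2 = rhs(y + h / 2 * a1[0], dy + h / 2 * a1[1])
        a3 = rhs(y + h / 2 * a2[0], dy + h / 2 * a2[1])
        a4 = rhs(y + h * a3[0], dy + h * a3[1])
        y += h / 6 * (a1[0] + 2 * a2[0] + 2 * a3[0] + a4[0])
        dy += h / 6 * (a1[1] + 2 * a2[1] + 2 * a3[1] + a4[1])
    return y

def eigen_k(l=1.0, k_max=10.0, dk=0.1, tol=1e-10):
    """Bracket sign changes of Z(l; k) on a coarse grid, then bisect each."""
    n = int(round(k_max / dk))
    ks = [i * dk for i in range(1, n + 1)]
    fs = [shoot(k, l) for k in ks]
    roots = []
    for (ka, fa), (kb, fb) in zip(zip(ks, fs), zip(ks[1:], fs[1:])):
        if fa * fb < 0:
            lo, hi, f_lo = ka, kb, fa
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                f_mid = shoot(mid, l)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    return roots

for p, k in enumerate(eigen_k(), start=1):
    print(f"p = {p}: k = {k:.8f}, p*pi/l = {p * math.pi:.8f}")
```

This route needs no knowledge of the analytic solution. That is what makes it useful for the more complex geometries discussed below.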


Lessons from this simple case study may be readily generalized to any cavity formed as a 
transmission line’s section:83 there are two approaches to finding the resonant frequency spectrum: 


(i) We may look at a traveling wave solution and find where reflecting mirrors may be inserted 
without affecting the wave’s structure. 


(ii) We may solve the general 3D wave equations,

$$\left(\nabla^2 - \frac{1}{v^2}\frac{\partial^2}{\partial t^2}\right)f(\mathbf{r},t) = 0, \qquad (7.202)$$


for field components, as an eigenvalue problem with appropriate boundary conditions. If the system’s 
parameters (and hence the coefficient v) do not change in time, the spatial and temporal variables of Eq. 
(202) may be always separated by taking 


$$f(\mathbf{r},t) = \sum_j \mathcal{E}_j(\mathbf{r})\,T_j(t), \qquad (7.203)$$


where each function $T_j(t)$ always obeys the same equation as in Eq. (201), having the sinusoidal solution of frequency $\omega = vk$. Plugging this solution back into Eqs. (202), for the spatial distribution of the field, we get the 3D Helmholtz equation,

$$\left(\nabla^2 + k^2\right)\mathcal{E}(\mathbf{r}) = 0, \qquad (7.204)$$

whose eigenfunctions $\mathcal{E}(\mathbf{r})$ may be much more involved, especially for non-symmetric geometries.


Let us use these approaches to find the resonant frequency spectrum of a few simple, but 
practically important cavities. First of all, the first method is completely sufficient for the analysis of any 
resonator formed as a fragment of a uniform TEM transmission line (e.g., a coaxial cable), confined 
with two conducting lids normal to the line's direction. Indeed, since in such lines $k_z = k = \omega/v$, and the electric field is perpendicular to the propagation axis, i.e. parallel to the lid surface, the boundary
conditions are exactly the same as in the Fabry-Pérot resonator, and we again arrive at the 
eigenfrequency spectrum (197). 


82 This is of course the expression of the first of the general boundary conditions (104). The second of these 
conditions (for the magnetic field) is satisfied automatically for the transverse waves we are considering. 

83 The resonators may have different geometries as well, and in many cases, only the second approach may be 
used. 


Chapter 7 Page 56 of 68 




Essential Graduate Physics EM: Classical Electrodynamics 


Now let us analyze a slightly more complex system: a rectangular metallic-wall cavity of volume $a\times b\times l$ (see Fig. 29). To use the first approach outlined above, let us consider the resonator as a finite-length ($\Delta z = l$) section of the rectangular waveguide extended along the $z$-axis, which was analyzed in detail in Sec. 6. As a reminder, at $a > b$, in the fundamental $H_{10}$ traveling wave mode, both vectors $\mathbf{E}$ and $\mathbf{H}$ do not depend on $y$, with $\mathbf{E}$ having only a $y$-component. In contrast, $\mathbf{H}$ has two components, $H_x$ and $H_z$, with the phase shift $\pi/2$ between them, and with $H_x$ having the same phase as $E_y$; see Eqs. (131), (137), and (138). Hence, in the standing wave formed by two such waves propagating in opposite directions, on any plane perpendicular to the $z$-axis where the electric field vanishes, the normal component $H_z \propto \partial E_y/\partial x$ vanishes as well, so that both boundary conditions (104), pertinent to a perfect metallic wall, are fulfilled simultaneously.


Fig. 7.29. Rectangular metallic-wall resonator 
as a finite section of a waveguide with the 
cross-section shown in Fig. 22. 


As a result, the $H_{10}$ wave would not be perturbed by two metallic walls separated by an integer number of half-wavelengths $\lambda_z/2$ corresponding to the wave number given by the combination of Eqs. (102) and (133):

$$k_z = \left(k^2 - k_t^2\right)^{1/2} = \left[\left(\frac{\omega}{v}\right)^2 - \left(\frac{\pi}{a}\right)^2\right]^{1/2}. \qquad (7.205)$$


Using this expression, we see that the smallest of these distances, $l = \lambda_z/2 = \pi/k_z$, gives the resonance frequency84

$$\omega_{101} = v\left[\left(\frac{\pi}{a}\right)^2 + \left(\frac{\pi}{l}\right)^2\right]^{1/2}, \qquad (7.206)$$

where the indices of $\omega$ show the numbers of half-waves along each dimension of the system, in the order $[a, b, l]$. This is the lowest ("fundamental") frequency of the resonator (if $b < a, l$).


The field distribution in this mode is close to that in the corresponding waveguide mode $H_{10}$ (Fig. 22), with the important difference that the magnetic and electric fields are now shifted by phase $\pi/2$ both in space and time, just as in the Fabry-Pérot resonator; see Eqs. (195) and (198). Such a time shift allows for a very simple interpretation of the $H_{101}$ mode, which is especially adequate for very flat resonators, with $b \ll a, l$. At the instant when the electric field reaches its maximum (Fig. 30a), i.e. when the magnetic field vanishes in the whole volume, the surface electric charge of the broadest (in Fig. 30, horizontal) walls of the resonator is largest, being localized mostly near the centers of the walls. Immediately after that, the walls start to recharge via surface currents, whose density $\mathbf{J}$ is largest in the side walls, and reaches its maximal value in a quarter of the oscillation period $T = 2\pi/\omega_{101}$; see Fig. 30b. The currents generate the vortex magnetic field, with looped field lines in the plane of the


84 In most electrical engineering handbooks, the index corresponding to the shortest side of the resonator is listed last, so that the fundamental mode is nominated as $H_{110}$ and its eigenfrequency as $\omega_{110}$.




broadest face of the resonator. The surface currents continue to flow in this direction until (in one more 
quarter period) the broader walls of the resonator are fully recharged in the polarity opposite to that 
shown in Fig. 30a. After that, the surface currents start to flow in the direction opposite to that shown in 
Fig. 30b. This process, which repeats again and again, is conceptually similar to the well-known 
oscillations in a lumped LC circuit, with the role of (now, distributed) capacitance played mostly by the 
broadest walls of the resonator, and that of (now, distributed) inductance, by its narrower walls. 


Fig. 7.30. Fields, charges, and currents in the fundamental ($H_{101}$) mode of a rectangular metallic resonator, at two instants (panels a and b) separated by $\Delta t = \pi/2\omega_{101}$ (schematically).


In order to generalize Eq. (206) to higher oscillation modes, the second of the approaches discussed above is more prudent. Separating the variables in the Helmholtz equation (204) as $\mathcal{E}(\mathbf{r}) = X(x)Y(y)Z(z)$, we see that $X$, $Y$, and $Z$ have to be either sinusoidal or cosinusoidal functions of their arguments, with the wave vector components satisfying the characteristic equation

$$k_x^2 + k_y^2 + k_z^2 = \frac{\omega^2}{v^2}. \qquad (7.207)$$
In contrast to the wave propagation problem, now we are dealing with standing waves along all three dimensions, and have to satisfy the macroscopic boundary conditions (104) on all sets of parallel walls. It is straightforward to check that these conditions ($E_\tau = 0$, $H_n = 0$) are fulfilled at the following field component distribution:


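$$\begin{aligned}
E_x &= E_1\cos k_xx\,\sin k_yy\,\sin k_zz, & H_x &= H_1\sin k_xx\,\cos k_yy\,\cos k_zz,\\
E_y &= E_2\sin k_xx\,\cos k_yy\,\sin k_zz, & H_y &= H_2\cos k_xx\,\sin k_yy\,\cos k_zz,\\
E_z &= E_3\sin k_xx\,\sin k_yy\,\cos k_zz, & H_z &= H_3\cos k_xx\,\cos k_yy\,\sin k_zz,
\end{aligned} \qquad (7.208)$$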


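with each of the wave vector components having an equidistant spectrum similar to Eq. (196):

$$k_x = \frac{\pi m}{a}, \qquad k_y = \frac{\pi n}{b}, \qquad k_z = \frac{\pi p}{l}, \qquad (7.209)$$

so that the full spectrum of resonance frequencies is given by the following formula:

$$\omega_{mnp} = \pi v\left[\left(\frac{m}{a}\right)^2 + \left(\frac{n}{b}\right)^2 + \left(\frac{p}{l}\right)^2\right]^{1/2}, \qquad (7.210)$$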


which is a natural generalization of Eq. (206). Note, however, that of the three integers m, n, and p, at 
least two have to be different from zero to keep the fields (208) from vanishing at all points. 
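Eq. (210) together with the "at least two nonzero indices" rule is easy to tabulate. The sketch below lists the lowest modes of a hypothetical vacuum-filled box with $b < a, l$. It confirms that the $(1,0,1)$ mode of Eq. (206) comes first. The dimensions are illustrative only, and `cavity_modes` is a hypothetical helper name.

```python
import itertools
import math

C = 299_792_458.0   # speed of light, m/s (vacuum filling assumed)

def cavity_modes(a, b, l, v=C, max_index=3):
    """Eigenfrequencies omega_mnp of an a x b x l metallic box, Eq. (210),
    keeping only index triplets with at least two nonzero entries."""
    modes = []
    for m, n, p in itertools.product(range(max_index + 1), repeat=3):
        if (m > 0) + (n > 0) + (p > 0) < 2:
            continue                       # fields (208) vanish identically
        w = math.pi * v * math.sqrt((m / a) ** 2 + (n / b) ** 2 + (p / l) ** 2)
        modes.append((w, (m, n, p)))
    return sorted(modes)

a, b, l = 0.02, 0.01, 0.03                 # hypothetical box, meters (b < a, l)
for w, idx in cavity_modes(a, b, l)[:4]:
    print(idx, f"{w / (2 * math.pi) / 1e9:.2f} GHz")
```

The index order in the printout follows the text's convention $[a, b, l]$, not the engineering convention of footnote 84.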


We may use Eq. (210), in particular, to evaluate the number of different modes in a relatively small range $d^3k \ll k^3$ of the wave vector space volume that is, on the other hand, much larger than the reciprocal volume, $1/V = 1/abl$, of the cavity. Taking into account that each eigenfrequency (210), with $mnp \neq 0$, corresponds to two field modes with different polarizations,85 the argumentation absolutely similar to the one used for the 2D case at the end of Sec. 7 yields

$$dN = \frac{2V}{(2\pi)^3}\,d^3k. \qquad (7.211)$$


This property, valid for resonators of arbitrary shape, is broadly used in classical and quantum statistical 
physics,86 in the following form. If some electromagnetic mode functional $f(\mathbf{k})$ is a smooth function of
the wave vector k, and the volume V is large enough, then Eq. (211) may be used to approximate the 
sum of the functional’s values over the modes by an integral: 


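$$\sum_{\mathbf{k}} f(\mathbf{k}) \approx \int f(\mathbf{k})\,dN = \int f(\mathbf{k})\frac{dN}{d^3k}\,d^3k = \frac{2V}{(2\pi)^3}\int f(\mathbf{k})\,d^3k. \qquad (7.212)$$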
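The quality of the continuum approximation (211)-(212) can be tested by brute force. The sketch counts the modes (209) with $|\mathbf{k}| < k_{\max}$ and compares the result with the integral of Eq. (211) over a sphere, $Vk_{\max}^3/3\pi^2$. It gives two polarizations to every triplet with $mnp \neq 0$ (footnote 85). It gives one to triplets with exactly one zero index: by our reading of Eqs. (208), only one set of field amplitudes then survives. This counting rule is our assumption, not a statement of the text. The box dimensions and cutoff are hypothetical.

```python
import math

def count_modes(a, b, l, k_max):
    """Brute-force number of modes (209) of an a x b x l box with k < k_max:
    2 if m, n, p are all nonzero, 1 if exactly one index is zero (assumed)."""
    m_max, n_max, p_max = (int(k_max * s / math.pi) + 1 for s in (a, b, l))
    total = 0
    for m in range(m_max + 1):
        for n in range(n_max + 1):
            for p in range(p_max + 1):
                nonzero = (m > 0) + (n > 0) + (p > 0)
                if nonzero < 2:
                    continue
                k2 = (math.pi * m / a) ** 2 + (math.pi * n / b) ** 2 + (math.pi * p / l) ** 2
                if k2 < k_max ** 2:
                    total += 2 if nonzero == 3 else 1
    return total

def smooth_estimate(V, k_max):
    """Eq. (211) integrated over |k| < k_max: (2V / (2 pi)^3) * (4 pi / 3) * k_max^3."""
    return 2 * V / (2 * math.pi) ** 3 * (4 * math.pi / 3) * k_max ** 3

a, b, l, k_max = 1.0, 1.3, 0.7, 60.0       # hypothetical box (m) and cutoff (1/m)
exact = count_modes(a, b, l, k_max)
approx = smooth_estimate(a * b * l, k_max)
print(exact, round(approx), exact / approx)
```

The ratio approaches unity as $k_{\max}$ grows. This is the regime in which Eq. (212) replaces the mode sum by an integral.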


Leaving similar analyses of resonant cavities of some other simple shapes for the reader’s 
exercises, let me finish this section by noting that low-loss resonators may be also formed by finite- 
length sections of not only metallic-wall waveguides of various cross-sections but also of dielectric 
waveguides. Moreover, even a simple slab of a dielectric material with an $\varepsilon/\mu$ ratio substantially different
from that of its environment (say, of the free space) may be used as a high-Q Fabry-Pérot interferometer 
(Fig. 31), due to an effective wave reflection from its surfaces at the normal and especially an inclined 
incidence — see, respectively, Eqs. (68), and Eqs. (91) and (95). 


Fig. 7.31. A dielectric Fabry-Pérot interferometer (figure labels: $\varepsilon > \varepsilon_0$; slab thickness $d$).


Actually, such dielectric Fabry-Pérot interferometers are frequently more convenient for 
practical purposes than metallic-wall resonators, not only due to possibly lower losses (especially in the 
optical range), but also due to a natural coupling to the environment, which offers a ready way of wave 
insertion and extraction — see Fig. 31 again. The backside of the same medal is that this coupling to the 
environment provides an additional mechanism of power losses, limiting the resonance’s quality factor — 
see the next section. 


7.9. Energy loss effects 


The inevitable energy losses (“dissipation”) in passive media lead, in two basic situations, to two 
different effects. In a long transmission line fed by a constant wave source, the losses lead to a gradual 


85 This fact becomes evident from plugging Eqs. (208) into the Maxwell equation $\nabla\cdot\mathbf{E} = 0$. The resulting equation, $k_xE_1 + k_yE_2 + k_zE_3 = 0$, with the discrete, equidistant spectrum (209) for each wave vector component, may always be satisfied by two linearly-independent sets of the constants $E_{1,2,3}$.

86 See, e.g., QM Sec. 1.1 and SM Sec. 2.6. 
attenuation of the wave, i.e. to a decrease of its amplitude, and hence its power $P$, with the growing distance $z$ from the source. In linear materials, the power losses are proportional to the power carried by the wave, so that the energy balance on a small segment $dz$ takes the form


$$dP_{\rm loss} = -dP = \frac{dP_{\rm loss}}{dz}\,dz \equiv \alpha P\,dz. \qquad (7.213)$$


The coefficient $\alpha$ participating in the last form of Eq. (213), and hence defined as

$$\alpha \equiv \frac{dP_{\rm loss}/dz}{P}, \qquad (7.214)$$

is called the attenuation constant.87 Comparing the solution of Eq. (213),

$$P(z) = P(0)\,e^{-\alpha z}, \qquad (7.215)$$

with Eq. (29), where $k$ is replaced with $k_z$, we see that $\alpha$ may be expressed as

$$\alpha = 2\,\mathrm{Im}\,k_z, \qquad (7.216)$$


where $k_z$ is the component of the wave vector along the transmission line. In the most important limit when the losses are low in the sense $\alpha \ll |k_z| \approx \mathrm{Re}\,k_z$, their effects on the field distribution along the line's cross-section are negligible, making the calculation of $\alpha$ rather straightforward. In particular, in this limit, the contributions to attenuation from two major sources, the energy losses in the filling dielectric and the skin-effect losses in conducting walls, are independent and additive.


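The dielectric losses are especially simple to describe. Indeed, a review of our calculations in Secs. 5-7 shows that all of them remain valid if either $\varepsilon(\omega)$, or $\mu(\omega)$, or both, and hence $k(\omega)$, have small imaginary parts:

$$k'' \equiv \mathrm{Im}\,k = \omega\,\mathrm{Im}\left[\varepsilon^{1/2}(\omega)\mu^{1/2}(\omega)\right] \ll k'. \qquad (7.217)$$

In TEM transmission lines, $k_z = k$, and hence Eq. (216) yields

$$\alpha_{\rm filling} = 2k'' = 2\omega\,\mathrm{Im}\left[\varepsilon^{1/2}(\omega)\mu^{1/2}(\omega)\right]. \qquad (7.218)$$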
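For a weakly lossy filling, Eq. (218) reduces to a handy estimate. Assume $\varepsilon = \varepsilon'(1 + i\tan\delta_\varepsilon)$ with a small loss tangent $\tan\delta_\varepsilon$ and a real $\mu$. Then $\alpha_{\rm filling} \approx k'\tan\delta_\varepsilon$. The sketch checks this with `cmath`. The loss tangent value $10^{-4}$ is hypothetical, and $\varepsilon' = 2.2\varepsilon_0$ is borrowed from the cable example discussed below. With the $e^{-i\omega t}$ convention used in this text, losses correspond to $\mathrm{Im}\,\varepsilon > 0$.

```python
import cmath
import math

EPS0 = 8.8541878128e-12
MU0 = 4e-7 * math.pi

def alpha_filling(omega, eps, mu=MU0):
    """Eq. (218): alpha = 2 k'' = 2 omega Im[eps**(1/2) mu**(1/2)] for a TEM line."""
    return 2 * omega * (cmath.sqrt(eps) * cmath.sqrt(mu)).imag

omega = 2 * math.pi * 100e6            # 100 MHz
eps_real = 2.2 * EPS0                  # real part of the filling permittivity
loss_tangent = 1e-4                    # hypothetical value, for illustration only
eps = eps_real * (1 + 1j * loss_tangent)   # Im(eps) > 0 means loss (exp(-i omega t))
alpha = alpha_filling(omega, eps)
k_real = omega * math.sqrt(eps_real * MU0)
print(f"alpha = {alpha:.4e} 1/m, k' * tan(delta) = {k_real * loss_tangent:.4e} 1/m")
```

The agreement holds to second order in the loss tangent, which is consistent with the low-loss condition $k'' \ll k'$ in Eq. (217).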


For dielectric waveguides, in particular optical fibers, these losses are the main attenuation mechanism. 
As was discussed in Sec. 7, in practical optical fibers «mR >> 1, i.e. most of the field propagates (as an 
evanescent wave) in the cladding, with a field distribution very close to the TEM wave. This is why Eq. 
(218) is approximately valid if it is applied to the cladding material alone. In waveguides with non-TEM waves, we can use the relations between $k_z$ and $k_t$ derived in the previous sections to re-calculate $k''$ into $\mathrm{Im}\,k_z$. (Note that at this recalculation, the values of $k_t$ have to be kept real, because they are just the eigenvalues of the Helmholtz equation (101), which does not include the filling media parameters.)


87 In engineering, attenuation is frequently measured in decibels per meter, abbreviated as dB/m (the term not to be confused with dBm, standing for decibel-milliwatt):

$$\alpha|_{\rm dB/m} \equiv 10\log_{10}\frac{P(z)}{P(z + 1\,{\rm m})} = \frac{10}{\ln 10}\,\alpha[{\rm m}^{-1}] \approx 4.34\,\alpha[{\rm m}^{-1}].$$
In transmission lines and waveguides with metallic walls, higher energy losses may come from the skin effect. If the wavelength $\lambda$ is much larger than $\delta$, as it usually is,88 the losses may be readily evaluated using Eq. (6.36):

$$\frac{dP_{\rm loss}}{dA} = \frac{\mu\delta\omega}{4}\,H_{\rm wall}^2, \qquad (7.219)$$

where $H_{\rm wall}$ is the real amplitude of the tangential component of the magnetic field at the wall's surface. The total power loss $dP_{\rm loss}/dz$ per unit length of a waveguide, i.e. the right-hand side of Eq. (213), now may be calculated by the integration of this expression along the contour(s) limiting the cross-section of all conducting walls. Since our calculation is only valid for low losses, we may ignore their effect on the field distribution. The unperturbed distributions may then be used both in Eq. (219), i.e. in the numerator of Eq. (214), and for the calculation of the average propagating power, i.e. the denominator of Eq. (214), as the integral of the Poynting vector over the cross-section of the waveguide.


Let us see how this approach works for the TEM mode in one of the simplest transmission lines, 
the coaxial cable (Fig. 20). As we already know from Sec. 5, in the coarse-grain approximation, 
implying negligible power loss, the TEM mode field distributions between the two conductors are the 
same as in statics, namely: 


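$$H_\rho = 0, \qquad H_z = 0, \qquad H_\varphi(\rho) = H_0\frac{a}{\rho}, \qquad (7.220)$$

where $H_0$ is the field's amplitude on the surface of the inner conductor, and

$$E_\varphi = 0, \qquad E_\rho(\rho) = ZH_\varphi(\rho) = ZH_0\frac{a}{\rho}, \qquad E_z = 0, \qquad \text{where } Z = \left(\frac{\mu}{\varepsilon}\right)^{1/2}. \qquad (7.221)$$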


Neglecting the power losses for a minute, we may plug these expressions into Eq. (42) to calculate the 
time-averaged Poynting vector: 


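$$\overline{\mathbf{S}} = \mathbf{n}_z\frac{Z|H_\varphi(\rho)|^2}{2} = \mathbf{n}_z\frac{Z|H_0|^2}{2}\left(\frac{a}{\rho}\right)^2, \qquad (7.222)$$

and from it, the total wave power flow through the cross-section:

$$P = \int\overline{S}\,d^2r = \frac{Z|H_0|^2}{2}\,2\pi\int_a^b\left(\frac{a}{\rho}\right)^2\rho\,d\rho = \pi Z|H_0|^2a^2\ln\frac{b}{a}. \qquad (7.223)$$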


Next, for this particular system (Fig. 20), the contours limiting the wall cross-section are circles of radii $\rho = a$ (where the surface field amplitude $H_{\rm wall}$ equals, in our notation, $H_0$), and $\rho = b$ (where, according to Eq. (220), the field is a factor of $b/a$ lower). As a result, for the power loss per unit length, Eq. (219) yields

$$\frac{dP_{\rm loss}}{dz} = \frac{\mu\delta\omega}{4}\left[2\pi a|H_0|^2 + 2\pi b\left|H_0\frac{a}{b}\right|^2\right] = \frac{\pi}{2}\,\mu\delta\omega\,a^2\left(\frac{1}{a} + \frac{1}{b}\right)|H_0|^2. \qquad (7.224)$$

88 As follows from Eq. (78), which may be used for crude estimates even in cases of arbitrary wave incidence, this condition is necessary for low attenuation: $\alpha \ll k$ only if $\delta/\lambda \ll 1$.
Note that at $a \ll b$, the losses in the inner conductor dominate, despite its smaller surface, because of the higher surface field.


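Now we may plug Eqs. (223) and (224) into the definition (214) of $\alpha$, to calculate the skin-effect contribution to the attenuation constant:

$$\alpha = \frac{dP_{\rm loss}/dz}{P} = \frac{1}{2\ln(b/a)}\left(\frac{1}{a} + \frac{1}{b}\right)\frac{\mu\delta\omega}{Z} = \frac{k\delta}{2\ln(b/a)}\left(\frac{1}{a} + \frac{1}{b}\right). \qquad (7.225)$$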


This result shows that the relative (dimensionless) attenuation, $\alpha/k$, scales approximately as the ratio $\delta/\min[a, b]$, in a semi-quantitative agreement with the plane-wave result (78).


Let us use this result to evaluate $\alpha$ for the standard TV cable RG-6/U, with copper conductors of diameters $2a = 1$ mm, $2b = 4.7$ mm, and $\varepsilon \approx 2.2\varepsilon_0$, $\mu \approx \mu_0$. According to Eq. (6.33), for $f = 100$ MHz (i.e. $\omega \approx 6.3\times10^8$ s$^{-1}$) the skin depth of pure copper at room temperature (with $\sigma \approx 6.0\times10^7$ S/m) is close to $6.5\times10^{-6}$ m, while $k = \omega(\varepsilon\mu)^{1/2} = (\varepsilon/\varepsilon_0)^{1/2}(\omega/c) \approx 3.1$ m$^{-1}$. As a result, the attenuation is rather low: $\alpha \approx 0.016$ m$^{-1}$, so that the attenuation length scale $l_d = 1/\alpha$ is about 60 m. Hence the attenuation in a cable connecting a roof TV antenna or a cable distribution box to a TV set is not a big problem, though using a worse conductor, e.g., steel, would make the losses rather noticeable. (Hence the current worldwide shortage of copper.) However, the use of such cable in the X-band ($f \sim 10$ GHz) is more problematic. Indeed, though the skin depth $\delta \propto \omega^{-1/2}$ decreases with frequency, the wavelength drops, i.e. $k$ increases, even faster ($\lambda \propto \omega^{-1}$), so that the attenuation $\alpha \propto k\delta \propto \omega^{1/2}$ becomes close to 0.16 m$^{-1}$, i.e. $l_d$ drops to about 6 m. This is why at such frequencies, it may be necessary to use rectangular waveguides, with their larger internal dimensions $a, b \sim 1/k$, and hence lower attenuation. Let me leave the calculation of this attenuation, using Eq. (219) and the results derived in Sec. 7, for the reader's exercise.
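The RG-6/U estimates above can be reproduced in a few lines. The sketch takes Eq. (6.33) to be the standard normal-skin-depth formula $\delta = (2/\mu\sigma\omega)^{1/2}$. It assumes $\mu = \mu_0$ both in the copper and in the filling, which is the condition under which the last form of Eq. (225) follows from the previous one. As a consistency check, it also computes $\alpha$ as the ratio of Eq. (224) to Eq. (223). It converts to dB/m using footnote 87. The function names are hypothetical.

```python
import math

MU0 = 4e-7 * math.pi
EPS0 = 8.8541878128e-12
C = 1 / math.sqrt(EPS0 * MU0)

def skin_depth(sigma, omega, mu=MU0):
    """Normal skin depth, delta = (2 / mu sigma omega)**(1/2)."""
    return math.sqrt(2 / (mu * sigma * omega))

def coax_alpha(f, a, b, sigma, eps_r):
    """Skin-effect attenuation constant of a coaxial line, Eq. (225), mu = mu0."""
    omega = 2 * math.pi * f
    k = math.sqrt(eps_r) * omega / C
    return k * skin_depth(sigma, omega) * (1 / a + 1 / b) / (2 * math.log(b / a))

def coax_alpha_from_powers(f, a, b, sigma, eps_r, H0=1.0):
    """The same alpha as the ratio of Eq. (224) to Eq. (223)."""
    omega = 2 * math.pi * f
    delta = skin_depth(sigma, omega)
    Z = math.sqrt(MU0 / (eps_r * EPS0))
    p_loss = math.pi / 2 * MU0 * delta * omega * a ** 2 * (1 / a + 1 / b) * H0 ** 2
    power = math.pi * Z * H0 ** 2 * a ** 2 * math.log(b / a)
    return p_loss / power

a, b, sigma, eps_r = 0.5e-3, 2.35e-3, 6.0e7, 2.2    # RG-6/U values quoted in the text
for freq in (100e6, 10e9):
    alpha = coax_alpha(freq, a, b, sigma, eps_r)
    print(f"{freq / 1e6:6.0f} MHz: alpha = {alpha:.4f} 1/m, l_d = {1 / alpha:5.1f} m, "
          f"{10 / math.log(10) * alpha:.3f} dB/m")
```

The factor-of-ten growth of $\alpha$ between 100 MHz and 10 GHz is just the $\omega^{1/2}$ scaling noted above.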


The main effect of dissipation on free oscillations in resonators is different: here it leads to a gradual decay of the oscillating fields' energy $U$ in time. A useful dimensionless measure of this decay, called the $Q$ factor, is commonly defined by writing the following temporal analog of Eq. (213):89

$$-dU = P_{\rm loss}\,dt \equiv \frac{\omega}{Q}\,U\,dt, \qquad (7.226)$$

where $\omega$ is the resonance frequency in the loss-free limit, and

$$Q \equiv \frac{\omega U}{P_{\rm loss}}. \qquad (7.227)$$


The solution of Eq. (226),

$$U(t) = U(0)\,e^{-t/\tau}, \qquad \text{with } \tau \equiv \frac{Q}{\omega} = \frac{Q}{2\pi}\,\frac{2\pi}{\omega} = \frac{Q}{2\pi}\,T, \qquad (7.228)$$

which is the temporal analog of Eq. (215), shows the physical meaning of the $Q$-factor: the characteristic time $\tau$ of the oscillation energy's decay is $(Q/2\pi)$ times longer than the oscillation period $T = 2\pi/\omega$. (Another useful interpretation of $Q$ comes from the universal relation90


89 As losses grow, the oscillation waveform deviates from the sinusoidal one, and the very notion of “oscillation 
frequency” becomes vague. As a result, the parameter Q is well-defined only if it is much higher than 1. 
90 See, e.g., CM Sec. 5.1.
$$Q = \frac{\omega}{\Delta\omega}, \qquad (7.229)$$

where $\Delta\omega$ is the so-called FWHM91 bandwidth of the resonance, namely the difference between the two values of the external signal frequency, one above and one below $\omega$, at which the energy of the oscillations induced in the resonator by an input signal is half of its resonance value.)
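Relation (229) can be illustrated with the simplest driven system. The sketch below stands in a lumped damped oscillator $\ddot x + (\omega_0/Q)\dot x + \omega_0^2x = F\cos\omega t$ for the resonator mode; this substitution is our modeling assumption. The oscillator's free energy decays as in Eqs. (226)-(228). The sketch finds the full width of its steady-state energy response at half maximum on a fine frequency grid and checks that $Q\,\Delta\omega/\omega_0 \approx 1$. The grid window of $\pm10/Q$ is an arbitrary choice.

```python
def oscillation_energy(omega, omega0=1.0, Q=100.0, F=1.0):
    """Time-averaged steady-state energy (per unit mass) of the driven oscillator
    x'' + (omega0/Q) x' + omega0**2 x = F cos(omega t)."""
    amp2 = F ** 2 / ((omega0 ** 2 - omega ** 2) ** 2 + (omega0 * omega / Q) ** 2)
    return amp2 * (omega ** 2 + omega0 ** 2) / 4.0   # <x'^2 + omega0^2 x^2> / 2

def fwhm(omega0=1.0, Q=100.0, n=200001):
    """Full width at half maximum of oscillation_energy(omega) on a fine grid."""
    lo, hi = omega0 * (1 - 10 / Q), omega0 * (1 + 10 / Q)
    ws = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    es = [oscillation_energy(w, omega0, Q) for w in ws]
    half = max(es) / 2
    above = [w for w, e in zip(ws, es) if e >= half]
    return above[-1] - above[0]

for Q in (50.0, 100.0, 400.0):
    print(f"Q = {Q:5.0f}: Q * FWHM / omega0 = {Q * fwhm(1.0, Q):.4f}")
```

For $Q \gg 1$, the deviation from unity is of order $1/Q^2$, which is in line with footnote 89's remark that $Q$ is well-defined only in this limit.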


In the important particular case of a resonant cavity formed by the insertion of metallic walls into a TEM transmission line of a small cross-section (with the linear size scale $a$ much less than the wavelength $\lambda$), there is no need to calculate the $Q$-factor directly, provided that the line attenuation coefficient $\alpha$ is already known. In fact, as was discussed in Sec. 8 above, the standing waves in such a resonator, of the length given by Eq. (196), $l = p(\lambda/2)$ with $p = 1, 2, \ldots$, may be understood as an overlap of two TEM waves propagating in opposite directions. In other words, they may be viewed as a traveling wave plus its reflection from one of the ends, the whole roundtrip taking time $\Delta t = 2l/v = p\lambda/v = 2\pi p/\omega = pT$. According to Eq. (215), at this distance, the wave's power drops by a factor of $\exp\{-2\alpha l\} = \exp\{-p\alpha\lambda\}$. On the other hand, the same decay may be viewed as taking place in time, and according to Eq. (228), results in the drop by a factor of $\exp\{-\Delta t/\tau\} = \exp\{-pT/(Q/\omega)\} = \exp\{-2\pi p/Q\}$. Comparing these two exponents, we get

$$Q = \frac{2\pi}{\alpha\lambda} = \frac{k}{\alpha}. \qquad (7.230)$$
This simple relation neglects the losses at the wave reflection from the walls limiting the resonator length. This approximation is indeed legitimate at a << λ; if this relation is violated, or if we are dealing with more complex resonator modes (such as those based on the reflection of E or H waves), the Q-factor may be different from that given by Eq. (230), and needs to be calculated directly from Eq. (227). A substantial relief for such a direct calculation is that, just as in the calculation of small attenuation in waveguides, in the low-loss limit (Q >> 1), both the numerator and denominator of the right-hand side of that formula may be calculated neglecting the effects of the power loss on the field distribution in the resonator. I am leaving such a calculation, for the simplest (rectangular and circular) resonant cavities, for the reader's exercise.
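As a numerical illustration of Eq. (230), the sketch below (Python, standard library only) computes Q = 2π/(αλ) and checks that the spatial and temporal descriptions of the energy drop per round trip agree. It assumes, as in the derivation above, that α is the power attenuation coefficient, so that the traveling wave's power falls as exp{−αz}; the values of α and λ are hypothetical.

```python
import math

def q_from_attenuation(alpha, wavelength):
    """Q-factor of a TEM-line resonator, Eq. (7.230): Q = 2*pi/(alpha*lambda) = k/alpha.

    alpha: power attenuation coefficient of the line, 1/m (power ~ exp(-alpha*z))
    wavelength: wavelength in the line, m
    """
    return 2 * math.pi / (alpha * wavelength)

def roundtrip_decay(alpha, wavelength, p):
    """Energy drop factor per round trip of mode p, computed two ways."""
    length = p * wavelength / 2                # resonator length l = p*lambda/2
    in_space = math.exp(-alpha * 2 * length)   # power decay over the distance 2l
    q = q_from_attenuation(alpha, wavelength)
    period = 1.0                               # any T works: dt = p*T, tau = Q/omega
    tau = q * period / (2 * math.pi)
    in_time = math.exp(-p * period / tau)      # exp(-dt/tau)
    return in_space, in_time

# Hypothetical line: alpha = 0.01 1/m at lambda = 0.3 m
print(f"Q = {q_from_attenuation(0.01, 0.3):.1f}")
for p in (1, 2, 3):
    print(p, *roundtrip_decay(0.01, 0.3, p))
```

Since this check ignores the end-wall losses, it inherits the a << λ restriction discussed above.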


To conclude this chapter, the last remark: in some distributed resonators (including certain dielectric resonators and metallic cavities with holes in their walls), additional losses due to the wave radiation into the environment are also possible. In some simple cases (say, the Fabry-Pérot interferometer shown in Fig. 31) the calculation of these radiative losses is straightforward, but sometimes it requires more elaborate approaches that will be discussed in the next chapter.


7.10. Exercise problems 


7.1.* Find the temporal Green's function of a medium whose complex dielectric constant obeys the Lorentz oscillator model given by Eq. (32), using:

(i) the Fourier transform, and
(ii) the direct solution of Eq. (30).

Hint: For the Fourier-transform approach, you may like to use the Cauchy integral.⁹²


91 FWHM is the acronym for Full Width at Half-Maximum. 
7.2. The electric polarization of some material responds in the following way to an electric field step:⁹³

$$P(t) = \varepsilon_0 E_0 \times \begin{cases} 0, & \text{for } t < 0, \\ \kappa\left(1 - e^{-t/\tau}\right), & \text{for } 0 < t, \end{cases} \qquad \text{if} \quad E(t) = E_0 \times \begin{cases} 0, & \text{for } t < 0, \\ 1, & \text{for } 0 < t, \end{cases}$$

where τ > 0 and κ are some constants. Calculate the complex permittivity ε(ω) of this material, and discuss a possible simple physical model giving such dielectric response.


7.3. Calculate the complex permittivity ε(ω) of a material whose dielectric-response Green's function, defined by Eq. (23), is

$$G(\theta) = G_0\left(1 - e^{-\theta/\tau}\right),$$

with some positive constants G₀ and τ. What is the difference between this dielectric response and the apparently similar one considered in the previous problem?


7.4. Use the Lorentz oscillator model of an atom, given by Eq. (30), to calculate the average potential energy of the atom in a uniform, sinusoidal ac electric field, and use the result to calculate the potential profile created for the atom by a standing electromagnetic wave with the electric field amplitude E_ω(r).


7.5. The solution of the previous problem shows that a standing, plane electromagnetic wave exerts a time-averaged force on an otherwise free non-relativistic charged particle. Reveal the physics of this force by writing and solving the equations of motion of the particle in:

(i) a linearly-polarized, monochromatic, plane traveling wave, and
(ii) a similar but standing wave.


7.6. Use the first of Eqs. (54) to relate the integral

$$\int_0^\infty \varepsilon''(\omega)\,\omega\,d\omega$$

to the plasma frequency for the Lorentz oscillator model of a system of non-interacting atoms.


7.7. Calculate, sketch, and discuss the dispersion relation for electromagnetic waves propagating 
in a medium described by Eq. (32), for the case of negligible damping. 


7.8. As was briefly discussed in Sec. 2,⁹⁴ a wave pulse of a finite but relatively large spatial extension Δz >> λ = 2π/k may be formed as a wave packet — a sum of sinusoidal waves with wave vectors k within a relatively narrow interval. Consider an electromagnetic plane wave packet of this type, with the electric field distribution

$$\mathbf{E}(\mathbf{r}, t) = \operatorname{Re}\int_{-\infty}^{+\infty} \mathbf{E}_k\, e^{i(kz - \omega_k t)}\, dk, \qquad \text{with} \quad \omega_k\left[\varepsilon(\omega_k)\,\mu(\omega_k)\right]^{1/2} = |k|,$$


92 See, e.g., MA Eq. (15.2). 

93 This function E(t) is of course proportional to the well-known step function θ(t) — see, e.g., MA Eq. (14.3). I am not using this notation just to avoid possible confusion between two different uses of the Greek letter θ.

94 For even more detail, see CM Sec. 5.3 and especially QM Sec. 2.2.
propagating along the z-axis in an isotropic, linear, and loss-free (but not necessarily dispersion-free) medium. Express the full energy of the packet (per unit area of the wave's front) via the complex amplitudes E_k, and discuss its dependence on time.


7.9. Prove the Lorentz reciprocity relation expressed by Eq. (6.121), for a linear isotropic medium.


7.10. A plane wave of frequency ω is normally incident, from free space, on the surface of a collision-free plasma with the electron density growing linearly with the distance from the surface: n = γz for z ≥ 0, where γ > 0 is a constant. Calculate the functional form of the resulting standing wave's "tail" deep inside the plasma, i.e. at z → ∞.


7.11.* Analyze the effect of a time-independent uniform magnetic field B₀, parallel to the direction n of electromagnetic wave propagation, on the wave's dispersion in plasma, within the same simple model that was used in Sec. 2 for the derivation of Eq. (38). (Limit your analysis to relatively weak waves, whose magnetic field is negligible in comparison with B₀.)

Hint: You may like to represent the incident wave as a linear superposition of two circularly polarized waves, with opposite polarization directions.


7.12. A monochromatic plane electromagnetic wave is normally incident, from free space, on a uniform slab with electric permittivity ε and magnetic permeability μ, with the slab's thickness d comparable with the wavelength.

(i) Calculate the power transmission coefficient 𝒯, i.e. the fraction of the incident wave's power that is transmitted through the slab.
(ii) Assuming that ε and μ are frequency-independent and positive, analyze in detail the frequency dependence of 𝒯. In particular, how does the function 𝒯(ω) depend on the slab's thickness d and the wave impedance Z = (μ/ε)^{1/2} of its material?


7.13. A plane electromagnetic wave, with free-space wave number k₀, is normally incident on a plane, conducting film of thickness d ~ δ_s << 1/k₀. Calculate the power transmission coefficient of the system, i.e. the fraction of the incident wave's power propagating beyond the film. Analyze the result in the limits of small and large ratio d/δ_s.


7.14. A plane wave of frequency ω is normally incident, from free space, on a plane surface of a material with real electric permittivity ε′ and magnetic permeability μ′. To minimize the wave's reflection from the surface, it may be covered with a layer, of thickness d, of another transparent material — see the figure on the right. Calculate the optimal values of ε, μ, and d.


7.15. A monochromatic plane wave is incident from inside a medium with εμ > ε₀μ₀ onto its plane surface, at an angle of incidence θ larger than the critical angle θ_c = sin⁻¹[(ε₀μ₀/εμ)^{1/2}]. Calculate the depth δ of the evanescent wave penetration into the free space, and analyze its dependence on θ. Does the result depend on the wave's polarization?


7.16. Analyze the possibility of propagation of surface electromagnetic waves along a plane 
boundary between plasma and free space. In particular, calculate and analyze the dispersion relation of 
the waves. 


Hint: Assume that the magnetic field of the wave is parallel to the boundary and perpendicular to 
the wave’s propagation direction. (After solving the problem, justify this mode choice.) 


7.17. Light from a very distant source arrives at an observer through a plane layer of nonuniform medium with a certain 1D gradient of its refraction index n(z), at angle θ₀ — see the figure on the right. What is the genuine direction θ to the source, if n(z) → 1 at z → ∞? (This problem is evidently important for high-precision astronomical measurements from the Earth's surface.)


7.18. Calculate the impedance Z_W of the long, straight TEM transmission lines formed by metallic electrodes with the cross-sections shown in the figure below:

[Figure: cross-sections of the lines (i), (ii), and (iii), with the dimensions R, d, w, d₁, and d₂ defined below.]

(i) two round, parallel wires of radius R, separated by distance d >> R,
(ii) a microstrip line of width w >> d,
(iii) a stripline with w >> d₁ ~ d₂,

in all cases using the coarse-grain boundary conditions on the metallic surfaces. Assume that the conductors are embedded into a linear dielectric with constant ε and μ.


7.19. Modify the solution of Task (ii) of the previous problem for a superconductor microstrip line, taking into account the magnetic field's penetration into both the strip and the ground plane.


7.20. What lumped ac circuit would be equivalent to the TEM-line system shown in Fig. 19, with an incident wave's power 𝒫? Assume that the wave reflected from the lumped load circuit does not return to it.


7.21. Find the lumped ac circuit equivalent to a loss-free TEM transmission line of length l ~ λ, with a small cross-section area A << λ², as "seen" (measured) from one end, if the line's conductors are galvanically connected ("shorted") at the other end — see the figure on the right. Discuss the result's dependence on the signal frequency.
7.22. Represent the fundamental H₁₀ wave in a rectangular waveguide (Fig. 22) as a sum of two plane waves, and discuss the physics behind such a representation.


7.23. For the metallic coaxial cable (Fig. 20), find the lowest non-TEM mode and calculate its 
cutoff frequency. 


7.24. Two coaxial cable sections are connected coaxially — see the figure on the right, which shows the system's cut along its symmetry axis. Relations (118) and (120) seem to imply that if the ratios b/a of these sections are equal, their impedance matching is perfect, i.e. a TEM wave incident from one side on the connection would pass it without any reflection at all: R = 0. Is this statement correct?


7.25. Calculate the cutoff frequencies ω_c of the fundamental mode and the next lowest mode in the so-called ridge waveguide with the cross-section shown in the figure on the right, in the limit t << a, b, w. Briefly discuss possible advantages and drawbacks of such waveguides for signal transfer and physical experiment.

7.26. Prove that TEM-like waves may propagate, in the radial direction, in the free space between two coaxial, round, metallic cones — see the figure on the right. Can this system be characterized by a certain transmission line impedance Z_W, as defined by Eq. (115)?


7.27. Use the recipe outlined in Sec. 7 to prove the characteristic equation (161) for the HE and EH waves in a step-index optical fiber with a round cross-section.


7.28. Derive an approximate equation describing spatial variations of the complex amplitude of a general monochromatic paraxial beam propagating in a uniform medium, for the case when these variations are sufficiently slow. Is the Gaussian beam described by Eq. (181) one of the possible solutions of this equation? Give your interpretation of the last result.

7.29. Neglecting the skin-effect depth δ_s in comparison with l and R, find the lowest resonance frequencies, and the corresponding field distributions, of standing electromagnetic waves inside a round cylindrical cavity with metallic walls — see the figure on the right.


7.30. Analyze long-wave electromagnetic waves that may propagate inside a relatively narrow gap between two well-conducting concentric shells of radii R and R + d, in the limit d << R.

(i) Within the coarse-grain approximation, derive the 2D equation describing such waves with relatively large wavelengths λ ~ R >> d.
(ii) Calculate the lowest resonance frequencies of the system.

Hint: The last task requires some familiarity with the basic properties of spherical harmonics.
7.31. A plane monochromatic wave propagates through a medium with an Ohmic conductivity σ, and negligible electric and magnetic polarization effects. Calculate the wave's attenuation, and relate the result to a certain calculation carried out in Chapter 6.


7.32. Generalize the telegrapher's equations (110)-(111) by accounting for small energy losses:

(i) in the transmission line's conductors, and
(ii) in the medium separating the conductors,

using their simplest (Ohmic) models. Formulate the conditions of validity of the resulting equations.


7.33. Calculate the skin-effect contribution to the attenuation constant α of a TEM wave in the microstrip line discussed in Problem 18(ii).

7.34. Calculate the skin-effect contribution to the attenuation coefficient α defined by Eq. (214), for the fundamental (H₁₀) mode propagating in a metallic-wall waveguide with a rectangular cross-section — see Fig. 22. Use the results to evaluate the wave decay length l_d = 1/α of a 10 GHz wave in the standard X-band waveguide WR-90 (with copper walls, a = 23 mm, b = 10 mm, and no dielectric filling), at room temperature. Compare the result with that (obtained in Sec. 9) for the standard TV coaxial cable, at the same frequency.


7.35. Calculate the skin-effect contribution to the attenuation coefficient α of

(i) the fundamental (H₁₁) wave, and
(ii) the H₀₁ wave,

in a metallic-wall waveguide with the circular cross-section (see Fig. 23a), and analyze the low-frequency (ω → ω_c) and high-frequency (ω >> ω_c) behaviors of α for each of these modes.


7.36. For a rectangular metallic-wall resonant cavity with dimensions a×b×l (b < a, l), calculate the Q-factor of the fundamental oscillation mode, due to the skin-effect losses in the walls. Evaluate the factor for a 23×23×10 mm³ cavity with copper walls, at room temperature.


7.37. Calculate the lowest eigenfrequency and the Q-factor (due to the skin-effect losses) of the toroidal (axially-symmetric) resonant cavity with metallic walls, and the interior's cross-section shown in the figure on the right, in the case when d << r, R.


7.38. Express the contribution to the damping coefficient (the reciprocal Q-factor) of a resonant cavity, by small energy losses in the dielectric that fills it, via the complex functions ε(ω) and μ(ω) of the material.


7.39. For the dielectric Fabry-Pérot resonator (Fig. 31) with the normal wave incidence, calculate the Q-factor due to radiation losses, in the limit of a strong impedance mismatch (Z >> Z₀), using two approaches:


(i) from the energy balance, using Eq. (227), and 
(ii) from the frequency dependence of the power transmission coefficient, using Eq. (229). 


Compare the results. 




Essential Graduate Physics EM: Classical Electrodynamics 


Chapter 8. Radiation, Scattering, Interference, and Diffraction 


This chapter continues the discussion of electromagnetic wave propagation, now focusing on the results of wave incidence on various objects of more complex shapes. Depending on the shape, the resulting wave pattern is called either "scattering", or "diffraction", or "interference". However, as the reader will see, the boundaries between these effects may be blurry, and their basic mathematical description may be conveniently based on the same key calculation — the electric-dipole radiation of a spherical wave by a localized source. Naturally, I will start the chapter with this calculation, deriving it from an even more general result — the "retarded-potential" solution of the Maxwell equations.


8.1. Retarded potentials 


Let us start by finding the general solution of the macroscopic Maxwell equations (6.99) in a dispersion-free, linear, uniform, isotropic medium characterized by frequency-independent real ε and μ.¹ The easiest way to perform this calculation is to use the scalar (φ) and vector (A) potentials defined by Eqs. (6.7):

$$\mathbf{E} = -\nabla\phi - \frac{\partial \mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla\times\mathbf{A}. \tag{8.1}$$

As was discussed in Sec. 6.8, by imposing upon the potentials the Lorenz gauge condition (6.117),

$$\nabla\cdot\mathbf{A} + \frac{1}{v^2}\frac{\partial\phi}{\partial t} = 0, \qquad \text{with} \quad v^2 \equiv \frac{1}{\varepsilon\mu}, \tag{8.2}$$

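which does not affect the fields E and B, the Maxwell equations are reduced to a pair of very similar, simple equations (6.118) for the potentials:

$$\nabla^2\phi - \frac{1}{v^2}\frac{\partial^2\phi}{\partial t^2} = -\frac{\rho}{\varepsilon}, \tag{8.3a}$$

$$\nabla^2\mathbf{A} - \frac{1}{v^2}\frac{\partial^2\mathbf{A}}{\partial t^2} = -\mu\mathbf{j}. \tag{8.3b}$$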


Let us find the general solution of these equations, for now thinking of the densities ρ(r, t) and j(r, t) of the stand-alone charges and currents as known functions. (This will not prevent the results from being valid for the cases when ρ(r, t) and j(r, t) should be calculated self-consistently.) The idea of such a solution may be borrowed from electro- and magnetostatics. Indeed, for the stationary case (∂/∂t = 0), the solutions of Eqs. (8.3) are given by a ready generalization of, respectively, Eqs. (1.38) and (5.28) to a uniform, linear medium:

$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon}\int \rho(\mathbf{r}')\,\frac{d^3r'}{|\mathbf{r} - \mathbf{r}'|}, \tag{8.4a}$$

$$\mathbf{A}(\mathbf{r}) = \frac{\mu}{4\pi}\int \mathbf{j}(\mathbf{r}')\,\frac{d^3r'}{|\mathbf{r} - \mathbf{r}'|}. \tag{8.4b}$$


1 When necessary (e.g., in the discussion of the Cherenkov radiation in Sec. 10.5), it will not be too hard to generalize these results to a dispersive medium.
As we know, these expressions may be derived by, first, calculating the potential of a point source, and then using the linear superposition principle for a system of such sources.

Let us do the same for the time-dependent case, starting from the field induced by a time-dependent point charge at the origin:²

$$\rho(\mathbf{r}, t) = q(t)\,\delta(\mathbf{r}). \tag{8.5}$$

In this case, Eq. (3a) is homogeneous everywhere but the origin:

$$\nabla^2\phi - \frac{1}{v^2}\frac{\partial^2\phi}{\partial t^2} = 0, \qquad \text{for } r \neq 0. \tag{8.6}$$


Due to the spherical symmetry of the problem, it is natural to look for a spherically-symmetric solution to this equation.³ Thus, we may simplify the Laplace operator correspondingly (as was repeatedly done earlier in this course), so that Eq. (6) becomes

$$\left[\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) - \frac{1}{v^2}\frac{\partial^2}{\partial t^2}\right]\phi = 0, \qquad \text{for } r \neq 0. \tag{8.7}$$

By introducing a new variable χ(r, t) ≡ rφ(r, t), Eq. (7) is reduced to the 1D wave equation

$$\left(\frac{\partial^2}{\partial r^2} - \frac{1}{v^2}\frac{\partial^2}{\partial t^2}\right)\chi = 0, \qquad \text{for } r \neq 0. \tag{8.8}$$

From discussions in Chapter 7,⁴ we know that its general solution may be represented as

$$\chi(r, t) = \chi_{\text{out}}\!\left(t - \frac{r}{v}\right) + \chi_{\text{in}}\!\left(t + \frac{r}{v}\right), \tag{8.9}$$

where χ_in and χ_out are (so far) arbitrary functions of one variable. The physical sense of φ_out = χ_out/r is a spherical wave propagating from our source (located at r = 0) to outer space, i.e. exactly the solution we are looking for. On the other hand, φ_in = χ_in/r describes a spherical wave that could be created by some distant spherically-symmetric source, and that converges exactly on our charge located at the origin — evidently not the effect we want to consider here. Discarding this term, and returning to φ = χ/r, we get

$$\phi(r, t) = \frac{1}{r}\,\chi_{\text{out}}\!\left(t - \frac{r}{v}\right), \qquad \text{for } r \neq 0. \tag{8.10}$$


In order to calculate the function χ_out, let us consider the solution (10) at distances r so small (r << vt) that the time-derivative term in Eq. (3a), with the right-hand side (5),

$$\nabla^2\phi - \frac{1}{v^2}\frac{\partial^2\phi}{\partial t^2} = -\frac{q(t)}{\varepsilon}\,\delta(\mathbf{r}), \tag{8.11}$$


2 Admittedly, this expression does not satisfy the continuity equation (4.5), but this deficiency will be corrected 
imminently, at the linear superposition stage — see Eq. (17) below. 

3 Let me emphasize that this is not the general solution to Eq. (6). For example, it does not describe the possible 
waves created by other sources, that pass by the considered charge q(t). However, such fields are irrelevant for 
our current task: to calculate the field induced by the charge q(t). The solution becomes general when it is 
integrated (as it will be) over all relevant charges. 

4 See also CM Sec. 6.3. 
is much smaller than the spatial derivative term (which diverges at r → 0). Then Eq. (11) is reduced to the Poisson equation, whose solution (4a), for the source (5), is

$$\phi(r \to 0, t) = \frac{q(t)}{4\pi\varepsilon r}. \tag{8.12}$$

Now requiring the two solutions, Eqs. (10) and (12), to coincide at r << vt, we get χ_out(t) = q(t)/4πε, so that Eq. (10) becomes

$$\phi(r, t) = \frac{1}{4\pi\varepsilon r}\,q\!\left(t - \frac{r}{v}\right). \tag{8.13}$$
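A quick way to see that Eq. (13) indeed solves Eq. (7) away from the origin is to apply the spherically-symmetric wave operator to it numerically. The Python sketch below does this with central finite differences, in hypothetical units where v = 1 and 4πε = 1, and for an assumed Gaussian charge history q(t). For contrast, it also applies the operator to the instantaneous (non-retarded) Coulomb potential q(t)/4πεr, which fails the test.

```python
import math

V = 1.0             # wave speed; units chosen so that v = 1 (assumption)
FOUR_PI_EPS = 1.0   # units chosen so that 4*pi*eps = 1 (assumption)

def q(t):
    """Hypothetical smooth charge history: a Gaussian pulse."""
    return math.exp(-t * t)

def phi_retarded(r, t):
    """Eq. (8.13): the outgoing-wave potential of the point charge."""
    return q(t - r / V) / (FOUR_PI_EPS * r)

def phi_instant(r, t):
    """Instantaneous Coulomb potential, with no retardation."""
    return q(t) / (FOUR_PI_EPS * r)

def wave_operator(f, r, t, h=1e-3):
    """Left-hand side of Eq. (8.7) for f(r, t), by central finite differences."""
    f_r = (f(r + h, t) - f(r - h, t)) / (2 * h)
    f_rr = (f(r + h, t) - 2 * f(r, t) + f(r - h, t)) / h**2
    f_tt = (f(r, t + h) - 2 * f(r, t) + f(r, t - h)) / h**2
    # (1/r^2) d/dr (r^2 df/dr) = f_rr + (2/r) f_r
    return f_rr + 2 * f_r / r - f_tt / V**2

for r, t in [(1.0, 1.2), (2.0, 1.5), (3.0, 2.5)]:
    print(r, t, wave_operator(phi_retarded, r, t), wave_operator(phi_instant, r, t))
```

The residual for the retarded potential is at the level of the finite-difference error, while for the instantaneous potential it equals −q̈(t)/(4πεrv²), which is generally nonzero.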


Just as was repeatedly done in statics, this result may be readily generalized for the arbitrary position r′ of the point charge:

$$\rho(\mathbf{r}, t) = q(t)\,\delta(\mathbf{r} - \mathbf{r}') \equiv q(t)\,\delta(\mathbf{R}), \tag{8.14}$$

where R is the distance between the field observation point r and the source position point r′, i.e. the length of the vector

$$\mathbf{R} \equiv \mathbf{r} - \mathbf{r}', \tag{8.15}$$

connecting these points — see Fig. 1.

Fig. 8.1. Calculating the retarded potentials of a localized source.

Obviously, now Eq. (13) becomes

$$\phi(\mathbf{r}, t) = \frac{1}{4\pi\varepsilon R}\,q\!\left(t - \frac{R}{v}\right). \tag{8.16}$$


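Finally, we may use the linear superposition principle to write, for the arbitrary charge distribution,

$$\phi(\mathbf{r}, t) = \frac{1}{4\pi\varepsilon}\int \rho\!\left(\mathbf{r}', t - \frac{R}{v}\right)\frac{d^3r'}{R}, \tag{8.17a}$$

where the integration is extended over all charges of the system under analysis. Solving Eq. (3b) absolutely similarly, for the vector potential we get⁵

$$\mathbf{A}(\mathbf{r}, t) = \frac{\mu}{4\pi}\int \mathbf{j}\!\left(\mathbf{r}', t - \frac{R}{v}\right)\frac{d^3r'}{R}. \tag{8.17b}$$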


5 As should be clear from the analogy of Eqs. (17) with their stationary forms (4), which were discussed, respectively, in Chapters 1 and 5, in the Gaussian units the retarded potential formulas are valid with the coefficient 1/4πε dropped in Eq. (17a), and replaced with 1/c in Eq. (17b).
(Now nothing prevents the functions ρ(r, t) and j(r, t) from satisfying the continuity equation.)
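The superposition step leading to Eq. (17a) can be mirrored for a set of point charges q_k(t) at fixed positions r_k, for which the integral becomes a sum of q_k(t − R_k/v)/4πεR_k. The Python sketch below assumes free space (ε = ε₀, v = c) and a hypothetical pair of charges ±Q₀ that appear at t = 0 at the ends of a 1 m segment; the current that would carry the charge between them, and its vector potential, are ignored. It shows that the observer feels nothing until the retarded time arrives, after which each charge's contribution switches on at its own delay R_k/c.

```python
import math

EPS0 = 8.8541878128e-12   # electric constant, F/m
C = 299792458.0           # speed of light, m/s

def retarded_phi(r, t, sources, v=C, eps=EPS0):
    """Discrete analog of Eq. (8.17a): sum of q_k(t - R_k/v) / (4*pi*eps*R_k).

    sources: list of (position, q_of_t) pairs; positions (x, y, z) in m are fixed,
    and q_of_t returns the charge (C) at a given time (s).
    """
    total = 0.0
    for pos, q_of_t in sources:
        R = math.dist(r, pos)              # R = |r - r'|
        total += q_of_t(t - R / v) / R
    return total / (4 * math.pi * eps)

Q0 = 1e-9   # hypothetical charge magnitude, C

def switched_on(sign):
    """Charge sign*Q0 that appears at t = 0 (hypothetical history)."""
    return lambda t: sign * Q0 if t >= 0 else 0.0

pair = [((0.0, 0.0, 0.5), switched_on(+1)), ((0.0, 0.0, -0.5), switched_on(-1))]
observer = (0.0, 0.0, 3.0)   # R = 2.5 m and 3.5 m from the two charges

for t in (2.0 / C, 3.0 / C, 4.0 / C):
    print(f"t = {t:.3e} s: phi = {retarded_phi(observer, t, pair):.4f} V")
```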


The solutions expressed by Eqs. (17) are traditionally called the retarded potentials, the name signifying the fact that the observed fields are "retarded" (in the meaning "delayed") in time by Δt = R/v relative to the source variations — physically, because of the finite speed v of the electromagnetic wave propagation. Note that, very remarkably, these simple expressions are exact solutions of the macroscopic Maxwell equations (again, in a uniform, linear, dispersion-free medium) for an arbitrary distribution of stand-alone charges and currents. They also may be considered as the general solutions of these equations, provided that the integration has been extended over all field sources in the Universe — or at least over those that affect our observations.


Note also that due to the mathematical similarity of the microscopic and macroscopic Maxwell equations, Eqs. (17) are valid, with the coefficient replacement ε → ε₀ and μ → μ₀, for the exact, rather than the macroscopic fields, provided that the functions ρ(r, t) and j(r, t) describe not only stand-alone but all charges and currents in the system. (Alternatively, this statement may be formulated as the validity of Eqs. (17), with the same coefficient replacement, in free space.)


Finally, note that Eqs. (17) may be plugged into Eqs. (1), giving (after an explicit differentiation) the so-called Jefimenko equations⁶ for the fields E and B — similar in structure to Eqs. (17), but more cumbersome. Conceptually, the existence of such equations is good news, because they are free from the gauge ambiguity pertinent to the potentials φ and A. However, the practical value of these explicit expressions for the fields is not overly high: for all applications I am aware of, it is easier to use Eqs. (17) to calculate the particular expressions for the potentials first, and only then calculate the fields from Eqs. (1). Let me now present an (arguably, the most important) example of this approach.


8.2. Electric dipole radiation 


Consider again the problem that was discussed in electrostatics (Sec. 3.1), namely the field of a localized source with linear dimensions a << r (see Fig. 1 again), but now with time-dependent charge and/or current distributions. Using all the arguments of that discussion, in particular the condition expressed by Eq. (3.1), r′ << r, we may apply the Taylor expansion (3.3), truncated to two leading terms,

$$f(\mathbf{R}) = f(\mathbf{r}) - \mathbf{r}'\cdot\nabla f(\mathbf{r}) + \ldots, \tag{8.18}$$

to the scalar function f(R) = R (for which ∇f(r) = ∇r = n, where n = r/r is the unit vector directed toward the observation point — see Fig. 1) to approximate the distance R as

$$R \approx r - \mathbf{r}'\cdot\mathbf{n}. \tag{8.19}$$
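The accuracy of the approximation (19) is easy to probe numerically: the first neglected term is (r′² − (r′·n)²)/2r, so the error falls off as 1/r. The short Python check below compares the exact distance R = |r − r′| with r − r′·n for a hypothetical source point at 0.1 m from the origin, perpendicular to n.

```python
import math

def exact_R(r_vec, rp_vec):
    """Exact distance R = |r - r'|."""
    return math.dist(r_vec, rp_vec)

def approx_R(r_vec, rp_vec):
    """Eq. (8.19): R ~ r - r'.n, with n = r/|r|."""
    r = math.hypot(*r_vec)
    n = [x / r for x in r_vec]
    return r - sum(a * b for a, b in zip(rp_vec, n))

rp = (0.1, 0.0, 0.0)          # hypothetical source point, m
for r in (1.0, 10.0, 100.0):  # observation points along the z-axis
    obs = (0.0, 0.0, r)
    err = exact_R(obs, rp) - approx_R(obs, rp)
    # leading neglected term: (r'^2 - (r'.n)^2) / (2r) = 0.01 / (2r) here
    print(r, err, 0.01 / (2 * r))
```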


In each of the retarded potential formulas (17), R participates in two places: in the denominator and in the source's time argument. If ρ and j change in time on the scale ~1/ω, where ω is some characteristic frequency, then any change of the argument (t − R/v) on that time scale, for example due to a change of R on the spatial scale ~v/ω = 1/k, may substantially change these functions. Thus, the expansion (19) may be applied to R in the argument (t − R/v) only if ka << 1, i.e. if the system's size a


6 They were published by O. D. Jefimenko only in 1966, but the Fourier representation of the same result was 
obtained much earlier (in 1912) by G. A. Scott. 
is much smaller than the radiation wavelength λ = 2π/k. On the other hand, the function 1/R changes relatively slowly, and for it even the first term of the expansion (19) gives a good approximation as soon as a << r, R. In the latter approximation alone, Eq. (17a) yields

$$\phi(\mathbf{r}, t) \approx \frac{1}{4\pi\varepsilon r}\int \rho\!\left(\mathbf{r}', t - \frac{R}{v}\right) d^3r' = \frac{1}{4\pi\varepsilon r}\,Q\!\left(t - \frac{r}{v}\right), \tag{8.20}$$

where Q(t) is the net electric charge of the localized system. Due to charge conservation, this charge cannot change with time, so that the approximation (20) describes just a static Coulomb field of our localized source, rather than a radiated wave.


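Let us, however, apply a similar approximation to the vector potential (17b):

$$\mathbf{A}(\mathbf{r}, t) \approx \frac{\mu}{4\pi r}\int \mathbf{j}\!\left(\mathbf{r}', t - \frac{R}{v}\right) d^3r'. \tag{8.21}$$

According to Eq. (5.87), the right-hand side of this expression vanishes in statics, but in dynamics, this is no longer true. For example, if the current is due to some non-relativistic motion⁷ of a system of point charges q_k, we can write

$$\int \mathbf{j}(\mathbf{r}', t)\,d^3r' = \sum_k q_k\dot{\mathbf{r}}_k = \frac{d}{dt}\sum_k q_k\mathbf{r}_k \equiv \dot{\mathbf{p}}(t), \tag{8.22}$$

where p(t) is the dipole moment of the localized system, defined by Eq. (3.6). Now, after the integration, we may keep only the first term of the approximation (19) in the argument (t − R/v) as well, getting

$$\mathbf{A}(\mathbf{r}, t) \approx \frac{\mu}{4\pi r}\,\dot{\mathbf{p}}\!\left(t - \frac{r}{v}\right), \qquad \text{for } a \ll r, \lambda. \tag{8.23}$$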


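Let us analyze what exactly this result describes. The second of Eqs. (1) allows us to calculate the magnetic field by the spatial differentiation of A. At large distances r >> λ (i.e. in the so-called far-field zone), where Eq. (23) describes a locally-plane wave, the dominating contribution to this derivative is given by the dipole moment factor:

$$\mathbf{B}(\mathbf{r}, t) \approx \frac{\mu}{4\pi r}\nabla\times\dot{\mathbf{p}}\!\left(t - \frac{r}{v}\right) = -\frac{\mu}{4\pi r v}\,\mathbf{n}\times\ddot{\mathbf{p}}\!\left(t - \frac{r}{v}\right). \tag{8.24}$$

This expression means that the magnetic field, at the observation point, is perpendicular to the vectors n and (the retarded value of) p̈, and its magnitude is

$$B = \frac{\mu}{4\pi r v}\,\ddot{p}\!\left(t - \frac{r}{v}\right)\sin\Theta, \qquad E = vB = \frac{\mu}{4\pi r}\,\ddot{p}\!\left(t - \frac{r}{v}\right)\sin\Theta, \tag{8.25}$$

where Θ is the angle between those two vectors — see Fig. 2.⁸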


7 For relativistic particles, moving with velocities of the order of the speed of light, one has to be more careful. As a result, I will postpone the discussion of their radiation until Chapter 10, i.e. until after the detailed discussion of special relativity in Chapter 9.

8 From the first of Eqs. (1) for the electric field, in the first approximation (23), we would get −∂A/∂t = −(1/4πεv²r) p̈(t − r/v) ≡ −(Z/4πvr) p̈(t − r/v). The transverse component of this vector (see Fig. 2) is the proper electric field E
Fig. 8.2. Far-fields of a localized source,
contributing to its electric dipole radiation.


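The most important feature of this result is that the time-dependent field decreases very slowly (only as 1/r) with the distance from the source, so that the radial component of the corresponding Poynting vector (7.9b),⁹

$$S_r = EH = \frac{Z}{(4\pi v)^2 r^2}\,\ddot{p}^2\!\left(t - \frac{r}{v}\right)\sin^2\Theta, \tag{8.26}$$

drops only as 1/r², and the full power of the radiation, i.e. the flux of this vector through a sphere of radius r,

$$\mathcal{P} = \oint S_r\, r^2 d\Omega = \frac{Z}{(4\pi v)^2}\,\ddot{p}^2\, 2\pi\int_0^\pi \sin^3\Theta\, d\Theta = \frac{Z}{6\pi v^2}\,\ddot{p}^2, \tag{8.27}$$

does not depend on the distance from the source — as it should for radiation.¹⁰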


This is the famous Larmor formula¹¹ for the electric dipole radiation; it gives the dominating component of radiation by a localized system of charges — unless p̈ = 0. Please notice its angular dependence: the radiation vanishes along the axis of the retarded vector p̈ (where Θ = 0), and reaches its maximum in the plane normal to that axis.
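The angular integration behind Eq. (27) can be checked by summing the flux of Eq. (26) over a sphere numerically. The Python sketch below assumes free space (Z = Z₀, v = c) and an arbitrary, hypothetical value of p̈. It confirms the factor 1/6π, the independence of the flux from the sphere's radius, and the vanishing of S_r along the dipole axis.

```python
import math

Z0 = 376.730313668   # free-space wave impedance, ohm
C = 299792458.0      # speed of light, m/s

def poynting_radial(p_ddot, theta, r, Z=Z0, v=C):
    """Eq. (8.26): S_r = Z p_ddot^2 sin^2(theta) / ((4 pi v)^2 r^2)."""
    return Z * p_ddot**2 * math.sin(theta)**2 / ((4 * math.pi * v)**2 * r**2)

def flux_through_sphere(p_ddot, r, n=2000, Z=Z0, v=C):
    """Midpoint-rule integral of S_r over a sphere of radius r (axial symmetry)."""
    d_theta = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * d_theta
        ring_area = 2 * math.pi * r**2 * math.sin(theta) * d_theta
        total += poynting_radial(p_ddot, theta, r, Z, v) * ring_area
    return total

def larmor_power(p_ddot, Z=Z0, v=C):
    """Eq. (8.27): P = Z p_ddot^2 / (6 pi v^2)."""
    return Z * p_ddot**2 / (6 * math.pi * v**2)

p_ddot = 1.0   # hypothetical |d^2p/dt^2| in SI units, C*m/s^2
for r in (1.0, 1e3):
    print(r, flux_through_sphere(p_ddot, r), larmor_power(p_ddot))
```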


In order to find the average power, Eq. (27) has to be averaged over a sufficiently long time. In particular, if the source is monochromatic, p(t) = Re[p_ω exp{−iωt}], with a time-independent vector amplitude p_ω, such averaging may be carried out just over one period, giving an extra factor 2 in the denominator:

$$\langle\mathcal{P}\rangle = \frac{Z\omega^4}{12\pi v^2}\,|\mathbf{p}_\omega|^2. \tag{8.28}$$

The easiest application of this formula is to a point charge oscillating, with frequency ω, along a straight line (which we may take for the z-axis), with amplitude a. In this case, p = qz(t)n_z = qa Re[exp{−iωt}]n_z, and if the charge velocity amplitude, aω, is much less than the electromagnetic wave's speed v, we may use Eq. (28) with p_ω = qa n_z, giving


= ZH × n of the radiated wave, while its longitudinal component is exactly compensated by (−∇φ) in the next term of the Taylor expansion of Eq. (17a) in the small parameter ka ~ a/λ << 1.

9 Note the "doughnut" dependence of S_r on the direction n, frequently used to visualize the dipole radiation.

10 In the Gaussian units, for free space (v = c), Eq. (27) reads 𝒫 = (2/3c³)p̈².

11 Named after Joseph Larmor, who was the first to derive this formula (in 1897) for the particular case of a single point charge q moving with acceleration r̈, when p̈ = qr̈.
$$\langle\mathcal{P}\rangle = \frac{Zq^2a^2\omega^4}{12\pi v^2}. \tag{8.29}$$

Applied to a classical picture of an electron (with q = −e ≈ −1.6×10⁻¹⁹ C), initially rotating about an atom's nucleus at an atomic distance a ~ 10⁻¹⁰ m, Eq. (29) shows¹² that the energy loss due to the dipole radiation is so large that it would cause the electron to collapse on the nucleus in just ~10⁻¹¹ s. In the beginning of the 1900s, this result was one of the main arguments for the development of quantum mechanics, which prevents such a collapse of electrons in their lowest-energy (ground) quantum state.
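The collapse estimate can be reproduced in a few lines. The Python sketch below makes several assumptions that go beyond the text: a slowly shrinking circular orbit, non-relativistic motion, free space, and the instantaneous Larmor power (27) with |p̈| = e·a_c, where a_c = e²/4πε₀m_e r² is the centripetal acceleration (for uniform circular motion |p̈| is constant, so no time averaging is needed). Integrating dt = −dE/𝒫 with the orbital energy E = −e²/8πε₀r then gives the classic closed form t = r₀³/4r_e²c, with r_e the classical electron radius; for a start at the Bohr radius this is about 1.6×10⁻¹¹ s.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # electric constant, F/m
M_E = 9.1093837015e-31       # electron mass, kg
C = 299792458.0              # speed of light, m/s

def larmor_power(accel, q=E_CHARGE):
    """Eq. (8.27) in free space, with |p_ddot| = q * accel."""
    return q**2 * accel**2 / (6 * math.pi * EPS0 * C**3)

def centripetal_accel(r):
    """Coulomb acceleration of the electron on a circular orbit of radius r."""
    return E_CHARGE**2 / (4 * math.pi * EPS0 * M_E * r**2)

def collapse_time_numeric(r0, steps=100_000):
    """Sum dt = (dE/dr) dr / P(r) from r0 down to 0 (midpoint rule)."""
    dr = r0 / steps
    total = 0.0
    for i in range(steps):
        r = r0 - (i + 0.5) * dr
        dE_dr = E_CHARGE**2 / (8 * math.pi * EPS0 * r**2)   # from E = -e^2/(8 pi eps0 r)
        total += dE_dr * dr / larmor_power(centripetal_accel(r))
    return total

def collapse_time_analytic(r0):
    """Closed form t = r0^3 / (4 r_e^2 c)."""
    r_e = E_CHARGE**2 / (4 * math.pi * EPS0 * M_E * C**2)   # classical electron radius
    return r0**3 / (4 * r_e**2 * C)

BOHR_RADIUS = 5.29177e-11    # m
print(collapse_time_numeric(BOHR_RADIUS), collapse_time_analytic(BOHR_RADIUS))
```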


Another useful application of Eq. (28) is the radio wave radiation by a short, straight, symmetric antenna which is fed, for example, by a TEM transmission line such as a coaxial cable — see Fig. 3.

Fig. 8.3. The dipole antenna, with its ends at z = ±l/2.


The exact solution of this problem is rather complicated because the law I_ω(z) of the current variation along the antenna's length should be calculated self-consistently with the distribution of the electromagnetic field induced by the current in the surrounding space. (Unfortunately, this fact is not mentioned in some textbooks.) However, the current should be largest at the feeding point (in Fig. 3, taken for z = 0) and vanish at the antenna's ends (z = ±l/2), and hence we may guess that at l << λ, the linear function

$$I_\omega(z) = I_\omega(0)\left(1 - \frac{2|z|}{l}\right), \tag{8.30}$$

should be a good approximation of the actual distribution — as it indeed is. Now we can use the continuity equation ∂Q/∂t = I, i.e. −iωQ_ω = I_ω, to calculate the complex amplitude Q_ω(z) = iI_ω(z)sgn(z)/ω of the electric charge Q(z, t) = Re[Q_ω exp{−iωt}] of the wire's segment between the point z and the nearest end of the antenna, and from it, the amplitude of the charge's linear density:

$$\lambda_\omega(z) \equiv -\frac{dQ_\omega(z)}{d|z|} = \frac{2iI_\omega(0)}{\omega l}\,\operatorname{sgn}(z). \tag{8.31}$$

From here, the dipole moment's amplitude is

$$p_\omega = 2\int_0^{l/2}\lambda_\omega(z)\,z\,dz = \frac{iI_\omega(0)\,l}{2\omega}, \tag{8.32}$$

so that Eq. (28) yields


!2 Actually, the formula needs a numerical coefficient adjustment to account for the electron’s orbital (rather than 
linear) motion — the task left for the reader’s exercise. However, this adjustment does not affect the order-of- 
magnitude estimate given above. 
$$\langle\mathcal{P}\rangle = \frac{Z\omega^4}{12\pi v^2}\left|\frac{I_\omega(0)\,l}{2\omega}\right|^2 = \frac{Z(kl)^2}{24\pi}\,\frac{|I_\omega(0)|^2}{2}, \tag{8.33}$$

where k = ω/v. The analogy between this result and the dissipation power 𝒫 = Re Z |I_ω|²/2 in a lumped linear circuit element enables the interpretation of the first fraction in the last form of Eq. (33) as the real part of the antenna's impedance,

$$\operatorname{Re} Z_A = Z\,\frac{(kl)^2}{24\pi}, \tag{8.34}$$

as felt by the transmission line.


According to Eq. (7.118), the wave traveling along the line toward the antenna is fully radiated, i.e. not reflected back, only if Z_A equals Z_W of the line. As we know from Sec. 7.5 (and the solution of the related problems), for typical TEM lines, Z_W ~ Z₀, while Eq. (34), which is only valid in the limit l << λ, shows that for the radiation into free space (Z = Z₀), Re Z_A is much less than Z₀. Hence to reach the impedance matching condition Z_W = Z_A, the antenna's length should be increased — as a more involved theory shows, to l ~ λ/2. However, in many cases, practical considerations make short antennas necessary. The example most often met nowadays is provided by cell phone antennas, which use frequencies close to 1 or 2 GHz, with free-space wavelengths λ between 15 and 30 cm, i.e. much larger than the phone size.¹³ The quadratic dependence of the antenna's efficiency on l, following from Eq. (34), explains why every millimeter counts in the design of such antennas, and why the designs are carefully optimized using software packages for the (virtually exact) numerical solution of the Maxwell equations for the specific shape of the antenna and other phone parts.¹⁴


To conclude this section, let me note that if the wave source is not monochromatic, so that p(t) should be represented as a Fourier series,

$$\mathbf p(t) = \mathrm{Re}\sum_\omega\mathbf p_\omega e^{-i\omega t}, \tag{8.35}$$

the terms corresponding to the interference of spectral components with different frequencies ω are averaged out at the time averaging of the Poynting vector, and the average radiated power is just a sum of contributions (28) from all substantial frequency components.
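The statement that the cross-frequency terms average out can be illustrated numerically. Since the instantaneous radiated power (27) is proportional to $\ddot p^2$, it suffices to compare the time average of $\ddot p^2$ for a two-component signal with the sum of the single-frequency averages $\omega^4|p_\omega|^2/2$. This is a minimal sketch with hypothetical amplitudes and frequencies. Commensurate frequencies are assumed so that averaging over one common period is exact.

```python
import numpy as np

# Hypothetical two-component dipole p(t) = Re[p1 exp(-i w1 t) + p2 exp(-i w2 t)].
# The frequencies are chosen commensurate (w2 = 2 w1), so that averaging over one common
# period T = 2*pi/w1 is exact; for incommensurate frequencies one would average over a long time.
w1, w2 = 1.0, 2.0
p1, p2 = 1.0 + 0.5j, 0.3 - 0.8j
T = 2 * np.pi / w1
t = np.linspace(0.0, T, 4096, endpoint=False)

# Second time derivative of each component: d^2/dt^2 Re[p exp(-iwt)] = Re[-w^2 p exp(-iwt)].
p_ddot = np.real(-(w1**2) * p1 * np.exp(-1j * w1 * t) - (w2**2) * p2 * np.exp(-1j * w2 * t))

avg_total = np.mean(p_ddot**2)                                  # includes the cross term
avg_sum = 0.5 * (w1**4 * abs(p1) ** 2 + w2**4 * abs(p2) ** 2)   # sum of single-frequency averages
print(avg_total, avg_sum)
```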


8.3. Wave scattering 


The Larmor formula may be used as the basis of the theory of scattering, the phenomenon illustrated by Fig. 4. Generally, scattering is a complex problem. However, in many cases it allows the so-called Born approximation,¹⁵ in which the scattered wave field’s effect on the scattering object is assumed to be much weaker than that of the incident wave, and is neglected.


13 The situation will be partly remedied by the transfer of the wireless mobile technology to its next generations, 
with the frequencies moved up to the 28 GHz, 37-39 GHz, and possibly even the 64-71 GHz bands. 

14 A partial list of popular software packages of this kind includes both publicly available codes such as Nec2 (whose various versions are available online, e.g., at http://www.qsl.net/4nec2/), and proprietary packages, such as Momentum from Agilent Technologies (now Keysight Technologies), FEKO from EM Software & Systems, and XFdtd from Remcom.

15 Named after Max Born, one of the founding fathers of quantum mechanics. However, the basic idea of this 
approach to EM waves was developed much earlier (in 1881) by Lord Rayleigh — born John William Strutt. 
Fig. 8.4. Wave scattering (schematically): an incident wave, the scattering object, and the scattered wave.


As the first example of this approach, let us consider the scattering of a plane wave, propagating in free space ($Z = Z_0$, $v = c$), by an otherwise free¹⁶ charged particle whose motion may be described by non-relativistic classical mechanics. (This requires, in particular, the incident wave to be not too powerful, so that the speed of the charge’s motion induced by the wave remains much lower than c.) As was already discussed at the derivation of Eq. (7.32), in this case, the magnetic component of the Lorentz force (5.10) is negligible in comparison with the force $\mathbf F_e = q\mathbf E$ exerted by the wave’s electric field. Thus, assuming that the incident wave is linearly polarized along some axis x, the equation of the particle’s motion in the Born approximation is just $m\ddot x = qE(t)$, so that for the x-component $p_x = qx$ of its dipole moment we can write


$$\ddot p = q\ddot x = \frac{q^2}{m}E(t). \tag{8.36}$$

As we already know from Sec. 2, oscillations of the dipole moment lead to radiation of a wave with a wide angular distribution of intensity; in our case, this is the scattered wave (see Fig. 4). Its full power may be found by plugging Eq. (36) into Eq. (27):

$$\mathcal P = \frac{Z_0}{6\pi c^2}\ddot p^2 = \frac{Z_0q^4}{6\pi m^2c^2}E^2(t), \tag{8.37}$$

so that for the average power we get

$$\overline{\mathcal P} = \frac{Z_0q^4}{6\pi m^2c^2}\overline{E^2(t)} = \frac{Z_0q^4}{12\pi m^2c^2}|E_\omega|^2. \tag{8.38}$$

Since this power is proportional to the incident wave’s intensity $\overline S$, it is customary to characterize the scattering ability of an object by the ratio

$$\sigma \equiv \frac{\overline{\mathcal P}}{\overline S_{\text{incident}}} = \frac{\overline{\mathcal P}}{|E_\omega|^2/2Z}, \tag{8.39}$$

which has the dimensionality of area, and is called the total cross-section of scattering.¹⁷ For this measure, Eq. (38) yields the famous result


16 As Eq. (7.30) shows, this calculation is also valid for an oscillator with a low own frequency, $\omega_0 \ll \omega$.
17 This definition parallels those accepted in the classical and quantum theories of particle scattering; see, e.g., respectively, CM Sec. 3.5 and QM Sec. 3.3.
$$\sigma = \frac{Z_0^2q^4}{6\pi m^2c^2}, \tag{8.40}$$

which is called the Thomson scattering formula,¹⁸ especially when applied to an electron. This relation is most frequently represented in the form¹⁹

$$\sigma = \frac{8\pi}{3}r_c^2, \quad\text{with } r_c \equiv \frac{q^2}{4\pi\varepsilon_0mc^2}. \tag{8.41}$$

This constant $r_c$ is called the classical radius of the particle (or sometimes the “Thomson scattering length”); for the electron ($q = -e$, $m = m_e$) it is close to $2.82\times10^{-15}$ m. Its possible interpretation is evident from Eq. (41) for $r_c$: at that distance between two similar particles, the potential energy $q^2/4\pi\varepsilon_0r$ of their electrostatic interaction is equal to the particle’s rest-mass energy $mc^2$.²⁰
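As a quick check of Eqs. (40) and (41), and of the interpretation of $r_c$ given above, the following sketch evaluates them for an electron. The CODATA 2018 constants are typed in by hand, so no external package is assumed.

```python
import math

# CODATA 2018 values (SI); e and c are exact by definition
e = 1.602176634e-19        # elementary charge, C
m_e = 9.1093837015e-31     # electron mass, kg
c = 299_792_458.0          # speed of light, m/s
eps0 = 8.8541878128e-12    # electric constant, F/m
Z0 = 1 / (eps0 * c)        # free-space wave impedance, ohms

r_c = e**2 / (4 * math.pi * eps0 * m_e * c**2)                # Eq. (8.41)
sigma = 8 * math.pi / 3 * r_c**2                              # Eq. (8.41)
sigma_direct = Z0**2 * e**4 / (6 * math.pi * m_e**2 * c**2)   # Eq. (8.40)

print(f"r_c = {r_c:.4e} m, sigma = {sigma:.4e} m^2")
```

The two forms of σ agree to rounding accuracy, giving $r_c \approx 2.818\times10^{-15}$ m and $\sigma \approx 6.65\times10^{-29}$ m².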


Now we have to go back and establish the conditions at which the Born approximation, when the field of the scattered wave is negligible, is indeed valid for a point-object scattering. Since the scattered wave’s intensity described by Eq. (26) diverges at $r \to 0$ as $1/r^2$, according to the definition (39) of the cross-section, it may become comparable to $\overline S_{\text{incident}}$ at $r^2 \sim \sigma$. However, Eq. (38) itself is only valid if $r \gg \lambda$, so that the Born approximation does not lead to a contradiction only if

$$\sigma \ll \lambda^2. \tag{8.42}$$

For the Thomson scattering by an electron, this condition means $\lambda \gg r_c \sim 3\times10^{-15}$ m, and is fulfilled for all frequencies up to very hard γ-rays with photon energies ~100 MeV.


Possibly the most notable feature of the result (40) is its independence of the wave frequency. As follows from its derivation, particularly from Eq. (37), this independence is intimately related to the unbound character of charge motion. For bound charges, say for electrons in gas molecules, this result is only valid if the wave frequency ω is much higher than the frequencies $\omega_0$ of the most important quantum transitions. In the opposite limit, $\omega \ll \omega_0$, the result is dramatically different. Indeed, in this limit we may approximate the molecule’s dipole moment by its static value (3.48):

$$\mathbf p = \alpha\mathbf E. \tag{8.43}$$


In the Born approximation, and in the absence of the molecular field effects mentioned in Sec. 3.3, E in 
this expression is just the incident wave’s field, and we can use Eq. (28) to calculate the power of the 
wave scattered by a single molecule: 


18 Named after Sir Joseph John (‘JJ’) Thomson, the discoverer of the electron — and isotopes as well! He should 
not be confused with his son, G. P. Thomson, who discovered (simultaneously with C. Davisson and L. Germer) 
quantum-mechanical wave properties of the same electron. 

19 In the Gaussian units, this formula looks like $r_c = q^2/mc^2$ (giving, of course, the same numerical values: for the electron, $r_c \approx 2.82\times10^{-13}$ cm). This classical quantity should not be confused with the particle’s Compton wavelength $\lambda_C = 2\pi\hbar/mc$ (for the electron, close to $2.43\times10^{-12}$ m), which naturally arises in quantum electrodynamics; see a brief discussion in the next chapter, and also QM Sec. 1.1.

20 It is fascinating how smartly the relativistic expression $mc^2$ sneaked into the result (40)-(41), which was obtained using the non-relativistic equation (36) of the particle motion. This was possible because the calculation engaged electromagnetic waves, which propagate with the speed of light, and whose quanta (photons), as a result, may be frequently treated as relativistic (moreover, ultra-relativistic) particles; see the next chapter.
$$\overline{\mathcal P} = \frac{Z_0\omega^4}{12\pi c^2}\alpha^2|E_\omega|^2. \tag{8.44}$$

Now, using the last form of the definition (39) of the cross-section, we get a very simple result,

$$\sigma = \frac{Z_0^2\alpha^2}{6\pi c^2}\omega^4, \tag{8.45}$$

showing that, in contrast to Eq. (40), at low frequencies σ changes as fast as $\omega^4$.


Now let us explore the effect of such Rayleigh scattering on wave propagation in a gas with a relatively low volumic density n. We may expect (and will prove in the next section) that, due to the randomness of molecule positions, the waves scattered by individual molecules may be treated as incoherent ones, so that the total scattering power may be calculated just as the sum of those scattered by each molecule. We can use this fact to write the balance of the incident wave’s power in a small volume dV of length dz (along the incident wave’s direction) and area A across it. Since such a segment includes $n\,dV = nA\,dz$ molecules, and according to Eq. (39), each of them scatters power $\sigma\overline S = \sigma\mathcal P/A$, the total scattered power is $n\sigma\mathcal P\,dz$; hence the incident power’s change is

$$d\mathcal P = -n\sigma\mathcal P\,dz. \tag{8.46}$$


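Comparing this equation with the definition (7.213) of the wave attenuation constant, applied to the scattering,²¹

$$d\mathcal P = -\alpha_{\text{scat}}\mathcal P\,dz, \tag{8.47}$$

we see that this effect gives the following contribution to attenuation: $\alpha_{\text{scat}} = n\sigma$. From here, using Eq. (3.50) to write $\alpha = \varepsilon_0(\kappa - 1)/n$, where κ is the dielectric constant, and Eq. (45) for σ, we get

$$\alpha_{\text{scat}} = \frac{(\kappa-1)^2}{6\pi n}\left(\frac{\omega}{c}\right)^4. \tag{8.48}$$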


This is the famous Rayleigh scattering formula, which in particular explains the colors of the blue sky and red sunsets. Indeed, through the visible light spectrum, ω changes almost two-fold; as a result, the scattering of the blue components of sunlight is an order of magnitude stronger than that of its red components. For the air near the Earth’s surface, $\kappa - 1 \approx 6\times10^{-4}$, and $n \approx 2.5\times10^{25}$ m⁻³ (see Sec. 3.3). Plugging these numbers into Eq. (48), we see that the effective length $l_{\text{scat}} \equiv 1/\alpha_{\text{scat}}$ of scattering is ~30 km for the blue light and ~200 km for the red light.²² The effective thickness h of the Earth’s atmosphere is ~10 km, so that the Sun looks just a bit yellowish during most of the day. However, elementary geometry shows that at sunset, the light has to pass the length $l \sim (R_Eh)^{1/2} \sim 300$ km (where $R_E$ is the Earth’s radius) to reach an Earth-surface observer; as a result, the blue components of the Sun’s light spectrum are almost completely scattered out, and even the red components are weakened very substantially.
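Eq. (48) and the estimates above can be reproduced with a few lines of Python. The sketch takes the values of κ − 1 and n quoted in the text. It assumes 450 nm and 700 nm as representative blue and red wavelengths, and round values $R_E$ = 6,400 km and h = 10 km. It also integrates Eq. (47) for a uniform medium, $\mathcal P(z) = \mathcal P(0)\exp\{-\alpha_{\text{scat}}z\}$, to estimate the fraction of light surviving the sunset path. Since the real atmosphere is not uniform (see footnote 22), the numbers are order-of-magnitude estimates only.

```python
import math

KAPPA_MINUS_1 = 6e-4   # (kappa - 1) of air near the surface, value quoted in the text
N_DENSITY = 2.5e25     # molecule density n, m^-3, value quoted in the text


def alpha_scat(wavelength):
    """Eq. (8.48) with omega/c = k = 2*pi/lambda: alpha_scat = (kappa - 1)^2 k^4 / (6 pi n)."""
    k = 2 * math.pi / wavelength
    return KAPPA_MINUS_1**2 * k**4 / (6 * math.pi * N_DENSITY)


# Assumed representative wavelengths for "blue" and "red" sunlight.
l_blue = 1 / alpha_scat(450e-9)
l_red = 1 / alpha_scat(700e-9)

# Sunset path length l ~ (R_E h)^{1/2}, with assumed round values R_E = 6400 km, h = 10 km.
path = math.sqrt(6.4e6 * 1e4)

for name, l_scat in (("blue", l_blue), ("red", l_red)):
    # Uniform-medium solution of Eq. (8.47): transmitted fraction exp(-path / l_scat).
    print(f"{name}: l_scat = {l_scat / 1e3:.0f} km, "
          f"transmitted fraction at sunset ~ {math.exp(-path / l_scat):.3f}")
```

With these assumptions, $l_{\text{scat}}$ comes out near 34 km (blue) and 200 km (red). The surviving fractions at sunset are of order 10⁻³ and 0.3, respectively.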


21 I am sorry for using the same letter (α) for both the molecular polarizability and the wave attenuation, but both notations are traditional. Hopefully, the subscript “scat”, marking α in the latter meaning, minimizes the possibility of confusion.

22 These values are approximate because both n and (κ − 1) vary through the atmosphere’s thickness.
8.4. Interference and diffraction


Now let us discuss scattering by objects with a size of the order of, or even larger than λ. For such extended objects, the phase difference factors (neglected above) step in, leading in particular to the important effects of interference and diffraction. These effects show up not so much in the total power of the scattered radiation as in its angular distribution. It is common to characterize this distribution by the differential cross-section defined as

$$\frac{d\sigma}{d\Omega} \equiv \frac{r^2\,\overline S_r}{\overline S_{\text{incident}}}, \tag{8.49}$$

where r is the distance from the scatterer, at which the scattered wave is observed.²³ Both the definition and the notation may become more clear if we notice that, according to Eq. (26), at large distances ($r \gg a$), the numerator of the right-hand side of Eq. (49), and hence the differential cross-section as a whole, do not depend on r, and that its integral over the total solid angle $\Omega = 4\pi$ coincides with the total cross-section defined by Eq. (39):

$$\oint_{4\pi}\frac{d\sigma}{d\Omega}\,d\Omega = \frac{1}{\overline S_{\text{incident}}}\oint_{4\pi}\overline S_r\,r^2\,d\Omega = \frac{1}{\overline S_{\text{incident}}}\oint_{r=\text{const}}\overline S_r\,d^2r = \frac{\overline{\mathcal P}}{\overline S_{\text{incident}}} = \sigma. \tag{8.50}$$


For example, according to Eq. (26), the angular distribution of the radiation scattered by a single dipole is rather broad; in particular, in the quasistatic case (43), within the Born approximation,

$$\frac{d\sigma}{d\Omega} = \left(\frac{k^2\alpha}{4\pi\varepsilon_0}\right)^2\sin^2\Theta. \tag{8.51}$$

If the wave is scattered by a small dielectric body, with a characteristic size $a \ll \lambda$ (i.e., $ka \ll 1$), then all its parts re-radiate the incident wave coherently. Hence, we can calculate the scattering similarly, just replacing the molecular dipole moment (43) with the total dipole moment of the object (see Eq. (3.45)):

$$\mathbf p = \mathbf PV = (\kappa-1)\varepsilon_0\mathbf EV, \tag{8.52}$$

where $V \sim a^3$ is the body’s volume. As a result, the differential cross-section may be obtained from Eq. (51) with the replacement $\alpha \to (\kappa-1)\varepsilon_0V$:

$$\frac{d\sigma}{d\Omega} = \left(\frac{k^2V}{4\pi}\right)^2(\kappa-1)^2\sin^2\Theta, \tag{8.53}$$

i.e. follows the same $\sin^2\Theta$ law.
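The relation (50) between the differential and total cross-sections can be verified for the quasistatic dipole. Integrating Eq. (51) over the full solid angle, with $\oint\sin^2\Theta\,d\Omega = 8\pi/3$, should reproduce Eq. (45). The sketch below does this integral numerically by the midpoint rule. The polarizability and wavelength are hypothetical values used only to get concrete numbers.

```python
import math

EPS0 = 8.8541878128e-12   # electric constant, F/m
C = 299_792_458.0         # speed of light, m/s
Z0 = 1 / (EPS0 * C)       # free-space wave impedance, ohms

# Hypothetical molecule: polarizability alpha = 4*pi*eps0 * (1e-30 m^3), light with lambda = 500 nm.
alpha = 4 * math.pi * EPS0 * 1e-30
k = 2 * math.pi / 500e-9
omega = C * k


def dsigma_domega(theta):
    """Eq. (8.51): differential cross-section of a small polarizable scatterer."""
    return (k**2 * alpha / (4 * math.pi * EPS0)) ** 2 * math.sin(theta) ** 2


def integrate_over_sphere(f, n=4000):
    """Midpoint rule for the integral of f(theta) over the full solid angle (f independent of phi)."""
    h = math.pi / n
    return 2 * math.pi * h * sum(f((j + 0.5) * h) * math.sin((j + 0.5) * h) for j in range(n))


sigma_from_51 = integrate_over_sphere(dsigma_domega)                  # left-hand side of Eq. (8.50)
sigma_from_45 = Z0**2 * alpha**2 * omega**4 / (6 * math.pi * C**2)    # Eq. (8.45)
print(sigma_from_51, sigma_from_45)
```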


The situation for extended objects, with at least one dimension of the order of (or larger than) the wavelength, is different: here we have to take into account the phase shifts between the wave’s re-radiation by various parts of the body. Let us analyze this issue first for an arbitrary collection of similar point scatterers located at points $\mathbf r_j$. If the wave vector of the incident plane wave is $\mathbf k_0$, the wave’s field has the phase factor $\exp\{i\mathbf k_0\cdot\mathbf r\}$ (see Eq. (7.79)). At the location $\mathbf r_j$ of the jth scattering center, this factor equals $\exp\{i\mathbf k_0\cdot\mathbf r_j\}$, defining the time dependence of the dipole vector $\mathbf p_j$, and hence of the scattered wave.


23 Just as in the case of the total cross-section, this definition is also similar to that accepted at the particle 
scattering — see, e.g., CM Sec. 3.5 and QM Sec. 3.3. 
According to Eq. (17), the scattered wave with a wave vector k (with $k = k_0$) acquires, on its way from the source point $\mathbf r_j$ to the observation point r, an additional phase factor $\exp\{i\mathbf k\cdot(\mathbf r - \mathbf r_j)\}$, so that the scattered wave field is proportional to

$$\exp\{i\mathbf k_0\cdot\mathbf r_j + i\mathbf k\cdot(\mathbf r - \mathbf r_j)\} \equiv e^{i\mathbf k\cdot\mathbf r}\exp\{-i(\mathbf k - \mathbf k_0)\cdot\mathbf r_j\}. \tag{8.54}$$

Since the first factor in the last expression does not depend on $\mathbf r_j$, to calculate the total scattered wave, it is sufficient to sum up the last phase factors, $\exp\{-i\mathbf q\cdot\mathbf r_j\}$, where the vector

$$\mathbf q \equiv \mathbf k - \mathbf k_0 \tag{8.55}$$


has the physical sense of the wave vector change at scattering.²⁴ It may look like the phase factor depends on our choice of the reference frame. However, according to Eq. (7.42), the average intensity of the scattered wave is proportional to $E_\omega E_\omega^*$, i.e. to the following real scalar function of the vector q:

$$F(\mathbf q) \equiv \left|\sum_j\exp\{-i\mathbf q\cdot\mathbf r_j\}\right|^2 \equiv \sum_j\exp\{-i\mathbf q\cdot\mathbf r_j\}\sum_{j'}\exp\{i\mathbf q\cdot\mathbf r_{j'}\} = \sum_{j,j'}\exp\{i\mathbf q\cdot(\mathbf r_{j'} - \mathbf r_j)\} \equiv |I(\mathbf q)|^2, \tag{8.56}$$

where the complex function

$$I(\mathbf q) \equiv \sum_j\exp\{-i\mathbf q\cdot\mathbf r_j\} \tag{8.57}$$

is called the phase sum, and may be calculated in any reference frame without affecting the final result given by Eq. (56).


So, besides the $\sin^2\Theta$ factor, the differential cross-section (49) of scattering by an extended object is also proportional to the scattering function (56). Its double-sum form makes it easy to notice that for a system of many ($N \gg 1$) similar but randomly located scatterers, only the terms with $j = j'$ accumulate at summation, so that F(q), and hence $d\sigma/d\Omega$, scale as N rather than $N^2$, thus justifying again our treatment of the Rayleigh scattering problem in the previous section.
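The scaling of F(q) with N rather than N² can be illustrated numerically. The sketch below is an illustration, not part of the original derivation. It places N point scatterers uniformly at random in a cube of side L and checks that the single-sum and double-sum forms of Eq. (56) agree. It then averages F(q) over many random configurations. The scattering vector is a hypothetical choice with $|\mathbf q|L \gg 1$, so that the phases $\mathbf q\cdot\mathbf r_j$ are effectively random.

```python
import numpy as np


def phase_sum(q, r):
    """Eq. (8.57): I(q) = sum_j exp(-i q . r_j); r is an (N, 3) array of positions."""
    return np.exp(-1j * (r @ q)).sum()


def scattering_function_double_sum(q, r):
    """Double-sum form of Eq. (8.56): F(q) = sum_{j, j'} exp(i q . (r_j' - r_j))."""
    phases = r @ q
    return np.exp(1j * (phases[None, :] - phases[:, None])).sum().real


rng = np.random.default_rng(seed=1)
N, L = 100, 1.0
q = np.array([200.0, 0.0, 0.0])   # hypothetical scattering vector with |q| L = 200 >> 1

r = rng.uniform(0.0, L, size=(N, 3))
F_single = abs(phase_sum(q, r)) ** 2
F_double = scattering_function_double_sum(q, r)

# Average F over many independent random configurations of the N scatterers.
F_avg = np.mean([abs(phase_sum(q, rng.uniform(0.0, L, size=(N, 3)))) ** 2 for _ in range(2000)])
print(f"configuration-averaged F/N = {F_avg / N:.3f}")
```

The configuration-averaged F/N comes out close to 1. In contrast, in-phase addition of all N waves would give F/N = N.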


Now let us apply Eq. (56) to a simple problem of just two similar small scatterers, separated by a fixed distance a:

$$F(\mathbf q) = \sum_{j,j'=1}^{2}\exp\{i\mathbf q\cdot(\mathbf r_{j'} - \mathbf r_j)\} = 2 + \exp\{-iq_aa\} + \exp\{iq_aa\} = 2(1 + \cos q_aa) = 4\cos^2\frac{q_aa}{2}, \tag{8.58}$$

where $q_a \equiv \mathbf q\cdot\mathbf a/a$ is the component of the vector q along the vector a connecting the scatterers. The apparent simplicity of this result may be a bit misleading because the mutual plane of the vectors k and $\mathbf k_0$ (and hence of the vector q) does not necessarily coincide with the mutual plane of the vectors $\mathbf k_0$ and $\mathbf E_\omega$, so that the scattering angle θ between k and $\mathbf k_0$ is generally different from $(\pi/2 - \Theta)$ (see Fig. 5). Moreover, the angle between the vectors q and a (within their common plane) is one more parameter, independent of both θ and Θ. As a result, the angular dependence of the scattered wave’s intensity (and hence of $d\sigma/d\Omega$), which depends on all three angles, may be rather involved, but some of its details are irrelevant for the basic physics of interference and diffraction.


24 In quantum mechanics, $\hbar\mathbf q$ has a very clear sense of the momentum transferred from the scattering object to the
scattered particle (for example, a photon), and this terminology is sometimes smuggled even into classical 
electrodynamics texts. 
Fig. 8.5. The angles important for the
general scattering problem. 


This is why let me consider in detail only the simplest case, when the vectors k, $\mathbf k_0$, and a all reside in the same plane, with $\mathbf k_0$ normal to a (see Fig. 6a).


Fig. 8.6. The simplest cases of (a) interference and (b) diffraction.


In this case, $q_a = k\sin\theta$, and Eq. (58) is reduced to

$$F(\mathbf q) = 4\cos^2\frac{ka\sin\theta}{2}. \tag{8.59}$$

This function always has two maxima, at θ = 0 and θ = π, and, if the product ka is large enough, other maxima²⁵ at the special angles $\theta_n$ that satisfy the simple Bragg condition

$$ka\sin\theta_n = 2\pi n, \quad\text{i.e. } a\sin\theta_n = n\lambda. \tag{8.60}$$


As Fig. 6a shows, this condition may be readily understood as that of the in-phase addition (the constructive interference) of two coherent waves scattered from the two points, when the difference between their paths toward the observer, $a\sin\theta$, is equal to an integer number of wavelengths. At each such maximum, F = 4, due to the doubling of the wave amplitude and hence quadrupling of its power.
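Eqs. (59) and (60) are easy to tabulate. In the sketch below, the spacing a = 5λ is a hypothetical choice. The script computes the Bragg angles $\theta_n$ from Eq. (60), confirms that F = 4 at each of them, and confirms that F vanishes halfway between, where the path difference $a\sin\theta$ equals a half-integer number of wavelengths.

```python
import numpy as np


def F_two_points(theta, ka):
    """Eq. (8.59): scattering function of two point scatterers, 4 cos^2(ka sin(theta) / 2)."""
    return 4 * np.cos(ka * np.sin(theta) / 2) ** 2


wavelength, a = 1.0, 5.0                 # hypothetical: scatterers 5 wavelengths apart
ka = 2 * np.pi * a / wavelength

n = np.arange(0, int(a / wavelength) + 1)                  # orders with n * lambda / a <= 1
theta_max = np.arcsin(n * wavelength / a)                  # Bragg condition (8.60)
theta_min = np.arcsin((n[:-1] + 0.5) * wavelength / a)     # half-integer path differences

print(np.degrees(theta_max).round(2))
print(F_two_points(theta_max, ka).round(12), F_two_points(theta_min, ka).round(12))
```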


If the distance between the point scatterers is large ($ka \gg 1$), the first maxima (60) correspond to small scattering angles, $\theta \ll 1$. For this region, Eq. (59) is reduced to a simple periodic dependence of the function F on the angle θ. Moreover, within the range of small θ, the wave polarization factor $\sin^2\Theta$ is virtually constant, so that the angular dependence of the scattered wave’s intensity, and hence of the differential cross-section, is also very simple:

$$\frac{d\sigma}{d\Omega} \propto F(\mathbf q) \approx 4\cos^2\frac{ka\theta}{2}. \tag{8.61}$$


25 In optics especially, such intensity maxima/minima patterns are called interference fringes. 
This simple interference pattern is well known from Young’s two-slit experiment.²⁶ (As will be discussed in the next section, the theoretical description of the two-slit experiment is more complex than that of the Born scattering, but the experiment itself is preferable, because at the Born scattering, the wave of intensity (61) would have to be observed against the backdrop of a much stronger incident wave that propagates in almost the same direction, θ = 0.)


A very similar analysis of scattering from $N > 2$ similar, equidistant scatterers located along the same straight line shows that the positions (60) of the constructive interference maxima do not change (because the derivation of this condition is still applicable to each pair of adjacent scatterers), but the increase of N makes these peaks sharper and sharper. Leaving the quantitative analysis of this system for the reader’s exercise, let me jump immediately to the limit $N \to \infty$, in which we may ignore the scatterers’ discreteness. The resulting pattern is similar to that of scattering by a continuous thin rod (see Fig. 6b), so let us first spell out the Born scattering formula for an arbitrary extended, continuous, uniform dielectric body. Transferring Eq. (56) from the sum to an integral, for the differential cross-section we get

$$\frac{d\sigma}{d\Omega} = \left(\frac{k^2}{4\pi}\right)^2(\kappa-1)^2|I(\mathbf q)|^2\sin^2\Theta, \tag{8.62}$$

where I(q) now becomes the phase integral,²⁷

$$I(\mathbf q) = \int_V\exp\{-i\mathbf q\cdot\mathbf r'\}\,d^3r', \tag{8.63}$$

with the dimensionality of volume.


Now we may return to the particular case of a thin rod (with both dimensions of the cross-section’s area A much smaller than λ, but an arbitrary length a), otherwise keeping the same simple geometry as for two point scatterers (see Fig. 6b). In this case, the phase integral is just

$$I(\mathbf q) = A\int_{-a/2}^{+a/2}\exp\{-iq_ax'\}\,dx' = A\,\frac{\exp\{-iq_aa/2\} - \exp\{iq_aa/2\}}{-iq_a} = Aa\,\frac{\sin\xi}{\xi} \equiv V\,\frac{\sin\xi}{\xi}, \tag{8.64}$$

where $V = Aa$ is the volume of the rod, and ξ is the dimensionless argument defined as

$$\xi \equiv \frac{q_aa}{2} = \frac{ka\sin\theta}{2}. \tag{8.65}$$

The fraction participating in the last form of Eq. (64) is met in physics so frequently that it has deserved the special name of the sinc (not “sync”, please!) function (see Fig. 7):


26 This experiment was described in 1803 by Thomas Young — one more universal genius of science, who also 
introduced the Young modulus in the elasticity theory (see, e.g., CM Chapter 7), besides numerous other 
achievements — including deciphering Egyptian hieroglyphs! It is fascinating that the first clear observation of 
wave interference was made as early as 1666 by another genius, Sir Isaac Newton, in the form of so-called 
Newton’s rings. Unbelievably, Newton failed to give the most natural explanation of his observations — perhaps 
because he was vehemently opposed to the very idea of light as a wave, which was promoted in his times by 
others, notably by Christian Huygens. Due to Newton’s enormous authority, only Young’s two-slit experiments 
more than a century later have firmly established the wave picture of light — to be replaced by the dualistic 
wave/photon picture formalized by quantum electrodynamics (see, e.g., QM Ch. 9), in one more century. 

27 Since the observation point’s position r does not participate in this formula explicitly, the prime sign in r’ could 
be dropped, but I keep it as a reminder that the integral is taken over points r’ of the scattering object. 
$$\operatorname{sinc}\xi \equiv \frac{\sin\xi}{\xi}. \tag{8.66}$$

It vanishes at all points $\xi_n = \pi n$ with integer n, except the point with n = 0: $\operatorname{sinc}\xi_0 = \operatorname{sinc}0 = 1$.
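The last form of Eq. (64) can be checked against a direct numerical evaluation of the phase integral. One practical caveat for the sketch below: NumPy’s np.sinc is the normalized function sin(πx)/(πx), so Eq. (66) corresponds to np.sinc(ξ/π). The rod dimensions are hypothetical.

```python
import numpy as np


def sinc(xi):
    """Eq. (8.66): sin(xi)/xi, with sinc(0) = 1.
    NumPy's np.sinc(x) is the *normalized* sin(pi x)/(pi x), hence the rescaling of the argument."""
    return np.sinc(np.asarray(xi) / np.pi)


def phase_integral_rod(q_a, a, A, n=20_000):
    """Midpoint-rule evaluation of I(q) = A * int_{-a/2}^{+a/2} exp(-i q_a x') dx', first form of Eq. (8.64)."""
    h = a / n
    x = -a / 2 + (np.arange(n) + 0.5) * h
    return A * np.exp(-1j * q_a * x).sum() * h


a, A = 1.0, 1e-4                          # hypothetical rod length and cross-section area
V = A * a
q_values = np.linspace(-40.0, 40.0, 81)
numeric = np.array([phase_integral_rod(q, a, A) for q in q_values])
analytic = V * sinc(q_values * a / 2)     # last form of Eq. (8.64), with xi = q_a a / 2 from Eq. (8.65)
print(np.max(np.abs(numeric - analytic)) / V)
```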


Fig. 8.7. The sinc function.


The function $F(\mathbf q) = V^2\operatorname{sinc}^2\xi$, given by Eq. (64) and plotted with the red line in Fig. 8, is called the Fraunhofer diffraction pattern.²⁸

Fig. 8.8. The Fraunhofer diffraction pattern (solid red line) and its envelope $1/\xi^2$ (dashed red line). For comparison, the blue line shows the basic interference pattern $\cos^2\xi$ (cf. Eq. (61)). Horizontal axis: $\xi/\pi = ka\sin\theta/2\pi$.


Note that it oscillates with the same period, $\Delta(\sin\theta) = 2\pi/ka$, as the interference pattern (61) from two point scatterers (shown with the blue line in Fig. 8). However, at the interference, the scattered wave intensity vanishes at angles $\theta_n'$ that satisfy the condition

$$\frac{ka\sin\theta_n'}{2\pi} = n + \frac{1}{2}, \tag{8.67}$$

i.e. when the optical path difference $a\sin\theta_n'$ is equal to a half-integer number of wavelengths $\lambda = 2\pi/k$, and hence the two waves from the scatterers reach the observer in anti-phase: the so-called destructive interference. On the other hand, for the diffraction on a continuous rod, the minima occur at a different set of scattering angles:

$$\frac{ka\sin\theta_n}{2\pi} = n \neq 0, \tag{8.68}$$


28 It is named after Joseph von Fraunhofer (1787-1826), who invented the spectroscope, developed the diffraction grating (see below), and also discovered the dark Fraunhofer lines in the Sun’s spectrum.
i.e. exactly where the two-point interference pattern has its maxima; please have a look at Fig. 8 again. The reason for this relation is that the wave diffraction on the rod may be considered as a simultaneous interference of waves from all its elementary fragments, and exactly at the observation angles when the rod edges give waves with phases shifted by 2πn, the interior points of the rod give waves with all phases within this difference, with their algebraic sum equal to zero. As is even more visible in Fig. 8, at the diffraction, the intensity oscillations are limited by a rapidly decreasing envelope function $1/\xi^2$, while at the two-point interference, the oscillations retain the same amplitude. The reason for this fast decrease is that with each Fraunhofer diffraction period, a smaller and smaller fraction of the rod gives an unbalanced contribution to the scattered wave.
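The different positions of the minima in Eqs. (67) and (68), and the $1/\xi^2$ envelope, can be seen in a few numbers. The sketch below evaluates both patterns at the integer and half-integer multiples of π. It normalizes them as $F/V^2 = \operatorname{sinc}^2\xi$ for the rod and $F/4 = \cos^2\xi$ for two points.

```python
import numpy as np


def rod_pattern(xi):
    """F / V^2 = sinc^2(xi) for the rod, from Eq. (8.64); np.sinc is sin(pi x)/(pi x)."""
    return np.sinc(xi / np.pi) ** 2


def two_point_pattern(xi):
    """F / 4 = cos^2(xi) for two point scatterers, Eq. (8.59) with xi = ka sin(theta) / 2."""
    return np.cos(xi) ** 2


xi_int = np.pi * np.arange(1, 6)             # Eq. (8.68): xi = pi n, n != 0
xi_half = np.pi * (np.arange(0, 5) + 0.5)    # Eq. (8.67): xi = pi (n + 1/2)

print("at xi = pi n:      rod", rod_pattern(xi_int).round(12), " two-point", two_point_pattern(xi_int).round(12))
print("at xi = pi(n+1/2): rod", rod_pattern(xi_half).round(4), " envelope 1/xi^2", (1 / xi_half**2).round(4))
```

At the half-integer points, $\sin^2\xi = 1$, so the rod pattern touches its envelope $1/\xi^2$ exactly there, while the two-point pattern has its zeros.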


If the rod’s length is small ($ka \ll 1$, i.e. $a \ll \lambda$), then the sinc function’s argument ξ is small at all scattering angles θ, so $I(\mathbf q) \approx V$, and Eq. (62) is reduced to Eq. (53). In the opposite limit, $a \gg \lambda$, the first zeros of the function I(q) correspond to very small angles θ, for which $\sin\Theta \approx 1$, so that the differential cross-section is just

$$\frac{d\sigma}{d\Omega} \approx \left(\frac{k^2V}{4\pi}\right)^2(\kappa-1)^2\operatorname{sinc}^2\frac{ka\theta}{2}, \tag{8.69}$$

i.e. Fig. 8 shows the scattering intensity as a function of the direction toward the observation point, if this point is within the plane containing the rod.


Finally, let us discuss a problem of large importance for applications: calculate the positions of the maxima of the interference pattern arising at the incidence of a plane wave on a very large 3D periodic system of point scatterers. For that, first of all, let us quantify the notion of 3D periodicity. The periodicity in one dimension is simple: the system we are considering (say, the positions of point scatterers) should be invariant with respect to the linear translation by some period a, and hence by any multiple $sa$ of this period, where s is any integer. Anticipating the 3D generalization, we may require any of the possible translation vectors R, with respect to which the system is invariant, to be equal to $s\mathbf a$, where the primitive vector a is directed along the (so far, the only) axis of the 1D system.


Now we are ready for the common definition of the 3D periodicity, as the invariance of the system with respect to the translation by any vector of the following set:

$$\mathbf R = \sum_{l=1}^{3}s_l\mathbf a_l, \tag{8.70}$$

where $s_l$ are three independent integers, and $\{\mathbf a_l\}$ is a set of three linearly-independent primitive vectors. The set of geometric points described by Eq. (70) is called the Bravais lattice (first analyzed in detail, circa 1850, by Auguste Bravais). Perhaps the most nontrivial feature of this relation is that the vectors $\mathbf a_l$ need not be orthogonal to each other. (That requirement would severely restrict the set of possible lattices and make it unsuitable for the description, for example, of many solid-state crystals.) For the scattering problem we are considering, we will assume that the position $\mathbf r_j$ of each point scatterer coincides with one of the points R of some Bravais lattice, with a given set of primitive vectors $\mathbf a_l$, so that in the basic Eq. (57), the index j is coding the set of three integers $\{s_1, s_2, s_3\}$.


Now let us consider a similarly defined Bravais lattice, but in the reciprocal (wave-number) space, numbered by three independent integers $\{t_1, t_2, t_3\}$:

$$\mathbf Q = \sum_{m=1}^{3}t_m\mathbf b_m, \quad\text{with } \mathbf b_m \equiv 2\pi\,\frac{\mathbf a_{m'}\times\mathbf a_{m''}}{\mathbf a_m\cdot(\mathbf a_{m'}\times\mathbf a_{m''})}, \tag{8.71}$$

where in the last expression, the indices m, m′, and m″ are all different. This is the so-called reciprocal lattice, which plays an important role in all physics of periodic structures, in particular in the quantum energy-band theory.²⁹ To reveal its most important property, and thus justify the above introduction of the primitive vectors $\mathbf b_m$, let us calculate the following scalar product:

$$\mathbf R\cdot\mathbf Q = \sum_{l,m=1}^{3}s_lt_m\,\mathbf a_l\cdot\mathbf b_m = 2\pi\sum_{l,m=1}^{3}s_lt_m\,\mathbf a_l\cdot\frac{\mathbf a_{m'}\times\mathbf a_{m''}}{\mathbf a_m\cdot(\mathbf a_{m'}\times\mathbf a_{m''})} = 2\pi\sum_{l,m=1}^{3}s_lt_m\,\frac{\mathbf a_l\cdot(\mathbf a_{m'}\times\mathbf a_{m''})}{\mathbf a_m\cdot(\mathbf a_{m'}\times\mathbf a_{m''})}. \tag{8.72}$$

Applying to the numerator of the last fraction the operand rotation rule of vector algebra,³⁰ we see that it is equal to zero if $l \neq m$, while for $l = m$ the whole fraction is evidently equal to 1. Thus the double sum (72) is reduced to a single sum:

$$\mathbf R\cdot\mathbf Q = 2\pi\sum_{l=1}^{3}s_lt_l \equiv 2\pi\sum_{l=1}^{3}n_l, \tag{8.73}$$

where each of the products $n_l \equiv s_lt_l$ is an integer, and hence their sum,

$$n \equiv \sum_{l=1}^{3}n_l \equiv s_1t_1 + s_2t_2 + s_3t_3, \tag{8.74}$$

is an integer as well, so that the main property of the direct/reciprocal lattice couple is very simple:

$$\mathbf R\cdot\mathbf Q = 2\pi n, \quad\text{and hence}\quad \exp\{-i\mathbf R\cdot\mathbf Q\} = 1. \tag{8.75}$$
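Eq. (71) and the property (75) can be checked for a non-orthogonal lattice. The sketch below builds $\mathbf b_m$ from Eq. (71), verifies that $\mathbf a_l\cdot\mathbf b_m = 2\pi\delta_{lm}$, and tests Eq. (75) for one random pair of lattice vectors R and Q. The face-centered cubic primitive vectors are just a convenient hypothetical example of non-orthogonal $\mathbf a_l$.

```python
import numpy as np


def reciprocal_vectors(a1, a2, a3):
    """Eq. (8.71): b_m = 2 pi (a_m' x a_m'') / (a_m . (a_m' x a_m'')), with m, m', m'' all different."""
    a = [np.asarray(v, dtype=float) for v in (a1, a2, a3)]
    b = []
    for m in range(3):
        m1, m2 = (m + 1) % 3, (m + 2) % 3      # cyclic choice of (m', m'')
        cross = np.cross(a[m1], a[m2])
        b.append(2 * np.pi * cross / np.dot(a[m], cross))
    return np.array(b)


# Hypothetical non-orthogonal example: face-centered cubic primitive vectors (units of the cube edge).
a_vecs = 0.5 * np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
b_vecs = reciprocal_vectors(*a_vecs)

print(np.round(a_vecs @ b_vecs.T / (2 * np.pi), 12))   # a_l . b_m / 2 pi: the identity matrix

rng = np.random.default_rng(0)
s = rng.integers(-5, 6, size=3)      # direct-lattice integers {s_1, s_2, s_3}
t = rng.integers(-5, 6, size=3)      # reciprocal-lattice integers {t_1, t_2, t_3}
R = s @ a_vecs                       # Eq. (8.70)
Q = t @ b_vecs                       # Eq. (8.71)
print(R @ Q / (2 * np.pi), int(s @ t))   # Eqs. (8.73)-(8.74): both equal n
```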


Now returning to the scattering function (56) for a Bravais lattice of point scatterers, we see that if the vector $\mathbf q = \mathbf k - \mathbf k_0$ coincides with any vector Q of the reciprocal lattice, then all terms of the phase sum (57) are equal to 1, i.e. add up in phase, so that the sum as a whole has its largest possible magnitude, giving a constructive interference maximum. This equality, q = Q, where Q is given by Eq. (71), is called the von Laue condition (named after Max von Laue) of the constructive interference; it is, in particular, the basis of the whole field of the X-ray crystallography of solids and polymers, the main tool for revealing their atomic/molecular structure.³¹


In order to recast the von Laue condition in a more vivid geometric form, let us consider one of the vectors Q of the reciprocal lattice, corresponding to a certain integer n in Eq. (75), and notice that if that relation is satisfied for one point R of the direct Bravais lattice (70), i.e. for one set of the integers $\{s_1, s_2, s_3\}$, it is also satisfied for a 2D system of other integer sets, which may be parameterized, for example, by two integers $S_1$ and $S_2$:

$$s_1' = s_1 + S_1t_3, \quad s_2' = s_2 + S_2t_3, \quad s_3' = s_3 - S_1t_1 - S_2t_2. \tag{8.76}$$

Indeed, each of these sets has the same value of the integer n, defined by Eq. (74), as the original one:

$$n' = s_1't_1 + s_2't_2 + s_3't_3 = (s_1 + S_1t_3)t_1 + (s_2 + S_2t_3)t_2 + (s_3 - S_1t_1 - S_2t_2)t_3 = n. \tag{8.77}$$

Since according to Eq. (75), the vector of the distance between any pair of the corresponding points of the direct Bravais lattice (70),


29 See, e.g., QM Sec. 3.4, where several particular Bravais lattices R, and their reciprocals Q, are considered. 

30 See, e.g., MA Eq. (7.6). 

31 For more reading on this important topic, I can recommend, for example, the classical monograph by B. Cullity, Elements of X-Ray Diffraction, 2nd ed., Addison-Wesley, 1978. (Note that its title uses the alternative name of the field, once again illustrating how blurry the boundary between interference and diffraction is.)


Chapter 8 Page 18 of 38 


Essential Graduate Physics EM: Classical Electrodynamics 


$$\Delta\mathbf R = \Delta S_1t_3\mathbf a_1 + \Delta S_2t_3\mathbf a_2 - (\Delta S_1t_1 + \Delta S_2t_2)\mathbf a_3, \tag{8.78}$$

satisfies the condition $\Delta\mathbf R\cdot\mathbf Q = 2\pi\Delta n = 0$, this vector is normal to the (fixed) vector Q. Hence, all the points corresponding to the 2D set (76) with arbitrary integers $S_1$ and $S_2$ are located on one geometric plane, called the crystal (or “lattice”) plane. In a 3D system of $N \gg 1$ scatterers (such as $N \sim 10^{20}$ atoms in a ~1-mm³ solid crystal), with all linear dimensions comparable, such a plane contains $\sim N^{2/3} \gg 1$ points. As a result, the constructive interference peaks are very sharp.
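The construction of Eqs. (76) to (78) can be illustrated numerically. For a fixed Q, the lattice points generated by Eq. (76) should all give the same value $\mathbf R\cdot\mathbf Q = 2\pi n$ and span only a plane. The lattice, the integers $t_l$, and the starting point below are hypothetical. The reciprocal vectors are computed from the equivalent matrix relation $B = 2\pi(A^{-1})^{\mathrm T}$ instead of the cross products of Eq. (71).

```python
import numpy as np

# Hypothetical oblique Bravais lattice (rows are a_1, a_2, a_3, arbitrary units).
a_vecs = np.array([[1.0, 0.0, 0.0], [0.3, 1.1, 0.0], [0.2, 0.4, 0.9]])
# Reciprocal primitive vectors: the conditions a_l . b_m = 2 pi delta_lm (equivalent to Eq. (8.71))
# give B = 2 pi (A^{-1})^T for the matrices A, B whose rows are a_l and b_m.
b_vecs = 2 * np.pi * np.linalg.inv(a_vecs).T

t = np.array([1, 2, -1])          # hypothetical reciprocal-lattice integers (t_3 != 0)
s = np.array([2, -1, 3])          # hypothetical starting direct-lattice point
Q = t @ b_vecs
n = int(s @ t)                    # Eq. (8.74)

# Eq. (8.76): the 2D family of integer sets with the same n, parameterized by S_1 and S_2.
points = np.array([
    (s + np.array([S1 * t[2], S2 * t[2], -S1 * t[0] - S2 * t[1]])) @ a_vecs
    for S1 in range(-3, 4)
    for S2 in range(-3, 4)
])

print(np.allclose(points @ Q, 2 * np.pi * n))          # all points obey Eq. (8.75) with the same n
print(np.linalg.matrix_rank(points - points[0]))       # the displacements span only a plane
print(2 * np.pi / np.linalg.norm(Q))                   # Eq. (8.80): distance between adjacent planes
```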


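Now rewriting Eq. (75) as a relation for the vector R’s component along the vector Q,

$$R_Q = \frac{2\pi n}{Q}, \quad\text{where } R_Q \equiv \mathbf R\cdot\mathbf n_Q \equiv \frac{\mathbf R\cdot\mathbf Q}{Q}, \text{ and } Q \equiv |\mathbf Q|, \tag{8.79}$$

we see that the parallel crystal planes corresponding to different numbers n (but the same Q) are located in space periodically, with the smallest distance

$$d = \frac{2\pi}{Q}, \tag{8.80}$$

so that the von Laue condition q = Q may be rewritten as the following rule for the possible magnitudes of the scattering vector $\mathbf q = \mathbf k - \mathbf k_0$ (here d is the interplanar distance for the shortest vector Q in the given direction, and the integer n counts its multiples nQ):

$$q = \frac{2\pi n}{d}. \tag{8.81}$$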


Figure 9a shows the diagram of the three wave vectors k, $\mathbf k_0$, and q, taking into account the elastic scattering condition $|\mathbf k| = |\mathbf k_0| = k = 2\pi/\lambda$. From the diagram, we immediately get the famous Bragg rule³² for the (equal) angles $\alpha = \theta/2$ between the crystal plane and each of the vectors k and $\mathbf k_0$:

$$q = 2k\sin\alpha = \frac{2\pi n}{d}, \quad\text{i.e. } 2d\sin\alpha = n\lambda. \tag{8.82}$$


The physical sense of this relation is very simple (see Fig. 9b, drawn in the “direct” space of the radius-vectors r, rather than in the reciprocal space of the wave vectors, as Fig. 9a). It shows that if the Bragg condition (82) is satisfied, the total difference $2d\sin\alpha$ of the optical paths of two waves, partly reflected from the adjacent crystal planes, is equal to an integer number of wavelengths, so these waves interfere constructively.
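The Bragg angles implied by the von Laue condition can be computed directly. In the sketch below, the simple cubic lattice with a 4 Å period and the 1.5 Å X-ray wavelength are hypothetical assumptions. For each reciprocal-lattice vector Q, the elastic condition $q = 2k\sin\alpha = Q$ gives α, and $d = 2\pi/Q$ is computed from Eq. (80).

```python
import numpy as np

wavelength = 1.5e-10      # hypothetical X-ray wavelength, m
a_cubic = 4.0e-10         # hypothetical simple-cubic lattice period, m

a_vecs = a_cubic * np.eye(3)
b_vecs = 2 * np.pi * np.linalg.inv(a_vecs).T    # a_l . b_m = 2 pi delta_lm, as for Eq. (8.71)
k = 2 * np.pi / wavelength                      # |k| = |k_0| for elastic scattering

results = []
for t in ([1, 0, 0], [1, 1, 0], [1, 1, 1], [2, 0, 0]):
    Q = np.array(t) @ b_vecs
    Q_mag = np.linalg.norm(Q)
    if Q_mag > 2 * k:                   # q = 2k sin(alpha) can never exceed 2k
        continue
    alpha = np.arcsin(Q_mag / (2 * k))  # von Laue condition q = Q with q = 2k sin(alpha)
    d = 2 * np.pi / Q_mag               # Eq. (8.80)
    results.append((t, d, alpha))
    print(f"t = {t}: d = {d * 1e10:.3f} A, alpha = {np.degrees(alpha):.2f} deg, "
          f"scattering angle = {2 * np.degrees(alpha):.2f} deg")
```

With d defined as 2π/|Q|, every row corresponds to n = 1 in Eq. (82). For example, the row t = (2, 0, 0), with d = a/2, describes the same reflection as the second order (n = 2) of the (1, 0, 0) planes spaced by a.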




Fig. 8.9. Deriving the Bragg rule: (a) from the von Laue condition (in the reciprocal space), and
(b) from a direct-space diagram. Note that the scattering angle θ equals 2α.


32 Named after Sir William Bragg and his son, Sir William Lawrence Bragg, who were the first to demonstrate (in 
1912) the X-ray diffraction by atoms in crystals. The Braggs’ experiments have made the existence of atoms 
(before that, a somewhat hypothetical notion ignored by many physicists) indisputable. 
Finally, note that the von Laue and Bragg rules, as well as the similar condition (60) for the 1D
system of scatterers, are valid not only in the Born approximation but also follow from any adequate 
theory of scattering, because the phase sum (57) does not depend on the magnitude of the wave 
propagating from each elementary scatterer, provided that they are all equal. 


8.5. The Huygens principle 


As the reader could see, the Born approximation is very convenient for tracing the basic features 
of (and the difference between) the phenomena of interference and diffraction. Unfortunately, this 
approximation, based on the relative weakness of the scattered wave, cannot be used to describe more 
typical experimental implementations of these phenomena, for example, Young’s two-slit experiment, 
or diffraction on a single slit or orifice (see, e.g., Fig. 10). Indeed, in such experiments, the orifice size a
is typically much larger than the light’s wavelength λ, and as a result, no clear decomposition of the
fields into the “incident” and “scattered” waves is possible inside it.³³


Fig. 8.10. Deriving the Huygens principle.


However, another approximation, called the Huygens (or “Huygens-Fresnel”) principle,³⁴ is very
instrumental for the description of such situations. In this approach, the wave beyond the screen is 
represented as a linear superposition of spherical waves of the type (17), as if they were emitted by 
every point of the incident wave’s front that has arrived at the orifice. This approximation is valid if the 
following strong conditions are satisfied: 

$$ \lambda \ll a \ll r, \tag{8.83} $$


where r is the distance of the observation point from the orifice. In addition, as we have seen in the last
section, at small λ/a the diffraction phenomena are confined to angles θ ~ 1/ka ~ λ/a ≪ 1. For
observation at such small angles, the mathematical expression of the Huygens principle, for the complex
amplitude f_ω(r) of a monochromatic wave f(r, t) = Re[f_ω e^{−iωt}], is given by the following simple formula:

$$ f_\omega(\mathbf r) = C \int_{\text{orifice}} f_\omega(\mathbf r')\, \frac{e^{ikR}}{R}\, d^2r'. \tag{8.84} $$


33 Another complaint against the Born approximation is that it does not satisfy the so-called optical (or “forward
scattering”) theorem relating σ to scattering with k = k₀. This relation is especially important for the
quantum-mechanical description of particle scattering, and in this series, it will be discussed in its QM part (Sec. 3.3).

34 Named after Christian Huygens (1629-1695) who had conjectured the wave nature of light (which remained 
controversial for more than a century, until T. Young’s experiments), and Augustin-Jean Fresnel (1788-1827) 
who developed a quantitative theory of diffraction, and in particular gave a mathematical formulation of the 
Huygens principle. (Note that Eq. (91), sufficient for the purposes of this course, is not its most general form.) 
Here f is any transverse component of any of the wave’s fields (either E or H),³⁵ R is the distance
between point r' at the orifice and the observation point r (i.e. the magnitude of vector R = r − r'), and
C is a complex constant.


Before describing the proof of Eq. (84), let me carry out its sanity check, which will also give
us the constant C. Let us see what the Huygens principle gives for the case when the field under the
integral is a plane wave with the complex amplitude f_ω(z), propagating along axis z, with an unlimited
x-y front (i.e. when there is no opaque screen at all), so that in Eq. (84) we should take the whole [x, y]
plane, say with z' = 0, as the integration area; see Fig. 11.


Fig. 8.11. Applying the Huygens principle to a plane incident wave (“source” point r' in the plane z' = 0,
observation point r = {0, 0, z}).


Then, for the observation point with coordinates x = 0, y = 0, and z > 0, Eq. (84) yields 


$$ f_\omega(z) = C f_\omega(0) \int dx' \int dy'\, \frac{\exp\{ik(x'^2 + y'^2 + z^2)^{1/2}\}}{(x'^2 + y'^2 + z^2)^{1/2}}. \tag{8.85} $$


Before specifying the integration limits, let us consider the range |x'|, |y'| ≪ z. In this range, the
square root participating twice in Eq. (85) may be approximated as


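$$ (x'^2 + y'^2 + z^2)^{1/2} \equiv z\left(1 + \frac{x'^2 + y'^2}{z^2}\right)^{1/2} \approx z\left(1 + \frac{x'^2 + y'^2}{2z^2}\right) = z + \frac{x'^2 + y'^2}{2z}. \tag{8.86} $$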


At z ≫ λ, the denominator of Eq. (85) is a much slower function of x' and y' than the exponent in the
numerator, and in the former case, it is sufficient (as we will check a posteriori) to keep just the main,
first term of expansion (86). With that, Eq. (85) becomes

$$ f_\omega(z) = C f_\omega(0)\, \frac{e^{ikz}}{z} \int dx' \int dy' \exp\left\{\frac{ik(x'^2 + y'^2)}{2z}\right\} = C f_\omega(0)\, \frac{e^{ikz}}{z}\, I_x I_y, \tag{8.87} $$


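where I_x and I_y are two similar integrals; for example,

$$ I_x = \int \exp\left\{\frac{ikx'^2}{2z}\right\} dx' = \left(\frac{2z}{k}\right)^{1/2} \int \exp\{i\xi^2\}\, d\xi = \left(\frac{2z}{k}\right)^{1/2} \left[\int \cos(\xi^2)\, d\xi + i \int \sin(\xi^2)\, d\xi\right], \tag{8.88} $$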


where ξ ≡ (k/2z)^{1/2} x'. These are the so-called Fresnel integrals. I will discuss them in more detail in the
next section (see, in particular, Fig. 13), and for now, only one property³⁶ of these integrals is important


35 The fact that the Huygens principle is valid for any field component should not be too surprising. In the limit
a ≫ λ, the real boundary conditions at the orifice edges are not important; it is only important for the screen that
limits the orifice to be opaque. Because of this, the Huygens principle (84) is a part of the so-called scalar theory of
diffraction. (I will not have time to discuss the vector theory of these effects, which is more accurate at smaller a;
see, e.g., Chapter 11 of the monograph by M. Born and E. Wolf, cited at the end of Sec. 7.1.)
for us: if taken in symmetric limits [−ξ₀, +ξ₀], both of them rapidly converge to the same value, (π/2)^{1/2},
as soon as ξ₀ becomes much larger than 1. This means that even if we do not impose any exact limits on
the integration area in Eq. (85), this integral converges to the value


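$$ f_\omega(z) = C f_\omega(0)\, \frac{e^{ikz}}{z} \left[\left(\frac{2z}{k}\right)^{1/2} \left(\frac{\pi}{2}\right)^{1/2} (1 + i)\right]^2 = C\, \frac{2\pi i}{k}\, f_\omega(0)\, e^{ikz}, \tag{8.89} $$

due to contributions from the central area with a linear size corresponding to Δξ ~ 1, i.e. to

$$ \Delta x \sim \Delta y \sim \left(\frac{2z}{k}\right)^{1/2} \sim (\lambda z)^{1/2}, \tag{8.90} $$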


so that the net contribution from the front points r' well beyond the range (90) is negligible.³⁷ (Within
our assumptions (83), which in particular require λ to be much less than z, the diffraction angle Δx/z ~
Δy/z ~ (λ/z)^{1/2}, corresponding to the important area of the front, is small.) According to Eq. (89), to
sustain the unperturbed plane wave propagation, f_ω(z) = f_ω(0)e^{ikz}, the constant C has to be taken equal to
k/2πi. Thus, the Huygens principle’s prediction (84), in its final form, reads

$$ f_\omega(\mathbf r) = \frac{k}{2\pi i} \int_{\text{orifice}} f_\omega(\mathbf r')\, \frac{e^{ikR}}{R}\, d^2r', \tag{8.91} $$


and describes, in particular, the straight propagation of the plane wave (in a uniform medium). 
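The convergence used in Eqs. (88)-(89) can be checked numerically. The Python sketch below (standard library only; the function name and the Simpson step count are illustrative choices, not from the notes) evaluates ∫exp{iξ²}dξ over [−ξ₀, +ξ₀] and then recovers C = k/2πi from the condition that Eq. (89) reproduce the unperturbed plane wave.

```python
import cmath
import math


def exp_i_xi2_integral(limit, n=200_000):
    """Simpson's-rule estimate of the integral of exp(i xi^2) over [-limit, +limit].

    Its real and imaginary parts are the cos- and sin-integrals of Eq. (8.88),
    taken in symmetric limits. n (the number of intervals) must be even.
    """
    h = 2.0 * limit / n
    total = 2 * cmath.exp(1j * limit * limit)  # the two (equal) endpoint values
    for j in range(1, n):
        xi = -limit + j * h
        total += (4 if j % 2 else 2) * cmath.exp(1j * xi * xi)
    return total * h / 3


exact = math.sqrt(math.pi / 2) * (1 + 1j)  # (pi/2)^(1/2) for each of the two parts
for limit in (3.0, 10.0, 30.0):
    approx = exp_i_xi2_integral(limit)
    print(f"xi0={limit:5.1f}: integral={approx:.4f}, |error|={abs(approx - exact):.4f}")

# Eq. (8.89): f(z) = C f(0) exp(ikz) * (2/k) * I^2, with I the integral above,
# so the unperturbed wave f(z) = f(0) exp(ikz) requires C = k / (2 I^2).
k = 2 * math.pi                   # wavelength = 1, arbitrary units
I30 = exp_i_xi2_integral(30.0)
C_numeric = k / (2 * I30 ** 2)
print(C_numeric, k / (2j * math.pi))  # both close to -1j for this k
```

The residual error shrinks roughly as 1/ξ₀, the contribution of the oscillating tails beyond the central area (90).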


Let me pause to emphasize how nontrivial this result is. It would be a natural corollary of Eqs.
(25) (and the linear superposition principle) if all points of the orifice were filled with point scatterers
that re-emit all the incident waves as spherical waves. However, as follows from the above example,
the Huygens principle also works if there is nothing in the orifice but free space!


This is why it is worth discussing a proof of this principle,³⁸ based on Green’s theorem (2.207). Let us
apply it to the function f = f_ω, where f_ω is the complex amplitude of a scalar component of one of the
wave’s fields, which satisfies the Helmholtz equation (7.204),


$$ (\nabla^2 + k^2) f_\omega(\mathbf r) = 0, \tag{8.92} $$


and the function g = G_ω is the temporal Fourier image of the corresponding Green’s function. The latter
function may be defined, as usual, as the solution of the same equation with the added delta-functional 
right-hand side with an arbitrary coefficient, for example, 


$$ (\nabla^2 + k^2) G_\omega(\mathbf r, \mathbf r') = -4\pi\delta(\mathbf r - \mathbf r'). \tag{8.93} $$


Using Eqs. (92) and (93) to express the Laplace operators of the functions f_ω and G_ω, we may rewrite
Eq. (2.207) as 


36 See, e.g., MA Eq. (6.10). 

37 This result is very natural, because the function exp{ikR} oscillates fast with the change of r', so that the
contributions from various front points are averaged out. Indeed, the only reason why the central part of the plane
[x', y'] gives a non-zero contribution (89) to f_ω(z) is that the phase exponents stop oscillating as (x'² + y'²) is
reduced below ~z/k; see Eq. (86).

38 This proof was given in 1882 by the same G. Kirchhoff whose circuit rules were discussed in Secs. 4.1 and 6.6.
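$$ \int_V \left\{ f_\omega(\mathbf r)\left[-k^2 G_\omega(\mathbf r, \mathbf r') - 4\pi\delta(\mathbf r - \mathbf r')\right] - G_\omega(\mathbf r, \mathbf r')\left[-k^2 f_\omega(\mathbf r)\right] \right\} d^3r = \oint_S \left[ f_\omega(\mathbf r)\, \frac{\partial G_\omega(\mathbf r, \mathbf r')}{\partial n} - G_\omega(\mathbf r, \mathbf r')\, \frac{\partial f_\omega(\mathbf r)}{\partial n} \right] d^2r, \tag{8.94} $$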


where n is the outward normal to the surface S limiting the integration volume V. Two terms on the left- 
hand side of this relation cancel, so that after swapping the arguments r and r’, we get 


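$$ -4\pi f_\omega(\mathbf r) = \oint_S \left[ f_\omega(\mathbf r')\, \frac{\partial G_\omega(\mathbf r, \mathbf r')}{\partial n'} - G_\omega(\mathbf r, \mathbf r')\, \frac{\partial f_\omega(\mathbf r')}{\partial n'} \right] d^2r'. \tag{8.95} $$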


This relation is only correct if the selected volume V includes the point r (otherwise we would 
not get its left-hand side from the integration of the delta function), but does not include the genuine 
source of the wave — otherwise, Eq. (92) would have a non-zero right-hand side. Now let r be the field 
observation point, V be all the source-free half-space (for example, the space right of the screen in Fig. 
10), so that S is the surface of the screen, including the orifice. Then the right-hand side of Eq. (95) 
describes the field (at the observation point r) induced by the wave passing through the orifice points r’. 
Since no waves are emitted by the opaque parts of the screen, we can limit the integration to the orifice
area.³⁹ Assuming also that the opaque parts of the screen do not re-emit the waves “radiated” by the
orifice, we can take the solution of Eq. (93) to be the retarded potential for the free space:⁴⁰

$$ G_\omega(\mathbf r, \mathbf r') = \frac{e^{ikR}}{R}. \tag{8.96} $$


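Plugging this expression into Eq. (95), we get

$$ -4\pi f_\omega(\mathbf r) = \int_{\text{orifice}} \left[ f_\omega(\mathbf r')\, \frac{\partial}{\partial n'}\left(\frac{e^{ikR}}{R}\right) - \frac{e^{ikR}}{R}\, \frac{\partial f_\omega(\mathbf r')}{\partial n'} \right] d^2r'. \tag{8.97} $$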


This is the so-called Kirchhoff (or “Fresnel-Kirchhoff”) integral. (Again, with the integration
extended over all boundaries of the volume V, this would be an exact mathematical result.) Now, let us
make two additional approximations. The first of them stems from Eq. (83): at ka ≫ 1, the wave’s
spatial dependence in the orifice area may be represented as 


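$$ f_\omega(\mathbf r') = (\text{a slow function of } \mathbf r') \times \exp\{i\mathbf k_0 \cdot \mathbf r'\}, \tag{8.98} $$

where “slow” means a function that changes on the scale of a rather than λ. If, also, kR ≫ 1, then the
differentiation in Eq. (97) may be, in both instances, limited to the rapidly changing exponents, giving

$$ 4\pi f_\omega(\mathbf r) = \int_{\text{orifice}} i(\mathbf k + \mathbf k_0)\cdot\mathbf n'\, \frac{e^{ikR}}{R}\, f_\omega(\mathbf r')\, d^2r', \tag{8.99} $$

where k is the wave vector of magnitude k directed along R.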

Second, if all observation angles are small, we may take k·n' ≈ k₀·n' ≈ −k. With that, Eq. (99) is reduced
to the Huygens principle in its form (91). 


39 Actually, this is a nontrivial point of the proof. Indeed, it may be shown that the exact solution of Eq. (94)
is identically equal to zero if f(r') and ∂f(r')/∂n' vanish together on any part of the boundary of non-zero area. A
more careful analysis of this issue (it is the task of the formal vector theory of diffraction, which I will not have
time to pursue) confirms the validity of the described intuition-based approach at a ≫ λ.

40 It follows, e.g., from Eq. (16) with a monochromatic source q(t) = q_ω exp{−iωt}, with the amplitude q_ω = 4πε
that fits the right-hand side of Eq. (93).
It is clear that the principle immediately gives a very simple description of the interference of
waves passing through two small holes in the screen. Indeed, if the holes’ sizes are negligible in
comparison with the distance a between them (though still much larger than the wavelength!),
Eq. (91) yields

$$ f_\omega(\mathbf r) = c_1 e^{ikR_1} + c_2 e^{ikR_2}, \qquad \text{with } c_{1,2} = \frac{k f_{1,2} A_{1,2}}{2\pi i R_{1,2}}, \tag{8.100} $$

where R₁ and R₂ are the distances between the holes and the observation point, A₁ and A₂ are the hole
areas, and f₁ and f₂ are the incident wave’s complex amplitudes at the holes. For the wave intensity,
Eq. (100) gives

$$ S \propto f_\omega f_\omega^* = |c_1|^2 + |c_2|^2 + 2|c_1||c_2| \cos[k(R_1 - R_2) + \varphi], \qquad \text{where } \varphi \equiv \arg c_1 - \arg c_2. \tag{8.101} $$


The first two terms in the last expression clearly represent the intensities of the partial waves passed 
through each hole, while the last one is the result of their interference. The interference pattern’s
contrast ratio

$$ \frac{S_{\max}}{S_{\min}} = \left(\frac{|c_1| + |c_2|}{|c_1| - |c_2|}\right)^2 \tag{8.102} $$


is the largest (infinite) when both waves have equal amplitudes. 
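The two-hole formulas (101) and (102) are easy to cross-check numerically. In the Python sketch below, the amplitudes c₁, c₂ and distances R₁, R₂ are arbitrary illustrative values (here c₁ and c₂ already include the 1/R factors of Eq. (100)), and the function names are hypothetical; the intensity given by Eq. (101) is compared with the direct square of Eq. (100).

```python
import cmath
import math


def two_hole_intensity(c1, c2, path_difference, k):
    """Eq. (8.101): |c1|^2 + |c2|^2 + 2|c1||c2| cos[k(R1 - R2) + phi], phi = arg c1 - arg c2."""
    phi = cmath.phase(c1) - cmath.phase(c2)
    return abs(c1) ** 2 + abs(c2) ** 2 + 2 * abs(c1) * abs(c2) * math.cos(k * path_difference + phi)


def contrast_ratio(c1, c2):
    """Eq. (8.102): S_max / S_min = ((|c1| + |c2|) / (|c1| - |c2|))^2, infinite for equal amplitudes."""
    a1, a2 = abs(c1), abs(c2)
    if a1 == a2:
        return math.inf
    return ((a1 + a2) / (a1 - a2)) ** 2


# Cross-check against the direct square of Eq. (8.100), f = c1 exp(ikR1) + c2 exp(ikR2)
k, R1, R2 = 2 * math.pi, 100.0, 100.3     # illustrative values (wavelength = 1)
c1, c2 = 1.0 * cmath.exp(0.4j), 0.5 * cmath.exp(-0.2j)
direct = abs(c1 * cmath.exp(1j * k * R1) + c2 * cmath.exp(1j * k * R2)) ** 2
print(direct, two_hole_intensity(c1, c2, R1 - R2, k))
print(contrast_ratio(1.0, 0.5), contrast_ratio(1.0, 1.0))   # 9.0 and inf
```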


The analysis of the interference pattern is simple if the line connecting the holes is perpendicular
to wave vector k ≈ k₀; see Fig. 6a. Selecting the coordinate axes as shown in that figure, and using for
the distances R₁ and R₂ the same expansion as in Eq. (86), for the interference term in Eq. (101) we get

$$ \cos[k(R_1 - R_2) + \varphi] \approx \cos\left(\frac{kxa}{z} + \varphi\right). \tag{8.103} $$


This means that the term does not depend on y, i.e. the interference pattern in the plane of constant z is a 
set of straight, parallel strips, perpendicular to the vector a, with the period given by Eq. (60), i.e. by the 
Bragg law.⁴¹ This result is strictly valid only at y² ≪ z²; it is straightforward to use the next term in the
Taylor expansion (73) to show that farther on from the interference plane y = 0, the strips start to 
diverge. 
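A small sketch of the fringe geometry implied by Eq. (103), assuming (to match the sign of the phase term there) that hole 1 sits at x' = −a/2 and hole 2 at x' = +a/2. It computes the positions of the intensity maxima, their period λz/a, and the shift −(z/ka)φ quoted in footnote 41, and then compares the exact path difference R₁ − R₂ with its small-angle value xa/z at two values of y. The function names and all numbers are illustrative.

```python
import math


def fringe_maxima(a, z, wavelength, phi=0.0, orders=range(-2, 3)):
    """Maxima of cos(kxa/z + phi), Eq. (8.103): x_m = (2 pi m - phi) z / (k a)."""
    k = 2 * math.pi / wavelength
    return [(2 * math.pi * m - phi) * z / (k * a) for m in orders]


def path_difference_exact(x, y, z, a):
    """Exact R1 - R2 for holes at x' = -a/2 (hole 1) and x' = +a/2 (hole 2), observation point {x, y, z}."""
    r1 = math.sqrt((x + a / 2) ** 2 + y ** 2 + z ** 2)
    r2 = math.sqrt((x - a / 2) ** 2 + y ** 2 + z ** 2)
    return r1 - r2


a, z, wavelength = 1e-3, 1.0, 500e-9     # illustrative: 1 mm hole spacing, 1 m distance, 500 nm
maxima = fringe_maxima(a, z, wavelength)
print("fringe period:", maxima[1] - maxima[0], "vs lambda*z/a =", wavelength * z / a)
print("m = 0 maximum for phi = pi/2:", fringe_maxima(a, z, wavelength, phi=math.pi / 2)[2])
# The small-angle estimate R1 - R2 ~ xa/z holds near y = 0 and degrades away from that plane:
for y in (0.0, 0.3):
    print(y, path_difference_exact(5e-4, y, z, a), 5e-4 * a / z)
```

The second printout illustrates the remark above: away from the plane y = 0 the exact path difference drifts from xa/z, so the strips are no longer strictly parallel.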


8.6. Fresnel and Fraunhofer diffraction patterns 


Now let us use the Huygens principle to analyze a slightly more complex problem: a plane wave’s
diffraction on a long, straight slit of a constant width a (Fig. 12). According to Eq. (83), to use the
Huygens principle for the problem’s analysis we need to have λ ≪ a ≪ z. Moreover, the simple
version (91) of the principle is only valid for small observation angles, |x| ≪ z. Note, however, that the
relation between two dimensionless parameters of the problem, z/a and a/λ, which are both much larger
than 1, is so far arbitrary; as we will see in a minute, this relation determines the type of the observed
diffraction pattern.


41 The phase shift φ vanishes at the normal incidence of a plane wave on the holes. Note, however, that the
spatial shift of the interference pattern following from Eq. (103), Δx = −(z/ka)φ, is extremely convenient for the
experimental measurement of the phase shift between two waves, especially if it is induced by some factor (such 
as insertion of a transparent object into one of the interferometer’s arms) that may be turned on/off at will. 




Fig. 8.12. Diffraction on a slit: an incident wave falls on a screen with a slit (edges at x' = ±a/2), and the
diffracted wave is observed in a distant observation plane.


Let us apply Eq. (91) to our current problem (Fig. 12), for the sake of simplicity assuming the 
normal wave incidence, and taking z’ = 0 at the screen plane: 


$$ f_\omega(x, z) = f_0\, \frac{k}{2\pi i} \int_{-a/2}^{+a/2} dx' \int_{-\infty}^{+\infty} dy'\, \frac{\exp\{ik[(x - x')^2 + y'^2 + z^2]^{1/2}\}}{[(x - x')^2 + y'^2 + z^2]^{1/2}}, \tag{8.104} $$

where f₀ = f_ω(x', 0) = const is the incident wave’s amplitude. This is the same integral as in Eq. (85),
except for the finite limits for the integration variable x', and may be simplified similarly, using the
small-angle condition (x − x')² + y'² ≪ z²:


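$$ f_\omega(x, z) \approx f_0\, \frac{k}{2\pi i}\, \frac{e^{ikz}}{z} \int_{-a/2}^{+a/2} dx' \int_{-\infty}^{+\infty} dy' \exp\left\{\frac{ik[(x - x')^2 + y'^2]}{2z}\right\} = f_0\, \frac{k}{2\pi i}\, \frac{e^{ikz}}{z}\, I_x I_y. \tag{8.105} $$

The integral over y' is the same as in the last section:

$$ I_y = \int_{-\infty}^{+\infty} \exp\left\{\frac{iky'^2}{2z}\right\} dy' = \left(\frac{2\pi i z}{k}\right)^{1/2}, \tag{8.106} $$

but the integral over x' is more general, because of its finite limits:

$$ I_x = \int_{-a/2}^{+a/2} \exp\left\{\frac{ik(x - x')^2}{2z}\right\} dx'. \tag{8.107} $$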
It may be simplified in the following two (opposite) limits. 
(i) Fraunhofer diffraction takes place when z/a ≫ a/λ, a relation that may be rewritten either as
a ≪ (zλ)^{1/2}, or as ka² ≪ z. In this limit, the ratio kx'²/z is negligibly small for all values of x'
under the integral, and we can approximate it as

$$ I_x = \int_{-a/2}^{+a/2} \exp\left\{\frac{ik(x^2 - 2xx' + x'^2)}{2z}\right\} dx' \approx \exp\left\{\frac{ikx^2}{2z}\right\} \int_{-a/2}^{+a/2} \exp\left\{-\frac{ikxx'}{z}\right\} dx' = \frac{2z}{kx} \exp\left\{\frac{ikx^2}{2z}\right\} \sin\frac{kxa}{2z}, \tag{8.108} $$


so that Eq. (105) yields 
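$$ f_\omega(x, z) = f_0\, \frac{k}{2\pi i}\, \frac{e^{ikz}}{z} \left(\frac{2\pi i z}{k}\right)^{1/2} \frac{2z}{kx} \exp\left\{\frac{ikx^2}{2z}\right\} \sin\frac{kxa}{2z}, \tag{8.109} $$

and hence the relative wave intensity (the Fraunhofer diffraction pattern) is

$$ \frac{S(x, z)}{S_0} = \left|\frac{f_\omega(x, z)}{f_0}\right|^2 = \frac{ka^2}{2\pi z}\, \operatorname{sinc}^2\frac{ka\theta}{2}, \tag{8.110} $$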


where S₀ is the intensity of the incident wave, and θ = x/z ≪ 1 is the observation angle. Comparing this
expression with Eq. (69), we see that this diffraction pattern is exactly the same as that for a similar
(uniform, 1D) object in the Born approximation; see the red line in Fig. 8. Note again that the angular
width δθ of the Fraunhofer pattern is of the order of 1/ka, so that its linear width δx = zδθ is of the order
of z/ka ~ zλ/a.⁴² Hence the condition of the Fraunhofer approximation’s validity may be also represented
as a ≪ δx.


(ii) Fresnel diffraction. In the opposite limit of a relatively wide slit, with a ≫ δx = zδθ ~ z/ka ~
zλ/a, i.e. ka² ≫ z, the diffraction patterns at two edges of the slit are well separated. Hence, near each
edge (for example, near x' = −a/2) we may simplify Eq. (107) as

$$ I_x(x) \approx \int_{-a/2}^{+\infty} \exp\left\{\frac{ik(x - x')^2}{2z}\right\} dx' = \left(\frac{2z}{k}\right)^{1/2} \int_{-(k/2z)^{1/2}(x + a/2)}^{+\infty} \exp\{i\xi^2\}\, d\xi, \tag{8.111} $$


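and express it via the special functions called the Fresnel integrals:⁴³

$$ C(\xi) \equiv \left(\frac{2}{\pi}\right)^{1/2} \int_0^{\xi} \cos(\xi'^2)\, d\xi', \qquad S(\xi) \equiv \left(\frac{2}{\pi}\right)^{1/2} \int_0^{\xi} \sin(\xi'^2)\, d\xi', \tag{8.112} $$

whose plots are shown in Fig. 13a. As was mentioned above, at large values of their argument ξ, both
functions tend to ±1/2 (for ξ → ±∞).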


Fig. 8.13. (a) The Fresnel integrals and (b) their parametric representation.


42 Note also that since in this limit ka² ≪ z, Eq. (110) shows that even the maximum value S(0, z) of the diffracted
wave’s intensity is much lower than that (S₀) of the incident wave. This is natural because the incident power S₀a
per unit length of the slit is now distributed over a much larger width δx ≫ a, so that S(0, z) ~ S₀(a/δx) ≪ S₀.
43 Slightly different definitions of these functions, affecting the constant factors, may also be met in literature. 
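Plugging this expression into Eqs. (105) and (111), for the diffracted wave intensity in the
Fresnel limit (i.e. at |x + a/2| ≪ a), we get

$$ \frac{S(x, z)}{S_0} = \frac{1}{2}\left\{\left[C\!\left(\left(\frac{k}{2z}\right)^{1/2}\left(x + \frac{a}{2}\right)\right) + \frac{1}{2}\right]^2 + \left[S\!\left(\left(\frac{k}{2z}\right)^{1/2}\left(x + \frac{a}{2}\right)\right) + \frac{1}{2}\right]^2\right\}. \tag{8.113} $$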


A plot of this function (Fig. 14) shows that the diffraction pattern is very peculiar: while in the “dark” 
region x < —a/2 the wave intensity fades monotonically, the transition to the “bright” region within the 
gap (x > —a/2) is accompanied by intensity oscillations, just as at the Fraunhofer diffraction — cf. Fig. 8. 


Fig. 8.14. The Fresnel diffraction pattern: S/S₀ as a function of (k/2z)^{1/2}(x + a/2).


This behavior, which is described by the following asymptotes,

$$ \frac{S}{S_0} \approx \begin{cases} 1 + \dfrac{1}{\pi^{1/2}\xi_0} \sin\left(\xi_0^2 - \dfrac{\pi}{4}\right), & \text{for } \xi_0 \to +\infty, \\[2mm] \dfrac{1}{4\pi\xi_0^2}, & \text{for } \xi_0 \to -\infty, \end{cases} \qquad \text{where } \xi_0 \equiv \left(\frac{k}{2z}\right)^{1/2}\left(x + \frac{a}{2}\right), \tag{8.114} $$

is essentially an artifact of “observing” just the wave intensity (i.e. its real amplitude) rather than its
phase as well. Indeed, as may be seen even more clearly from the parametric representation of the
Fresnel integrals, shown in Fig. 13b, these functions oscillate similarly at large positive and negative
values of their argument. (This famous pattern is called either the Euler spiral or the Cornu spiral.)
Physically, this means that the wave diffraction at the slit edge leads to similar oscillations of its phase
at x < −a/2 and x > −a/2; however, in the latter region (i.e. inside the slit) the diffracted wave overlaps
the incident wave passing through the slit directly, and their interference reveals the phase oscillations,
making them visible in the observed intensity as well.


Note that according to Eq. (113), the linear scale δx of the Fresnel diffraction pattern is of the
order of (2z/k)^{1/2}, i.e. it complies with the estimate given by Eq. (90). If the slit is gradually narrowed so
that its width a becomes comparable to δx,⁴⁴ the Fresnel diffraction patterns from both edges start to
“collide” (interfere). The resulting wave, fully described by Eq. (107), is just a sum of two contributions
of the type (111) from both edges of the slit. The resulting interference pattern is somewhat complicated,
and only when a becomes substantially less than δx is it reduced to the simple Fraunhofer pattern (110).


44 Note that this condition may be also rewritten as a ~ (zλ)^{1/2}, i.e. z/a ~ a/λ.
Of course, this crossover from the Fresnel to Fraunhofer diffraction may be also observed, at fixed
wavelength λ and slit width a, by increasing z, i.e. by measuring the diffraction pattern farther and
farther away from the slit. 


Note also that the Fraunhofer limit is always valid if the diffraction is measured as a function of 
the diffraction angle θ alone. This may be done, for example, by collecting the diffracted wave with a
“positive” (converging) lens and observing the diffraction pattern in its focal plane. 
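The two limits of this section can be evaluated side by side in a short Python sketch (standard library only; the function names and parameter values are illustrative assumptions). It evaluates the Fraunhofer pattern (110), computes the Fresnel integrals (112) by Simpson’s rule in the normalization used here (some libraries use a different one), and compares the edge pattern (113) with its asymptotes (114).

```python
import math


def sinc(u):
    return 1.0 if abs(u) < 1e-12 else math.sin(u) / u


def fraunhofer_slit(theta, a, wavelength, z):
    """Eq. (8.110): S/S0 = (k a^2 / 2 pi z) sinc^2(k a theta / 2); meaningful only for k a^2 << z."""
    k = 2 * math.pi / wavelength
    return k * a * a / (2 * math.pi * z) * sinc(k * a * theta / 2) ** 2


def fresnel_cs(xi, n=20_000):
    """C(xi), S(xi) in the normalization of Eq. (8.112), by Simpson's rule (n even)."""
    h = xi / n                         # negative for xi < 0, which gives the signed integral
    c_sum = 1.0 + math.cos(xi * xi)    # endpoint values at t = 0 and t = xi
    s_sum = 0.0 + math.sin(xi * xi)
    for j in range(1, n):
        t = j * h
        weight = 4 if j % 2 else 2
        c_sum += weight * math.cos(t * t)
        s_sum += weight * math.sin(t * t)
    norm = math.sqrt(2 / math.pi) * h / 3
    return norm * c_sum, norm * s_sum


def fresnel_edge(xi0):
    """Eq. (8.113): S/S0 near the slit edge x' = -a/2, with xi0 = (k/2z)^(1/2) (x + a/2)."""
    c, s = fresnel_cs(xi0)
    return 0.5 * ((c + 0.5) ** 2 + (s + 0.5) ** 2)


def fresnel_edge_asymptote(xi0):
    """Eq. (8.114), valid for |xi0| >> 1."""
    if xi0 > 0:
        return 1 + math.sin(xi0 ** 2 - math.pi / 4) / (math.sqrt(math.pi) * xi0)
    return 1 / (4 * math.pi * xi0 ** 2)


a, wavelength, z = 50e-6, 500e-9, 1.0       # illustrative: 50 um slit, 500 nm light, 1 m away
print("k a^2 / z =", 2 * math.pi / wavelength * a * a / z)   # << 1: Fraunhofer regime
print("S(0, z)/S0 =", fraunhofer_slit(0.0, a, wavelength, z))
for xi0 in (-8.0, -2.0, 0.0, 2.0, 8.0):
    print(xi0, fresnel_edge(xi0), fresnel_edge_asymptote(xi0) if xi0 else None)
```

At ξ₀ = 0 (the geometric shadow boundary) the sketch gives exactly 1/4 of the incident intensity, and the agreement with Eq. (114) improves as |ξ₀| grows.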


8.7. Geometrical optics placeholder 


I would not like the reader to miss, behind all these details, the main feature of the Fresnel 
diffraction, which has an overwhelming practical significance. Namely, besides narrow diffraction 
“cones” (actually, parabolic-shaped regions) with the lateral scale δx ~ (λz)^{1/2}, the wave far behind a slit
of width a ≫ λ, δx, reproduces the field just behind the slit, i.e. reproduces the unperturbed incident
wave inside it, and has a negligible intensity in the shade regions outside it. An evident generalization of
this fact is that when a plane wave (in particular an electromagnetic wave) passes any opaque object of a
large size a ≫ λ, it propagates around it, by distances z up to ~a²/λ, along straight lines, with virtually
negligible diffraction effects. This fact gives the strict foundation for the notion of the wave ray (or
beam), as the line perpendicular to the local front of a quasi-plane wave. In a uniform medium such a ray
follows a straight line,⁴⁵ but it refracts in accordance with the Snell law at the interface of two media
with different values of the wave speed v, i.e. different values of the refraction index. The concept of
rays enables the whole field of geometric optics, devoted mostly to ray tracing in various (sometimes 
very sophisticated) optical systems. 


This is why, at this point, an E&M course that followed the scientific logic more faithfully than 
this one, would give an extended discussion of the geometric and quasi-geometric optics, including (as a 
minimum⁴⁶) such vital topics as


- the so-called lensmaker’s equation expressing the focal length f of a lens via the curvature radii
of its spherical surfaces and the refraction index of the lens material,

- the thin lens formula relating the image distance from the lens to f and the source distance,

- the concepts of basic optical instruments such as glasses, telescopes, and microscopes, 

- the concepts of the spherical, angular, and chromatic aberrations (image distortions). 


However, since I have made a (possibly, wrong) decision to follow the common tradition in 
selecting the main topics for this course, I do not have time/space left for such discussion. Still, I am 
using this “placeholder” pseudo-section to relay my deep conviction that any educated physicist has to 
know the geometric optics basics. If the reader has not been exposed to this subject during their 
undergraduate studies, I highly recommend at least browsing one of the available textbooks.⁴⁷


45 In application to optical waves, this notion may be traced back to at least the work by Hero (a.k.a. Heron) of
Alexandria (circa 10-70 AD). Curiously, he correctly described light reflection from one or several plane mirrors,
starting from the completely wrong idea of light propagation from the eye of the observer to the observed object.
46 Admittedly, even this list leaves aside several spectacular effects, including such a beauty as conical refraction
in biaxial crystals; see, e.g., Chapter 15 of the textbook by M. Born and E. Wolf, cited at the end of Sec. 7.1.

47 My top recommendation for that purpose would be Chapters 3-6 and Sec. 8.6 in Born and Wolf. A simpler 
alternative is Chapter 10 in G. Fowles, Introduction to Modern Optics, 2nd ed., Dover, 1989. Note also that the
venerable field of optical microscopy is currently revitalized by holographic/tomographic methods, using the 
8.8. Fraunhofer diffraction from more complex scatterers


So far, our quantitative analysis of diffraction has been limited to a very simple geometry: a
single slit in an otherwise opaque screen (Fig. 12). However, in the most important Fraunhofer limit,
z ≫ ka², it is easy to get a very simple expression for the plane wave diffraction/interference by a plane
orifice (with a linear size scale a) of arbitrary shape. Indeed, the evident 2D generalization of the 
approximation (106)-(107) is 


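$$ \int_{\text{orifice}} \exp\left\{\frac{ik[(x - x')^2 + (y - y')^2]}{2z}\right\} dx'\,dy' \approx \exp\left\{\frac{ik(x^2 + y^2)}{2z}\right\} \int_{\text{orifice}} \exp\left\{-i\left(\frac{kxx'}{z} + \frac{kyy'}{z}\right)\right\} dx'\,dy', \tag{8.115} $$

so that besides the inconsequential total phase factor, Eq. (105) is reduced to

$$ f_\omega(\boldsymbol\rho) \propto f_0 \int_{\text{orifice}} \exp\{-i\boldsymbol\kappa\cdot\boldsymbol\rho'\}\, d^2\rho' = f_0 \int_{\text{all screen}} T(\boldsymbol\rho') \exp\{-i\boldsymbol\kappa\cdot\boldsymbol\rho'\}\, d^2\rho'. \tag{8.116} $$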


Here the 2D vector κ (not to be confused with wave vector k, which is virtually perpendicular to κ!) is
defined as

$$ \boldsymbol\kappa \equiv \frac{k\boldsymbol\rho}{z} \approx \mathbf q \equiv \mathbf k - \mathbf k_0, \tag{8.117} $$

and ρ = {x, y} and ρ' = {x', y'} are 2D radius-vectors in, respectively, the observation and orifice planes,
both nearly normal to the vectors k and k₀.⁴⁸ In the last form of Eq. (116), the function T(ρ') describes
the screen’s transparency at point ρ', and the integral is over the whole screen plane z' = 0. (Though the
two forms of Eq. (116) are strictly equivalent only if T(ρ') is equal to either 1 or 0, its last form may be
readily obtained from Eq. (91) with f_ω(r') = T(ρ')f₀ for any transparency profile, provided that T(ρ')
changes substantially only at distances much larger than λ = 2π/k.)


From the mathematical point of view, the last form of Eq. (116) is just the 2D spatial Fourier
transform of the function T(ρ'), with the variable κ defined by the observation point’s position:
ρ = (z/k)κ = (zλ/2π)κ. This interpretation is useful because of the experience we all have with the Fourier
transform, if only in the context of its time/frequency applications. For example, if the orifice is a single
small hole, T(ρ') may be approximated by a delta function, so that Eq. (116) yields |f_ω(ρ)| ≈ const. This
result corresponds (at least for the small diffraction angles θ = ρ/z, for which the Huygens
approximation is valid) to a spherical wave spreading from the point-like orifice. Next, for two small
holes, Eq. (116) immediately gives the interference pattern (103). Let me now use Eq. (116) to analyze 
other simplest (and most important) 1D transparency profiles, leaving a few 2D cases for the reader’s 
exercise. 
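Because Eq. (116) is just a Fourier transform, it can be evaluated numerically for any transparency profile. The sketch below is a 1D version (assuming T depends on x' only), using a simple midpoint quadrature; the function names are hypothetical. For a single slit of width a it should reproduce a·sinc(κa/2), the Fourier transform of a rectangular window, and a very narrow slit should give an almost κ-independent amplitude, as stated above for a small hole.

```python
import cmath
import math


def fraunhofer_amplitude(transparency, kappa, x_min, x_max, n=20_000):
    """Midpoint-rule estimate of a 1D version of the last form of Eq. (8.116):
    the integral of T(x') exp(-i kappa x') dx' over [x_min, x_max], constant prefactors dropped."""
    h = (x_max - x_min) / n
    total = 0j
    for j in range(n):
        x = x_min + (j + 0.5) * h
        total += transparency(x) * cmath.exp(-1j * kappa * x)
    return total * h


def single_slit(x, a=1.0):
    """Transparency of a single slit of width a centered at x' = 0 (illustrative)."""
    return 1.0 if abs(x) < a / 2 else 0.0


a = 1.0
for kappa in (0.0, 2.0, 2 * math.pi / a, 10.0):
    numeric = fraunhofer_amplitude(single_slit, kappa, -a, a)
    analytic = a * (math.sin(kappa * a / 2) / (kappa * a / 2) if kappa else 1.0)  # a sinc(kappa a / 2)
    print(f"kappa={kappa:6.3f}: numeric={numeric.real:+.6f}, analytic={analytic:+.6f}")
```

Swapping in another `transparency` function (two slits, a grating, a graded profile) requires no other change, which is the practical content of the Fourier-transform interpretation.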


(i) A single slit of width a (Fig. 12) may be described by transparency 


scattered wave’s phase information. These methods are especially productive in biology and medicine — see, e.g., 
M. Brezinski, Optical Coherence Tomography, Academic Press, 2006, and G. Popescu, Quantitative Phase 
Imaging of Cells and Tissues, McGraw-Hill (2011). 

48 Note that for a thin uniform plate of the same shape as the orifice we are discussing now, the Born phase 
integral (63) with q ≪ k gives a result functionally similar to Eq. (116).
$$ T(\boldsymbol\rho') = \begin{cases} 1, & \text{for } |x'| < a/2, \\ 0, & \text{otherwise}. \end{cases} \tag{8.118} $$


Its substitution into Eq. (116) yields 


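$$ f_\omega(\boldsymbol\rho) \propto f_0 \int_{-a/2}^{+a/2} \exp\{-i\kappa_x x'\}\, dx' \propto \operatorname{sinc}\frac{\kappa_x a}{2} \equiv \operatorname{sinc}\frac{kxa}{2z}, \tag{8.119} $$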


naturally returning us to Eqs. (64) and (110), and hence to the red lines in Fig. 8 for the wave intensity. 
(Please note again that Eq. (116) describes only the Fraunhofer, but not the Fresnel diffraction!) 


(ii) Two infinitely narrow, similar, parallel slits with a larger distance a between them (i.e. the
simplest model of Young’s two-slit experiment) may be described by taking 


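$$ T(\boldsymbol\rho') \propto \delta\left(x' - \frac{a}{2}\right) + \delta\left(x' + \frac{a}{2}\right), \tag{8.120} $$

so that Eq. (116) yields the generic 1D interference pattern,

$$ f_\omega(\boldsymbol\rho) \propto f_0 \left[\exp\left\{-\frac{i\kappa_x a}{2}\right\} + \exp\left\{+\frac{i\kappa_x a}{2}\right\}\right] \propto \cos\frac{\kappa_x a}{2} \equiv \cos\frac{kxa}{2z}, \tag{8.121} $$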


whose intensity is shown with the blue line in Fig. 8. 


(iii) In a more realistic model of Young’s experiment, each slit has a width (say, w) that is much 
larger than the light wavelength λ, but still much smaller than the slit spacing a. This situation may be
described by the following transparency function 


$$ T(\boldsymbol\rho') = \begin{cases} 1, & \text{for } |x' \pm a/2| < w/2, \\ 0, & \text{otherwise}, \end{cases} \tag{8.122} $$

for which Eq. (116) yields a natural combination of the results (119) (with a replaced by w) and (121):

$$ f_\omega(\boldsymbol\rho) \propto \operatorname{sinc}\frac{kxw}{2z}\, \cos\frac{kxa}{2z}. \tag{8.123} $$

This is the usual interference pattern, but modulated with a Fraunhofer-diffraction envelope, shown in
Fig. 15 with the dashed blue line. Since the function sinc²ξ decreases very fast beyond its first zeros at
ξ = ±π, the practical number of observable interference fringes is close to 2a/w.

Fig. 8.15. Young’s double-slit interference pattern for a finite-width slit.
(iv) A structure very useful for experimental and engineering practice is a set of many parallel,
similar slits, called the diffraction grating.⁴⁹ If the slit’s width is much smaller than the period d of the
grating, its transparency function may be approximated as


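$$ T(\boldsymbol\rho') \propto \sum_{n=-\infty}^{+\infty} \delta(x' - nd), \tag{8.124} $$

and Eq. (116) yields

$$ f_\omega(\boldsymbol\rho) \propto \sum_{n=-\infty}^{+\infty} \exp\{-in\kappa_x d\} = \sum_{n=-\infty}^{+\infty} \exp\left\{-i\frac{nkxd}{z}\right\}. \tag{8.125} $$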


This sum vanishes for all values of κ_x d that are not multiples of 2π, so that the result describes
sharp intensity peaks at the following diffraction angles: 


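$$ \theta_m \equiv \left(\frac{x}{z}\right)_m = \left(\frac{\kappa_x}{k}\right)_m = \frac{2\pi m}{kd} = m\,\frac{\lambda}{d}. \tag{8.126} $$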


Taking into account that this result is only valid for small angles |θ_m| ≪ 1, it may be interpreted
exactly as Eq. (59) — see Fig. 6a. However, in contrast with the interference (121) from two slits, the 
destructive interference from many slits kills the net wave as soon as the angle is even slightly different 
from each value (60). This is very convenient for spectroscopic purposes because the diffraction lines 
produced by multi-frequency waves do not overlap even if the frequencies of their adjacent components 
are very close. 


Two unavoidable features of practical diffraction gratings make their properties different from 
this simple, ideal picture. First, the finite number N of slits, which may be described by limiting the sum 
(125) to the interval n = [−N/2, +N/2], results in a non-zero spread, δθ/θ ~ 1/N, of each diffraction peak,
and hence in the reduction of the grating’s spectral resolution. (Unintentional variations of the inter-slit 
distance d have a similar effect, so that before the advent of high-resolution photolithography, special 
high-precision mechanical tools had been used for grating fabrication.) 


Second, a finite slit width w leads to the diffraction peak pattern modulation by the sinc²(kwθ/2)
envelope, similar to the pattern shown in Fig. 15. Actually, for spectroscopic purposes, such modulation
is sometimes a plus, because only one diffraction peak (say, with m = +1) is practically used, and if the
frequency spectrum of the analyzed wave is very broad (covers more than one octave), the higher peaks
produce undesirable hindrance. For this reason, w is frequently selected to be exactly d/2, thus
suppressing every other diffraction maximum. Moreover, sometimes semi-transparent films are
used to make the transparency function T(ρ') continuous and close to a sinusoidal one:

$$ T(\boldsymbol\rho') = T_0 + T_1 \cos\frac{2\pi x'}{d} \equiv T_0 + \frac{T_1}{2}\left(\exp\left\{i\frac{2\pi x'}{d}\right\} + \exp\left\{-i\frac{2\pi x'}{d}\right\}\right). \tag{8.127} $$


Plugging the last expression into Eq. (116) and integrating, we see that the output wave consists of just 3
components: the direct-passing wave (proportional to T₀) and two diffracted waves (proportional to T₁)
propagating in the directions of the two lowest Bragg angles, θ_{±1} = ±λ/d.⁵⁰
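The cases (iii) and (iv) above can be checked with one formula: for N identical slits of width w and period d, Eq. (116) gives the single-slit envelope sinc²(κw/2) times the interference factor |Σₙ exp{−inκd}|². The Python sketch below uses illustrative parameter values and hypothetical function names; the slit index runs over n = 0, ..., N−1, which only changes an overall phase. It counts the fringes under the central envelope for Young’s experiment, shows the 1/N narrowing and the w = d/2 suppression for a grating, and uses a plain discrete Fourier transform to show that the sinusoidal transparency (127) produces only three output components.

```python
import cmath
import math


def sinc(u):
    return 1.0 if abs(u) < 1e-12 else math.sin(u) / u


def multislit_intensity(kappa, w, d, n_slits):
    """|f|^2 for n_slits slits of width w and period d, versus kappa = kx/z.

    Envelope sinc^2(kappa w / 2) times |sum_n exp(-i n kappa d)|^2;
    n_slits = 2, d = a reproduces Eq. (8.123), w = 0 is the narrow-slit limit of Eq. (8.125).
    """
    array_factor = sum(cmath.exp(-1j * n * kappa * d) for n in range(n_slits))
    return sinc(kappa * w / 2) ** 2 * abs(array_factor) ** 2


def count_fringes(w, a, samples=100_001):
    """Number of local intensity maxima of the two-slit pattern inside |kappa| < 2 pi / w."""
    k_max = 2 * math.pi / w
    step = 2 * k_max / (samples - 1)
    vals = [multislit_intensity(-k_max + j * step, w, a, 2) for j in range(samples)]
    return sum(1 for j in range(1, samples - 1) if vals[j - 1] < vals[j] > vals[j + 1])


# (iii) finite-width slits: about 2a/w fringes under the central envelope
print(count_fringes(w=1.0, a=10.5), "fringes; 2a/w =", 2 * 10.5 / 1.0)

# (iv) grating of N narrow slits: peak N^2 at kappa d = 2 pi, zero at kappa d = 2 pi (1 + 1/N)
N, d = 50, 1.0
print(multislit_intensity(2 * math.pi / d, 0.0, d, N),
      multislit_intensity(2 * math.pi * (1 + 1 / N) / d, 0.0, d, N))
# w = d/2 suppresses every other maximum, e.g. m = 2 (kappa d = 4 pi)
print(multislit_intensity(4 * math.pi / d, d / 2, d, N))


def dft(samples):
    """Plain O(M^2) discrete Fourier transform, normalized by the number M of samples."""
    m_total = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * m * j / m_total) for j, s in enumerate(samples)) / m_total
            for m in range(m_total)]


# Sinusoidal grating, Eq. (8.127), sampled over 4 periods: only harmonics 0 and +-1 survive
T0, T1, periods, m_total = 0.6, 0.3, 4, 64
coeffs = dft([T0 + T1 * math.cos(2 * math.pi * periods * j / m_total) for j in range(m_total)])
print([(m, round(abs(c), 12)) for m, c in enumerate(coeffs) if abs(c) > 1e-9])
```

In the last printout, bin 0 is the direct-passing wave (amplitude T₀) and the bins at ±4 (index 60 ≡ −4) are the two first-order beams (amplitude T₁/2 each).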


49 The rudimentary diffraction grating effect, produced by the parallel fibers of a bird’s feather, was discovered as 
early as 1673 by James Gregory (who also invented the “Gregorian” telescope — one of the basic designs for 
reflecting telescopes). 
The same Eq. (116) may be also used to obtain one more general (and rather curious) result, 
called the Babinet principle.⁵¹ Consider two experiments with the diffraction of similar plane waves on 
two “complementary” screens, such that together they would cover the whole plane, without a hole or 
an overlap. (Think, for example, about an opaque disk of radius R and a large opaque screen with a 
round orifice of the same radius.) Then, according to the Babinet principle, the diffracted wave patterns 
produced by these two screens in all directions with θ ≠ 0 are identical.


The proof of this principle is straightforward: since the transparency functions of the two 
screens are complementary,


$$\mathcal{T}_1(\boldsymbol{\rho}') + \mathcal{T}_2(\boldsymbol{\rho}') = 1, \tag{8.128}$$


and the diffracted wave is (in the Fraunhofer approximation only!) a Fourier transform of 𝒯(ρ'), which is 
a linear operation, we get


$$f_1(\boldsymbol{\rho}) + f_2(\boldsymbol{\rho}) = f_0(\boldsymbol{\rho}), \tag{8.129}$$


where f_0 is the wave “scattered” by the composite screen with 𝒯_0(ρ') = 1, i.e. the unperturbed initial 
wave propagating in the initial direction (θ = 0). In all other directions, f_1 = −f_2, i.e. the diffracted waves 
are indeed similar, apart from the difference in sign, which is equivalent to a phase shift by ±π. However, it 
is important to remember that, the Babinet principle notwithstanding, in real experiments with screens at 
finite distances, the diffracted waves may interfere with the unperturbed plane wave f_0(ρ), leading to 
different diffraction patterns in cases 1 and 2 (see, e.g., Fig. 14 and its discussion).
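A minimal 1D numerical illustration of Eq. (129) follows. It assumes that the Fraunhofer integral over the screen plane may be replaced by a discrete Fourier transform over a finite, periodically continued window; the slit position and size, and the helper name `dft`, are arbitrary choices of this sketch. The index k = 0 plays the role of the forward direction θ = 0.

```python
import cmath
import math

def dft(values):
    """Plain discrete Fourier transform, a stand-in for the Fraunhofer integral over the screen."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * j / n) for j, v in enumerate(values))
            for k in range(n)]

N = 256
# Screen 1: opaque everywhere except a hypothetical slit occupying samples 100..139
t1 = [1.0 if 100 <= j < 140 else 0.0 for j in range(N)]
# Screen 2: the complementary screen, so that t1 + t2 = 1 everywhere, as in Eq. (128)
t2 = [1.0 - t for t in t1]

f1, f2 = dft(t1), dft(t2)
# Away from the forward direction (k != 0), the two diffracted waves differ only in sign...
max_mismatch = max(abs(f1[k] + f2[k]) for k in range(1, N))
# ...while in the forward direction their sum is the unperturbed wave (here equal to N)
forward_sum = f1[0] + f2[0]
print(f"max |f1 + f2| for k != 0: {max_mismatch:.1e};  f1 + f2 at k = 0: {forward_sum.real:.1f}")
```

Since |f_1| = |f_2| for every k ≠ 0, the two screens give identical intensity patterns away from the forward direction.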


8.9. Magnetic dipole and electric quadrupole radiation 


Throughout this chapter, we have seen how many important results may be obtained from Eq. 
(26) for the electric dipole radiation by a small-size source (Fig. 1). Only in the rare cases when this 
radiation is absent, for example, if the dipole moment p of the source equals zero (or does not change in 
time, either at all or at the frequency of our interest), may higher-order effects be important. I will now 
discuss the two main such effects: electric quadrupole radiation and magnetic dipole radiation.


In Sec. 2 above, the electric dipole radiation was calculated by plugging the expansion (19) into 
the exact formula (17b) for the retarded vector potential A(r, t). Let us make a more exact calculation 
by keeping the second term of that expansion as well:


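$$\mathbf{j}\left(\mathbf{r}', t-\frac{R}{v}\right) \approx \mathbf{j}\left(\mathbf{r}', t-\frac{r}{v}+\frac{\mathbf{r}'\cdot\mathbf{n}}{v}\right) \equiv \mathbf{j}\left(\mathbf{r}', t'+\frac{\mathbf{r}'\cdot\mathbf{n}}{v}\right), \quad \text{where } t' \equiv t-\frac{r}{v}. \tag{8.130}$$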


Since the expansion is only valid if the last term in the time argument of j is relatively small, in the 
Taylor expansion of j with respect to that argument we may keep just two leading terms: 


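$$\mathbf{j}\left(\mathbf{r}', t'+\frac{\mathbf{r}'\cdot\mathbf{n}}{v}\right) \approx \mathbf{j}(\mathbf{r}', t') + \frac{\partial \mathbf{j}(\mathbf{r}', t')}{\partial t'}\,\frac{\mathbf{r}'\cdot\mathbf{n}}{v}, \tag{8.131}$$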


50 Similar tricks are used in the so-called phased-array antennas, broadly used in radar systems and 
radio astronomy, in which electronically controlled mutual phase shifts of microwave signals feeding many similar 
component antennas are used to steer the direction of the resulting narrow beam. For more on this important 
technology, see, e.g., T. Milligan, Modern Antenna Design, 2nd ed., Wiley (2005).

51 Named after Jacques Babinet (1794-1872), who made several important contributions to optics.
so that Eq. (17b) yields A = A_0 + A', where A_0 is the electric dipole contribution as given by Eq. (23), 
and A' is the new term of the next order in the small parameter r' << r:


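$$\mathbf{A}'(\mathbf{r},t) = \frac{\mu}{4\pi r v}\,\frac{\partial}{\partial t}\int \mathbf{j}(\mathbf{r}',t')\,(\mathbf{r}'\cdot\mathbf{n})\,d^3r'. \tag{8.132}$$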


Just as was done in Sec. 2, let us evaluate this term for a system of non-relativistic particles 
with electric charges q_k and radius vectors r_k(t):


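$$\mathbf{A}'(\mathbf{r},t) = \frac{\mu}{4\pi r v}\,\frac{d}{dt}\sum_k q_k\dot{\mathbf{r}}_k(\mathbf{r}_k\cdot\mathbf{n})\Big|_{t=t'}. \tag{8.133}$$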


Using the “bac minus cab” identity of vector algebra again,⁵² the vector operand of Eq. (133) may be 
rewritten as


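$$\dot{\mathbf{r}}_k(\mathbf{r}_k\cdot\mathbf{n}) \equiv \frac{1}{2}\dot{\mathbf{r}}_k(\mathbf{r}_k\cdot\mathbf{n}) + \frac{1}{2}\dot{\mathbf{r}}_k(\mathbf{n}\cdot\mathbf{r}_k) = \frac{1}{2}(\mathbf{r}_k\times\dot{\mathbf{r}}_k)\times\mathbf{n} + \frac{1}{2}\mathbf{r}_k(\mathbf{n}\cdot\dot{\mathbf{r}}_k) + \frac{1}{2}\dot{\mathbf{r}}_k(\mathbf{n}\cdot\mathbf{r}_k) \equiv \frac{1}{2}(\mathbf{r}_k\times\dot{\mathbf{r}}_k)\times\mathbf{n} + \frac{1}{2}\frac{d}{dt}\left[\mathbf{r}_k(\mathbf{n}\cdot\mathbf{r}_k)\right], \tag{8.134}$$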
so that the right-hand side of Eq. (133) may be represented as a sum of two terms, A' = A_m + A_q, where
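$$\mathbf{A}_m(\mathbf{r},t) = \frac{\mu}{4\pi r v}\,\dot{\mathbf{m}}\left(t-\frac{r}{v}\right)\times\mathbf{n}, \quad \text{with } \mathbf{m} \equiv \frac{1}{2}\sum_k \mathbf{r}_k\times q_k\dot{\mathbf{r}}_k, \tag{8.135}$$

$$\mathbf{A}_q(\mathbf{r},t) = \frac{\mu}{8\pi r v}\,\frac{d^2}{dt^2}\sum_k q_k\mathbf{r}_k(\mathbf{n}\cdot\mathbf{r}_k)\Big|_{t=t'}. \tag{8.136}$$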
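Because the split of Eq. (134) is a pure vector identity, it is easy to spot-check numerically. The sketch below (helper names `cross`, `dot`, and `split_term` are hypothetical) draws random vectors for r_k, its time derivative, and n, and confirms that the "magnetic" part ½(r × ṙ) × n plus the "quadrupole" part ½ d/dt[r(n·r)] = ½[ṙ(n·r) + r(n·ṙ)] (for a fixed n) reproduce ṙ(r·n).

```python
import random

def cross(a, b):
    """Vector product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def split_term(r, rdot, n):
    """Return the two parts of rdot*(r.n) given by Eq. (134): the 'magnetic' part
    (1/2)(r x rdot) x n and the 'quadrupole' part (1/2)[rdot (n.r) + r (n.rdot)]."""
    magnetic = tuple(0.5 * c for c in cross(cross(r, rdot), n))
    quadrupole = tuple(0.5 * (v * dot(n, r) + x * dot(n, rdot)) for v, x in zip(rdot, r))
    return magnetic, quadrupole

random.seed(0)
worst = 0.0
for _ in range(10_000):
    r = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
    rdot = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
    n = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
    lhs = tuple(v * dot(r, n) for v in rdot)
    mag, quad = split_term(r, rdot, n)
    worst = max(worst, max(abs(a - (b + c)) for a, b, c in zip(lhs, mag, quad)))
print(f"largest deviation over random samples: {worst:.1e}")
```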


Comparing the second of Eqs. (135) with Eq. (5.91), we see that m is just the total magnetic 
moment of the source. On the other hand, the first of Eqs. (135) is absolutely similar in structure to Eq. 
(23), with p replaced by (m × n)/v, so that for the corresponding component of the magnetic field it 
gives (in the same approximation r >> λ) a result similar to Eq. (24):


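$$\mathbf{B}_m(\mathbf{r},t) = \frac{\mu}{4\pi r v^2}\,\mathbf{n}\times\left[\mathbf{n}\times\ddot{\mathbf{m}}\left(t-\frac{r}{v}\right)\right]. \tag{8.137}$$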


According to this expression, just as in the case of electric dipole radiation, the vector B is perpendicular to the 
vector n, and its magnitude is also proportional to sinΘ, where Θ is now the angle between the direction 
toward the observation point and the second time derivative of the vector m, rather than p:


$$B_m = \frac{\mu}{4\pi r v^2}\left|\ddot{\mathbf{m}}\left(t-\frac{r}{v}\right)\right|\sin\Theta. \tag{8.138}$$


As a result, the intensity of this magnetic dipole radiation has a similar angular distribution:

$$S_r = ZH^2 = \frac{Z}{(4\pi v^2 r)^2}\left|\ddot{\mathbf{m}}\left(t-\frac{r}{v}\right)\right|^2\sin^2\Theta \tag{8.139}$$


52 If you still need it, see MA Eq. (7.5). 
(cf. Eq. (26)), apart from the (generally) different meaning of the angle Θ.


Note, however, that this radiation is usually much weaker than its electric-dipole counterpart. 
For example, for a non-relativistic particle with electric charge q, moving on a trajectory of linear size 
~a, the electric dipole moment is of the order of qa, while its magnetic moment scales as qa²ω, where ω 
is the motion frequency. Since Eq. (139) differs from its electric-dipole analog by the replacement 
|p̈| → |m̈|/v, the ratio of the magnetic and electric dipole radiation intensities is of the order of 
(qa²ω³)²/v²(qaω²)² = (aω/v)², i.e. the squared ratio of the particle’s speed to the speed of the emitted 
waves, which has to be much smaller than 1 for our non-relativistic calculation to be valid.
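To get a feeling for the size of (aω/v)², here is an order-of-magnitude sketch for an atom-sized system. Only the scales a and ω are borrowed from a classical electron orbit of the Bohr radius around a proton (an assumption of this illustration; for a strictly uniform circular orbit, m is constant and would not radiate at all). The constants are CODATA 2018 values in SI units.

```python
import math

# CODATA 2018 values, SI units
E = 1.602176634e-19        # elementary charge, C
EPS0 = 8.8541878128e-12    # electric constant, F/m
M_E = 9.1093837015e-31     # electron mass, kg
C = 299_792_458.0          # speed of light (= v for waves in free space), m/s
A0 = 5.29177210903e-11     # Bohr radius, m (used as the trajectory size a)

# Classical circular orbit of radius a: m v^2 / a = e^2 / (4 pi eps0 a^2)
speed = math.sqrt(E**2 / (4 * math.pi * EPS0 * M_E * A0))
omega = speed / A0                 # motion frequency
ratio = (A0 * omega / C) ** 2      # magnetic/electric dipole intensity ratio ~ (a omega / v)^2
print(f"particle speed = {speed:.4e} m/s, (a*omega/c)^2 = {ratio:.2e}")
```

The result, about 5×10⁻⁵, coincides with the square of the fine-structure constant, because for this orbit aω/c = α ≈ 1/137.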


The angular distribution of the electric quadrupole radiation, described by Eq. (136), is more 
complicated. To show this, we may add to A_q a vector parallel to n (i.e. along the wave’s propagation 
direction), getting


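$$\mathbf{A}_q(\mathbf{r},t) \to \frac{\mu}{24\pi r v}\,\ddot{\boldsymbol{\mathcal{Q}}}\left(t-\frac{r}{v}\right), \quad \text{where} \quad \boldsymbol{\mathcal{Q}} \equiv \sum_k q_k\left[3\mathbf{r}_k(\mathbf{n}\cdot\mathbf{r}_k) - \mathbf{n}r_k^2\right], \tag{8.140}$$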
because this addition does not contribute to the transverse components of the electric and 
magnetic fields, i.e. to the radiated wave. According to the above definition of the vector 𝒬, its Cartesian 
components may be represented as


$$\mathcal{Q}_j = \sum_{j'=1}^{3}\mathcal{Q}_{jj'}\,n_{j'}, \tag{8.141}$$


where 𝒬_jj' are the elements of the electric quadrupole tensor of the system (see the last of Eqs. (3.4)):⁵³


$$\mathcal{Q}_{jj'} = \sum_k q_k\left(3r_jr_{j'} - r^2\delta_{jj'}\right)_k. \tag{8.142}$$


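Now taking the curl of the first of Eqs. (140) at r >> λ, we get

$$\mathbf{B}_q(\mathbf{r},t) = -\frac{\mu}{24\pi r v^2}\,\mathbf{n}\times\dddot{\boldsymbol{\mathcal{Q}}}\left(t-\frac{r}{v}\right). \tag{8.143}$$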

This expression is similar to Eqs. (24) and (137), but according to Eqs. (140) and (142), the components of 
the vector 𝒬 do depend on the direction of the vector n, leading to a different angular dependence of S_r.


As the simplest example, let us consider the system of two equal point electric charges moving 
symmetrically, at equal distances d(t) << λ from a stationary center (see Fig. 16).


[Figure: two equal charges q, each at distance d(t) from the center; n is the direction toward the observation point.]
Fig. 8.16. The simplest system emitting electric quadrupole radiation.


Due to the symmetry of the system, its dipole moments p and m (and hence its electric and 
magnetic dipole radiation) vanish, but the quadrupole tensor (142) still has non-zero elements. With the 
coordinate choice shown in Fig. 16, the tensor is diagonal, with the following elements:


53 Let me hope that the reader has already acquired some experience in the calculation of this tensor’s elements — 
e.g., for the simple systems specified in Problems 3.2-3.4. 
$$\mathcal{Q}_{xx} = \mathcal{Q}_{yy} = -2qd^2, \quad \mathcal{Q}_{zz} = 4qd^2. \tag{8.144}$$


With the x-axis selected within the common plane of the z-axis and the direction n toward the 
observation point (Fig. 16), so that n_x = sinΘ, n_y = 0, and n_z = cosΘ, Eq. (141) yields


$$\mathcal{Q}_x = -2qd^2\sin\Theta, \quad \mathcal{Q}_y = 0, \quad \mathcal{Q}_z = 4qd^2\cos\Theta, \tag{8.145}$$


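and the vector product in Eq. (143) has only one non-vanishing Cartesian component:

$$\left(\mathbf{n}\times\dddot{\boldsymbol{\mathcal{Q}}}\right)_y = n_z\dddot{\mathcal{Q}}_x - n_x\dddot{\mathcal{Q}}_z = -6q\sin\Theta\cos\Theta\,\frac{d^3}{dt^3}\left[d^2(t)\right]. \tag{8.146}$$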
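The chain of Eqs. (141)-(146) for this two-charge system may be spot-checked in a few lines. The sketch below builds the tensor (142) for point charges (the helper `quadrupole_tensor` and the values q = 1, d = 0.5, Θ = 30° are hypothetical), contracts it with n as in Eq. (141), and evaluates the y-component of n × 𝒬. Since the time derivatives act only on the common factor d², checking the angular structure on 𝒬 itself, rather than on its third derivative, is sufficient.

```python
import math

def quadrupole_tensor(charges):
    """Eq. (142): Q_jj' = sum_k q_k (3 r_j r_j' - r^2 delta_jj') for [(q, (x, y, z)), ...]."""
    Q = [[0.0] * 3 for _ in range(3)]
    for q_k, r in charges:
        r2 = sum(c * c for c in r)
        for j in range(3):
            for jp in range(3):
                Q[j][jp] += q_k * (3 * r[j] * r[jp] - (r2 if j == jp else 0.0))
    return Q

q, d = 1.0, 0.5                                        # hypothetical charge and displacement
charges = [(q, (0.0, 0.0, d)), (q, (0.0, 0.0, -d))]    # the symmetric pair of Fig. 16
Q = quadrupole_tensor(charges)
print(Q)  # should equal diag(-2qd^2, -2qd^2, 4qd^2), as in Eq. (144)

theta = math.radians(30.0)
n = (math.sin(theta), 0.0, math.cos(theta))
Qvec = [sum(Q[j][jp] * n[jp] for jp in range(3)) for j in range(3)]  # Eq. (141)
cross_y = n[2] * Qvec[0] - n[0] * Qvec[2]                           # (n x Q)_y
print(cross_y, -6 * q * d**2 * math.sin(theta) * math.cos(theta))   # cf. Eq. (146)
```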


As a result, the quadrupole radiation intensity, S_r ∝ B_q², is proportional to sin²Θ cos²Θ, i.e. vanishes not 
only along the symmetry axis of the system (as the electric-dipole and magnetic-dipole radiation intensities 
would), but also in all directions perpendicular to this axis, reaching its maxima at Θ = π/4 and Θ = 3π/4.


For more complex systems, the angular distribution of the electric quadrupole radiation may be 
different, but it may be proved that its total (instant) power always obeys the following simple formula: 


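$$\mathcal{P} = \frac{Z}{720\pi v^4}\sum_{j,j'=1}^{3}\left(\dddot{\mathcal{Q}}_{jj'}\right)^2. \tag{8.147}$$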
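For the two-charge system of Fig. 16, the general power formula P = Z/(720πv⁴) Σ_jj' (d³𝒬_jj'/dt³)² may be cross-checked against a direct integration of S_r over all directions. In the sketch below, the field magnitude |B_q| = μq|sinΘ cosΘ|·|d³(d²)/dt³|/(4πrv²) follows from the expressions for B_q and its y-component above. The unit values of q, Z, v, and d³(d²)/dt³, and the helper names, are hypothetical. The script also locates the direction of maximum intensity on a 0.1° grid.

```python
import math

def power_by_formula(q_dddot, Z, v):
    """P = Z / (720 pi v^4) * sum over j, j' of (third time derivative of Q_jj')^2."""
    return Z / (720 * math.pi * v**4) * sum(x * x for row in q_dddot for x in row)

def two_charge_power_by_integration(q, d2_dddot, Z, v, n_theta=2000):
    """Integrate r^2 S_r over all directions for the pair of Fig. 16, using
    S_r = Z H^2 with H = |B_q| / mu = q |sin(T) cos(T) d2'''| / (4 pi r v^2)."""
    dtheta = math.pi / n_theta
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * dtheta  # midpoint rule in the polar angle
        r2_s = Z * (q * math.sin(theta) * math.cos(theta) * d2_dddot / (4 * math.pi * v**2)) ** 2
        total += r2_s * 2 * math.pi * math.sin(theta) * dtheta  # dOmega = 2 pi sin(T) dT
    return total

# Hypothetical units: q = 1, d^3(d^2)/dt^3 = 1, Z = 1, v = 1
q, D3, Z, v = 1.0, 1.0, 1.0, 1.0
Q3 = [[-2 * q * D3, 0.0, 0.0], [0.0, -2 * q * D3, 0.0], [0.0, 0.0, 4 * q * D3]]  # d^3/dt^3 of Eq. (144)
P_formula = power_by_formula(Q3, Z, v)
P_int = two_charge_power_by_integration(q, D3, Z, v)

# Direction of maximum intensity within 0 <= Theta <= pi/2
theta_max = max((k * math.pi / 1800 for k in range(901)),
                key=lambda t: (math.sin(t) * math.cos(t)) ** 2)
print(P_formula, P_int, math.degrees(theta_max))
```

Both routes give P = Zq²[d³(d²)/dt³]²/(30πv⁴), and the maximum falls at Θ = 45°.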


Let me finish this section by giving, also without proof, one more fact important for some 
applications: due to their different spatial structure, the magnetic-dipole and electric-quadrupole 
radiation fields do not interfere, i.e. the total power of radiation (neglecting the electric-dipole and 
higher multipole terms) may be found as the sum of these components, calculated independently. Note, 
however, that in a particular direction, the fields of different multipole components of the same system 
(for example, the electric-dipole and magnetic-dipole radiation fields) generally do interfere, so that the 
angular distribution of the intensity has to be found by summing up these fields, rather than intensities; 
such interference terms vanish only after the integration over all directions.


8.10. Exercise problems 


8.1. Equation (8.8) of the lecture notes obviously has standing-wave solutions χ(r, t) = Re[C sin kr 
exp{−iωt}], turning the scalar potential φ = χ/r into a finite constant at r = 0 and into zero at kr = πn, 
with n = 1, 2, 3,... This fact seems to imply that a metallic cavity of radius R has resonant modes with a 
purely radial electric field E(r) = n_r E(r), and the lowest nonvanishing of them, with k = π/R, gives the 
lowest (fundamental) frequency ω = vk = π(v/R) of the cavity. Is this conclusion correct?


8.2. Simplify the Lorentz reciprocity theorem (6.121) for space-localized field sources. Then find 
out what it says about the fields of two compact, well-separated sources of the electric-dipole radiation. 


8.3. In the electric-dipole approximation, calculate the angular distribution and the total power of 
electromagnetic radiation by the hydrogen atom within the following classical model: an electron 
rotates, at a constant distance R, about a much heavier proton. Use this result to calculate the law of a 
gradual reduction of R in time. Finally, evaluate the classical lifetime of the atom, borrowing the initial 
value of R from quantum mechanics: R(0) = r_B ≈ 0.53×10⁻¹⁰ m.


8.4. A non-relativistic particle of mass m, with electric charge q, is placed into a uniform, time- 
independent magnetic field B. Derive the law of decrease of the particle’s kinetic energy due to its 
electromagnetic radiation at the cyclotron frequency ω_c = qB/m. Evaluate the rate of such radiation
cooling of electrons in a magnetic field of 1 T, and estimate the energy interval in which this result is 
quantitatively correct. 


Hint: The cyclotron motion will be discussed in detail (for arbitrary particle velocities v ~ c) in 
Sec. 9.6 below, but I hope that the reader knows that in the non-relativistic case (v << c) the above 
formula for ω_c may be readily obtained by combining the 2nd Newton law m v_⊥²/R = q v_⊥ B for the circular 
motion of the particle under the effect of the magnetic component of the Lorentz force (5.10), and the 
geometric relation v_⊥ = Rω_c. (Here v_⊥ is the particle’s velocity in the plane normal to the vector B.)


8.5. A particle with mass m, electric charge q, and kinetic energy T collides head-on with a 
much more massive particle of charge %q, in free space. Calculate the total energy of electromagnetic 
radiation during this collision, assuming it to be much lower than T.


8.6. Solve the dipole antenna radiation problem discussed in Sec. 2 (see Fig. 3) for the optimal 
length l = λ/2, assuming that the current distribution in each of its arms is sinusoidal: I(z, t) = I_0 cos(πz/l) 
cos ωt.⁵⁴


8.7. A plane wave is scattered by a localized object in free space. Relate the differential cross- 
section of the wave’s scattering to the average force it exerts on the object. Use this general relation to 
calculate the force exerted by a plane monochromatic wave on a free non-relativistic particle, compare 
the result with those obtained in Problems 7.4 and 7.5, and discuss the comparison. 


8.8. Use the Lorentz oscillator model of a bound charge, given by Eq. (7.30), to explore the 
transition between the two scattering limits discussed in Sec. 3, and in particular, the resonant scattering 
taking place at ω = ω_0. In this context, discuss the contribution of scattering to the oscillator’s damping.


8.9.* A sphere of radius R, made of a material with a uniform permanent electric polarization P_0 
and a constant mass density ρ, is free to rotate about its center. Calculate the average total cross-section 
of scattering, by the sphere, of a linearly polarized electromagnetic wave of frequency ω << c/R, 
propagating in free space, in the limit of small wave amplitude, assuming that the initial orientation of 
the polarization vector P_0 is random.


8.10. Use Eq. (56) to analyze the interference/diffraction pattern produced by a plane wave’s 
scattering on a set of N similar, equidistant points on a straight line normal to the direction of the 
incident wave’s propagation (see the figure on the right). Discuss the trend(s) of the pattern in the 
limit N → ∞.


8.11. Use the Born approximation to calculate the differential cross-section of the plane wave 
scattering by a uniform dielectric sphere of an arbitrary radius R. In the limits kR << 1 and 1 << kR
(where k is the wave number), analyze the angular dependence of the differential cross-section and 
calculate the total cross-section of scattering. 


54 As was emphasized in Sec. 2, this is a reasonable guess rather than a controllable approximation. The exact 
(rather involved!) theory shows that this assumption gives errors ~5%, depending on the wire’s diameter. 
8.12. A sphere of radius R is made of a uniform dielectric material, with an arbitrary dielectric
constant. Derive an exact expression for its total cross-section of scattering of a linearly-polarized low- 
frequency (k << 1/R) wave and compare the result with the solution of the previous problem. 


8.13. Use the Born approximation to calculate the differential cross-section of the plane wave 
scattering on a right, circular cylinder of length l and radius R, for an arbitrary angle of incidence.


8.14. Formulate the quantitative condition of the Born approximation’s validity for a uniform 
dielectric scatterer, with all linear dimensions of the order of the same scale a. 


8.15. If a scatterer absorbs some part of the incident wave’s power, it may be characterized by an 
absorption cross-section σ_a, defined similarly to Eq. (39) for the scattering cross-section:

$$\sigma_a \equiv \frac{\overline{\mathcal{P}}_a}{|E_0|^2/2Z_0},$$

where the numerator is the time-averaged power absorbed by the scatterer. Use two different approaches 
to calculate σ_a of a very small sphere of radius R << k⁻¹, δ_s, made of a nonmagnetic material with an 
Ohmic conductivity σ and the high-frequency permittivity ε_opt = ε_0. Can σ_a of such a sphere be larger 
than its geometric cross-section πR²?


8.16. Use the Huygens principle to calculate the wave’s intensity on the symmetry plane of the 
slit diffraction experiment (i.e. at x = 0 in Fig. 12), for an arbitrary ratio z/ka².


8.17. A plane wave with wavelength λ is normally 
incident on an opaque, plane screen, with a round orifice of 
radius R >> λ. Use the Huygens principle to calculate the
passing wave’s intensity distribution along the system’s 
symmetry axis, at distances z >> R from the screen (see the 
figure on the right), and analyze the result. 


8.18. A plane monochromatic wave is now 
normally incident on an opaque circular disk of radius 
R >> λ. Use the Huygens principle to calculate the wave’s
intensity at a distance z >> R behind the disk’s center 
(see the figure on the right). Discuss the result. 


8.19. Use the Huygens principle to analyze the Fraunhofer diffraction of a plane wave normally 
incident on a square-shaped hole, of size a×a, in an opaque screen. Sketch the diffraction pattern you
would observe at a sufficiently large distance, and quantify the expression “sufficiently large” for this 
case. 


8.20. Use the Huygens principle to analyze the propagation of a monochromatic Gaussian beam 
described by Eq. (7.181), with the initial characteristic width a_0 >> λ, in a uniform, isotropic medium. 
Use the result for a semi-quantitative derivation of the so-called Abbe limit for the spatial resolution of 
an optical system: w_min = λ/(2 sin α), where α is the half-angle of the wave cone propagating from the 
object and captured by the system.


8.21. Within the Fraunhofer approximation, 
analyze the pattern produced by a 1D diffraction 
grating with the periodic transparency profile shown 
in the figure on the right, for the normal incidence of 
a plane, monochromatic wave. 


8.22. N equal point charges are attached, at equal intervals, to a circle 
rotating with a constant angular velocity about its center — see the figure on the 
right. For what values of N does the system emit: 


(i) the electric dipole radiation? 
(ii) the magnetic dipole radiation? 
(iii) the electric quadrupole radiation?


8.23. What general statements can you make about: 


(i) the electric dipole radiation, and
(ii) the magnetic dipole radiation, 


due to a collision of an arbitrary number of similar classical, non-relativistic particles? 


8.24. Calculate the angular distribution and the total power radiated into free space by a small round loop 
antenna with radius R, fed with an ac current I(t) of frequency ω and amplitude I_0.


8.25. The orientation of a magnetic dipole m of a fixed magnitude is rotating about a certain 
axis with angular velocity ω, with the angle θ between them staying constant. Calculate the angular 
distribution and the average power of its radiation (into free space).


8.26. Solve Problem 12 (also in the low-frequency limit kR << 1) for the case when the sphere’s 
material has a frequency-independent Ohmic conductivity σ and ε_opt = ε_0, in two limits: 


(i) of a very large skin depth (δ_s >> R), and 
(ii) of a very small skin depth (δ_s << R).


8.27. Complete the solution of the problem started in Sec. 9, by calculating the full power of 
radiation of the system of two charges oscillating in antiphase along the same straight line — see Fig. 16. 
Also, calculate the average radiation power for the case of harmonic oscillations, d(t) = a cos ωt,
compare it with the case of a single charge performing similar oscillations, and interpret the difference. 


8.28. The system of four alternating charges located at the angles of a square, considered in 
Problem 3.3(i), is now being rotated around the axis normal to their plane and passing through the 
square’s center, with a constant angular frequency ω << v/a. Calculate the time-averaged angular
distribution and the total power of the resulting radiation. 
Chapter 9. Special Relativity


This chapter starts with a review of special relativity’s basics, including its very convenient 4-vector 
formalism. This background is then used for the analysis of the relation between the electromagnetic 
field’s values measured in different inertial reference frames moving relative to each other. The results 
enable us to discuss relativistic particle dynamics in the electric and magnetic fields, and the analytical 
mechanics of the particles — and of the electromagnetic field as such. 


9.1. Einstein postulates and the Lorentz transform 


As was emphasized in the derivation of expressions for the dipole and quadrupole radiation in 
the last chapter, they are only valid for systems of non-relativistic particles moving with velocities u
much lower than c. In order to generalize these results to particles moving with arbitrary u, we need help 
from the relativity theory. Moreover, an analysis of the motion of charged relativistic particles in electric 
and magnetic fields is also a natural part of electrodynamics. This is why I will follow the tradition of 
using this course for a (by necessity, brief) introduction to the special relativity theory. This theory is 
based on the fundamental idea that measurements of physical variables (including the spatial and even 
temporal intervals between two events) may give different results in different reference frames, in 
particular in two inertial frames moving relative to each other translationally (i.e. without rotation), with 
a certain constant velocity v (Fig. 1). 


Fig. 9.1. The translational, uniform 
mutual motion of two reference frames. 


In non-relativistic (Newtonian) mechanics, the problem of transfer between such reference 
frames has a simple solution, at least in the limit v << c, because the basic equation of particle dynamics 
(the 2nd Newton law)¹


$$m_k\ddot{\mathbf{r}}_k = -\nabla_k\sum_{k'} U(\mathbf{r}_k - \mathbf{r}_{k'}), \tag{9.1}$$


where U is the potential energy of inter-particle interactions, is invariant with respect to the so-called 
Galilean transformation (or just “transform” for short).² Choosing the coordinates in both frames so that
their axes x and x’ are parallel to the vector v (as in Fig. 1), the transform may be represented as 


1 Let me hope that the reader does not need a reminder that for Eq. (1) to be valid, the reference frames 0 and 0’
have to be inertial — see, e.g., CM Sec. 1.2. 

2 It was first formulated by Galileo Galilei, if only rather informally, as early as 1638, four years before 
Isaac Newton was born! Note also the very unfortunate term “boost”, sometimes used to describe such 
translational transformations. (It is especially unnatural in special relativity, where such transformations do not 
describe accelerations.) In my course, this term is avoided, with the equivalent “transform” used instead.


© K. Likharev 
$$x = x' + vt', \quad y = y', \quad z = z', \quad t = t', \tag{9.2a}$$


and plugging Eq. (2a) into Eq. (1), we get an absolutely similar-looking equation of motion in the 
“moving” reference frame 0’. Since the reciprocal transform,


$$x' = x - vt, \quad y' = y, \quad z' = z, \quad t' = t, \tag{9.2b}$$


is similar to the direct one, with the replacement of (+v) with (−v), we may say that the Galilean 
invariance means that there is no “master” (absolute) spatial reference frame in classical mechanics, 
although the spatial and temporal intervals between different instant events are absolute, i.e. 
reference-frame invariant: Δx = Δx’, ..., Δt = Δt’.


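However, it is straightforward to use Eq. (2) to check that the form of the wave equation

$$\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\right)f = 0, \tag{9.3}$$

describing, in particular, the electromagnetic wave propagation in free space,³ is not Galilean-invariant.⁴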
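A short way to see the non-invariance: keep only the x-dependence (y and z are not affected by Eq. (2)), and apply the chain rule to f(x, t) = f(x' + vt', t'). This gives ∂/∂x' = ∂/∂x and ∂/∂t' = ∂/∂t + v∂/∂x, i.e. ∂/∂t = ∂/∂t' − v∂/∂x', so that the operator of Eq. (3), written in the primed variables, becomes

```latex
\frac{\partial^2 f}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2 f}{\partial t^2}
  = \left(1 - \frac{v^2}{c^2}\right)\frac{\partial^2 f}{\partial x'^2}
  + \frac{2v}{c^2}\,\frac{\partial^2 f}{\partial x'\,\partial t'}
  - \frac{1}{c^2}\frac{\partial^2 f}{\partial t'^2}.
```

For any v ≠ 0, the mixed-derivative term and the modified coefficient of the first term make this expression differ in form from the original one, so that a wave propagating with speed c in frame 0 does not propagate with the same speed c in frame 0'.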
For the “usual” (say, elastic) waves, which obey a similar equation albeit with a different speed,⁵ this 
lack of Galilean invariance is natural and is compatible with the invariance of Eq. (1), from which the 
wave equation originates. This is because the elastic waves are essentially the oscillations of interacting 
particles of a certain medium (e.g., an elastic solid), making the reference frame connected to this 
medium special. So, if the electromagnetic waves were oscillations of a certain special medium (which 
was first called the “luminiferous aether”⁶ and later aether, or just “ether”), similar arguments might be 
applicable to reconcile Eqs. (2) and (3).


The detection of such a medium was the goal of the measurements carried out between 1881 and 
1887 (with better and better precision) by Albert Abraham Michelson and Edward Williams Morley, 
which are sometimes called “the most famous failed experiments in physics”. Figure 2 shows a crude 
scheme of these experiments. 


[Figure: light source, semi-transparent mirror, two mirrors, and detector.]
Fig. 9.2. The Michelson-Morley experiment.


3 The discussions in this chapter and most of the next chapter will be restricted to the free-space (and hence 
dispersion-free) case; some media effects on the radiation by relativistic particles will be discussed in Sec. 10.4.

4 It is interesting that the usual (non-relativistic) Schrödinger equation, whose fundamental solution for a free
particle is a similar monochromatic wave (albeit with a different dispersion law), is Galilean-invariant, with a 
certain change of the wavefunction’s phase — see, e.g., QM Chapter 1. 

5 See, e.g., CM Secs. 6.5 and 7.7. 

6 In ancient Greek mythology, aether is the clean air breathed by the gods residing on Mount Olympus. 
A nearly monochromatic wave from a light source is split into two parts (optimally, of equal 
intensity), using a semi-transparent mirror tilted by angle π/4 to the incident wave direction. These two 
partial waves are reflected back by two fully reflecting mirrors and arrive at the same semi-transparent 
mirror again. Here half of each wave is directed toward the light source (where it vanishes without 
affecting the source), while the other half is passed toward an intensity detector, forming, with its 
counterpart, an interference pattern similar to that in the Young experiment. Thus each of the interfering 
waves has traveled twice (back and forth) along one of two mutually perpendicular “arms” of the 
interferometer. Assuming that the aether, in which light propagates with speed c, moves with speed v < c 
along one of the arms, of length l_∥, it is straightforward (and hence left for the reader’s exercise :-) to get 
the following expression for the difference between the light roundtrip times:


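$$\Delta t = \frac{2l_\parallel}{c}\,\frac{1}{1 - v^2/c^2} - \frac{2l_\perp}{c}\,\frac{1}{(1 - v^2/c^2)^{1/2}} \approx \frac{l}{c}\left(\frac{v}{c}\right)^2, \tag{9.4}$$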


where l_⊥ is the length of the second, “transverse” arm of the interferometer (perpendicular to v), and the 
last, approximate expression is valid at l_⊥ = l_∥ ≡ l and v << c.


Since the Earth moves around the Sun with a speed v_E ≈ 30 km/s ≈ 10⁻⁴c, the arm positions 
relative to this motion alternate, due to the Earth’s rotation about its axis, every 6 hours (see the right 
panel of Fig. 2). Hence, if we assume that the aether rests in the Sun’s reference frame, then Δt (and the 
corresponding shift of the interference fringes) has to change its sign with this half-period as well. The 
same alternation may be achieved, on a smaller time scale, by a deliberate rotation of the instrument by 
π/2. In the most precise version of the Michelson-Morley experiment (circa 1887), this shift was 
expected to be close to 0.4 of the interference pattern period. The results of the search for such a shift 
were negative, with an error bar of about 0.01 of the period.⁷
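The expected effect size may be reproduced from Eq. (4) with a few lines of arithmetic. The effective arm length of 11 m and the wavelength of 0.5 μm used below are illustrative assumptions, not values taken from the text; rotating the instrument by π/2 flips the sign of Δt, so the expected fringe shift, in units of the pattern period, is 2cΔt/λ. The last line anticipates Eq. (5): contracting the longitudinal arm by the factor (1 − v²/c²)^{1/2} makes Δt vanish.

```python
import math

C = 299_792_458.0  # speed of light in free space, m/s

def roundtrip_difference(l_par, l_perp, v, c=C):
    """Exact form of Eq. (4): roundtrip time along the arm parallel to v minus that along the transverse arm."""
    gamma2 = 1.0 / (1.0 - (v / c) ** 2)            # 1 / (1 - v^2/c^2)
    t_par = 2.0 * l_par / c * gamma2                # there and back along the "aether wind"
    t_perp = 2.0 * l_perp / c * math.sqrt(gamma2)  # there and back across it
    return t_par - t_perp

def roundtrip_difference_approx(l, v, c=C):
    """Leading-order form (l/c)(v/c)^2, valid for l_par = l_perp = l and v << c."""
    return l / c * (v / c) ** 2

v_earth = 3.0e4       # Earth's orbital speed, m/s (about 1e-4 c)
l_arm = 11.0          # assumed effective arm length, m (illustrative)
wavelength = 5.0e-7   # assumed light wavelength, m (illustrative)

dt_exact = roundtrip_difference(l_arm, l_arm, v_earth)
dt_approx = roundtrip_difference_approx(l_arm, v_earth)
fringe_shift = 2.0 * C * dt_exact / wavelength   # in units of the interference-pattern period
dt_contracted = roundtrip_difference(l_arm * math.sqrt(1.0 - (v_earth / C) ** 2), l_arm, v_earth)

print(f"dt exact = {dt_exact:.4e} s, dt approx = {dt_approx:.4e} s")
print(f"expected fringe shift ~ {fringe_shift:.2f} of the period")
print(f"dt with the contracted longitudinal arm = {dt_contracted:.1e} s")
```

With these assumed values, the shift comes out near 0.44 of the period, consistent with the "close to 0.4" estimate quoted above.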


The most prominent immediate explanation of this zero result⁸ was suggested in 1889 by George 
Francis FitzGerald and (independently and more quantitatively) by H. Lorentz in 1892: as evident from 
Eq. (4), if the longitudinal arm of the interferometer itself experiences the so-called length contraction,


$$l_\parallel(v) = l_\parallel(0)\left(1 - \frac{v^2}{c^2}\right)^{1/2}, \tag{9.5}$$


while the transverse arm’s length is not affected by its motion through the aether, this effect kills the 
shift Δt. This radical idea received strong support from the proof, in 1887-1905, that the Maxwell 
equations, and hence the wave equation (3), are form-invariant under the so-called Lorentz transform,⁹ 
which, in particular, describes Eq. (5). For the choice of coordinates shown in Fig. 1, the transform reads


7 Through the 20th century, the Michelson-Morley-type experiments were repeated using more and more refined
experimental techniques, always with zero results for the apparent aether motion speed. For example, recent 
experiments using cryogenically cooled optical resonators have reduced the upper limit for such speed to just 
3x107'°c (see H. Müller et al., Phys. Rev. Lett. 91, 020401 (2003)).

8 The zero result of a slightly later experiment, namely a precise measurement of the torque which should be 
exerted by the moving aether on a charged capacitor, carried out in 1903 by F. Trouton and H. Noble (following 
G. FitzGerald’s suggestion), seconded the conclusions of Michelson and Morley.

9 The theoretical work toward this result included important contributions by Woldemar Voigt (in 1887), Hendrik
Lorentz (in 1892-1904), Joseph Larmor (in 1897 and 1900), and Henri Poincaré (in 1900 and 1905). 
$$x = \frac{x' + vt'}{(1 - v^2/c^2)^{1/2}}, \quad y = y', \quad z = z', \quad t = \frac{t' + (v/c^2)x'}{(1 - v^2/c^2)^{1/2}}. \tag{9.6a}$$


It is elementary to solve these equations for the primed coordinates to get the reciprocal transform 


$$x' = \frac{x - vt}{(1 - v^2/c^2)^{1/2}}, \quad y' = y, \quad z' = z, \quad t' = \frac{t - (v/c^2)x}{(1 - v^2/c^2)^{1/2}}. \tag{9.6b}$$

(I will soon represent Eqs. (6) in a more elegant form; see Eqs. (19) below.)
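A quick numerical sketch of two properties of Eqs. (6): the direct and reciprocal transforms undo each other for any |v| < c, and at v << c the result is practically indistinguishable from the Galilean formulas (2). The event coordinates and velocities below, and the function names, are arbitrary choices of this illustration.

```python
import math

def lorentz_to_unprimed(x_p, t_p, v, c):
    """Eq. (6a): coordinates (x', t') of an event in frame 0' -> (x, t) in frame 0."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (x_p + v * t_p), g * (t_p + v * x_p / c**2)

def lorentz_to_primed(x, t, v, c):
    """Eq. (6b): the reciprocal transform, i.e. Eq. (6a) with v replaced by -v."""
    return lorentz_to_unprimed(x, t, -v, c)

c = 299_792_458.0          # m/s
x_p, t_p = 1.0e3, 2.0e-6   # an arbitrary event in frame 0' (1 km, 2 microseconds)

# The two transforms undo each other, here for v = 0.8 c
x, t = lorentz_to_unprimed(x_p, t_p, 0.8 * c, c)
x_back, t_back = lorentz_to_primed(x, t, 0.8 * c, c)

# For v << c the result approaches the Galilean transform x = x' + v t', t = t'
v_slow = 30.0  # m/s
x_slow, t_slow = lorentz_to_unprimed(x_p, t_p, v_slow, c)
print(x_back - x_p, t_back - t_p)
print(x_slow - (x_p + v_slow * t_p), (t_slow - t_p) / t_p)
```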


The Lorentz transform relations (6) evidently reduce to the Galilean transform formulas (2) 
at v² << c². However, all attempts to give a reasonable interpretation of these equalities while keeping 
the notion of the aether failed, in particular because of the restrictions imposed by the results of earlier 
experiments carried out in 1851 and 1853 by Hippolyte Fizeau, which were repeated with higher 
accuracy by the same Michelson and Morley in 1886. These experiments have shown that if one sticks 
to the aether concept, this hypothetical medium has to be partially “dragged” by any moving dielectric 
material with a speed proportional to (κ − 1). Such local drag would be irreconcilable with the assumed 
continuity of the aether.


In his famous 1905 paper, Albert Einstein suggested a bold resolution of this contradiction, 
essentially removing the concept of the aether altogether.¹⁰ Moreover, he argued that the Lorentz
transform is the general property of time and space, rather than of the electromagnetic field alone. He 
started with two postulates, the first one essentially repeating the relativity principle formulated a bit 
earlier (in 1904) by H. Poincaré in the following form: 


“the laws of physical phenomena should be the same, whether for an observer fixed or for 

an observer carried along in a uniform movement of translation; so that we have not and 
could not have any means of discerning whether or not we are carried along in such a 
motion.”¹¹


The second Einstein postulate was that the speed of light c, in free space, should be constant in 
all reference frames. (This is essentially a denial of the aether’s existence.) 


Then, Einstein showed that the Lorentz transform relations (6) naturally follow from his 
postulates, with a few (very natural) additional assumptions. Let a point source emit a short flash of 
light at the moment t = t’ = 0 when the origins of the reference frames shown in Fig. 1 coincide. Then,
according to the second of Einstein’s postulates, in each of the frames, the spherical wave propagates 
with the same speed c, i.e. the coordinates of points of its front, measured in the two frames, have to 
obey the following equalities: 

$$(ct)^2 - (x^2 + y^2 + z^2) = 0, \qquad (ct')^2 - (x'^2 + y'^2 + z'^2) = 0. \tag{9.7}$$


10 In hindsight, this was a great relief, because the aether had been a very awkward construct to start with. In 
particular, according to the basic theory of elasticity (see, e.g., CM Ch. 7), in order to carry such transverse waves 
as the electromagnetic ones, this medium would need to have a non-zero shear modulus, i.e. behave as an elastic 
solid, rather than as a rarefied gas, as hypothesized initially by C. Huygens.

11 Note that although the relativity principle excludes the notion of a special (“absolute”) spatial reference frame, 
its quoted verbal formulation still leaves open the possibility of the Galilean “absolute time” t = t’. The 
quantitative relativity theory kills this option (see Eqs. (6) and their discussion below).
What may be the general relation between the combinations on the left-hand sides of these equations, not for this particular wave’s front, but in general? A very natural (essentially, the only justifiable) choice is


$$(ct)^2 - (x^2 + y^2 + z^2) = f(v)\left[(ct')^2 - (x'^2 + y'^2 + z'^2)\right]. \tag{9.8}$$


Now, according to the first postulate, the same relation should be valid if we swap the reference frames (x ↔ x’, etc.) and replace v with (−v). This is only possible if f² = 1, so that, excluding the option f = −1 (which is incompatible with the Galilean transform in the limit v/c → 0), we are left with f = +1, i.e.


$$(ct)^2 - (x^2 + y^2 + z^2) = (ct')^2 - (x'^2 + y'^2 + z'^2). \tag{9.9}$$
For the line with y = y’ = 0 and z = z’ = 0, Eq. (9) is reduced to
$$(ct)^2 - x^2 = (ct')^2 - x'^2. \tag{9.10}$$


It is very illuminating to interpret this relation as the one resulting from a mutual rotation of the reference frames (that now have to include clocks to measure time) on the plane of the coordinate x and the so-called imaginary time τ = ict (see Fig. 3).


Fig. 9.3. The Lorentz transform as a mutual rotation of two reference frames on the [x, τ] plane.


Indeed, rewriting Eq. (10) as
$$x^2 + \tau^2 = x'^2 + \tau'^2, \tag{9.11}$$


we may consider it as the invariance of the squared radius at the rotation shown in Fig. 3 and described 
by the following geometric relations: 


$$x = x'\cos\psi - \tau'\sin\psi, \qquad \tau = x'\sin\psi + \tau'\cos\psi, \tag{9.12a}$$
with the reciprocal relations
$$x' = x\cos\psi + \tau\sin\psi, \qquad \tau' = -x\sin\psi + \tau\cos\psi. \tag{9.12b}$$


So far, the angle ψ has been arbitrary. In the spirit of Eq. (8), a natural choice is ψ = ψ(v), with the requirement ψ(0) = 0. To find this function, let us write the definition of the velocity v of frame 0’, as measured in frame 0 (which was implied above): for x’ = 0, x = vt. In the variables x and τ, this means


$$\frac{x}{\tau} = \frac{vt}{ict} = -i\frac{v}{c}. \tag{9.13}$$


On the other hand, for the same point x’ = 0, Eqs. (12a) yield 
$$\frac{x}{\tau} = \frac{-\tau'\sin\psi}{\tau'\cos\psi} = -\tan\psi. \tag{9.14}$$
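These two expressions are compatible only if
$$\tan\psi = i\frac{v}{c} \equiv i\beta, \tag{9.15}$$
so that
$$\sin\psi = \frac{\tan\psi}{(1 + \tan^2\psi)^{1/2}} = \frac{iv/c}{(1 - v^2/c^2)^{1/2}} \equiv i\beta\gamma, \qquad \cos\psi = \frac{1}{(1 + \tan^2\psi)^{1/2}} = \frac{1}{(1 - v^2/c^2)^{1/2}} \equiv \gamma, \tag{9.16}$$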


where β and γ are two very convenient and commonly used dimensionless parameters defined as
$$\boldsymbol{\beta} \equiv \frac{\mathbf{v}}{c}, \qquad \gamma \equiv \frac{1}{(1 - v^2/c^2)^{1/2}} = \frac{1}{(1 - \beta^2)^{1/2}}. \tag{9.17}$$


(The vector β is called the normalized velocity, while the scalar γ is called the Lorentz factor.)¹²
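A minimal numerical check of Eq. (17) and of the identities listed in footnote 12 may be sketched in Python as follows. The function names `beta_gamma` and `rapidity` are hypothetical helpers introduced only for this illustration, and speeds are measured in units of c (c = 1 by default).

```python
import cmath
import math


def beta_gamma(v, c=1.0):
    """Return (beta, gamma) of Eq. (17) for a speed v; gamma is real only for |v| < c."""
    beta = v / c
    if abs(beta) >= 1.0:
        raise ValueError("gamma is real only for |v| < c")
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return beta, gamma


def rapidity(beta):
    """Rapidity phi = artanh(beta) of footnote 12."""
    return math.atanh(beta)


beta, gamma = beta_gamma(0.6)               # v = 0.6c, in units where c = 1
print(f"gamma = {gamma:.4f}")               # gamma = 1.2500
# Identities of footnote 12: gamma^2 = 1/(1 - beta^2) and gamma^2 - 1 = gamma^2 * beta^2
print(math.isclose(gamma**2, 1.0 / (1.0 - beta**2)))
print(math.isclose(gamma**2 - 1.0, gamma**2 * beta**2))
# With psi = i*phi, tan(psi) = i*tanh(phi) = i*beta, as required by Eq. (15)
phi = rapidity(beta)
print(cmath.isclose(cmath.tan(1j * phi), 1j * beta))
# The same phi gives cos(psi) = cosh(phi) = gamma, as in Eq. (16)
print(math.isclose(math.cosh(phi), gamma))
```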


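Using the above relations for ψ, Eqs. (12) become
$$x = \gamma(x' - i\beta\tau'), \qquad \tau = \gamma(i\beta x' + \tau'), \tag{9.18a}$$
$$x' = \gamma(x + i\beta\tau), \qquad \tau' = \gamma(-i\beta x + \tau). \tag{9.18b}$$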


Now returning to the real variables [x, ct], we get the Lorentz transform relations (6) in a more compact form:
$$x = \gamma(x' + \beta ct'), \quad y = y', \quad z = z', \quad ct = \gamma(ct' + \beta x'), \tag{9.19a}$$
$$x' = \gamma(x - \beta ct), \quad y' = y, \quad z' = z, \quad ct' = \gamma(ct - \beta x). \tag{9.19b}$$


An immediate corollary of Eqs. (19) is that for γ to stay real, we need v² < c², i.e. the speed of any physical body (to which we could connect a meaningful reference frame) cannot exceed the speed of light, as measured in any other meaningful reference frame.¹³
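The Lorentz transform (19) is simple enough to check numerically. The following Python sketch (with hypothetical helper names `boost`, `inverse_boost`, and `interval`, and an arbitrarily chosen event) verifies that the combination (9) is the same in both frames, that Eqs. (19a) and (19b) are mutually inverse, and that a point with x = ct stays on the light front.

```python
import math


def boost(ct, x, y, z, beta):
    """Eq. (19b): coordinates (ct', x', y', z') in frame 0', which moves with v = beta*c along x."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (ct - beta * x), gamma * (x - beta * ct), y, z


def inverse_boost(ctp, xp, yp, zp, beta):
    """Eq. (19a): back to frame 0; formally the same transform with beta -> -beta."""
    return boost(ctp, xp, yp, zp, -beta)


def interval(ct, x, y, z):
    """The combination (ct)^2 - (x^2 + y^2 + z^2) of Eq. (9)."""
    return ct**2 - (x**2 + y**2 + z**2)


event = (5.0, 3.0, -1.0, 2.0)       # an arbitrary world event (ct, x, y, z)
primed = boost(*event, beta=0.8)
print(interval(*event), interval(*primed))     # equal, up to rounding errors
restored = inverse_boost(*primed, beta=0.8)
print(all(math.isclose(a, b) for a, b in zip(restored, event)))
# A light-front point with x = ct stays on the light front: x' = ct'
ctp, xp, _, _ = boost(1.0, 1.0, 0.0, 0.0, beta=0.5)
print(math.isclose(ctp, xp))
```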


9.2. Relativistic kinematic effects 


Before proceeding to other corollaries of Eqs. (19), let us spend a few minutes discussing what 
these relations actually mean. Evidently, they are trying to tell us that the spatial and temporal intervals 
are not absolute (as they are in the Newtonian space), but do depend on the reference frame they are 
measured in. So, we have to understand very clearly what exactly may be measured — and thus may be 
discussed in a meaningful physics theory. Recognizing this necessity, A. Einstein introduced the notion 
of numerous imaginary observers that may be distributed all over each reference frame. Each observer 


12 Note the following identities: γ² = 1/(1 − β²) and (γ² − 1) = β²/(1 − β²) = γ²β², which are frequently handy in relativity-related algebra. One more function of β, the rapidity φ ≡ tanh⁻¹β (so that ψ = iφ), is also useful for some calculations.

13 All attempts to rationally conjecture particles moving with v > c (called tachyons) have failed — so far, at least. 
Possibly the strongest objection against their existence is the fact that the tachyons could be used to communicate 
back in time, thus violating the causality principle — see, e.g., G. Benford et al., Phys. Rev. D2, 263 (1970). 
has a clock and may use it to measure the instants of local events, taking place at the observer’s
location. He also conjectured, very reasonably, that: 


(i) all observers within the same reference frame may agree on a common length measure (“a scale”), i.e. on their relative positions in that frame, and synchronize their clocks,¹⁴ and


(ii) the observers belonging to different reference frames may agree on the nomenclature of 
world events (e.g., short flashes of light) to which their respective measurements refer. 


Actually, these additional postulates have already been implied in our “derivation” of the Lorentz transform in Sec. 1. For example, by the set {x, y, z, t} we mean the results of space and time measurements of a certain world event, on which all observers belonging to frame 0 agree. Similarly, all observers of frame 0’ have to agree on the results {x’, y’, z’, t’}. Finally, when the origin of frame 0’ passes by some sequential points $x_j$ of frame 0, the observers in the latter frame may measure its passage times $t_j$ without a fundamental error, and know that all these times belong to x’ = 0.


Now we can analyze the major corollaries of the Lorentz transform, which are rather striking 
from the point of view of our everyday (rather non-relativistic) experience. 


(i) Length contraction. Let us consider a thin rigid rod oriented along the x-axis, with its length l = x₂ − x₁, where x₁,₂ are the coordinates of the rod’s ends, as measured in its rest frame 0, at any instant t (Fig. 4). What would be the rod’s length l’ measured by the Einstein observers in the moving frame 0’?


Fig. 9.4. The relativistic length contraction. 


At a time instant t’ agreed upon in advance, the observers who find themselves exactly at the rod’s ends may register that fact, and then subtract their coordinates x’₁,₂ to calculate the apparent rod length l’ = x’₂ − x’₁ in the moving frame. According to Eq. (19a), l may be expressed via this l’ as


$$l = x_2 - x_1 = \gamma(x'_2 + \beta ct') - \gamma(x'_1 + \beta ct') = \gamma(x'_2 - x'_1) = \gamma l'. \tag{9.20a}$$


Hence, the rod’s length, as measured in the moving reference frame, is
$$l' = \frac{l}{\gamma} = l\left(1 - \frac{v^2}{c^2}\right)^{1/2} \le l, \tag{9.20b}$$


in accordance with the FitzGerald-Lorentz hypothesis (5). This is the relativistic length contraction effect: an object is always the longest (has the so-called proper length l) if measured in its rest frame.
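The measurement procedure just described can be mimicked step by step: for each end of the rod, find the world event that has the agreed frame-0’ time t’, and then compute its coordinate x’ from Eq. (19b). This Python sketch (the function name is hypothetical; units with c = 1) should reproduce Eq. (20b) for any choice of t’.

```python
import math


def length_in_moving_frame(x1, x2, beta, ct_prime=0.0):
    """Apparent length x2' - x1' of a rod resting in frame 0 between x1 and x2, when both
    ends are recorded at the same agreed instant t' of frame 0' (moving with v = beta*c)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)

    def end_coordinate_prime(x):
        # Frame-0 time of the event (x, t) that has the agreed t': from ct' = gamma*(ct - beta*x)
        ct = ct_prime / gamma + beta * x
        # Its coordinate in frame 0', from x' = gamma*(x - beta*ct), Eq. (19b)
        return gamma * (x - beta * ct)

    return end_coordinate_prime(x2) - end_coordinate_prime(x1)


l = 1.0                                      # proper length, in the rod's rest frame 0
for beta in (0.1, 0.6, 0.99):
    l_prime = length_in_moving_frame(0.0, l, beta, ct_prime=3.0)
    print(beta, l_prime, l * math.sqrt(1.0 - beta**2))   # the two values coincide, Eq. (20b)
```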


14 A posteriori, the Lorentz transform may be used to show that consensus-creating procedures (such as clock synchronization) are indeed possible. The basic idea of the proof is that since at v << c the relativistic corrections to space and time intervals are of the order of (v/c)², they have negligible effects on clocks brought together into the same point for synchronization slowly, with a speed u << c. The reader interested in a detailed discussion of this and other fine points of special relativity may be referred to, e.g., either H. Arzeliès, Relativistic Kinematics, Pergamon, 1966, or W. Rindler, Introduction to Special Relativity, 2nd ed., Oxford U. Press, 1991.


Chapter 9 Page 7 of 56 


Essential Graduate Physics EM: Classical Electrodynamics 


Note that according to Eqs. (19), the length contraction takes place only in the direction of the relative 
motion of two reference frames. As was noted in Sec. 1, this result immediately explains the zero result 
of the Michelson-Morley-type experiments, so that they give very convincing evidence (if not 
irrefutable proof) of Eqs. (18)-(19). 


(ii) Time dilation. Now let us use Eqs. (19a) to find the time interval Δt, as measured in some reference frame 0, between two world events, say, two ticks of a clock moving with another frame 0’ (Fig. 5), i.e. having fixed values of x’, y’, and z’.




Let the time interval between these two events, measured in the clock’s rest frame 0’, be Δt’ = t’₂ − t’₁. At these two moments, the clock would fly by certain two Einstein observers at rest in frame 0, so that they can record the corresponding moments t₁,₂ shown by their clocks, and then calculate Δt as their difference. According to the last of Eqs. (19a),


Fig. 9.5. The relativistic time dilation.


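$$c\Delta t = ct_2 - ct_1 = \gamma\left[(ct'_2 + \beta x') - (ct'_1 + \beta x')\right] = \gamma c\Delta t', \tag{9.21a}$$
so that, finally,
$$\Delta t = \gamma\Delta t' = \frac{\Delta t'}{(1 - v^2/c^2)^{1/2}} \ge \Delta t'. \tag{9.21b}$$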


This is the famous relativistic time dilation (or “dilatation”) effect: a time interval is longer if measured in a frame (in our case, frame 0) moving relative to the clock, while the interval measured in the clock’s rest frame is the shortest possible: the so-called proper time interval.


This rather counter-intuitive effect is the everyday reality in experiments with high-energy elementary particles. For example, in a typical (and by no means record-breaking) experiment carried out at Fermilab, a beam of charged 200 GeV pions with γ ≈ 1,400 traveled a distance of l = 300 m with the measured loss of only 3% of the initial beam intensity due to the pion decay (mostly into muon-neutrino pairs) with the proper lifetime τ₀ ≈ 2.56×10⁻⁸ s. Without the time dilation, only an exp{−l/cτ₀} ~ 10⁻¹⁷ fraction of the initial pions would survive, while the relativity-corrected number, exp{−l/cτ} = exp{−l/cγτ₀} ≈ 0.97, was in full accordance with experimental measurements.
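The pion numbers quoted above are easy to reproduce. This sketch assumes, as the estimate in the text does, that the pions move at v ≈ c, so that the mean decay length is c times the lifetime; the value c = 2.998×10⁸ m/s is the only number not taken from the text, and the function name is hypothetical.

```python
import math

C = 2.998e8       # speed of light, m/s
TAU0 = 2.56e-8    # pion proper lifetime, s (the value quoted above)
GAMMA = 1.4e3     # Lorentz factor of the 200 GeV pion beam (quoted above)
L = 300.0         # path length, m


def surviving_fraction(path, lifetime):
    """exp(-path / (c * lifetime)): fraction of undecayed particles moving at v ~ c."""
    return math.exp(-path / (C * lifetime))


without_dilation = surviving_fraction(L, TAU0)        # if the lifetime were not dilated
with_dilation = surviving_fraction(L, GAMMA * TAU0)   # lab-frame lifetime gamma*tau0, Eq. (21)
print(f"without time dilation: {without_dilation:.1e}")   # of the order of 1e-17
print(f"with time dilation:    {with_dilation:.3f}")      # about 0.97
```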


As another example, the global positioning systems (say, the GPS) are designed taking into account the time dilation due to the velocity of their satellites (and also some gravity-induced, i.e. general-relativity corrections, which I would not have time to discuss), and would give large errors without such corrections. So, there is no doubt that the time dilation (21) is a reality, though the precision of its experimental tests I am aware of¹⁵ has been limited to a few percent, because of the almost unavoidable involvement of less controllable gravity effects, which provide a time interval change of the opposite sign in most experiments near the Earth’s surface.


15 See, e.g., J. Hafele and R. Keating, Science 177, 166 (1972). 
Before the first reliable observation of time dilation (by B. Rossi and D. Hall in 1940), there had been serious doubts about the reality of this effect, the most famous being the twin paradox first posed (together with an immediate suggestion of its resolution) by P. Langevin in 1911. Let us send one of two twins on a long space roundtrip with the maximum speed approaching c. Upon his return to Earth, which of the twins would be older? The naive approach is to say that due to the relativity principle, neither can be (and hence there is no time dilation), because each twin could claim that their counterpart, rather than them, was moving, with the same speed but in the opposite direction. The resolution of the paradox is that one of the twins had to be accelerated to be brought back, and hence the reference frames have to be dissimilar: only one of them may stay inertial all the time. As a result, the twin who had been accelerated (“actually traveling”) would be younger than their sibling when they finally come together. A constructive proof of this conclusion for the particular case of straight-line travel with a piecewise-constant acceleration is simple and hence left for the reader’s exercise.


(iii) Velocity transformation. Now let us calculate the velocity u of a moving point, as observed 
in reference frame 0, provided that its velocity, as measured in frame 0’, is u’ (Fig. 6). 


Fig. 9.6. The relativistic velocity addition. 


Keeping the usual definition of velocity, but with due attention to the relativity of not only 
spatial but also temporal intervals, we may write 


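$$u_x = \frac{dx}{dt}, \quad u_y = \frac{dy}{dt}, \quad u_z = \frac{dz}{dt}, \qquad u'_x = \frac{dx'}{dt'}, \quad u'_y = \frac{dy'}{dt'}, \quad u'_z = \frac{dz'}{dt'}. \tag{9.22}$$
Plugging the differentials of the Lorentz transform relations (6a) into these definitions, we get
$$u_x = \frac{dx}{dt} = \frac{\gamma(dx' + v\,dt')}{\gamma(dt' + v\,dx'/c^2)} = \frac{u'_x + v}{1 + vu'_x/c^2}, \qquad u_y = \frac{dy}{dt} = \frac{dy'}{\gamma(dt' + v\,dx'/c^2)} = \frac{u'_y}{\gamma(1 + vu'_x/c^2)}, \tag{9.23}$$
with a similar formula for $u_z$. In the classical limit v/c → 0, these relations are reduced to
$$u_x = u'_x + v, \qquad u_y = u'_y, \qquad u_z = u'_z, \tag{9.24a}$$
and may be merged into the familiar Galilean form
$$\mathbf{u} = \mathbf{u}' + \mathbf{v}, \qquad \text{for } v \ll c. \tag{9.24b}$$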


In order to see how unusual the full relativistic rules (23) are at u ~ c, let us first consider a purely longitudinal motion, $u_y = u_z = 0$; then¹⁶


16 With account of the identity tanh(a + b) = (tanh a + tanh b)/(1 + tanh a tanh b), which readily follows from MA Eq. (3.5), Eq. (25) shows that the rapidities φ ≡ tanh⁻¹β add up exactly as longitudinal velocities do in non-relativistic motion, making that notion very convenient for the analysis of transfers between several frames.
$$u = \frac{u' + v}{1 + u'v/c^2}, \tag{9.25}$$


where u ≡ $u_x$ and u’ ≡ $u'_x$. Figure 7 shows this u as a function of u’, for several values of the reference frames’ relative velocity v.


Fig. 9.7. The addition of longitudinal velocities: u/c as a function of u’/c (from −1 to 1), for several values of v.


The first sanity check is that if v = 0, i.e. if the reference frames are at rest relative to each other, then u = u’, as it should be; see the diagonal straight line in Fig. 7. Next, if the magnitudes of u’ and v are both below c, so is the magnitude of u. (Also good, because otherwise ordinary particles in one frame would be tachyons in the other one, and the theory would be in big trouble.) Now strange things begin: as u’ and v both approach c, u also approaches c, but never exceeds it. As an example, if we fired forward a bullet with a relative speed of 0.9c from a spaceship moving away from the Earth also at 0.9c, Eq. (25) predicts the speed of the bullet relative to the Earth to be just [(0.9 + 0.9)/(1 + 0.9×0.9)]c ≈ 0.994c < c, rather than (0.9 + 0.9)c = 1.8c > c as in the Galilean kinematics. Actually, we could expect this strangeness, because it is necessary to fulfill the 2nd Einstein postulate: the independence of the speed of light of the reference frame. Indeed, for u’ = ±c, Eq. (25) yields u = ±c, regardless of v.
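Eq. (25) and the spaceship example can be checked with a few lines of Python (a sketch in units with c = 1; the function name is hypothetical). The last line illustrates the rapidity additivity mentioned in footnote 16.

```python
import math


def add_longitudinal(u_prime, v, c=1.0):
    """Eq. (25): u = (u' + v) / (1 + u'v/c^2), for motion along the x-axis."""
    return (u_prime + v) / (1.0 + u_prime * v / c**2)


u = add_longitudinal(0.9, 0.9)                 # bullet at 0.9c fired from a ship moving at 0.9c
print(f"u = {u:.4f} c")                        # u = 0.9945 c, still below c
print(add_longitudinal(1.0, 0.9), add_longitudinal(-1.0, 0.9))   # light keeps speed c: 1.0 -1.0
# Footnote 16: the rapidities artanh(u/c) add up like non-relativistic velocities
print(math.isclose(math.atanh(u), 2 * math.atanh(0.9)))
```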


In the opposite case of a purely transverse motion, when a point moves across the direction of the relative motion of the frames (for example, at our choice of coordinates, $u'_x = u'_z = 0$), Eqs. (23) yield a much less spectacular result
$$u_y = \frac{u'_y}{\gamma} = u'_y\left(1 - \frac{v^2}{c^2}\right)^{1/2}. \tag{9.26}$$
This effect comes purely from the time dilation, because the transverse spatial intervals are Lorentz-invariant.
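For a general direction of u’, Eqs. (23) may be coded directly. The Python sketch below (hypothetical function name, units with c = 1) reproduces the transverse result (26), approaches the Galilean rule (24) at low speeds, and shows that a light signal has speed c in both frames, whatever its direction.

```python
import math


def transform_velocity(u_prime, v, c=1.0):
    """Eqs. (23): velocity (ux, uy, uz) in frame 0 of a point moving with velocity
    u' = (u'x, u'y, u'z) in frame 0', which itself moves with speed v along the x-axis."""
    ux_p, uy_p, uz_p = u_prime
    gamma = 1.0 / math.sqrt(1.0 - (v / c)**2)
    denom = 1.0 + v * ux_p / c**2
    return (ux_p + v) / denom, uy_p / (gamma * denom), uz_p / (gamma * denom)


v = 0.6                                             # frame speed in units of c, so gamma = 1.25
print(transform_velocity((0.0, 0.5, 0.0), v))       # transverse case: uy = u'y/gamma = 0.4, Eq. (26)
print(transform_velocity((0.01, 0.0, 0.0), 0.02))   # slow motion: close to Galilean 0.03, Eq. (24)
# A light signal (|u'| = c) keeps |u| = c for any direction of u'
for theta in (0.0, 0.7, 1.5, 3.0):
    u = transform_velocity((math.cos(theta), math.sin(theta), 0.0), v)
    print(round(math.hypot(*u), 12))
```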


In the case when both $u'_x$ and $u'_y$ are substantial (but $u'_z$ is still zero), we may divide Eqs. (23) by each other to relate the angles θ of the point’s propagation, as observed in the two reference frames:
$$\tan\theta = \frac{u_y}{u_x} = \frac{u'_y}{\gamma(u'_x + v)} = \frac{\sin\theta'}{\gamma(\cos\theta' + v/u')}. \tag{9.27}$$
This expression describes, in particular, the so-called stellar aberration effect: the dependence of the observed direction θ toward a star on the speed v of the telescope’s motion relative to the star (see Fig. 8). (The effect is readily observable experimentally as the annual aberration, due to the periodic change of the speed v by 2$v_E$ ≈ 60 km/s because of the Earth’s revolution around the Sun. Since the aberration’s main part is of the first order in $v_E$/c ~ 10⁻⁴, this effect is very significant and has been known since the early 1700s.)


Fig. 9.8. The stellar aberration.


For the analysis of this effect, it is sufficient to take, in Eq. (27), u’ = c, i.e. v/u’ = β, and interpret θ’ as the “proper” direction to the star, which would be measured at v = 0.¹⁷ At β << 1, both Eq. (27) and the Galilean result (which the reader is invited to derive directly from Fig. 8),
$$\tan\theta = \frac{\sin\theta'}{\cos\theta' + \beta}, \tag{9.28}$$
may be well approximated by the first-order term
$$\Delta\theta \equiv \theta - \theta' \approx -\beta\sin\theta'. \tag{9.29}$$


Unfortunately, it is not easy to use the difference between Eqs. (27) and (28), which is of the second order in β, for special relativity’s confirmation, because other components of the Earth’s motion, such as its rotation, nutation, and torque-induced precession,¹⁸ give masking first-order contributions to the aberration.
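The smallness of the relativistic correction to the aberration can be seen numerically. In this sketch, β = 10⁻⁴ approximates the Earth’s orbital speed, θ’ = π/3 is an arbitrary choice, and `math.atan2` is used so that θ lands in the correct quadrant; the function names are hypothetical.

```python
import math


def aberration_relativistic(theta_p, beta):
    """Eq. (27) with u' = c: tan(theta) = sin(theta') / [gamma * (cos(theta') + beta)]."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return math.atan2(math.sin(theta_p), gamma * (math.cos(theta_p) + beta))


def aberration_galilean(theta_p, beta):
    """Eq. (28): tan(theta) = sin(theta') / (cos(theta') + beta)."""
    return math.atan2(math.sin(theta_p), math.cos(theta_p) + beta)


beta = 1e-4               # roughly the Earth's orbital speed divided by c
theta_p = math.pi / 3     # an arbitrary "proper" direction toward the star
d_rel = aberration_relativistic(theta_p, beta) - theta_p
d_gal = aberration_galilean(theta_p, beta) - theta_p
print(d_rel, d_gal, -beta * math.sin(theta_p))   # nearly equal: the first-order term (29)
print(d_rel - d_gal)                              # of the order of beta**2 only
```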


Finally, for a completely arbitrary direction of the vector u’, Eqs. (22) may be readily used to calculate the velocity’s magnitude. The most popular form of the resulting expression is the following formula for the square of the relative velocity (or rather the normalized relative velocity β) of two points,
$$\beta^2 = \frac{(\boldsymbol{\beta}_1 - \boldsymbol{\beta}_2)^2 - |\boldsymbol{\beta}_1 \times \boldsymbol{\beta}_2|^2}{(1 - \boldsymbol{\beta}_1\cdot\boldsymbol{\beta}_2)^2} \le 1, \tag{9.30}$$
where $\boldsymbol{\beta}_{1,2} = \mathbf{v}_{1,2}/c$ are their normalized velocities as measured in the same reference frame.
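Eq. (30) can be cross-checked against the longitudinal addition rule (25): for collinear, opposite velocities ±0.9c it should give the same 1.8/1.81 as the spaceship example. A minimal Python sketch (hypothetical function name; velocities are given as 3-tuples of normalized components):

```python
import math


def relative_beta_squared(b1, b2):
    """Eq. (30): squared normalized relative velocity of two points whose normalized
    velocities b1, b2 (3-tuples) are measured in the same reference frame."""
    dot = sum(p * q for p, q in zip(b1, b2))
    diff2 = sum((p - q) ** 2 for p, q in zip(b1, b2))
    cross = (b1[1] * b2[2] - b1[2] * b2[1],
             b1[2] * b2[0] - b1[0] * b2[2],
             b1[0] * b2[1] - b1[1] * b2[0])
    cross2 = sum(w * w for w in cross)
    return (diff2 - cross2) / (1.0 - dot) ** 2


# Collinear, opposite velocities: the same result as Eq. (25), 1.8/1.81
print(math.sqrt(relative_beta_squared((0.9, 0.0, 0.0), (-0.9, 0.0, 0.0))), 1.8 / 1.81)
# Perpendicular velocities of 0.9c each: the relative speed still stays below c
print(relative_beta_squared((0.9, 0.0, 0.0), (0.0, 0.9, 0.0)) < 1.0)
```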


17 Strictly speaking, to reconcile the geometries shown in Fig. 1 (for which all our formulas, including Eq. (27), are valid) and Fig. 8 (giving the traditional scheme of the stellar aberration), it is necessary to invert the signs of u (and hence of sin θ’ and cos θ’) and v, but as is evident from Eq. (27), all the minus signs cancel, and the formula is valid “as is”.

18 See, e.g., CM Secs. 4.4-4.5. 
(iv) The Doppler effect. Let us consider a monochromatic plane wave of some physical nature, traveling along the x-axis:
$$f = \text{Re}\left[f_\omega \exp\{i(kx - \omega t)\}\right] = |f_\omega|\cos(kx - \omega t + \arg f_\omega) \equiv |f_\omega|\cos\Psi. \tag{9.31}$$
Its total phase, Ψ = kx − ωt + arg $f_\omega$ (in contrast to its amplitude $|f_\omega|$; see Sec. 5 below), cannot depend on the observer’s reference frame, because the variable f vanishes at Ψ = π(n + ½) (for all integer n), and such “world events” should be observable in all reference frames. The only way to keep Ψ = Ψ’ at all times is to have¹⁹
$$kx - \omega t = k'x' - \omega't'. \tag{9.32}$$


First, let us use this general relation to consider the Doppler effect in the usual non-relativistic 
mechanical waves, e.g., oscillations of particles of a certain medium. Using the Galilean transform (2), 
we may rewrite Eq. (32) as 

$$k(x' + vt) - \omega t = k'x' - \omega't. \tag{9.33}$$


Since this transform leaves all space intervals (including the wavelength λ = 2π/k) intact, we can take k = k’, so that Eq. (33) yields
$$\omega' = \omega - kv. \tag{9.34}$$


For a dispersion-free medium, the wave number k is the ratio of its frequency ω, as measured in the reference frame bound to the medium, to the wave velocity $v_w$. In particular, if the wave source rests in the medium, we may bind the reference frame 0 to the medium as well, and frame 0’ to the wave’s receiver (i.e. v = $v_r$), so that
$$k = \frac{\omega}{v_w}, \tag{9.35}$$
and for the frequency perceived by the receiver, Eq. (34) yields
$$\omega' = \omega\,\frac{v_w - v_r}{v_w}. \tag{9.36}$$


On the other hand, if the receiver and the medium are at rest in the reference frame 0’, while the wave source is bound to the frame 0 (so that v = −$v_s$), Eq. (35) should be replaced with
$$k = k' = \frac{\omega'}{v_w}, \tag{9.37}$$
and Eq. (34) yields a different result:
$$\omega' = \omega\,\frac{v_w}{v_w - v_s}. \tag{9.38}$$


Finally, if both the source and detector are moving, it is straightforward to combine these two results to get the general relation
$$\omega' = \omega\,\frac{v_w - v_r}{v_w - v_s}. \tag{9.39}$$


19 Strictly speaking, Eq. (32) is valid up to an additive constant, but for notation simplicity, it may always be made equal to zero by selecting (as has already been done in all relations of Sec. 1) the reference frame origins and/or clock turn-on times so that at t = 0 and x = 0, t’ = 0 and x’ = 0 as well.
At low speeds of both the source and the receiver, this result simplifies,
$$\omega' \approx \omega(1 - \beta), \qquad \text{with } \beta \equiv \frac{v_r - v_s}{v_w}, \tag{9.40}$$


but at speeds comparable to $v_w$, we have to use the more general Eq. (39). Thus, the usual Doppler effect is generally affected not only by the relative speed ($v_r - v_s$) of the wave’s source and detector, but also by their speeds relative to the medium in which the waves propagate.
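The asymmetry between a moving receiver and a moving source, at the same relative velocity, is easy to see numerically. In this sketch, the wave speed of 340 m/s (roughly that of sound in air) and the speed of 34 m/s are illustrative assumptions; velocities are counted along the direction of wave propagation, and the function name is hypothetical.

```python
def doppler_in_medium(omega, v_w, v_s=0.0, v_r=0.0):
    """Eq. (39): frequency perceived by a receiver moving with velocity v_r, for a wave of
    frequency omega emitted by a source moving with velocity v_s; both velocities are measured
    relative to the medium, along the propagation direction, and v_w is the wave speed."""
    return omega * (v_w - v_r) / (v_w - v_s)


v_w = 340.0          # illustrative wave speed (roughly sound in air), m/s
omega = 1.0          # source frequency, arbitrary units
moving_receiver = doppler_in_medium(omega, v_w, v_r=34.0)   # Eq. (36)
moving_source = doppler_in_medium(omega, v_w, v_s=-34.0)    # Eq. (38)
# Both cases have the same relative velocity v_r - v_s = 34 m/s, i.e. beta = 0.1 in Eq. (40)
print(f"{moving_receiver:.4f} {moving_source:.4f} {1 - 0.1:.4f}")   # 0.9000 0.9091 0.9000
```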


Somewhat counter-intuitively, for electromagnetic waves the calculations are simpler, because for them the propagation medium (aether) does not exist, the wave velocity equals ±c in any reference frame, and there are no two separate cases: we can always take k = ±ω/c and k’ = ±ω’/c. Plugging these relations, together with the Lorentz transform (19a), into the phase-invariance condition (32), we get
$$\pm\frac{\omega}{c}\,\gamma(x' + \beta ct') - \frac{\omega}{c}\,\gamma(ct' + \beta x') = \pm\frac{\omega'}{c}\,x' - \omega't'. \tag{9.41}$$


This relation has to hold for any x’ and t’, so we may require that the net coefficients in front of each of these variables vanish. These two requirements yield the same equality:
$$\omega' = \omega\gamma(1 \mp \beta). \tag{9.42}$$


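This result is already quite simple, but may be transformed further to be even more illuminating:
$$\omega' = \omega\,\frac{1 \mp \beta}{(1 - \beta^2)^{1/2}} = \omega\,\frac{1 \mp \beta}{[(1 + \beta)(1 - \beta)]^{1/2}}. \tag{9.43}$$
For either sign in front of β, one pair of parentheses cancels, so that²⁰
$$\omega' = \omega\left(\frac{1 \mp \beta}{1 \pm \beta}\right)^{1/2}. \tag{9.44}$$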


Thus the Doppler effect for electromagnetic waves depends only on the relative velocity v = βc between the wave source and detector, as it should be, given the aether’s absence. At velocities much lower than c, Eq. (44) may be approximated as
$$\omega' \approx \omega\,\frac{1 \mp \beta/2}{1 \pm \beta/2} \approx \omega(1 \mp \beta), \tag{9.45}$$
i.e. in the first approximation in β = v/c it tends to the corresponding limit (40) of the usual Doppler effect.
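Here is a short numerical check of Eqs. (42), (44), and (45), taken with the upper signs, and of the reciprocity argument of footnote 20. The function names are hypothetical; β is the normalized relative velocity.

```python
import math


def doppler_em(omega, beta):
    """Eq. (44) with the upper signs: omega' = omega*sqrt((1 - beta)/(1 + beta))."""
    return omega * math.sqrt((1.0 - beta) / (1.0 + beta))


def doppler_em_gamma_form(omega, beta):
    """Eq. (42) with the upper sign: omega' = omega*gamma*(1 - beta)."""
    return omega * (1.0 - beta) / math.sqrt(1.0 - beta**2)


for beta in (0.01, 0.3, 0.9):
    exact = doppler_em(1.0, beta)
    same = math.isclose(exact, doppler_em_gamma_form(1.0, beta))  # Eqs. (42) and (44) agree
    print(f"beta={beta}: {exact:.5f} vs first-order {1.0 - beta:.5f}, forms agree: {same}")
# Footnote 20: the reverse transfer (beta -> -beta) restores the original frequency
print(math.isclose(doppler_em(doppler_em(1.0, 0.3), -0.3), 1.0))
```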


If the wave vector k is tilted by an angle θ to the vector v (as measured in frame 0), then we have to repeat the calculations, with k replaced by $k_x$, and with the components $k_y$ and $k_z$ left intact by the Lorentz transform. As a result, Eq. (42) is generalized as


20 It may look like the reciprocal expression of ω via ω’ is different, violating the relativity principle. However, in this case we have to change the sign of β, because the relative velocity of the systems is opposite, so that we return to Eq. (44) again.
$$\omega' = \omega\gamma(1 - \beta\cos\theta). \tag{9.46}$$


For the cases cos θ = ±1, Eq. (46) reduces to our previous result (42). However, at θ = π/2 (i.e. cos θ = 0), the relation is rather different:
$$\omega' = \gamma\omega = \frac{\omega}{(1 - v^2/c^2)^{1/2}}. \tag{9.47}$$


This is the transverse Doppler effect, which is absent in non-relativistic physics. Its first experimental evidence was obtained using ion beams (as had been suggested in 1906 by J. Stark), by H. Ives and G. Stilwell in 1938 and 1941. Later, similar experiments were repeated several times, but the first unambiguous measurements were performed only in 1979 by D. Hasselkamp et al., who confirmed Eq. (47) with a relative accuracy of about 10%. This precision may not look too spectacular, but besides the special tests discussed above, the Lorentz transform formulas have also been confirmed, less directly, by a huge body of other experimental data, especially in high-energy physics, agreeing with calculations incorporating this transform as their part. This is why, with due respect to the spirit of challenging authority, I should warn the reader: if you decide to challenge the relativity theory (called a “theory” by tradition only), you would also need to explain all these data. Best of luck with that!²¹
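Eq. (46) may be evaluated for any angle. This Python sketch (hypothetical function name) reproduces Eq. (44) at θ = 0 and the transverse result (47) at θ = π/2, and shows that the transverse shift is only of the second order in β, which helps explain why it is so hard to measure.

```python
import math


def doppler_em_angle(omega, beta, theta):
    """Eq. (46): omega' = omega*gamma*(1 - beta*cos(theta)), with theta the angle between
    the wave vector k and the velocity v, as measured in frame 0."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return omega * gamma * (1.0 - beta * math.cos(theta))


beta = 0.1
print(doppler_em_angle(1.0, beta, 0.0), math.sqrt((1 - beta) / (1 + beta)))  # theta = 0: Eq. (44)
print(doppler_em_angle(1.0, beta, math.pi / 2))   # transverse: gamma, about 1.005, Eq. (47)
# The transverse shift is of the second order in beta: gamma - 1 is close to beta^2/2
print(doppler_em_angle(1.0, beta, math.pi / 2) - 1.0, beta**2 / 2)
```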


9.3. 4-vectors, momentum, mass, and energy 


Before proceeding to the relativistic dynamics, let us discuss the mathematical formalism that 
makes all calculations more compact — and more beautiful. We have already seen that the three spatial 
coordinates {x, y, z} and the product ct are Lorentz-transformed similarly — see Eqs. (18)-(19) again. So 
it is natural to consider them as components of a single four-component vector (or, for short, 4-vector), 


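$$\{x_0, x_1, x_2, x_3\} \equiv \{ct, \mathbf{r}\}, \tag{9.48}$$
with components
$$x_0 \equiv ct, \quad x_1 \equiv x, \quad x_2 \equiv y, \quad x_3 \equiv z, \tag{9.49}$$
so that the Lorentz transform (19a) may be represented in the 4-form
$$x_j = \sum_{j'=0}^{3} L_{jj'}\,x'_{j'}, \tag{9.50}$$
with the Lorentz transform matrix
$$L = \begin{pmatrix} \gamma & \gamma\beta & 0 & 0 \\ \gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{9.51}$$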


Since such 4-vectors are a new notion for this course and will be used for many more purposes 
than just the space-time transform, we need to discuss the general mathematical rules they obey. Indeed, 


21 The same fact, ignored by crackpots, is also valid for other favorite directions of their attacks, including the 
Universe expansion, quantum measurement uncertainty, and entropy growth in physics, and the evolution theory 
in biology. 
as was already mentioned in Sec. 8.9, the usual (three-component) vector is not just any ordered set (string) of three scalars {$A_x$, $A_y$, $A_z$}; if we want it to represent a reference-frame-independent physical reality, the vector’s components have to obey certain rules at the transfer from one reference frame to another. In particular, in the non-relativistic limit the vector’s norm (its magnitude squared),


$$A^2 \equiv A_x^2 + A_y^2 + A_z^2, \tag{9.52}$$


should be invariant with respect to the transfer between different reference frames. However, a naive extension of this approach to 4-vectors would not work, because, according to the calculations of Sec. 1, the Lorentz transform keeps intact combinations of the type (7), in which the temporal component enters with a sign opposite to that of the spatial ones, rather than the sum of all components squared. Hence for the 4-vectors, all the rules of the game have to be reviewed and adjusted, or rather redefined from the very beginning, for example as follows.²²


An arbitrary 4-vector is a string of 4 scalars,²³
$$\{A_0, A_1, A_2, A_3\}, \tag{9.53}$$


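whose components $A_j$, as measured in the reference frames 0 and 0’ shown in Fig. 1, obey the Lorentz transform relations similar to Eq. (50):
$$A_j = \sum_{j'=0}^{3} L_{jj'}\,A'_{j'}, \tag{9.54}$$
so that the following combination of the components is the same in both frames:
$$A_0^2 - \sum_{j=1}^{3} A_j^2 = A_0'^2 - \sum_{j=1}^{3} A_j'^2. \tag{9.55}$$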


This is the so-called Lorentz invariance condition for the 4-vector’s norm. (The difference between this relation and Eq. (52), pertaining to the Euclidean geometry, is the reason why the Minkowski space is called pseudo-Euclidean.) It is also straightforward to use Eqs. (51) and (54) to check that the evident generalization of the norm, the scalar product of two arbitrary 4-vectors,
$$A_0B_0 - \sum_{j=1}^{3} A_jB_j, \tag{9.56}$$
is also Lorentz-invariant.
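The 4-vector rules (50)-(56) can be verified with plain Python lists. In this sketch, the matrix of Eq. (51) is applied to two arbitrarily chosen 4-vectors given in frame 0’, and both the norm (55) and the scalar product (56) come out the same in frame 0. The function names are hypothetical.

```python
import math


def lorentz_matrix(beta):
    """Eq. (51): matrix L of the boost along x, such that A_j = sum_j' L[j][j'] * A'_j'."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, g * beta, 0.0, 0.0],
            [g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def transform(matrix, a):
    """Eq. (54): frame-0 components of a 4-vector given by its frame-0' components a."""
    return [sum(matrix[j][k] * a[k] for k in range(4)) for j in range(4)]


def minkowski_dot(a, b):
    """Scalar product (56): A_0*B_0 - (A_1*B_1 + A_2*B_2 + A_3*B_3); the norm is minkowski_dot(a, a)."""
    return a[0] * b[0] - sum(a[j] * b[j] for j in range(1, 4))


a_prime = [2.0, 1.0, -0.5, 0.3]     # arbitrary 4-vectors, components measured in frame 0'
b_prime = [1.0, 0.2, 0.4, -1.0]
L = lorentz_matrix(0.75)
a, b = transform(L, a_prime), transform(L, b_prime)
print(minkowski_dot(a, a), minkowski_dot(a_prime, a_prime))   # the norms coincide, Eq. (55)
print(minkowski_dot(a, b), minkowski_dot(a_prime, b_prime))   # so do the scalar products
```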


Now consider the 4-vector corresponding to a small interval between two close world events: 


$$\{dx_0, dx_1, dx_2, dx_3\} = \{c\,dt, d\mathbf{r}\}; \tag{9.57}$$
its norm,
$$(ds)^2 \equiv dx_0^2 - \sum_{j=1}^{3} dx_j^2 = c^2(dt)^2 - (dr)^2, \tag{9.58}$$


22 The most prominent alternative, which has both advantages and drawbacks, is to use 4-vectors with one 
imaginary component — for example, the imaginary time ict instead of the real product ct in Eq. (48). 

23 Such vectors are said to reside in the so-called 4D Minkowski space, named after Hermann Minkowski, who was the first to recast (in 1907) the special relativity relations in a form in which the spatial coordinates and time (or rather ct) are treated on an equal footing.
is of course also Lorentz-invariant. Since the speed of any particle (or signal) cannot be larger than c, for any pair of world events that are in a causal relation with each other, (dr)² cannot be larger than (c dt)², i.e. such a time-like interval (ds)² cannot be negative. The 4D surface separating such intervals from space-like intervals, with (ds)² < 0, is called the light cone (Fig. 9).


Fig. 9.9. A 2+1-dimensional image of the light cone r = ct, which is actually 3+1-dimensional. Inside the cone: time-like intervals, ds² > 0 (causal relation possible); outside: space-like intervals, ds² < 0 (causal relation impossible).


Now let us consider two close world events that happen to the same point moving with velocity u. Then in the frame moving with the point (v = u), the last term on the right-hand side of Eq. (58) equals zero, while the involved time is the proper one, so that
$$ds = c\,d\tau, \tag{9.59}$$
where dτ is the proper time interval. But according to Eq. (21), this means that we can write
$$d\tau = \frac{dt}{\gamma}, \tag{9.60}$$
where dt is the time interval in an arbitrary (though inertial) reference frame, while
$$\beta \equiv \frac{u}{c} \quad \text{and} \quad \gamma \equiv \frac{1}{(1 - \beta^2)^{1/2}} = \frac{1}{(1 - u^2/c^2)^{1/2}} \tag{9.61}$$
are the parameters (17) corresponding to the point’s velocity (u) in that frame, so that ds = c dt/γ.²⁴


Let us use Eq. (60) to explore whether a 4-vector may be formed using the spatial Cartesian components of the point’s velocity,
$$u_j = \frac{dx_j}{dt}, \qquad j = 1, 2, 3. \tag{9.62}$$
Here we have some problem: as Eqs. (22) show, these components do not obey the Lorentz transform. However, let us use dτ = dt/γ, the proper time interval of the point, to form the following string:
$$\left\{\frac{dx_0}{d\tau}, \frac{dx_1}{d\tau}, \frac{dx_2}{d\tau}, \frac{dx_3}{d\tau}\right\} = \frac{d}{d\tau}\{ct, \mathbf{r}\} = \gamma\{c, \mathbf{u}\}. \tag{9.63}$$


24 I have opted against using special indices (e.g., $\beta_u$ and $\gamma_u$) to distinguish Eqs. (17) and (61) here and below, in the hope that the suitable velocity (of either a reference frame or a particle) will always be clear from the context.
As follows from the comparison of the middle form of this expression with Eq. (48), since the time-space vector obeys the Lorentz transform, and τ is Lorentz-invariant, the string (63) is a legitimate 4-vector; it is called the 4-velocity of a point, or of a point particle.
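Since the norm of the 4-vector {$dx_j$} is (ds)² = c²(dτ)², by Eqs. (58)-(59), dividing by the invariant (dτ)² suggests that the norm of the 4-velocity (63) should equal c² for any u. A minimal Python check (hypothetical function names, c = 1 by default):

```python
import math


def four_velocity(u, c=1.0):
    """Eq. (63): 4-velocity gamma*{c, u} of a point with 3D velocity u = (ux, uy, uz), |u| < c."""
    gamma = 1.0 / math.sqrt(1.0 - sum(w * w for w in u) / c**2)
    return [gamma * c] + [gamma * w for w in u]


def norm(a):
    """4-vector norm: A0^2 - (A1^2 + A2^2 + A3^2)."""
    return a[0]**2 - sum(w * w for w in a[1:])


for u in [(0.0, 0.0, 0.0), (0.5, 0.2, -0.1), (0.0, 0.99, 0.0)]:
    print(u, math.isclose(norm(four_velocity(u)), 1.0))   # the norm equals c^2 = 1 for any u
```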


Now we are well equipped to proceed to relativistic dynamics. Let us start with such basic notions as the momentum p and the energy ℰ, so far for a free particle.²⁵ Perhaps the most elegant way to “derive” (or rather guess²⁶) the expressions for p and ℰ as functions of the particle’s velocity u is based on analytical mechanics. Due to the conservation of u, the trajectory of a free particle in the 4D Minkowski space {ct, r} is always a straight line. Hence, from the Hamilton principle,²⁷ we may expect its action 𝒮, between points 1 and 2, to be a linear function of the space-time interval (59):
$$\mathcal{S} = \alpha\int_1^2 ds = \alpha c\int_1^2 d\tau = \alpha c\int_{t_1}^{t_2}\frac{dt}{\gamma}, \tag{9.64}$$


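where α is some constant. On the other hand, in analytical mechanics, the action is defined as
$$\mathcal{S} = \int_{t_1}^{t_2} \mathcal{L}\,dt, \tag{9.65}$$
where ℒ is the Lagrangian function of the particle.²⁸ Comparing Eqs. (64) and (65), we get
$$\mathcal{L} = \frac{\alpha c}{\gamma} = \alpha c\left(1 - \frac{u^2}{c^2}\right)^{1/2}. \tag{9.66}$$
In the non-relativistic limit (u << c), this function tends to
$$\mathcal{L} \approx \alpha c\left(1 - \frac{u^2}{2c^2}\right) = \alpha c - \frac{\alpha u^2}{2c}. \tag{9.67}$$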


In order to correspond to the Newtonian mechanics,²⁹ the last (velocity-dependent) term should equal mu²/2. From here we find α = −mc, so that, finally,
$$\mathcal{L} = -\frac{mc^2}{\gamma} = -mc^2\left(1 - \frac{u^2}{c^2}\right)^{1/2}. \tag{9.68}$$


Now we can find the Cartesian components $p_j$ of the particle’s momentum as the generalized momenta corresponding to the components $r_j$ (j = 1, 2, 3) of the 3D radius-vector r:³⁰


25 I am sorry for using, just as in Sec. 6.3, the same traditional notation (p) for the particle’s momentum as had been used earlier for the electric dipole moment. However, since the latter notion will be virtually unused in the balance of this course, this may hardly lead to confusion.

26 Indeed, such a derivation uses additional assumptions, however natural (such as the Lorentz-invariance of 𝒮), i.e. it can hardly be considered as a real proof of the final results, so that they require experimental confirmation.
Fortunately, such confirmations have been numerous — see below. 

27 See, e.g., CM Sec. 10.3. 

28 See, e.g., CM Sec. 2.1. 

29 See, e.g., CM Eq. (2.19b). 

30 See, e.g., CM Sec. 2.3, in particular Eq. (2.31). 
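$$p_j = \frac{\partial\mathcal{L}}{\partial\dot{r}_j} \equiv \frac{\partial\mathcal{L}}{\partial u_j} = -mc^2\frac{\partial}{\partial u_j}\left(1 - \frac{u_1^2 + u_2^2 + u_3^2}{c^2}\right)^{1/2} = \frac{mu_j}{(1 - u^2/c^2)^{1/2}} \equiv m\gamma u_j. \tag{9.69}$$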
Thus for the 3D vector of momentum, we can write the result in the same form as in non-relativistic mechanics,
$$\mathbf{p} = m\gamma\mathbf{u} \equiv M\mathbf{u}, \tag{9.70}$$


using the reference-frame-dependent scalar M (called the relativistic mass) defined as
$$M \equiv m\gamma = \frac{m}{(1 - u^2/c^2)^{1/2}}, \tag{9.71}$$


m being the non-relativistic mass of the particle. (More often, m is called the rest mass, because in the 
reference frame in which the particle rests, Eq. (71) yields M = m.) 


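Next, let us return to analytical mechanics to calculate the particle’s energy ℰ (which for a free particle coincides with its Hamiltonian function ℋ):³¹
$$\mathcal{E} = \mathcal{H} \equiv \sum_j p_j u_j - \mathcal{L} = \mathbf{p}\cdot\mathbf{u} - \mathcal{L} = \frac{mu^2}{(1 - u^2/c^2)^{1/2}} + mc^2\left(1 - \frac{u^2}{c^2}\right)^{1/2} = \frac{mc^2}{(1 - u^2/c^2)^{1/2}}. \tag{9.72}$$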
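The analytical-mechanics steps (68)-(72) can be mimicked numerically: differentiate the Lagrangian (68) with respect to each velocity component by finite differences, and compare with Eqs. (69)-(73). This Python sketch assumes units with m = c = 1 and an arbitrary velocity with u² = 0.5 (so that γ = √2); the step h = 10⁻⁶ and the function names are illustrative choices.

```python
import math

m, c = 1.0, 1.0   # units with m = c = 1 (an assumption made for simplicity)


def lagrangian(u):
    """Eq. (68): L = -m c^2 (1 - u^2/c^2)^(1/2) for a 3D velocity u = (ux, uy, uz)."""
    return -m * c**2 * math.sqrt(1.0 - sum(w * w for w in u) / c**2)


def momentum_numeric(u, h=1e-6):
    """Generalized momenta p_j = dL/du_j of Eq. (69), by central finite differences."""
    p = []
    for j in range(3):
        up, down = list(u), list(u)
        up[j] += h
        down[j] -= h
        p.append((lagrangian(up) - lagrangian(down)) / (2.0 * h))
    return p


u = (0.3, -0.4, 0.5)                       # u^2 = 0.5, so gamma = sqrt(2)
gamma = 1.0 / math.sqrt(1.0 - sum(w * w for w in u) / c**2)
p = momentum_numeric(u)
print([pj / (m * gamma * uj) for pj, uj in zip(p, u)])   # each ratio is close to 1, Eq. (70)
energy = sum(pj * uj for pj, uj in zip(p, u)) - lagrangian(u)   # Eq. (72): p.u - L
print(energy / (m * gamma * c**2))                       # close to 1, Eq. (73)
```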


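Thus, we have arrived at the most famous of Einstein’s formulas, and probably of physics as a whole:
$$\mathcal{E} = Mc^2 \equiv m\gamma c^2 = \frac{mc^2}{(1 - u^2/c^2)^{1/2}}, \tag{9.73}$$
which expresses the relation between the free particle’s mass and its energy.³² In the non-relativistic limit, it reduces to
$$\mathcal{E} = \frac{mc^2}{(1 - u^2/c^2)^{1/2}} \approx mc^2\left(1 + \frac{u^2}{2c^2}\right) = mc^2 + \frac{mu^2}{2}, \tag{9.74}$$
the first term mc² being called the rest energy of a particle.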


Now let us consider the following string of 4 scalars:

$$\left\{\frac{\mathcal{E}}{c}, p_1, p_2, p_3\right\}. \qquad (9.75)$$

Using Eqs. (70) and (73) to represent this expression as

$$\left\{\frac{\mathcal{E}}{c}, \mathbf{p}\right\} = m\gamma\{c, \mathbf{u}\}, \qquad (9.76)$$


31 See, e.g., CM Eq. (2.32).

32 Let me hope that the reader understands that all the layman talk about “mass to energy conversion” is valid only in a very limited sense of the word. While the Einstein relation (73) does allow the conversion of “massive” particles (with m ≠ 0) into particles with m = 0, such as photons, each of the latter particles also has a non-zero relativistic mass M, and simultaneously the energy ℰ related to this M by Eq. (73).
and comparing the result with Eq. (63), we immediately see that, since m is a Lorentz-invariant constant, this string is a legitimate 4-vector of energy-momentum. As a result, its norm,

$$\left(\frac{\mathcal{E}}{c}\right)^2 - p^2, \qquad (9.77a)$$


is Lorentz-invariant, and in particular, has to be equal to the norm in the particle-bound frame. But in that frame, p = 0, and according to Eq. (73), ℰ = mc², and the norm is just

$$\left(\frac{\mathcal{E}}{c}\right)^2 = \left(\frac{mc^2}{c}\right)^2 \equiv (mc)^2, \qquad (9.77b)$$

so that in an arbitrary frame

$$\left(\frac{\mathcal{E}}{c}\right)^2 - p^2 = (mc)^2. \qquad (9.78a)$$


This very important relation33 between the relativistic energy and momentum (valid for free particles only!) is usually represented in the form34

$$\mathcal{E} = \left[(mc^2)^2 + (pc)^2\right]^{1/2}. \qquad (9.78b)$$
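Relations (70), (71), (73), (74), and (78a) are easy to check numerically. The following Python sketch is only an illustration under stated assumptions: the function name `kinematics` is hypothetical, and an electron is chosen as the test particle arbitrarily. It evaluates γ, M, p, and ℰ for a few speeds and checks that (ℰ/c)² − p² stays equal to (mc)², while the kinetic energy ℰ − mc² approaches mu²/2 at low speed:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def kinematics(m, u):
    """Lorentz factor, relativistic mass (71), momentum (70), and energy (73)
    of a free particle with rest mass m (kg) and speed u (m/s)."""
    if not 0.0 <= u < C:
        raise ValueError("the speed must satisfy 0 <= u < c")
    gamma = 1.0 / math.sqrt(1.0 - (u / C) ** 2)
    M = gamma * m        # relativistic mass, Eq. (71)
    p = M * u            # momentum, Eq. (70)
    E = M * C**2         # energy, Eq. (73)
    return {"gamma": gamma, "M": M, "p": p, "E": E}

m_e = 9.1093837015e-31  # electron rest mass, kg (the choice of particle is arbitrary)
for beta in (0.001, 0.5, 0.99):
    k = kinematics(m_e, beta * C)
    # Eq. (78a): this ratio should equal 1 at any speed
    invariant = ((k["E"] / C) ** 2 - k["p"] ** 2) / (m_e * C) ** 2
    # Eq. (74): the kinetic energy E - mc^2 tends to mu^2/2 as beta -> 0
    kinetic_ratio = (k["E"] - m_e * C**2) / (0.5 * m_e * (beta * C) ** 2)
    print(f"beta = {beta}: gamma = {k['gamma']:.6f}, invariant = {invariant:.12f}, "
          f"T/(mu^2/2) = {kinetic_ratio:.6f}")
```

The invariant stays at 1 up to rounding, while the kinetic-energy ratio departs from 1 as β grows.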


According to Eq. (70), in the so-called ultra-relativistic limit u → c, p tends to infinity, while mc² stays constant, so that mc²/pc → 0. As follows from Eq. (78), in this limit ℰ ≈ pc. Though the above discussion was for particles with finite m, the 4-vector formalism allows us to consider compact objects with zero rest mass as ultra-relativistic particles, for which the above energy-to-momentum relation,

$$\mathcal{E} = pc, \qquad \text{for } m = 0, \qquad (9.79)$$


is exact. Quantum electrodynamics35 tells us that under certain conditions, the electromagnetic field quanta (photons) may also be considered as such massless particles, with momentum p = ħk. Plugging (the modulus of) the last relation into Eq. (78), for the photon’s energy we get ℰ = pc = ħkc = ħω. Please note again that according to Eq. (73), the relativistic mass of a photon is not equal to zero: M = ℰ/c² = ħω/c², so that the term “massless particle” has a limited meaning: m = 0. For example, the relativistic mass of an optical photon is of the order of 10⁻³⁶ kg. On the human scale, this is not too much, but it is still a noticeable (approximately one-millionth) part of the rest mass m_e of an electron.
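The order-of-magnitude estimate above can be reproduced in a few lines. In this minimal Python sketch, the wavelength of 500 nm (green light) is an illustrative assumption, and the constants are the exact SI values of h and c and the CODATA electron mass. It evaluates M = ħω/c² = h/(λc):

```python
H = 6.62607015e-34      # Planck constant, J*s (exact SI value)
C = 299_792_458.0       # speed of light, m/s (exact SI value)
M_E = 9.1093837015e-31  # electron rest mass, kg

def photon_relativistic_mass(wavelength):
    """M = E/c^2 = hbar*omega/c^2 = h/(lambda*c) for a photon of the given wavelength (m)."""
    return H / (wavelength * C)

lam = 500e-9  # assumed wavelength, m (green light; an illustrative choice)
M = photon_relativistic_mass(lam)
print(f"M = {M:.3e} kg, M/m_e = {M / M_E:.2e}")
```

For visible light, the result is a few times 10⁻³⁶ kg, i.e. a few millionths of m_e, in line with the estimate in the text.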


The fundamental relations (70) and (73) have been repeatedly verified in numerous particle collision experiments, in which the total energy and momentum of a system of particles are conserved under the same conditions as in non-relativistic dynamics. (For the momentum, this is the absence of external forces, and for the energy, the elasticity of particle interactions, in other words, the absence of alternative channels of energy escape.) Of course, generally only the total energy of the system is conserved, including the potential energy of particle interactions. However, at typical high-energy


33 Please note one more simple and useful relation following from Eqs. (70) and (73): p = (ℰ/c²)u.

34 It may be tempting to interpret this relation as the perpendicular-vector-like addition of the rest energy mc² and the “kinetic energy” pc, but from the point of view of the total energy conservation (see below), a better definition of the kinetic energy is T(u) = ℰ(u) − ℰ(0).

35 It is briefly reviewed in QM Chapter 9.
particle collisions, the potential energy vanishes so rapidly with the distance between the particles that we can apply the momentum and energy conservation laws using Eq. (73).


As an example, let us calculate the minimum energy ℰ_min of a proton (p_a) necessary for the well-known high-energy reaction that generates a new proton-antiproton pair, p_a + p_b → p + p + p + p̄, provided that before the collision, proton p_b had been at rest in the lab frame. This minimum corresponds to the vanishing relative velocity of the reaction products, i.e. their motion with virtually the same velocity (u_fin), as seen from the lab frame; see Fig. 10.


[Fig. 9.10. A high-energy proton reaction at ℰ = ℰ_min, schematically, shown in the lab frame and in the c.o.m. frame.]


Due to the momentum conservation, this velocity should have the same direction as the initial velocity (u_min) of proton p_a. This is why two scalar equations, one for energy conservation,

$$\frac{mc^2}{(1 - u_{\min}^2/c^2)^{1/2}} + mc^2 = \frac{4mc^2}{(1 - u_{\text{fin}}^2/c^2)^{1/2}}, \qquad (9.80a)$$

and one for momentum conservation,

$$\frac{mu_{\min}}{(1 - u_{\min}^2/c^2)^{1/2}} = \frac{4mu_{\text{fin}}}{(1 - u_{\text{fin}}^2/c^2)^{1/2}}, \qquad (9.80b)$$


are sufficient to find both u_min and u_fin. After a rather tedious solution of this system of two nonlinear equations, we get

$$u_{\min} = \frac{4\sqrt{3}}{7}\,c \approx 0.990\,c, \qquad u_{\text{fin}} = \frac{\sqrt{3}}{2}\,c \approx 0.866\,c. \qquad (9.81)$$

Finally, we can use Eq. (72) to calculate the required energy; the result is ℰ_min = 7mc². (Note that at this threshold, only a minor 2mc² part of the kinetic energy T_min = ℰ_min − mc² = 6mc² of the initially moving particle goes into the “useful” proton-antiproton pair production.) The proton’s rest mass, m_p ≈ 1.67×10⁻²⁷ kg, corresponds to m_pc² ≈ 1.502×10⁻¹⁰ J ≈ 0.938 GeV, so that ℰ_min ≈ 6.57 GeV.
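The “tedious solution” of Eqs. (80) may be delegated to a computer. The following Python sketch is one possible approach, not the author’s; it assumes units with m = c = 1 and works with Lorentz factors, using γβ = (γ² − 1)^(1/2). Eq. (80a) gives γ_fin = (γ_min + 1)/4, and a plain bisection then finds the γ_min for which Eq. (80b) also holds. The function names are hypothetical:

```python
import math

def residual(g_a):
    """Momentum mismatch (in units of mc) for a trial Lorentz factor g_a of proton p_a.
    Energy conservation, Eq. (80a), in units of mc^2: g_a + 1 = 4*g_f.
    Momentum conservation, Eq. (80b), in units of mc: g_a*beta_a = 4*g_f*beta_f,
    where g*beta = sqrt(g^2 - 1)."""
    g_f = (g_a + 1.0) / 4.0
    return math.sqrt(g_a**2 - 1.0) - 4.0 * math.sqrt(g_f**2 - 1.0)

def solve_threshold(lo=3.0, hi=100.0, tol=1e-13):
    """Bisection for the root of residual(); g_a >= 3 keeps g_f >= 1."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

g_a = solve_threshold()
g_f = (g_a + 1.0) / 4.0
beta_a = math.sqrt(1.0 - 1.0 / g_a**2)   # u_min / c
beta_f = math.sqrt(1.0 - 1.0 / g_f**2)   # u_fin / c
MPC2_GEV = 0.938272                      # proton rest energy, GeV
print(f"E_min = {g_a:.6f} mc^2 = {g_a * MPC2_GEV:.3f} GeV")
print(f"u_min = {beta_a:.4f} c, u_fin = {beta_f:.4f} c")
```

The bisection converges to γ_min = 7, reproducing ℰ_min = 7mc² ≈ 6.57 GeV and both velocities of Eq. (81).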


The second, more intelligent way to solve the same problem is to use the center-of-mass (c.o.m.) reference frame which, in relativity, is defined as the frame in which the total momentum of the system vanishes.36 In this frame, at ℰ = ℰ_min, the velocities and momenta of all reaction products vanish, while the velocities of the protons p_a and p_b before the collision are equal and opposite, with an initially unknown magnitude u’. Hence the energy conservation law becomes

$$\frac{2mc^2}{(1 - u'^2/c^2)^{1/2}} = 4mc^2, \qquad (9.82)$$


36 Note that according to this definition, the c.o.m.’s radius-vector is R = Σ_k M_k r_k / Σ_k M_k = Σ_k ℰ_k r_k / Σ_k ℰ_k, i.e. it is generally different from the well-known non-relativistic expression R = Σ_k m_k r_k / Σ_k m_k.
readily giving u’/c = √3/2. (This is of course the same result as Eq. (81) gives for u_fin.) Now we can use the fact that the velocity of the proton p_b in the c.o.m. frame is (−u’) to find the lab-frame speed of proton p_a, using the velocity transform (25):

$$u_{\min} = \frac{2u'}{1 + u'^2/c^2}. \qquad (9.83)$$

With the above result for u’, this relation gives the same result as the first method, u_min/c = 4√3/7, but in a simpler way.
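The c.o.m. route takes just two lines of arithmetic. The Python sketch below mirrors Eqs. (82) and (83) in units of c. The helper name `add_velocities` is a hypothetical name for the collinear velocity-addition formula (25):

```python
import math

def add_velocities(u1, u2):
    """Relativistic addition of collinear velocities, both in units of c."""
    return (u1 + u2) / (1.0 + u1 * u2)

# Eq. (82) in units of mc^2: 2*gamma' = 4, so gamma' = 2 and u'/c = sqrt(1 - 1/gamma'^2)
gamma_prime = 2.0
u_prime = math.sqrt(1.0 - 1.0 / gamma_prime**2)
# Proton p_b rests in the lab and moves with -u' in the c.o.m. frame, so the c.o.m. frame
# moves with +u' relative to the lab; p_a (velocity +u' in the c.o.m. frame) then has
# the lab-frame speed given by Eq. (83):
u_min = add_velocities(u_prime, u_prime)
gamma_min = 1.0 / math.sqrt(1.0 - u_min**2)
print(f"u'/c = {u_prime:.4f}, u_min/c = {u_min:.4f}, E_min = {gamma_min:.4f} mc^2")
```

The last line recovers γ_min = 7, i.e. ℰ_min = 7mc², without any nonlinear equation solving.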


9.4. More on 4-vectors and 4-tensors 


This is a good moment to introduce a formalism that will allow us, in particular, to solve the 
same proton collision problem in one more (and arguably, the most elegant) way. Much more 
importantly, this formalism will be virtually necessary for the description of the Lorentz transform of the 
electromagnetic field, and its interaction with relativistic particles — otherwise the formulas would be too 
cumbersome. 


Let us call the 4-vectors we have used before,

$$A^\alpha = \{A_0, \mathbf{A}\}, \qquad (9.84)$$

contravariant, and denote them with top indices, and introduce also covariant vectors,

$$A_\alpha = \{A_0, -\mathbf{A}\}, \qquad (9.85)$$

marked by bottom indices. Now if we form a scalar product of these two vectors using the standard (3D-like) rule, just as a sum of the products of the corresponding components, we immediately get

$$A^\alpha A_\alpha = A_0^2 - A^2. \qquad (9.86)$$


Here and below the sign of the sum over the four components of the product has been dropped.37 The scalar product (86) is just the norm of the 4-vector in our former definition, and as we already know, is Lorentz-invariant. Moreover, the scalar product of two different vectors (also a Lorentz invariant) may be rewritten in either of two similar forms:38

$$A_0B_0 - \mathbf{A}\cdot\mathbf{B} = A_\alpha B^\alpha = A^\alpha B_\alpha; \qquad (9.87)$$

again, the only caveat is to take one vector in the covariant, and the other one in the contravariant form.


Now let us return to our sample problem (Fig. 10). Since all components (ℰ/c and p) of the total 4-momentum of our system are conserved at the collision, its norm is conserved as well:

$$(p_a + p_b)_\alpha(p_a + p_b)^\alpha = (4p)_\alpha(4p)^\alpha. \qquad (9.88)$$


37 This notation may take some time to get accustomed to, but it is very convenient (compact) and can hardly lead to any confusion, due to the following rule: the summation is implied when (and only when) the same index is repeated twice, once at the top and once at the bottom. In this course, this shorthand notation will be used only for 4-vectors, but not for the usual (3D spatial) vectors.

38 Note also that, by definition, for any two 4-vectors, A_αB^α = B^αA_α.
Since now the scalar product is the usual math construct, we know that the parentheses on the left-hand side of this equation may be multiplied as usual. We may also swap the operands and move constant factors through products as convenient. As a result, we get

$$(p_a)_\alpha(p_a)^\alpha + (p_b)_\alpha(p_b)^\alpha + 2(p_a)_\alpha(p_b)^\alpha = 16\,p_\alpha p^\alpha. \qquad (9.89)$$


Thanks to the Lorentz invariance of each of the terms, we may calculate each of them in whichever reference frame we like. For the first two terms on the left-hand side, as well as for the right-hand side term, it is beneficial to use the frames in which that particular proton is at rest; as a result, according to Eq. (77b), each of the two left-hand-side terms equals (mc)², while the right-hand side equals 16(mc)². On the contrary, the last term on the left-hand side is more easily evaluated in the lab frame, because in it, the three spatial components of the 4-momentum p_b vanish, and the scalar product is just the product of the scalars ℰ/c for protons a and b. For the latter proton, being at rest, this ratio is just mc, so that we get a simple equation,

$$(mc)^2 + (mc)^2 + 2\frac{\mathcal{E}_{\min}}{c}mc = 16(mc)^2, \qquad (9.90)$$

immediately giving the final result ℰ_min = 7mc², already obtained earlier in two more complex ways.
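The invariant-norm argument of Eqs. (88) to (90) can also be written out with explicit 4-vectors. This minimal Python sketch assumes units with m = c = 1 and motion along x, and uses the metric diag(1, −1, −1, −1) of footnote 41. The helper names are hypothetical. It solves the linear Eq. (90) for ℰ_min and checks that the total 4-momentum then has the norm 16 required by Eq. (88):

```python
# Metric tensor g = diag(1, -1, -1, -1), cf. footnote 41; units with m = c = 1.
G = (1.0, -1.0, -1.0, -1.0)

def dot(a, b):
    """Lorentz-invariant scalar product a_alpha b^alpha of two contravariant 4-vectors."""
    return sum(g * x * y for g, x, y in zip(G, a, b))

def four_momentum(E, p):
    """Contravariant 4-momentum {E/c, p} of a particle moving along x (m = c = 1)."""
    return (E, p, 0.0, 0.0)

# Eq. (90), 1 + 1 + 2*E_min = 16 in these units, is linear in E_min:
E_min = (16.0 - 2.0) / 2.0
p_a = four_momentum(E_min, (E_min**2 - 1.0) ** 0.5)   # moving proton, momentum from Eq. (78b)
p_b = four_momentum(1.0, 0.0)                          # proton at rest in the lab frame
total = tuple(x + y for x, y in zip(p_a, p_b))
print(E_min, dot(p_a, p_a), dot(total, total))         # 7, then 1 and 16 up to rounding
```

Because `dot` is invariant, the same numbers would come out after boosting all three 4-vectors to any other frame.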


Let me hope that this example was a convincing demonstration of the convenience of representing 4-vectors in the contravariant (84) and covariant (85) forms,39 with Lorentz-invariant norms (86). To be useful for more complex tasks, this formalism should be developed a little bit further. In particular, it is crucial to know how the 4-vectors change under the Lorentz transform. For contravariant vectors, we already know the answer (54); let us rewrite it in our new notation:

$$A^\alpha = L^\alpha_{\ \beta}A'^\beta, \qquad (9.91)$$

where L^α_β is the matrix (51), generally called the mixed Lorentz tensor:40

$$L^\alpha_{\ \beta} = \begin{pmatrix} \gamma & \beta\gamma & 0 & 0 \\ \beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (9.92)$$


Note that though the position of the indices α and β in the Lorentz tensor notation is not crucial, because this tensor is symmetric, it is convenient to place them using the general index balance rule: the difference of the numbers of the upper and lower indices should be the same in both parts of any 4-vector/tensor equality. (You may check that all the formulas above do satisfy this rule.)


39 These forms are 4-vector extensions of the notions of contravariance and covariance, introduced in the 1850s by J. Sylvester (who also introduced the term “matrix” in its mathematical sense) for the description of the change of the usual 3-component spatial vectors at the transfer between different reference frames, e.g., resulting from the frame rotation. In this case, the contravariance or covariance of a vector is uniquely determined by its nature: if the Cartesian coordinates of a vector (such as the non-relativistic velocity v = dr/dt) are transformed similarly to the radius-vector r, it is called contravariant, while the vectors (such as ∇f) that require the reciprocal transform are called covariant. In the 4D Minkowski space, both forms may be used for any 4-vector.

40 Just as the 4-vectors, 4-tensors with two top indices are called contravariant, and those with two bottom indices, covariant. The tensors with one top and one bottom index are called mixed.
In order to rewrite Eq. (91) in a more general form that would not depend on the particular orientation of the coordinate axes (Fig. 1), let us use the contravariant and covariant forms of the 4-vector of the time-space interval (57),

$$dx^\alpha = \{c\,dt, d\mathbf{r}\}, \qquad dx_\alpha = \{c\,dt, -d\mathbf{r}\}; \qquad (9.93)$$

then its norm (58) may be represented as41

$$(ds)^2 = (c\,dt)^2 - (d\mathbf{r})^2 = dx^\alpha dx_\alpha = dx_\alpha dx^\alpha. \qquad (9.94)$$

Applying Eq. (91) to the first, contravariant form of the 4-vector (93), we get

$$dx^\alpha = L^\alpha_{\ \beta}\,dx'^\beta. \qquad (9.95)$$


But with our new shorthand notation, we can also write the usual rule of differentiation of each component x^α, considering it a function (in our case, linear) of four arguments x’^β, as follows:42

$$dx^\alpha = \frac{\partial x^\alpha}{\partial x'^\beta}dx'^\beta. \qquad (9.96)$$

Comparing Eqs. (95) and (96), we can rewrite the general Lorentz transform rule (92) in a new form,

$$A^\alpha = \frac{\partial x^\alpha}{\partial x'^\beta}A'^\beta, \qquad (9.97a)$$

which does not depend on the coordinate axes’ orientation. It is straightforward to verify that the reciprocal transform may be represented as

$$A'^\alpha = \frac{\partial x'^\alpha}{\partial x^\beta}A^\beta. \qquad (9.97b)$$


However, the reciprocal transform has to differ from the direct one only by the sign of the relative 
velocity of the frames, so that for the coordinate choice shown in Fig. 1, its matrix is 


41 Another way to write this relation is (ds)² = g_αβ dx^α dx^β = g^αβ dx_α dx_β, where double summation over indices α and β is implied, and g is the so-called metric tensor,

$$g_{\alpha\beta} = g^{\alpha\beta} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix},$$

which may be used, in particular, to transfer a covariant vector into the corresponding contravariant one and back: A^α = g^αβ A_β, A_α = g_αβ A^β. The metric tensor plays a key role in general relativity, in which it is affected by gravity, i.e. “curved” by particles’ masses.

42 Note that in the index balance rule, the top index in the denominator of a fraction is counted as a bottom index 
in the numerator, and vice versa. 
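$$\frac{\partial x'^\alpha}{\partial x^\beta} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (9.98)$$

Since according to Eqs. (84)-(85), covariant 4-vectors differ from the contravariant ones by the sign of their spatial components, their direct transform is given by matrix (98). Hence their direct and reciprocal transforms may be represented, respectively, as

$$A_\alpha = \frac{\partial x'^\beta}{\partial x^\alpha}A'_\beta, \qquad A'_\alpha = \frac{\partial x^\beta}{\partial x'^\alpha}A_\beta, \qquad (9.99)$$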


evidently satisfying the index balance rule. (Note that the primed coordinates now stand in the numerators of the partial derivatives, rather than in their denominators as in the contravariant case.) As a sanity check, let us apply this formalism to the scalar product A_αA^α. As Eq. (96) shows, the implicit-sum notation allows us to multiply and divide any equality by the same partial differential of a coordinate, so that we can write:

$$A_\alpha A^\alpha = \frac{\partial x'^\beta}{\partial x^\alpha}A'_\beta\,\frac{\partial x^\alpha}{\partial x'^\gamma}A'^\gamma = \frac{\partial x'^\beta}{\partial x'^\gamma}A'_\beta A'^\gamma = \delta^\beta_{\ \gamma}A'_\beta A'^\gamma = A'_\gamma A'^\gamma, \qquad (9.100)$$

i.e. the scalar product A_αA^α (as well as A^αA_α) is Lorentz-invariant, as it should be.
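The transform rules (91), (97), (98), and (99) can be checked numerically for the boost along x of Fig. 1. In the Python sketch below, the helper names, the speed β = 0.6, and the sample 4-vector are all illustrative assumptions. It transforms a contravariant vector from frame 0’ to frame 0 and back, confirms the invariance (100) of its norm, and confirms that the covariant components obey the first of Eqs. (99):

```python
import math

def boost(beta):
    """Matrix dx^alpha/dx'^beta of Eq. (92) for frame 0' moving along x with speed beta = v/c.
    Flipping the sign of beta gives the reciprocal matrix (98)."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, beta * g, 0.0, 0.0],
            [beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(L, a):
    """Matrix-vector product: component i of the result is sum over k of L[i][k] * a[k]."""
    return [sum(L[i][k] * a[k] for k in range(4)) for i in range(4)]

def lower(a):
    """Covariant components: flip the sign of the spatial part, Eqs. (84)-(85)."""
    return [a[0], -a[1], -a[2], -a[3]]

def norm(a):
    """Scalar product a_alpha a^alpha, Eq. (86)."""
    return sum(x * y for x, y in zip(lower(a), a))

beta = 0.6                              # illustrative relative speed v/c
A_prime = [3.0, 1.0, -2.0, 0.5]         # arbitrary contravariant 4-vector in frame 0'
A = apply(boost(beta), A_prime)         # Eq. (97a): the same 4-vector in frame 0
back = apply(boost(-beta), A)           # Eq. (97b) with matrix (98): back to frame 0'
# First of Eqs. (99): the covariant components transform with matrix (98); since that
# matrix is symmetric, its transpose (required by the index placement) is the matrix itself.
A_cov = apply(boost(-beta), lower(A_prime))
print(norm(A_prime), norm(A))           # equal up to rounding, cf. Eq. (100)
print(back, A_cov, lower(A))            # back ~ A_prime and A_cov ~ lower(A)
```

Note that the direct transform of the covariant components uses the reciprocal matrix (98), exactly as stated in the text.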


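Now, let us consider the 4-vectors of derivatives. Here we should be very careful. Consider, for example, the following 4-vector operator

$$\frac{\partial}{\partial x^\alpha} \equiv \left\{\frac{\partial}{\partial(ct)}, \nabla\right\}. \qquad (9.101)$$

As was discussed above, the operator is not changed by its multiplication and division by another differential, e.g., ∂x’^β (with the corresponding implied summation over all four values of β), so that

$$\frac{\partial}{\partial x^\alpha} = \frac{\partial x'^\beta}{\partial x^\alpha}\frac{\partial}{\partial x'^\beta}. \qquad (9.102)$$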


But, according to the first of Eqs. (99), this is exactly how the covariant vectors are Lorentz-transformed! Hence, we have to consider the derivative over a contravariant space-time interval as a covariant 4-vector, and vice versa.43 (This result might also be expected from the index balance rule.) In particular, this means that the scalar product

$$\frac{\partial A^\alpha}{\partial x^\alpha} = \frac{\partial A^0}{\partial(ct)} + \nabla\cdot\mathbf{A} \qquad (9.103)$$

should be Lorentz-invariant for any legitimate 4-vector. A convenient shorthand for the covariant derivative, which complies with the index balance rule, is

$$\frac{\partial}{\partial x^\alpha} \equiv \partial_\alpha, \qquad (9.104)$$


43 As was mentioned above, this is also a property of the reference-frame transform of the “usual” 3D vectors. 
so that the invariant scalar product may be written just as ∂_αA^α. A similar definition of the contravariant derivative,

$$\frac{\partial}{\partial x_\alpha} \equiv \partial^\alpha = \left\{\frac{\partial}{\partial(ct)}, -\nabla\right\}, \qquad (9.105)$$

allows us to write the Lorentz-invariant scalar product (103) in either of the following two forms:

$$\frac{\partial A^0}{\partial(ct)} + \nabla\cdot\mathbf{A} = \partial^\alpha A_\alpha = \partial_\alpha A^\alpha. \qquad (9.106)$$


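Finally, let us see how the general Lorentz transform changes 4-tensors. A second-rank 4×4 matrix is a legitimate 4-tensor if the 4-vectors it relates obey the Lorentz transform. For example, if two legitimate 4-vectors are related as

$$A^\alpha = T^{\alpha\beta}B_\beta, \qquad (9.107)$$

we should require that

$$A'^\alpha = T'^{\alpha\beta}B'_\beta, \qquad (9.108)$$

where A^α and A’^α are related by Eqs. (97), while B_β and B’_β by Eqs. (99). This requirement immediately yields

$$T^{\alpha\beta} = \frac{\partial x^\alpha}{\partial x'^\gamma}\frac{\partial x^\beta}{\partial x'^\delta}T'^{\gamma\delta}, \qquad T'^{\alpha\beta} = \frac{\partial x'^\alpha}{\partial x^\gamma}\frac{\partial x'^\beta}{\partial x^\delta}T^{\gamma\delta}, \qquad (9.109)$$

with the implied summation over two indices, γ and δ. The rules for the covariant and mixed tensors are similar.44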


9.5. Maxwell equations in the 4-form 


This 4-vector formalism background is sufficient to analyze the Lorentz transform of the 
electromagnetic field. Just to warm up, let us consider the continuity equation (4.5), 


$$\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j} = 0, \qquad (9.110)$$

which expresses the electric charge conservation and, as we already know, is compatible with the Maxwell equations. If we now define the contravariant and covariant 4-vectors of electric current as

$$j^\alpha = \{\rho c, \mathbf{j}\}, \qquad j_\alpha = \{\rho c, -\mathbf{j}\}, \qquad (9.111)$$

then Eq. (110) may be represented in the form

$$\partial_\alpha j^\alpha = \partial^\alpha j_\alpha = 0, \qquad (9.112)$$

showing that the continuity equation is form-invariant45 with respect to the Lorentz transform.


44 It is straightforward to check that the transfer between the contravariant and covariant forms of the same tensor may be readily achieved using the metric tensor g: T_αβ = g_αγ T^γδ g_δβ, T^αβ = g^αγ T_γδ g^δβ.

45 In some texts, the formulas preserving their form at a transform are called “covariant”, creating a possibility for 
confusion with the covariant vectors and tensors. On the other hand, calling such formulas “invariant” would not 
distinguish them properly from invariant quantities, such as the scalar products of 4-vectors. 
Of course, such a form-invariance of a relation does not mean that all component values of the 4-vectors participating in it are the same in both frames. For example, let us have some static charge density ρ in frame 0; then Eq. (97b), applied to the contravariant form of the 4-vector (111), reads

$$j'^\alpha = \frac{\partial x'^\alpha}{\partial x^\beta}j^\beta, \qquad \text{with } j^\beta = \{\rho c, 0, 0, 0\}. \qquad (9.113)$$

Using the particular form (98) of the reciprocal Lorentz matrix for the coordinate choice shown in Fig. 1, we see that this relation yields

$$\rho' = \gamma\rho, \qquad j'_x = -\gamma\beta\rho c \equiv -\gamma v\rho, \qquad j'_y = j'_z = 0. \qquad (9.114)$$

Since the charge velocity, as observed from frame 0’, is (−v), the non-relativistic results would be ρ’ = ρ, j’_x = −vρ. The additional γ factor in the relativistic results is caused by the length contraction: dx’ = dx/γ, so that to keep the total charge dQ = ρd³r = ρdxdydz inside the elementary volume d³r = dxdydz intact, ρ’ (and hence j’_x) should increase proportionally.
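Eq. (114) follows from a single matrix-vector product. The Python sketch below applies the reciprocal matrix (98) to the 4-current of a static charge. The charge density, the frame speed v = 0.8c, and the helper name `reciprocal_boost` are illustrative assumptions:

```python
import math

def reciprocal_boost(beta):
    """Matrix dx'^alpha/dx^beta, Eq. (98), for frame 0' moving along x with speed beta = v/c."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -beta * g, 0.0, 0.0],
            [-beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

C = 299_792_458.0   # speed of light, m/s
rho = 1.0e-6        # static charge density in frame 0, C/m^3 (illustrative value)
v = 0.8 * C         # speed of frame 0' relative to frame 0 (illustrative value)

j = [rho * C, 0.0, 0.0, 0.0]                                         # j^beta, Eq. (113)
L = reciprocal_boost(v / C)
j_prime = [sum(L[a][b] * j[b] for b in range(4)) for a in range(4)]  # Eq. (97b)
rho_prime, jx_prime = j_prime[0] / C, j_prime[1]
gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
# Expected from Eq. (114): the first ratio equals gamma, the second equals 1
print(rho_prime / rho, jx_prime / (-gamma * v * rho))
```

The printed ratios show both the γ-fold increase of the charge density and the negative sign of j’_x caused by the charges’ motion in the −x direction, as seen from frame 0’.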


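Next, at the end of Chapter 6 we have seen that Maxwell equations for the electromagnetic potentials φ and A may be represented in similar forms (6.118), under the Lorenz (again, not “Lorentz”, please!) gauge condition (6.117). For free space, this condition takes the form

$$\nabla\cdot\mathbf{A} + \frac{1}{c^2}\frac{\partial\phi}{\partial t} = 0. \qquad (9.115)$$

This expression gives us a hint of how to form the 4-vector of electromagnetic potentials:46

$$A^\alpha = \left\{\frac{\phi}{c}, \mathbf{A}\right\}, \qquad A_\alpha = \left\{\frac{\phi}{c}, -\mathbf{A}\right\}; \qquad (9.116)$$

indeed, this vector satisfies Eq. (115) in its 4-form:

$$\partial^\alpha A_\alpha = \partial_\alpha A^\alpha = 0. \qquad (9.117)$$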


Since this scalar product is Lorentz-invariant, and the derivatives (104)-(105) are legitimate 4-vectors, this implies that the 4-vector (116) is also legitimate, i.e. obeys the Lorentz transform formulas (97), (99). Even more convincing evidence of this fact may be obtained from the Maxwell equations (6.118) for the potentials. In free space, they may be rewritten as

$$\left[\frac{\partial^2}{\partial(ct)^2} - \nabla^2\right]\frac{\phi}{c} = \mu_0 c\rho, \qquad \left[\frac{\partial^2}{\partial(ct)^2} - \nabla^2\right]\mathbf{A} = \mu_0\mathbf{j}. \qquad (9.118)$$

Using the definition (116), these equations may be merged into one:47

$$\Box A^\alpha = \mu_0 j^\alpha, \qquad (9.119)$$

where □ is the d’Alembert operator,48 which may be represented as either of two scalar products,


46 In the Gaussian units, the scalar potential should not be divided by c in this relation.
47 In the Gaussian units, the coefficient μ0 in Eq. (119) should be replaced, as usual, with 4π/c.


Chapter 9 Page 26 of 56 




Essential Graduate Physics EM: Classical Electrodynamics 


$$\Box \equiv \frac{\partial^2}{\partial(ct)^2} - \nabla^2 = \partial^\alpha\partial_\alpha = \partial_\alpha\partial^\alpha, \qquad (9.120)$$

and hence is Lorentz-invariant. Because of that, and the fact that the Lorentz transform changes both 4-vectors A^α and j^α in a similar way, Eq. (119) does not depend on the reference frame choice. Thus we have arrived at a key point of this chapter: we see that the Maxwell equations are indeed form-invariant with respect to the Lorentz transform. As a by-product, the 4-vector form (119) of these equations (for potentials) is extremely simple, and beautiful!


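However, as we have seen in Chapter 7, for many applications the Maxwell equations for the field vectors are more convenient; so let us represent them in the 4-form as well. For that, we may express all Cartesian components of the usual (3D) field vectors (6.7),

$$\mathbf{E} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla\times\mathbf{A}, \qquad (9.121)$$

via those of the potential 4-vector A^α. For example,

$$E_x = -\frac{\partial\phi}{\partial x} - \frac{\partial A_x}{\partial t} = -c\left[\frac{\partial(\phi/c)}{\partial x} + \frac{\partial A_x}{\partial(ct)}\right] = -c\left(\partial^0 A^1 - \partial^1 A^0\right), \qquad (9.122)$$

$$B_x = \frac{\partial A_z}{\partial y} - \frac{\partial A_y}{\partial z} = -\left(\partial^2 A^3 - \partial^3 A^2\right). \qquad (9.123)$$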


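Completing similar calculations for other field components (or just generating them by appropriate index shifts), we find that the following antisymmetric, contravariant field-strength tensor,

$$F^{\alpha\beta} \equiv \partial^\alpha A^\beta - \partial^\beta A^\alpha, \qquad (9.124)$$

may be expressed via the field components as follows:49

$$F^{\alpha\beta} = \begin{pmatrix} 0 & -E_x/c & -E_y/c & -E_z/c \\ E_x/c & 0 & -B_z & B_y \\ E_y/c & B_z & 0 & -B_x \\ E_z/c & -B_y & B_x & 0 \end{pmatrix}, \qquad (9.125a)$$

so that the covariant form of the tensor is

$$F_{\alpha\beta} = \begin{pmatrix} 0 & E_x/c & E_y/c & E_z/c \\ -E_x/c & 0 & -B_z & B_y \\ -E_y/c & B_z & 0 & -B_x \\ -E_z/c & -B_y & B_x & 0 \end{pmatrix}. \qquad (9.125b)$$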


48 Named after Jean-Baptiste le Rond d’Alembert (1717-1783), who made several pioneering contributions to the general theory of waves; see, e.g., CM Chapter 6. (Some older textbooks use the notation □² for this operator.)

49 In Gaussian units, this formula, as well as Eq. (131) for G^αβ, does not have the factor c in the denominators.
If Eq. (125) looks a bit too bulky, please note that as a reward, the pair of inhomogeneous Maxwell equations, i.e. two equations of the system (6.99), which in free space (D = ε0E, B = μ0H) may be rewritten as

$$\nabla\cdot\frac{\mathbf{E}}{c} = \mu_0 c\rho, \qquad \nabla\times\mathbf{B} - \frac{\partial(\mathbf{E}/c)}{\partial(ct)} = \mu_0\mathbf{j}, \qquad (9.126)$$

may now be expressed in a very simple (and manifestly form-invariant) way,

$$\partial_\alpha F^{\alpha\beta} = \mu_0 j^\beta, \qquad (9.127)$$


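which is comparable with Eq. (119) in its simplicity, and beauty. Somewhat counter-intuitively, the pair of homogeneous Maxwell equations of the system (6.99),

$$\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0, \qquad \nabla\cdot\mathbf{B} = 0, \qquad (9.128)$$

looks, in the 4-vector notation, a bit more complicated:50

$$\partial_\alpha F_{\beta\gamma} + \partial_\beta F_{\gamma\alpha} + \partial_\gamma F_{\alpha\beta} = 0. \qquad (9.129)$$

Note, however, that Eqs. (128) may also be represented in a much simpler 4-form,

$$\partial_\alpha G^{\alpha\beta} = 0, \qquad (9.130)$$

using the so-called dual tensor

$$G^{\alpha\beta} = \begin{pmatrix} 0 & B_x & B_y & B_z \\ -B_x & 0 & -E_z/c & E_y/c \\ -B_y & E_z/c & 0 & -E_x/c \\ -B_z & -E_y/c & E_x/c & 0 \end{pmatrix}, \qquad (9.131)$$

which may be obtained from F^αβ, given by Eq. (125a), by the following replacements:

$$\frac{\mathbf{E}}{c} \to -\mathbf{B}, \qquad \mathbf{B} \to \frac{\mathbf{E}}{c}. \qquad (9.132)$$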
G G 
Besides the proof of the form-invariance of the Maxwell equations with respect to the Lorentz transform, the 4-vector formalism allows us to achieve our initial goal: to find out how the electric and magnetic field components change at the transfer between two (inertial!) reference frames. For that, let us apply to the tensor F^αβ the reciprocal Lorentz transform described by the second of Eqs. (109). Generally, it gives, for each field component, a sum of 16 terms, but since (for our choice of coordinates, shown in Fig. 1) there are many zeros in the Lorentz transform matrix, and the diagonal components of F^αβ equal zero as well, the calculations are rather doable. Let us calculate, for example, E’_x = −cF’^01. The only non-zero terms on the right-hand side are

$$E'_x = -cF'^{01} = -c\left(\frac{\partial x'^0}{\partial x^0}\frac{\partial x'^1}{\partial x^1}F^{01} + \frac{\partial x'^0}{\partial x^1}\frac{\partial x'^1}{\partial x^0}F^{10}\right) = -c\left[\gamma^2 - (\beta\gamma)^2\right]F^{01} = -cF^{01} = E_x. \qquad (9.133)$$


50 To be fair, note that just as Eq. (127), Eq. (129) is also a set of four scalar equations, in the latter case with the indices α, β, and γ taking any three different values of the set {0, 1, 2, 3}.
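Repeating the calculation for the other five components of the fields, we get very important relations

$$\begin{aligned} E'_x &= E_x, & B'_x &= B_x, \\ E'_y &= \gamma(E_y - vB_z), & B'_y &= \gamma(B_y + vE_z/c^2), \\ E'_z &= \gamma(E_z + vB_y), & B'_z &= \gamma(B_z - vE_y/c^2), \end{aligned} \qquad (9.134)$$

whose more compact “semi-vector” form is

$$\mathbf{E}'_\parallel = \mathbf{E}_\parallel, \qquad \mathbf{B}'_\parallel = \mathbf{B}_\parallel, \qquad \mathbf{E}'_\perp = \gamma(\mathbf{E} + \mathbf{v}\times\mathbf{B})_\perp, \qquad \mathbf{B}'_\perp = \gamma(\mathbf{B} - \mathbf{v}\times\mathbf{E}/c^2)_\perp, \qquad (9.135)$$

where the indices ∥ and ⊥ stand, respectively, for the field components parallel and normal to the relative velocity v of the two reference frames. In the non-relativistic limit, the Lorentz factor γ tends to 1, and Eqs. (135) acquire an even simpler form

$$\mathbf{E}' \approx \mathbf{E} + \mathbf{v}\times\mathbf{B}, \qquad \mathbf{B}' \approx \mathbf{B} - \frac{\mathbf{v}\times\mathbf{E}}{c^2}. \qquad (9.136)$$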
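The two routes to Eqs. (134) can be compared numerically: the tensor rule (second of Eqs. (109)) applied to F^αβ of Eq. (125a), and the explicit component formulas. The Python sketch below does this for a frame speed v = 0.6c. The field values and all function names are illustrative assumptions:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def field_tensor(E, B):
    """Contravariant field-strength tensor F^{alpha beta}, Eq. (125a)."""
    Ex, Ey, Ez = (e / C for e in E)
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, -Bz, By],
            [Ey, Bz, 0.0, -Bx],
            [Ez, -By, Bx, 0.0]]

def fields_from_tensor(F):
    """Read E and B back from F^{alpha beta}, inverting Eq. (125a)."""
    E = (-C * F[0][1], -C * F[0][2], -C * F[0][3])
    B = (-F[2][3], F[1][3], -F[1][2])
    return E, B

def transform_fields(E, B, v):
    """Fields in frame 0' moving along x with speed v, via the second of Eqs. (109):
    F'^{ab} = (dx'^a/dx^c)(dx'^b/dx^d) F^{cd}, with the matrix (98)."""
    beta = v / C
    g = 1.0 / math.sqrt(1.0 - beta**2)
    L = [[g, -beta * g, 0, 0], [-beta * g, g, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    F = field_tensor(E, B)
    Fp = [[sum(L[a][c] * L[b][d] * F[c][d] for c in range(4) for d in range(4))
           for b in range(4)] for a in range(4)]
    return fields_from_tensor(Fp)

def transform_fields_direct(E, B, v):
    """The same transform written out component by component, Eqs. (134)."""
    g = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return ((Ex, g * (Ey - v * Bz), g * (Ez + v * By)),
            (Bx, g * (By + v * Ez / C**2), g * (Bz - v * Ey / C**2)))

E = (1.0e3, 2.0e5, -3.0e4)   # V/m, illustrative values
B = (0.01, -0.002, 0.003)    # T, illustrative values
v = 0.6 * C
print(transform_fields(E, B, v))
print(transform_fields_direct(E, B, v))   # agrees with the line above up to rounding
```

As a side check, the scalar product E·B of the printed fields is the same in both frames, a well-known invariant of the transform.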

Thus we see that the electric and magnetic fields transform into each other even in the first order of the v/c ratio. For example, if we fly across the field lines of a uniform, static, purely electric field E (e.g., the one in a plane capacitor), we will see not only the electric field’s renormalization (in the second order of the v/c ratio), but also a non-zero dc magnetic field B’ perpendicular to both the vector E and the vector v, i.e. to the direction of our motion. This is of course what might be expected from the relativity principle: from the point of view of the moving observer (which is as legitimate as that of a stationary observer), the surface charges of the capacitor’s plates, which create the field E, move back, creating the dc currents (114), which induce the magnetic field B’. Similarly, motion across a magnetic field creates, from the point of view of the moving observer, an electric field.


This fact is very important conceptually. One may say there is no such thing in Mother Nature as 
an electric field (or a magnetic field) all by itself. Not only can the electric field induce the magnetic 
field (and vice versa) in dynamics, but even in an apparently static configuration, what exactly we 
measure depends on our speed relative to the field sources — justifying once again the term 
electromagnetism for the field of physics we are studying in this course. 


Another simple but very important application of Eqs. (134)-(135) is the calculation of the fields created by a charged particle moving in free space by inertia, i.e. along a straight line with constant velocity u, at the impact parameter51 (the closest distance) b from the observer. Selecting the reference frame 0’ to move with the particle, with the particle at its origin, and the reference frame 0 to reside in the “lab” in which the fields E and B are measured, we can use the above formulas with v = u. In this case, the fields E’ and B’ may be calculated from, respectively, electrostatics and magnetostatics:

$$\mathbf{E}' = \frac{q}{4\pi\varepsilon_0}\frac{\mathbf{r}'}{r'^3}, \qquad \mathbf{B}' = 0, \qquad (9.137)$$

because in frame 0’, the particle does not move. Selecting the coordinate axes so that at the measurement point, x = 0, y = b, z = 0 (Fig. 11a), for this point we may write x’ = −ut’, y’ = b, z’ = 0, so that r’ = (u²t’² + b²)^(1/2), and the Cartesian components of the fields (137) are:


5! This term is very popular in the theory of particle scattering — see, e.g., CM Sec. 3.7. 
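$$E'_x = -\frac{q}{4\pi\varepsilon_0}\frac{ut'}{(u^2t'^2 + b^2)^{3/2}}, \qquad E'_y = \frac{q}{4\pi\varepsilon_0}\frac{b}{(u^2t'^2 + b^2)^{3/2}}, \qquad E'_z = 0, \qquad B'_x = B'_y = B'_z = 0. \qquad (9.138)$$

[Fig. 9.11. The field pulses induced by a uniformly moving charge: (a) the problem geometry; (b) the field pulses as functions of time.]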


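Now using the last of Eqs. (19b) with x = 0, giving t’ = γt, and the relations reciprocal to Eqs. (134) for the field transform (they are similar to the direct transform but with v replaced with −v = −u), in the lab frame we get

$$E_x = E'_x = -\frac{q}{4\pi\varepsilon_0}\frac{u\gamma t}{(u^2\gamma^2t^2 + b^2)^{3/2}}, \qquad E_y = \gamma E'_y = \frac{q}{4\pi\varepsilon_0}\frac{\gamma b}{(u^2\gamma^2t^2 + b^2)^{3/2}}, \qquad E_z = 0, \qquad (9.139)$$

$$B_x = 0, \qquad B_y = 0, \qquad B_z = \gamma\frac{u}{c^2}E'_y = \frac{u}{c^2}E_y. \qquad (9.140)$$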


These results,52 plotted in Fig. 11b in the units of γq/(4πε0b²), reveal two major effects. First, the charge’s passage by the observer generates not only an electric field pulse but also a magnetic field pulse. This is natural because, as was repeatedly discussed in Chapter 5, any charge motion is essentially an electric current.53 Second, Eqs. (139)-(140) show that the pulse duration scale is

$$\Delta t = \frac{b}{\gamma u} = \frac{b}{u}\left(1 - \frac{u^2}{c^2}\right)^{1/2}, \qquad (9.141)$$

i.e. it shrinks to virtually zero as the charge’s velocity u approaches the speed of light. This is of course a direct corollary of the relativistic length contraction. Indeed, in the frame 0’ moving with the charge, the longitudinal spread of its electric field at distance b from the motion line is of the order of Δx’ = b. When observed from the lab frame 0, this interval, in accordance with Eq. (20), shrinks to Δx = Δx’/γ = b/γ, and hence so does the pulse duration scale Δt = Δx/u = b/γu.
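Eqs. (139) and (141) imply that as u → c, the E_y pulse becomes taller (in proportion to γ) and shorter (in proportion to 1/γ), so that its time integral, q/(2πε0bu), which follows from integrating Eq. (139), does not depend on γ at all. The Python sketch below checks this numerically with a simple midpoint-rule sum. The impact parameter, the integration window, and the number of sample points are illustrative assumptions:

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
C = 299_792_458.0        # speed of light, m/s

def lab_fields(t, q, b, u):
    """E_x, E_y, and B_z at the observation point x = 0, y = b, z = 0 at lab time t,
    per Eqs. (139)-(140), for a charge q moving along x with speed u."""
    g = 1.0 / math.sqrt(1.0 - (u / C) ** 2)
    denom = 4.0 * math.pi * EPS0 * ((u * g * t) ** 2 + b**2) ** 1.5
    Ex = -q * u * g * t / denom
    Ey = q * g * b / denom
    Bz = u * Ey / C**2
    return Ex, Ey, Bz

def integrated_Ey(q, b, u, n=40_001, span=1000.0):
    """Midpoint-rule estimate of the time integral of E_y over |t| < span * b/(gamma*u),
    i.e. over many pulse duration scales (141)."""
    g = 1.0 / math.sqrt(1.0 - (u / C) ** 2)
    T = span * b / (g * u)
    dt = 2.0 * T / n
    return dt * sum(lab_fields(-T + (k + 0.5) * dt, q, b, u)[1] for k in range(n))

q = 1.602176634e-19   # elementary charge, C
b = 1.0e-3            # impact parameter, m (illustrative value)
for beta in (0.5, 0.9, 0.99):
    u = beta * C
    g = 1.0 / math.sqrt(1.0 - beta**2)
    exact = q / (2.0 * math.pi * EPS0 * b * u)   # the same expression for any gamma
    ratio = integrated_Ey(q, b, u) / exact
    print(f"beta = {beta}: pulse scale {b / (g * u):.3e} s, integral ratio {ratio:.6f}")
```

Since B_z = (u/c²)E_y by Eq. (140), multiplying this integral by u/c² gives the area μ0q/(2πb) of the B_z pulse; dividing it by the interval a/u between the charges of footnote 53 reproduces the average field μ0(qu/a)/(2πb).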


52 In the next chapter, we will re-derive them in a different way. 

53 It is straightforward to use Eqs. (140) and the linear superposition principle to calculate, for example, the magnetic field of a string of charges moving along the same line and separated by equal distances Δx = a (so that the average current, as measured in frame 0, is qu/a), and to show that the time average of the magnetic field is given by the familiar Eq. (5.20) of magnetostatics, with b instead of ρ.
9.6. Relativistic particles in electric and magnetic fields


Now let us analyze the dynamics of charged particles in electric and magnetic fields. Inspired by “our” success in forming the 4-vector (75) of energy-momentum, with the contravariant form

$$p^\alpha = \left\{\frac{\mathcal{E}}{c}, \mathbf{p}\right\} = m\gamma\{c, \mathbf{u}\} = m\frac{dx^\alpha}{d\tau} \equiv mu^\alpha, \qquad (9.142)$$

where u^α is the contravariant form of the 4-velocity (63) of the particle,

$$u^\alpha \equiv \frac{dx^\alpha}{d\tau} = \gamma\frac{dx^\alpha}{dt} = \gamma\{c, \mathbf{u}\}, \qquad (9.143)$$

we may notice that the non-relativistic equation of motion, resulting from the Lorentz-force formula (5.10) for the three spatial components of p^α, for a charged particle’s motion in an electromagnetic field,

$$\frac{d\mathbf{p}}{dt} = q(\mathbf{E} + \mathbf{u}\times\mathbf{B}), \qquad (9.144)$$

is fully consistent with the following 4-vector equality (which is evidently form-invariant with respect to the Lorentz transform):

$$\frac{dp^\alpha}{d\tau} = qF^{\alpha\beta}u_\beta. \qquad (9.145)$$


For example, according to Eq. (125), the α = 1 component of this equation reads

$$\frac{dp^1}{d\tau} = qF^{1\beta}u_\beta = q\left[\frac{E_x}{c}\gamma c + 0\cdot(-\gamma u_x) + (-B_z)(-\gamma u_y) + B_y(-\gamma u_z)\right] = q\gamma\left[\mathbf{E} + \mathbf{u}\times\mathbf{B}\right]_x, \qquad (9.146)$$

and similarly for the two other spatial components (α = 2 and α = 3). It may look as if these expressions differ from the 2nd Newton law (144) by an extra factor of γ. However, plugging into Eq. (146) the definition of the proper time interval, dτ = dt/γ, and canceling γ in both parts, we recover Eq. (144) exactly, for any velocity of the particle! The only caveat is that if u is comparable with c, the vector p in Eq. (144) has to be understood as the relativistic momentum (70), proportional to the velocity-dependent mass M = γm rather than to the rest mass m.


The only remaining general task is to examine the meaning of the 0th component of Eq. (145). Let us spell it out:

$$\frac{dp^0}{d\tau} = qF^{0\beta}u_\beta = q\left[0\cdot\gamma c + \left(-\frac{E_x}{c}\right)(-\gamma u_x) + \left(-\frac{E_y}{c}\right)(-\gamma u_y) + \left(-\frac{E_z}{c}\right)(-\gamma u_z)\right] = q\gamma\,\frac{\mathbf{E}}{c}\cdot\mathbf{u}. \qquad (9.147)$$

Recalling that p^0 = ℰ/c, and using the basic relation dτ = dt/γ again, we see that Eq. (147) looks exactly like the non-relativistic relation for the kinetic energy change (what is sometimes called the work-energy principle, in our case for the Lorentz force only54):


54 See, e.g., CM Eq. (1.20) divided by dt, and with dp/dt = F = qE. (As a reminder, the magnetic field cannot affect the particle’s energy, because the magnetic component of the Lorentz force is perpendicular to its velocity.)
$$\frac{d\mathcal{E}}{dt} = q\mathbf{E}\cdot\mathbf{u}, \qquad (9.148)$$

except that in the relativistic case, the energy has to be taken in the general form (73).
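Eqs. (144) and (148) can be tested together on the simplest case: a proton starting from rest in a uniform electric field, with B = 0. In the Python sketch below, the field strength and the acceleration time are illustrative assumptions, and the integration scheme (midpoint rule for the coordinate) is our choice, not the text’s. The momentum grows linearly per Eq. (144), the velocity is recovered from p = (ℰ/c²)u (footnote 33), and the integrated Eq. (148), ℰ − mc² = qEx, is checked at the end:

```python
import math

C = 299_792_458.0        # speed of light, m/s
Q = 1.602176634e-19      # proton charge, C
M_P = 1.67262192e-27     # proton rest mass, kg

def velocity(p):
    """Speed from momentum: u = p c^2 / E, with E given by Eq. (78b)."""
    energy = math.sqrt((M_P * C**2) ** 2 + (p * C) ** 2)
    return p * C**2 / energy

def accelerate(E_field, t_end, n=100_000):
    """Integrate dp/dt = qE (Eq. (144) with B = 0) and dx/dt = u, starting from rest,
    using the midpoint rule for x. Returns (x, energy) at t = t_end."""
    dt = t_end / n
    x, p = 0.0, 0.0
    for _ in range(n):
        x += velocity(p + 0.5 * Q * E_field * dt) * dt   # u at the middle of the step
        p += Q * E_field * dt                            # exact for a constant force
    energy = math.sqrt((M_P * C**2) ** 2 + (p * C) ** 2)
    return x, energy

E_field, t_end = 1.0e6, 1.0e-5          # V/m and s (illustrative values)
x, energy = accelerate(E_field, t_end)
work = Q * E_field * x                  # work done by the electric field
print(f"x = {x:.3f} m, (E - mc^2)/(qEx) = {(energy - M_P * C**2) / work:.9f}")
```

With these values, the proton ends up with ℰ of a few mc², deep in the relativistic regime, yet the work-energy balance (148) still holds, provided that ℰ is taken in the form (73).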


Without question, the 4-component equation (145) of the relativistic dynamics is absolutely 
beautiful in its simplicity. However, for the solution of particular problems, Eqs. (144) and (148) are 
frequently more convenient. As an illustration of this point, let us now use these equations to explore relativistic effects in the motion of charged particles in uniform, time-independent electric and magnetic fields. In doing that, we will, for the time being, neglect the particle’s own contribution to the field.⁵⁵


(i) Uniform magnetic field. Let the magnetic field be constant and uniform in the “lab” reference 
frame 0 that is used for measurements. Then in this frame, Eqs. (144) and (148) yield 


$$\frac{d\mathbf{p}}{dt} = q\mathbf{u}\times\mathbf{B}, \qquad \frac{d\mathcal{E}}{dt} = 0. \qquad (9.149)$$


From the second equation, ℰ = const, we get u = const, β ≡ u/c = const, γ = (1 − β²)^{−1/2} = const, and M ≡ γm = const, so that the first of Eqs. (149) may be rewritten as

$$\frac{d\mathbf{u}}{dt} = \mathbf{u}\times\boldsymbol{\omega}_c, \qquad (9.150)$$


where ω_c is the vector directed along the magnetic field B, with the magnitude equal to the following cyclotron frequency (sometimes called the “gyrofrequency”):

$$\omega_c = \frac{qB}{M} = \frac{qB}{\gamma m}. \qquad (9.151)$$


If the particle’s initial velocity u_0 is perpendicular to the magnetic field, Eq. (150) describes its circular motion, with a constant speed u = u_0, in a plane normal to B, with the angular velocity (151). In the non-relativistic limit u << c, when γ → 1, i.e. M → m, the cyclotron frequency ω_c equals qB/m, i.e. is independent of the speed. However, as the kinetic energy of the particle increases to become comparable with its rest energy mc², the frequency decreases, and in the ultra-relativistic limit,

$$\omega_c = \frac{qBc^2}{\mathcal{E}} \approx \frac{qBc}{p}, \quad \text{for } u \to c. \qquad (9.152)$$

The cyclotron motion’s radius may be calculated as R = u/ω_c; in the non-relativistic limit, it is proportional to the particle’s speed, i.e. to the square root of its kinetic energy. However, as Eq. (151) shows, in the general case the radius is proportional to the particle’s relativistic momentum rather than its speed:

$$R = \frac{u}{\omega_c} = \frac{Mu}{qB} = \frac{p}{qB}, \qquad (9.153)$$

so that in the ultra-relativistic limit, when p ≈ ℰ/c, R is proportional to the kinetic energy.
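To see Eqs. (151)-(153) at work, the sketch below evaluates ω_c and R for a proton moving normally to a field B = 1 T (an arbitrary illustrative choice) at several kinetic energies, using CODATA 2018 values of the constants. It only restates the formulas, so it illustrates, rather than derives, the 1/γ drop of ω_c and the R ∝ p scaling.

```python
import math

# CODATA 2018 constants (SI); the 1 T field is an arbitrary illustrative choice
C = 299_792_458.0           # speed of light, m/s
Q = 1.602_176_634e-19       # proton charge, C
M = 1.672_621_923_69e-27    # proton rest mass, kg
MEV = 1.602_176_634e-13     # 1 MeV in joules


def cyclotron(T_mev, B=1.0):
    """Return (omega_c, R) for a proton with kinetic energy T_mev moving normal to B."""
    rest = M * C**2
    energy = rest + T_mev * MEV                  # full energy, Eq. (73)
    gamma = energy / rest
    p = math.sqrt(energy**2 - rest**2) / C       # relativistic momentum
    omega = Q * B / (gamma * M)                  # Eq. (151)
    R = p / (Q * B)                              # Eq. (153)
    return omega, R


for T in (1e-3, 15.0, 1e3, 7e6):                 # kinetic energies in MeV
    omega, R = cyclotron(T)
    print(f"T = {T:9.3g} MeV: omega_c = {omega:.4e} rad/s, R = {R:.4e} m")
```

At T = 15 MeV the frequency is already about 1.6% below its non-relativistic value qB/m, which is the detuning that limits the simple cyclotron.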


55 As was emphasized earlier in this course, in statics this contribution is formally infinite and has to be ignored. 
In dynamics, this is generally not true; these self-action effects (which are, in most cases, negligible) will be
discussed in the next chapter. 
These dependencies of ω_c and R on energy are major factors in the design of circular accelerators of charged particles. In the simplest of these machines (the cyclotron, invented in 1929 by Ernest Orlando Lawrence), the frequency ω of the accelerating ac electric field is constant, so that even if it is tuned to the ω_c of the initially injected particles, the drop of the cyclotron frequency with energy eventually violates this tuning. For this reason, the largest achievable particle speed is limited to just ~0.1c (for protons, corresponding to a kinetic energy of just ~15 MeV). This problem may be addressed in several ways. In particular, in synchrotrons (such as Fermilab’s Tevatron and CERN’s Large Hadron Collider, LHC⁵⁶) the magnetic field is gradually increased in time to compensate for the momentum increase (B ∝ p), so that both R (153) and ω_c (151) stay constant, enabling proton acceleration to energies as high as ~7 TeV, i.e. ~7,500 mc².⁵⁷


Returning to our initial problem, if the particle’s initial velocity has a component u_∥ along the magnetic field, then it is conserved in time, so that the trajectory is a spiral around the magnetic field lines. As Eqs. (149) show, in this case, Eq. (150) remains valid, but in Eqs. (151) and (153) the full speed and momentum have to be replaced with the magnitudes of their (also time-conserved) components, u_⊥ and p_⊥, normal to B, while the Lorentz factor γ in those formulas still includes the full speed of the particle.


Finally, in the special case when the particle’s initial velocity is directed exactly along the 
magnetic field’s direction, it continues to move straight along the vector B. In this case, the cyclotron 
frequency still has the non-zero value (151) but does not correspond to any real motion, because R = 0. 


(ii) Uniform electric field. This problem is (technically) more complex than the previous one 
because in the electric field, the particle’s energy changes. Directing the z-axis along the field E, from 
Eq. (144) we get 


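$$\frac{dp_z}{dt} = qE, \qquad \frac{d\mathbf{p}_\perp}{dt} = 0. \qquad (9.154)$$

If E does not change in time, the first integration of these equations is elementary,

$$p_z(t) = p_z(0) + qEt, \qquad \mathbf{p}_\perp(t) = \text{const} = \mathbf{p}_\perp(0), \qquad (9.155)$$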


but the further integration requires care because the effective mass M ≡ γm of the particle depends on its full speed u, with

$$u^2 = u_z^2 + u_\perp^2, \qquad (9.156)$$


making the two motions, along and across the field, mutually dependent. 
If the initial velocity is perpendicular to the field E, i.e. if p_z(0) = 0 and p_⊥(0) = p(0) ≡ p_0, the easiest way to proceed is to calculate the particle’s energy first:

$$\mathcal{E}^2 = (mc^2)^2 + c^2p^2(t) = \mathcal{E}_0^2 + c^2(qEt)^2, \quad \text{where } \mathcal{E}_0 = \left[(mc^2)^2 + c^2p_0^2\right]^{1/2}. \qquad (9.157)$$


On the other hand, we can calculate the same energy by integrating Eq. (148), 


56 See https://home.cern/topics/large-hadron-collider. 

57 I am sorry I have no more time/space to discuss particle accelerator physics, and have to refer the interested reader to special literature, for example, either S. Lee, Accelerator Physics, 2nd ed., World Scientific, 2004, or E. Wilson, An Introduction to Particle Accelerators, Oxford U. Press, 2001.
$$\frac{d\mathcal{E}}{dt} = q\mathbf{E}\cdot\mathbf{u} = qE\frac{dz}{dt}, \qquad (9.158)$$

over time, with a simple result:

$$\mathcal{E} = \mathcal{E}_0 + qEz(t), \qquad (9.159)$$


where (just for notational simplicity) I took z(0) = 0. Requiring Eq. (159) to give the same ℰ² as Eq. (157), we get a quadratic equation for the function z(t),

$$\mathcal{E}_0^2 + c^2(qEt)^2 = \left[\mathcal{E}_0 + qEz(t)\right]^2, \qquad (9.160)$$


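whose solution (with the sign before the square root corresponding to qE > 0, i.e. to z > 0) is

$$z(t) = \frac{\mathcal{E}_0}{qE}\left\{\left[1 + \left(\frac{cqEt}{\mathcal{E}_0}\right)^2\right]^{1/2} - 1\right\}. \qquad (9.161)$$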
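The following minimal Python check evaluates Eq. (161) and confirms that it satisfies the energy balance (160), reduces to the familiar z ≈ c²qEt²/2ℰ_0 at small t, and approaches z ≈ ct at large t, as an ultra-relativistic particle should. The units c = m = 1 and the values of p_0 and qE are illustrative assumptions.

```python
import math


def z_of_t(t, E0, qE, c=1.0):
    """Eq. (161): displacement along E for p_z(0) = 0 and z(0) = 0."""
    return (E0 / qE) * (math.sqrt(1.0 + (c * qE * t / E0) ** 2) - 1.0)


c, m, p0, qE = 1.0, 1.0, 0.5, 0.2                 # illustrative values
E0 = math.sqrt((m * c**2) ** 2 + (c * p0) ** 2)   # Eq. (157)

for t in (0.1, 1.0, 10.0, 100.0):
    z = z_of_t(t, E0, qE, c)
    mismatch = E0**2 + (c * qE * t) ** 2 - (E0 + qE * z) ** 2   # Eq. (160)
    print(f"t = {t:6.1f}: z = {z:10.4f}, z/(c t) = {z / (c * t):.4f}, mismatch = {mismatch:.1e}")
```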


Now let us find the particle’s trajectory. Directing the x-axis so that the initial velocity vector 
(and hence the velocity vector at any further instant) is within the [x, z] plane, i.e. that y(t) = 0 
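identically, we may use Eqs. (155) to calculate the trajectory’s slope, at an arbitrary point, as

$$\frac{dz}{dx} = \frac{dz/dt}{dx/dt} = \frac{Mu_z}{Mu_x} = \frac{p_z}{p_x} = \frac{qEt}{p_0}. \qquad (9.162)$$

Now let us use Eq. (160) to express the numerator of this fraction, qEt, as a function of z:

$$qEt = \frac{1}{c}\left[\left(\mathcal{E}_0 + qEz\right)^2 - \mathcal{E}_0^2\right]^{1/2}. \qquad (9.163)$$

Plugging this expression into Eq. (162), we get

$$\frac{dz}{dx} = \frac{1}{cp_0}\left[\left(\mathcal{E}_0 + qEz\right)^2 - \mathcal{E}_0^2\right]^{1/2}. \qquad (9.164)$$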


This differential equation may be readily integrated by separating the variables z and x, and using the following substitution: ξ = cosh⁻¹(qEz/ℰ_0 + 1). Selecting the origin of the x-axis at the initial point, so that x(0) = 0, we finally get the trajectory:

$$z = \frac{\mathcal{E}_0}{qE}\left[\cosh\left(\frac{qEx}{cp_0}\right) - 1\right]. \qquad (9.165)$$


This curve is usually called the catenary, but sometimes the “chainette”, because it (with the 
proper constant replacement) describes, in particular, the stationary shape of a heavy, uniform chain in a 
uniform gravity field directed along the z-axis. At the initial part of the trajectory, where qEx << cp_0, this expression may be approximated with the first non-zero term of its Taylor expansion in small x, giving the following parabola:

$$z \approx \frac{\mathcal{E}_0}{2qE}\left(\frac{qEx}{cp_0}\right)^2, \qquad (9.166)$$


so that if the initial velocity of the particle is much lower than c (i.e. p_0 ≈ mu_0, ℰ_0 ≈ mc²), we get the very familiar non-relativistic formula:

$$z \approx \frac{a}{2}\left(\frac{x}{u_0}\right)^2, \quad \text{with } a = \frac{F}{m} = \frac{qE}{m}. \qquad (9.167)$$


The generalization of this solution to the case of an arbitrary direction of the particle’s initial 
velocity is left for the reader’s exercise. 
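The catenary (165) can also be cross-checked by integrating the equations of motion directly. The sketch below uses plain fourth-order Runge-Kutta (my choice of integrator), units c = m = 1, and illustrative values of qE and p_0, all assumptions. It advances p_z by Eq. (154), keeps p_x = p_0 per Eq. (155), obtains the velocity from u = pc²/ℰ, and compares the final point with Eq. (165).

```python
import math

c, m, qE, p0 = 1.0, 1.0, 0.3, 0.8        # illustrative values, initial momentum along x
E0 = math.sqrt((m * c**2) ** 2 + (c * p0) ** 2)


def rhs(state):
    """Time derivatives of (x, z, p_z); p_x = p0 stays constant by Eq. (155)."""
    _, _, pz = state
    energy = math.sqrt((m * c**2) ** 2 + c**2 * (p0**2 + pz**2))
    return (p0 * c**2 / energy, pz * c**2 / energy, qE)   # u = p c^2 / energy


def rk4(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = rhs([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
            for s, a1, a2, a3, a4 in zip(state, k1, k2, k3, k4)]


def catenary(x):
    """Eq. (165)."""
    return (E0 / qE) * (math.cosh(qE * x / (c * p0)) - 1.0)


state, dt = [0.0, 0.0, 0.0], 1e-3
for _ in range(5000):                     # integrate up to t = 5
    state = rk4(state, dt)
x, z, _ = state
print(x, z, catenary(x))
```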


(iii) Crossed uniform magnetic and electric fields (E ⊥ B). In view of the somewhat bulky solution of the previous problem (i.e. the particular case of the current problem for B = 0), one might think that this problem, with B ≠ 0, should be forbiddingly complex for an analytical solution. Counterintuitively, this is not the case, thanks to the field transform relations (135). Let us consider two possible cases.


Case 1: E/c < B. Let us consider an inertial reference frame 0’ moving (relative to the “lab” reference frame 0 in which the fields E and B are measured) with the following velocity:

$$\mathbf{v} = \frac{\mathbf{E}\times\mathbf{B}}{B^2}, \qquad (9.168)$$

and hence the speed v = c(E/c)/B < c. Selecting the coordinate axes as shown in Fig. 12, so that

$$E_x = 0,\quad E_y = E,\quad E_z = 0;\qquad B_x = 0,\quad B_y = 0,\quad B_z = B, \qquad (9.169)$$

we see that the Cartesian components of this velocity are v_x = v, v_y = v_z = 0.


Fig. 9.12. Particle’s trajectory in 
crossed electric and magnetic fields 
(at E/c < B). 


Since this choice of the coordinates complies with the one used to derive Eqs. (134), we can 
readily use that simple form of the Lorentz transform to calculate the field components in the moving 
reference frame: 


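$$E'_x = 0,\quad E'_y = \gamma(E - vB) = \gamma\left(E - \frac{E}{B}B\right) = 0,\quad E'_z = 0, \qquad (9.170)$$

$$B'_x = 0,\quad B'_y = 0,\quad B'_z = \gamma\left(B - \frac{vE}{c^2}\right) = \gamma B\left(1 - \frac{E^2}{c^2B^2}\right) = \gamma B\left(1 - \frac{v^2}{c^2}\right) = \frac{B}{\gamma} < B, \qquad (9.171)$$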


where the Lorentz parameter γ = (1 − v²/c²)^{−1/2} corresponds to the velocity (168) rather than to that of the particle. These relations show that in this special reference frame, the particle only “sees” the renormalized uniform magnetic field B’ < B, parallel to the initial field, i.e. normal to the velocity (168).
Using the result of the above case (i), we see that in this frame the particle moves along either a circle or 
a spiral winding about the direction of the magnetic field, with the angular velocity (151): 
$$\omega'_c = \frac{qB'}{\gamma' m} \qquad (9.172)$$

(where γ’ is the particle’s own Lorentz factor in the frame 0’), and the radius (153):

$$R' = \frac{p'_\perp}{qB'}. \qquad (9.173)$$


Hence in the lab frame, the particle performs this orbital/spiral motion plus a “drift” with the 
constant velocity v (Fig. 12). As a result, the lab-frame trajectory of the particle (or rather its projection 
onto the plane normal to the magnetic field) is a trochoid-like curve⁵⁸ that, depending on the initial
velocity, may be either prolate (self-crossing), as in Fig. 12, or curtate (drift-stretched so much that it is 
not self-crossing). 
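The drift picture can be checked numerically in the non-relativistic limit E/c << B, where the lab-frame motion is exactly trochoidal (see footnote 58). The sketch below uses illustrative units q = m = 1 and arbitrary field and velocity values (all assumptions). It integrates m du/dt = q(E + u×B) for the field configuration (169), averages the velocity over an integer number of cyclotron periods, and compares the result with the drift speed E/B of Eq. (168). It also evaluates the trochoid parameter a = |u’|/v, where u’ = u − v is the initial velocity in the drifting frame.

```python
import math

# Non-relativistic sketch (E/c << B) in illustrative units q = m = 1.
q_m, E, B = 1.0, 0.2, 1.0      # fields of Eq. (169): E along y, B along z
omega = q_m * B                # cyclotron frequency, Eq. (151) with gamma = 1
v_drift = E / B                # Eq. (168)


def deriv(s):
    """State s = (x, y, u_x, u_y); returns ds/dt from m du/dt = q(E + u x B)."""
    _, _, ux, uy = s
    return (ux, uy, q_m * uy * B, q_m * (E - ux * B))


def rk4(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = deriv(s)
    k2 = deriv([a + 0.5 * dt * k for a, k in zip(s, k1)])
    k3 = deriv([a + 0.5 * dt * k for a, k in zip(s, k2)])
    k4 = deriv([a + dt * k for a, k in zip(s, k3)])
    return [a + dt / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
            for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4)]


u0 = (0.0, 0.5)                          # initial lab-frame velocity (illustrative)
n_periods, steps_per_period = 10, 2000
dt = 2.0 * math.pi / omega / steps_per_period
s = [0.0, 0.0, u0[0], u0[1]]
for _ in range(n_periods * steps_per_period):
    s = rk4(s, dt)
t_total = n_periods * steps_per_period * dt
mean_velocity = (s[0] / t_total, s[1] / t_total)
a = math.hypot(u0[0] - v_drift, u0[1]) / v_drift   # trochoid parameter of footnote 58
print(mean_velocity, v_drift, "prolate" if a > 1 else "curtate" if a < 1 else "cycloid")
```

Starting the particle at rest (u0 = (0, 0)) gives a = 1, i.e. the cycloid.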


Such looped motion of electrons is used, in particular, in magnetrons — very popular generators 
of microwave radiation. In such a device (Fig. 13), the magnetic field, usually created by specially- 
shaped permanent magnets, is nearly uniform (in the region of electron motion) and directed along the 
magnetron’s axis (in Fig. 13, normal to the plane of the drawing), while the electric field of magnitude E << cB, created by the dc voltage applied between the anode and the cathode, is virtually radial.


Fig. 9.13. Schematic cross-section of a typical magnetron (labels: copper, oxide-coated cathode, leads to cathode & heater). (Figure adapted from https://en.wikipedia.org/wiki/Cavity_magnetron under the GNU Free Documentation License.)


As a result, the above simple theory is only approximately valid, and the electron trajectories are 
close to epicycloids rather than trochoids. The applied electric field is adjusted so that these looped 
trajectories pass close to the anode’s surface, and hence to the gap openings of the cylindrical 
microwave cavities drilled in the anode’s bulk. The fundamental mode of such a cavity is quasi-lumped, 
with the cylindrical walls working mostly as inductances, and the gap openings as capacitances, with the 
microwave electric field concentrated in these openings. This is why the mode is strongly coupled to the 
electrons “licking” the anode’s surface, and their interaction creates large positive feedback (equivalent 
to negative damping), which results in intensive microwave self-oscillations at the cavities’ own 
frequency.⁵⁹ The oscillation energy, of course, is taken from the dc-field-accelerated electrons; due to
this energy loss, the looped trajectory of each electron gradually moves closer to the anode and finally 


58 As a reminder, a trochoid may be described as the trajectory of a point on a rigid disk rolled along a straight 
line. Its canonical parametric representation is x = θ + a cos θ, y = a sin θ. (If a > 1, the trochoid is prolate; if a < 1, it is curtate; and if a = 1, it is called the cycloid.) Note, however, that for our problem, the trajectory in the lab
frame is exactly trochoidal only in the non-relativistic limit v << c (i.e. E/c << B). 

59 See, e.g., CM Sec. 5.4. 
lands on its surface. The wide use of such generators (in particular, in microwave ovens, which operate
in a narrow frequency band around 2.45 GHz, allocated for these devices to avoid their interference with 
wireless communication systems) is due to their simplicity and high (up to 65%) efficiency of the dc-to- 
rf energy transfer. 


Case 2: E/c > B. In this case, the speed given by Eq. (168) would be above the speed of light, so 
let us introduce a reference frame moving with a different velocity, 
$$\mathbf{v} = \frac{\mathbf{E}\times\mathbf{B}}{(E/c)^2}, \qquad (9.174)$$


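whose direction is the same as before (Fig. 12), and whose magnitude v = cB/(E/c) is again below c. A calculation absolutely similar to the one performed above for Case 1 yields

$$E'_x = 0,\quad E'_y = \gamma(E - vB) = \gamma E\left(1 - \frac{c^2B^2}{E^2}\right) = \gamma E\left(1 - \frac{v^2}{c^2}\right) = \frac{E}{\gamma} < E,\quad E'_z = 0, \qquad (9.175)$$

$$B'_x = 0,\quad B'_y = 0,\quad B'_z = \gamma\left(B - \frac{vE}{c^2}\right) = \gamma\left(B - \frac{c^2B}{E}\frac{E}{c^2}\right) = 0, \qquad (9.176)$$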


so that in the moving frame the particle “sees” only the electric field E’ < E. According to the solution of our previous problem (ii), the trajectory of the particle in the moving frame is the catenary (165), so that in the lab frame it has an “open”, hyperbolic character as well.
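Both cases can be wrapped into one small function. The sketch below uses units with c = 1 and illustrative field values (assumptions). It picks the frame velocity from Eq. (168) or Eq. (174) and applies the standard field transformation for a boost along x, written out in its general form (I assume it agrees with Eqs. (134)). It then confirms that one of the transformed fields vanishes while the invariant E² − c²B² is preserved.

```python
import math


def boost_fields(E, B, v, c=1.0):
    """Fields seen in a frame moving with speed v along x (standard boost formulas)."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    E_new = (Ex, g * (Ey - v * Bz), g * (Ez + v * By))
    B_new = (Bx, g * (By + v * Ez / c**2), g * (Bz - v * Ey / c**2))
    return E_new, B_new, g


def crossed_field_frame(E, B, c=1.0):
    """For E along y and B along z (Eq. (169)), choose v by Eq. (168) or Eq. (174).

    The borderline case E/c = B is excluded: v would reach c there.
    """
    v = E / B if E / c < B else c**2 * B / E
    return boost_fields((0.0, E, 0.0), (0.0, 0.0, B), v, c)


for E, B in ((0.5, 1.0), (2.0, 1.0)):      # Case 1, then Case 2 (c = 1)
    E_new, B_new, g = crossed_field_frame(E, B)
    print(E, B, E_new, B_new)
```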


To conclude this section, let me note that if the electric and magnetic fields are nonuniform, the 
particle motion may be much more complex, and in most cases, the integration of the system of 
equations (144) and (148) may be carried out only numerically. However, if the field’s nonuniformity is 
small, approximate analytical methods may be very effective. For example, if E = 0, and the magnetic 
field has a small transverse gradient ∇B in a direction normal to the vector B itself, such that

$$\frac{|\nabla B|}{B} \ll \frac{1}{R}, \qquad (9.177)$$

where R is the cyclotron radius (153), then it is straightforward to use Eq. (150) to show⁶⁰ that the cyclotron orbit drifts perpendicular to both B and ∇B, with the drift speed

$$v_d = \frac{\omega_c R^2}{2}\,\frac{|\nabla B|}{B}. \qquad (9.178)$$

The physics of this drift is rather simple: according to Eq. (153), the instantaneous curvature of the cyclotron orbit, 1/R = qB/p, is proportional to the local value of the field. Hence if the field is nonuniform, the trajectory bends slightly more on its parts passing through a stronger field, thus acquiring a shape close to a curtate trochoid.
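A minimal numerical illustration of Eq. (178), in the non-relativistic limit with illustrative units q = m = 1. The field profile B_z = B_0(1 + κy) and all parameter values are assumptions, chosen to satisfy Eq. (177). The sketch integrates the orbit, tracks the guiding center r + (u × n_z)/ω_c (evaluated with the local field), and compares its mean drift speed with ω_c R²|∇B|/2B. For a positive charge the drift comes out along −x, i.e. along B × ∇B.

```python
import math

# Illustrative non-relativistic units: q = m = 1, field B_z = B0 (1 + kappa*y)
B0, kappa, u0 = 1.0, 0.01, 1.0
R = u0 / B0                      # cyclotron radius, Eq. (153); kappa*R << 1, Eq. (177)


def Bz(y):
    return B0 * (1.0 + kappa * y)


def deriv(s):
    """State s = (x, y, u_x, u_y); du/dt = u x B for q/m = 1, B along z."""
    _, y, ux, uy = s
    b = Bz(y)
    return (ux, uy, uy * b, -ux * b)


def rk4(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = deriv(s)
    k2 = deriv([a + 0.5 * dt * k for a, k in zip(s, k1)])
    k3 = deriv([a + 0.5 * dt * k for a, k in zip(s, k2)])
    k4 = deriv([a + dt * k for a, k in zip(s, k3)])
    return [a + dt / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
            for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4)]


def guiding_center_x(s):
    """x-component of r + (u x z_hat)/omega_c, using the local field."""
    x, y, _, uy = s
    return x + uy / Bz(y)


s = [0.0, R, u0, 0.0]            # start at the top of an orbit centred near y = 0
dt, n_orbits = 0.02, 40
n_steps = int(n_orbits * 2.0 * math.pi / (B0 * dt))
x_start = guiding_center_x(s)
for _ in range(n_steps):
    s = rk4(s, dt)
v_sim = (guiding_center_x(s) - x_start) / (n_steps * dt)
v_pred = -(B0 * R**2 / 2.0) * kappa   # Eq. (178), sign for q > 0 (along B x grad B)
print(f"simulated drift {v_sim:.5f}, Eq. (178) {v_pred:.5f}")
```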


For experimental physics and engineering practice, the effects of longitudinal gradients of the magnetic field on charged particle motion are much more important, but it is more convenient for me to postpone their discussion until we have developed a few more analytical tools in the next section.


60 See, e.g., Sec. 12.4 in J. Jackson, Classical Electrodynamics, 3rd ed., Wiley, 1999.
9.7. Analytical mechanics of charged particles


The general Eq. (145) gives a full description of relativistic particle dynamics in electric and 
magnetic fields, just as the 2nd Newton law (1) does in the non-relativistic limit. However, we know that in the latter case, the Lagrange formalism of analytical mechanics allows an easier solution of many problems.⁶¹ We can expect that to be true in relativistic mechanics as well, so let us expand the analysis
of Sec. 3 (which was valid only for free particles) to particles in the field. 


For a free particle, our main result was Eq. (68), which may be rewritten as 
$$\gamma\mathcal{L} = -mc^2, \qquad (9.179)$$


with γ = (1 − u²/c²)^{−1/2}, showing that the product on the left-hand side is Lorentz-invariant. How can the electromagnetic field affect this relation? In non-relativistic electrostatics, we could write

$$\mathcal{L} = T - U = T - q\phi. \qquad (9.180)$$


However, in relativity, the scalar potential φ is just one component of the potential 4-vector (116). The only way to get from this full 4-vector a Lorentz-invariant contribution to γℒ, which would also be proportional to the first power of the particle’s velocity (to account for the magnetic component of the Lorentz force), is evidently

$$\gamma\mathcal{L} = -mc^2 + \text{const}\times u^\alpha A_\alpha, \qquad (9.181)$$

where u^α is the 4-velocity (63). To comply with Eq. (180) at u << c, the constant factor should be equal to (−q), so that Eq. (181) becomes

$$\gamma\mathcal{L} = -mc^2 - qu^\alpha A_\alpha, \qquad (9.182)$$

and taking into account Eqs. (63) and the second of Eqs. (116), we get the very important equality

$$\mathcal{L} = -\frac{mc^2}{\gamma} - q\phi + q\mathbf{u}\cdot\mathbf{A}, \qquad (9.183)$$

whose Cartesian form is

$$\mathcal{L} = -mc^2\left(1 - \frac{u_x^2 + u_y^2 + u_z^2}{c^2}\right)^{1/2} - q\phi + q\left(u_xA_x + u_yA_y + u_zA_z\right). \qquad (9.184)$$


Let us see whether this relation (which admittedly was obtained by an educated guess rather than by a strict derivation) passes a natural sanity check. For the case of an unconstrained motion of a particle, we can select its three Cartesian coordinates r_j (j = 1, 2, 3) as the generalized coordinates, and its linear velocity components u_j as the corresponding generalized velocities. In this case, the Lagrange equations of motion are

$$\frac{d}{dt}\frac{\partial\mathcal{L}}{\partial u_j} - \frac{\partial\mathcal{L}}{\partial r_j} = 0. \qquad (9.185)$$

For example, for r_j = x, Eq. (184) yields

$$\frac{\partial\mathcal{L}}{\partial u_x} = \frac{mu_x}{(1 - u^2/c^2)^{1/2}} + qA_x \equiv p_x + qA_x, \qquad \frac{\partial\mathcal{L}}{\partial x} = -q\frac{\partial\phi}{\partial x} + q\mathbf{u}\cdot\frac{\partial\mathbf{A}}{\partial x}, \qquad (9.186)$$


61 See, e.g., CM Sec. 2.2 and on.
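so that Eq. (185) takes the form

$$\frac{dp_x}{dt} = -q\frac{\partial\phi}{\partial x} + q\mathbf{u}\cdot\frac{\partial\mathbf{A}}{\partial x} - q\frac{dA_x}{dt}. \qquad (9.187)$$

In the equations of motion, the field values have to be taken at the instantaneous position of the particle, so that the last (full) derivative has components due to both the actual field’s change (at a fixed point of space) and the particle’s motion. Such an addition is described by the so-called convective derivative⁶²

$$\frac{d}{dt} = \frac{\partial}{\partial t} + \mathbf{u}\cdot\nabla. \qquad (9.188)$$

Spelling out both scalar products, we may group the terms remaining after cancellations as follows:

$$\frac{dp_x}{dt} = q\left(-\frac{\partial\phi}{\partial x} - \frac{\partial A_x}{\partial t}\right) + qu_y\left(\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right) - qu_z\left(\frac{\partial A_x}{\partial z} - \frac{\partial A_z}{\partial x}\right). \qquad (9.189)$$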


But taking into account the relations (121) between the electric and magnetic fields and potentials, this expression is nothing more than

$$\frac{dp_x}{dt} = q\left(E_x + u_yB_z - u_zB_y\right) = q\left(\mathbf{E} + \mathbf{u}\times\mathbf{B}\right)_x, \qquad (9.190)$$

i.e. the x-component of Eq. (144). Since the other Cartesian coordinates participate in Eq. (184) similarly, it is evident that the Lagrangian equations of motion along the other coordinates yield the other components of the same vector equation of motion.


So, Eq. (183) does indeed give the correct Lagrangian function, and we can use it for further analysis, in particular to discuss the first of Eqs. (186). This relation shows that in the electromagnetic field, the generalized momentum corresponding to the particle’s coordinate x is not p_x = γmu_x, but

$$P_x \equiv \frac{\partial\mathcal{L}}{\partial u_x} = p_x + qA_x. \qquad (9.191)$$

Thus, as was already discussed (at that point, without proof) in Sec. 6.4, the particle’s motion in a magnetic field may be described by two different linear momentum vectors: the kinetic momentum p defined by Eq. (70), and the canonical (or “conjugate”) momentum⁶³

$$\mathbf{P} = \mathbf{p} + q\mathbf{A}. \qquad (9.192)$$
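The relation (191)-(192) between the canonical and kinetic momenta can be checked by differentiating the Lagrangian (184) numerically. In the sketch below, c = 1 and all numerical values of m, q, φ, A, and u are illustrative assumptions; φ and A are simply frozen at the particle’s position. Central finite differences of ℒ with respect to u_j are compared with p_j + qA_j.

```python
import math

c, m, q = 1.0, 2.0, 0.7                  # illustrative values
phi, A = 0.3, (0.4, -1.1, 0.25)          # potentials frozen at the particle's position


def lagrangian(u):
    """Eq. (184) at a fixed point r."""
    u2 = sum(x * x for x in u)
    return (-m * c**2 * math.sqrt(1.0 - u2 / c**2) - q * phi
            + q * sum(a * x for a, x in zip(A, u)))


def canonical_momentum_fd(u, h=1e-6):
    """Central-difference estimate of P_j = dL/du_j."""
    P = []
    for j in range(3):
        up = list(u)
        dn = list(u)
        up[j] += h
        dn[j] -= h
        P.append((lagrangian(up) - lagrangian(dn)) / (2.0 * h))
    return P


u = (0.3, 0.5, -0.6)
gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in u) / c**2)
p_kin = [gamma * m * x for x in u]                      # kinetic momentum, Eq. (70)
P_expected = [pk + q * a for pk, a in zip(p_kin, A)]    # Eq. (192)
print(canonical_momentum_fd(u), P_expected)
```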


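In order to facilitate the discussion of this notion, let us generalize Eq. (72) for the Hamiltonian function ℋ of a free particle to the case of a particle in the field:

$$\mathcal{H} \equiv \mathbf{P}\cdot\mathbf{u} - \mathcal{L} = (\mathbf{p} + q\mathbf{A})\cdot\mathbf{u} - \left(-\frac{mc^2}{\gamma} - q\phi + q\mathbf{u}\cdot\mathbf{A}\right) = \mathbf{p}\cdot\mathbf{u} + \frac{mc^2}{\gamma} + q\phi. \qquad (9.193)$$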


62 Alternatively called the “Lagrangian derivative”; for its (rather simple) derivation see, e.g., CM Sec. 8.3. 

63 With regrets, I have to use for the generalized momentum the same (very common) notation as was used earlier 
in the course for the electric polarization, which will not be discussed in this section or in the balance of these notes.

64 In Gaussian units, Eq. (192) has the form P = p + qA/c.
Merging the first two terms of the last expression exactly as it was done in Eq. (72), we get an extremely simple result,

$$\mathcal{H} = \gamma mc^2 + q\phi, \qquad (9.194a)$$

which may be spelled out as

$$\mathcal{H} = \left[1 - \left(\frac{u}{c}\right)^2\right]^{-1/2}mc^2 + q\phi, \quad \text{i.e.}\quad (\mathcal{H} - q\phi)^2 = (mc^2)^2 + c^2p^2. \qquad (9.194b)$$

These expressions may leave the reader wondering: where is the vector potential A here, and the magnetic field effects it has to describe? The resolution of this puzzle is easy: as we know from analytical mechanics,⁶⁵ for most applications, for example for an alternative derivation of the equations of motion, ℋ has to be represented as a function of the particle’s generalized coordinates (in the case of unconstrained motion, these may be the Cartesian components of the vector r that serves as an argument for the potentials A and φ), the generalized momenta, i.e. the components of the vector P, and, generally, time. For that, the kinetic momentum p in Eq. (194b) has to be expressed via these variables. This may be done using Eq. (192), giving us the following generalization of Eq. (78):⁶⁶

$$(\mathcal{H} - q\phi)^2 = (mc^2)^2 + c^2(\mathbf{P} - q\mathbf{A})^2. \qquad (9.195)$$


It is straightforward to verify that the Hamilton equations of motion for the three Cartesian coordinates of the particle, obtained in the regular way from this ℋ, may be merged into the same vector equation (144). In the non-relativistic limit, expanding Eq. (194b) into a Taylor series in p², and limiting it to the two leading terms, we get the following generalization of Eq. (74):

$$\mathcal{H} \approx mc^2 + \frac{p^2}{2m} + q\phi \equiv mc^2 + \frac{1}{2m}(\mathbf{P} - q\mathbf{A})^2 + U, \quad \text{with } U \equiv q\phi. \qquad (9.196)$$
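The expansion leading to Eq. (196) is easy to verify numerically. The sketch below (c = m = 1 and illustrative values of q, φ, and A, all assumptions) compares the positive root of Eq. (195) with its non-relativistic form (196). Their difference should be close to the next Taylor term, p⁴/8m³c², which the check confirms.

```python
import math

c, m, q, phi = 1.0, 1.0, 0.5, 0.2          # illustrative values
A = (0.1, -0.3, 0.2)                       # vector potential at the particle's position


def kinetic_p2(P):
    """p^2 = (P - qA)^2, from Eq. (192)."""
    return sum((Pj - q * Aj) ** 2 for Pj, Aj in zip(P, A))


def H_exact(P):
    """Positive root of Eq. (195)."""
    return math.sqrt((m * c**2) ** 2 + c**2 * kinetic_p2(P)) + q * phi


def H_nonrel(P):
    """Eq. (196): rest energy + (P - qA)^2 / 2m + q*phi."""
    return m * c**2 + kinetic_p2(P) / (2.0 * m) + q * phi


P = [q * Aj for Aj in A]
P[0] += 0.01                               # kinetic momentum p = (0.01, 0, 0), p << mc
p2 = kinetic_p2(P)
print(H_exact(P), H_nonrel(P), H_nonrel(P) - H_exact(P), p2**2 / (8 * m**3 * c**2))
```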


These expressions for ℋ, and Eq. (183) for ℒ, give a clear view of how analytical mechanics describes the effects of the electromagnetic field. The electric part qE of the total Lorentz force can perform mechanical work on the particle, i.e. change its kinetic energy (see Eq. (148) and its discussion). As a result, the scalar potential φ, whose gradient gives a contribution to E, may be directly associated with the potential energy U = qφ of the particle. On the contrary, the magnetic component qu×B of the Lorentz force is always perpendicular to the particle’s velocity u, so it cannot perform non-zero work on the particle and, as a result, cannot be described by a contribution to U. However, if A did not participate in the functions ℒ and/or ℋ at all, analytical mechanics would be unable to describe the effects of the magnetic field B = ∇×A on the particle’s motion. The relations (183) and (195)-(196) show the wonderful way in which physics (with some help from Mother Nature herself :-) solves this problem: the vector potential gives such contributions to the functions ℒ and ℋ that cannot be uniquely attributed to either kinetic or potential energy, but ensure that both the Lagrange and Hamilton formalisms yield the correct equation of motion (144), including the magnetic field effects.


65 See, e.g., CM Sec. 10.1. 
66 Alternatively, this relation may be obtained from the expression for the Lorentz-invariant norm, p^α p_α = (mc)², of the 4-momentum (75), p^α = {ℰ/c, p} = {(ℋ − qφ)/c, P − qA}.
I believe I still owe the reader some discussion of the physical sense of the canonical momentum P. For that, let us consider a charged particle moving near a region of localized magnetic field B(r, t), but not entering this region (see Fig. 14), so that on its trajectory ∇×A = B = 0.


Fig. 9.14. Particle’s motion (along the contour C) around a localized magnetic field B(r, t) with a time-dependent flux.


If there are no electrostatic fields affecting the particle (i.e. no other electric charges nearby), we may select a local gauge such that φ(r, t) = 0 and A = A(t), so that Eq. (144) is reduced to

$$\frac{d\mathbf{p}}{dt} = q\mathbf{E} = -q\frac{d\mathbf{A}}{dt}, \qquad (9.197)$$

and Eq. (192) immediately gives

$$\frac{d\mathbf{P}}{dt} = \frac{d\mathbf{p}}{dt} + q\frac{d\mathbf{A}}{dt} = 0. \qquad (9.198)$$


Hence, even if the magnetic field is changed in time, so that the induced electric field E does accelerate the particle, its canonical momentum does not change. Thus P is a variable more stable to magnetic field changes than its kinetic counterpart p. This conclusion may be criticized because it relies on a specific gauge, and generally P = p + qA is not gauge-invariant, because the vector potential A is not.⁶⁷ However, as was already discussed in Sec. 5.3, the integral ∮A·dr over a closed contour is gauge-invariant and is equal to the magnetic flux Φ through the area limited by the contour (see Eq. (5.65)). So, integrating Eq. (197) over a closed trajectory of a particle (Fig. 14), and over the time of one orbit, we get

$$\oint_C \Delta\mathbf{p}\cdot d\mathbf{r} = -q\,\Delta\Phi, \quad \text{so that} \quad \oint_C \Delta\mathbf{P}\cdot d\mathbf{r} = 0, \qquad (9.199)$$

where ΔΦ is the change of flux during that time. This gauge-invariant result confirms the above
conclusion about the stability of the canonical momentum to magnetic field variations. 


Generally, Eq. (199) is invalid if a particle moves inside a magnetic field and/or changes its trajectory as the field varies. However, if the field is almost uniform, i.e. its gradient is small in the sense of Eq. (177), this result is (approximately) applicable. Indeed, analytical mechanics⁶⁸ tells us that for any canonical coordinate-momentum pair {q_j, p_j}, the corresponding action variable,

$$J_j = \frac{1}{2\pi}\oint p_j\,dq_j, \qquad (9.200)$$

remains virtually constant at slow variations of motion conditions. According to Eq. (191), for a particle in a magnetic field, the generalized momentum corresponding to the Cartesian coordinate r_j is P_j rather than p_j. Thus, forming the net action variable J = J_x + J_y + J_z, we may write


67 In contrast, the kinetic momentum p = Mu is evidently gauge- (though not Lorentz-) invariant. 
68 See, e.g., CM Sec. 10.2. 
$$2\pi J = \oint\mathbf{P}\cdot d\mathbf{r} = \oint\mathbf{p}\cdot d\mathbf{r} + q\Phi = \text{const}. \qquad (9.201)$$


Let us apply this relation to the motion of a non-relativistic particle in an almost uniform 
magnetic field, with a relatively small longitudinal velocity, u_∥/u_⊥ → 0 (see Fig. 15).


Fig. 9.15. Particle in a magnetic field with 
a small longitudinal gradient VB || B. 


In this case, Φ in Eq. (201) is the flux encircled by the particle’s cyclotron orbit, Φ = −πR²B, where R is its radius given by Eq. (153), and the negative sign accounts for the fact that in our case, the “correct” direction of the normal vector n in the definition of flux, Φ = ∫B·n d²r, is antiparallel to the vector B. At u << c, the kinetic momentum is just p_⊥ = mu_⊥, while Eq. (153) yields

$$mu_\perp = qBR. \qquad (9.202)$$

Plugging these relations into Eq. (201), we get

$$2\pi J = mu_\perp\,2\pi R - q\pi R^2B = m\frac{qBR}{m}\,2\pi R - q\pi R^2B = (2 - 1)\,q\pi R^2B = -q\Phi. \qquad (9.203)$$


This means that even if the circular orbit slowly moves through the magnetic field, the flux encircled by 
the cyclotron orbit should remain virtually constant. One manifestation of this effect is the result already 
mentioned at the end of Sec. 6: if a small gradient of the magnetic field is perpendicular to the field 
itself, then the particle orbit’s drift direction is perpendicular to ∇B, so that Φ stays constant.


Now let us analyze the case of a small longitudinal gradient, ∇B ∥ B (Fig. 15). If a small initial longitudinal velocity u_∥ is directed toward the higher-field region, the cyclotron orbit has to gradually shrink to keep Φ constant. Rewriting Eq. (202) as

$$mu_\perp = q\frac{\pi R^2B}{\pi R} \equiv q\frac{|\Phi|}{\pi R}, \qquad (9.204)$$

we see that this reduction of R (at constant Φ) increases the orbiting speed u_⊥. But since the magnetic field cannot perform any work on the particle, its kinetic energy,

$$T = \frac{m}{2}\left(u_\parallel^2 + u_\perp^2\right), \qquad (9.205)$$

should stay constant, so that the longitudinal velocity u_∥ has to decrease. Hence eventually the orbit’s drift has to stop, and then it has to start moving back toward the region of lower fields, being essentially repelled from the high-field region. This effect is very important, in particular, for plasma confinement
systems. In the simplest of such systems, two coaxial magnetic coils, inducing magnetic fields of the 
same direction (Fig. 16), naturally form a “magnetic bottle”, which traps charged particles injected, with 
sufficiently low longitudinal velocities, into the region between the coils. More complex systems of this 
type, but working on the same basic principle, are the most essential components of the persisting large-scale efforts to achieve controllable nuclear fusion.⁶⁹


Fig. 9.16. A simple magnetic bottle (schematically). 
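The trapping condition follows directly from the two conservation laws used above. Constant flux, Eq. (204), with R = mu_⊥/qB, gives u_⊥²/B = const, while Eq. (205) keeps u_∥² + u_⊥² constant. The minimal non-relativistic sketch below (helper names and numbers are my own illustrative choices) computes the field at which u_∥ vanishes. From that it decides whether a particle injected at field B_0 is reflected before reaching a bottle neck of field B_max.

```python
import math


def u_par_at(B, u_par0, u_perp0, B0):
    """Longitudinal speed where the local field is B, assuming u_perp^2/B = const
    (constant flux, Eq. (204)) and u_par^2 + u_perp^2 = const (Eq. (205)).
    Returns None where the particle cannot reach the field B."""
    u_perp2 = u_perp0**2 * B / B0
    u_par2 = u_par0**2 + u_perp0**2 - u_perp2
    return math.sqrt(u_par2) if u_par2 >= 0.0 else None


def turning_field(u_par0, u_perp0, B0):
    """Field at which u_par drops to zero and the orbit is reflected."""
    return B0 * (u_par0**2 + u_perp0**2) / u_perp0**2


def is_trapped(u_par0, u_perp0, B0, B_max):
    """True if the particle is reflected before reaching the bottle neck field B_max."""
    return turning_field(u_par0, u_perp0, B0) <= B_max


# Illustrative bottle with mirror ratio B_max/B0 = 3
for u_par0 in (0.5, 1.0, 1.5):
    print(u_par0, turning_field(u_par0, 1.0, 1.0), is_trapped(u_par0, 1.0, 1.0, 3.0))
```

In this approximation, a particle is trapped only if u_∥0²/u_⊥0² ≤ B_max/B_0 − 1, which is why injection with sufficiently low longitudinal velocity is required.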


Returning to the constancy of the magnetic flux encircled by free particles: it reminds us of the Meissner-Ochsenfeld effect, which was discussed in Sec. 6.4, and motivates a brief revisit of the electrodynamics of superconductivity. As was emphasized in that section, superconductivity is a substantially quantum phenomenon; nevertheless, the classical notion of the conjugate momentum P helps to understand its theoretical description. Indeed, the general rule of quantization of physical systems⁷⁰ is that each canonical pair {q_j, p_j} of a generalized coordinate q_j and the corresponding generalized momentum p_j is described by quantum-mechanical operators that obey the following commutation relation:

$$\left[\hat{q}_j, \hat{p}_{j'}\right] = i\hbar\,\delta_{jj'}. \qquad (9.206)$$


According to Eq. (191), for the Cartesian coordinates r_j of a particle in the magnetic field, the corresponding generalized momenta are P_j, so that their operators should obey similar commutation relations:

$$\left[\hat{r}_j, \hat{P}_{j'}\right] = i\hbar\,\delta_{jj'}. \qquad (9.207)$$


In the coordinate representation of quantum mechanics, the canonical operators of the Cartesian components of the linear momentum are described by the corresponding components of the vector operator −iħ∇. As a result, ignoring the rest energy mc² (which gives an inconsequential phase factor exp{−imc²t/ħ} in the wavefunction), we can use Eq. (196) to rewrite the usual non-relativistic Schrödinger equation,

$$i\hbar\frac{\partial\Psi}{\partial t} = \hat{\mathcal{H}}\Psi, \qquad (9.208)$$

as follows:

$$i\hbar\frac{\partial\Psi}{\partial t} = \left[\frac{1}{2m}\left(\hat{\mathbf{P}} - q\mathbf{A}\right)^2 + q\phi\right]\Psi = \left[\frac{1}{2m}\left(-i\hbar\nabla - q\mathbf{A}\right)^2 + q\phi\right]\Psi. \qquad (9.209)$$


Thus, I believe I have finally delivered on my promise to justify the replacement (6.50), which 
had been used in Secs. 6.4 and 6.5 to discuss the electrodynamics of superconductors, including the 
Meissner-Ochsenfeld effect. The Schrödinger equation (209) may also be used as the basis for the quantum-mechanical description of other magnetic field phenomena, including the so-called Aharonov-Bohm and quantum Hall effects (see, e.g., QM Secs. 3.1-3.2).


69 For further reading on this technology, the reader may be referred, for example, to the simple monograph by F. 
Chen, Introduction to Plasma Physics and Controlled Fusion, vol. 1, 2nd ed., Springer, 1984, and/or the
graduate-level theoretical treatment by R. Hazeltine and J. Meiss, Plasma Confinement, Dover, 2003. 

70 See, e.g., CM Sec. 10.1. 
9.8. Analytical mechanics of the electromagnetic field


We have just seen that the analytical mechanics of a particle in an electromagnetic field may be 
used to get some important results. The same is true for the analytical mechanics of the electromagnetic 
field as such, and the field-particle system as a whole. For such space-distributed systems as fields, 
governed by local dynamics laws (in our case, the Maxwell equations), we need to apply analytical 
mechanics to the local densities ℓ and h of the Lagrangian and Hamiltonian functions, defined by the relations

$$\mathcal{L} = \int \ell\, d^3r, \qquad \mathcal{H} = \int h\, d^3r. \qquad (9.210)$$


Let us start, as usual, from the Lagrange formalism. Some clues on the possible structure of the Lagrangian function density ℓ may be obtained from the description of the particle-field interaction in this formalism, discussed in the last section. As we have seen, for the case of a single particle, the interaction is described by the last two terms of Eq. (183):

$$\mathcal{L}_{\text{int}} = -q\phi + q\mathbf{u}\cdot\mathbf{A}. \qquad (9.211)$$


Obviously, if the charge q is continuously distributed over some volume, we may represent this ℒ_int as a volume integral of the following Lagrangian function density:

$$\ell_{\text{int}} = -\rho\phi + \mathbf{j}\cdot\mathbf{A} \equiv -j_\alpha A^\alpha. \qquad (9.212)$$


Notice that this density (in contrast to ℒ_int itself!) is Lorentz-invariant. (This is due to the contraction of the longitudinal coordinate, and hence volume, at the Lorentz transform.) Hence we may expect the density of the field’s part of the Lagrangian to be Lorentz-invariant as well. Moreover, given the local structure of the Maxwell equations (containing only the first spatial and temporal derivatives of the fields), ℓ_field should be a function of the potential’s 4-vector and its 4-derivative:

$$\ell_{\text{field}} = \ell_{\text{field}}\left(A^\alpha, \partial_\beta A^\alpha\right). \qquad (9.213)$$


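Also, the density should be selected in such a way that the 4-vector analog of the Lagrangian equation of
motion,

$$\partial_\beta \frac{\partial \ell}{\partial\left(\partial_\beta A^\alpha\right)} - \frac{\partial \ell}{\partial A^\alpha} = 0, \tag{9.214}$$

gives us the correct inhomogeneous Maxwell equations (127).71 The field part $\ell_{\text{field}}$ of the total
Lagrangian density ℓ should be a Lorentz scalar and a quadratic form of the field strengths, i.e. of the tensor
$F^{\alpha\beta}$, so that the natural choice is

$$\ell_{\text{field}} = \text{const} \times F_{\alpha\beta}F^{\alpha\beta}, \tag{9.215}$$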


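with the implied summation over both indices. Indeed, adding to this expression the interaction
Lagrangian density (212),

$$\ell = \ell_{\text{field}} + \ell_{\text{int}} = \text{const} \times F_{\alpha\beta}F^{\alpha\beta} - j_\alpha A^\alpha, \tag{9.216}$$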


71 Here the implicit summation over the index β plays a role similar to the convective derivative (188) in
replacing the full derivative over time, in a way that reflects the symmetry of time and space in special relativity. I
do not want to spend more time justifying Eq. (214), for reasons that will become clear shortly.
and performing the differentiations, we see that Eqs. (214)-(215) indeed yield Eqs. (127), provided that
the constant factor equals (−1/4μ0).72 So, the field's Lagrangian density is

$$\ell_{\text{field}} = -\frac{1}{4\mu_0}F_{\alpha\beta}F^{\alpha\beta} = \frac{\varepsilon_0}{2}E^2 - \frac{1}{2\mu_0}B^2 \equiv u_e - u_m, \tag{9.217}$$


where $u_e$ is the electric field energy density (1.65), and $u_m$ is the magnetic field energy density (5.57).
Let me hope the reader agrees that Eq. (217) is a wonderful result, because the Lagrangian function has a
structure absolutely similar to the well-known expression ℒ = T − U of classical mechanics. So, for the
field alone, the “potential” and “kinetic” energies are separable again.73
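A quick numerical sanity check of Eq. (217) may be reassuring. The sketch below assumes SI units, the metric g = diag(1, −1, −1, −1), and the sign convention $F^{0j} = -E_j/c$, $F^{jk} = -\varepsilon_{jkl}B_l$ for the contravariant field tensor; this convention is an assumption about the form of Eqs. (125), which are not reproduced here, although the invariant $F_{\alpha\beta}F^{\alpha\beta} = 2(B^2 - E^2/c^2)$ does not depend on it. The helper names `field_tensor` and `lagrangian_density` are hypothetical.

```python
import numpy as np

EPS0 = 8.8541878128e-12               # vacuum permittivity, F/m
MU0 = 1.25663706212e-6                # vacuum permeability, H/m
C = 1.0 / np.sqrt(EPS0 * MU0)         # speed of light, m/s
G = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric (+,-,-,-)


def field_tensor(E, B):
    """Contravariant F^{ab}; assumed convention F^{0j} = -E_j/c, F^{jk} = -eps_jkl B_l."""
    E = np.asarray(E, dtype=float)
    Bx, By, Bz = np.asarray(B, dtype=float)
    F = np.zeros((4, 4))
    F[0, 1:] = -E / C
    F[1:, 0] = E / C
    F[1:, 1:] = [[0.0, -Bz, By],
                 [Bz, 0.0, -Bx],
                 [-By, Bx, 0.0]]
    return F


def lagrangian_density(E, B):
    """ell_field = -(1/4 mu0) F_{ab} F^{ab}, as in Eq. (217)."""
    F = field_tensor(E, B)
    F_low = G @ F @ G                 # lower both indices: F_{ab}
    return -np.sum(F_low * F) / (4.0 * MU0)


E = np.array([1.0e3, -2.0e3, 5.0e2])     # arbitrary test field, V/m
B = np.array([1.0e-5, 3.0e-6, -4.0e-6])  # arbitrary test field, T
u_e = EPS0 * (E @ E) / 2
u_m = (B @ B) / (2 * MU0)
print(lagrangian_density(E, B), u_e - u_m)  # the two numbers should coincide
```

The check is convention-independent in the sense that any sign choice for $F^{\alpha\beta}$ consistent with the metric gives the same scalar $u_e - u_m$.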


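Now let us explore whether we can calculate the 4-form of the field's Hamiltonian function ℋ.
In generic analytical mechanics,

$$\mathcal{H} = \sum_j \frac{\partial \mathcal{L}}{\partial \dot{q}_j}\dot{q}_j - \mathcal{L}. \tag{9.218}$$

However, just as for the Lagrangian function, for a field we should find the spatial density h of the
Hamiltonian, defined by the second of Eqs. (210), for which the natural 4-form of Eq. (218) is

$$h^{\alpha\beta} = \frac{\partial \ell}{\partial\left(\partial_\alpha A^\gamma\right)}\,\partial^\beta A^\gamma - g^{\alpha\beta}\ell. \tag{9.219}$$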
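Calculated for the field alone, i.e. using Eq. (217) for ℓ, this definition yields

$$h^{\alpha\beta} = \Theta^{\alpha\beta} + \tau^{\alpha\beta}, \tag{9.220}$$

where the tensor

$$\Theta^{\alpha\beta} \equiv \frac{1}{\mu_0}\left(-F^{\alpha\gamma}F^{\beta}{}_{\gamma} + \frac{1}{4}g^{\alpha\beta}F_{\gamma\delta}F^{\gamma\delta}\right) \tag{9.221}$$

is gauge-invariant, while the remaining term,

$$\tau^{\alpha\beta} \equiv -\frac{1}{\mu_0}F^{\alpha\gamma}\partial_\gamma A^\beta, \tag{9.222}$$

is not, so that it cannot correspond to any measurable variables. Fortunately, it is straightforward to
verify that, since for the field alone $\partial_\gamma F^{\gamma\alpha} = 0$, the last tensor may be represented in the form

$$\tau^{\alpha\beta} = \frac{1}{\mu_0}\partial_\gamma\left(F^{\gamma\alpha}A^\beta\right), \tag{9.223}$$

and as a result, obeys the following relations:

$$\partial_\alpha\tau^{\alpha\beta} = 0, \qquad \int \tau^{0\beta}\,d^3r = 0, \tag{9.224}$$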


72 In the Gaussian units, this coefficient is (−1/16π).

73 Since the Lagrange equations of motion are homogeneous, the simultaneous change of the signs of T and U
does not change them. Thus, it is not important which of the two energy densities, $u_e$ or $u_m$, we count as the
potential energy, and which as the kinetic energy. (Actually, such duality of the two energy components is typical
for all analytical mechanics; see, e.g., the discussion of this issue in CM Sec. 2.2.)
so it does not interfere with the conservation properties of the gauge-invariant, symmetric energy-momentum
tensor (also called the symmetric stress tensor) $\Theta^{\alpha\beta}$, to be discussed below.


Let us use Eqs. (125) to express the elements of the latter tensor via the electric and magnetic
fields. For α = β = 0, we get

$$\Theta^{00} = \frac{\varepsilon_0}{2}E^2 + \frac{1}{2\mu_0}B^2 \equiv u_e + u_m \equiv u, \tag{9.225}$$


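i.e. the expression for the total energy density u; see Eq. (6.113). The other 3 elements of the same
row/column turn out to be just the Cartesian components of the Poynting vector (6.114), divided by c:

$$\Theta^{0j} = \Theta^{j0} = \frac{1}{\mu_0 c}\left(\mathbf{E}\times\mathbf{B}\right)_j = \frac{1}{c}\left(\mathbf{E}\times\mathbf{H}\right)_j = \frac{S_j}{c}, \qquad \text{for } j = 1, 2, 3. \tag{9.226}$$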


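The remaining 9 elements $\Theta^{jj'}$ of the tensor, with j, j' = 1, 2, 3, are usually represented as

$$\Theta^{jj'} = -\tau^{(M)}_{jj'}, \tag{9.227}$$

where $\tau^{(M)}_{jj'}$ is the so-called Maxwell stress tensor:

$$\tau^{(M)}_{jj'} = \varepsilon_0\left(E_jE_{j'} - \frac{1}{2}\delta_{jj'}E^2\right) + \frac{1}{\mu_0}\left(B_jB_{j'} - \frac{1}{2}\delta_{jj'}B^2\right), \tag{9.228}$$

so that the whole symmetric energy-momentum tensor (221) may be conveniently represented in the
following symbolic way:

$$\Theta^{\alpha\beta} = \begin{pmatrix} u & S_1/c & S_2/c & S_3/c \\ S_1/c & & & \\ S_2/c & & -\tau^{(M)}_{jj'} & \\ S_3/c & & & \end{pmatrix}. \tag{9.229}$$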


The physical meaning of this tensor may be revealed in the following way. Considering Eq.
(221) as the definition of the tensor $\Theta^{\alpha\beta}$,74 and using the 4-vector form of the Maxwell equations given by
Eqs. (127) and (129), it is straightforward to verify an extremely simple result for the 4-derivative of the
symmetric tensor:


$$\partial_\alpha\Theta^{\alpha\beta} = -F^{\beta\gamma}j_\gamma. \tag{9.230}$$


This expression is valid in the presence of electromagnetic field sources, e.g., for any system of charged
particles and the fields they have created. Of these four equations (for four values of the index β), the
temporal one (with β = 0) may be simply expressed via the energy density (225) and the Poynting vector
(226):

$$\frac{\partial u}{\partial t} + \nabla\cdot\mathbf{S} = -\mathbf{j}\cdot\mathbf{E}, \tag{9.231}$$


while the three spatial equations (with β = j = 1, 2, 3) may be represented in the form


74 In this way, we are using Eq. (219) just as a useful guess, which has led us to the definition of $\Theta^{\alpha\beta}$, and may
leave its strict justification for more in-depth field theory courses.
$$\frac{\partial}{\partial t}\frac{S_j}{c^2} - \sum_{j'=1}^{3}\frac{\partial \tau^{(M)}_{jj'}}{\partial r_{j'}} = -\left(\rho\mathbf{E} + \mathbf{j}\times\mathbf{B}\right)_j. \tag{9.232}$$


If integrated over a volume V limited by surface S, with the account of the divergence theorem, 
Eq. (231) returns us to the Poynting theorem (6.111): 


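$$\int_V\left(\frac{\partial u}{\partial t} + \mathbf{j}\cdot\mathbf{E}\right)d^3r + \oint_S S_n\,d^2r = 0, \tag{9.233}$$

while Eq. (232) yields75

$$\int_V\left(\frac{\partial}{\partial t}\frac{S_j}{c^2} + f_j\right)d^3r = \sum_{j'=1}^{3}\oint_S \tau^{(M)}_{jj'}\,dA_{j'}, \qquad \text{with } \mathbf{f} = \rho\mathbf{E} + \mathbf{j}\times\mathbf{B}, \tag{9.234}$$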


where $dA_j = n_j dA = n_j d^2r$ is the j-th component of the elementary area vector $d\mathbf{A} = \mathbf{n}\,dA = \mathbf{n}\,d^2r$ that is
normal to the volume's surface and directed out of the volume; see Fig. 17.76

Fig. 9.17. The force $dF_j$ exerted on a boundary element dA of the volume V occupied by the field.


Since, according to Eq. (5.10), the vector f in Eq. (234) is nothing other than the density of
volume-distributed Lorentz forces exerted by the field on the charged particles, we can use the 2nd
Newton law, in its relativistic form (144), to rewrite Eq. (234), for a stationary volume V, as

$$\frac{d}{dt}\left(\int_V \frac{\mathbf{S}}{c^2}\,d^3r + \mathbf{p}_{\text{part}}\right) = \mathbf{F}, \tag{9.235}$$

where $\mathbf{p}_{\text{part}}$ is the total mechanical (relativistic) momentum of all particles in the volume V, and the
vector F is defined by its Cartesian components:

$$F_j = \sum_{j'=1}^{3}\oint_S \tau^{(M)}_{jj'}\,dA_{j'}. \tag{9.236}$$


Relations (235)-(236) are our main new results. The first of them shows that the vector 


75 Just like the Poynting theorem (233), Eq. (234) may be obtained directly from the Maxwell equations, without
resorting to the 4-vector formalism; see, e.g., Sec. 8.2.2 in D. Griffiths, Introduction to Electrodynamics, 3rd ed.,
Prentice-Hall, 1999. However, the derivation discussed above is superior because it shows the wonderful unity
between the laws of conservation of energy and momentum.

76 The same notions are used in the mechanical stress theory — see, e.g., CM Sec. 7.2. 
$$\mathbf{g} = \frac{\mathbf{S}}{c^2}, \tag{9.237}$$

already discussed in Sec. 6.8 without derivation, may indeed be interpreted as the density of momentum
of the electromagnetic field (per unit volume). This classical relation is consistent with the quantum-mechanical
picture of photons as ultra-relativistic particles with a momentum of magnitude ℰ/c, because
then the flux of the momentum carried by photons through a unit normal area per unit time may be
represented either as $S_n/c$ or as $g_n c$. It also allows us to revisit the Poynting vector paradox that was
discussed in Sec. 6.8; see Fig. 6.11 and its discussion. As was emphasized in that discussion, in this
case the vector S = E×H does not correspond to any measurable energy flow. However, the
corresponding momentum of the field, equal to the integral of the density (237) over a volume of
interest,77 is not only real but may be measured by the recoil impulse it gives to the field sources, say,
to a magnetic coil inducing the field H, or to the capacitor plates creating the field E.


Now let us turn to our second result, Eq. (236). It tells us that the 3×3-element Maxwell stress
tensor complies with the general definition of the stress tensor78 characterizing the forces exerted on the
boundaries of a volume, in our current case the volume occupied by the electromagnetic field (Fig. 17).
Let us use this important result to analyze two simple examples of static fields.


(i) Electrostatic field's effect on a perfect conductor. Since Eq. (235) has been derived for a free-space
region, we have to select the volume V outside the conductor, but we may align one of its faces with
the conductor's surface (Fig. 18).

Fig. 9.18. The electrostatic field near a conductor's surface.


From Chapter 2, we know that the electrostatic field just outside the conductor's surface has to
be normal to it. Selecting the z-axis in this direction, we have $E_x = E_y = 0$, $E_z = \pm E$, so that only the diagonal
elements of the tensor (228) are different from zero:

$$\tau^{(M)}_{xx} = \tau^{(M)}_{yy} = -\frac{\varepsilon_0 E^2}{2}, \qquad \tau^{(M)}_{zz} = \frac{\varepsilon_0 E^2}{2}. \tag{9.238}$$

Since the elementary surface area vector has just one non-zero component, $dA_z$, according to Eq. (236)
only the last component (which is positive regardless of the sign of E) contributes to the surface
force F. We see that the force exerted by the conductor (and eventually by the external forces that hold
the conductor in its equilibrium position) on the field is normal to the conductor and directed out of the
field volume: $dF_z > 0$. Hence, by the 3rd Newton law, the force exerted by the field on the conductor's
surface is directed toward the field-filled space:


77 It is sometimes called hidden momentum.
78 See, e.g., CM Sec. 7.2. 
$$\frac{dF_{\text{surface}}}{dA} = -\frac{\varepsilon_0 E^2}{2}. \tag{9.239}$$


This important result could be obtained by simpler means as well. (Actually, this was the task of
one of the exercise problems assigned in Chapter 2.) For example, one could argue, quite convincingly,
that the local relation between the force and the field should not depend on the global configuration
creating the field, and thus consider the simplest configuration, a planar capacitor (see, e.g., Fig. 2.3)
with the surfaces of its plates charged by equal and opposite charges of density σ = ±ε0E. According to
the Coulomb law, the charges should attract each other, pulling each plate toward the field region, so
that the Maxwell-tensor result gives the correct direction of the force. The force's magnitude given
by Eq. (239) may be verified either by the direct integration of the Coulomb law or by the following
simple reasoning. In the plane capacitor, the inner field $E_z = \sigma/\varepsilon_0$ is equally contributed by the two surface
charges; hence the field created by the negative charge of the counterpart plate (not shown in Fig. 18) is
$E_- = -\sigma/2\varepsilon_0$, and the force it exerts on the elementary surface charge dQ = σdA of the positively charged
plate is $dF_{\text{surface}} = E_-dQ = -\sigma^2dA/2\varepsilon_0 = -\varepsilon_0E^2dA/2$, in accordance with Eq. (239).79


Quantitatively, even for such a high electric field as E = 10⁵ V/m (close to the electric breakdown
threshold of air at a frequency of 10 GHz80), the “negative pressure” dF/dA given by
Eq. (239) is of the order of 0.05 Pa (N/m²), i.e. many orders of magnitude below the ambient atmospheric pressure of
1 bar ≈ 10⁵ Pa. Still, this negative pressure may be substantial (well above 1 bar) in some cases, for
example in good dielectrics (such as the high-quality SiO2 grown at high temperature, which is broadly
used in integrated circuits), which can withstand electric fields up to ~10⁹ V/m.
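The two estimates above follow from Eq. (239) by simple arithmetic; a minimal sketch is shown below, assuming SI units, the CODATA value of ε0, and 1 bar taken as exactly 10⁵ Pa. It deliberately keeps ε0, as Eq. (239) is written for free space; per footnote 79, inside a linear dielectric ε0 would be replaced by ε, which would only increase the result. The function name is hypothetical.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
BAR = 1.0e5              # 1 bar, Pa


def electric_field_pressure(E):
    """Magnitude of the 'negative pressure' eps0*E**2/2 of Eq. (239), in Pa."""
    return EPS0 * E**2 / 2


p_air = electric_field_pressure(1.0e5)    # field near the 10 GHz air breakdown
p_sio2 = electric_field_pressure(1.0e9)   # field near the SiO2 breakdown
print(f"E = 1e5 V/m: {p_air:.3f} Pa = {p_air / BAR:.1e} bar")
print(f"E = 1e9 V/m: {p_sio2:.2e} Pa = {p_sio2 / BAR:.0f} bar")
```

Because the pressure scales as E², the four-decade difference in field strength translates into an eight-decade difference in pressure.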


(ii) Static magnetic field's effect on its source81, say, a solenoid's wall or a superconductor's
surface (Fig. 19). With the Cartesian coordinates' choice shown in that figure, we have $B_x = B$, $B_y = B_z = 0$,
so that the Maxwell stress tensor (228) is diagonal again:

$$\tau^{(M)}_{xx} = -\tau^{(M)}_{yy} = -\tau^{(M)}_{zz} = \frac{B^2}{2\mu_0}. \tag{9.240}$$
However, since for this geometry only $dA_z$ differs from 0 in Eq. (236), the sign of the resulting force is
opposite to that in electrostatics: $dF_z < 0$, and the force exerted by the magnetic field upon the
conductor's surface,

$$\frac{dF_{\text{surface}}}{dA} = \frac{B^2}{2\mu_0}, \tag{9.241}$$


79 By the way, repeating these arguments for a plane capacitor filled with a linear dielectric, we may
readily see that Eq. (239) may be generalized for this case by replacing ε0 with ε. A similar replacement
(μ0 → μ) is valid for Eq. (241) in a linear magnetic medium.

80 Note that the breakdown field is a strong function of frequency. In the ambient air, it drops from its dc
value of ~3×10⁶ V/m to ~1.5×10⁵ V/m at microwave frequencies, and then rises to as much as ~6×10⁸ V/m at
optical frequencies. The reason for the rise is that at very high frequencies, the amplitude of the field-induced
oscillations of the rare free electrons becomes much smaller than their mean free path, inhibiting the bulk
impact ionization of neutral atoms. (For the same reason, the breakdown field also depends on the air's pressure.)

81 The causal relation is not important here. Especially in the case of a superconductor, the magnetic field may be 
induced by another source, with the surface supercurrent j just shielding the superconductor’s bulk from its 
penetration — see Sec. 6. 
corresponds to positive pressure. For good laboratory magnets (B ~ 10 T), this pressure is of the order of
4×10⁷ Pa ≈ 400 bar, i.e. it is very substantial, so such magnets require a solid mechanical design.
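The 400-bar figure can be reproduced the same way from Eq. (241). The sketch below assumes SI units and uses the classical value μ0 = 4π×10⁻⁷ H/m, which differs from the current CODATA value by less than one part in 10⁹; the function name is hypothetical.

```python
import math

MU0 = 4e-7 * math.pi   # vacuum permeability (classical value), H/m
BAR = 1.0e5            # 1 bar, Pa


def magnetic_field_pressure(B):
    """Positive pressure B**2 / (2*mu0) of Eq. (241), in Pa."""
    return B**2 / (2 * MU0)


p = magnetic_field_pressure(10.0)   # a good laboratory magnet, B ~ 10 T
print(f"B = 10 T: {p:.2e} Pa = {p / BAR:.0f} bar")
```

The same B² scaling means that a 1 T field exerts only about 4 bar, one hundred times less.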


Fig. 9.19. The magnetostatic field near a current-carrying surface.


The direction of the force (241) could also be readily predicted using elementary magnetostatics
arguments. Indeed, we can imagine the magnetic field volume limited by another, parallel wall with the
opposite direction of the surface current. According to the starting point of magnetostatics, Eq. (5.1), such
surface currents of opposite directions have to repel each other, doing that via the magnetic field.


Another explanation of the fundamental sign difference between the electric and magnetic field
pressures may be provided using the electric circuit language. As we know from Chapter 2, the potential
energy of the electric field stored in a capacitor may be represented in two equivalent forms,

$$U_e = \frac{CV^2}{2} = \frac{Q^2}{2C}. \tag{9.242}$$

Similarly, the magnetic field energy of an inductive coil is

$$U_m = \frac{LI^2}{2} = \frac{\Phi^2}{2L}. \tag{9.243}$$

If we do not want to consider the work of external sources at a virtual change of the system's dimensions,
we should use the last forms of these relations, i.e. consider a galvanically detached capacitor (Q =
const) and an externally shorted inductance (Φ = const).82 Now if we let the electric field forces (239)
drag the capacitor's plates in the direction they “want”, i.e. toward each other, this would lead to a
reduction of the capacitor's thickness, hence to an increase of its capacitance C, and hence to a
decrease of $U_e$. Similarly, for a solenoid, allowing the positive pressure (241) to move its walls away from
each other would lead to an increase of the solenoid's volume, and hence of its inductance L, so that the
potential energy $U_m$ would also be reduced, as it should be. It is remarkable (actually, beautiful!) how
the local field formulas (239) and (241) “know” about these global circumstances.


Finally, let us see whether the major results (237) and (241) obtained in this section match each
other. For that, let us return to the normal incidence of a plane, monochromatic wave from free space
upon the plane surface of a perfect conductor (see, e.g., Fig. 7.8 and its discussion), and use those results
to calculate the time average of the pressure $dF_{\text{surface}}/dA$ imposed by the wave on the surface. At elastic
reflection from the conductor's surface, the electromagnetic field's momentum retains its magnitude but
reverses its sign, so that the average momentum transferred to a unit area of the surface in a unit time
(i.e. the average pressure) is


82 Of course, this condition may hold “forever” only for solenoids with superconducting wiring, but even in 
normal-metal solenoids with practicable inductances, the flux relaxation constants L/R may be rather large 
(practically, up to a few minutes), quite sufficient to carry out the force measurement. 
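$$\overline{\frac{dF_{\text{surface}}}{dA}} = 2c\,\bar{g}_{\text{incident}} = 2c\,\frac{\bar{S}_{\text{incident}}}{c^2} = \frac{2}{c}\,\frac{E_\omega H^*_\omega}{2} = \frac{E_\omega H^*_\omega}{c}, \tag{9.244}$$

where $E_\omega$ and $H_\omega$ are the complex amplitudes of the incident wave. Using the relation (7.7) between these
amplitudes (for ε = ε0 and μ = μ0, giving $E_\omega = cB_\omega$), we get

$$\overline{\frac{dF_{\text{surface}}}{dA}} = \frac{cB_\omega B^*_\omega}{c\,\mu_0} = \frac{|B_\omega|^2}{\mu_0}. \tag{9.245}$$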


On the other hand, as was discussed in Sec. 7.3, at the surface of a perfect mirror the electric
field vanishes while the magnetic field doubles, so that we can use Eq. (241) with $B \to B(t) = 2\,\text{Re}[B_\omega \exp\{-i\omega t\}]$.
Averaging the pressure given by Eq. (241) over time, we get

$$\overline{\frac{dF_{\text{surface}}}{dA}} = \frac{1}{2\mu_0}\overline{\left(2\,\text{Re}\left[B_\omega e^{-i\omega t}\right]\right)^2} = \frac{|B_\omega|^2}{\mu_0}, \tag{9.246}$$

i.e. the same result as Eq. (245).
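The agreement between Eqs. (245) and (246) can also be checked numerically for an arbitrary complex amplitude $B_\omega$. The sketch below evaluates the momentum-flux result $E_\omega H^*_\omega/c$ with $E_\omega = cB_\omega$ and $H_\omega = B_\omega/\mu_0$, and separately averages $B(t)^2/2\mu_0$ over one period for the doubled surface field. The sample amplitude, the number of time samples, and the function names are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

EPS0 = 8.8541878128e-12        # vacuum permittivity, F/m
MU0 = 1.25663706212e-6         # vacuum permeability, H/m
C = 1.0 / np.sqrt(EPS0 * MU0)  # speed of light, m/s


def pressure_from_momentum(B_w):
    """Eqs. (244)-(245): E_w H_w^* / c for the incident wave."""
    E_w = C * B_w              # plane wave in free space, Eq. (7.7)
    H_w = B_w / MU0
    return (E_w * np.conj(H_w)).real / C


def pressure_from_surface_field(B_w, n_samples=1000):
    """Eq. (246): time average of B(t)**2 / (2 mu0) at the mirror surface."""
    omega_t = 2 * np.pi * np.arange(n_samples) / n_samples  # one full period
    B_t = 2 * np.real(B_w * np.exp(-1j * omega_t))           # doubled field
    return np.mean(B_t**2) / (2 * MU0)


B_w = 1.0e-6 * (0.3 + 0.4j)    # arbitrary complex amplitude, T
print(pressure_from_momentum(B_w), pressure_from_surface_field(B_w), abs(B_w)**2 / MU0)
```

Uniform sampling over exactly one period makes the discrete average of the oscillating cos 2ωt part vanish, so the two routes agree to rounding error.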


For the development of physical intuition, it is useful to evaluate the electromagnetic radiation pressure.
Even for a relatively high wave intensity $\bar{S}$ of 1 kW/m² (close to that of the direct sunlight at the Earth's
surface), the pressure $2c\bar{g} = 2\bar{S}/c$ is somewhat below 10⁻⁵ Pa ~ 10⁻¹⁰ bar. Still, this extremely small
effect was experimentally observed (by P. Lebedev) as early as 1899, giving one more confirmation of
Maxwell's theory. Currently, there are ongoing attempts to use the pressure of the Sun's light for
propelling small spacecraft, e.g., the LightSail 2 satellite with a 32-m² sail, launched in 2019.
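The number quoted above is a one-line computation. A sketch, assuming normal incidence on a perfect mirror (so that the average pressure is $2\bar{S}/c$, as in Eq. (244)) and the 1 kW/m² intensity used in the text; the function name is hypothetical.

```python
C = 299_792_458.0   # speed of light, m/s (exact by definition)
BAR = 1.0e5         # 1 bar, Pa


def mirror_radiation_pressure(S_avg):
    """Time-averaged pressure 2*S/c of a normally incident wave on a perfect mirror, Pa."""
    return 2.0 * S_avg / C


p = mirror_radiation_pressure(1.0e3)   # S = 1 kW/m^2, roughly direct sunlight
print(f"{p:.2e} Pa = {p / BAR:.1e} bar")
```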


9.9. Exercise problems 


9.1. Use the pre-relativistic picture of the Doppler effect, in which light propagates with velocity 
c in a Sun-bound aether, to derive Eq. (4). 


9.2. Show that two successive Lorentz space/time transforms in the same direction, with 
velocities u’ and v, are equivalent to a single transform with the velocity u given by Eq. (25). 


9.3. N + 1 reference frames, numbered by index n (taking values 0, 1, ..., N), move in the same
direction as a particle. Express the particle's velocity in the frame number 0 via its velocity $u_N$ in the
frame number N and the set of velocities $v_n$ of the frame number n relative to the frame number (n − 1).


9.4. A spaceship moving with a constant velocity v directly away from the Earth sends back brief
flashes of light with a period Δt, as measured by the spaceship's clock. Calculate the period with which
Earth-based observers may receive these signals, as measured by their clock.


9.5. From the point of view of observers in a “moving” reference
frame 0', a straight thin rod, parallel to the x'-axis, is moving without rotation
with a constant velocity u' directed along the y'-axis. The reference frame 0' is
itself moving relative to another ("lab") reference frame 0 with a constant
velocity v along the x-axis, also without rotation; see the figure on the right.
Calculate:


(i) the direction of the rod's velocity, and 
(ii) the orientation of the rod on the [x, y] plane, 


both as observed from the lab reference frame. Is the velocity, in this frame, perpendicular to the rod? 


9.6. Starting from rest at t = 0, a spaceship moves directly away from the Earth, with a constant
acceleration as measured in its instantaneous rest frame. Find its displacement x(t) from the Earth, as
measured in the Earth's reference frame, and interpret the result.


Hint: The instantaneous rest frame of a moving particle is the inertial reference frame that, at the 
considered moment of time, has the same velocity as the particle. 


9.7. Analyze the twin paradox for the simplest case of 1D travel with a piecewise-constant 
acceleration. 


Hint: You may use an intermediate result of the solution of the previous problem. 


9.8. Suggest a natural definition of the 4-vector of linear acceleration (commonly called the
4-acceleration) of a point, and use it to calculate the acceleration of a relativistic point moving with
velocity u = u(t).


9.9. Calculate the first relativistic correction to the frequency of a harmonic oscillator as a 
function of its amplitude. 


9.10. An atom, with an initial rest mass m, has been excited to an internal state with an additional
energy Δℰ, while still being at rest. Next, it returns to its initial state, emitting a photon. Calculate the photon's
frequency, taking into account the relativistic recoil of the atom.


Hint: In this problem, and also in Problems 11-13, treat photons as classical ultra-relativistic
point particles with zero rest mass, energy ℰ = ħω, and momentum p = ħk.


9.11. A particle of mass m, initially at rest, decays into two particles with rest masses $m_1$ and $m_2$.
Calculate the total energy of the first decay product, in the reference frame moving with that particle.


9.12. A relativistic particle with a rest mass m, moving with velocity u, decays into two particles 
with zero rest mass. 


(i) Calculate the smallest possible angle between the decay product velocities (in the lab frame,
in which the velocity u is measured).
(ii) What is the largest possible energy of one product particle?


9.13. A relativistic particle flying in free space with velocity u, decays into two photons.83 
Calculate the angular dependence of the probability of photon detection, as measured in the lab frame. 


83 Such a decay may happen, for example, with a neutral pion. 
9.14. A photon with wavelength λ is scattered by an electron, initially at rest. Calculate the
wavelength λ' of the scattered photon as a function of the scattering angle α; see the figure on the
right.84


9.15. Calculate the threshold energy of a γ-photon for the reaction
γ + p → p + π⁰,
if the proton was initially at rest.


Hint: For protons, $m_p c^2$ ≈ 938 MeV, while for neutral pions, $m_\pi c^2$ ≈ 135 MeV.


9.16. Calculate the largest possible velocity of the electrons produced at the so-called β-decays,
n → p + e⁻ + ν̄ₑ,
of neutrons at rest.
Hint: Electron neutrinos and antineutrinos are virtually massless (on the energy scale of this
problem); the rest energies ℰ = mc² of the other involved particles are: 939.565 MeV for the neutron,
938.272 MeV for the proton, 0.511 MeV for the electron.


9.17. A relativistic particle with a rest mass m and energy ℰ collides with a similar particle,
initially at rest in the laboratory reference frame. Calculate:

(i) the final velocity of the center of mass of the system, in the lab frame,

(ii) the total energy of the system, in the center-of-mass frame, and

(iii) the final velocities of both particles (in the lab frame), if they move along the same
direction.


9.18. A “primed” reference frame moves, relative to the “lab” frame, with a reduced velocity β ≡
v/c = $\mathbf{n}_v\beta$. Use Eq. (109) to express the elements $T'^{00}$ and $T'^{0j}$ (with j = 1, 2, 3) of an arbitrary
contravariant 4-tensor $T^{\gamma\delta}$ via its elements in the lab frame.


9.19. Prove that the quantities E² − c²B² and E·B are Lorentz-invariant.


9.20. Static fields E and B are uniform but arbitrary (both in magnitude and in direction). What 
should be the velocity of an inertial reference frame to have the vectors E’ and B’, observed from that 
frame, parallel? Is this solution unique? 


9.21. Two charged particles, moving with equal constant velocities u, are
offset by distance R = {a, b} (see the figure on the right), as measured in the lab
frame. Calculate the forces between the particles, also in the lab frame.


84 This is the famous Compton scattering effect, whose discovery in 1923 was one of the major motivations for 
the development of quantum mechanics — see, e.g., QM Sec. 1.1. 
9.22. Each of two thin, long, parallel particle beams of the same velocity u, separated by distance
d, carries electric charge with a constant density λ per unit length, as measured in the reference frame
moving with the particles.


(i) Calculate the distribution of the electric and magnetic fields in the system (outside the 
beams), as measured in the lab reference frame. 

(ii) Calculate the interaction force between the beams (per particle) and the resulting 
acceleration, both in the lab reference frame and in the frame moving with the particles. Compare the 
results and give a brief discussion of their relation. 


9.23. Spell out the Lorentz transform of the scalar-potential and vector-potential components,
and use the result to calculate the potentials of a point charge q moving with a constant velocity u, as
measured in the lab reference frame.


9.24. Calculate the scalar and vector potentials created by a time-independent electric dipole p,
as measured in a reference frame that moves relative to the dipole with a constant velocity v, with the
shortest distance (“impact parameter”) equal to b.


9.25. Solve the previous problem, in the limit v << c, for a time-independent magnetic dipole m. 


9.26. Assuming that the magnetic monopole does exist and has a magnetic charge g, calculate
the change ΔΦ of the magnetic flux in a superconductor ring due to the passage of a single monopole
through it. Evaluate ΔΦ for the monopole charge conjectured by P. Dirac, $g = ng_0 = n(2\pi\hbar/e)$, where n is
an integer; compare the result with the magnetic flux quantum $\Phi_0$ (6.62) with |q| = e, and discuss their
relationship.

Hint: For simplicity, you may consider the monopole's passage along the symmetry axis of a
thin, round superconducting ring, in otherwise free space.


9.27. Re-derive Eq. (9.161) of the lecture notes for the simplest case p(0) = 0, using the 4-vector
form (9.145) of the equation of motion and the notion of rapidity, $\tanh^{-1}\beta$, that was briefly discussed
in Sec. 9.2.


9.28.* Calculate the trajectory of a relativistic particle in a uniform electrostatic field E, for an
arbitrary direction of its initial velocity u(0), using two different approaches, at least one of them
different from the approach used in Sec. 6 for the case u(0) ⊥ E.

9.29. A charged relativistic particle with velocity u performs planar cyclotron rotation in a
uniform external magnetic field of magnitude B. How much would the velocity and the orbit's radius
change at a slow change of the field to a new magnitude B'?


9.30.* Analyze the motion of a relativistic particle in uniform, mutually perpendicular fields E
and B, for the particular case when E is exactly equal to cB.


9.31.* Find the law of motion of a relativistic particle in uniform, parallel, static fields E and B.
9.32. An external Lorentz force F is exerted on a relativistic particle with an electric charge q
and a rest mass m, moving with velocity u relative to some “lab” frame. Calculate its acceleration as
observed from that frame.


9.33. Neglecting relativistic kinetic effects, calculate the smallest voltage V that has to be applied 
between the anode and cathode of a magnetron (see Fig. 13 and its discussion) to enable electrons to 
reach the anode, at negligible electron-electron interactions (including the space-charge effects) and 
collisions with the residual gas molecules. You may: 


(i) model the cathode and anode as two coaxial round cylinders, of radii $R_1$ and $R_2$, respectively;
(ii) assume that the magnetic field B is uniform and directed along their common axis; and
(iii) neglect the initial velocity of the electrons emitted by the cathode.


(After the solution, estimate the validity of the last assumption and of the non-relativistic approximation, 
for reasonable values of parameters.) 


9.34. A charged relativistic particle has been injected into a uniform electric field whose
magnitude oscillates in time with frequency ω. Calculate the time dependence of the particle's velocity,
as observed from the lab reference frame.


9.35. A plane, linearly-polarized electromagnetic wave of frequency ω is incident on an
otherwise free relativistic particle with electric charge q. Analyze the dynamics of the particle's
momentum and compare the result with those of the previous problem and Problem 7.5.


9.36. Analyze the motion of a non-relativistic particle in a region where the electric and 
magnetic fields are both uniform and constant in time, but not necessarily parallel or perpendicular to 
each other. 


9.37. A static distribution of electric charge in otherwise free space has created a time-independent
distribution E(r) of the electric field. Use two different approaches to express the field
energy density u' and the Poynting vector S', as observed from a reference frame moving with constant
velocity v, via the components of the vector E. In particular, is S' equal to (−vu')?


9.38. A plane wave of frequency ω and intensity $\bar{S}$ is normally incident on a perfect mirror
moving with velocity v in the same direction as the wave.


(i) Calculate the reflected wave’s frequency, and 
(ii) use the Lorentz transform of the fields to calculate the reflected wave’s intensity 


— both as observed from the lab reference frame. 


9.39. Perform the second task of the previous problem by using general relations between the 
wave’s energy, power, and momentum. 


Hint: As a byproduct, this approach should also give you the pressure exerted by the wave on the 
moving mirror. 
9.40. Consider the simple model of a capacitor's charging by a lumped
current source, shown in the figure on the right, and prove that the momentum
given by the constant, uniform external magnetic field B to the current-carrying
conductor is equal and opposite to the momentum of the
electromagnetic field that the current I(t) builds up in the capacitor. (You may
assume that the capacitor is plane and very broad, and neglect the fringe field
effects.)


9.41. Consider an electromagnetic plane wave packet propagating in free space, with its electric
field represented as the Fourier integral

$$\mathbf{E}(\mathbf{r},t) = \text{Re}\int_{-\infty}^{+\infty}\mathbf{E}_k e^{i\psi_k}\,dk, \qquad \text{with } \psi_k \equiv kz - \omega_k t \text{ and } \omega_k = c|k|.$$

Express the full linear momentum (per unit area of the wave's front) of the packet via the complex
amplitudes $\mathbf{E}_k$ of its Fourier components. Does the momentum depend on time? (In contrast with
Problem 7.8, in this case the wave packet is not necessarily narrow.)


9.42. Calculate the forces exerted on well-conducting walls of a waveguide with a rectangular
(a×b) cross-section, by a wave propagating along it in the fundamental ($H_{10}$) mode. Give an
interpretation of the results.
Chapter 10. Radiation by Relativistic Charges


The discussion of special relativity in the previous chapter enables us to revisit the analysis of 
electromagnetic radiation by charged particles, now for arbitrary velocities. For a single point particle, 
it turns out to be possible to calculate the radiated wave fields in an explicit form and analyze the 
results for such important particular cases as synchrotron radiation and the “Bremsstrahlung” (brake 
radiation). After that, we will discuss the apparently unrelated effect of the so-called Coulomb losses of 
energy by a particle moving in condensed matter, because this discussion will naturally lead us to such 
important phenomena as the Cherenkov radiation and the transitional radiation. At the end of the 
chapter, I will briefly review the effects of the back action of the emitted radiation on the emitting 
particle, whose analysis reveals some limitations of classical electrodynamics. 


10.1. Liénard-Wiechert potentials 
A convenient starting point for the discussion of radiation by relativistic charges is provided by Eqs. (8.17) for the retarded potentials. In free space, these formulas, with the integration variable renamed from r' to r'' for the clarity of what follows, reduce to


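$$\phi(\mathbf{r},t) = \frac{1}{4\pi\varepsilon_0}\int\rho\!\left(\mathbf{r}'', t-\frac{R}{c}\right)\frac{d^3r''}{R}, \qquad \mathbf{A}(\mathbf{r},t) = \frac{\mu_0}{4\pi}\int\mathbf{j}\!\left(\mathbf{r}'', t-\frac{R}{c}\right)\frac{d^3r''}{R}, \qquad \text{with } \mathbf{R} \equiv \mathbf{r}-\mathbf{r}''. \tag{10.1a}$$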


As a reminder, Eqs. (1a) were derived from the Maxwell equations without any restrictions, and are very 
natural for situations with continuous distributions of the electric charge and/or current. However, for a 
single charged particle, whose charge and current distributions may be described as 


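$$\rho(\mathbf{r},t) = q\,\delta(\mathbf{r}-\mathbf{r}'), \qquad \mathbf{j}(\mathbf{r},t) = q\mathbf{u}\,\delta(\mathbf{r}-\mathbf{r}'), \qquad \text{with } \mathbf{u} = \dot{\mathbf{r}}', \tag{10.1b}$$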


where r' = r'(t) is the instantaneous position of the charge, it is more convenient to recast Eqs. (1a) into an explicit form that would not require integration in each particular case. Indeed, as Eqs. (1) show, the potentials at a given observation point {r, t} are contributed by only one specific point {r'(t_ret), t_ret} of the particle's 4D trajectory (called its world line), which satisfies the following condition:


$$t_{\rm ret} \equiv t - \frac{R_{\rm ret}}{c}, \tag{10.2}$$


where t_ret is called the retarded time, and R_ret is the length of the following distance vector:

$$\mathbf{R}_{\rm ret} \equiv \mathbf{r}(t) - \mathbf{r}'(t_{\rm ret}), \tag{10.3}$$

which is, physically, the distance covered by the electromagnetic wave from its emission to its observation.
The reduction of Eqs. (1a) to such a simpler form, however, requires some care. Their naive integration over r'' would yield the following apparent but wrong results:


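$$\phi(\mathbf{r},t) = \frac{1}{4\pi\varepsilon_0}\frac{q}{R_{\rm ret}}, \quad \text{i.e.} \quad \frac{\phi(\mathbf{r},t)}{c} = \frac{\mu_0}{4\pi}\frac{qc}{R_{\rm ret}}, \qquad \mathbf{A}(\mathbf{r},t) = \frac{\mu_0}{4\pi}\frac{q\mathbf{u}_{\rm ret}}{R_{\rm ret}}. \qquad \text{WRONG!} \tag{10.4}$$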
where u_ret is the particle's velocity at the retarded point r'(t_ret). Eqs. (4) are a good example of how the relativity theory (even the special one :-) cannot be taken too lightly. Indeed, the strings (9.84)-(9.85), formed from the apparent potentials (4), would not obey the Lorentz transform rule (9.91), because, according to Eqs. (2)-(3), the distance R_ret also depends on the reference frame it is measured in.


In order to correct the error, we need, first of all, to discuss the conditions (2)-(3). Combining them by eliminating R_ret, we get the following equation for t_ret:


$$c\,(t - t_{\rm ret}) = \left|\mathbf{r} - \mathbf{r}'(t_{\rm ret})\right|. \tag{10.5}$$


Figure 1 depicts the graphical solution of this self-consistency equation as the only¹ point of intersection of the light cone of the observation point (see Fig. 9.9 and its discussion) and the particle's world line.


Fig. 10.1. Graphical solution of Eq. (5).
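Beyond the few cases with closed-form answers, Eq. (5) has to be solved numerically. The sketch below is only an illustration (not a method from these notes): it finds t_ret by bisection, assuming the particle speed stays below c, so that the function c(t − t') − |r − r'(t')| strictly decreases with t' and has exactly one root with t' ≤ t, the retarded point of Fig. 1. The `trajectory` callable is a hypothetical user-supplied function.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def retarded_time(r, t, trajectory, tol=1e-18, max_iter=400):
    """Solve Eq. (5), c*(t - t_ret) = |r - r'(t_ret)|, for t_ret <= t by bisection.

    r          -- observation point (x, y, z), metres
    t          -- observation time, seconds
    trajectory -- hypothetical callable t' -> r'(t') = (x, y, z); speed assumed < c
    """
    def g(tp):
        # g(t') = c*(t - t') - |r - r'(t')| decreases strictly when |u| < c
        return C * (t - tp) - math.dist(r, trajectory(tp))

    hi = t                    # g(t) <= 0 always
    step = 1e-9
    lo = t - step
    while g(lo) <= 0.0:       # march back in time until g becomes positive
        step *= 2.0
        lo = t - step
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# usage: charge moving uniformly along x with u = 0.6c, observer at (0, 1 m, 0), t = 0
u = 0.6 * C
t_ret = retarded_time((0.0, 1.0, 0.0), 0.0, lambda tp: (u * tp, 0.0, 0.0))
print(t_ret)  # close to -gamma*b/c = -1.25 m / c, i.e. about -4.17e-9 s
```

For this uniform-motion example, the result can be cross-checked against the closed-form solution derived later in this section for the geometry of Fig. 3.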


In Eq. (5), just as in Eqs. (1)-(3), all variables have to be measured in the inertial ("lab") reference frame in which the observation point r rests. Now let us write Eqs. (1) for a point charge in another inertial frame, the frame 0' whose velocity (as measured in the lab frame) coincides, at the moment t' = t_ret, with the velocity u_ret of the charge.² In that frame, the charge rests, so that, as we know from electro- and magnetostatics,

$$\phi' = \frac{q}{4\pi\varepsilon_0 R'}, \qquad \mathbf{A}' = 0. \tag{10.6a}$$
(Remember that this R' may not be equal to R_ret, because the latter distance is measured in the "lab" reference frame.) Let us use the identity 1/ε₀ = μ₀c² again to rewrite Eqs. (6a) in the form of components of a 4-vector similar in structure to the last two of Eqs. (4):


$$\frac{\phi'}{c} = \frac{\mu_0}{4\pi}\frac{qc}{R'}, \qquad \mathbf{A}' = 0. \tag{10.6b}$$


Now it is easy to guess the correct answer for the 4-potential for an arbitrary reference frame: 


1 As Fig. 1 shows, there is always another, "advanced" point {r'(t_adv), t_adv} of the particle's world line, with t_adv > t, which is also a solution of Eq. (5). However, it does not fit Eqs. (1), because the observation, at the point {r, t < t_adv}, of the field induced at the advanced point would violate the causality principle.

2 This is just a particular case of the instantaneous reference frame, a notion that was encountered in several exercise problems of the previous chapter, and indeed was implied (though admittedly not sufficiently advertised) in the derivation of the key Eq. (9.60).
$$A^\alpha = \frac{\mu_0 qc}{4\pi}\,\frac{u^\alpha}{u_\beta R^\beta}, \tag{10.7}$$


where (as a reminder) A^α = {φ/c, A}, u^α = γ{c, u}, and R^α is the 4-vector of the inter-event distance, formed similarly to that of a single event (cf. Eq. (9.48)):


$$R^\alpha = \{c(t-t'), \mathbf{R}\} \equiv \{c(t-t'), \mathbf{r}-\mathbf{r}'\}. \tag{10.8}$$


Indeed, we needed a 4-vector A^α that would:


(i) obey the Lorentz transform,
(ii) have its spatial components A_j scaling, at low velocity, as u_j, and
(iii) be reduced to the correct result (6) in the instantaneous reference frame of the charge.


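Eq. (7) evidently satisfies all these requirements, because the scalar product in its denominator is just

$$u_\beta R^\beta = \gamma\{c, -\mathbf{u}\}\cdot\{c(t-t'), \mathbf{R}\} = \gamma\left[c^2(t-t') - \mathbf{u}\cdot\mathbf{R}\right] = \gamma c\,(R - \boldsymbol{\beta}\cdot\mathbf{R}) = \gamma cR\,(1-\boldsymbol{\beta}\cdot\mathbf{n}), \tag{10.9}$$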


where n = R/R is a unit vector in the observer's direction, β = u/c is the normalized velocity of the particle, and γ = 1/(1 − u²/c²)^{1/2}. In the instantaneous reference frame of the charge (in which β = 0 and γ = 1), the expression (9) is reduced to cR, so that Eq. (7) is correctly reduced to Eq. (6b). Now let us spell out the components of Eq. (7) for the lab frame (in which t' = t_ret and R = R_ret):


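$$\phi(\mathbf{r},t) = \frac{q}{4\pi\varepsilon_0}\left[\frac{1}{R\,(1-\boldsymbol{\beta}\cdot\mathbf{n})}\right]_{\rm ret}, \tag{10.10a}$$

$$\mathbf{A}(\mathbf{r},t) = \frac{\mu_0 q}{4\pi}\left[\frac{\mathbf{u}}{R\,(1-\boldsymbol{\beta}\cdot\mathbf{n})}\right]_{\rm ret} \equiv \frac{\phi(\mathbf{r},t)}{c^2}\,\mathbf{u}_{\rm ret}. \tag{10.10b}$$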
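The whole point of Eq. (7) is that its denominator u_βR^β is a Lorentz scalar, while the distance R_ret used in the naive Eqs. (4) is not. A minimal numerical sketch of this contrast, in units with c = 1, follows; the emission event, the particle velocity, the observation point, and the boost speed are arbitrary illustrative choices, not values from the text.

```python
import math


def boost_x(four_vector, v):
    """Lorentz boost along x with speed v (units c = 1); input is (t, x, y, z)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    t, x, y, z = four_vector
    return (g * (t - v * x), g * (x - v * t), y, z)


def minkowski_dot(a, b):
    # metric signature (+, -, -, -), as used in Eq. (9)
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def four_velocity(u):
    g = 1.0 / math.sqrt(1.0 - sum(comp * comp for comp in u))
    return (g, g * u[0], g * u[1], g * u[2])


# emission at the origin; the observation point lies on its future light cone
u = (0.3, 0.4, 0.0)                  # particle velocity at t_ret (equal to beta, c = 1)
r_obs = (2.0, 1.0, 0.5)
R = math.sqrt(sum(x * x for x in r_obs))
R4 = (R,) + r_obs                    # R^alpha = {c(t - t'), R}, Eq. (8)
U4 = four_velocity(u)

lab = minkowski_dot(U4, R4)
n = tuple(x / R for x in r_obs)
eq9 = U4[0] * R * (1.0 - sum(b * k for b, k in zip(u, n)))   # gamma c R (1 - beta.n)

boosted = minkowski_dot(boost_x(U4, 0.7), boost_x(R4, 0.7))
R_boosted = boost_x(R4, 0.7)[0]      # c(t - t') as seen in the boosted frame
print(lab, eq9, boosted)             # three equal numbers: u.R is invariant
print(R, R_boosted)                  # different: the distance itself is frame-dependent
```

Because the boosted R^α stays light-like, its time component equals the spatial distance measured in the moving frame, which is why R_boosted differs from R while u_βR^β does not change.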


These formulas are called the Liénard-Wiechert potentials.³ In the non-relativistic limit, they coincide with the naive guess (4), but in the general case they include the additional factor 1/(1 − β·n)_ret. Its physical origin may be illuminated by one more formal calculation, whose result we will need anyway. Let us differentiate the geometric relation (5), rewritten as


$$R_{\rm ret} = c\,(t - t_{\rm ret}), \tag{10.11}$$


over t_ret and then, independently, over t, assuming that r is fixed. For that, let us first differentiate, over t_ret, both sides of the identity R_ret² = R_ret·R_ret:


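$$2R_{\rm ret}\frac{\partial R_{\rm ret}}{\partial t_{\rm ret}} = 2\mathbf{R}_{\rm ret}\cdot\frac{\partial\mathbf{R}_{\rm ret}}{\partial t_{\rm ret}}, \tag{10.12}$$

so that, since according to Eq. (3) $\partial\mathbf{R}_{\rm ret}/\partial t_{\rm ret} = -\mathbf{u}_{\rm ret}$ at fixed r,

$$\frac{\partial R_{\rm ret}}{\partial t_{\rm ret}} = \frac{\mathbf{R}_{\rm ret}}{R_{\rm ret}}\cdot\frac{\partial\mathbf{R}_{\rm ret}}{\partial t_{\rm ret}} = -(\mathbf{n}\cdot\mathbf{u})_{\rm ret}. \tag{10.13}$$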


Now let us differentiate the same R_ret over t. On the one hand, Eq. (11) yields


3 They were derived in 1898 by Alfred-Marie Liénard and (independently) in 1900 by Emil Wiechert. 
$$\frac{\partial R_{\rm ret}}{\partial t} = c\left(1 - \frac{\partial t_{\rm ret}}{\partial t}\right). \tag{10.14}$$


On the other hand, according to Eq. (5), at the partial differentiation over time, i.e. at fixed r, t_ret is a function of t alone, so that (using Eq. (13) at the second step) we may write
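$$\frac{\partial R_{\rm ret}}{\partial t} = \frac{\partial R_{\rm ret}}{\partial t_{\rm ret}}\frac{\partial t_{\rm ret}}{\partial t} = -(\mathbf{n}\cdot\mathbf{u})_{\rm ret}\frac{\partial t_{\rm ret}}{\partial t}. \tag{10.15}$$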


Now requiring Eqs. (14) and (15) to give the same result, we get:⁴


$$\frac{\partial t_{\rm ret}}{\partial t} = \frac{1}{(1-\boldsymbol{\beta}\cdot\mathbf{n})_{\rm ret}}. \tag{10.16}$$
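Eq. (16) is easy to confirm numerically for a non-uniform motion and an observer off the line of motion (so that β·n ≠ β). The sketch below assumes units with c = 1 and a hypothetical charge oscillating as x'(t') = A sin Ωt' with peak speed AΩ = 0.6c. It differentiates t_ret(t) by central finite differences and compares the result with 1/(1 − β·n)_ret.

```python
import math

C = 1.0                   # units with c = 1
AMP, OMEGA = 0.5, 1.2     # hypothetical oscillation x'(t') = AMP*sin(OMEGA*t'); peak speed 0.6c


def position(tp):
    return (AMP * math.sin(OMEGA * tp), 0.0, 0.0)


def velocity(tp):
    return (AMP * OMEGA * math.cos(OMEGA * tp), 0.0, 0.0)


def retarded_time(r, t):
    """Bisection solution of Eq. (5); the root is unique because the speed stays below c."""
    g = lambda tp: C * (t - tp) - math.dist(r, position(tp))
    lo, hi = t - 1.0, t
    while g(lo) <= 0.0:
        lo = t - 3.0 * (t - lo)      # widen the bracket backward in time
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dtret_dt_numeric(r, t, h=1e-5):
    # central finite difference of t_ret(t) at a fixed observation point r
    return (retarded_time(r, t + h) - retarded_time(r, t - h)) / (2.0 * h)


def dtret_dt_eq16(r, t):
    # 1 / (1 - beta.n), both vectors taken at the retarded instant
    tr = retarded_time(r, t)
    R = [a - b for a, b in zip(r, position(tr))]
    Rmag = math.hypot(*R)
    beta_dot_n = sum(v / C * x / Rmag for v, x in zip(velocity(tr), R))
    return 1.0 / (1.0 - beta_dot_n)


r_obs = (3.0, 1.0, 0.0)
for t_obs in (4.0, 5.0, 6.0):
    print(t_obs, dtret_dt_numeric(r_obs, t_obs), dtret_dt_eq16(r_obs, t_obs))
```

The two columns agree to the accuracy of the finite difference; the ratio exceeds 1 when the charge moves toward the observer and is below 1 when it recedes.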


This important relation may be readily re-derived (and more clearly understood) for the particular case when the charge's velocity is directed straight toward the observation point. In this case, its vector u resides in the same space-time plane as the observation point's world line r = const, say, the plane [x, t] shown in Fig. 2.


Fig. 10.2. Deriving Eq. (16) for the case β·n = β.


Let us consider an elementary time interval dt_ret ≡ dt', during which the particle travels the space interval dx_ret = u_ret dt_ret. In Fig. 2, the corresponding segment of its world line is shown with a solid vector. The dotted vectors in this figure show the world lines of the radiation emitted by the particle at the beginning and at the end of this interval, and propagating with the speed of light c. As follows from the drawing, the time interval dt between the instants of the arrival of the radiation from these two points at any time-independent spatial point of observation is


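$$dt = dt_{\rm ret} - \frac{dx_{\rm ret}}{c} = dt_{\rm ret} - \frac{u_{\rm ret}}{c}\,dt_{\rm ret} = (1-\beta_{\rm ret})\,dt_{\rm ret}, \qquad \text{so that} \qquad \frac{dt_{\rm ret}}{dt} = \frac{1}{1-\beta_{\rm ret}}. \tag{10.17}$$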


This expression coincides with Eq. (16) for our particular case, when the directions of the vectors β = u/c and n = R/R (both taken at time t_ret) coincide, so that (β·n)_ret = β_ret. The difference between Eqs. (16) and (17) may be interpreted by saying that the particle's velocity in the transverse directions (normal to the vector n) is not important for this kinematic effect,⁵ a fact almost evident from Fig. 1.


4 This relation may be used for an alternative derivation of Eqs. (10) directly from Eqs. (1), an exercise highly recommended to the reader.

5 Note that this effect (linear in β) has nothing to do with the Lorentz time dilation (9.21), which is quadratic in β. (Indeed, all our arguments above referred to the same lab frame.) Rather, it is close in nature to the Doppler effect.
So, the additional factor in the Liénard-Wiechert potentials is just the derivative ∂t_ret/∂t. The reason for its appearance in Eqs. (10) is usually interpreted along the following lines. Let the charge q be spread along the direction of the vector R_ret (in Fig. 2, along the x-axis) over an infinitesimal speed-independent interval δx_ret, so that the linear density λ of its charge is proportional to 1/δx_ret. Then the time rate of the charge's arrival at some spatial point is λu_ret = λdx_ret/dt_ret, i.e. scales as 1/dt_ret. However, the rate of the radiation's arrival at the observation point scales as 1/dt, so that due to the non-zero velocity u_ret of the particle, this rate differs from the charge arrival rate by the factor dt_ret/dt given by Eq. (16). (If the particle moves toward the observation point, (β·n)_ret > 0, as shown in Fig. 2, this factor is larger than 1.) This radiation compression effect leads to the field change (at (β·n)_ret > 0, its enhancement) by the same factor (16), as described by Eqs. (10).


So, the 4-vector formalism was very instrumental for the calculation of the field potentials. It may also be used to calculate the fields E and B, by plugging Eq. (7) into Eq. (9.124) to calculate the field strength tensor. This calculation yields


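$$F^{\alpha\beta} = \frac{\mu_0 qc}{4\pi}\,\frac{1}{u_\gamma R^\gamma}\,\frac{d}{d\tau}\left[\frac{R^\alpha u^\beta - R^\beta u^\alpha}{u_\gamma R^\gamma}\right]. \tag{10.18}$$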


Now using Eq. (9.125) to identify the elements of this tensor with the field components, we may bring the result to the following vector form:⁶


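$$\mathbf{E}(\mathbf{r},t) = \frac{q}{4\pi\varepsilon_0}\left[\frac{\mathbf{n}-\boldsymbol{\beta}}{\gamma^2(1-\boldsymbol{\beta}\cdot\mathbf{n})^3R^2} + \frac{\mathbf{n}\times\{(\mathbf{n}-\boldsymbol{\beta})\times\dot{\boldsymbol{\beta}}\}}{(1-\boldsymbol{\beta}\cdot\mathbf{n})^3\,cR}\right]_{\rm ret}, \tag{10.19}$$

$$\mathbf{B}(\mathbf{r},t) = \frac{1}{c}\,\mathbf{n}_{\rm ret}\times\mathbf{E}(\mathbf{r},t). \tag{10.20}$$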


Thus the magnetic and electric fields of a relativistic particle are always proportional and perpendicular to each other, and related just as in a plane wave (cf. Eq. (7.6)), with the difference that now the vector n_ret may be a function of time. Superficially, this result contradicts electro- and magnetostatics, because for a particle at rest, B should vanish while E stays finite. However, note that according to the Coulomb law for a point charge, in this case E = En, so that B ∝ n_ret × E ∝ n_ret × n_ret = 0. (Actually, in these relations, the subscript "ret" is unnecessary.)


As a sanity check, let us use Eq. (19) as an alternative way to find the electric field of a charge moving without acceleration, i.e. uniformly, along a straight line; see Fig. 9.11a, reproduced with minor changes in Fig. 3. (This calculation will also illustrate the technical challenges of practical applications of the Liénard-Wiechert formulas even for simple cases.) In this case, the vector β does not change in time, so that the second term in Eq. (19) vanishes, and all we need to do is to spell out the Cartesian components of the first term.


6 An alternative way of deriving these formulas (highly recommended to the reader as an exercise) is to plug Eqs. (10) into the general relations (9.121), and carry out the required temporal and spatial differentiations directly, using Eq. (16) and its spatial counterpart (which may be derived absolutely similarly):

$$\nabla t_{\rm ret} = -\frac{\mathbf{n}_{\rm ret}}{c\,(1-\boldsymbol{\beta}\cdot\mathbf{n})_{\rm ret}}.$$
Fig. 10.3. The linearly moving charge problem.


Let us select the coordinate axes and the time origin as shown in Fig. 3, and make a clear distinction between the actual position r'(t) = {ut, 0, 0} of the charged particle at the instant t we are considering, and its position r'(t_ret) at the retarded instant defined by Eq. (5), i.e. the moment when the particle's field had to be radiated in order to reach the observation point r at the given time t, propagating with the speed of light. In these coordinates,


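$$\boldsymbol{\beta} = \{\beta, 0, 0\}, \qquad \mathbf{r} = \{0, b, 0\}, \qquad \mathbf{r}'(t_{\rm ret}) = \{ut_{\rm ret}, 0, 0\}, \qquad \mathbf{n}_{\rm ret} = \{\cos\theta, \sin\theta, 0\}, \tag{10.21}$$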


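with cos θ = −ut_ret/R_ret, so that [(n − β)_x]_ret = −ut_ret/R_ret − β, and Eq. (19) yields, in particular:

$$E_x = \frac{q}{4\pi\varepsilon_0\gamma^2}\left[\frac{-ut_{\rm ret}/R - \beta}{(1-\boldsymbol{\beta}\cdot\mathbf{n})^3R^2}\right]_{\rm ret} = \frac{q}{4\pi\varepsilon_0\gamma^2}\,\frac{-ut_{\rm ret} - \beta R_{\rm ret}}{\left[(1-\boldsymbol{\beta}\cdot\mathbf{n})R\right]^3_{\rm ret}}. \tag{10.22}$$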


But according to Eq. (5), the product βR_ret may be represented as βc(t − t_ret) ≡ u(t − t_ret). Plugging this expression into Eq. (22), we may eliminate the explicit dependence of E_x on the time t_ret:

$$E_x = \frac{q}{4\pi\varepsilon_0\gamma^2}\,\frac{-ut}{\left[(1-\boldsymbol{\beta}\cdot\mathbf{n})R\right]^3_{\rm ret}}. \tag{10.23}$$

The only non-zero transverse component of the field also has a similar form:

$$E_y = \frac{q}{4\pi\varepsilon_0}\left[\frac{\sin\theta}{\gamma^2(1-\boldsymbol{\beta}\cdot\mathbf{n})^3R^2}\right]_{\rm ret} = \frac{q}{4\pi\varepsilon_0\gamma^2}\,\frac{b}{\left[(1-\boldsymbol{\beta}\cdot\mathbf{n})R\right]^3_{\rm ret}}, \tag{10.24}$$

while E_z = 0. From Fig. 3, (β·n)_ret = β cos θ = −βut_ret/R_ret, so that (1 − β·n)R_ret = R_ret + βut_ret, and we may again use Eq. (5) to get (1 − β·n)R_ret = c(t − t_ret) + βut_ret ≡ ct − ct_ret/γ². What remains is to calculate t_ret from the self-consistency equation (5), whose square in our current case (Fig. 3) takes the form

$$R_{\rm ret}^2 \equiv b^2 + (ut_{\rm ret})^2 = c^2(t - t_{\rm ret})^2. \tag{10.25}$$


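This is a simple quadratic equation for t_ret, which (with the appropriate negative sign before the square root, to get t_ret < t) yields:

$$t_{\rm ret} = \gamma^2\left[t - \left(\beta^2t^2 + \frac{b^2}{\gamma^2c^2}\right)^{1/2}\right] \equiv \gamma^2t - \frac{\gamma}{c}\left(b^2 + \gamma^2u^2t^2\right)^{1/2}, \tag{10.26}$$

so that the only retarded-function combination that participates in Eqs. (23)-(24) is

$$\left[(1-\boldsymbol{\beta}\cdot\mathbf{n})R\right]_{\rm ret} = ct - \frac{ct_{\rm ret}}{\gamma^2} = \frac{\left(\gamma^2u^2t^2 + b^2\right)^{1/2}}{\gamma}, \tag{10.27}$$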
y 


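and, finally, the electric field components are

$$E_x = -\frac{q}{4\pi\varepsilon_0}\,\frac{\gamma ut}{\left(\gamma^2u^2t^2 + b^2\right)^{3/2}}, \qquad E_y = \frac{q}{4\pi\varepsilon_0}\,\frac{\gamma b}{\left(\gamma^2u^2t^2 + b^2\right)^{3/2}}, \qquad E_z = 0. \tag{10.28}$$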


But these are exactly Eqs. (9.139),⁷ which had been obtained in Sec. 9.5 by much simpler means, without the necessity to solve the self-consistency equation (5). However, that alternative approach was essentially based on the inertial motion of the particle, and cannot be used in problems in which it moves with acceleration. In such problems, the second term in Eq. (19), which drops with distance more slowly, as 1/R_ret, and hence describes wave radiation, is frequently the most important one.
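The chain of Eqs. (21)-(28) can be checked numerically: the sketch below evaluates the velocity (Coulomb-like) term of Eq. (19) at the retarded point given by Eq. (26), and compares it with the final Eqs. (28). It assumes units with c = 1 and q/4πε₀ = 1, and uses the geometry of Fig. 3 (observer at {0, b, 0}, charge moving along x).

```python
import math


def t_ret_eq26(t, b, beta, c=1.0):
    """Eq. (26): retarded time for the uniformly moving charge of Fig. 3."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    u = beta * c
    return gamma**2 * t - (gamma / c) * math.sqrt(b**2 + gamma**2 * u**2 * t**2)


def e_field_eq19(t, b, beta, c=1.0):
    """Velocity term of Eq. (19), in units q/(4 pi eps0) = 1; returns (E_x, E_y)."""
    gamma2 = 1.0 / (1.0 - beta**2)
    tr = t_ret_eq26(t, b, beta, c)
    Rx, Ry = -beta * c * tr, b           # R_ret = r - r'(t_ret), with r = {0, b, 0}
    R = math.hypot(Rx, Ry)
    nx, ny = Rx / R, Ry / R
    k = 1.0 - beta * nx                  # (1 - beta.n)_ret, with beta along x
    pref = 1.0 / (gamma2 * k**3 * R**2)
    return pref * (nx - beta), pref * ny


def e_field_eq28(t, b, beta, c=1.0):
    """Eqs. (28), in the same units."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    u = beta * c
    d = (gamma**2 * u**2 * t**2 + b**2) ** 1.5
    return -gamma * u * t / d, gamma * b / d


for t in (-2.0, 0.0, 0.5, 3.0):
    print(t, e_field_eq19(t, 1.0, 0.8), e_field_eq28(t, 1.0, 0.8))
```

At t = 0, for example, both routes give E_x = 0 and E_y = γb/b³, i.e. the familiar γ-fold enhancement of the transverse field of a moving charge.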


10.2. Radiation power 


Let us calculate the angular distribution of the particle's radiation. For that, we need to return to Eqs. (19)-(20) to find the Poynting vector S = E×H, and in particular its radial component S_n = S·n_ret, at large distances R from the particle. Following tradition,⁸ let us express the result as the energy radiated into a unit solid angle per unit time interval dt_ret of the radiation, rather than per unit interval dt of its measurement. (We will need to return to the measurement time t in the next section to calculate the observed radiation spectrum.) Using Eq. (16), we get


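$$\frac{d\mathcal{P}}{d\Omega} \equiv \frac{d\mathcal{E}}{d\Omega\,dt_{\rm ret}} = S_nR^2\,\frac{\partial t}{\partial t_{\rm ret}} = (\mathbf{E}\times\mathbf{H})\cdot\left[R^2\mathbf{n}\,(1-\boldsymbol{\beta}\cdot\mathbf{n})\right]_{\rm ret}. \tag{10.29}$$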


At sufficiently large distances from the particle, i.e. in the limit R_ret → ∞ (in the radiation zone), the first (essentially, the Coulomb-field) term in the square brackets of Eq. (19), which drops as 1/R², gives a vanishing contribution, and the substitution of the remaining term into Eqs. (20) and then into Eq. (29) yields the following formula, which is valid for an arbitrary law of the particle's motion:⁹


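$$\frac{d\mathcal{P}}{d\Omega} = \frac{Z_0q^2}{(4\pi)^2}\,\frac{\left|\mathbf{n}\times\left[(\mathbf{n}-\boldsymbol{\beta})\times\dot{\boldsymbol{\beta}}\right]\right|^2}{(1-\boldsymbol{\beta}\cdot\mathbf{n})^5}. \tag{10.30}$$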


Now let us apply this important result to some simple cases. First of all, Eq. (30) says that a charge moving with a constant velocity β does not radiate at all. This might be expected from our analysis of this case in Sec. 9.5, because in the reference frame moving with the charge it produces only the Coulomb electrostatic field, i.e. no radiation.


Next, let us consider a linear motion of a point charge with a non-zero acceleration directed 
along the straight line of the motion. In this case, with the coordinate axes selected as shown in Fig. 4a, 
each of the vectors involved in Eq. (30) has at most two non-zero Cartesian components: 


7 A similar calculation of magnetic field components from Eq. (20) gives results identical to Eqs. (9.140). 

8 This tradition may be reasonably justified. Indeed, we may say that the radiation field "detaches" from the particle at times close to t_ret, while the observation time t depends on the detector's position, and hence is less relevant for the radiation process as such.

9 If the direction of radiation, n, does not change in time, this formula does not depend on the observer's position R. Hence, from this point on, the index "ret" may be safely dropped for brevity, though we should always remember that β in Eq. (30) is the reduced velocity of the particle at the instant of the radiation's emission, not of its observation.
$$\mathbf{n} = \{\sin\theta, 0, \cos\theta\}, \qquad \boldsymbol{\beta} = \{0, 0, \beta\}, \qquad \dot{\boldsymbol{\beta}} = \{0, 0, \dot\beta\}, \tag{10.31}$$


where θ is the angle between the directions of the particle's motion and of the radiation's propagation. Plugging these expressions into Eq. (30) and performing the vector multiplications, we readily get


$$\frac{d\mathcal{P}}{d\Omega} = \frac{Z_0q^2}{(4\pi)^2}\,\dot\beta^2\,\frac{\sin^2\theta}{(1-\beta\cos\theta)^5}. \tag{10.32}$$

Figure 4b shows the angular distribution of such radiation, for three values of the particle's speed u.

Fig. 10.4. Particle's radiation at its linear acceleration: (a) the problem's geometry, and (b) the last fraction of Eq. (32) as a function of the angle θ.


If the speed is relatively low (u << c, i.e. β << 1), the denominator in Eq. (32) is very close to 1 for all observation angles θ, so that the angular distribution of the radiation power is close to sin²θ, just as follows from the general non-relativistic Larmor formula (8.26), for our current case with Θ = θ. However, as the velocity is increased, the denominator becomes less than 1 for θ < π/2, i.e. for the forward-looking directions, and larger than 1 for the backward directions. As a result, the radiation in the direction of the particle's motion is increased (somewhat counter-intuitively, regardless of the acceleration's sign!), while that in the backward direction is suppressed. For ultra-relativistic particles (γ >> 1), this trend is strongly exacerbated, and radiation to very small forward angles dominates. To describe this main part of the angular distribution, we may expand the trigonometric functions of θ participating in Eq. (32) in Taylor series in small θ, and keep only their leading terms: sin θ ≈ θ, cos θ ≈ 1 − θ²/2, so that (1 − β cos θ) ≈ (1 + γ²θ²)/2γ². The resulting expression,


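$$\frac{d\mathcal{P}}{d\Omega} \approx \frac{2Z_0q^2}{\pi^2}\,\dot\beta^2\gamma^8\,\frac{(\gamma\theta)^2}{(1+\gamma^2\theta^2)^5}, \qquad \text{for } \gamma \gg 1, \tag{10.33}$$

describes a narrow "hollow cone" distribution of radiation, with its maximum at the angle

$$\theta_{\max} \approx \frac{1}{2\gamma}. \tag{10.34}$$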


Another important aspect of Eq. (33) is how extremely fast (as γ⁸) the radiation density grows with the Lorentz factor γ, i.e. with the particle's energy ℰ = γmc².
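The crossover from the sin²θ pattern to the narrow cone (34) can be traced with a short sketch (an illustration under the stated assumptions, not part of the original derivation). Setting the θ-derivative of the last fraction of Eq. (32) to zero gives a quadratic equation, 3βx² + 2x − 5β = 0 for x = cos θ_max; the code checks this against a brute-force grid search and against the asymptote 1/(2γ).

```python
import math


def angular_factor(theta, beta):
    """Last fraction of Eq. (32): sin^2(theta) / (1 - beta*cos(theta))^5."""
    return math.sin(theta) ** 2 / (1.0 - beta * math.cos(theta)) ** 5


def theta_max_grid(beta, n=200_000):
    """Brute-force maximum of Eq. (32) on a uniform grid 0 < theta < pi."""
    k_best = max(range(1, n), key=lambda k: angular_factor(math.pi * k / n, beta))
    return math.pi * k_best / n


def theta_max_exact(beta):
    """Root of 3*beta*x^2 + 2*x - 5*beta = 0, x = cos(theta), i.e. d(Eq. 32)/d(theta) = 0."""
    x = (math.sqrt(1.0 + 15.0 * beta**2) - 1.0) / (3.0 * beta)
    return math.acos(x)


for gamma in (1.01, 2.0, 10.0, 100.0):
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    print(gamma, theta_max_exact(beta), 1.0 / (2.0 * gamma))
```

At γ ≈ 1 the maximum sits near θ = π/2 (the non-relativistic sin²θ pattern), while at γ = 100 the product 2γθ_max already agrees with 1 to better than 10⁻³.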


Still, the total radiated power 𝒫 (into all observation angles) at linear acceleration is not too high for any practicable values of parameters. To show this, let us first calculate 𝒫 for an arbitrary motion of the particle. To start, let me demonstrate how 𝒫 may be found (or rather guessed) from general relativistic arguments. In Sec. 8.2, we have derived Eq. (8.27) for the power of the electric dipole radiation for a non-relativistic particle motion. That result is valid, in particular, for one charged particle,
the particle’s linear mechanical momentum (not its electric dipole moment). As the result, the Larmor 
formula (8.27) in free space, 1.e. with v = c (but u << c) reduces to 


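$$\mathcal{P} = \frac{Z_0}{6\pi c^2}\left(\frac{q}{m}\frac{d\mathbf{p}}{dt}\right)^2 \equiv \frac{Z_0q^2}{6\pi m^2c^2}\left(\frac{d\mathbf{p}}{dt}\cdot\frac{d\mathbf{p}}{dt}\right), \qquad \text{for } u \ll c. \tag{10.35}$$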


This is evidently not a Lorentz-invariant result, but it gives a clear hint of how such an invariant, which would be reduced to Eq. (35) in the non-relativistic limit, may be formed:


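$$\mathcal{P} = -\frac{Z_0q^2}{6\pi m^2c^2}\,\frac{dp_\alpha}{d\tau}\frac{dp^\alpha}{d\tau} \equiv \frac{Z_0q^2}{6\pi m^2c^2}\left[\left(\frac{d\mathbf{p}}{d\tau}\right)^2 - \frac{1}{c^2}\left(\frac{d\mathcal{E}}{d\tau}\right)^2\right]. \tag{10.36}$$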


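Using the relativistic expressions p = γmcβ, ℰ = γmc², and dτ = dt/γ, the last formula may be recast into the so-called Liénard extension of the Larmor formula:¹⁰

$$\mathcal{P} = \frac{Z_0q^2}{6\pi}\,\gamma^6\left[(\dot{\boldsymbol{\beta}})^2 - (\boldsymbol{\beta}\times\dot{\boldsymbol{\beta}})^2\right] \equiv \frac{Z_0q^2}{6\pi}\left[\gamma^4(\dot{\boldsymbol{\beta}})^2 + \gamma^6(\boldsymbol{\beta}\cdot\dot{\boldsymbol{\beta}})^2\right]. \tag{10.37}$$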


It may also be obtained by direct integration of Eq. (30) over the full solid angle, thus confirming our guess.
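The claim that integrating Eq. (30) over all directions reproduces Eq. (37) can be tested numerically. Below is a minimal sketch using a midpoint rule in cos θ and φ (dΩ = d(cos θ) dφ); dividing Eq. (37) by the prefactor Z₀q²/(4π)² of Eq. (30) shows that the angular integral should equal (8π/3)γ⁶[β̇² − (β×β̇)²]. The test vectors are arbitrary choices, and the grid size is an assumption tuned only for moderate speeds.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def solid_angle_integral(beta, beta_dot, n_x=200, n_phi=200):
    """Midpoint-rule integral over directions n of the angular part of Eq. (30):
    |n x ((n - beta) x beta_dot)|^2 / (1 - beta.n)^5, with dOmega = d(cos theta) d(phi)."""
    total = 0.0
    for i in range(n_x):
        x = -1.0 + (i + 0.5) * 2.0 / n_x          # x = cos(theta)
        s = math.sqrt(1.0 - x * x)                # sin(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * 2.0 * math.pi / n_phi
            n = (s * math.cos(phi), s * math.sin(phi), x)
            n_minus_beta = (n[0] - beta[0], n[1] - beta[1], n[2] - beta[2])
            v = cross(n, cross(n_minus_beta, beta_dot))
            total += dot(v, v) / (1.0 - dot(beta, n)) ** 5
    return total * (2.0 / n_x) * (2.0 * math.pi / n_phi)


def lienard_prediction(beta, beta_dot):
    """Eq. (37) divided by the prefactor Z0 q^2 / (4 pi)^2 of Eq. (30)."""
    gamma2 = 1.0 / (1.0 - dot(beta, beta))
    b_x_bd = cross(beta, beta_dot)
    return (8.0 * math.pi / 3.0) * gamma2**3 * (dot(beta_dot, beta_dot) - dot(b_x_bd, b_x_bd))


beta = (0.0, 0.0, 0.6)
print(solid_angle_integral(beta, (0.0, 0.0, 1.0)), lienard_prediction(beta, (0.0, 0.0, 1.0)))  # linear
print(solid_angle_integral(beta, (1.0, 0.0, 0.0)), lienard_prediction(beta, (1.0, 0.0, 0.0)))  # circular
```

The two printed cases anticipate the comparison between linear and circular acceleration: for the same |β̇|, the second is smaller by the factor 1 − β² = 1/γ², in line with the γ⁶ and γ⁴ terms of the second form of Eq. (37).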


However, for some applications, it is beneficial to express 𝒫 via the time evolution of the particle's momentum alone. For that, we may differentiate the fundamental relativistic relation (9.78), ℰ² = (mc²)² + (pc)², over the proper time τ to get


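$$2\mathcal{E}\,\frac{d\mathcal{E}}{d\tau} = 2c^2p\,\frac{dp}{d\tau}, \qquad \text{i.e.} \qquad \frac{d\mathcal{E}}{d\tau} = \frac{c^2p}{\mathcal{E}}\frac{dp}{d\tau} \equiv u\,\frac{dp}{d\tau}, \tag{10.38}$$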


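where the last step used the relativistic relation c²p/ℰ = u mentioned in Sec. 9.3. Plugging Eq. (38) into Eq. (36), we may rewrite it as

$$\mathcal{P} = \frac{Z_0q^2}{6\pi m^2c^2}\left[\left(\frac{d\mathbf{p}}{d\tau}\right)^2 - \beta^2\left(\frac{dp}{d\tau}\right)^2\right]. \tag{10.39}$$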


Please note the difference between the squared derivatives in this expression: in the first of them we 
have to differentiate the momentum’s vector p first, and only then form a scalar by squaring the 
resulting vector derivative, while in the second case, only the magnitude of the vector has to be 
differentiated. For example, for circular motion with a constant speed (to be analyzed in detail in the 
next section), the second term vanishes, while the first one does not. 


However, if we return to the simplest case of linear acceleration (Fig. 4), then $(d\mathbf{p}/d\tau)^2 = (dp/d\tau)^2$, and Eq. (39) is reduced to


10 The second form of Eq. (10.37), which is frequently more convenient for applications, may be readily obtained 
from the first one by applying MA Eq. (7.7a) to the vector product. 
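$$\mathcal{P} = \frac{Z_0q^2}{6\pi m^2c^2}\left(\frac{dp}{d\tau}\right)^2(1-\beta^2) \equiv \frac{Z_0q^2}{6\pi m^2c^2}\,\frac{1}{\gamma^2}\left(\frac{dp}{d\tau}\right)^2 \equiv \frac{Z_0q^2}{6\pi m^2c^2}\left(\frac{dp}{dt}\right)^2, \tag{10.40}$$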


i.e. it formally coincides with the non-relativistic relation (35). To get a better feeling of the magnitude of this radiation, we may combine Eq. (9.144) with B = 0 and Eq. (9.148) with E ∥ u to get dp/dt_ret = dℰ/dz', where z' is the particle's coordinate at the moment t_ret. The last relation allows us to rewrite Eq. (40) in the following form:


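$$\mathcal{P} = \frac{Z_0q^2}{6\pi m^2c^2}\left(\frac{d\mathcal{E}}{dz'}\right)^2 = \frac{Z_0q^2}{6\pi m^2c^2}\,\frac{d\mathcal{E}}{dz'}\frac{d\mathcal{E}}{dt_{\rm ret}}\frac{dt_{\rm ret}}{dz'} = \frac{Z_0q^2}{6\pi m^2c^2u}\,\frac{d\mathcal{E}}{dz'}\frac{d\mathcal{E}}{dt_{\rm ret}}. \tag{10.41}$$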
For the most important case of ultra-relativistic motion (u → c), this result reduces to

$$\frac{\mathcal{P}}{d\mathcal{E}/dt_{\rm ret}} = \frac{2}{3}\,\frac{d(\mathcal{E}/mc^2)}{d(z'/r_c)}, \tag{10.42}$$


where r_c is the classical radius of the particle, defined by Eq. (8.41). This formula shows that the radiated power, i.e. the rate of the particle's energy loss due to radiation, is much smaller than the rate of its energy gain from the accelerating field, unless an energy as large as ~mc² is gained over the classical radius of the particle. For example, for an electron, with r_c ≈ 3×10⁻¹⁵ m and mc² ≈ 0.5 MeV, such an acceleration would require an accelerating electric field of the order of (0.5 MV)/(3×10⁻¹⁵ m) ~ 10¹⁴ MV/m, while practicable accelerating fields are below 10³ MV/m, limited by electric breakdown effects. (As described by the factor m² in the denominator of Eq. (41), for heavier particles such as protons, the relative losses are even lower.) Such negligible radiative losses of energy are actually a large advantage of linear accelerators, such as the famous two-mile-long SLAC,¹¹ which can accelerate electrons or positrons to energies up to 50 GeV, i.e. to γ ~ 10⁵. If obtaining radiation from the accelerated particles is the goal, it may be readily achieved by bending their trajectories using additional magnetic fields; see the next section.
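The electron estimates above take only a few lines to reproduce. The sketch below uses CODATA 2018 constants, r_c = e²/(4πε₀mc²) from Eq. (8.41), and Eq. (42) with dℰ/dz' = eE. The 100 MV/m accelerating field is a hypothetical round number chosen for illustration, not a value from the text.

```python
import math

# CODATA 2018 values, SI units
E_CHARGE = 1.602176634e-19      # C (exact)
EPS0 = 8.8541878128e-12         # F/m
M_E = 9.1093837015e-31          # kg
C = 299_792_458.0               # m/s (exact)

mc2 = M_E * C**2                                      # electron rest energy, J
r_c = E_CHARGE**2 / (4.0 * math.pi * EPS0 * mc2)      # classical radius, Eq. (8.41)
mc2_volts = mc2 / E_CHARGE                            # rest energy divided by e, ~0.511e6 V

field_needed = mc2_volts / r_c       # V/m: field that would give mc^2 over the length r_c
gamma_slac = 50e9 / mc2_volts        # 50 GeV electrons

# Eq. (42) for a hypothetical accelerating field of 100 MV/m, with dE/dz' = e*E:
# P / (dE/dt) = (2/3) * (r_c / mc^2) * dE/dz'
E_field = 100e6
ratio = (2.0 / 3.0) * r_c * E_field / mc2_volts

print(f"r_c = {r_c:.3e} m, required field ~ {field_needed / 1e6:.1e} MV/m")
print(f"gamma(50 GeV) ~ {gamma_slac:.2e}, radiated fraction at 100 MV/m ~ {ratio:.1e}")
```

The radiated fraction comes out near 10⁻¹², which illustrates why radiative losses are irrelevant for linear acceleration.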


10.3. Synchrotron radiation 


Now let us consider a charged particle being accelerated in the direction perpendicular to its velocity u (for example, by the magnetic component of the Lorentz force), so that its speed u, and hence the magnitude p of its momentum, do not change. In this case, the second term in the square brackets of Eq. (39) vanishes, and it yields

$$\mathcal{P} = \frac{Z_0q^2}{6\pi m^2c^2}\left(\frac{d\mathbf{p}}{d\tau}\right)^2 \equiv \frac{Z_0q^2}{6\pi m^2c^2}\,\gamma^2\left(\frac{d\mathbf{p}}{dt}\right)^2. \tag{10.43}$$


Comparing this expression with Eq. (40), we see that for the same acceleration magnitude, the electromagnetic radiation is a factor of γ² larger. For modern accelerators, with γ ~ 10⁴ to 10⁵, such a factor creates an enormous difference. For example, if a particle is on a cyclotron orbit in a constant magnetic field (as was analyzed in Sec. 9.6), both u and p = γmu obey Eq. (9.150), so that


11 See, e.g., https://www6.slac.stanford.edu/.
$$\left|\frac{d\mathbf{p}}{dt}\right| = \omega_c\,p = \frac{u}{R}\,\gamma mu, \tag{10.44}$$

$$\mathcal{P} = \frac{Z_0q^2c^2}{6\pi R^2}\,\beta^4\gamma^4. \tag{10.45}$$


Note that for ultrarelativistic particles (β ≈ 1), at a fixed orbit radius R, the power grows as γ⁴, i.e. as the fourth power of the particle's energy ℰ ∝ γ. For example, for typical parameters of the first electron cyclotrons (such as the General Electric machine in which synchrotron radiation was first noticed in 1947), R ~ 1 m and ℰ ~ 0.3 GeV (γ ~ 600), Eq. (45) gives a very modest electron energy loss per revolution: 𝒫T = 𝒫(2πR/u) ≈ 2π𝒫R/c ~ 1 keV. However, already by the mid-1970s, electron accelerators with R ~ 100 m could give each particle an energy ℰ ~ 10 GeV, and the energy loss per revolution grew to ~10 MeV, becoming the major energy loss mechanism. For proton accelerators, such energy loss is much less of a problem, because the γ of an ultra-relativistic particle (at fixed ℰ) is proportional to 1/m, so that the estimates, at the same R, should be scaled down by (m_p/m_e)⁴ ~ 10¹³. Nevertheless, in giant modern accelerators such as the LHC (with R ~ 4.3 km and ℰ up to 7 TeV), the synchrotron radiation loss per revolution is rather noticeable (𝒫T ~ 6 keV), leading not so much to particle deceleration as to a substantial photoelectron emission from the beam tube's walls, creating harmful defocusing effects.
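These order-of-magnitude figures can be reproduced with a short sketch. It assumes q = e and uses 𝒫 = (Z₀q²c²/6πR²)β⁴γ⁴, which follows from Eq. (43) with |dp/dt| = up/R for uniform circular motion; the loss per turn is then 𝒫T = 𝒫·2πR/u. The three parameter sets are the rounded values quoted in the text.

```python
import math

E_CHARGE = 1.602176634e-19      # C
C = 299_792_458.0               # m/s
Z0 = 376.730313668              # free-space impedance, ohm (CODATA 2018)
MC2_ELECTRON_EV = 0.51099895e6
MC2_PROTON_EV = 938.27208816e6


def loss_per_turn_ev(energy_ev, mc2_ev, radius_m):
    """Energy radiated per revolution, P*T with T = 2*pi*R/u, returned in eV.

    P = (Z0 q^2 c^2 / 6 pi R^2) beta^4 gamma^4 follows from Eq. (43) with
    |dp/dt| = u p / R on a circular orbit of radius R; the charge q = e is assumed.
    """
    gamma = energy_ev / mc2_ev
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    power = Z0 * E_CHARGE**2 * C**2 * beta**4 * gamma**4 / (6.0 * math.pi * radius_m**2)
    period = 2.0 * math.pi * radius_m / (beta * C)
    return power * period / E_CHARGE


cases = {
    "early electron machine (R ~ 1 m, 0.3 GeV)": (0.3e9, MC2_ELECTRON_EV, 1.0),
    "1970s electron ring (R ~ 100 m, 10 GeV)": (10e9, MC2_ELECTRON_EV, 100.0),
    "LHC-like protons (R ~ 4.3 km, 7 TeV)": (7e12, MC2_PROTON_EV, 4.3e3),
}
for name, args in cases.items():
    print(f"{name}: {loss_per_turn_ev(*args):.2g} eV per turn")
```

All three results fall within a factor of about two of the rounded estimates in the text, as expected for inputs that are themselves only order-of-magnitude values.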


However, what is bad for particle accelerators and storage rings is good for the so-called synchrotron light sources: electron accelerators designed specially for the generation of intense synchrotron radiation, with a spectrum extending well beyond the visible light range. Let us analyze the angular and spectral distributions of such radiation. To calculate the angular distribution, let us select the coordinate axes as shown in Fig. 5, with the origin at the current location of the orbiting particle, the z-axis directed along its instant velocity (i.e. the vector β), and the x-axis toward the orbit's center.


Fig. 10.5. The synchrotron 
radiation problem’s geometry. 


In the general case, when the unit vector n toward the radiation's observer is not within any of the coordinate planes, it has to be described by two angles: the polar angle θ and the azimuthal angle φ between the x-axis and the projection OP of the vector n onto the [x, y]-plane. Since the length of the segment OP is sin θ, the Cartesian components of the relevant vectors are as follows:


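$$\mathbf{n} = \{\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta\}, \qquad \boldsymbol{\beta} = \{0, 0, \beta\}, \qquad \dot{\boldsymbol{\beta}} = \{\dot\beta, 0, 0\}. \tag{10.46}$$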
Plugging these expressions into the general Eq. (30), we get 
$$\frac{d\mathcal{P}}{d\Omega} = \frac{Z_0q^2}{2\pi^2}\,|\dot{\boldsymbol{\beta}}|^2\gamma^6 f(\theta,\varphi), \qquad \text{where} \qquad f(\theta,\varphi) \equiv \frac{1}{8\gamma^6(1-\beta\cos\theta)^3}\left[1 - \frac{\sin^2\theta\cos^2\varphi}{\gamma^2(1-\beta\cos\theta)^2}\right]. \tag{10.47}$$


According to this result, just as at linear acceleration, in the ultra-relativistic limit most radiation goes into a narrow cone (of width Δθ ~ γ⁻¹ << 1) around the vector β, i.e. around the instant direction of the particle's propagation. For such small angles, and γ >> 1,


$$f(\theta,\varphi) \approx \frac{1}{(1+\gamma^2\theta^2)^3}\left[1 - \frac{4\gamma^2\theta^2\cos^2\varphi}{(1+\gamma^2\theta^2)^2}\right]. \tag{10.48}$$


The left panel of Fig. 6 shows a color-coded contour map of this angular distribution f(θ, φ), as observed on a distant plane normal to the particle's instant velocity (in Fig. 5, parallel to the [x, y]-plane), while its right panel shows the factor f as a function of θ in two perpendicular directions: within the particle's rotation plane (in the direction parallel to the x-axis, i.e. at φ = 0) and perpendicular to this plane (along the y-axis, i.e. at φ = ±π/2). The result shows, first of all, that, in contrast to the case of linear acceleration, the narrow radiation cone is now not hollow: the intensity maximum is reached at θ = 0, i.e. exactly in the direction of the particle's motion. Second, the radiation cone is not axially symmetric: within the particle rotation plane, the intensity drops faster (and even has nodes at θ = ±1/γ).
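A short sketch can compare the exact factor f of Eq. (47) with its small-angle form (48). One detail worth knowing: in the rotation plane (φ = 0), the exact bracket in Eq. (47) vanishes exactly where sin θ = 1/γ (there cos θ = β, so that γ(1 − β cos θ) = 1/γ = sin θ). This is consistent with the nodes at θ ≈ ±1/γ, and f never becomes negative. The helper rewrites 1 − β cos θ in a cancellation-free form. That rewrite is a numerical convenience assumed here, not a step from the text.

```python
import math


def one_minus_beta_cos(theta, gamma):
    """1 - beta*cos(theta) written as (1 - beta) + 2*beta*sin^2(theta/2), with
    1 - beta = 1/(gamma^2 (1 + beta)), to avoid cancellation at gamma >> 1."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    return 1.0 / (gamma**2 * (1.0 + beta)) + 2.0 * beta * math.sin(0.5 * theta) ** 2


def f_exact(theta, phi, gamma):
    """Angular factor f(theta, phi) of Eq. (47)."""
    k = one_minus_beta_cos(theta, gamma)
    bracket = 1.0 - (math.sin(theta) * math.cos(phi)) ** 2 / (gamma**2 * k**2)
    return bracket / (8.0 * gamma**6 * k**3)


def f_small_angle(theta, phi, gamma):
    """Small-angle approximation, Eq. (48)."""
    x2 = (gamma * theta) ** 2
    return (1.0 - 4.0 * x2 * math.cos(phi) ** 2 / (1.0 + x2) ** 2) / (1.0 + x2) ** 3


gamma = 1000.0
for g_theta in (0.0, 0.5, 1.0, 2.0):
    theta = g_theta / gamma
    print(g_theta,
          f_exact(theta, 0.0, gamma), f_small_angle(theta, 0.0, gamma),                   # in-plane
          f_exact(theta, math.pi / 2, gamma), f_small_angle(theta, math.pi / 2, gamma))   # off-plane
```

At γθ = 1, the in-plane intensity is essentially zero, while the off-plane one is still 1/8 of the peak, which is the asymmetry visible in Fig. 6.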


Fig. 10.6. The angular distribution of the synchrotron radiation at γ >> 1: a contour map of f(θ, φ) in the plane {γθ cos φ, γθ sin φ} (left panel), and f as a function of γθ within the rotation plane (φ = 0) and off-plane (φ = ±π/2) (right panel).


The angular distribution (47) of the synchrotron radiation was calculated for the (inertial) reference frame whose origin coincides with the particle's position at this particular instant, i.e. its radiation pattern is time-independent in the frame moving with the particle. This pattern enables a semi-quantitative description of the radiation by an ultra-relativistic particle from the point of view of a stationary observer: if the observation point is on (or very close to) the rotation plane,¹² it is being


12 It is easy (and hence is left for the reader's exercise) to show that if the observation point is far off the plane (say, is located on the particle orbit's axis), the radiation is virtually monochromatic, with frequency ω_c. (As we know from Sec. 8.2, in the non-relativistic limit u << c, this is true for any observation point.)
"struck" by the narrow radiation cone once each rotation period T ≈ 2πR/c, each "strike" giving a field pulse of a short duration Δt_ret << 1/ω_c (see Fig. 7).¹³


Fig. 10.7. (a) The synchrotron radiation cones (at γ >> 1) for two close values of t_ret, and (b) the in-plane component of the electric field observed in the rotation plane, as a function of time t, schematically. The pulses repeat with the period T; each pulse corresponds to the retarded-time interval Δt_ret ~ T/γ, but lasts only Δt ~ T/γ³.


The evaluation of the time duration Δt of each pulse requires some care: its estimate Δt_ret ~ 1/γω_c is correct for the duration of the retarded time interval during which the cone is aimed at the observer. However, due to the time compression effect discussed in detail in Sec. 1 and described by Eq. (16), the pulse duration as seen by the observer is a factor of 1/(1 − β) shorter, so that


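$$\Delta t \approx (1-\beta)\,\Delta t_{\rm ret} \sim \frac{1}{2\gamma^2}\,\frac{1}{\gamma\omega_c} \sim \frac{1}{\gamma^3\omega_c}, \qquad \text{for } \gamma \gg 1. \tag{10.49}$$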


From the Fourier theorem, we can expect the frequency spectrum of such radiation to consist of numerous (N ~ γ³ >> 1) harmonics of the particle rotation frequency ω_c, with comparable amplitudes. However, if the orbital frequency fluctuates even slightly (δω_c/ω_c > 1/N ~ 1/γ³), as happens in most practical systems, the radiation pulses are not coherent, so that the average radiation power spectrum may be calculated as that of one pulse, multiplied by the number of pulses per second. In this case, the spectrum is continuous, extending from low frequencies all the way to approximately


$$\omega_{\max} \sim \frac{1}{\Delta t} \sim \gamma^3\omega_c. \tag{10.50}$$
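To see what Eqs. (49)-(50) imply for a typical light source, here is a rough numerical sketch. The electron energy of 3 GeV and bending radius of 10 m are hypothetical round numbers chosen for illustration, not parameters from the text. Like the equations themselves, the result is an order-of-magnitude estimate, with factors of order 1 dropped.

```python
import math

C = 299_792_458.0                  # m/s
HBAR_EV_S = 6.582119569e-16        # reduced Planck constant, eV*s
MC2_ELECTRON_EV = 0.51099895e6

# hypothetical light-source parameters (illustrative only)
energy_ev = 3e9                    # electron energy
radius_m = 10.0                    # orbit (bending) radius

gamma = energy_ev / MC2_ELECTRON_EV
omega_c = C / radius_m             # rotation frequency u/R with u ~ c
delta_t = 1.0 / (gamma**3 * omega_c)      # pulse duration, Eq. (49), factors ~1 dropped
omega_max = gamma**3 * omega_c            # spectrum cutoff, Eq. (50)
n_harmonics = gamma**3                    # N ~ gamma^3 harmonics of omega_c

print(f"gamma = {gamma:.0f}, pulse ~ {delta_t:.1e} s, N ~ {n_harmonics:.1e}")
# visible photons have energies of roughly 1.7-3.3 eV
print(f"hbar*omega_max ~ {HBAR_EV_S * omega_max / 1e3:.1f} keV")
```

The photon energy ħω_max comes out in the keV (X-ray) range, some three orders of magnitude above visible light, consistent with the statement that the synchrotron spectrum extends well beyond the visible range.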


In order to verify and quantify this result, let us calculate the spectrum of radiation due to a single pulse. For that, we should first make the general notion of the radiation spectrum quantitative. Let us represent an arbitrary electric field (say, that of the synchrotron radiation we are studying now) observed at a fixed point r, as a function of the observation time t, as a Fourier integral:¹⁴


$$\mathbf{E}(t) = \int_{-\infty}^{+\infty}\mathbf{E}_\omega e^{-i\omega t}\,d\omega. \tag{10.51}$$


13 The fact that the in-plane component of each electric field pulse E(t) is antisymmetric with respect to its central point, and hence vanishes at that point (as Fig. 7b shows), readily follows from Eq. (19).

14 In contrast to the single-frequency case (i.e. a monochromatic wave), we may avoid taking the real part of the complex function E_ω e^{−iωt} by requiring that in Eq. (51), E_{−ω} = E_ω*. However, it is important to remember the factor 2 required for the transition to a monochromatic wave of frequency ω₀ and with real amplitude E₀: E_ω = E₀[δ(ω − ω₀) + δ(ω + ω₀)]/2.
This expression may be plugged into the formula for the total energy of the radiation pulse (i.e. of the loss of the particle's energy ℰ) per unit solid angle:¹⁵


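$$-\frac{d\mathcal{E}}{d\Omega} = \int_{-\infty}^{+\infty} S_r(t)\,R^2\,dt = \frac{R^2}{Z_0}\int_{-\infty}^{+\infty} E^2(t)\,dt. \tag{10.52}$$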


This substitution, followed by a natural change of the integration order, yields 


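$$-\frac{d\mathcal{E}}{d\Omega} = \frac{R^2}{Z_0}\int_{-\infty}^{+\infty}d\omega\int_{-\infty}^{+\infty}d\omega'\,\mathbf{E}_\omega\cdot\mathbf{E}_{\omega'}\int_{-\infty}^{+\infty}e^{-i(\omega+\omega')t}\,dt. \tag{10.53}$$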


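But the inner integral (over t) is just 2πδ(ω + ω′).¹⁶ This delta function kills one of the frequency integrals (say, the one over ω′), and Eq. (53) gives us a result that may be recast as

$$-\frac{d\mathcal{E}}{d\Omega} = \int_0^{+\infty} I(\omega)\,d\omega, \qquad \text{with} \quad I(\omega) \equiv \frac{4\pi R^2}{Z_0}\,\mathbf{E}_\omega\cdot\mathbf{E}_{-\omega} = \frac{4\pi R^2}{Z_0}\,\mathbf{E}_\omega\cdot\mathbf{E}_\omega^*, \tag{10.54}$$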


where the evident frequency symmetry of the scalar product E_ω·E_{−ω} has been utilized to fold the integral of I(ω) to positive frequencies only. The first of Eqs. (54) makes the physical sense of the function I(ω) very clear: this is the so-called spectral density of the electromagnetic radiation (per unit solid angle).¹⁷


To calculate the spectral density, we can express the function E_ω via E(t) using the Fourier transform reciprocal to Eq. (51):

$$\mathbf{E}_\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\mathbf{E}(t)\,e^{i\omega t}\,dt. \tag{10.55}$$
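In the particular case of radiation by a single point charge, we may use here the second (radiative) term of Eq. (19):

$$\mathbf{E}_\omega = \frac{1}{2\pi}\,\frac{q}{4\pi\varepsilon_0 cR}\int_{-\infty}^{+\infty}\left[\frac{\mathbf{n}\times\{(\mathbf{n}-\boldsymbol{\beta})\times\dot{\boldsymbol{\beta}}\}}{(1-\boldsymbol{\beta}\cdot\mathbf{n})^3}\right]_{\rm ret}e^{i\omega t}\,dt. \tag{10.56}$$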


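Since the vectors n and β are more natural functions of the radiation's emission (retarded) time t_ret, let us use Eqs. (5) and (16) to exclude the observation time t from this integral:

$$\mathbf{E}_\omega = \frac{1}{2\pi}\,\frac{q}{4\pi\varepsilon_0 cR}\int_{-\infty}^{+\infty}\left[\frac{\mathbf{n}\times\{(\mathbf{n}-\boldsymbol{\beta})\times\dot{\boldsymbol{\beta}}\}}{(1-\boldsymbol{\beta}\cdot\mathbf{n})^2}\right]_{\rm ret}\exp\left\{i\omega\left(t_{\rm ret}+\frac{R_{\rm ret}}{c}\right)\right\}dt_{\rm ret}. \tag{10.57}$$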


Assuming that the observer is sufficiently far from the particle,¹⁸ we may treat the unit vector n as a constant and also use the approximation (8.19) to reduce Eq. (57) to


¹⁵ Note that the expression under this integral differs from d𝒫/dΩ defined by Eq. (29) by the absence of the factor (1 − β·n) = ∂t/∂t_ret (see Eq. (16)). This is natural because now we are calculating the wave energy arriving at the observation point r during the time interval dt rather than dt_ret.

¹⁶ See, e.g., MA Eq. (14.4).

¹⁷ The notion of spectral density may be readily generalized to random processes; see, e.g., SM Sec. 5.4.

¹⁸ According to the estimate (49), for a synchrotron radiation pulse, this restriction requires the observer to be much farther than Δr′ ~ cΔt ~ R/γ³ from the particle. With the values R ~ 10⁴ m and γ ~ 10⁵ mentioned above, Δr′ ~ 10⁻¹¹ m, so that this requirement is satisfied for any realistic radiation detector.
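$$\mathbf{E}_\omega = \frac{1}{2\pi}\,\frac{q}{4\pi\varepsilon_0 cR}\exp\left\{\frac{i\omega r}{c}\right\}\int_{-\infty}^{+\infty}\left[\frac{\mathbf{n}\times\{(\mathbf{n}-\boldsymbol{\beta})\times\dot{\boldsymbol{\beta}}\}}{(1-\boldsymbol{\beta}\cdot\mathbf{n})^2}\right]_{\rm ret}\exp\left\{i\omega\left(t_{\rm ret}-\frac{\mathbf{n}\cdot\mathbf{r}'(t_{\rm ret})}{c}\right)\right\}dt_{\rm ret}. \tag{10.58}$$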


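Plugging this expression into Eq. (54), and then using the definitions c = 1/(ε₀μ₀)^{1/2} and Z₀ = (μ₀/ε₀)^{1/2}, we get¹⁹

$$I(\omega) = \frac{Z_0 q^2}{16\pi^3}\left|\int_{-\infty}^{+\infty}\left[\frac{\mathbf{n}\times\{(\mathbf{n}-\boldsymbol{\beta})\times\dot{\boldsymbol{\beta}}\}}{(1-\boldsymbol{\beta}\cdot\mathbf{n})^2}\right]_{\rm ret}\exp\left\{i\omega\left(t_{\rm ret}-\frac{\mathbf{n}\cdot\mathbf{r}'(t_{\rm ret})}{c}\right)\right\}dt_{\rm ret}\right|^2. \tag{10.59}$$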


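This result may be further simplified by noticing that the fraction before the exponent may be represented as a full derivative over t_ret,

$$\left[\frac{\mathbf{n}\times\{(\mathbf{n}-\boldsymbol{\beta})\times\dot{\boldsymbol{\beta}}\}}{(1-\boldsymbol{\beta}\cdot\mathbf{n})^2}\right]_{\rm ret} = \frac{d}{dt_{\rm ret}}\left[\frac{\mathbf{n}\times(\mathbf{n}\times\boldsymbol{\beta})}{1-\boldsymbol{\beta}\cdot\mathbf{n}}\right]_{\rm ret}, \tag{10.60}$$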


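and working out the resulting integral by parts. In this operation, the time differentiation of the parentheses in the exponent gives d[t_ret − n·r′(t_ret)/c]/dt_ret = (1 − n·u/c)_ret ≡ (1 − β·n)_ret, leading to the cancellation of the remaining factor in the denominator and hence to a very simple general result:²⁰

$$I(\omega) = \frac{Z_0 q^2\omega^2}{16\pi^3}\left|\int_{-\infty}^{+\infty}[\mathbf{n}\times(\mathbf{n}\times\boldsymbol{\beta})]_{\rm ret}\exp\left\{i\omega\left(t_{\rm ret}-\frac{\mathbf{n}\cdot\mathbf{r}'(t_{\rm ret})}{c}\right)\right\}dt_{\rm ret}\right|^2. \tag{10.61}$$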
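The identity (60) holds for an arbitrary trajectory, as long as the unit vector n is fixed. A minimal numerical check, assuming NumPy and a hypothetical circular motion with β = 0.8 observed at θ₀ = 0.2 (values chosen only for illustration), compares the left-hand side of Eq. (60) with a central finite-difference derivative of the bracket on its right-hand side:

```python
import numpy as np

beta0 = 0.8        # hypothetical speed, in units of c
omega_c = 1.0      # orbital angular frequency (arbitrary units)
theta0 = 0.2       # hypothetical observation angle
n = np.array([0.0, np.sin(theta0), np.cos(theta0)])   # fixed unit vector n

def beta(t):
    """beta(t_ret) for circular motion in the [x, z]-plane."""
    a = omega_c * t
    return beta0 * np.array([np.sin(a), 0.0, np.cos(a)])

def beta_dot(t):
    """Time derivative of beta(t_ret)."""
    a = omega_c * t
    return beta0 * omega_c * np.array([np.cos(a), 0.0, -np.sin(a)])

def lhs(t):
    """Left-hand side of Eq. (60)."""
    b, bd = beta(t), beta_dot(t)
    return np.cross(n, np.cross(n - b, bd)) / (1.0 - b @ n)**2

def bracket(t):
    """Expression under the derivative on the right-hand side of Eq. (60)."""
    b = beta(t)
    return np.cross(n, np.cross(n, b)) / (1.0 - b @ n)

def rhs(t, h=1e-5):
    """Central finite-difference derivative of the bracket."""
    return (bracket(t + h) - bracket(t - h)) / (2 * h)

max_err = max(np.max(np.abs(lhs(t) - rhs(t))) for t in np.linspace(-3, 3, 61))
print(f"max |LHS - RHS| = {max_err:.2e}")
```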


Now returning to the particular case of the synchrotron radiation, it is beneficial to choose the origin of time t_ret so that at t_ret = 0, the angle θ between the vectors n and β takes its smallest value θ₀, i.e., in terms of Fig. 5, the vector n is within the [y, z]-plane. Fixing this direction of the axes so that they do not move, we can redraw that figure as shown in Fig. 8.


Fig. 10.8. Deriving the synchrotron radiation's spectral density. The vector n is static within the [y, z]-plane, while the vectors r′(t_ret) and β_ret rotate, within the [x, z]-plane, with the angular velocity ω_c of the particle.


In this "lab" reference frame, the vector n does not depend on time, while the vectors r′(t_ret) and β_ret do depend on it via the angle α ≡ ω_c t_ret:


¹⁹ Note that for our current purposes of calculation of the spectral density of radiation by a single particle, the factor exp{iωr/c} has got canceled. However, as we have seen in Chapter 8, this factor plays a central role in the interference of radiation from several (many) sources. Such interference is important, in particular, in undulators and free-electron lasers, the devices to be (qualitatively) discussed below.

²⁰ Actually, this simplification is not accidental. According to Eq. (10b), the expression under the derivative in the last form of Eq. (60) is just the transverse component of the vector potential A (give or take a constant factor), and from the discussion in Sec. 8.2 we know that this component determines the electric dipole radiation of a system, which dominates the radiation in our current case of a single particle with a non-zero electric charge.
$$\mathbf{n} = \{0,\ \sin\theta_0,\ \cos\theta_0\}, \qquad \mathbf{r}'(t_{\rm ret}) = \{R(1-\cos\alpha),\ 0,\ R\sin\alpha\}, \qquad \boldsymbol{\beta}_{\rm ret} = \{\beta\sin\alpha,\ 0,\ \beta\cos\alpha\}. \tag{10.62}$$
Now an easy multiplication yields 


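$$[\mathbf{n}\times(\mathbf{n}\times\boldsymbol{\beta})]_{\rm ret} = \beta\{-\sin\alpha,\ \sin\theta_0\cos\theta_0\cos\alpha,\ -\sin^2\theta_0\cos\alpha\}, \tag{10.63}$$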


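$$\exp\left\{i\omega\left[t_{\rm ret}-\frac{\mathbf{n}\cdot\mathbf{r}'(t_{\rm ret})}{c}\right]\right\} = \exp\left\{i\omega\left[t_{\rm ret}-\frac{R}{c}\cos\theta_0\sin\alpha\right]\right\}. \tag{10.64}$$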
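The vector algebra behind Eq. (63) can be checked directly: for the vectors (62), n×(n×β) should equal β{−sin α, sin θ₀ cos θ₀ cos α, −sin²θ₀ cos α}. The sketch below assumes NumPy; the values β = 0.99 and θ₀ = 0.3 are arbitrary illustrations. It compares this closed form with a numerically computed double cross product over a range of α:

```python
import numpy as np

beta0, theta0 = 0.99, 0.3                        # hypothetical speed and observation angle
n = np.array([0.0, np.sin(theta0), np.cos(theta0)])

max_dev = 0.0
for alpha in np.linspace(-1.0, 1.0, 21):         # alpha = omega_c * t_ret
    b = beta0 * np.array([np.sin(alpha), 0.0, np.cos(alpha)])   # beta_ret from Eq. (62)
    direct = np.cross(n, np.cross(n, b))         # n x (n x beta), computed numerically
    closed_form = beta0 * np.array([-np.sin(alpha),
                                    np.sin(theta0) * np.cos(theta0) * np.cos(alpha),
                                    -np.sin(theta0)**2 * np.cos(alpha)])
    max_dev = max(max_dev, np.max(np.abs(direct - closed_form)))

print(f"max deviation: {max_dev:.1e}")
```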


As we already know, in the (most interesting) ultra-relativistic limit γ >> 1, most radiation is confined to short pulses, so that only small angles α ~ ω_cΔt_ret ~ γ⁻¹ may contribute to the integral in Eq. (61). Moreover, since most radiation goes to small angles θ ~ θ₀ ~ γ⁻¹, it makes sense to consider only such small angles. Expanding both trigonometric functions of these small angles, participating in the parentheses of Eq. (64), into the Taylor series, and keeping only the leading terms, we get


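$$t_{\rm ret}-\frac{R}{c}\cos\theta_0\sin\alpha \approx t_{\rm ret}-\frac{R}{c}\omega_c t_{\rm ret}+\frac{R\omega_c}{c}\frac{\theta_0^2}{2}t_{\rm ret}+\frac{R\omega_c^3}{6c}t_{\rm ret}^3. \tag{10.65}$$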


Since (R/c)ω_c = u/c ≡ β ≈ 1, in the two last terms we may approximate this parameter by 1. However, it is crucial to distinguish the difference between the two first terms, proportional to (1 − β)t_ret, from zero; as we have done before, we may approximate it with t_ret/2γ². On the right-hand side of Eq. (63), which does not have such a critical difference, we may be bolder, taking²¹


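$$\beta\{-\sin\alpha,\ \sin\theta_0\cos\theta_0\cos\alpha,\ -\sin^2\theta_0\cos\alpha\} \approx \{-\alpha,\ \theta_0,\ 0\} \equiv \{-\omega_c t_{\rm ret},\ \theta_0,\ 0\}. \tag{10.66}$$

As a result, Eq. (61) is reduced to

$$I(\omega) = \frac{Z_0 q^2}{16\pi^3}\left(|a_x|^2+|a_y|^2\right), \tag{10.67}$$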

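where a_x and a_y are the following dimensionless factors:

$$\begin{aligned} a_x &= -\omega\int_{-\infty}^{+\infty}\omega_c t_{\rm ret}\exp\left\{\frac{i\omega}{2}\left[\left(\gamma^{-2}+\theta_0^2\right)t_{\rm ret}+\frac{\omega_c^2}{3}t_{\rm ret}^3\right]\right\}dt_{\rm ret},\\ a_y &= \omega\theta_0\int_{-\infty}^{+\infty}\exp\left\{\frac{i\omega}{2}\left[\left(\gamma^{-2}+\theta_0^2\right)t_{\rm ret}+\frac{\omega_c^2}{3}t_{\rm ret}^3\right]\right\}dt_{\rm ret}, \end{aligned} \tag{10.68}$$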

that describe the frequency spectra of two components of the synchrotron radiation, with mutually perpendicular polarization planes. Defining the following dimensionless parameter

$$\nu \equiv \frac{\omega}{3\omega_c}\left(\gamma^{-2}+\theta_0^2\right)^{3/2}, \tag{10.69}$$


²¹ This expression confirms that the in-plane (x) component of the electric field is an odd function of t_ret, and hence of t − t₀ (see its sketch in Fig. 7b), while the normal (y) component is an even function of this difference. Also, note that for an observer exactly in the rotation plane (θ₀ = 0) the latter component equals zero for all times, a fact which could be predicted from the very beginning because of the evident mirror symmetry of the problem with respect to the particle's rotation plane.
which is proportional to the observation frequency, and changing the integration variable to ξ ≡ ω_c t_ret/(γ⁻² + θ₀²)^{1/2}, the integrals (68) may be reduced to the modified Bessel functions of the second kind, but with fractional indices:


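$$\begin{aligned} a_x &= -\frac{\omega}{\omega_c}\left(\gamma^{-2}+\theta_0^2\right)\int_{-\infty}^{+\infty}\xi\exp\left\{\frac{3i\nu}{2}\left(\xi+\frac{\xi^3}{3}\right)\right\}d\xi = -\frac{2i\omega}{\sqrt{3}\,\omega_c}\left(\gamma^{-2}+\theta_0^2\right)K_{2/3}(\nu),\\ a_y &= \frac{\omega}{\omega_c}\,\theta_0\left(\gamma^{-2}+\theta_0^2\right)^{1/2}\int_{-\infty}^{+\infty}\exp\left\{\frac{3i\nu}{2}\left(\xi+\frac{\xi^3}{3}\right)\right\}d\xi = \frac{2\omega}{\sqrt{3}\,\omega_c}\,\theta_0\left(\gamma^{-2}+\theta_0^2\right)^{1/2}K_{1/3}(\nu). \end{aligned} \tag{10.70}$$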
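The reduction to K_{1/3} and K_{2/3} in Eq. (70) may be cross-checked without integrating the rapidly oscillating functions directly. Assuming SciPy, the substitution t = (3ν/2)^{1/3}ξ turns the half-line integrals ∫₀^∞cos φ dξ and ∫₀^∞ξ sin φ dξ, with φ = (3ν/2)(ξ + ξ³/3), into the standard integral representations of the Airy function Ai and its derivative Ai′. They should equal K_{1/3}(ν)/√3 and K_{2/3}(ν)/√3, respectively. Since the integrands in Eq. (70) have definite parity in ξ, the full-line integrals are twice these values (times i for the odd one), which gives the factors 2/√3 there.

```python
import numpy as np
from scipy.special import airy, kv

def half_line_integrals(nu):
    """Integrals over xi from 0 to infinity of cos(phi) and xi*sin(phi),
    with phi = (3 nu / 2)(xi + xi^3 / 3), expressed through Ai and Ai'."""
    s = (1.5 * nu) ** (1.0 / 3.0)   # substitution t = s * xi turns phi into t^3/3 + x t
    x = s**2                        # Airy argument, x = (3 nu / 2)^(2/3)
    ai, aip, _, _ = airy(x)
    i_cos = np.pi * ai / s          # Ai(x)  =  (1/pi) int_0^inf cos(t^3/3 + x t) dt
    i_sin = -np.pi * aip / s**2     # Ai'(x) = -(1/pi) int_0^inf t sin(t^3/3 + x t) dt
    return i_cos, i_sin

nu = np.array([0.01, 0.1, 1.0, 3.0, 10.0])
i_cos, i_sin = half_line_integrals(nu)
rel_err_cos = np.max(np.abs(i_cos * np.sqrt(3) / kv(1 / 3, nu) - 1))
rel_err_sin = np.max(np.abs(i_sin * np.sqrt(3) / kv(2 / 3, nu) - 1))
print(rel_err_cos, rel_err_sin)
```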


Figure 9a shows the dependence of the Bessel factors defining the amplitudes a_x and a_y on the normalized observation frequency ν. It shows that the radiation intensity changes with frequency relatively slowly (note the log-log scale of the plot!) until the normalized frequency defined by Eq. (69) is increased beyond ~1. For the most important observation angles θ₀ ~ γ⁻¹, this means that our estimate (50) is indeed correct, though formally the frequency spectrum extends to infinity.²²


[Figure 10.9: panels (a) and (b), log-log plots versus ν and ζ, respectively, over the range 0.01 to 10.]

Fig. 10.9. The frequency spectra of: (a) two components of the synchrotron radiation, at a fixed angle θ₀, and (b) its total (polarization- and angle-averaged) intensity.


Naturally, the spectral density integrated over the full solid angle exhibits a similar frequency behavior. Without performing the integration,²³ let me just give the result (also valid for γ >> 1 only) for the reader's reference:

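$$\oint I(\omega)\,d\Omega = \sqrt{3}\,\frac{q^2}{4\pi\varepsilon_0 c}\,\gamma\,\zeta\int_\zeta^{\infty}K_{5/3}(\zeta')\,d\zeta', \qquad \text{where } \zeta \equiv \frac{2\omega}{3\omega_c\gamma^3}. \tag{10.71}$$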
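The ζ-dependence in Eq. (71), F(ζ) ≡ ζ∫_ζ^∞K_{5/3}(ζ′)dζ′, does not involve the prefactor, so its maximum can be located numerically. The sketch below assumes SciPy's `kv`, `quad`, and `minimize_scalar`; the search interval (0.05, 2) is an arbitrary choice that brackets the peak visible in Fig. 9b.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import kv

def flux_curve(zeta):
    """F(zeta) = zeta * integral from zeta to infinity of K_{5/3}(z) dz."""
    tail, _ = quad(lambda z: kv(5.0 / 3.0, z), zeta, np.inf)
    return zeta * tail

res = minimize_scalar(lambda z: -flux_curve(z), bounds=(0.05, 2.0), method="bounded")
zeta_peak, f_peak = res.x, -res.fun
print(f"peak at zeta = {zeta_peak:.3f}, F = {f_peak:.3f}")
```

The printed peak position should come out close to 0.29, in line with the maximum read off Fig. 9b.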


Figure 9b shows the dependence of this integral on the normalized frequency ζ. (This plot is sometimes called the "universal flux curve".) In accordance with the estimate (50), it reaches the maximum at


²² The law of the spectral density decrease at large ν may be readily obtained from the second of Eqs. (2.158), which is valid for any (even non-integer) Bessel function index n: a_x ∝ a_y ∝ ν^{1/2} exp{−ν}. Here the exponential factor is certainly the most important one.

²³ For that, and many other details, the interested reader may be referred, for example, to the fundamental review collection by E. Koch et al. (eds.), Handbook on Synchrotron Radiation (in 5 vols.), North-Holland, 1983-1991, or to a more concise monograph by A. Hofmann, The Physics of Synchrotron Radiation, Cambridge U. Press, 2007.
$$\zeta \approx 0.3, \qquad \text{i.e.} \quad \omega_{\max} \approx 0.45\,\gamma^3\omega_c. \tag{10.72}$$

For example, in the National Synchrotron Light Source (NSLS-II) at the Brookhaven National Laboratory near our SBU campus, with its ring's circumference of 792 m, the electron revolution period T is 2.64 μs. With ω_c = 2π/T ≈ 2.4×10⁶ s⁻¹, for the achieved γ ~ 6×10³ (ℰ = 3 GeV), we get ω_max ~ 3×10¹⁷ s⁻¹, i.e. the photon energy ħω_max ~ 200 eV, corresponding to soft X-rays. In light of this estimate, the reader may be surprised by Fig. 10, which shows the calculated spectra of the radiation that this facility was designed to produce, with the intensity maxima at photon energies up to a few keV.
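The NSLS-II numbers above can be reproduced with a few lines of arithmetic. This is a sketch with rounded constants; it treats ℰ = 3 GeV as the total electron energy (so that γ = ℰ/mc²) and takes β ≈ 1 in T = (circumference)/c. It prints both the crude estimate (50), ω_max ~ γ³ω_c, and the refined value ω_max ≈ 0.45γ³ω_c, which corresponds to ζ ≈ 0.3 under the assumed normalization ζ ≡ 2ω/3ω_cγ³.

```python
import math

c = 2.998e8             # speed of light, m/s
hbar_eV = 6.582e-16     # reduced Planck constant, eV*s
mc2_eV = 0.511e6        # electron rest energy, eV

circumference = 792.0   # m, NSLS-II storage ring (from the text)
energy_eV = 3.0e9       # electron energy, eV (from the text)

T = circumference / c                        # revolution period, assuming beta ~ 1
omega_c = 2 * math.pi / T                    # orbital angular frequency
gamma = energy_eV / mc2_eV                   # Lorentz factor
omega_max_crude = gamma**3 * omega_c         # estimate (50)
omega_max = 0.45 * gamma**3 * omega_c        # peak of the universal flux curve, zeta ~ 0.3
photon_eV_crude = hbar_eV * omega_max_crude
photon_eV = hbar_eV * omega_max

print(f"T = {T*1e6:.2f} us, omega_c = {omega_c:.2e} 1/s, gamma = {gamma:.0f}")
print(f"hbar*omega_max ~ {photon_eV:.0f} eV (refined), {photon_eV_crude:.0f} eV (crude)")
```

The two printed photon energies (roughly 140 eV and 320 eV) bracket the ~200 eV quoted in the text, which is as close as an order-of-magnitude estimate can be expected to get.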
Fig. 10.10. Design brightness of various synchrotron radiation sources of the NSLS-II facility. For the bend magnets and wigglers, the "brightness" may be obtained by multiplication of the one-pulse spectral density I(ω) calculated above by the number of electrons passing the source per second. (Note the non-SI units used by the synchrotron radiation community.) However, for undulators, there is an additional factor due to the partial coherence of radiation; see below. (Adapted from the document NSLS-II Source Properties and Floor Layout that was available online at https://www.bnl.gov/ps/docs/pdf/SourceProperties.pdf in 2011-2020.)


The reason for this discrepancy is that in the NSLS-II, and in all modern synchrotron light sources, most radiation is produced not by the circular orbit itself (which is, by the way, not exactly circular, but consists of a series of straight and bend-magnet sections), but by such bend sections, and by the devices called wigglers and undulators: strings of several strong magnets with alternating field direction (Fig. 11) that induce periodic bending ("wiggling") of the electron's trajectory, with the synchrotron radiation emitted at each bend.


[Figure 10.11: schematic of a magnet string, with the electron beam labeled "electrons".]

Fig. 10.11. The generic structure of the wigglers, undulators, and free-electron lasers. (Adapted from http://www.xfel.eu/overview/how_does_it_work/.)


The difference between the wigglers and the undulators is more quantitative than qualitative: the former devices have a larger spatial period λ_u (the distance between the adjacent magnets of the same polarity; see Fig. 11), giving enough space for the electron beam to bend by an angle larger than γ⁻¹, i.e. larger than the radiation cone's width. As a result, the radiation reaches an in-plane observer as a periodic sequence of individual pulses (see Fig. 12a).


Fig. 10.12. Waveforms of the radiation emitted by (a) a wiggler and (b) an undulator (schematically).


The shape of each pulse, and hence its frequency spectrum, are essentially similar to those discussed above,²⁴ but with much higher local values of ω_c and hence ω_max (see Fig. 10). Another difference is a much higher frequency of the pulses. Indeed, the fundamental Eq. (16) allows us to calculate the time distance between them, for the observer, as

$$\Delta t \approx \frac{\partial t}{\partial t_{\rm ret}}\,\Delta t_{\rm ret} \approx (1-\beta)\,\frac{\lambda_u}{u} \approx \frac{\lambda_u}{2\gamma^2 c} \ll \frac{\lambda_u}{c}, \tag{10.73}$$


²⁴ Indeed, the period λ_u is typically a few centimeters (see the numbers in Fig. 10), i.e. is much larger than the interval Δr′ ~ R/γ³ estimated above. Hence the synchrotron radiation results may be applied locally, to each electron beam's bend. (In this context, a simple problem for the reader: use Eqs. (19) and (63) to explain the difference between shapes of the in-plane electric field pulses emitted at opposite magnetic poles of the wiggler, which is schematically shown in Fig. 12a.)
where the first two relations are valid at λ_u << R (the relation typically satisfied very well; see the numbers in Fig. 10), and the last two relations assume the ultra-relativistic limit. As a result, the radiation intensity, which is proportional to the number of the poles, is much higher than that from the bend magnets (see Fig. 10 again).


The situation is different in undulators: similar structures with a smaller spatial period λ_u, in which the electron's velocity vector oscillates with an angular amplitude smaller than γ⁻¹. As a result, the radiation pulses overlap (Fig. 12b), and the radiation waveform is closer to the sinusoidal one. Consequently, the radiation spectrum narrows to the central frequency²⁵

$$\omega_0 = \frac{2\pi}{\Delta t} \approx 2\gamma^2\,\frac{2\pi c}{\lambda_u}. \tag{10.74}$$


For example, for the NSLS-II undulators with λ_u = 2 cm, this formula predicts a radiation peak at the photon energy ħω ~ 4 keV, in reasonable agreement with the quantitative calculation results shown in Fig. 10.²⁶ Due to the spectrum narrowing, the undulator's radiation intensity is higher than that of wigglers using the same electron beam.
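For the NSLS-II undulator example (λ_u = 2 cm, 3 GeV electrons), Eqs. (73) and (74) and the alternative Doppler-shift derivation of footnote 25 can be evaluated side by side. This is a sketch with rounded constants; it computes 1 − β as 1/[γ²(1 + β)], which avoids the round-off that the direct subtraction would suffer at γ ~ 6×10³.

```python
import math

c = 2.998e8                 # speed of light, m/s
hbar_eV = 6.582e-16         # reduced Planck constant, eV*s
gamma = 3.0e9 / 0.511e6     # 3 GeV electrons
lambda_u = 0.02             # undulator period, m (from the text)

beta = math.sqrt(1.0 - 1.0 / gamma**2)
one_minus_beta = 1.0 / (gamma**2 * (1.0 + beta))        # numerically safe form of 1 - beta

dt = lambda_u / (2 * gamma**2 * c)                      # pulse spacing for the observer, Eq. (73)
omega0 = 2 * gamma**2 * 2 * math.pi * c / lambda_u      # central frequency, Eq. (74)
hbar_omega0_keV = hbar_eV * omega0 / 1e3

# Footnote-25 route: contracted period in the electron frame, then exact Doppler upshift
omega_prime = 2 * math.pi * c * gamma / lambda_u
omega0_doppler = omega_prime * math.sqrt((1.0 + beta) / one_minus_beta)
ratio = omega0_doppler / omega0

print(f"dt ~ {dt:.2e} s, hbar*omega0 ~ {hbar_omega0_keV:.2f} keV, ratio = {ratio:.9f}")
```

The photon energy comes out near 4.3 keV, and the two routes to ω₀ agree up to a relative difference of order 1/γ², as they should, since [(1 + β)/(1 − β)]^{1/2} = γ(1 + β) ≈ 2γ.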


This spectrum-narrowing trend is brought to its logical conclusion in the so-called free-electron lasers,²⁷ whose basic structure is the same as that of wigglers and undulators (Fig. 11), but the radiation at each beam bend is so intense and narrow-focused that it affects the electron motion downstream of the radiation cone. As a result, the radiation spectrum narrows around the central frequency (74), and its power grows as the square of the number N of electrons in the structure (rather than proportionately to N in wigglers and undulators).


Finally, note that wigglers, undulators, and free-electron lasers may also be used at the end of a linear electron accelerator (such as SLAC) which, as was noted above, may provide extremely high values of γ and hence radiation frequencies, due to the smallness of radiation energy losses at the electron acceleration stage. Very unfortunately, I do not have time/space to discuss the (very interesting) physics of these devices in more detail.²⁸


10.4. Bremsstrahlung and Coulomb losses 


Surprisingly, a very similar mechanism of radiation by charged particles works on a much 
smaller spatial scale, namely at their scattering by charged particles of the propagation medium. This 


²⁵ This important formula may be also derived in the following way. Due to the relativistic length contraction (9.20), the undulator structure period as perceived by beam electrons is λ′ = λ_u/γ, so that the central frequency of the radiation in the reference frame moving with the electrons is ω′ = 2πc/λ′ = 2πcγ/λ_u. For the lab-frame observer, this frequency is Doppler-upshifted in accordance with Eq. (9.44): ω₀ = ω′[(1 + β)/(1 − β)]^{1/2} ≈ 2γω′, giving the same result as Eq. (74).

²⁶ Some of the difference is due to the fact that those plots show the spectral density of the number of photons (n = ℰ/ħω) per second, which peaks at a frequency below that of the density of power, i.e. of the energy ℰ per second.

²⁷ This name is somewhat misleading, because in contrast to the usual ("quantum") lasers, a free-electron laser is essentially a classical device, and the dynamics of electrons in it is very similar to that in vacuum-tube microwave generators, such as the magnetrons briefly discussed in Sec. 9.6.

²⁸ The interested reader may be referred, for example, to either P. Luchini and H. Motz, Undulators and Free-electron Lasers, Oxford U. Press, 1990; or E. Saldin et al., The Physics of Free Electron Lasers, Springer, 2000.
effect, traditionally called by its German name bremsstrahlung ("brake radiation"), is responsible, in particular, for the continuous part of the frequency spectrum of the radiation produced in standard vacuum X-ray tubes, at the electron collisions with a metallic "anticathode".²⁹


The bremsstrahlung in condensed matter is generally a rather complicated phenomenon because 
of the simultaneous involvement of many particles, and (frequently) some quantum electrodynamic 
effects. This is why I will give only a very brief glimpse at the theoretical description of this effect, for 
the simplest case when the scattering of incoming, relatively light charged particles (such as electrons, 
protons, α-particles, etc.) is produced by atomic nuclei, which remain virtually immobile during the
scattering event (Fig. 13a). This is a reasonable approximation if the energy of incoming particles is not 
too low; otherwise, most scattering is produced by atomic electrons whose dynamics is substantially 
quantum — see below. 


[Figure 10.13: panels (a) and (b); recoverable labels include p_ini and p_fin.]

Fig. 10.13. The basic geometry of the bremsstrahlung and the Coulomb loss problems in the (a) direct and (b) reciprocal spaces.


To calculate the frequency spectrum of the radiation emitted during a single scattering event, it is convenient to use a byproduct of the last section's analysis, namely Eq. (59) with the replacement (60):³⁰


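$$I(\omega) = \frac{1}{4\pi^2 c}\,\frac{q^2}{4\pi\varepsilon_0}\left|\int_{-\infty}^{+\infty}\frac{d}{dt_{\rm ret}}\left[\frac{\mathbf{n}\times(\mathbf{n}\times\boldsymbol{\beta})}{1-\boldsymbol{\beta}\cdot\mathbf{n}}\right]_{\rm ret}\exp\left\{i\omega\left(t_{\rm ret}-\frac{\mathbf{n}\cdot\mathbf{r}'(t_{\rm ret})}{c}\right)\right\}dt_{\rm ret}\right|^2. \tag{10.75}$$

A typical duration τ of a single scattering event we are discussing is of the order of τ ~ a₀/c ~ (10⁻¹⁰ m)/(3×10⁸ m/s) ~ 10⁻¹⁸ s in solids, and only an order of magnitude longer in gases at ambient conditions. This is why for most frequencies of interest, from zero all the way up to at least soft X-rays,³¹ we can use the so-called low-frequency approximation, replacing the exponential factor in Eq. (75) by 1 throughout the whole collision event, i.e. the integration interval. This approximation immediately yields

$$I(\omega) = \frac{1}{4\pi^2 c}\,\frac{q^2}{4\pi\varepsilon_0}\left|\frac{\mathbf{n}\times(\mathbf{n}\times\boldsymbol{\beta}_{\rm fin})}{1-\boldsymbol{\beta}_{\rm fin}\cdot\mathbf{n}}-\frac{\mathbf{n}\times(\mathbf{n}\times\boldsymbol{\beta}_{\rm ini})}{1-\boldsymbol{\beta}_{\rm ini}\cdot\mathbf{n}}\right|^2. \tag{10.76}$$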


²⁹ Such X-ray radiation had been first observed experimentally (though not correctly interpreted) by N. Tesla in 1887, i.e. before it was rediscovered and studied in detail by W. Röntgen.

³⁰ In publications on this topic (whose development peak was in the 1920s-1930s), the Gaussian units are more common, and the uppercase letter Z is usually reserved for expressing charges as multiples of the fundamental charge e, rather than for the wave impedance. This is why, in order to avoid confusion and facilitate the comparison with other texts, in this section I (while still staying with the SI units used throughout my series) will use the fraction 1/ε₀c, instead of its equivalent Z₀, for the free-space wave impedance, and write the coefficients in a form that makes the transfer to the Gaussian units elementary: it is sufficient to replace all (qq′/4πε₀)_SI with (qq′)_Gaussian. In the (rare) cases when I spell out the charge values, I will use a different font: q = 𝒵e, q′ = 𝒵′e.

³¹ A more careful analysis shows that this approximation is actually quite reasonable up to much higher frequencies, of the order of γ/τ.
In the non-relativistic limit (β_ini, β_fin << 1), this formula is reduced to the following result:


$$I(\omega) = \frac{1}{4\pi^2 c}\,\frac{q^2}{4\pi\varepsilon_0}\left(\frac{\mathfrak{q}}{mc}\right)^2\sin^2\theta \tag{10.77}$$


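(which may be derived from Eq. (8.27) as well), where 𝔮 is the momentum transferred from the scattering center to the scattered charge (Fig. 13b):³²

$$\mathfrak{q} \equiv \mathbf{p}_{\rm fin}-\mathbf{p}_{\rm ini} = m\Delta\mathbf{u} = mc\,\Delta\boldsymbol{\beta} = mc\,(\boldsymbol{\beta}_{\rm fin}-\boldsymbol{\beta}_{\rm ini}), \tag{10.78}$$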


and θ (not to be confused with the particle scattering angle θ′ shown in Fig. 13!) is the angle between the vector 𝔮 and the direction n toward the observer, at the collision moment.
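Equation (76) can be compared numerically with its non-relativistic limit (77), in which 𝔮/mc = Δβ and θ is measured from Δβ. The sketch below assumes NumPy; the fixed observation direction n ∝ {1, 1, 1} and the elastic scattering angle θ′ = 0.5 rad are hypothetical choices. The printed ratios of the two velocity-dependent factors should approach 1 as the speed decreases, with deviations of order β.

```python
import numpy as np

n = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)    # hypothetical fixed direction toward the observer

def term(b):
    """n x (n x beta) / (1 - beta.n), one of the two terms in Eq. (76)."""
    return np.cross(n, np.cross(n, b)) / (1.0 - b @ n)

def factor_76(b_ini, b_fin):
    """Squared-modulus factor of Eq. (76)."""
    d = term(b_fin) - term(b_ini)
    return d @ d

def factor_77(b_ini, b_fin):
    """(Delta beta)^2 sin^2(theta) factor of Eq. (77), theta measured from Delta beta."""
    dbeta = b_fin - b_ini
    cos_t = (dbeta @ n) / np.linalg.norm(dbeta)
    return (dbeta @ dbeta) * (1.0 - cos_t**2)

ratios = []
for speed in (1e-1, 1e-2, 1e-3):
    b_ini = speed * np.array([0.0, 0.0, 1.0])
    b_fin = speed * np.array([np.sin(0.5), 0.0, np.cos(0.5)])   # elastic scattering by 0.5 rad
    ratios.append(factor_76(b_ini, b_fin) / factor_77(b_ini, b_fin))
    print(f"beta = {speed:g}: ratio = {ratios[-1]:.5f}")
```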


The most important feature of the result (77)-(78) is the frequency-independent ("white") spectrum of the radiation, very typical for any rapid pulses that may be approximated as delta functions of time.³³ (Note, however, that Eq. (77) implies a fixed value of 𝔮, so that the statistics of this parameter, to be discussed in a minute, may "color" the radiation.)


Note also the "doughnut-shaped" angular distribution of the radiation, typical for non-relativistic systems, with the symmetry axis directed along the momentum transfer vector 𝔮. In particular, this means that in typical cases when |θ′| << 1, i.e. 𝔮 << p, when the vector 𝔮 is nearly normal to the vector p_ini (see, e.g., the example shown in Fig. 13b), the bremsstrahlung produces a significant radiation flow in the direction back to the particle source, a fact significant for the operation of X-ray tubes.


Now integrating Eq. (77) over all wave propagation angles, just as we did for the instant 
radiation power in Sec. 8.2, we get the following spectral density of the particle energy loss, 


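$$-\frac{d\mathcal{E}}{d\omega} = \oint I(\omega)\,d\Omega = \frac{2}{3\pi c}\,\frac{q^2}{4\pi\varepsilon_0}\left(\frac{\mathfrak{q}}{mc}\right)^2. \tag{10.79}$$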
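The only angular integral needed in Eq. (79) is ∮sin²θ dΩ = 8π/3, which turns the prefactor 1/4π²c of Eq. (77) into 2/3πc. A one-line check, assuming SciPy's `dblquad` (whose integrand takes the inner variable first):

```python
import numpy as np
from scipy.integrate import dblquad

# integral of sin^2(theta) over the full solid angle, with dOmega = sin(theta) dtheta dphi;
# inner variable theta in [0, pi], outer variable phi in [0, 2pi]
val, _ = dblquad(lambda theta, phi: np.sin(theta)**3,
                 0.0, 2 * np.pi, lambda phi: 0.0, lambda phi: np.pi)
print(val, 8 * np.pi / 3)                        # should coincide
print(val / (4 * np.pi**2), 2 / (3 * np.pi))     # prefactor of Eq. (79), in units of 1/c
```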
In most applications of the bremsstrahlung theory (as in most scattering problems³⁴), the impact parameter b (Fig. 13a), and hence the scattering angle θ′ and the transferred momentum 𝔮, have to be


³² Please note the font-marked difference between this variable (𝔮) and the particle's electric charge (q).

³³ This is the basis, in particular, of the so-called High-Harmonic Generation (HHG) effect, discovered in 1977, which takes place at the irradiation of gases by intense laser beams. The high electric field of the beam strips valence electrons from initially neutral atoms, and accelerates them away from the remaining ions, just to slam them back into the ions as the field's polarity changes in time. The electrons change their momentum sharply during their recombination with the ions, resulting in bremsstrahlung-like radiation. The spectrum of radiation from each such event obeys Eq. (77), but since the ionization/acceleration/recombination cycles repeat periodically with the frequency ω of the laser field, the final spectrum consists of many equidistant lines, with frequencies nω. The classical theory of the bremsstrahlung does not give a cutoff ω_max = n_max ω of the spectrum; this limit is imposed by quantum mechanics: ħω_max ~ ε_p, where the so-called ponderomotive energy ε_p = (eE₀/ω)²/4m_e is the average kinetic energy given to a free electron by the periodic electric field of the laser beam, with amplitude E₀. In practice, n_max may be as high as ~100, enabling alternative compact sources of X-ray radiation. For a detailed quantitative theory of this effect, see, for example, M. Lewenstein et al., Phys. Rev. A 49, 2117 (1994).

³⁴ See, e.g., CM Sec. 3.5 and QM Sec. 3.3.
considered random. For elastic (p_ini = p_fin = p) Coulomb collisions we can use the so-called Rutherford formula for the differential cross-section of scattering³⁵


$$\frac{d\sigma}{d\Omega'} = \left(\frac{qq'}{4\pi\varepsilon_0}\right)^2\left(\frac{1}{2pc\beta}\right)^2\frac{1}{\sin^4(\theta'/2)}. \tag{10.80}$$


Here dσ = 2πb db is the elementary area of the sample cross-section (as visible from the direction of the incident particles) corresponding to their scattering into an elementary solid angle³⁶

$$d\Omega' = 2\pi\sin\theta'\,|d\theta'|. \tag{10.81}$$

Differentiating the geometric relation, which is evident from Fig. 13b,

$$\mathfrak{q} = 2p\sin\frac{\theta'}{2}, \tag{10.82}$$

we may represent Eq. (80) in a more convenient form

$$d\sigma = 8\pi\left(\frac{qq'}{4\pi\varepsilon_0}\right)^2\frac{d\mathfrak{q}}{c^2\beta^2\mathfrak{q}^3}. \tag{10.83}$$
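The step from Eq. (80) to Eq. (83) is a chain rule, dσ/d𝔮 = (dσ/dΩ′)(dΩ′/dθ′)/(d𝔮/dθ′). The sketch below assumes NumPy and works in arbitrary units with qq′/4πε₀ = c = p = 1 and a hypothetical β = 0.3; it evaluates both forms on a grid of scattering angles. Note that p drops out of Eq. (83).

```python
import numpy as np

K = 1.0                       # qq'/(4 pi eps0), arbitrary units
p, beta, c = 1.0, 0.3, 1.0    # hypothetical momentum, speed, and c (arbitrary units)

theta = np.linspace(0.05, 3.0, 50)      # scattering angle theta' (kept below pi)
q = 2 * p * np.sin(theta / 2)           # momentum transfer, Eq. (82)

dsigma_dOmega = K**2 * (1 / (2 * p * c * beta))**2 / np.sin(theta / 2)**4   # Eq. (80)
dOmega_dtheta = 2 * np.pi * np.sin(theta)                                    # Eq. (81)
dq_dtheta = p * np.cos(theta / 2)                                            # derivative of Eq. (82)

dsigma_dq_chain = dsigma_dOmega * dOmega_dtheta / dq_dtheta
dsigma_dq_83 = 8 * np.pi * K**2 / (c**2 * beta**2 * q**3)                   # Eq. (83)

max_rel_err = np.max(np.abs(dsigma_dq_chain / dsigma_dq_83 - 1))
print(max_rel_err)
```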


Now combining Eqs. (79) and (83), we get 


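$$-\frac{d\mathcal{E}}{d\omega}\,\frac{d\sigma}{d\mathfrak{q}} = \frac{16}{3}\,\frac{q^2}{4\pi\varepsilon_0}\left(\frac{qq'}{4\pi\varepsilon_0 mc^2}\right)^2\frac{1}{c\beta^2\mathfrak{q}}. \tag{10.84}$$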


This product is called the differential radiation cross-section. When integrated over all values of 𝔮 (which is equivalent to averaging over all values of the impact parameter), it gives a convenient measure of the radiation intensity. Indeed, after the multiplication by the volume density n of independent scattering centers, such an integral yields the particle's energy loss per unit bandwidth of radiation per unit path length, −d²ℰ/dωdx. A minor problem here is that the integral of 1/𝔮 formally diverges at both infinite and zero values of 𝔮. However, these divergences are very weak (logarithmic), and the integral converges due to virtually any reason unaccounted for in our simple analysis. The standard (though slightly approximate) way to account for these effects is to write


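$$-\frac{d^2\mathcal{E}}{d\omega\,dx} = \frac{16}{3}\,\frac{q^2}{4\pi\varepsilon_0}\left(\frac{qq'}{4\pi\varepsilon_0 mc^2}\right)^2\frac{n}{c\beta^2}\ln\frac{\mathfrak{q}_{\max}}{\mathfrak{q}_{\min}}, \tag{10.85}$$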


and then plug, instead of 𝔮_max and 𝔮_min, the scales of the most important effects limiting the range of the transferred momentum's magnitude. In the classical-mechanics analysis, according to Eq. (82), 𝔮_max = 2p = 2mu. To estimate 𝔮_min, let us note that the very small momentum transfer takes place when the impact parameter b is very large, and hence the effective scattering time τ ~ b/u is very long. Recalling the condition of the low-frequency approximation, we may associate 𝔮_min with τ ~ 1/ω and hence with b ~


³⁵ See, e.g., CM Eq. (3.73) with α = qq′/4πε₀. In the form used in Eq. (80), the Rutherford formula is also valid for the small-angle scattering of relativistic particles, the criterion being |Δβ| << 2/γ.

³⁶ Again, the angle θ′ and the differential dΩ′, describing the scattered particles (see Fig. 13), should not be confused with the parameters θ and dΩ describing the radiation emitted at the scattering event.
uτ ~ u/ω. Since for the small scattering angles, 𝔮 is close to the impulse Fτ ~ (qq′/4πε₀b²)τ of the Coulomb force, we get the estimate 𝔮_min ~ (qq′/4πε₀)ω/u², and Eq. (85) should be used with


$$\ln\frac{\mathfrak{q}_{\max}}{\mathfrak{q}_{\min}} = \ln\left(\frac{2mu^3}{\omega}\,\frac{4\pi\varepsilon_0}{qq'}\right). \tag{10.86}$$
This is Bohr's formula for what is called the classical bremsstrahlung. We see that the low-momentum cutoff indeed makes the spectrum slightly colored, with more energy going to lower frequencies. There is even a formal divergence at ω → 0; however, this divergence is integrable, so it does not present a problem for finding the total radiative energy losses (−dℰ/dx) as an integral of Eq. (86) over all radiated frequencies ω. A larger problem for this procedure is the upper integration limit, ω → ∞, at which the integral diverges. This means that our approximate description, which considers the collision as an elastic process, becomes invalid and needs to be amended by taking into account the difference between the initial and final kinetic energies of the particle due to radiation of the energy quantum ħω of the emitted photon, so that
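$$\frac{p_{\rm ini}^2}{2m}-\frac{p_{\rm fin}^2}{2m} = \hbar\omega, \qquad \text{i.e.} \quad \frac{p_{\rm ini}^2}{2m} = \mathcal{E}, \quad \frac{p_{\rm fin}^2}{2m} = \mathcal{E}-\hbar\omega. \tag{10.87}$$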


As a result, taking into account that the minimum and maximum values of 𝔮 correspond to, respectively, the parallel and antiparallel alignments of the vectors p_ini and p_fin, we get


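$$\ln\frac{\mathfrak{q}_{\max}}{\mathfrak{q}_{\min}} = \ln\frac{p_{\rm ini}+p_{\rm fin}}{p_{\rm ini}-p_{\rm fin}} = \ln\frac{(p_{\rm ini}+p_{\rm fin})^2}{p_{\rm ini}^2-p_{\rm fin}^2} = \ln\frac{(p_{\rm ini}+p_{\rm fin})^2/2m}{\hbar\omega}. \tag{10.88}$$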

Plugged into Eq. (85), this expression yields the so-called Bethe-Heitler formula for quantum bremsstrahlung.³⁷ Note that in this approach, 𝔮_max is close to that of the classical approximation, but 𝔮_min is of the order of ħω/u, so that

$$\frac{\mathfrak{q}_{\min}|_{\rm classical}}{\mathfrak{q}_{\min}|_{\rm quantum}} \sim \frac{\mathcal{Z}\mathcal{Z}'\alpha}{\beta}, \tag{10.89}$$


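where 𝒵 and 𝒵′ are the particles' charges in the units of e, and α is the dimensionless fine structure ("Sommerfeld") constant,

$$\alpha \equiv \left.\frac{e^2}{4\pi\varepsilon_0\hbar c}\right|_{\rm SI} = \left.\frac{e^2}{\hbar c}\right|_{\rm Gaussian} \approx \frac{1}{137} \ll 1, \tag{10.90}$$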
which is one of the basic notions of quantum mechanics.³⁸ Due to the smallness of the constant, the ratio (89) is below 1 for most cases of practical interest, and since the integral of (84) over 𝔮 is limited by the largest of all possible cutoffs 𝔮_min, it is the Bethe-Heitler formula that should be used.


³⁷ The modifications of this formula necessary for the relativistic description are surprisingly minor; see, e.g., Chapter 15 in J. Jackson, Classical Electrodynamics, 3rd ed., Wiley, 1999. For even more detail, the standard reference monograph on bremsstrahlung is W. Heitler, The Quantum Theory of Radiation, 3rd ed., Oxford U. Press, 1954 (reprinted in 1984 and 2010 by Dover).

38 See, e.g., QM Secs. 4.4, 6.3, 6.4, 9.3, 9.5, and 9.7. 
Now nothing prevents us from calculating the total radiative losses of energy per unit length:


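$$-\frac{d\mathcal{E}}{dx} = \int_0^{\omega_{\max}}\left(-\frac{d^2\mathcal{E}}{d\omega\,dx}\right)d\omega = \frac{16}{3}\,\frac{q^2}{4\pi\varepsilon_0 c}\left(\frac{qq'}{4\pi\varepsilon_0 mc^2}\right)^2\frac{n}{\beta^2}\int_0^{\omega_{\max}}\ln\frac{\left[\mathcal{E}^{1/2}+(\mathcal{E}-\hbar\omega)^{1/2}\right]^2}{\hbar\omega}\,d\omega, \tag{10.91}$$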


where ħω_max = ℰ is the maximum energy of the radiation quantum. By introducing the dimensionless integration variable ξ ≡ ħω/ℰ = ħω/(mu²/2), this integral is reduced to a table one,³⁹ and we get


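$$-\frac{d\mathcal{E}}{dx} = \frac{16}{3}\,\frac{q^2}{4\pi\varepsilon_0 c}\left(\frac{qq'}{4\pi\varepsilon_0 mc^2}\right)^2\frac{nmu^2}{\beta^2\hbar} = \frac{16}{3}\,\frac{q^2}{4\pi\varepsilon_0\hbar c}\left(\frac{qq'}{4\pi\varepsilon_0}\right)^2\frac{n}{mc^2}. \tag{10.92}$$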
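With ξ ≡ ħω/ℰ, the frequency integral in Eq. (91) becomes (ℰ/ħ)∫₀¹ln{[1 + (1 − ξ)^{1/2}]²/ξ}dξ. A short numerical check (SciPy assumed) that this dimensionless table integral equals 2; combined with 2ℰ = mu², it gives the factor nmu²/β²ħ in Eq. (92).

```python
import numpy as np
from scipy.integrate import quad

def integrand(xi):
    """Logarithm from Eq. (91), written through xi = hbar*omega / E (integrable log singularity at 0)."""
    return np.log((1.0 + np.sqrt(1.0 - xi))**2 / xi)

value, abs_err = quad(integrand, 0.0, 1.0)
print(value, abs_err)
```

Analytically, the substitution s = (1 − ξ)^{1/2} turns the integral into ∫₀¹2s ln[(1 + s)/(1 − s)]ds, which is elementary and equals 2.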
Following my usual style, at this point I would give you an estimate of the losses for a typical case; however, let me first discuss a parallel particle energy loss mechanism, the so-called Coulomb losses, due to the transfer of mechanical impulse from the scattered particle to the scattering centers. (This energy eventually goes into an increase of the thermal energy of the scattering medium, rather than into the electromagnetic radiation.)


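Using Eqs. (9.139) for the electric field of a linearly moving charge q, we can readily find the momentum it transfers to the counterpart charge q′:⁴⁰

$$\Delta p' = \int_{-\infty}^{+\infty}q'E_\perp(t)\,dt = \frac{qq'}{4\pi\varepsilon_0}\int_{-\infty}^{+\infty}\frac{\gamma b\,dt}{(b^2+\gamma^2u^2t^2)^{3/2}} = \frac{qq'}{4\pi\varepsilon_0}\,\frac{2}{bu}. \tag{10.93}$$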


Hence, the kinetic energy acquired by the scattering particle (and hence the loss of the energy ℰ of the incident particle) is

$$\Delta\mathcal{E}' = \frac{(\Delta p')^2}{2m'} = \left(\frac{qq'}{4\pi\varepsilon_0}\right)^2\frac{2}{m'u^2b^2}. \tag{10.94}$$


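Such elementary energy losses have to be summed up over all collisions, with random values of the impact parameter b. At the scattering center density n, the number of collisions per small path length dx per small range db is dN = n·2πb db dx, so that

$$-\frac{d\mathcal{E}}{dx} = \int\Delta\mathcal{E}'\,n\,2\pi b\,db = \left(\frac{qq'}{4\pi\varepsilon_0}\right)^2\frac{4\pi n}{m'u^2}\int_{b_{\min}}^{b_{\max}}\frac{db}{b} \approx \left(\frac{qq'}{4\pi\varepsilon_0}\right)^2\frac{4\pi n}{m'u^2}\ln\frac{b_{\max}}{b_{\min}}. \tag{10.95}$$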


Here, at the last step, the logarithmic integral over b was treated similarly to that over 𝔮 in the bremsstrahlung theory. This approximation is adequate because the ratio b_max/b_min is much larger than 1. Indeed, b_min may be estimated from (Δp′)_max ~ p = γmu. For this value, Eq. (93) with q′ ~ q gives b_min ~ r_c (see Eq. (8.41) and its discussion), which, for elementary particles, is of the order of 10⁻¹⁵ m. On the other hand, for the most important case when the Coulomb energy absorbers are electrons (which, according to Eq. (94), are the most efficient ones, due to their very low mass m′), b_max may be estimated from the condition τ ~ b/γu ~ 1/ω_min, where ω_min ~ 10¹⁶ s⁻¹ is the characteristic frequency of electron


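³⁹ See, e.g., MA Eq. (6.14).

⁴⁰ According to Eq. (9.139), the field component normal to the plane containing the particle's trajectory and the scattering center vanishes, while the net impulse of the longitudinal force is zero, so that Eq. (93) gives the full momentum transfer.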
transitions in atoms. (Quantum mechanics forbids such energy transfer at lower frequencies.) From here, we have the estimate b_max ~ γu/ω_min, so that


$$B \equiv \frac{b_{\max}}{b_{\min}} \sim \frac{\gamma u}{r_c\,\omega_{\min}}, \qquad (10.96)$$


for γ ~ 1 and u ~ c ≈ 3×10⁸ m/s giving b_max ~ 3×10⁻⁸ m, so that B ~ 10⁷ (give or take a couple of orders
of magnitude — this does not change the estimate ln B ~ 20 too much).⁴¹
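The numbers in this estimate can be reproduced in a few lines. The sketch below uses the round values assumed in the text (b_min ~ r_c, ω_min ~ 10¹⁶ s⁻¹, γ ~ 1, u ~ c). It gives B of order 10⁷ and ln B ≈ 16, and it shows that changing B by a factor of 100 either way shifts ln B by only about 4.6, so a single round value of order 20 is adequate at logarithmic accuracy.

```python
import math

r_c = 2.818e-15      # classical electron radius, m: b_min ~ r_c
c = 2.998e8          # speed of light, m/s
omega_min = 1e16     # characteristic atomic transition frequency, s^-1 (order of magnitude)
gamma, u = 1.0, c    # the order-of-magnitude choice gamma ~ 1, u ~ c

b_min = r_c
b_max = gamma * u / omega_min
B = b_max / b_min
print(f"b_max ~ {b_max:.1e} m, B ~ {B:.1e}, ln B ~ {math.log(B):.1f}")

# "Give or take a couple of orders of magnitude" shifts ln B by only about +/- 4.6
for factor in (1e-2, 1e2):
    print(f"B x {factor:g}: ln B ~ {math.log(B * factor):.1f}")
```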


Now we can compare the non-radiative Coulomb losses (95) with the radiative losses due to the 
bremsstrahlung, given by Eq. (92): 
$$\frac{(-d\mathcal E/dx)_{\rm radiation}}{(-d\mathcal E/dx)_{\rm Coulomb}} \sim \alpha\,\beta^2\gamma\,\frac{m'}{m}\,\frac{1}{\ln B}. \qquad (10.97)$$


Since α ~ 10⁻² << 1, for non-relativistic particles (β << 1) the bremsstrahlung losses of energy are much
lower (that is why I did not want to rush with their estimates), and only for ultra-relativistic particles may the
relation be the opposite.


According to Eqs. (95)-(96), for electron-electron scattering (q = q’ = −e, m = m’ = m_e),⁴² at the
value n = 6×10²⁶ m⁻³ typical for air at ambient conditions, the characteristic length of energy loss,


$$l_e \equiv \frac{\mathcal E}{(-d\mathcal E/dx)}, \qquad (10.98)$$
for electrons with kinetic energy ℰ = 6 keV is close to 2×10⁻⁴ m = 0.2 mm. (This is why we need high
vacuum in electron microscope columns and other vacuum electron devices.) Since l_e ∝ ℰ², more
energetic particles penetrate deeper into matter, until the bremsstrahlung steps in and limits this trend at
very high energies.
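The value of l_e follows directly from Eqs. (95) and (98). The sketch below assumes the inputs used in the text (n = 6×10²⁶ m⁻³, ln B ≈ 20) and treats the 6 keV electron non-relativistically, since its Lorentz factor is only about 1.01. The factor of about 2 mentioned in footnote 42 is deliberately left out.

```python
import math

eps0 = 8.854e-12     # vacuum permittivity, F/m
e = 1.602e-19        # elementary charge, C
m_e = 9.109e-31      # electron mass, kg
n = 6e26             # density of scattering electrons used in the text for air, m^-3
ln_B = 20.0          # log-accuracy value of ln B

E_kin = 6e3 * e          # kinetic energy of 6 keV, J
u2 = 2 * E_kin / m_e     # non-relativistic m u^2 = 2E

# Eq. (10.95) with q = q' = -e and m' = m_e (footnote 42's factor ~2 not included)
dE_dx = (e**2 / (4 * math.pi * eps0)) ** 2 * 4 * math.pi * n / (m_e * u2) * ln_B
l_e = E_kin / dE_dx      # Eq. (10.98)

print(f"u ~ {math.sqrt(u2):.1e} m/s")
print(f"-dE/dx ~ {dE_dx:.1e} J/m, l_e ~ {l_e:.1e} m")
```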


10.5. Density effects and the Cherenkov radiation 


For condensed matter, the Coulomb loss estimate made in the last section is not quite suitable,
because it is based on the upper cutoff b_max ~ γu/ω_min. For the example given above, the incoming
electron velocity u is close to 5×10⁷ m/s, and for the typical value ω_min ~ 10¹⁶ s⁻¹ (ħω_min ~ 10 eV), this
cutoff b_max is of the order of 5×10⁻⁹ m = 5 nm. Even for air at ambient conditions, this is somewhat
larger than the average distance (~2 nm) between the molecules, so that at the high end of the impact
parameter range, at b ~ b_max, the Coulomb loss events in adjacent molecules are not quite independent,
and the theory needs some corrections. For condensed matter, with much higher particle density n, most
collisions satisfy the following condition:


41 A quantum analysis (carried out by Hans Bethe in 1930) replaces, in Eq. (95), ln B with ln(2γ²m’u²/ħ⟨ω⟩) − β²,
where ⟨ω⟩ is the average frequency of the atomic quantum transitions, weighted by their oscillator strengths. This
refinement does not change the estimate given below. Note that both the classical and quantum formulas describe
a fast increase (as 1/u²) of the energy loss rate (−dℰ/dx) at γ → 1, and its slow increase (as ln γ) at γ → ∞, so that
the losses have a minimum at (γ − 1) ~ 1.

42 Actually, the above analysis has neglected the change of momentum of the incident particle. This is legitimate 
at m’ << m, but for m = m’ the change approximately doubles the energy losses. Still, this does not change the 
order of magnitude of the estimate. 
$$n\,b_{\max}^3 \gg 1, \qquad (10.99)$$


and the treatment of Coulomb collisions as a set of independent events is inadequate. However, this 
condition enables the opposite approach: treating the medium as a continuum. In the time-domain 
formulation used in the previous sections of this chapter, this would be a very complex problem, 
because it would require an explicit description of the medium dynamics. Here the frequency-domain 
approach, based on the Fourier transform in both time and space, helps a lot, provided that the functions
ε(ω) and μ(ω) are considered known — either calculated or taken from experiment. Let us have a good
look at this approach because it gives some interesting (and practically important) results. 
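Condition (99) is easy to quantify with b_max ≈ 5 nm. In the sketch below, both densities are assumed round values (ambient air molecules and a typical condensed medium), not numbers from the text. The product n·b_max³ comes out of order unity for air but in the thousands for condensed matter, which is where the picture of independent collisions breaks down.

```python
u = 5e7                  # electron speed in the 6 keV example, m/s
omega_min = 1e16         # s^-1
b_max = u / omega_min    # gamma ~ 1, so b_max ~ 5e-9 m

densities = {            # assumed round values, m^-3 (not from the text)
    "air, molecules": 2.5e25,
    "condensed matter, atoms": 5e28,
}
for medium, n in densities.items():
    print(f"{medium}: n * b_max^3 ~ {n * b_max**3:.1e}")
```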


In Chapter 6, we have used the macroscopic Maxwell equations to derive Eqs. (6.118), which 
describe the time evolution of electrodynamic potentials in a linear medium with frequency-independent 
ε and μ. Looking for all functions participating in Eqs. (6.118) in the plane-wave expansion form⁴³


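$$f(\mathbf r,t) = \int d^3k\int d\omega\,f_{\mathbf k,\omega}\,e^{i(\mathbf k\cdot\mathbf r-\omega t)}, \qquad (10.100)$$
and requiring all coefficients at similar exponents to be balanced, we get their Fourier images:⁴⁴
$$\left(k^2-\omega^2\varepsilon\mu\right)\phi_{\mathbf k,\omega} = \frac{\rho_{\mathbf k,\omega}}{\varepsilon}, \qquad \left(k^2-\omega^2\varepsilon\mu\right)\mathbf A_{\mathbf k,\omega} = \mu\,\mathbf j_{\mathbf k,\omega}. \qquad (10.101)$$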


As was discussed in Chapter 7, in such a Fourier form, the macroscopic Maxwell theory remains valid 
even for dispersive (but isotropic and linear!) media, so that Eqs. (101) may be generalized as 


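$$\left[k^2-\omega^2\varepsilon(\omega)\mu(\omega)\right]\phi_{\mathbf k,\omega} = \frac{\rho_{\mathbf k,\omega}}{\varepsilon(\omega)}, \qquad \left[k^2-\omega^2\varepsilon(\omega)\mu(\omega)\right]\mathbf A_{\mathbf k,\omega} = \mu(\omega)\,\mathbf j_{\mathbf k,\omega}. \qquad (10.102)$$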


An evident advantage of these equations is that their formal solution is elementary: 


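$$\phi_{\mathbf k,\omega} = \frac{\rho_{\mathbf k,\omega}}{\varepsilon(\omega)\left[k^2-\omega^2\varepsilon(\omega)\mu(\omega)\right]}, \qquad \mathbf A_{\mathbf k,\omega} = \frac{\mu(\omega)\,\mathbf j_{\mathbf k,\omega}}{k^2-\omega^2\varepsilon(\omega)\mu(\omega)}, \qquad (10.103)$$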


so that the “only” remaining things to do are, first, to calculate the Fourier transforms of the functions
ρ(r, t) and j(r, t), describing stand-alone charges and currents, using the transform reciprocal to Eq.
(100), with one factor 1/2π per each scalar dimension,

$$f_{\mathbf k,\omega} = \frac{1}{(2\pi)^4}\int d^3r\int dt\,f(\mathbf r,t)\,e^{-i(\mathbf k\cdot\mathbf r-\omega t)}, \qquad (10.104)$$

and then to carry out the integration (100) of Eqs. (103).


For our problem of a single charge g uniformly moving through a medium with velocity u, 


$$\rho(\mathbf r,t) = q\,\delta(\mathbf r-\mathbf ut), \qquad \mathbf j(\mathbf r,t) = q\mathbf u\,\delta(\mathbf r-\mathbf ut), \qquad (10.105)$$


43 All integrals here and below are in infinite limits unless specified otherwise. 

44 As was discussed in Sec. 7.2, the Ohmic conductivity of the medium (generally, also a function of frequency) 
may be readily incorporated into the dielectric permittivity: ε(ω) → ε(ω) + iσ(ω)/ω. In this section, I will assume
that such incorporation, which is especially natural for high frequencies, has been performed, so that the current 
density j(r, t) describes only stand-alone currents — for example, the current (105) of the incident particle. 
the first task is easy:


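$$\rho_{\mathbf k,\omega} = \frac{q}{(2\pi)^4}\int d^3r\int dt\,\delta(\mathbf r-\mathbf ut)\,e^{-i(\mathbf k\cdot\mathbf r-\omega t)} = \frac{q}{(2\pi)^4}\int dt\,e^{i(\omega-\mathbf k\cdot\mathbf u)t} = \frac{q}{(2\pi)^3}\,\delta(\omega-\mathbf k\cdot\mathbf u). \qquad (10.106)$$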


Since the expressions (105) for ρ(r, t) and j(r, t) differ only by a constant factor u, it is clear that an
absolutely similar calculation for the current gives


$$\mathbf j_{\mathbf k,\omega} = \frac{q\mathbf u}{(2\pi)^3}\,\delta(\omega-\mathbf k\cdot\mathbf u). \qquad (10.107)$$


Let us summarize what we have got by now, by plugging Eqs. (106)-(107) into Eqs. (103): 


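$$\phi_{\mathbf k,\omega} = \frac{q}{(2\pi)^3}\,\frac{\delta(\omega-\mathbf k\cdot\mathbf u)}{\varepsilon(\omega)\left[k^2-\omega^2\varepsilon(\omega)\mu(\omega)\right]}, \qquad \mathbf A_{\mathbf k,\omega} = \frac{q}{(2\pi)^3}\,\frac{\mu(\omega)\,\mathbf u\,\delta(\omega-\mathbf k\cdot\mathbf u)}{k^2-\omega^2\varepsilon(\omega)\mu(\omega)} = \varepsilon(\omega)\mu(\omega)\,\mathbf u\,\phi_{\mathbf k,\omega}. \qquad (10.108)$$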


Now, at the last calculation step, namely the integration (100), we start to pay a heavy
price for the ease of the first steps. So let us think carefully about what exactly we need from it.
First of all, for the calculation of power losses, the electric field is more convenient to use than the 
potentials, so let us calculate the Fourier images of E and B. Plugging the expansion (100) into the basic 
relations (6.7), and again requiring the balance of exponent’s coefficients, we get 


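$$\mathbf E_{\mathbf k,\omega} = -i\mathbf k\,\phi_{\mathbf k,\omega} + i\omega\mathbf A_{\mathbf k,\omega} = i\left[\omega\varepsilon(\omega)\mu(\omega)\mathbf u-\mathbf k\right]\phi_{\mathbf k,\omega}, \qquad \mathbf B_{\mathbf k,\omega} = i\mathbf k\times\mathbf A_{\mathbf k,\omega} = i\varepsilon(\omega)\mu(\omega)\,\mathbf k\times\mathbf u\,\phi_{\mathbf k,\omega}, \qquad (10.109)$$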


so that Eqs. (100) and (108) yield 
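$$\mathbf E(\mathbf r,t) = \int d^3k\int d\omega\,\mathbf E_{\mathbf k,\omega}\,e^{i(\mathbf k\cdot\mathbf r-\omega t)} = \frac{iq}{(2\pi)^3}\int d^3k\int d\omega\,\frac{\left[\omega\varepsilon(\omega)\mu(\omega)\mathbf u-\mathbf k\right]\delta(\omega-\mathbf k\cdot\mathbf u)}{\varepsilon(\omega)\left[k^2-\omega^2\varepsilon(\omega)\mu(\omega)\right]}\,e^{i(\mathbf k\cdot\mathbf r-\omega t)}. \qquad (10.110)$$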


This formula may be rewritten as the temporal Fourier integral (51), with the following r-dependent 
complex amplitude: 


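$$\mathbf E_\omega(\mathbf r) \equiv \int\mathbf E_{\mathbf k,\omega}\,e^{i\mathbf k\cdot\mathbf r}\,d^3k = \frac{iq}{(2\pi)^3\varepsilon(\omega)}\int d^3k\,\frac{\left[\omega\varepsilon(\omega)\mu(\omega)\mathbf u-\mathbf k\right]\delta(\omega-\mathbf k\cdot\mathbf u)}{k^2-\omega^2\varepsilon(\omega)\mu(\omega)}\,e^{i\mathbf k\cdot\mathbf r}. \qquad (10.111)$$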


Let us calculate the Cartesian components of this partial Fourier image E_ω at a point separated
by distance b from the particle’s trajectory. Selecting the coordinates and time origin as shown in Fig. 3,
we have r = {0, b, 0} and u = {u, 0, 0}, so that only E_x and E_y are different from zero. In particular,
according to Eq. (111),

$$(E_x)_\omega = \frac{iq}{(2\pi)^3\varepsilon(\omega)}\int dk_x\int dk_y\int dk_z\,\frac{\left[\omega\varepsilon(\omega)\mu(\omega)u-k_x\right]\delta(\omega-k_xu)}{k_x^2+k_y^2+k_z^2-\omega^2\varepsilon(\omega)\mu(\omega)}\,\exp\{ik_yb\}. \qquad (10.112)$$


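The delta function kills one integral (over k_x) of the three, and we get

$$(E_x)_\omega = \frac{iq}{(2\pi)^3\varepsilon(\omega)u}\left[\omega\varepsilon(\omega)\mu(\omega)u-\frac{\omega}{u}\right]\int\exp\{ik_yb\}\,dk_y\int\frac{dk_z}{\omega^2/u^2+k_y^2+k_z^2-\omega^2\varepsilon(\omega)\mu(\omega)}. \qquad (10.113)$$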
The internal integral (over k_z) may be readily reduced to the table integral ∫dξ/(1 + ξ²) in infinite limits,
equal to π,⁴⁵ and the result represented as

$$(E_x)_\omega = -\frac{iq}{2(2\pi)^2}\,\frac{\kappa^2}{\omega\varepsilon(\omega)}\int\frac{\exp\{ik_yb\}}{\left(k_y^2+\kappa^2\right)^{1/2}}\,dk_y, \qquad (10.114)$$

where the parameter κ (generally, a complex function of frequency) is defined as⁴⁶

$$\kappa^2(\omega) \equiv \omega^2\left[\frac{1}{u^2}-\varepsilon(\omega)\mu(\omega)\right]. \qquad (10.115)$$
The last integral may be expressed via the modified Bessel function of the second kind:47 
$$(E_x)_\omega = -\frac{iq\kappa^2}{(2\pi)^2\,\omega\varepsilon(\omega)}\,K_0(\kappa b). \qquad (10.116)$$

A very similar calculation yields

$$(E_y)_\omega = \frac{q\kappa}{(2\pi)^2\,\varepsilon(\omega)u}\,K_1(\kappa b). \qquad (10.117)$$
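Both integrals used between Eqs. (113) and (116) can be checked numerically for real, positive κ; the complex case is not tested here. The sketch below uses SciPy's `quad`, including its Fourier-weight option (`weight="cos"`, `wvar`) for the oscillatory k_y integral over a semi-infinite range, and compares the results with π/a and 2K_0(κb). The parameter values are arbitrary.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import k0


def kz_integral(a):
    """int_{-inf}^{+inf} dk_z / (k_z^2 + a^2); expected value pi/a."""
    val, _ = quad(lambda kz: 1.0 / (kz**2 + a**2), -np.inf, np.inf)
    return val


def ky_integral(kappa, b):
    """int_{-inf}^{+inf} exp(i k_y b) dk_y / sqrt(k_y^2 + kappa^2) for real kappa > 0.

    The sine part is odd in k_y and drops out, leaving
    2 * int_0^inf cos(k b) / sqrt(k^2 + kappa^2) dk, which quad evaluates with its
    Fourier-integral routine (weight="cos", wvar=b, infinite upper limit).
    """
    val, _ = quad(lambda k: 1.0 / np.sqrt(k**2 + kappa**2), 0.0, np.inf,
                  weight="cos", wvar=b)
    return 2.0 * val


a = 1.7
print(f"k_z integral: {kz_integral(a):.8f} vs pi/a = {np.pi / a:.8f}")
for kappa, b in ((1.0, 0.5), (2.0, 1.0), (0.3, 3.0)):
    print(f"kappa*b = {kappa * b:.2f}: k_y integral = {ky_integral(kappa, b):.6f}, "
          f"2*K0(kappa*b) = {2 * k0(kappa * b):.6f}")
```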


Now, instead of rushing to make the final integration (51) over ω to calculate E(t), let us realize
that what we need most is the total energy loss through the whole time of the particle’s passage over an 
elementary distance dx. According to Eq. (4.38), the energy loss per unit volume is 


$$-\frac{d\mathcal E}{dV} = \int\mathbf j\cdot\mathbf E\,dt, \qquad (10.118)$$


where j is the current of the bound charges in the medium, and should not be confused with the stand-alone
incident-particle current (105). This integral may be readily expressed via the partial Fourier
image E_ω and the similarly defined image j_ω, just as it was done at the derivation of Eq. (54):


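$$-\frac{d\mathcal E}{dV} = \int dt\int d\omega\,\mathbf j_\omega e^{-i\omega t}\cdot\int d\omega'\,\mathbf E_{\omega'}e^{-i\omega't} = 2\pi\int d\omega\int d\omega'\,\mathbf j_\omega\cdot\mathbf E_{\omega'}\,\delta(\omega+\omega') = 2\pi\int\mathbf j_\omega\cdot\mathbf E_{-\omega}\,d\omega. \qquad (10.119)$$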


Let us incorporate the effective Ohmic conductivity σ(ω) into the complex permittivity ε(ω) just as
this was discussed in Sec. 7.2, using Eq. (7.46) to write 


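$$\mathbf j_\omega = \sigma_{\rm ef}(\omega)\mathbf E_\omega = -i\omega\varepsilon(\omega)\mathbf E_\omega. \qquad (10.120)$$
As a result, Eq. (119) yields
$$-\frac{d\mathcal E}{dV} = -2\pi i\int\varepsilon(\omega)\,\mathbf E_\omega\cdot\mathbf E_{-\omega}\,\omega\,d\omega = 4\pi\,\mathrm{Im}\int_0^\infty\varepsilon(\omega)\left|\mathbf E_\omega\right|^2\omega\,d\omega. \qquad (10.121)$$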


(The last step was possible due to the property ε(−ω) = ε*(ω), which was discussed in Sec. 7.2.)
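The last step of Eq. (121) uses only ε(−ω) = ε*(ω) and E_{−ω} = E*_ω. The sketch below checks it for an assumed single-oscillator (Lorentz-type) permittivity and an assumed even spectral density |E_ω|², both illustrative choices: the full-axis form has a vanishing imaginary part, and its real part equals the positive-frequency form.

```python
import numpy as np


def eps_lorentz(w, w0=1.0, wp=0.8, delta=0.1):
    """Illustrative single-oscillator permittivity in units of eps0.

    For real w it satisfies eps(-w) = conj(eps(w)), the property used in Eq. (10.121).
    """
    return 1.0 + wp**2 / (w0**2 - w**2 - 2j * w * delta)


def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid NumPy/SciPy version differences."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2


def spectral_density(w):
    """Assumed even |E_w|^2 (any even function would do, since E_{-w} = conj(E_w))."""
    return np.exp(-w**2 / 4)


w = np.linspace(-20.0, 20.0, 400001)
full = -2j * np.pi * trapezoid(eps_lorentz(w) * spectral_density(w) * w, w)

w_pos = np.linspace(0.0, 20.0, 200001)
half = 4 * np.pi * trapezoid(eps_lorentz(w_pos).imag * spectral_density(w_pos) * w_pos, w_pos)

print("full-axis form:", full)
print("positive-frequency form:", half)
```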


45 See, e.g., MA Eq. (6.5a). 
46 The frequency-dependent parameter κ(ω) should not be confused with the dc low-frequency dielectric constant
κ ≡ ε(0)/ε₀ that was discussed in Chapter 3.


47 As a reminder, the main properties of these functions are listed in Sec. 2.7 — see, in particular, Fig. 2.22 and 
Eqs. (2.157)-(2.158). 
Finally, just as in the last section, we have to average the energy loss rate over random values of
the impact parameter b: 


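$$-\frac{d\mathcal E}{dx} = \int_0^\infty\left(-\frac{d\mathcal E}{dV}\right)2\pi b\,db = 8\pi^2\,\mathrm{Im}\int_0^\infty b\,db\int_0^\infty\varepsilon(\omega)\left[\left|(E_x)_\omega\right|^2+\left|(E_y)_\omega\right|^2\right]\omega\,d\omega. \qquad (10.122)$$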


Due to the (weak) divergence of the functions K_0(ξ) and K_1(ξ) at ξ → 0, we have to cut the resulting
integral over b at some b_min where our theory loses legitimacy. (On that limit, we are not doing much
better than in the past section.) Plugging in the calculated expressions (116) and (117) for the field
components, swapping the integrals over ω and b, and using the recurrence relations (2.142), which are
valid for all Bessel functions, we finally get:


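$$-\frac{d\mathcal E}{dx} = -\frac{q^2}{2\pi^2}\,\mathrm{Im}\int_0^\infty\frac{|\kappa|^2\kappa}{\varepsilon(\omega)}\,b_{\min}K_1(\kappa^*b_{\min})\,K_0(\kappa b_{\min})\,\frac{d\omega}{\omega}. \qquad (10.123)$$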
This general result is valid for a linear medium with arbitrary dispersion relations ε(ω) and μ(ω).
(The last function participates in Eq. (123) only via Eq. (115) that defines the parameter κ.) To get more
concrete results, some particular model of the medium should be used. Let us explore the Lorentz- 
oscillator model that was discussed in Sec. 7.2, in its form (7.33) suitable for the transition to the 
quantum-mechanical description of atoms: 


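$$\varepsilon(\omega) = \varepsilon_0 + \frac{nq'^2}{m'}\sum_j\frac{f_j}{(\omega_j^2-\omega^2)-2i\omega\delta_j}, \quad\text{with}\quad \sum_j f_j = 1, \qquad \mu(\omega) = \mu_0. \qquad (10.124)$$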


If the damping of the effective atomic oscillators is low, δ_j << ω_j, as it typically is, and the particle’s
speed u is much lower than the typical wave’s phase velocity v (and hence than c!), then for most
frequencies Eq. (115) gives 


$$\kappa^2(\omega) = \omega^2\left[\frac{1}{u^2}-\frac{1}{v^2(\omega)}\right] \approx \frac{\omega^2}{u^2}, \qquad (10.125)$$
i.e. κ ≈ κ* ≈ ω/u is virtually real. In this case, Eq. (123) may be reduced to Eq. (95) with
$$\ln B \approx \sum_j f_j\ln\frac{u}{\omega_j b_{\min}}. \qquad (10.126)$$
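In the standard route for this reduction (assumed here, since the text does not spell it out), one sets κ ≈ ω/u, uses the small-argument behavior of K_0 and K_1, and then needs the oscillator-strength sum rule ∫₀^∞ ω Im[−ε₀/ε(ω)] dω = (π/2)ω_p², with ω_p² = nq’²/ε₀m’ for a single oscillator. The sketch below checks that sum rule numerically for illustrative oscillator parameters; it is a consistency check, not a derivation of Eq. (126).

```python
import numpy as np
from scipy.integrate import quad

W0, WP, DELTA = 1.0, 0.5, 0.05   # illustrative oscillator parameters, frequencies in units of w0


def loss_function(w):
    """Im[-eps0/eps(w)] for a single Lorentz oscillator (eps in units of eps0)."""
    eps = 1.0 + WP**2 / (W0**2 - w**2 - 2j * w * DELTA)
    return (-1.0 / eps).imag


omega_peak = np.sqrt(W0**2 + WP**2)   # Im(-1/eps) peaks near sqrt(w0^2 + wp^2), not at w0
integrand = lambda w: w * loss_function(w)
part1, _ = quad(integrand, 0.0, 3 * omega_peak, points=[omega_peak], limit=200)
part2, _ = quad(integrand, 3 * omega_peak, np.inf, limit=200)
ratio = (part1 + part2) / (np.pi / 2 * WP**2)
print(f"integral / (pi/2 * wp^2) = {ratio:.6f}")
```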


The good news here is that both approaches (the microscopic analysis of Sec. 4 and the 
macroscopic analysis of this section) give essentially the same result. The same fact may be also 
perceived as bad news: the treatment of the medium as a continuum does not give any new results here. 
The situation somewhat changes at relativistic velocities, at which such treatment provides noticeable 
corrections (called density effects), in particular reducing the energy loss estimates. 


Let me, however, leave these details for special topic courses and focus on a much more 
important effect described by our formulas. Consider the dependence of the electric field components on 
the impact parameter b, i.e. on the closest distance between the particle’s trajectory and the field 
observation point. At |κ|b >> 1, we can use, in Eqs. (116)-(117), the asymptotic formula (2.158),


$$K_n(\xi) \to \left(\frac{\pi}{2\xi}\right)^{1/2}e^{-\xi}, \quad\text{at } \xi\to\infty, \qquad (10.127)$$
to conclude that if κ² > 0, i.e. if κ is real, the complex amplitudes E_ω of both components E_x and E_y of
the electric field decrease with b exponentially. However, let us consider what happens at frequencies
where κ²(ω) < 0,⁴⁸ i.e.


$$u^2 > \frac{1}{\varepsilon(\omega)\mu(\omega)} \equiv v^2(\omega). \qquad (10.128)$$


(This condition means that the particle’s velocity is larger than the phase velocity of the waves at this
particular frequency.) In this case, the parameter κ(ω) is purely imaginary, so that the functions exp{−κb}
in the asymptotes (127) of Eqs. (116)-(117) become just phase factors, and the field component
amplitudes fall very slowly:


$$\left|E_x(\omega)\right| \propto \left|E_y(\omega)\right| \propto \frac{1}{b^{1/2}}. \qquad (10.129)$$

This means that the Poynting vector drops as 1/b, so that its flux through a surface of a round cylinder of
radius b, with its axis on the particle trajectory (i.e. the power flow from the particle), does not depend
on b at all. This is an electromagnetic wave emission — the famous Cherenkov radiation.⁴⁹
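The asymptote (127) and its consequence (129) can be checked with SciPy's modified Bessel functions. In the sketch below, for real arguments the exponentially scaled functions `k0e` and `k1e` (equal to e^ξ K_n(ξ)) are compared with (π/2ξ)^{1/2}. For a purely imaginary argument, as in the case κ² < 0, `kv` with a complex argument shows that |K_n| itself approaches (π/2|ξ|)^{1/2}, i.e. it decays only as b^{−1/2}. The sample arguments are arbitrary.

```python
import numpy as np
from scipy.special import k0e, k1e, kv

for x in (5.0, 20.0, 100.0):
    asym = np.sqrt(np.pi / (2 * x))     # Eq. (10.127) without its factor exp(-x)
    # Real argument (kappa real): k0e(x) = exp(x) K0(x), k1e(x) = exp(x) K1(x)
    real_ratios = (k0e(x) / asym, k1e(x) / asym)
    # Purely imaginary argument (kappa imaginary): |K_n(i x)| itself tends to sqrt(pi / 2x)
    imag_ratios = (abs(kv(0, 1j * x)) / asym, abs(kv(1, 1j * x)) / asym)
    print(f"x = {x:6.1f}: real {real_ratios[0]:.4f}, {real_ratios[1]:.4f}; "
          f"imaginary {imag_ratios[0]:.4f}, {imag_ratios[1]:.4f}")
```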


The direction n of its propagation may be readily found taking into account that at large
distances from the particle’s trajectory, the emitted wave has to be locally planar and transverse (n ⊥ E),
so that the so-called Cherenkov angle θ between the vector n and the particle’s velocity u may be
simply found from the ratio of the electric field components — see Fig. 14a:


$$\tan\theta = -\frac{E_x}{E_y}. \qquad (10.130)$$


Fig. 10.14. (a) The Cherenkov radiation’s propagation angle θ and (b) its interpretation.


The ratio on the right-hand side of this relation may be calculated by plugging the asymptotic 
formula (127) into Eqs. (116) and (117) and calculating their ratio: 


48 Strictly speaking, the inequality κ²(ω) < 0 does not make sense for a medium with a complex product ε(ω)μ(ω),
and hence complex κ²(ω). However, in a typical medium where particles can propagate over substantial distances,
the imaginary part of the product ε(ω)μ(ω) is non-negligible only in very limited frequency intervals, much
narrower than the intervals that we are discussing now — please have one more look at Fig. 7.5.

49 This radiation was observed experimentally by Pavel Alekseevich Cherenkov (in older Western texts, 
“Cerenkov”) in 1934, with the observations explained by Ilya Mikhailovich Frank and Igor Yevgenyevich Tamm 
in 1937. Note, however, that the effect had been predicted theoretically as early as 1889 by the same Oliver 
Heaviside whose name was mentioned in this course so many times — and whose genius I believe is still 
underappreciated. 
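$$\tan\theta = -\frac{E_x}{E_y} = \frac{i\kappa u}{\omega} = \left[\varepsilon(\omega)\mu(\omega)u^2-1\right]^{1/2} = \left[\frac{u^2}{v^2(\omega)}-1\right]^{1/2}, \qquad (10.131\text{a})$$
so that
$$\cos\theta = \frac{v(\omega)}{u}. \qquad (10.131\text{b})$$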
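Eqs. (128) and (131b) give a simple recipe for the threshold and the emission angle. The sketch below assumes a frequency-independent refractive index n = 1.33 for water in the visible range (so that v = c/n, with μ = μ₀). It returns θ for a given β = u/c, or None below threshold, and computes the threshold kinetic energy of an electron, about 0.26 MeV. Since real water is dispersive, the numbers are only indicative.

```python
import math

N_WATER = 1.33          # assumed refractive index of water in the visible; v = c/n (mu = mu0)
ME_C2_MEV = 0.51099895  # electron rest energy, MeV


def cherenkov_angle_deg(beta, n=N_WATER):
    """Angle theta between n and u from Eq. (10.131b), cos(theta) = v/u = 1/(n*beta).

    Returns None below the threshold u > v of Eq. (10.128).
    """
    if n * beta <= 1.0:
        return None
    return math.degrees(math.acos(1.0 / (n * beta)))


beta_th = 1.0 / N_WATER                        # threshold speed, u = v
gamma_th = 1.0 / math.sqrt(1.0 - beta_th**2)
T_th = (gamma_th - 1.0) * ME_C2_MEV            # threshold kinetic energy of an electron, MeV

print(f"electron threshold: beta = {beta_th:.3f}, T = {T_th:.3f} MeV")
for beta in (0.70, 0.80, 0.90, 0.999):
    print(f"beta = {beta}: theta = {cherenkov_angle_deg(beta)}")
```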
Remarkably, this direction does not depend on the emission time, so that the radiation of
frequency ω, at each instant, forms a hollow cone led by the particle. This simple result allows an
evident interpretation (Fig. 14b): the cone’s interior is just the set of all observation points that have
already been reached by the radiation, propagating with the speed v(ω) < u, emitted from all previous
points of the particle’s trajectory by the given time t. This phenomenon is an analog of the so-called
Mach cone in fluid dynamics,⁵⁰ except that in the Cherenkov radiation, there is a separate cone for each
frequency (of the range in which v(ω) < u): the smaller the ε(ω)μ(ω) product, i.e. the higher the
wave velocity v(ω) = [ε(ω)μ(ω)]^{−1/2}, the broader the cone, so that the earlier the corresponding
“shock wave” arrives at an observer. Please note that the Cherenkov radiation is a unique radiative
phenomenon: it takes place even if a particle moves without acceleration, and (in agreement with our
analysis in Sec. 2), is impossible in free space, where v(ω) = c = const is larger than u for any particle.


The Cherenkov radiation’s intensity may also be readily found by plugging the asymptotic
expression (127), with imaginary κ, into Eq. (123). The result is


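$$-\frac{d\mathcal E}{dx} = \frac{q^2}{4\pi}\int_{v(\omega)<u}\mu(\omega)\left[1-\frac{v^2(\omega)}{u^2}\right]\omega\,d\omega. \qquad (10.132)$$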


For non-relativistic particles (u << c), the Cherenkov radiation condition u > v(ω) is fulfilled only in
relatively narrow frequency intervals where the product ε(ω)μ(ω) is very large (usually, due to optical
resonance peaks of the electric permittivity — see Fig. 7.5 and its discussion). In this case, the emitted
light consists of a few nearly-monochromatic components. On the contrary, if the condition u > v(ω), i.e.
u²ε(ω)μ(ω) > 1, is fulfilled in a broad frequency range, as it is for ultra-relativistic particles in
condensed media, then the radiated power, according to Eq. (132), is dominated by higher frequencies of
the range — hence the famous bluish color of the Cherenkov radiation glow from water-filled nuclear
reactors — see Fig. 15.
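The bluish color can be made semi-quantitative. Dividing the spectral density in Eq. (132) by the photon energy ħω (with |q| = e and μ = μ₀) gives the standard Frank-Tamm photon count d²N/dx dλ = 2πα[1 − v²(ω)/u²]/λ², which grows toward short wavelengths as 1/λ². The sketch below evaluates it for an assumed dispersion-free water (n = 1.33) and an assumed 400 to 700 nm band; it gives roughly two hundred visible photons per centimeter of path.

```python
import math

ALPHA = 1 / 137.036      # fine-structure constant
N = 1.33                 # assumed dispersion-free refractive index of water over the band
BETA = 1.0               # ultra-relativistic particle with charge e, u -> c
LAM1, LAM2 = 400e-9, 700e-9   # assumed visible band, m

sin2_theta = 1.0 - 1.0 / (BETA * N) ** 2    # 1 - v^2/u^2
# d^2N/(dx dlambda) = 2*pi*alpha*sin^2(theta)/lambda^2, integrated over [LAM1, LAM2]
photons_per_m = 2 * math.pi * ALPHA * sin2_theta * (1 / LAM1 - 1 / LAM2)
print(f"~{photons_per_m / 100:.0f} photons per cm between 400 and 700 nm")

# Per unit wavelength the photon yield scales as 1/lambda^2, favoring the blue end
print(f"blue/red photon ratio per unit wavelength: {(LAM2 / LAM1) ** 2:.2f}")
```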


Fig. 10.15. The Cherenkov radiation glow in the Advanced Test Reactor of the Idaho National Laboratory in
Arco, ID. (Adapted from http://en.wikipedia.org/wiki/Cherenkov_radiation under the Creative Commons
CC-BY-SA-2.0 license.)


50 Its brief discussion may be found in CM Sec. 8.6. 
The Cherenkov radiation is broadly used in high-energy experiments for particle identification
and speed measurement (since it is easy to pass the particles through layers of different densities and
hence with different dielectric constants) — for example, in the so-called Ring Imaging Cherenkov
(RICH) detectors that have been designed for the DELPHI experiment⁵¹ at the Large Electron-Positron
Collider (LEP) at CERN.


A little bit counter-intuitively, the formalism described in this section is also very useful for the 
description of an apparently rather different effect — the so-called transition radiation that takes place 
when a charged particle crosses a border between two media.⁵² The effect may be interpreted as the
result of the time dependence of the electric dipole formed by the moving charge q and its mirror image 
q’ in the counterpart medium — see Fig. 16. 


Fig. 10.16. The transition radiation’s 
physics. 


In the non-relativistic limit, this effect allows a straightforward description combining the 
electrostatics picture of Sec. 3.4 (see Fig. 3.9 and its discussion), and Eq. (8.27), corrected for the media 
polarization effects. However, if the particle’s velocity u is comparable with the phase velocity of waves 
in either medium, the adequate theory of the transition radiation becomes very close to that of the 
Cherenkov radiation. 


In comparison with the Cherenkov radiation, the transition radiation is rather weak, and its 
practical use (mostly for the measurement of the Lorentz factor y, to which the radiation intensity is 
nearly proportional) requires multi-layered stacks.⁵³ In these systems, the radiation emitted at sequential
borders may be coherent, and the system’s physics may become close to that of the free-electron lasers 
mentioned in Sec. 4. 


10.6. Radiation’s back-action 


An attentive and critically-minded reader could notice that so far our treatment of charged 
particle dynamics has never been fully self-consistent. Indeed, in Sec. 9.6 we have analyzed the particle’s
motion in various external fields, ignoring those radiated by the particle itself, while in Sec. 8.2 and
earlier in this chapter these fields have been calculated (admittedly, just for a few simple cases), but,
again, their back-action on the emitting particle has been ignored. Only in very few cases have we taken


51 See, e.g., http://delphiwww.cern.ch/offline/physics/delphi-detector.html. For an in-depth review of radiation
detectors (including the Cherenkov ones), the reader may be referred, for example, to the classical text by G. F.
Knoll, Radiation Detection and Measurement, 4th ed., Wiley, 2010, and a newer treatment by K. Kleinknecht,
Detectors for Particle Radiation, Cambridge U. Press, 1999.

52 The effect was predicted theoretically in 1946 by V. Ginzburg and I. Frank, and only later observed 
experimentally. 

53 See, e.g., Sec. 5.3 in K. Kleinknecht’s monograph cited above. 
the back effects of the radiation implicitly, via the energy conservation arguments. However, even in
these cases, the near-field effects, such as the first term in Eq. (19), which affect the moving particle 
most, have been ignored. 


At the same time, it is clear that in sharp contrast with electrostatics, the interaction of a moving 
point charge with its own field cannot be always ignored. As the simplest example, if an electron is 
made to fly through a resonant cavity, thus inducing electromagnetic oscillations in it, and then is forced 
(say, by an appropriate static field) to return into the cavity before the oscillations have decayed, its 
motion will certainly be affected by the oscillating fields, just as if they had been induced by another 
source. There is no conceptual problem with applying the Maxwell theory to such “field-particle 
rendezvous” effects; moreover, it is the basis of the engineering design of such vacuum electron devices 
as klystrons, magnetrons, and free-electron lasers. 


A problem arises only when no clear “rendezvous” points are enforced by boundary conditions, 
so that the most important self-field effects are at R = |r − r’| → 0, the most evident example being the
charged particle’s radiation into free space, described earlier in this chapter. We already know that such 
radiation takes away a part of the charge’s kinetic energy, i.e. has to cause its deceleration. One should 
wonder, however, whether such self-action effects might be described in a more direct, non-perturbative 
way. 

As the first attempt, let us try a phenomenological approach based on the already derived
formulas for the radiation power 𝒫. For the sake of simplicity, let us consider a non-relativistic point
charge q in free space, so that 𝒫 is described by Eq. (8.27), with the electric dipole moment’s derivative
over time equal to qu:

$$\mathcal P = \frac{Z_0}{6\pi c^2}\,q^2\dot u^2 \equiv \frac{2}{3c^3}\,\frac{q^2}{4\pi\varepsilon_0}\,\dot u^2. \qquad (10.133)$$
The most naive approach would be to write the equation of the particle’s motion in the form 
$$m\dot{\mathbf u} = \mathbf F_{\rm ext} + \mathbf F_{\rm self}, \qquad (10.134)$$


and try to calculate the radiation back-action force F_self by requiring its instant power, −F_self·u, to be
equal to 𝒫. However, this approach (say, for a 1D motion) would give a very unnatural result,


$$F_{\rm self} = -\frac{2}{3c^3}\,\frac{q^2}{4\pi\varepsilon_0}\,\frac{\dot u^2}{u}, \qquad (10.135)$$


that might diverge at some points of the particle’s trajectory. This failure is clearly due to the retardation 
effect: as the reader may recall, Eq. (133) results from the analysis of radiation fields in the far-field 
zone, i.e. at large distances R from the particle, e.g., from the second term in Eq. (19), i.e. when the non- 
radiative first term (which is much larger at small distances, R → 0) is ignored.


Before exploring the effects of this term, let us, however, make one more attempt at Eq. (133), 
considering its average effect on some periodic motion of the particle. (A possible argument for this 
step is that at the periodic motion, the retardation effects should be averaged out — just as at the transfer 
from Eq. (8.27) to Eq. (8.28).) To calculate the average, let us write the identity 

$$\overline{\dot u^2} \equiv \frac{1}{\mathcal T}\int_0^{\mathcal T}\dot{\mathbf u}\cdot\dot{\mathbf u}\,dt, \qquad (10.136)$$
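and carry out the integration on the right-hand side of Eq. (133) by parts over the motion period 𝒯:

$$\overline{\mathcal P} = \frac{2}{3c^3}\,\frac{q^2}{4\pi\varepsilon_0}\,\frac{1}{\mathcal T}\left[\dot{\mathbf u}\cdot\mathbf u\Big|_0^{\mathcal T}-\int_0^{\mathcal T}\ddot{\mathbf u}\cdot\mathbf u\,dt\right] = -\frac{2}{3c^3}\,\frac{q^2}{4\pi\varepsilon_0}\,\frac{1}{\mathcal T}\int_0^{\mathcal T}\ddot{\mathbf u}\cdot\mathbf u\,dt, \qquad (10.137)$$

while the average power of the back-action force is

$$\overline{-\mathbf F_{\rm self}\cdot\mathbf u} = -\frac{1}{\mathcal T}\int_0^{\mathcal T}\mathbf F_{\rm self}\cdot\mathbf u\,dt. \qquad (10.138)$$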
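The integration by parts in Eq. (137) rests on the boundary term u̇·u taking the same value at t = 0 and t = 𝒯, so that the period average of u̇² equals minus the period average of ü·u. The short check below uses an assumed non-harmonic periodic 1D velocity (a two-harmonic test function, not from the text) and plain grid averages over one period.

```python
import numpy as np

# One period (T = 2*pi) of an assumed non-harmonic periodic 1D velocity u(t)
t = np.linspace(0.0, 2 * np.pi, 4096, endpoint=False)
u = np.cos(t) + 0.3 * np.sin(3 * t)
du = -np.sin(t) + 0.9 * np.cos(3 * t)      # du/dt
d2u = -np.cos(t) - 2.7 * np.sin(3 * t)     # d^2u/dt^2

avg_du2 = np.mean(du * du)                 # period average of (du/dt)^2, as in Eq. (10.136)
avg_minus_d2u_u = -np.mean(d2u * u)        # minus the period average of u * d^2u/dt^2
print(avg_du2, avg_minus_d2u_u)            # both equal (1 + 0.9**2) / 2 = 0.905 for this u(t)
```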


These two averages coincide if⁵⁴

$$\mathbf F_{\rm self} = \frac{2}{3c^3}\,\frac{q^2}{4\pi\varepsilon_0}\,\ddot{\mathbf u}. \qquad (10.139)$$


This is the so-called Abraham-Lorentz force of back-action. Before going after a more serious 
derivation of this formula, let us estimate its scale, representing Eq. (139) as 


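$$\mathbf F_{\rm self} = m\tau\ddot{\mathbf u}, \quad\text{with}\quad \tau \equiv \frac{2}{3mc^3}\,\frac{q^2}{4\pi\varepsilon_0}, \qquad (10.140)$$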


where the constant τ evidently has the dimension of time. Recalling the definition (8.41) of the classical
radius r_c of the particle, Eq. (140) for τ may be rewritten as

$$\tau = \frac{2}{3}\,\frac{r_c}{c}. \qquad (10.141)$$

For the electron, τ is of the order of 10⁻²³ s, so that the right-hand side of Eq. (140) is very small. This
means that in most cases the Abraham-Lorentz force is either negligible or leads to the same results as
the perturbative treatments of energy loss we have used earlier in this chapter.
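The electron value of τ follows from Eq. (141) and the CODATA values of r_c and c. The sketch below also converts 1/τ into the energy scale ħ/τ; it gives τ ≈ 6×10⁻²⁴ s and ħ/τ ≈ 10⁸ eV.

```python
import math

R_E = 2.8179403262e-15      # classical electron radius r_c, m
C = 299792458.0             # speed of light, m/s
HBAR = 1.054571817e-34      # reduced Planck constant, J s
EV = 1.602176634e-19        # J per eV

tau = 2 * R_E / (3 * C)     # Eq. (10.141)
print(f"tau ~ {tau:.2e} s, 1/tau ~ {1 / tau:.1e} s^-1")
print(f"hbar/tau ~ {HBAR / tau / EV:.1e} eV")
```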


However, Eq. (140) brings some unpleasant surprises. For example, let us consider a 1D
oscillator with frequency ω₀. For it, Eq. (134), with the back-action force given by Eq. (140), takes the
form

$$m\ddot x + m\omega_0^2x = m\tau\dddot x. \qquad (10.142)$$


Looking for the solution of this linear differential equation in the usual exponential form, x(t) ∝
exp{λt}, we get the following characteristic equation,


$$\lambda^2 + \omega_0^2 = \tau\lambda^3. \qquad (10.143)$$


It may seem that for any “reasonable” value of ω₀ << 1/τ ~ 10²³ s⁻¹, the right-hand side of this
nonlinear algebraic equation may be treated as a perturbation. Indeed, looking for its solutions in the 


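54 Just for the reader’s reference, this formula may be readily generalized to the relativistic case, in the 4-form:

$$F^\alpha_{\rm self} = \frac{2}{3mc^3}\,\frac{q^2}{4\pi\varepsilon_0}\left[\frac{d^2p^\alpha}{d\tau^2}-\frac{p^\alpha}{(mc)^2}\,\frac{dp_\beta}{d\tau}\,\frac{dp^\beta}{d\tau}\right]$$

— the so-called Abraham-Lorentz-Dirac force, with τ here denoting the particle’s proper time.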
natural form λ± = ±iω₀ + λ’, with |λ’| << ω₀, expanding both parts of Eq. (143) in the Taylor series in
the small parameter λ’, and keeping only the terms linear in λ’, we get

$$\lambda' = -\frac{\omega_0^2\tau}{2}. \qquad (10.144)$$
This means that the energy of free oscillations decreases in time as exp{2λ’t} = exp{−ω₀²τt}; this is
exactly the radiative damping analyzed earlier. However, Eq. (143) is deceiving: it has a third root
corresponding to unphysical, exponentially growing (so-called run-away) solutions. It is easiest to see
this for a free particle, with ω₀ = 0. Then Eq. (143) becomes very simple,


$$\lambda^2 = \tau\lambda^3, \qquad (10.145)$$


and it is easy to find all its three roots explicitly: λ₁ = λ₂ = 0 and λ₃ = 1/τ. While the first two roots
correspond to the values λ± found earlier, the last one describes an exponential (and extremely rapid!)
acceleration.
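The three roots of Eq. (143) can also be found numerically, which makes the run-away root explicit even at ω₀ ≠ 0. The sketch below measures time in units of τ and assumes ω₀τ = 0.01 (an illustrative value). NumPy's `roots` returns a damped pair with real part close to −ω₀²τ/2, as in Eq. (144), plus a real root close to 1/τ.

```python
import numpy as np

tau = 1.0     # measure time in units of tau (illustrative choice)
w0 = 0.01     # oscillator frequency in units of 1/tau, so that w0*tau << 1

# Eq. (10.143) rewritten as tau*lam^3 - lam^2 + 0*lam - w0^2 = 0
roots = np.roots([tau, -1.0, 0.0, -w0**2])

damped = roots[np.abs(roots.imag) > 0.5 * w0]     # the pair near +/- i*w0
runaway = roots[np.abs(roots.imag) <= 0.5 * w0]   # the remaining real root

print("damped:", damped, "  perturbative Re:", -w0**2 * tau / 2)   # Eq. (10.144)
print("run-away:", runaway, "  expected ~", 1 / tau)
```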


In order to remove this artifact, let us try to develop a self-consistent approach to the back-action 
effects, taking into account the near-field terms of particle fields. For that, we need to somehow 
overcome the divergence of Eqs. (10) and (19) at R → 0. The most reasonable way to do this is to spread
the particle’s charge over a ball of radius a, with a spherically symmetric (but not necessarily constant)
density ρ(r), and at the end of the calculations trace the limit a → 0.⁵⁵ Again sticking to the non-relativistic
case (so that the magnetic component of the Lorentz force is not important), we should
calculate 


$$\mathbf F_{\rm self} = \int_V\rho(\mathbf r)\,\mathbf E(\mathbf r,t)\,d^3r, \qquad (10.146)$$


where the electric field is that of the charge itself, with the field of any elementary charge dq = ρ(r)d³r
described by Eq. (19).


To enable an analytical calculation of the force, we need to make the assumption a << r_c, treat
the ratio R/r_c ~ a/r_c as a small parameter, and expand the resulting right-hand side of Eq. (146) into the
Taylor series in small R. This procedure yields


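$$\mathbf F_{\rm self} = -\frac{2}{3}\,\frac{1}{4\pi\varepsilon_0}\sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,c^{n+2}}\,\frac{d^{n+1}\mathbf u}{dt^{n+1}}\int d^3r\int d^3r'\,\rho(\mathbf r)\,R^{n-1}\rho(\mathbf r'). \qquad (10.147)$$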


The distance R cancels only in the term with n = 1, 


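$$\mathbf F_{\rm self}\Big|_{n=1} = \frac{2}{3c^3}\,\frac{1}{4\pi\varepsilon_0}\,\ddot{\mathbf u}\int d^3r\int d^3r'\,\rho(\mathbf r)\rho(\mathbf r') = \frac{2}{3c^3}\,\frac{q^2}{4\pi\varepsilon_0}\,\ddot{\mathbf u}, \qquad (10.148)$$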


showing that we have recovered (now in an apparently legitimate fashion) Eq. (139) for the Abraham-Lorentz
force. One could argue that in the limit a → 0 the terms higher in R ~ a (with n > 1) could be


55 Note: this operation cannot be interpreted as describing a quantum spread due to the finite extent of the point 
particle’s wavefunction. In quantum mechanics, different parts of the wavefunction of the same charged particle 
do not interact with each other! 
ignored. However, we have to notice that the main contribution to the series (147) is not described by
Eq. (148) for n = 1, but is given by the much larger term with n = 0: 


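$$\mathbf F_{\rm self}\Big|_{n=0} = -\frac{2}{3c^2}\,\frac{1}{4\pi\varepsilon_0}\,\dot{\mathbf u}\int d^3r\int d^3r'\,\frac{\rho(\mathbf r)\rho(\mathbf r')}{R} \equiv -\frac{4}{3}\,\frac{U}{c^2}\,\dot{\mathbf u}, \qquad (10.149)$$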


where U is the electrostatic energy (1.59) of the static charge’s self-interaction. This term may be
interpreted as the inertial “force”⁵⁶ (−m_ef a) with the following effective electromagnetic mass:

$$m_{\rm ef} = \frac{4}{3}\,\frac{U}{c^2}, \qquad (10.150)$$
which is a factor of 4/3 larger than it should be according to Einstein’s formula (9.73). This is the 
famous (or rather infamous :-) 4/3 problem that does not allow one to interpret the electron’s mass as 
that of its electric field. Some (admittedly, rather formal) resolution of this paradox is possible only in 
quantum electrodynamics with its renormalization techniques — beyond the framework of this course. 


Note, however, that all these issues are only important for motions with frequencies of the order
of 1/τ ~ 10²³ s⁻¹, i.e. at energies ℰ ~ ħ/τ ~ 10⁸ eV, while other quantum electrodynamics effects may be
observed at much lower frequencies, starting from ~10¹⁰ s⁻¹. Hence the 4/3 problem is by no means the
only motivation for the transfer from classical to quantum electrodynamics. However, the reader should
not think that their time spent on this course has been lost: quantum electrodynamics is heavily based on
classical electrodynamics, incorporates virtually all its results, and the basic transition between them is
surprisingly straightforward.⁵⁷ So, I look forward to welcoming the reader to the next, quantum-mechanics
part of this series.


10.7. Exercise problems


10.1. Derive Eqs. (10) from Eqs. (1) by a direct (but careful!) integration. 


10.2. Derive the radiation-related parts of Eqs. (19)-(20) from the Liénard-Wiechert potentials 
(10) by direct differentiation. 


10.3. A point charge q that had been in a stationary position on a circle of
radius R is carried over, along the circle, to the opposite position on the same 
diameter (see the figure on the right) as fast as only physically possible, and then is 
kept steady at this new position. Calculate and sketch the time dependence of its 
electric field E at the center of the circle. 


10.4. Express the instantaneous power of electromagnetic radiation by a relativistic particle with 
electric charge q and rest mass m, moving with velocity u, via the external Lorentz force F exerted on it.


56 See, e.g., CM Sec. 4.6. 
57 See, e.g., QM Secs. 9.1-9.4. 
10.5. A relativistic particle with rest mass m and electric charge q, initially at rest, is accelerated
by a constant force F until it reaches a certain velocity u, and then moves by inertia. Calculate the total 
energy radiated during the acceleration. 


10.6. A charged relativistic particle with initial momentum p₀ flies ballistically from a free-space
region into a region of a constant, uniform electric field E, whose force is directed opposite to p₀.
Calculate the energy radiated by the particle during its motion in the field, assuming that it is small in 
comparison with the particle’s initial kinetic energy. 


10.7. Calculate 

(i) the instantaneous power, and 

(ii) the power spectrum 
of the radiation emitted, into a unit solid angle, by a relativistic particle with charge q, performing 1D
harmonic oscillations with frequency ω and displacement amplitude a.


10.8. Analyze the polarization and the spectral contents of the synchrotron radiation propagating 
in the direction normal to the particle’s rotation plane. How do the results change if not one, but N > 1 
similar particles move around the circle, at equal angular distances? 


10.9. Calculate and analyze the time dependence of the energy of a charged relativistic particle 
performing synchrotron motion in a constant and uniform magnetic field B, and hence emitting the 
synchrotron radiation. Qualitatively, what is the particle’s trajectory? 


Hint: You may assume that the energy loss is relatively slow (−dℰ/dt ≪ ωℰ), but should spell
out the condition of validity of this assumption.


10.10. Analyze the polarization of the synchrotron radiation propagating within the particle’s 
rotation plane. 


10.11.* The basic quantum theory of radiation shows that the electric dipole radiation by a
particle is allowed only if the change of its angular momentum's magnitude L at the transition is of the
order of Planck's constant ħ.

(i) Estimate the change of L of an ultra-relativistic particle due to its emission of a typical single
photon of the synchrotron radiation.
(ii) Do you think quantum mechanics forbids such radiation? If not, why?


10.12. A relativistic particle moves along the z-axis, with velocity u, through an undulator — a
system of permanent magnets providing (in the simplest model) a perpendicular magnetic field, whose
distribution near the axis is sinusoidal:58

$$\mathbf{B} = \mathbf{n}_x B_0 \cos k_0 z.$$


58 As the Maxwell equation for ∇×H shows, this field distribution cannot be created in any non-zero volume of
free space. However, it may be created on a line — e.g., on the particle's trajectory.


Chapter 10 Page 38 of 40 


Essential Graduate Physics EM: Classical Electrodynamics 


Assuming that the field is so weak that it causes negligible deviations of the particle’s trajectory from 
the straight line, calculate the angular distribution of the resulting radiation. What condition does the 
above assumption impose on the system’s parameters? 


10.13. Discuss possible effects of the interference of the undulator radiation from different 
periods of its static field distribution. In particular, calculate the angular positions of the power density 
maxima. 


10.14. An electron launched directly toward a plane surface of a perfect conductor is instantly
absorbed by it at the collision. Calculate the angular distribution and the frequency spectrum of the
electromagnetic waves radiated at this collision if the initial kinetic energy T of the particle is much
larger than the conductor's workfunction ψ.59 Is your result valid near the conductor's surface?


10.15. A relativistic particle, with rest mass m and electric charge q, flies ballistically, with 
velocity u, by an immobile point charge q’, with an impact parameter b so large that the deviations of its 
trajectory from the straight line are negligible. Calculate the total energy loss due to the electromagnetic 
radiation during the passage. Formulate the conditions of validity of your result. 


59 See Sec. 2.9, in particular Fig. 2.27a. 
Chapter 1. Introduction


This introductory chapter briefly reviews the major experimental motivations for quantum mechanics,
and then discusses its simplest formalism — Schrödinger's wave mechanics. Much of this material
(besides the last section) may be found in undergraduate textbooks,1 so that the discussion is rather
brief, and focused on the most important conceptual issues.


1.1. Experimental motivations 


By the beginning of the 1900s, physics (which by that time included what we now call non- 
relativistic classical mechanics, classical thermodynamics and statistics, and classical electrodynamics 
including the geometric and wave optics) looked like an almost completed discipline, with most
human-scale phenomena reasonably explained, and just a couple of mysterious “dark clouds”2 on the horizon.
However, rapid technological progress and the resulting development of more refined scientific 
instruments have led to a fast multiplication of observed phenomena that could not be explained on a 
classical basis. Let me list the most consequential of those experimental findings. 


(i) The blackbody radiation measurements, pioneered by G. Kirchhoff in 1859, have shown that
in the thermal equilibrium, the power of electromagnetic radiation by a fully absorbing (“black”)
surface, per unit frequency interval, drops exponentially at high frequencies. This is not what could be
expected from the combination of classical electrodynamics and statistics, which predicted an infinite
growth of the radiation density with frequency. Indeed, classical electrodynamics shows3 that
electromagnetic field modes evolve in time just as harmonic oscillators, and that the number $dN$ of these
modes in a large free-space volume $V \gg \lambda^3$, within a small frequency interval $d\omega \ll \omega$ near some
frequency $\omega$, is

$$dN = 2\,\frac{V}{(2\pi)^3}\,d^3k = 2\,\frac{V}{(2\pi)^3}\,4\pi k^2\,dk = V\,\frac{\omega^2}{\pi^2 c^3}\,d\omega, \tag{1.1}$$

where $c \approx 3\times10^8$ m/s is the free-space speed of light, $k = \omega/c$ the free-space wave number, and $\lambda = 2\pi/k$
is the radiation wavelength. On the other hand, classical statistics4 predicts that in the thermal
equilibrium at temperature $T$, the average energy $E$ of each 1D harmonic oscillator should be equal to
$k_BT$, where $k_B$ is the Boltzmann constant.5
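The mode count (1.1) can be checked by brute force. The sketch below rests on assumptions that are not part of the text: a cubic box of side L with periodic boundary conditions, so that the allowed wavevectors are k = (2π/L)(n_x, n_y, n_z) with integer n's, and two transverse polarizations per wavevector. It counts the allowed wavevectors in a thin spherical shell exactly and compares the result with V k² dk/π², which is Eq. (1.1) rewritten via k = ω/c. The function names are illustrative.

```python
import math


def count_below(b: int) -> int:
    """Number of integers m satisfying m*m < b."""
    return 2 * math.isqrt(b - 1) + 1 if b >= 1 else 0


def lattice_modes_in_shell(n_in: int, n_out: int) -> int:
    """Modes with n_in <= |n| < n_out, where k = (2*pi/L) * n; factor 2 for polarizations."""
    count = 0
    for nx in range(-n_out, n_out + 1):
        for ny in range(-n_out, n_out + 1):
            s = nx * nx + ny * ny
            # number of integers nz with n_in**2 <= s + nz**2 < n_out**2
            count += count_below(n_out**2 - s) - count_below(n_in**2 - s)
    return 2 * count


def formula_modes(V: float, k: float, dk: float) -> float:
    """Eq. (1.1) with omega = c*k: dN = V k^2 dk / pi^2."""
    return V * k**2 * dk / math.pi**2


L = 1.0                        # box side, arbitrary units
n_in, n_out = 100, 104         # shell in integer-index space
k_mid = 2 * math.pi * (n_in + n_out) / 2 / L
dk = 2 * math.pi * (n_out - n_in) / L

exact = lattice_modes_in_shell(n_in, n_out)
approx = formula_modes(L**3, k_mid, dk)
ratio = exact / approx
print(exact, round(approx), ratio)
```

The exact count and the continuum formula should agree to within roughly a percent here; the small remaining difference comes from the discreteness of the lattice and shrinks as the box grows compared with the wavelength.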


Combining these two results, we readily get the so-called Rayleigh-Jeans formula for the
average electromagnetic wave energy per unit volume:

$$u \equiv \frac{1}{V}\frac{dE}{d\omega} = \frac{\omega^2}{\pi^2 c^3}\,k_BT, \tag{1.2}$$

1 See, for example, D. Griffiths, Quantum Mechanics, 2nd ed., Cambridge U. Press, 2016.

2 This famous expression was used in a 1900 talk by Lord Kelvin (born William Thomson), in reference to the
results of blackbody radiation measurements and the Michelson-Morley experiments, i.e. the precursors of
quantum mechanics and relativity theory.

3 See, e.g., EM Sec. 7.8, in particular Eq. (7.211).

4 See, e.g., SM Sec. 2.2.

5 In the SI units, used throughout this series, $k_B \approx 1.38\times10^{-23}$ J/K — see Appendix CA: Selected Physical Constants
for the exact value.

that diverges at $\omega \to \infty$ (Fig. 1) — the so-called ultraviolet catastrophe. On the other hand, the blackbody
radiation measurements, improved by O. Lummer and E. Pringsheim, and also by H. Rubens and F.
Kurlbaum to reach a 1%-scale accuracy, were compatible with the phenomenological law suggested in
1900 by Max Planck:

$$u = \frac{\omega^2}{\pi^2 c^3}\,\frac{\hbar\omega}{\exp\{\hbar\omega/k_BT\} - 1}. \tag{1.3a}$$


This law may be reconciled with the fundamental Eq. (1) if the following replacement is made for the
average energy of each field oscillator:

$$k_BT \;\to\; \frac{\hbar\omega}{\exp(\hbar\omega/k_BT) - 1}, \tag{1.3b}$$

with a factor

$$\hbar \approx 1.055\times10^{-34}\ \text{J}\cdot\text{s}, \tag{1.4}$$

now called the Planck's constant.6 At low frequencies ($\hbar\omega \ll k_BT$), the denominator in Eq. (3) may be
approximated as $\hbar\omega/k_BT$, so that the average energy (3b) tends to its classical value $k_BT$, and the
Planck's law (3a) reduces to the Rayleigh-Jeans formula (2). However, at higher frequencies ($\hbar\omega \gg k_BT$),
Eq. (3) describes the experimentally observed rapid decrease of the radiation density — see Fig. 1.

[Fig. 1.1 plot: $u/u_0$ (from 0.01 to 10, logarithmic scale) versus $\hbar\omega/k_BT$.]
Fig. 1.1. The blackbody radiation density $u$, in units of $u_0 \equiv (k_BT)^3/\hbar^2c^3$, as a function of frequency,
according to the Rayleigh-Jeans formula (blue line) and the Planck's law (red line).
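In the dimensionless variable x ≡ ħω/k_BT, and with u measured in the natural unit u₀ ≡ (k_BT)³/ħ²c³, the Rayleigh-Jeans formula (1.2) becomes x²/π², and Planck's law (1.3a) becomes x³/[π²(eˣ − 1)]. The short sketch below (function names are illustrative) evaluates their ratio, which is just the Planck factor (1.3b) divided by k_BT, and integrates Planck's law numerically with a Simpson rule to show that, unlike Eq. (1.2), it gives a finite total energy density. The standard integral ∫₀^∞ x³dx/(eˣ − 1) = π⁴/15 means the total should come out as π²/15 in these units.

```python
import math


def rj_density(x: float) -> float:
    """Rayleigh-Jeans, Eq. (1.2), in units u0 = (k_B T)^3/(hbar^2 c^3); x = hbar*omega/(k_B T)."""
    return x**2 / math.pi**2


def planck_density(x: float) -> float:
    """Planck's law, Eq. (1.3a), in the same units; expm1 keeps small x accurate."""
    if x == 0.0:
        return 0.0
    return x**3 / (math.pi**2 * math.expm1(x))


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


# Planck/RJ ratio equals x/(e^x - 1): close to 1 for x << 1, exponentially small for x >> 1
for x in (0.01, 1.0, 10.0):
    print(f"x = {x:5}: Planck/RJ = {planck_density(x) / rj_density(x):.4g}")

# The tail beyond x = 50 is negligible, so this approximates the full integral
total = simpson(planck_density, 0.0, 50.0)
print(total, math.pi**2 / 15)
```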


(ii) The photoelectric effect, discovered in 1887 by H. Hertz, shows a sharp lower boundary for
the frequency of the incident light that may kick electrons out from metallic surfaces, independent of the
light's intensity. Albert Einstein, in one of his three famous 1905 papers, noticed that this threshold $\omega_{\min}$
could be explained assuming that light consisted of certain particles (now called photons) with energy

6 Max Planck himself wrote $\hbar\omega$ as $h\nu$, where $\nu = \omega/2\pi$ is the “cyclic” frequency (the number of periods per
second) so that in early texts on quantum mechanics the term “Planck's constant” referred to $h = 2\pi\hbar$, while $\hbar$ was
called “the Dirac constant” for a while. I will use the contemporary terminology, and abstain from using the “old
Planck's constant” $h$ at all, to avoid confusion.

$$E = \hbar\omega, \tag{1.5}$$


with the same Planck's constant that participates in Eq. (3).7 Indeed, with this assumption, at the photon
absorption by an electron, its energy $E = \hbar\omega$ is divided between a fixed energy $U_0$ (nowadays called the
workfunction) of electron's binding inside the metal, and the excess kinetic energy $m_ev^2/2 > 0$ of the
freed electron — see Fig. 2. In this picture, the frequency threshold finds a natural explanation as
$\omega_{\min} = U_0/\hbar$.8 Moreover, as was shown by Satyendra Nath Bose in 1924,9 Eq. (5) explains Planck's law (3).

Fig. 1.2. Einstein's explanation of the photoelectric effect's frequency threshold.
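The threshold relations ω_min = U₀/ħ and λ_max = 2πc/ω_min are easy to evaluate numerically. The sketch below uses CODATA values for ħ, c, and the electron-volt (an assumption of this example, since the text quotes only rounded values) and hypothetical helper names; it reproduces the ~300 nm estimate for workfunctions between 4 and 5 eV.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
C = 2.99792458e8         # speed of light, m/s
EV = 1.602176634e-19     # 1 eV in J


def photo_threshold(U0_eV: float) -> tuple[float, float]:
    """Return (omega_min in 1/s, lambda_max in m) for a workfunction U0 given in eV."""
    omega_min = U0_eV * EV / HBAR              # hbar * omega_min = U0
    lambda_max = 2 * math.pi * C / omega_min
    return omega_min, lambda_max


def excess_kinetic_energy_eV(omega: float, U0_eV: float) -> float:
    """hbar*omega - U0 in eV; a negative value means that no electron is emitted."""
    return HBAR * omega / EV - U0_eV


for U0 in (4.0, 5.0):
    w, lam = photo_threshold(U0)
    print(f"U0 = {U0} eV: omega_min = {w:.3e} 1/s, lambda_max = {lam * 1e9:.0f} nm")
```

Photons with ħω above the threshold transfer the excess ħω − U₀ to the electron's kinetic energy, independently of the light intensity, which is the point of Einstein's explanation.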


(iii) The discrete frequency spectra of the electromagnetic radiation by excited atomic gases
could not be explained by classical physics. (Applied to the planetary model of atoms, proposed by
Ernest Rutherford, classical electrodynamics predicts the collapse of electrons on nuclei in $\sim 10^{-11}$ s, due to
the electric-dipole radiation of electromagnetic waves.10) Especially challenging was the observation by
Johann Jacob Balmer (in 1885) that the radiation frequencies of simple atoms may be well described by
simple formulas. For example, for the lightest, hydrogen atom, all radiation frequencies may be
numbered with just two positive integers $n$ and $n'$:

$$\omega_{n,n'} = \omega_0\left(\frac{1}{n^2} - \frac{1}{n'^2}\right), \tag{1.6}$$

with $\omega_0 \approx 2.07\times10^{16}$ s$^{-1}$. This observation, and the experimental value of $\omega_0$, found their first
explanation in the famous 1913 theory by Niels Henrik David Bohr, which was a phenomenological
precursor of the present-day quantum mechanics. In this theory, $\omega_{n,n'}$ was interpreted as the frequency of
a photon that obeys Einstein's formula (5), with its energy $E_{n,n'} = \hbar\omega_{n,n'}$ being the difference between
two quantized (discrete) energy levels of the atom (Fig. 3):

$$E_{n,n'} = E_{n'} - E_n > 0. \tag{1.7}$$

Bohr showed that Eq. (6) may be obtained from Eqs. (5) and (7), and non-relativistic11 classical
mechanics, augmented with just one additional postulate, equivalent to the assumption that the angular

7 As a reminder, A. Einstein received his only Nobel Prize (in 1921) for exactly this work, rather than for his
relativity theory, i.e. essentially for jumpstarting the same quantum theory which he later questioned.

8 For most metals, $U_0$ is between 4 and 5 electron-volts (eV), so that the threshold corresponds to
$\lambda_{\max} = 2\pi c/\omega_{\min} = 2\pi c/(U_0/\hbar) \approx 300$ nm — approximately at the border between the visible light and the ultraviolet radiation.

9 See, e.g., SM Sec. 2.5.

10 See, e.g., EM Sec. 8.2.

11 The non-relativistic approach to the problem may be justified a posteriori by the fact that the resulting energy scale
$E_H$, given by Eq. (13), is much smaller than the electron's rest energy, $m_ec^2 \approx 0.5$ MeV.
momentum $L = m_evr$ of an electron moving with velocity $v$ on a circular orbit of radius $r$ about the
hydrogen's nucleus (the proton, assumed to be at rest because of its much higher mass), is quantized as

$$L = \hbar n, \tag{1.8}$$

where $\hbar$ is again the same Planck's constant (4), and $n$ is an integer. (In Bohr's theory, $n$ could not be
equal to zero, though in the genuine quantum mechanics, it can.)

Fig. 1.3. The electromagnetic radiation of a system as a result of the transition between its quantized energy levels.


Indeed, it is sufficient to solve Eq. (8), $m_evr = \hbar n$, together with the equation

$$m_e\frac{v^2}{r} = \frac{e^2}{4\pi\varepsilon_0 r^2}, \tag{1.9}$$

which expresses the 2nd Newton's law for an electron rotating in the Coulomb field of the nucleus. (Here
$e \approx 1.6\times10^{-19}$ C is the fundamental electric charge, and $m_e \approx 0.91\times10^{-30}$ kg is the electron's rest mass.)
The result for $r$ is

$$r = n^2 r_B, \qquad r_B \equiv \frac{\hbar^2/m_e}{e^2/4\pi\varepsilon_0} \approx 0.0529\ \text{nm}. \tag{1.10}$$


The constant $r_B$, called the Bohr radius, is the most important spatial scale of phenomena in atomic,
molecular, and condensed-matter physics — and hence in all chemistry and biochemistry.


Now plugging these results into the non-relativistic expression for the full electron energy (with
its rest energy taken for reference),

$$E = \frac{m_ev^2}{2} - \frac{e^2}{4\pi\varepsilon_0 r}, \tag{1.11}$$

we get the following simple expression for the electron's energy levels:

$$E_n = -\frac{E_H}{2n^2} < 0, \tag{1.12}$$

which, together with Eqs. (5) and (7), immediately gives Eq. (6) for the radiation frequencies. Here $E_H$ is
the so-called Hartree energy constant (or just the “Hartree energy”)12

$$E_H \equiv \left(\frac{e^2}{4\pi\varepsilon_0}\right)^2\frac{m_e}{\hbar^2} \approx 27.2\ \text{eV}. \tag{1.13a}$$

(Note the useful relations, which follow from Eqs. (10) and (13a):

12 Unfortunately, another name, the “Rydberg constant”, is sometimes used for either this energy unit or its half,
$E_H/2 \approx 13.6$ eV. To add to the confusion, the same term “Rydberg constant” is used in some sub-fields of physics
for the reciprocal free-space wavelength ($1/\lambda = \omega/2\pi c$) corresponding to the frequency $\omega = E_H/2\hbar$.

$$E_H = \frac{e^2/4\pi\varepsilon_0}{r_B} = \frac{\hbar^2/m_e}{r_B^2}, \qquad r_B = \frac{e^2/4\pi\varepsilon_0}{E_H} = \left(\frac{\hbar^2/m_e}{E_H}\right)^{1/2}; \tag{1.13b}$$

the first of them shows, in particular, that $r_B$ is the distance at which the natural scales of the electron's
potential and kinetic energies are equal.)
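The chain of results (1.10) to (1.13) can be checked numerically. The sketch below assumes CODATA values of the constants (the text gives only rounded ones) and uses illustrative names; it computes r_B, E_H, and ω₀ = E_H/2ħ from their definitions, verifies both forms of E_H in Eq. (1.13b), and evaluates the wavelength of the n′ = 3 → n = 2 line of the series (1.6).

```python
import math

HBAR = 1.054571817e-34     # J*s
M_E = 9.1093837015e-31     # kg
Q_E = 1.602176634e-19      # C (numerically, also 1 eV in J)
EPS0 = 8.8541878128e-12    # F/m
C = 2.99792458e8           # m/s

e2 = Q_E**2 / (4 * math.pi * EPS0)     # e^2 / (4 pi eps0), in J*m
r_B = HBAR**2 / (M_E * e2)             # Bohr radius, Eq. (1.10)
E_H = e2**2 * M_E / HBAR**2            # Hartree energy, Eq. (1.13a)
omega_0 = E_H / (2 * HBAR)             # prefactor in Eq. (1.6)


def level(n: int) -> float:
    """E_n = -E_H / (2 n^2), Eq. (1.12), in J."""
    return -E_H / (2 * n * n)


def line_frequency(n: int, n_prime: int) -> float:
    """omega_{n,n'} = (E_n' - E_n) / hbar, from Eqs. (1.5) and (1.7)."""
    return (level(n_prime) - level(n)) / HBAR


# Both expressions in Eq. (1.13b) should reproduce E_H
rel_1 = e2 / r_B / E_H - 1
rel_2 = HBAR**2 / (M_E * r_B**2) / E_H - 1

lam_32 = 2 * math.pi * C / line_frequency(2, 3)    # wavelength of the 3 -> 2 line
print(f"r_B = {r_B:.4e} m, E_H = {E_H / Q_E:.3f} eV, omega_0 = {omega_0:.3e} 1/s")
print(f"3 -> 2 line: {lam_32 * 1e9:.1f} nm")
```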


Note also that Eq. (8), in the form $pr = \hbar n$, where $p = m_ev$ is the electron momentum's
magnitude, may be rewritten as the condition that an integer number ($n$) of wavelengths $\lambda$ of certain
(before the late 1920s, hypothetical) waves13 fits the circular orbit's perimeter: $2\pi r = 2\pi\hbar n/p = n\lambda$.
Dividing both parts of the last equality by $n$, we see that for this statement to be true, the wave number
$k = 2\pi/\lambda$ of the de Broglie waves should be proportional to the electron's momentum $p = m_ev$:

$$k = \frac{p}{\hbar}, \tag{1.14}$$

again with the same Planck's constant as in Eq. (5).
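The argument above can be verified orbit by orbit. The sketch below (CODATA constants and function names are assumptions of the example) takes r = n²r_B and the velocity following from m_evr = ħn, checks that Newton's law (1.9) is then satisfied, and checks that exactly n de Broglie wavelengths λ = 2πħ/p fit the orbit's perimeter. It also prints v₁/c, whose smallness (about 1/137) is consistent with the non-relativistic treatment.

```python
import math

HBAR = 1.054571817e-34
M_E = 9.1093837015e-31
Q_E = 1.602176634e-19
EPS0 = 8.8541878128e-12
C = 2.99792458e8

e2 = Q_E**2 / (4 * math.pi * EPS0)
r_B = HBAR**2 / (M_E * e2)


def bohr_orbit(n: int) -> tuple[float, float]:
    """Radius and speed of the n-th Bohr orbit."""
    r = n**2 * r_B                     # Eq. (1.10)
    v = HBAR * n / (M_E * r)           # Eq. (1.8): m_e v r = hbar n
    return r, v


newton_residuals, fit_residuals = [], []
for n in range(1, 6):
    r, v = bohr_orbit(n)
    newton_residuals.append((M_E * v**2 / r) / (e2 / r**2) - 1)   # Eq. (1.9)
    lam = 2 * math.pi * HBAR / (M_E * v)                           # de Broglie: lambda = 2 pi hbar / p
    fit_residuals.append(n * lam / (2 * math.pi * r) - 1)          # n*lambda vs 2*pi*r

v1 = bohr_orbit(1)[1]
print("max residuals:", max(map(abs, newton_residuals)), max(map(abs, fit_residuals)))
print("v_1/c =", v1 / C)
```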


(iv) The Compton effect14 is the reduction of frequency of X-rays at their scattering on free (or
nearly-free) electrons — see Fig. 4.

Fig. 1.4. The Compton effect.

The effect may be explained assuming that the X-ray photon also has a momentum that obeys the
vector-generalized version of Eq. (14):

$$\mathbf{p}_{\text{photon}} = \hbar\mathbf{k} = \frac{\hbar\omega}{c}\,\mathbf{n}, \tag{1.15}$$

where $\mathbf{k}$ is the wavevector (whose magnitude is equal to the wave number $k$, and whose direction
coincides with the unit vector $\mathbf{n}$ directed along the wave propagation15), and that the momenta of both
the photon and the electron are related to their energies $E$ by the classical relativistic formula16

$$E^2 = (cp)^2 + (mc^2)^2. \tag{1.16}$$

(For a photon, the rest energy $mc^2$ is zero, and this relation is reduced to Eq. (5): $E = cp = c\hbar k = \hbar\omega$.)
Indeed, a straightforward solution of the following system of three equations,


13 This fact was first noticed and discussed in 1924 by Louis Victor Pierre Raymond de Broglie (in his PhD
thesis!), so that instead of speaking of wavefunctions, we are still frequently speaking of the de Broglie waves,
especially when free particles are discussed.

14 This effect was observed in 1922, and explained a year later by Arthur Holly Compton, using Eqs. (5) and (15).

15 See, e.g., EM Sec. 7.1.

16 See, e.g., EM Sec. 9.3, in particular Eq. (9.78).

$$\hbar\omega + m_ec^2 = \hbar\omega' + \left[(cp)^2 + (m_ec^2)^2\right]^{1/2}, \tag{1.17}$$

$$\frac{\hbar\omega}{c} = \frac{\hbar\omega'}{c}\cos\theta + p\cos\varphi, \tag{1.18}$$

$$0 = \frac{\hbar\omega'}{c}\sin\theta - p\sin\varphi, \tag{1.19}$$

(which express the conservation of, respectively, the full energy of the system and the two relevant
Cartesian components of its full momentum, at the scattering event — see Fig. 4), yields the result:

$$\frac{1}{\hbar\omega'} - \frac{1}{\hbar\omega} = \frac{1}{m_ec^2}\left(1 - \cos\theta\right), \tag{1.20a}$$

which is traditionally represented as the relation between the initial and final values of the photon's
wavelength $\lambda = 2\pi/k = 2\pi/(\omega/c)$:17

$$\lambda' - \lambda = \lambda_C\left(1 - \cos\theta\right), \qquad \text{with}\quad \lambda_C \equiv \frac{2\pi\hbar}{m_ec}, \tag{1.20b}$$

and is in agreement with experiment.
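A quick numerical check that Eq. (1.20a) indeed solves the conservation laws (1.17) to (1.19): in the sketch below (illustrative names; units with m_ec² = c = 1 are an assumption made for convenience), ħω′ is computed from Eq. (1.20a), the electron's momentum components are taken from Eqs. (1.18) and (1.19), and the energy balance (1.17) is then evaluated; it should vanish up to rounding errors for any photon energy and scattering angle. The last lines evaluate λ_C with CODATA constants.

```python
import math


def compton_scattered_energy(eps: float, theta: float) -> float:
    """hbar*omega' from Eq. (1.20a), in units m_e c^2 = 1."""
    return 1.0 / (1.0 / eps + 1.0 - math.cos(theta))


def energy_residual(eps: float, theta: float) -> float:
    """Energy mismatch in Eq. (1.17) after using Eqs. (1.18)-(1.19); units m_e c^2 = c = 1."""
    eps_p = compton_scattered_energy(eps, theta)
    px = eps - eps_p * math.cos(theta)      # p cos(phi), from Eq. (1.18)
    py = eps_p * math.sin(theta)            # p sin(phi), from Eq. (1.19)
    p = math.hypot(px, py)
    return (eps + 1.0) - (eps_p + math.sqrt(p * p + 1.0))


residuals = [energy_residual(eps, theta)
             for eps in (0.01, 0.1, 1.0, 10.0)
             for theta in (0.3, math.pi / 2, 2.5, math.pi)]

# Compton wavelength lambda_C = 2 pi hbar / (m_e c), SI units, CODATA constants
LAMBDA_C = 2 * math.pi * 1.054571817e-34 / (9.1093837015e-31 * 2.99792458e8)
print(max(abs(r) for r in residuals), LAMBDA_C)
```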


(v) De Broglie wave diffraction. In 1927, Clinton Joseph Davisson and Lester Germer, and
independently George Paget Thomson, succeeded in observing the diffraction of electrons on solid crystals
(Fig. 5). Specifically, they found that the intensity of the elastic reflection of electrons from a
crystal increases sharply when the angle $\alpha$ between the incident beam of electrons and the crystal's
atomic planes, separated by distance $d$, satisfies the following relation:

$$2d\sin\alpha = n\lambda, \tag{1.21}$$

where $\lambda = 2\pi/k = 2\pi\hbar/p$ is the de Broglie wavelength of the electrons, and $n$ is an integer. As Fig. 5
shows, this is just the well-known condition18 that the path difference $\Delta l = 2d\sin\alpha$ between the de
Broglie waves reflected from two adjacent crystal planes coincides with an integer number of $\lambda$, i.e. the
condition of the constructive interference of the waves.19
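To get a feeling for the numbers in the Bragg condition, the sketch below computes the non-relativistic de Broglie wavelength λ = 2πħ/p of an electron with a given kinetic energy and lists the angles α allowed by 2d sin α = nλ. The electron energy (54 eV) and the plane spacing d = 0.2 nm are illustrative values chosen for this example, not data from the text, and the function names are hypothetical.

```python
import math

H = 6.62607015e-34        # 2*pi*hbar, J*s
M_E = 9.1093837015e-31    # kg
EV = 1.602176634e-19      # J


def de_broglie_wavelength(kinetic_eV: float, mass: float = M_E) -> float:
    """lambda = 2*pi*hbar/p, with the non-relativistic momentum p = sqrt(2 m T)."""
    p = math.sqrt(2 * mass * kinetic_eV * EV)
    return H / p


def bragg_angles(lam: float, d: float) -> list[float]:
    """All angles alpha (radians) with 2 d sin(alpha) = n lambda, n = 1, 2, ..."""
    angles = []
    n = 1
    while n * lam <= 2 * d:
        angles.append(math.asin(n * lam / (2 * d)))
        n += 1
    return angles


lam = de_broglie_wavelength(54.0)    # illustrative electron energy, eV
d = 0.2e-9                           # hypothetical plane spacing, m
for n, alpha in enumerate(bragg_angles(lam, d), start=1):
    print(f"n = {n}: alpha = {math.degrees(alpha):.1f} deg")
```

Electrons of a few tens of eV have wavelengths comparable to interatomic distances, which is why crystals act as natural diffraction gratings for them.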


To summarize, all the listed experimental observations could be explained starting from two very 
simple (and similarly looking) formulas: Eq. (5) (at that stage, for photons only), and Eq. (15) for both 
photons and electrons — both relations involving the same Planck’s constant. This fact might give an 
impression of experimental evidence sufficient to declare the light consisting of discrete particles 


17 The constant $\lambda_C$ that participates in this relation is close to $2.43\times10^{-12}$ m, and is called the electron's Compton
wavelength. This term is somewhat misleading: as the reader can see from Eqs. (17)-(19), no wave in the
Compton problem has such a wavelength — either before or after the scattering.

18 See, e.g., EM Sec. 8.4, in particular Fig. 8.9 and Eq. (8.82). Frequently, Eq. (21) is called the Bragg condition, 
due to the pioneering experiments by W. Bragg on X-ray scattering from crystals, which were started in 1912. 

19 Later, spectacular experiments on diffraction and interference of heavier particles (with the correspondingly 
smaller de Broglie wavelength), e.g., neutrons and even C60 molecules, have also been performed — see, e.g., the
review by A. Zeilinger et al., Rev. Mod. Phys. 60, 1067 (1988) and a later publication by O. Nairz et al., Am. J. 
Phys. 71, 319 (2003). Nowadays, such interference of heavy particles is used, for example, for ultrasensitive 
measurements of gravity — see, e.g., a popular review by M. Arndt, Phys. Today 67, 30 (May 2014), and more 
recent experiments by S. Abend et al., Phys. Rev. Lett. 117, 203003 (2016). 
(photons), and, on the contrary, electrons being some “matter waves” rather than particles. However, by
that time (the mid-1920s), physics had accumulated overwhelming evidence of wave properties of light,
such as interference and diffraction.20 In addition, there was also strong evidence for the lumped-particle
(“corpuscular”) behavior of electrons. It is sufficient to mention the famous oil-drop experiments by
Robert Andrew Millikan and Harvey Fletcher (1909-1913), in which only single (and whole!) electrons
could be added to an oil drop, changing its total electric charge by multiples of electron's charge ($-e$) —
and never its fraction. It was apparently impossible to reconcile these observations with a purely wave
picture, in which an electron and hence its charge need to be spread over the wave's extension, so that
an arbitrary part of it could be cut off using an appropriate experimental setup.


Fig. 1.5. The De Broglie wave interference 
at electron scattering from a crystal lattice. 


Thus the founding fathers of quantum mechanics faced a formidable task of reconciling the wave
and corpuscular properties of electrons and photons — and other particles. The decisive breakthrough in
that task was achieved in 1926 by Erwin Schrödinger and Max Born, who formulated what is now
known either formally as the Schrödinger picture of non-relativistic quantum mechanics of the orbital
motion21 in the coordinate representation (this term will be explained later in the course), or informally
just as the wave mechanics. I will now formulate the main postulates of this theory.


1.2. Wave mechanics postulates 


Let us consider a spinless,22 non-relativistic point-like particle, whose classical dynamics may be
described by a certain Hamiltonian function $H(\mathbf{r}, \mathbf{p}, t)$,23 where $\mathbf{r}$ is the particle's radius-vector and $\mathbf{p}$ is
its momentum. (This condition is important because it excludes from our current discussion the systems
whose interaction with their environment results in irreversible effects, in particular the friction leading
to particle energy's decay. Such “open” systems need a more general description, which will be
discussed in Chapter 7.) Wave mechanics of such Hamiltonian particles may be based on the following
set of postulates that are comfortingly elegant — though their final justification is given only by the
agreement of all their corollaries with experiment.24


20 See, e.g., EM Sec. 8.4. 

21 The orbital motion is the historic (and rather misleading) term used for any motion of the particle as a whole. 

22 Actually, in wave mechanics, the spin of the described particle does not have to be equal to zero. Rather, it is assumed
that the particle spin's effects on its orbital motion are negligible.

23 As a reminder, for many systems (including those whose kinetic energy is a quadratic-homogeneous function of
generalized velocities, like $mv^2/2$), $H$ coincides with the total energy $E$ — see, e.g., CM Sec. 2.3. In what follows, I
will assume that $H = E$.

24 Quantum mechanics, like any theory, may be built on different sets of postulates/axioms leading to the same 
final conclusions. In this text, I will not try to beat down the number of postulates to the absolute possible 
(i) Wavefunction and probability. Such variables as $\mathbf{r}$ or $\mathbf{p}$ cannot always be measured exactly,
even at “perfect conditions” when all external uncertainties, including measurement instrument
imperfection, varieties of the initial state preparation, and unintended particle interactions with its
environment, have been removed.25 Moreover, $\mathbf{r}$ and $\mathbf{p}$ of the same particle can never be measured
exactly simultaneously. Instead, the most detailed description of the particle's state allowed by Nature
is given by a certain complex function $\Psi(\mathbf{r}, t)$, called the wavefunction (or “wave function”), which
generally enables only probabilistic predictions of the measured values of $\mathbf{r}$, $\mathbf{p}$, and other directly
measurable variables — in quantum mechanics, usually called observables.


Specifically, the probability $dW$ of finding a particle inside an elementary volume $dV = d^3r$ is
proportional to this volume, and hence may be characterized by a volume-independent probability
density $w = dW/d^3r$, which in turn is related to the wavefunction as

$$w = |\Psi(\mathbf{r}, t)|^2 \equiv \Psi^*(\mathbf{r}, t)\,\Psi(\mathbf{r}, t), \tag{1.22a}$$

where the sign * denotes the usual complex conjugation. As a result, the total probability of finding the
particle somewhere inside a volume $V$ may be calculated as

$$W = \int_V w\,d^3r = \int_V \Psi^*\Psi\,d^3r. \tag{1.22b}$$

In particular, if volume $V$ contains the particle definitely (i.e. with 100% probability, $W = 1$), Eq.
(22b) is reduced to the so-called wavefunction normalization condition

$$\int_V \Psi^*\Psi\,d^3r = 1. \tag{1.22c}$$


(ii) Observables and operators. With each observable $A$, quantum mechanics associates a certain
linear operator $\hat{A}$, such that (in the perfect conditions mentioned above) the average measured value of
$A$ (usually called the expectation value) is expressed as26

$$\langle A \rangle = \int_V \Psi^*\hat{A}\Psi\,d^3r, \tag{1.23}$$

where $\langle\ldots\rangle$ means the statistical average, i.e. the result of averaging the measurement results over a large
ensemble (set) of macroscopically similar experiments, and $\Psi$ is the normalized wavefunction that obeys
Eq. (22c). Note immediately that for Eqs. (22) and (23) to be compatible, the identity (or “unit”)
operator defined by the relation

$$\hat{I}\Psi = \Psi, \tag{1.24}$$

has to be associated with a particular type of measurement, namely with the particle's detection.
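Postulates (i) and (ii) can be illustrated on a discretized 1D wavefunction. In the sketch below, which is only an illustration (a Gaussian profile with a plane-wave phase factor, an arbitrary grid, and Riemann sums in place of integrals are all assumptions of the example), the wavefunction is normalized as in Eq. (1.22c), the probability of finding the particle within one σ of the packet's center is computed as in Eq. (1.22b), and the expectation values of the identity operator and of the coordinate are computed as in Eq. (1.23).

```python
import cmath
import math


def grid(x_min: float, x_max: float, n: int) -> tuple[list[float], float]:
    """Uniform grid of n points and its step."""
    dx = (x_max - x_min) / (n - 1)
    return [x_min + i * dx for i in range(n)], dx


def normalize(psi: list[complex], dx: float) -> list[complex]:
    """Rescale psi so that sum |psi|^2 dx = 1, the discrete form of Eq. (1.22c)."""
    norm = math.sqrt(sum(abs(p) ** 2 for p in psi) * dx)
    return [p / norm for p in psi]


def expectation(psi: list[complex], a_psi: list[complex], dx: float) -> complex:
    """<A> = sum psi* (A psi) dx, the discrete form of Eq. (1.23)."""
    return sum(p.conjugate() * q for p, q in zip(psi, a_psi)) * dx


x0, sigma, k0 = 1.5, 0.7, 3.0          # illustrative packet parameters
xs, dx = grid(-10.0, 10.0, 4001)
psi = normalize([cmath.exp(-(x - x0) ** 2 / (4 * sigma**2) + 1j * k0 * x) for x in xs], dx)

mean_I = expectation(psi, psi, dx)                                  # identity operator: detection
mean_x = expectation(psi, [x * p for x, p in zip(xs, psi)], dx)     # coordinate operator
W = sum(abs(p) ** 2 for x, p in zip(xs, psi) if abs(x - x0) <= sigma) * dx   # Eq. (1.22b)
print(mean_I, mean_x, W)
```

Note that the phase factor exp(ik₀x) drops out of w = |Ψ|², so it affects neither W nor ⟨x⟩; here |Ψ|² is a Gaussian of standard deviation σ, so W should be close to erf(1/√2) ≈ 0.683.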


minimum, not only because that would require longer argumentation, but chiefly because such attempts typically 
result in making certain implicit assumptions hidden from the reader — the practice as common as regrettable. 

25 I will imply such perfect conditions in the further narrative, until the discussion of the system's interaction with
its environment in Chapter 7. 

26 This key measurement postulate is sometimes called the Born rule, though sometimes this term is used for the 
(less general) Eqs. (22). 
(iii) The Hamiltonian operator and the Schrödinger equation. Another particular operator, the
Hamiltonian $\hat{H}$, whose observable is the particle's energy $E$, also plays in wave mechanics a very
special role, because it participates in the Schrödinger equation,

$$i\hbar\frac{\partial\Psi}{\partial t} = \hat{H}\Psi, \tag{1.25}$$

that determines the wavefunction's dynamics, i.e. its time evolution.


(iv) The radius-vector and momentum operators. In wave mechanics (in the “coordinate
representation”), the vector operator of particle's radius-vector $\hat{\mathbf{r}}$ just multiplies the wavefunction by this
vector, while the operator of the particle's momentum is proportional to the spatial derivative:

$$\hat{\mathbf{r}}\Psi = \mathbf{r}\Psi, \tag{1.26a}$$

$$\hat{\mathbf{p}}\Psi = -i\hbar\nabla\Psi. \tag{1.26b}$$


(v) The correspondence principle. In the limit when quantum effects are insignificant, e.g., when
the characteristic scale of action28 (i.e. the product of the relevant energy and time scales of the problem)
is much larger than Planck's constant $\hbar$, all wave mechanics results have to tend to those given by
classical mechanics. Mathematically, this correspondence is achieved by duplicating the classical
relations between various observables by similar relations between the corresponding operators. For
example, for a free particle, the Hamiltonian (which in this particular case corresponds to the kinetic
energy $T = p^2/2m$ alone) has the form

$$\hat{H} = \frac{\hat{p}^2}{2m} = -\frac{\hbar^2}{2m}\nabla^2. \tag{1.27}$$


Now, even before a deeper discussion of the postulates' physics (offered in the next section), we
may immediately see that they indeed provide a formal way toward resolution of the apparent
contradiction between the wave and corpuscular properties of particles. Indeed, for a free particle, the
Schrödinger equation (25), with the substitution of Eq. (27), takes the form

$$i\hbar\frac{\partial\Psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2\Psi, \tag{1.28}$$

whose particular, but most important, solution is a plane, single-frequency (“monochromatic”)
traveling wave,29

$$\Psi(\mathbf{r}, t) = a\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \tag{1.29}$$


27 If you need, see, e.g., Secs. 8-10 of the Selected Mathematical Formulas appendix — below, referred to as MA. 
Note that according to those formulas, the del operator follows all the rules of the usual (geometric) vectors. This 
is, by definition, true for other quantum-mechanical vector operators to be discussed below. 

28 See, e.g., CM Sec. 10.3. 

29 See, e.g., CM Sec. 6.4 and/or EM Sec. 7.1. 
where $a$, $\mathbf{k}$ and $\omega$ are constants. Indeed, plugging Eq. (29) into Eq. (28), we immediately see that such a
plane wave, with an arbitrary complex amplitude $a$, is indeed a solution of this Schrödinger equation,
provided a specific dispersion relation between the wave number $k \equiv |\mathbf{k}|$ and the frequency $\omega$:

$$\omega = \frac{\hbar k^2}{2m}. \tag{1.30}$$

The constant $a$ may be calculated, for example, assuming that the wave (29) is extended over a certain
volume $V$, while beyond it, $\Psi = 0$. Then from the normalization condition (22c) and Eq. (29), we get30

$$|a|^2 V = 1. \tag{1.31}$$

Let us use Eqs. (23), (26), and (27) to calculate the expectation values of the particle's
momentum $\mathbf{p}$ and energy $E = H$ in the state (29). The result is

$$\langle\mathbf{p}\rangle = \hbar\mathbf{k}, \qquad \langle E\rangle = \langle H\rangle = \frac{\hbar^2k^2}{2m}; \tag{1.32}$$

according to Eq. (30), the last equality may be rewritten as $\langle E\rangle = \hbar\omega$.
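The statement that the plane wave (1.29) solves Eq. (1.28) only with the dispersion relation (1.30) can be checked with finite differences. The sketch below is one-dimensional and uses units ħ = m = 1 (assumptions for illustration); it compares the two sides of Eq. (1.28) at an arbitrary point, once with ω = ħk²/2m and once with a deliberately wrong ω for contrast.

```python
import cmath

HBAR = 1.0   # illustrative units: hbar = m = 1
M = 1.0


def plane_wave(x: float, t: float, k: float, omega: float, a: complex = 1.0) -> complex:
    """1D version of Eq. (1.29)."""
    return a * cmath.exp(1j * (k * x - omega * t))


def schrodinger_residual(k: float, omega: float, x: float = 0.3, t: float = 0.2, h: float = 1e-4) -> float:
    """|lhs - rhs| / |rhs| for Eq. (1.28), with derivatives taken by central finite differences."""
    psi = plane_wave(x, t, k, omega)
    dpsi_dt = (plane_wave(x, t + h, k, omega) - plane_wave(x, t - h, k, omega)) / (2 * h)
    d2psi_dx2 = (plane_wave(x + h, t, k, omega) - 2 * psi
                 + plane_wave(x - h, t, k, omega)) / h**2
    lhs = 1j * HBAR * dpsi_dt
    rhs = -HBAR**2 / (2 * M) * d2psi_dx2
    return abs(lhs - rhs) / abs(rhs)


k = 2.0
print(schrodinger_residual(k, HBAR * k**2 / (2 * M)))   # dispersion relation (1.30): tiny residual
print(schrodinger_residual(k, HBAR * k**2 / M))         # wrong omega: residual of order one
```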


Next, Eq. (23) enables calculation of not only the average (in math speak, the first moment)
of an observable but also its higher moments, notably the second moment — in physics, usually called
variance:

$$\langle\tilde{A}^2\rangle \equiv \langle(A - \langle A\rangle)^2\rangle = \langle A^2\rangle - \langle A\rangle^2, \tag{1.33}$$

and hence its uncertainty, alternatively called the “root-mean-square (r.m.s.) fluctuation”,

$$\delta A \equiv \langle\tilde{A}^2\rangle^{1/2}. \tag{1.34}$$

The uncertainty is a scale of deviations $\tilde{A} \equiv A - \langle A\rangle$ of measurement results from their average. In the
particular case when the uncertainty $\delta A$ equals zero, every measurement of the observable $A$ will give
the same value $\langle A\rangle$; such a state is said to have a definite value of the variable. For example, in
application to the state with wavefunction (29), these relations yield $\delta E = 0$, $\delta p = 0$. This means that in
this plane-wave, monochromatic state, the energy and momentum of the particle have definite values, so
that the statistical average signs in Eqs. (32) might be removed. Thus, these relations are reduced to the
experimentally-inferred Eqs. (5) and (15).
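For the plane wave (1.29), the statement δp = 0 can be illustrated numerically. The sketch below assumes a 1D periodic box of length L (so that the allowed wave numbers are k = 2πm/L), units ħ = 1, and a central-difference approximation of the momentum operator (1.26b); all of these are choices of this example. Because this discretized operator is Hermitian on a periodic grid, the variance (1.33) can be computed as the squared norm of (p̂ − ⟨p⟩)Ψ, which avoids subtracting two nearly equal numbers. The discrete ⟨p⟩ differs from ħk only by the discretization factor sin(kΔx)/(kΔx).

```python
import cmath
import math

HBAR = 1.0   # illustrative units


def p_op(psi: list[complex], dx: float) -> list[complex]:
    """Momentum operator -i hbar d/dx (Eq. (1.26b) in 1D): central differences, periodic box."""
    n = len(psi)
    return [-1j * HBAR * (psi[(j + 1) % n] - psi[j - 1]) / (2 * dx) for j in range(n)]


def inner(f: list[complex], g: list[complex], dx: float) -> complex:
    """Discrete version of the integral of f* g over the box."""
    return sum(a.conjugate() * b for a, b in zip(f, g)) * dx


L, N, m = 1.0, 1000, 3
dx = L / N
k = 2 * math.pi * m / L                  # a wave number that fits the periodic box
a = 1 / math.sqrt(L)                     # Eq. (1.31), with V -> L in 1D
psi = [a * cmath.exp(1j * k * j * dx) for j in range(N)]   # Eq. (1.29) at t = 0

p_psi = p_op(psi, dx)
p_mean = inner(psi, p_psi, dx).real                         # Eq. (1.23)
dev = [q - p_mean * s for q, s in zip(p_psi, psi)]          # (p - <p>) psi
delta_p = math.sqrt(inner(dev, dev, dx).real)               # Eqs. (1.33)-(1.34)
print(p_mean, HBAR * k, delta_p)
```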


Hence the wave mechanics postulates indeed may describe the observed wave properties of
non-relativistic particles. (For photons, we would need its relativistic generalization — see Chapter 9 below.)
On the other hand, due to the linearity of the Schrödinger equation (25), any sum of its solutions is also
a solution — the so-called linear superposition principle. For a free particle, this means that any set of
plane waves (29) is also a solution to this equation. Such sets, with close values of $\mathbf{k}$ and hence $\mathbf{p} = \hbar\mathbf{k}$
(and, according to Eq. (30), of $\omega$ as well), may be used to describe spatially localized “pulses”, called
wave packets — see Fig. 6. In Sec. 2.1, I will prove (or rather reproduce H. Weyl's proof :-) that the
wave packet's extension $\delta x$ in any direction (say, $x$) is related to the width $\delta k_x$ of the distribution of the

30 For infinite space ($V \to \infty$), Eq. (31) yields $a \to 0$, i.e. wavefunction (29) vanishes. This formal problem may be
readily resolved considering sufficiently long wave packets — see Sec. 2.2 below.

corresponding component of its wave vector as $\delta x\,\delta k_x \geq 1/2$, and hence, according to Eq. (15), to the width
$\delta p_x$ of the momentum component distribution as

$$\delta x\,\delta p_x \geq \frac{\hbar}{2}. \tag{1.35}$$


[Fig. 1.6 panels: (a) the wave packet along axis $x$, annotated “the particle is (somewhere :-) here!”; (b) its distribution over $k_x = p_x/\hbar$.]
Fig. 1.6. (a) A snapshot of a typical wave packet propagating along axis $x$, and (b) the corresponding
distribution of the wave numbers $k_x$, i.e. the momenta $p_x$.


This is the famous Heisenberg uncertainty principle, which quantifies the first postulate's
point that the coordinate and the momentum cannot be defined exactly simultaneously. However, since
the Planck's constant, $\hbar \sim 10^{-34}$ J·s, is extremely small on the human scale of things, it still allows for
particle localization in a very small volume even if the momentum spread in a wave packet is also small
on that scale. For example, according to Eq. (35), a 0.1% spread of momentum of a 1 keV electron
($p \approx 1.7\times10^{-23}$ kg·m/s) allows its wave packet to be as small as $\sim 3\times10^{-9}$ m. (For a heavier particle such as a
proton with the same kinetic energy, the packet would be even tighter.) As a result, wave packets may be used to describe the
particles that are quite point-like from the macroscopic point of view.
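Both the Gaussian saturation of Eq. (1.35) and the electron estimate can be reproduced in a few lines. The sketch below (illustrative names; units ħ = 1 for the packet and CODATA constants for the electron estimate are assumptions of the example) builds a Gaussian wave packet, for which δx·δp should equal ħ/2, evaluates δx and δp on a grid using Eqs. (1.23), (1.26b), (1.33), and (1.34), and then computes the smallest packet width allowed by Eq. (1.35) for a 0.1% momentum spread of a 1 keV electron.

```python
import cmath
import math


def uncertainty_product(sigma=1.0, k0=5.0, n=4001, half_width=12.0, hbar=1.0):
    """delta_x * delta_p for psi ~ exp(-x^2/(4 sigma^2) + i k0 x), evaluated on a grid."""
    step = 2 * half_width / (n - 1)
    xs = [-half_width + i * step for i in range(n)]
    psi = [cmath.exp(-x * x / (4 * sigma**2) + 1j * k0 * x) for x in xs]
    norm = math.sqrt(sum(abs(p) ** 2 for p in psi) * step)
    psi = [p / norm for p in psi]                                   # Eq. (1.22c)

    def mean(a_psi):                                                # Eq. (1.23), real part
        return sum(p.conjugate() * q for p, q in zip(psi, a_psi)).real * step

    # p psi = -i hbar dpsi/dx, Eq. (1.26b), by central differences;
    # psi is negligibly small at the grid ends, so those two points are set to zero
    p_psi = [0j] * n
    for j in range(1, n - 1):
        p_psi[j] = -1j * hbar * (psi[j + 1] - psi[j - 1]) / (2 * step)

    x_mean = mean([x * p for x, p in zip(xs, psi)])
    p_mean = mean(p_psi)
    delta_x = math.sqrt(mean([(x - x_mean) ** 2 * p for x, p in zip(xs, psi)]))
    dev = [q - p_mean * p for q, p in zip(p_psi, psi)]              # (p - <p>) psi
    delta_p = math.sqrt(sum(abs(d) ** 2 for d in dev) * step)       # Eqs. (1.33)-(1.34)
    return delta_x * delta_p


# Electron estimate, SI units
HBAR_SI = 1.054571817e-34
M_E = 9.1093837015e-31
EV = 1.602176634e-19
p_e = math.sqrt(2 * M_E * 1e3 * EV)       # momentum of a 1 keV electron (non-relativistic)
delta_p_e = 1e-3 * p_e                     # 0.1% momentum spread
delta_x_min = HBAR_SI / (2 * delta_p_e)    # smallest width allowed by Eq. (1.35)

print(uncertainty_product(), p_e, delta_x_min)
```

The Gaussian product comes out slightly below 1/2 only because of the finite-difference derivative; refining the grid moves it toward ħ/2.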


In a nutshell, this is the main idea of wave mechanics, and the first part of this course (Chapters 
1-3) will be essentially a discussion of various effects described by this approach. During this 
discussion, however, we will not only witness wave mechanics’ many triumphs within its applicability 
domain but also gradually accumulate evidence for its handicaps, which will force an eventual transfer 
to a more general formalism — to be discussed in Chapter 4 and beyond. 


1.3. Postulates’ discussion 


The wave mechanics’ postulates listed in the previous section (hopefully, familiar to the reader 
from their undergraduate studies) may look very simple. However, the physics of these axioms is very 
deep, leading to some counter-intuitive conclusions, and their in-depth discussion requires solutions of 
several key problems of wave mechanics. This is why in this section I will give only an initial, 
admittedly superficial discussion of the postulates, and will be repeatedly returning to the conceptual 
foundations of quantum mechanics throughout the course, especially in Chapter 10. 


First of all, the fundamental uncertainty of observables, which is at the core of the first postulate, is very foreign to the basic ideas of classical mechanics, and historically has made quantum mechanics so hard to swallow for many star physicists, notably including Albert Einstein, despite his 1905 work, which essentially launched the whole field! However, this fact has been confirmed by numerous experiments, and (more importantly) there has not been a single confirmed experiment that would contradict this postulate, so that quantum mechanics was long ago promoted from a theoretical hypothesis to the rank of a reliable scientific theory.


One more remark in this context is that Eq. (25) itself is deterministic, i.e. it conceptually enables an exact calculation of the wavefunction’s distribution in space at any instant $t$, provided that its initial distribution and the particle’s Hamiltonian are known exactly. Note that in classical statistical mechanics, the probability density distribution $w(\mathbf r, t)$ may also be calculated from deterministic differential equations, for example, the Liouville equation.31 The quantum-mechanical description differs from that situation in two important aspects. First, in the perfect conditions outlined above (the best possible initial state preparation and measurements), the Liouville equation is reduced to Newton’s 2nd law of classical mechanics, i.e. the statistical uncertainty of its results disappears. In quantum mechanics this is not true: the quantum uncertainty, such as described by Eq. (35), persists even in this limit. Second, the wavefunction $\Psi(\mathbf r, t)$ gives more information than just $w(\mathbf r, t)$, because besides the modulus of $\Psi$, involved in Eq. (22), this complex function also has the phase $\varphi = \arg\Psi$, which may affect some observables, describing, in particular, the interference of de Broglie waves.


Next, it is very important to understand that the relation between quantum mechanics and experiment, given by the second postulate, necessarily involves another key notion: that of the corresponding statistical ensemble, in this case, a set of many experiments carried out under apparently (macroscopically) similar conditions, including the initial conditions, which nevertheless may lead to different measurement results (outcomes). Indeed, the probability of a certain ($n$-th) outcome of an experiment may only be defined for a certain statistical ensemble, as the limit

$$W_n \equiv \lim_{M\to\infty}\frac{M_n}{M}, \qquad \text{with } \sum_{n=1}^{N} M_n = M, \qquad (1.36)$$

where $M$ is the total number of experiments, $M_n$ is the number of outcomes of the $n$-th type, and $N$ is the number of different outcomes.


Note that a particular choice of statistical ensemble may affect the probabilities $W_n$ very significantly. For example, if we pull out playing cards at random from a standard pack of 52 different cards of 4 suits, the probability $W_n$ of getting a certain card (e.g., the queen of spades) is 1/52. However, if the cards of a certain suit (say, hearts) had been taken out of the pack in advance, the probability of getting the queen of spades is higher, 1/39. It is important that we would also get the last number for the probability even if we had used the full 52-card pack, but for some reason discarded the results of all experiments giving us any rank of hearts. Hence, the ensemble definition (or its redefinition in the middle of the game) may change outcome probabilities.
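The card example is easy to check by direct counting. In the sketch below (the deck encoding and helper names are illustrative assumptions), the probability is computed exactly for the full and for the reduced pack, and then estimated by simulating draws from the full pack with the hearts discarded after the fact; the last estimate converges to 1/39, not 1/52.

```python
import random
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))          # 52 distinct, equally likely cards


def is_queen_of_spades(card):
    return card == ("Q", "spades")


def exact_probability(event, ensemble):
    """W = (favourable outcomes) / (ensemble size), for equally likely members."""
    return Fraction(sum(1 for card in ensemble if event(card)), len(ensemble))


w_full = exact_probability(is_queen_of_spades, DECK)                   # 1/52
w_no_hearts = exact_probability(is_queen_of_spades,
                                [c for c in DECK if c[1] != "hearts"])  # 1/39

# Full pack, but every draw that shows a heart is discarded afterwards:
rng = random.Random(1)
kept = hits = 0
for _ in range(200_000):
    card = rng.choice(DECK)
    if card[1] == "hearts":
        continue                     # redefinition of the ensemble
    kept += 1
    hits += is_queen_of_spades(card)
print(w_full, w_no_hearts, hits / kept)   # the last number is close to 1/39
```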


In wave mechanics, with its fundamental relation (22) between $w$ and $\Psi$, this means that not only the outcome probabilities but also the wavefunction itself may depend on the statistical ensemble we are using, i.e. not only on the preparation of the system and the experimental setup, but also on the subset of outcomes taken into account. The sometimes encountered attribution of the wavefunction to a single experiment, both before and after the measurement, may lead to very unphysical interpretations of the results, including some wavefunction evolution that is not described by the Schrödinger equation (the so-called wave packet reduction), superluminal action at a distance, etc. Later in the course, we will see that minding the fundamentally statistical nature of quantum mechanics, and in particular the dependence of wavefunctions on the statistical ensembles’ definition (or redefinition), readily resolves some, though not all, paradoxes of quantum measurements.

31 See, e.g., SM Sec. 6.1.


Note, however, again that standard quantum mechanics, as discussed in Chapters 1-6 of this course, is limited to statistical ensembles with the least possible uncertainty of the considered systems, i.e. with the best possible knowledge of their state.32 This condition requires, first, the least uncertain initial preparation of the system, and second, its total isolation from the rest of the world, or at least from its disordered part (the “environment”), in the course of its evolution in time. Only such ensembles may be described by certain wavefunctions. A detailed discussion of more general ensembles, which are necessary if these conditions are not satisfied, will be given in Chapters 7, 8, and 10.


Finally, regarding Eq. (23): a better feel for this definition may be obtained by comparing it with the general definition of the expectation value (i.e. the statistical average) in probability theory. Namely, let each of $N$ possible outcomes in a set of $M$ experiments give a certain value $A_n$ of an observable $A$; then

$$\langle A\rangle \equiv \lim_{M\to\infty}\frac{1}{M}\sum_{n=1}^{N} A_n M_n = \sum_{n=1}^{N} A_n W_n. \qquad (1.37)$$

Taking into account Eq. (22), which relates $w$ and $\Psi$, the structures of Eq. (23) and the final form of Eq. (37) are similar. Their exact relation will be further discussed in Sec. 4.1.
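For a finite number $M$ of experiments, Eqs. (36) and (37) can only be approximated by the observed frequencies $M_n/M$. The sketch below uses a hypothetical fair-die ensemble with $A_n = n$ (the function and variable names are illustrative); it computes both estimates from a list of outcomes and shows $\langle A\rangle$ approaching $\sum_n A_n W_n = 3.5$ as $M$ grows.

```python
import random
from collections import Counter


def empirical_stats(outcomes, values):
    """Finite-M estimates of Eqs. (36)-(37).

    outcomes: observed outcome labels n, one per experiment (M = len(outcomes));
    values:   dict mapping each possible label n to the observable's value A_n.
    Returns (W, mean_A), with W[n] = M_n / M and mean_A = sum_n A_n M_n / M.
    """
    M = len(outcomes)
    counts = Counter(outcomes)                         # M_n (0 if never observed)
    W = {n: counts[n] / M for n in values}
    mean_A = sum(values[n] * W[n] for n in values)
    return W, mean_A


# Hypothetical ensemble: rolls of a fair die, observable A_n = n (face value)
values = {n: n for n in range(1, 7)}
rng = random.Random(0)
for M in (10, 1_000, 100_000):
    rolls = [rng.randint(1, 6) for _ in range(M)]
    W, mean_A = empirical_stats(rolls, values)
    print(M, round(mean_A, 3))       # tends to sum_n n * (1/6) = 3.5
```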


1.4. Continuity equation 


The wave mechanics postulates survive one more sanity check: they satisfy the natural requirement that the particle does not appear or vanish in the course of the quantum evolution.33 Indeed, let us use Eq. (22b) to calculate the rate of change of the probability $W$ to find a particle within a certain volume $V$:

$$\frac{dW}{dt} = \frac{d}{dt}\int_V \Psi\Psi^*\, d^3r. \qquad (1.38)$$

Assuming for simplicity that the boundaries of the volume $V$ do not move, it is sufficient to carry out the partial differentiation of the product $\Psi\Psi^*$ inside the integral. Using the Schrödinger equation (25), together with its complex conjugate,

$$-i\hbar\frac{\partial\Psi^*}{\partial t} = \left(\hat H\Psi\right)^*, \qquad (1.39)$$


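we readily get

$$\frac{dW}{dt} = \int_V \frac{\partial(\Psi\Psi^*)}{\partial t}\, d^3r = \frac{1}{i\hbar}\int_V \left[\Psi^*\left(\hat H\Psi\right) - \Psi\left(\hat H\Psi\right)^*\right] d^3r. \qquad (1.40)$$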


32 The reader should not be surprised by the use of the notion of “knowledge” (or “information”) in this context. 
Indeed, due to the statistical character of experiment outcomes, quantum mechanics (or at least its relation to 
experiment) is intimately related to information theory. In contrast to much of classical physics, which may be 
discussed without any reference to information, in quantum mechanics, as in classical statistical physics, such 
abstraction is possible only in some very special (and not the most interesting) cases. 

33 Note that this requirement may be violated in the relativistic quantum theory — see Chapter 9. 
Let the particle move in a field of external forces (not necessarily constant in time), so that its classical Hamiltonian function $H$ is the sum of the particle’s kinetic energy $T = p^2/2m$ and its potential energy $U(\mathbf r, t)$.34 According to the correspondence principle, and Eq. (27), the Hamiltonian operator may be represented as the sum35

$$\hat H = \frac{\hat{\mathbf p}^2}{2m} + U(\mathbf r, t) = -\frac{\hbar^2}{2m}\nabla^2 + U(\mathbf r, t). \qquad (1.41)$$


At this stage, we should notice that this operator, when acting on a real function, returns a real function.36 Hence, the result of its action on an arbitrary complex function $\Psi = a + ib$ (where $a$ and $b$ are real) is

$$\hat H\Psi = \hat H(a + ib) = \hat Ha + i\hat Hb, \qquad (1.42)$$

where $\hat Ha$ and $\hat Hb$ are also real, while

$$\left(\hat H\Psi\right)^* = \left(\hat Ha + i\hat Hb\right)^* = \hat Ha - i\hat Hb = \hat H(a - ib) = \hat H\Psi^*. \qquad (1.43)$$


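This means that Eq. (40) may be rewritten as

$$\frac{dW}{dt} = \frac{1}{i\hbar}\int_V \left[\Psi^*\hat H\Psi - \Psi\hat H\Psi^*\right] d^3r = -\frac{\hbar^2}{2m}\frac{1}{i\hbar}\int_V \left[\Psi^*\nabla^2\Psi - \Psi\nabla^2\Psi^*\right] d^3r, \qquad (1.44)$$

because the potential-energy terms cancel. Now let us use general rules of vector calculus37 to write the following identity:

$$\nabla\cdot\left(\Psi^*\nabla\Psi - \Psi\nabla\Psi^*\right) = \Psi^*\nabla^2\Psi - \Psi\nabla^2\Psi^*. \qquad (1.45)$$

A comparison of Eqs. (44) and (45) shows (since $-\hbar^2/2m\,i\hbar = i\hbar/2m$) that we may write

$$\frac{dW}{dt} = -\int_V (\nabla\cdot\mathbf j)\, d^3r, \qquad (1.46)$$

where the vector $\mathbf j$ is defined as

$$\mathbf j \equiv \frac{i\hbar}{2m}\left(\Psi\nabla\Psi^* - \text{c.c.}\right) = \frac{\hbar}{m}\,\text{Im}\left(\Psi^*\nabla\Psi\right), \qquad (1.47)$$

where c.c. means the complex conjugate of the previous expression, in this case $(\Psi\nabla\Psi^*)^*$, i.e. $\Psi^*\nabla\Psi$. Now using the well-known divergence theorem,38 Eq. (46) may be rewritten as the continuity equation

$$\frac{dW}{dt} + I = 0, \qquad \text{with } I \equiv \oint_S j_n\, d^2r, \qquad (1.48)$$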


34 As a reminder, such a description is valid not only for conservative forces (in that case $U$ has to be time-independent), but also for any force $\mathbf F(\mathbf r, t)$ that may be expressed via the gradient of $U(\mathbf r, t)$; see, e.g., CM Chapters 2 and 10. (A good example where such a description is impossible is given by the magnetic component of the Lorentz force; see, e.g., EM Sec. 9.7, and also Sec. 3.1 below.)

35 Historically, this was the main step made (in 1926) by E. Schrödinger on the background of L. de Broglie’s idea. The probabilistic interpretation of the wavefunction was put forward, almost simultaneously, by M. Born.

36 In Chapter 4, we will discuss a more general family of Hermitian operators, which have this property.

37 See, e.g., MA Eq. (11.4a), combined with the Laplace operator’s definition $\nabla^2 \equiv \nabla\cdot\nabla$.

38 See, e.g., MA Eq. (12.2).

where $j_n$ is the component of the vector $\mathbf j$ along the outwardly directed normal to the closed surface $S$ that limits the volume $V$, i.e. the scalar product $\mathbf j\cdot\mathbf n$, where $\mathbf n$ is the unit vector along this normal.


Equalities (47) and (48) show that if the wavefunction on the surface vanishes, the total probability $W$ of finding the particle within the volume does not change, providing the intended sanity check. In the general case, Eq. (48) says that $dW/dt$ equals the flux $I$ of the vector $\mathbf j$ through the surface, with the minus sign. It is clear that this vector may be interpreted as the probability current density, and $I$ as the total probability current through the surface $S$. This interpretation may be further supported by rewriting Eq. (47) for the wavefunction represented in the polar form $\Psi = ae^{i\varphi}$, with real $a$ and $\varphi$:

$$\mathbf j = \frac{\hbar}{m}a^2\nabla\varphi. \qquad (1.49)$$

Note that for a real wavefunction, or even for a wavefunction with an arbitrary but space-constant phase $\varphi$, the probability current density vanishes. On the contrary, for the traveling wave (29), with a constant probability density $w = a^2$, Eq. (49) yields a non-zero (and physically very transparent) result:

$$\mathbf j = \frac{\hbar}{m}a^2\mathbf k = w\frac{\hbar\mathbf k}{m} = w\mathbf v, \qquad (1.50)$$

where $\mathbf v = \mathbf p/m$ is the particle’s velocity. If multiplied by the particle’s mass $m$, the probability density $w$ turns into the (average) mass density $\rho$, and the probability current density into the mass flux density $\rho\mathbf v$. Similarly, if multiplied by the total electric charge $q$ of the particle, with $w$ turning into the charge density $\rho$, $\mathbf j$ becomes the electric current density. As the reader (hopefully :-) knows, both these currents satisfy classical continuity equations similar to Eq. (48).39
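Eqs. (47), (49), and (50) are easy to check numerically on a grid. The sketch below works in 1D, in units with $\hbar = m = 1$ (an assumption made only for illustration), takes the derivative by central finite differences, and confirms that a plane wave carries the current $wv$, while a real wavefunction, or one multiplied by a constant phase factor, carries none.

```python
import numpy as np

HBAR = 1.0   # illustrative units: hbar = m = 1
MASS = 1.0


def probability_current(psi, x):
    """1D form of Eq. (47): j = (hbar/m) Im(psi^* dpsi/dx),
    with dpsi/dx from finite differences on the grid x."""
    dpsi_dx = np.gradient(psi, x)
    return (HBAR / MASS) * np.imag(np.conj(psi) * dpsi_dx)


x = np.linspace(0.0, 10.0, 2001)
a, k = 0.5, 2.0
plane_wave = a * np.exp(1j * k * x)              # Eq. (29) at t = 0
j_plane = probability_current(plane_wave, x)
w_plane = np.abs(plane_wave) ** 2
print(j_plane[1000], w_plane[1000] * HBAR * k / MASS)   # both ~ a^2 k = 0.5

real_psi = np.sin(x) * np.exp(-0.1 * x**2)       # a real wavefunction
const_phase = real_psi * np.exp(0.7j)            # same, times a constant phase factor
print(np.max(np.abs(probability_current(real_psi, x))),
      np.max(np.abs(probability_current(const_phase, x))))   # both ~0
```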


Finally, let us recast the continuity equation, rewriting Eq. (46) as

$$\int_V\left(\frac{\partial w}{\partial t} + \nabla\cdot\mathbf j\right) d^3r = 0. \qquad (1.51)$$

Now we may argue that this equality may be true for any choice of the volume $V$ only if the expression under the integral vanishes everywhere, i.e. if

$$\frac{\partial w}{\partial t} + \nabla\cdot\mathbf j = 0. \qquad (1.52)$$

This differential form of the continuity equation may be more convenient than its integral form (48).
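The differential form (52) can also be checked on an explicit solution. The sketch below assumes a free particle ($U = 0$) in 1D, in units with $\hbar = m = 1$, on a periodic grid wide enough that the packet never reaches its edges. The free evolution is done exactly in the Fourier domain, $\partial w/\partial t$ by a central time difference, and $\partial j/\partial x$ by a spectral derivative; all grid and packet parameters are arbitrary choices.

```python
import numpy as np

HBAR, MASS = 1.0, 1.0                          # illustrative units: hbar = m = 1
N, L = 2048, 40.0                              # periodic grid of length L
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)     # wave numbers of the grid's Fourier modes


def evolve_free(psi0, t):
    """Exact evolution for U = 0: each plane-wave component of psi0
    acquires the factor exp(-i hbar k^2 t / 2m)."""
    phase = np.exp(-1j * HBAR * k**2 * t / (2 * MASS))
    return np.fft.ifft(np.fft.fft(psi0) * phase)


def ddx(f):
    """Spectral derivative d/dx on the periodic grid."""
    return np.fft.ifft(1j * k * np.fft.fft(f))


def density(psi):
    return np.abs(psi) ** 2


psi0 = np.exp(-(x + 5.0) ** 2 / 2 + 2j * x)    # Gaussian packet moving to the right
t, dt = 1.0, 1e-4

dw_dt = (density(evolve_free(psi0, t + dt))
         - density(evolve_free(psi0, t - dt))) / (2 * dt)
psi = evolve_free(psi0, t)
j = (HBAR / MASS) * np.imag(np.conj(psi) * ddx(psi))   # Eq. (47) in 1D
div_j = np.real(ddx(j))

residual = np.max(np.abs(dw_dt + div_j))       # Eq. (52) says this should vanish
print(residual, np.max(np.abs(dw_dt)))         # residual << typical size of dw/dt
```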


1.5. Eigenstates and eigenvalues 


Now let us discuss the most important corollaries of wave mechanics’ linearity. First of all, it uses only linear operators. This term means that the operators must obey the following two rules:40

$$\left(\hat A_1 + \hat A_2\right)\Psi = \hat A_1\Psi + \hat A_2\Psi, \qquad (1.53)$$

$$\hat A\left(c_1\Psi_1 + c_2\Psi_2\right) = \hat A\left(c_1\Psi_1\right) + \hat A\left(c_2\Psi_2\right) = c_1\hat A\Psi_1 + c_2\hat A\Psi_2, \qquad (1.54)$$

where $\Psi_n$ are arbitrary wavefunctions, while $c_n$ are arbitrary constants (in quantum mechanics, frequently called c-numbers, to distinguish them from operators and wavefunctions). The most important examples of linear operators are given by:

(i) the multiplication by a function, such as for the operator $\hat{\mathbf r}$ given by Eq. (26), and
(ii) the spatial or temporal differentiation, such as in Eqs. (25)-(27).

39 See, e.g., respectively, CM Sec. 8.3 and EM Sec. 4.1.

40 By the way, if any equality involving operators is valid for an arbitrary wavefunction, the latter is frequently dropped from notation, resulting in an operator equality. In particular, Eq. (53) may be readily used to prove that the addition of linear operators is commutative, $\hat A_2 + \hat A_1 = \hat A_1 + \hat A_2$, and associative, $(\hat A_1 + \hat A_2) + \hat A_3 = \hat A_1 + (\hat A_2 + \hat A_3)$.


Next, it is of key importance that the Schrödinger equation (25) is also linear. (This fact was already used in the discussion of wave packets in the last section.) This means that if each of several functions $\Psi_n$ is a (particular) solution of Eq. (25) with a certain Hamiltonian, then their arbitrary linear combination,

$$\Psi = \sum_n c_n\Psi_n, \qquad (1.55)$$

is also a solution of the same equation.41


Let us use the linearity to accomplish an apparently impossible feat: immediately find the general solution of the Schrödinger equation for the most important case when the system’s Hamiltonian does not depend on time explicitly, for example, as in Eq. (41) with a time-independent potential energy $U = U(\mathbf r)$, when the Schrödinger equation has the form

$$i\hbar\frac{\partial\Psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2\Psi + U(\mathbf r)\Psi. \qquad (1.56)$$


First of all, let us prove that the following product,

$$\Psi_n = a_n(t)\,\psi_n(\mathbf r), \qquad (1.57)$$

qualifies as a (particular) solution of such an equation. Indeed, plugging Eq. (57) into Eq. (25) with any time-independent Hamiltonian, using the fact that in this case

$$\hat H a_n(t)\psi_n(\mathbf r) = a_n(t)\hat H\psi_n(\mathbf r), \qquad (1.58)$$

and dividing both parts of the equation by $a_n\psi_n$, we get

$$\frac{i\hbar}{a_n}\frac{da_n}{dt} = \frac{\hat H\psi_n}{\psi_n}. \qquad (1.59)$$

The left-hand side of this equation may depend only on time, while the right-hand side depends only on coordinates. These facts may only be reconciled if we assume that each of these parts is equal to (the same) constant of the dimension of energy, which I will denote as $E_n$.42 As a result, we get two separate equations for the temporal and spatial parts of the wavefunction:


$$\hat H\psi_n = E_n\psi_n, \qquad (1.60)$$

$$i\hbar\frac{da_n}{dt} = E_n a_n. \qquad (1.61a)$$

The latter of these equations, rewritten in the form

$$\frac{da_n}{a_n} = -i\frac{E_n}{\hbar}\,dt, \qquad (1.61b)$$

is readily integrable, giving

$$\ln a_n = -i\omega_n t + \text{const}, \quad \text{so that } a_n = \text{const}\times\exp\{-i\omega_n t\}, \quad \text{with } \omega_n \equiv \frac{E_n}{\hbar}. \qquad (1.62)$$

41 At first glance, it may seem strange that the linear Schrödinger equation correctly describes quantum properties of systems whose classical dynamics is described by nonlinear equations of motion (e.g., an anharmonic oscillator; see, e.g., CM Sec. 5.2). Note, however, that statistical equations of classical dynamics (see, e.g., SM Chapters 5 and 6) also have this property, so it is not specific to quantum mechanics.

42 This argumentation, leading to variable separation, is very common in mathematical physics; see, e.g., its discussion in CM Sec. 6.5, and in EM Sec. 2.5 and beyond.


Now plugging Eqs. (57) and (62) into Eq. (22), we see that in the quantum state described by Eqs. (57)-(62) (with the constant factor of Eq. (62) absorbed into $\psi_n$), the probability $w$ of finding the particle at a certain location does not depend on time:

$$w = \Psi_n^*\Psi_n = \psi_n^*(\mathbf r)\psi_n(\mathbf r) = w(\mathbf r). \qquad (1.63)$$

With the same substitution, Eq. (23) shows that the expectation value of any operator that does not depend on time explicitly is also time-independent:

$$\langle A\rangle = \int \psi_n^*(\mathbf r)\hat A\psi_n(\mathbf r)\, d^3r = \text{const}. \qquad (1.64)$$


Due to this property, the states described by Eqs. (57)-(62) are called stationary; they are fully defined by the possible solutions of the stationary (or “time-independent”) Schrödinger equation (60).43 Note that for the time-independent Hamiltonian (41), the stationary Schrödinger equation (60),

$$-\frac{\hbar^2}{2m}\nabla^2\psi_n + U(\mathbf r)\psi_n = E_n\psi_n, \qquad (1.65)$$

is a linear, homogeneous differential equation for the function $\psi_n$, with an a priori unknown parameter $E_n$. Such equations fall into the mathematical category of eigenproblems,44 whose eigenfunctions $\psi_n$ and eigenvalues $E_n$ should be found simultaneously, i.e. self-consistently.45

Mathematics46 tells us that for such equations with space-confined eigenfunctions $\psi_n$, tending to zero at $r \to \infty$, the spectrum of eigenvalues is discrete. It also proves that the eigenfunctions corresponding to different eigenvalues are orthogonal, i.e. that space integrals of the products $\psi_n\psi_{n'}^*$ vanish for all pairs with $n \neq n'$. Due to the Schrödinger equation’s linearity, each of these functions may be multiplied by a proper constant coefficient to make their set orthonormal:

$$\int \psi_n\psi_{n'}^*\, d^3r = \delta_{n,n'} \equiv \begin{cases} 1, & \text{for } n = n', \\ 0, & \text{for } n \neq n'. \end{cases} \qquad (1.66)$$
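A discretized version of Eq. (65) makes these statements tangible. The sketch below is a minimal three-point finite-difference solver in 1D, in units with $\hbar = m = 1$, using a harmonic potential $U = x^2/2$ chosen only as an example (for it, the exact levels are $E_n = n + 1/2$). The grid is cut off where the low-lying $\psi_n$ are already negligible, mimicking the condition $\psi_n \to 0$ at $r \to \infty$; the output shows a discrete spectrum and eigenvectors obeying the discrete analog of Eq. (66).

```python
import numpy as np

HBAR, MASS = 1.0, 1.0                # illustrative units: hbar = m = 1
N, X_MAX = 1000, 10.0
x = np.linspace(-X_MAX, X_MAX, N)
h = x[1] - x[0]


def hamiltonian_matrix(U):
    """Three-point finite-difference form of the operator in Eq. (65), 1D:
    -hbar^2/(2m) psi'' + U psi, with psi taken as zero just beyond the grid."""
    kin = HBAR**2 / (2 * MASS * h**2)
    off = np.full(N - 1, -kin)
    return np.diag(2 * kin + U) + np.diag(off, 1) + np.diag(off, -1)


U = 0.5 * x**2                        # example potential (harmonic, omega = 1)
E, vecs = np.linalg.eigh(hamiltonian_matrix(U))
psi = vecs / np.sqrt(h)               # scale so that sum |psi_n|^2 h = 1

print(np.round(E[:4], 3))             # discrete levels, close to n + 1/2
overlap = psi[:, :4].T @ psi[:, :4] * h   # discrete analog of Eq. (66)
print(np.allclose(overlap, np.eye(4)))
```

Because the matrix is real and symmetric, `np.linalg.eigh` returns real eigenvalues and orthonormal eigenvectors, which is the discrete counterpart of the properties quoted above.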


43 In contrast, the full Schrödinger equation (25) is frequently called time-dependent or non-stationary.

44 From the German root eigen, meaning “particular” or “characteristic”.

45 Eigenvalues of energy are frequently called eigenenergies, and it is often said that the eigenfunction $\psi_n$ and the corresponding eigenenergy $E_n$ together determine the $n$-th stationary eigenstate of the system.

46 See, e.g., Sec. 9.3 of the wonderful handbook by G. Korn and T. Korn, listed in MA Sec. 16(ii).

Moreover, the eigenfunctions $\psi_n(\mathbf r)$ form a full set, meaning that an arbitrary function $\psi(\mathbf r)$, in particular the actual wavefunction $\Psi$ of the system at the initial moment of its evolution (which I will always, with a few clearly marked exceptions, take for $t = 0$), may be represented as a unique expansion over the eigenfunction set:


$$\Psi(\mathbf r, 0) = \sum_n c_n\psi_n(\mathbf r). \qquad (1.67)$$

The expansion coefficients $c_n$ may be readily found by multiplying both sides of Eq. (67) by $\psi_{n'}^*$, integrating the results over the space, and using Eq. (66). The result is

$$c_n = \int \psi_n^*(\mathbf r)\,\Psi(\mathbf r, 0)\, d^3r. \qquad (1.68)$$

Now let us consider the following wavefunction:

$$\Psi(\mathbf r, t) = \sum_n c_n a_n(t)\psi_n(\mathbf r) = \sum_n c_n\exp\left\{-i\frac{E_n}{\hbar}t\right\}\psi_n(\mathbf r). \qquad (1.69)$$

Since each term of the sum has the form (57) and satisfies the Schrödinger equation, so does the sum as a whole. Moreover, if the coefficients $c_n$ are derived in accordance with Eq. (68), then the solution (69) satisfies the initial conditions as well. At this moment we can use one more bit of help from mathematicians, who tell us that the linear partial differential equation of type (56), with fixed initial conditions, may have only one (unique) solution. This means that in our case of a time-independent Hamiltonian, Eq. (69) gives the general solution of the Schrödinger equation (25).
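Eqs. (67)-(69) translate directly into a few lines of linear algebra once the eigenproblem is discretized. The sketch below assumes a 1D hard-wall box of unit width (the problem treated below in Sec. 7), units with $\hbar = m = 1$, and a hypothetical Gaussian initial packet. It computes $c_n$ by Eq. (68), builds $\Psi(x, t)$ by Eq. (69), and checks that the expansion reproduces the initial state, that the total probability is conserved, and that $\langle x\rangle$ of this non-stationary state does move.

```python
import numpy as np

HBAR, MASS, A = 1.0, 1.0, 1.0            # illustrative units; box width a = 1
N = 400
x = np.linspace(0.0, A, N + 2)[1:-1]     # interior points; psi = 0 on both walls
h = x[1] - x[0]

# Finite-difference Hamiltonian with U = 0 inside the box
t0 = HBAR**2 / (2 * MASS * h**2)
H = (np.diag(np.full(N, 2 * t0))
     + np.diag(np.full(N - 1, -t0), 1)
     + np.diag(np.full(N - 1, -t0), -1))
E, vecs = np.linalg.eigh(H)
psi_n = vecs / np.sqrt(h)                # columns obey the discrete form of Eq. (66)

# Hypothetical initial state: normalized Gaussian packet with mean wave number 40
Psi0 = np.exp(-((x - 0.3) ** 2) / (2 * 0.05**2) + 40j * x)
Psi0 /= np.sqrt(np.sum(np.abs(Psi0) ** 2) * h)

c = psi_n.conj().T @ Psi0 * h            # Eq. (68)


def Psi(t):
    """Eq. (69): sum_n c_n exp(-i E_n t / hbar) psi_n(x)."""
    return psi_n @ (c * np.exp(-1j * E * t / HBAR))


def mean_x(psi):
    return np.sum(x * np.abs(psi) ** 2) * h


print(np.allclose(Psi(0.0), Psi0))             # Eq. (67): initial state recovered
print(np.sum(np.abs(Psi(0.005)) ** 2) * h)     # total probability stays 1
print(mean_x(Psi(0.0)), mean_x(Psi(0.005)))    # <x> moves from ~0.3 to ~0.5
```

Each stationary component only rotates its phase; the motion of $\langle x\rangle$ comes entirely from the interference between components with different $E_n$.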


So, we have succeeded in our apparently over-ambitious goal. Now let us pause this mad 
mathematical dash for a minute, and discuss this key result. 


1.6. Time evolution 


For the time-dependent factor $a_n(t)$ of each component (57) of the general solution (69), our procedure gave a very simple and universal result (62), describing a linear change of the phase $\varphi_n = \arg(a_n)$ of this complex function in time, with the constant rate

$$-\frac{d\varphi_n}{dt} = \omega_n = \frac{E_n}{\hbar}, \qquad (1.70)$$

so that the real and imaginary parts of $a_n$ oscillate sinusoidally with this frequency. The relation (70) coincides with Einstein’s conjecture (5), but could these oscillations of the wavefunctions represent a physical reality? Indeed, for photons, described by Eq. (5), $E$ may be (and, as we will see in Chapter 9, is) the actual, well-defined energy of one photon, and $\omega$ is the frequency of the radiation so quantized. However, for non-relativistic particles, described by wave mechanics, the potential energy $U$, and hence the full energy $E$, are defined only up to an arbitrary constant, because we may measure them from an arbitrary reference level. How can such a change of the energy reference level (which may be made just in our mind) alter the frequency of oscillations of a variable?


According to Eqs. (22)-(23), this time evolution of a wavefunction does not affect the particle’s probability distribution, or even any observable (including the energy $E$, provided that it is always referred to the same origin as $U$), in any stationary state. However, let us combine Eq. (5) with Bohr’s assumption (7):

$$\hbar\omega_{nn'} = E_n - E_{n'}. \qquad (1.71)$$

The difference $\omega_{nn'}$ of the eigenfrequencies $\omega_n$ and $\omega_{n'}$, participating in this formula, is evidently independent of the energy reference, and, as will be proved later in the course, determines the measurable frequency of the electromagnetic radiation (or possibly of a wave of a different physical nature) emitted or absorbed at the quantum transition between the states.


As another but related example, consider two similar particles 1 and 2, each in the same (say, the lowest-energy) eigenstate, but with their potential energies (and hence the ground state energies $E_{1,2}$) different by a constant $\Delta U = U_1 - U_2$. Then, according to Eq. (70), the difference $\varphi \equiv \varphi_1 - \varphi_2$ of their wavefunction phases evolves in time with the reference-independent rate

$$\frac{d\varphi}{dt} = -\frac{\Delta U}{\hbar}. \qquad (1.72)$$

Certain measurement instruments, weakly coupled to the particles, may allow observation of this evolution, while keeping the particles’ quantum dynamics virtually unperturbed, i.e. Eq. (70) intact. Perhaps the most dramatic measurement of this type is possible using the Josephson effect in weak links between two superconductors (see Fig. 7).47

[Fig. 1.7. The Josephson effect in a weak link between two bulk superconductor electrodes. (The figure shows the supercurrent $I \propto \sin(\varphi_1 - \varphi_2)$ through the link and the applied voltage $V$.)]


As a brief reminder,48 superconductivity may be explained by a specific coupling between conduction electrons in solids that leads, at low temperatures, to the formation of the so-called Cooper pairs. Such pairs, each consisting of two electrons with opposite spins and momenta, behave as Bose particles and form a coherent Bose-Einstein condensate.49 Most properties of such a condensate may be described by a single, common wavefunction $\Psi$, evolving in time just as that of a free particle, with the effective potential energy $U = q\phi = -2e\phi$, where $\phi$ is the electrochemical potential,50 and $q = -2e$ is the electric charge of a Cooper pair. As a result, for the system shown in Fig. 7, in which the externally applied voltage $V$ fixes the difference $\phi_1 - \phi_2$ between the electrochemical potentials of two superconductors, Eq. (72) takes the form

$$\frac{d\varphi}{dt} = \frac{2e}{\hbar}V. \qquad (1.73)$$

If the link between the superconductors is weak enough, the electric current $I$ of the Cooper pairs (called the supercurrent) through the link may be approximately described by the following simple relation,

$$I = I_c\sin\varphi, \qquad (1.74)$$

47 The effect was predicted in 1962 by Brian Josephson (then a graduate student!) and observed soon after that. 
48 For a more detailed discussion, including the derivation of Eq. (75), see e.g. EM Chapter 6. 

49 A detailed discussion of the Bose-Einstein condensation may be found, e.g., in SM Sec. 3.4. 

50 For more on this notion see, e.g., SM Sec. 6.3.

where $I_c$ is some constant, dependent on the weak link’s strength.51 Now combining Eqs. (73) and (74), we see that if the applied voltage $V$ is constant in time, the current oscillates sinusoidally, with the so-called Josephson frequency

$$\omega_J = \frac{2e}{\hbar}V, \qquad (1.75)$$

as high as $\omega_J/2\pi \approx 484$ MHz per microvolt of applied dc voltage. This effect may be readily observed experimentally: though its direct detection is a bit tricky, it is easy to observe the phase locking (synchronization)52 of the Josephson oscillations by an external microwave signal of frequency $\omega$. Such phase locking results in the relation $\omega_J = n\omega$ being fulfilled within certain dc current intervals, and hence in the formation, on the weak link’s dc $I$-$V$ curve, of virtually vertical current steps at dc voltages

$$V_n = n\frac{\hbar\omega}{2e}, \qquad (1.76)$$

where $n$ is an integer.53 Since frequencies may be stabilized and measured with very high precision, this effect is being used in highly accurate standards of dc voltage.
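The numbers quoted here follow directly from Eqs. (75) and (76). The sketch below uses the exact SI values of $e$ and $h$ (fixed by the 2019 redefinition of the SI); the 10 GHz drive frequency in the usage lines is an arbitrary example.

```python
import math

# Exact SI values (fixed by the 2019 redefinition of the SI)
E_CHARGE = 1.602176634e-19      # elementary charge, C
H_PLANCK = 6.62607015e-34       # Planck constant, J*s
HBAR = H_PLANCK / (2 * math.pi)


def josephson_frequency_hz(voltage):
    """Eq. (75) expressed as a cyclic frequency: f_J = omega_J / 2pi = 2eV/h."""
    return 2 * E_CHARGE * voltage / H_PLANCK


def step_voltage(n, drive_freq_hz):
    """Eq. (76): V_n = n hbar omega / 2e, with omega = 2 pi f the drive frequency."""
    omega = 2 * math.pi * drive_freq_hz
    return n * HBAR * omega / (2 * E_CHARGE)


print(josephson_frequency_hz(1e-6) / 1e6)                  # MHz per microvolt, ~483.6
print([step_voltage(n, 10e9) * 1e6 for n in (1, 2, 3)])    # microvolts, 10 GHz drive
```

A 10 GHz drive thus produces steps spaced by about 20.7 µV, which is why practical voltage standards use long series arrays of junctions.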


1.7. Spatial dependence 


In contrast to the simple and universal time dependence (62) of the stationary states, the spatial distributions of their wavefunctions $\psi_n(\mathbf r)$ need to be calculated from the problem-specific stationary Schrödinger equation (65). The solution of this equation for various particular cases is a major focus of the next two chapters. For now, let us consider just the simplest example, which nevertheless will be the basis for our discussion of more complex problems: let a particle be confined inside a rectangular hard-wall box. Such confinement may be described by the following potential energy profile:54

$$U(\mathbf r) = \begin{cases} 0, & \text{for } 0 < x < a_x,\ 0 < y < a_y, \text{ and } 0 < z < a_z, \\ +\infty, & \text{otherwise.} \end{cases} \qquad (1.77)$$


The only way to keep the product $U(\mathbf r)\psi_n$ in Eq. (65) finite outside the box is to have $\psi_n = 0$ in these regions. Also, the function has to be continuous everywhere, to avoid the divergence of the kinetic-energy term $(-\hbar^2/2m)\nabla^2\psi_n$. Hence, in this case we may solve the stationary Schrödinger equation (60) just inside the box, i.e. with $U = 0$, so that it takes a simple form

$$-\frac{\hbar^2}{2m}\nabla^2\psi_n = E_n\psi_n, \qquad (1.78a)$$

with zero boundary conditions on all the walls.55 For our particular geometry, it is natural to express the Laplace operator in the Cartesian coordinates $\{x, y, z\}$ aligned with the box sides, with the origin at one of the corners of its rectangular $a_x\times a_y\times a_z$ volume, so that our boundary problem becomes:

$$-\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)\psi_n = E_n\psi_n, \quad \text{for } 0 < x < a_x,\ 0 < y < a_y, \text{ and } 0 < z < a_z,$$
$$\text{with } \psi_n = 0 \text{ for: } x = 0 \text{ and } a_x;\ y = 0 \text{ and } a_y;\ z = 0 \text{ and } a_z. \qquad (1.78b)$$

51 In some cases, the function $I(\varphi)$ may somewhat deviate from Eq. (74), but these deviations do not affect its fundamental $2\pi$-periodicity, and hence the fundamental relations (75)-(76). (No corrections to them have been found yet.)

52 For the discussion of this very general effect, see, e.g., CM Sec. 5.4.

53 The size of these dc current steps may be readily calculated from Eqs. (73) and (74). Let me leave this task for the reader’s exercise.

54 Another common name for such potentials, especially of lower dimensionality, is the potential well, in our current case a “rectangular” one: with a flat “bottom” and vertical, infinitely high “walls”. Note that sometimes, very unfortunately, such potential profiles are called “quantum wells”. (This term seems to imply that the particle’s confinement in such a well is a phenomenon specific to quantum mechanics. However, as we will repeatedly see in this course, the opposite is true: quantum effects do all they can to overcome the particle’s confinement in a potential well, letting it partly penetrate into the “classically forbidden” regions beyond the well’s walls.)


This problem may be readily solved using the same variable separation method as was used in Sec. 5, now to separate the Cartesian spatial variables from each other, by looking for a partial solution of Eq. (78) in the form

$$\psi(\mathbf r) = X(x)Y(y)Z(z). \qquad (1.79)$$

(Let us postpone assigning function indices for a minute.) Plugging this expression into Eq. (78b) and dividing all terms by the product $XYZ$, we get

$$-\frac{\hbar^2}{2m}\frac{1}{X}\frac{d^2X}{dx^2} - \frac{\hbar^2}{2m}\frac{1}{Y}\frac{d^2Y}{dy^2} - \frac{\hbar^2}{2m}\frac{1}{Z}\frac{d^2Z}{dz^2} = E. \qquad (1.80)$$


Now let us repeat the standard argumentation of the variable separation method: since each term on the left-hand side of this equation may be only a function of the corresponding argument, the equality is possible only if each of them is a constant, in our case with the dimensionality of energy. Calling these constants $E_x$, etc., we get three similar 1D equations

$$-\frac{\hbar^2}{2m}\frac{1}{X}\frac{d^2X}{dx^2} = E_x, \qquad -\frac{\hbar^2}{2m}\frac{1}{Y}\frac{d^2Y}{dy^2} = E_y, \qquad -\frac{\hbar^2}{2m}\frac{1}{Z}\frac{d^2Z}{dz^2} = E_z, \qquad (1.81)$$

with Eq. (80) turning into the following energy-matching condition:

$$E_x + E_y + E_z = E. \qquad (1.82)$$

All three ordinary differential equations (81), and their solutions, are similar. For example, for $X(x)$, we have the following 1D Helmholtz equation

$$\frac{d^2X}{dx^2} + k_x^2X = 0, \qquad \text{with } k_x^2 \equiv \frac{2mE_x}{\hbar^2}, \qquad (1.83)$$

and simple boundary conditions: $X(0) = X(a_x) = 0$. Let me hope that the reader knows how to solve this well-known 1D boundary problem, describing, for example, the usual mechanical waves on a guitar string. The problem allows an infinite number of sinusoidal standing-wave eigenfunctions,56


$$X \propto \sin k_x x, \quad \text{with } k_x = \frac{\pi n_x}{a_x}, \quad \text{so that } X = \left(\frac{2}{a_x}\right)^{1/2}\sin\frac{\pi n_x x}{a_x}, \quad \text{with } n_x = 1, 2, \ldots, \qquad (1.84)$$

corresponding to the following eigenenergies:

$$E_x = \frac{\hbar^2k_x^2}{2m} = \frac{\pi^2\hbar^2}{2ma_x^2}n_x^2 \equiv E_1 n_x^2. \qquad (1.85)$$

55 Rewritten as $\nabla^2 f + k^2 f = 0$, Eq. (78a) is just the Helmholtz equation, which describes waves of any nature (with the wave vector $\mathbf k$) in a uniform, isotropic, linear medium; see, e.g., EM Secs. 7.5-7.9 and 8.5.

56 The front coefficient in the last expression for $X$ ensures the (ortho)normality condition (66).


Figure 8 shows these simple results, using a somewhat odd but very graphic and common 
representation, in that the eigenenergy values (frequently called the energy levels) are used as horizontal 
axes for plotting the eigenfunctions — despite their completely different dimensionality. 


[Fig. 1.8. The lowest eigenfunctions (solid lines) and eigenvalues (dashed lines) of Eq. (83) for a potential well of length $a_x$. Solid black lines show the effective potential energy profile for this 1D eigenproblem. (Horizontal axis: $x$; vertical axes: normalized energy $E$ and eigenfunctions $X(x)$.)]
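To attach physical numbers to Eqs. (84) and (85), the sketch below evaluates the lowest levels for an electron in a hypothetical 1-nm-long well and checks the orthonormality (66) of the eigenfunctions (84) by numerical integration. The well length and the helper names are illustrative assumptions.

```python
import numpy as np

HBAR = 1.054571817e-34          # J*s
M_ELECTRON = 9.1093837015e-31   # kg
EV = 1.602176634e-19            # J


def well_level(n, a, m=M_ELECTRON):
    """Eq. (85): E = pi^2 hbar^2 n^2 / (2 m a^2) for a 1D hard-wall well of length a."""
    return (np.pi * HBAR * n) ** 2 / (2 * m * a**2)


def well_eigenfunction(n, x, a):
    """Eq. (84): X_n(x) = (2/a)^(1/2) sin(pi n x / a)."""
    return np.sqrt(2.0 / a) * np.sin(np.pi * n * x / a)


a = 1e-9                                          # hypothetical 1-nm well
levels_ev = [well_level(n, a) / EV for n in (1, 2, 3)]
print(levels_ev)                                  # ~0.376 eV, then 4x and 9x that

# Orthonormality check, Eq. (66): the integrand vanishes at both ends,
# so the simple Riemann sum below coincides with the trapezoidal rule.
x, dx = np.linspace(0.0, a, 20001, retstep=True)
gram = np.array([[np.sum(well_eigenfunction(n, x, a) * well_eigenfunction(p, x, a)) * dx
                  for p in (1, 2, 3)] for n in (1, 2, 3)])
print(np.round(gram, 6))                          # close to the 3x3 identity matrix
```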


Due to the similarity of all Eqs. (81), $Y(y)$ and $Z(z)$ are absolutely similar functions of their arguments, and may also be numbered by integers (say, $n_y$ and $n_z$) independent of $n_x$, so that the spectrum of values of the total energy (82) is

$$E = \frac{\pi^2\hbar^2}{2m}\left(\frac{n_x^2}{a_x^2} + \frac{n_y^2}{a_y^2} + \frac{n_z^2}{a_z^2}\right). \qquad (1.86)$$

Thus, in this 3D problem, the role of the index $n$ in the general Eq. (69) is played by a set of three independent integers $\{n_x, n_y, n_z\}$. In quantum mechanics, such integers play a key role and thus have a special name, the quantum numbers. Using them, that general solution for our current simple problem may be represented as the sum

$$\Psi(\mathbf r, t) = \sum_{n_x, n_y, n_z = 1}^{\infty} c_{n_x,n_y,n_z}\left(\frac{8}{a_xa_ya_z}\right)^{1/2}\sin\frac{\pi n_x x}{a_x}\sin\frac{\pi n_y y}{a_y}\sin\frac{\pi n_z z}{a_z}\exp\left\{-i\frac{E_{n_x,n_y,n_z}}{\hbar}t\right\}, \qquad (1.87)$$

with the front coefficients that may be readily calculated from the initial wavefunction $\Psi(\mathbf r, 0)$, using Eq. (68), again with the replacement $n \to \{n_x, n_y, n_z\}$.


This simplest problem is a good illustration of typical results the wave mechanics gives for spatially-confined motion, including the discrete energy spectrum, and (in this case, evidently) orthogonal eigenfunctions. Perhaps most importantly, its solution shows that the lowest value of the particle’s kinetic energy (86), reached in the so-called ground state (in our case, the state with $n_x = n_y = n_z = 1$), is above zero for any finite size of the confining box.
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An example of the opposite case of a continuous spectrum for the unconfined motion of a free 
particle is given by the plane waves (29). With the account of relations E = fiw and p = hk, such 
wavefunction may be viewed as the product of the time-dependent factor (62) by the eigenfunction, 


V, = 4, exp{ik-r}, (1.88) 


which is the solution of the stationary Schrödinger equation (78a) if it is valid in the whole space.⁵⁷ The
reader should not be worried too much by the fact that the fundamental solution (88) in free space is a
traveling wave (having, in particular, a non-zero value of the probability current j), while those inside a
quantum box are standing waves, with j = 0, even though the free space may be legitimately considered
as the ultimate limit of a quantum box with volume V = a_x×a_y×a_z → ∞. Indeed, due to the linearity of
wave mechanics, two traveling-wave solutions (88) with equal and opposite values of the momentum
(and hence with the same energy) may be readily combined to give a standing-wave solution,⁵⁸ for
example, exp{ik·r} + exp{−ik·r} = 2cos(k·r), with the net current j = 0. Thus, depending on the
convenience for a particular problem, we may represent its general solution as a sum of either
traveling-wave or standing-wave eigenfunctions. Since in the unlimited free space there are no boundary
conditions to satisfy, the Cartesian components of the wave vector k in Eq. (88) can take any real
values. (This is why it is more convenient to label these wavefunctions, and the corresponding
eigenenergies,

$$E_{\mathbf k}=\frac{\hbar^2k^2}{2m}\ge 0, \qquad (1.89)$$


with their wave vector k rather than an integer index.) 
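The claim that a traveling wave carries a current while the standing-wave combination does not can be checked numerically. The sketch below is a 1D illustration that assumes the standard expression j = (ħ/m) Im(ψ* ∂ψ/∂x) for the probability current, units ħ = m = 1, and a central finite-difference derivative (numpy.gradient); the helper name is hypothetical.

```python
import numpy as np

HBAR = 1.0   # illustrative units (assumption)
MASS = 1.0


def current_density(psi, x):
    """1D probability current j = (hbar/m) Im(psi* dpsi/dx), with a finite-difference derivative."""
    return (HBAR / MASS) * np.imag(np.conj(psi) * np.gradient(psi, x))


x = np.linspace(0.0, 10.0, 20001)
k = 2.0
traveling = np.exp(1j * k * x)                           # exp{ikx}
standing = np.exp(1j * k * x) + np.exp(-1j * k * x)      # = 2 cos(kx)

j_traveling = current_density(traveling, x)
j_standing = current_density(standing, x)
print("traveling wave: mean j =", j_traveling[1:-1].mean(), " (hbar k / m =", HBAR * k / MASS, ")")
print("standing wave:  max |j| =", np.abs(j_standing).max())
```

The edge points are excluded from the traveling-wave average because numpy.gradient uses less accurate one-sided differences there.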


However, one aspect of continuous-spectrum systems requires a bit more caution with
mathematics: the summation (69) should be replaced by the integration over a continuous index or
indices, in our current case the three Cartesian components of the vector k. The main rule of such
replacement may be readily extracted from Eq. (84): according to this relation, for standing-wave
solutions, the eigenvalues of k_x are equidistant, i.e. separated by equal intervals Δk_x = π/a_x, with similar
relations for the other two Cartesian components of the vector k. Hence the number of different eigenvalues of
the standing-wave vector k (with k_x, k_y, k_z ≥ 0) within a volume d³k ≫ 1/V of the k space is dN =
d³k/(Δk_xΔk_yΔk_z) = (V/π³)d³k. Frequently, it is more convenient to work with traveling waves (88); in this
case we should take into account that, as was just discussed, there are two different traveling wave
numbers (say, +k_x and −k_x) corresponding to each standing wave vector’s k_x > 0. Hence the same number
of physically different states corresponds to a 2³ = 8-fold larger k space or, equivalently, to an 8-fold
smaller number of states per unit volume d³k:

$$dN=\frac{V}{(2\pi)^3}\,d^3k. \qquad (1.90)$$


57 In some systems (e.g., a particle interacting with a potential well of a finite depth), a discrete energy spectrum 
within a certain energy interval may coexist with a continuous spectrum in a complementary interval. However, 
the conceptual philosophy of eigenfunctions and eigenvalues remains the same even in this case. 

58 This is, of course, the general property of waves of any physical nature, propagating in a linear medium — see, 
e.g., CM Sec. 6.5 and/or EM Sec. 7.3. 
For dN ≫ 1, this expression is independent of the boundary conditions, and is frequently
represented as the following summation rule

$$\sum_{\mathbf k}f(\mathbf k)\to\int f(\mathbf k)\,dN=\frac{V}{(2\pi)^3}\int f(\mathbf k)\,d^3k, \qquad (1.91)$$


where f(k) is an arbitrary function of k. Note that if the same wave vector k corresponds to several
internal quantum states (such as spin, see Chapter 4), the right-hand side of Eq. (91) requires its
multiplication by the corresponding degeneracy factor of orbital states.⁵⁹
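The counting behind Eqs. (90) and (91) can be checked by brute force. This Python sketch uses hypothetical helper names and a cubic box. It first counts standing-wave states with n_j ≥ 1 in one octant of k space and compares the result with Eq. (90) integrated over a full sphere. It then tests the summation rule (91) for a Gaussian f(k), using traveling waves with periodic boundary conditions. Those boundary conditions are an assumption made here for convenience; the text notes that for dN ≫ 1 the result does not depend on them.

```python
import numpy as np


def count_standing_states(side, k_max):
    """Count standing-wave states k = (pi/side) * (n_x, n_y, n_z), n_j >= 1, with |k| <= k_max,
    in a cubic hard-wall box of volume side**3."""
    n_max = int(k_max * side / np.pi) + 1
    n = np.arange(1, n_max + 1)
    nx, ny, nz = np.meshgrid(n, n, n, indexing="ij", sparse=True)
    k_sq = (np.pi / side) ** 2 * (nx**2 + ny**2 + nz**2)
    return int(np.count_nonzero(k_sq <= k_max**2))


def count_from_eq_1_90(side, k_max):
    """Integrate dN = V/(2 pi)^3 d^3k over the full sphere |k| <= k_max (traveling waves)."""
    return side**3 / (2 * np.pi) ** 3 * (4.0 / 3.0) * np.pi * k_max**3


def summation_rule_check(side, width=1.0, n_cut=200):
    """Compare sum_k f(k) over traveling-wave states k = (2 pi / side) n, n in Z^3
    (periodic boundary conditions, assumed here) with V/(2 pi)^3 * integral f(k) d^3k,
    for the Gaussian test function f(k) = exp(-k^2 width^2)."""
    n = np.arange(-n_cut, n_cut + 1)
    one_d = np.exp(-((2 * np.pi * n / side) * width) ** 2).sum()
    lattice_sum = one_d**3                       # f(k) factorizes over Cartesian components
    integral = side**3 / (2 * np.pi) ** 3 * (np.sqrt(np.pi) / width) ** 3
    return lattice_sum, integral


ratio = count_standing_states(1.0, 100 * np.pi) / count_from_eq_1_90(1.0, 100 * np.pi)
print(f"standing-wave count / Eq. (90): {ratio:.4f}")

for side in (2.0, 20.0):
    s, i = summation_rule_check(side)
    print(f"side = {side:4.1f}: lattice sum = {s:.6f}, integral = {i:.6f}")
```

The first ratio falls slightly below 1 because of a surface correction of relative order 1/(k_max a), which vanishes for large k_max a. At side = 2 only a few states fall within the width of f(k), and the summation rule fails badly. This illustrates the condition dN ≫ 1.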


1.8. Dimensionality reduction 


To conclude this introductory chapter, let me discuss the conditions under which the spatial
dimensionality of a wave-mechanical problem may be reduced.⁶⁰ Naively, one may think that if the
particle’s potential energy depends on just one spatial coordinate, say U = U(x, t), then its wavefunction
has to be one-dimensional as well: Ψ = Ψ(x, t). Our discussion of the particular case U = const in the
previous section shows that this assumption is wrong. Indeed, though this potential⁶¹ is just a special
case of the potential U(x, t), most of its eigenfunctions, given by Eqs. (87) or (88), do depend on the
other two coordinates. This is why the solutions Ψ(x, t) of the 1D Schrödinger equation

$$i\hbar\frac{\partial\Psi}{\partial t}=-\frac{\hbar^2}{2m}\frac{\partial^2\Psi}{\partial x^2}+U(x,t)\Psi, \qquad (1.92)$$

which follows from Eq. (65) by assuming ∂Ψ/∂y = ∂Ψ/∂z = 0, are insufficient to form the general
solution of Eq. (65) for this case.


This fact is easy to understand physically for the simplest case of a stationary 1D potential: U = 
U(x). The absence of the y- and z-dependence of the potential energy U may be interpreted as a potential 
well that is flat in two directions, y and z. Repeating the arguments of the previous section for this case, 
we see that the eigenfunctions of a particle in such a well have the form 


$$\psi(\mathbf r)=X(x)\exp\{i(k_yy+k_zz)\}, \qquad (1.93)$$

where X(x) is an eigenfunction of the following stationary 1D Schrödinger equation:

$$-\frac{\hbar^2}{2m}\frac{d^2X}{dx^2}+U_{\rm ef}(x)X=EX, \qquad (1.94)$$

where U_ef(x) is not the full potential energy of the particle, as it would follow from Eq. (92), but rather
its effective value including the kinetic energy of the lateral motion:

$$U_{\rm ef}=U+(E_y+E_z)=U+\frac{\hbar^2}{2m}\left(k_y^2+k_z^2\right). \qquad (1.95)$$


59 Such a factor is similar to the front factor 2 in Eq. (1) for the number of electromagnetic wave modes, in that case
describing two different polarizations of the waves with the same wave vector. 

60 Most textbooks on quantum mechanics jump to the formal solution of 1D problems without such discussion, 
ignoring the fact that such dimensionality restriction is adequate only under very specific conditions. 

61 Following tradition, I will frequently use this shorthand for “potential energy”, returning to the full term in
cases where there is any chance of confusion of this notion with another (say, electrostatic) potential. 
In plain English, the particle’s partial wavefunction X(x) and its full energy depend on its transverse
momenta, which have a continuous spectrum (see the discussion of Eq. (89)). This means that Eq. (92) is
adequate only if the condition k_y = k_z = 0 is somehow enforced, and in most physical problems it is not.
For example, if a de Broglie (or any other) plane wave Ψ(x, t) is incident on a potential step, it would be
reflected exactly back, i.e. with k_y = k_z = 0, only if the wall’s surface is perfectly plane and exactly
normal to the axis x. Any imperfection (and there are so many of them in real physical systems :-) may
cause excitation of waves with non-zero values of k_y and k_z, due to the continuous character of the
functions E_y(k_y) and E_z(k_z).⁶²


There is essentially one, perhaps counter-intuitive, way to make the 1D solutions “robust” to
small perturbations: to provide a rigid lateral confinement⁶³ in the two other directions. As the simplest
example, consider a narrow quantum wire (Fig. 9a), described by the following potential:

$$U(\mathbf r)=\begin{cases}U(x), & \text{for } 0<y<a_y \text{ and } 0<z<a_z,\\ +\infty, & \text{otherwise.}\end{cases} \qquad (1.96)$$


Fig. 1.9. Partial confinement in: (a) two dimensions, and (b) one dimension.


Performing the standard variable separation (79), we see that the corresponding stationary
Schrödinger equation is satisfied if the partial wavefunction X(x) obeys Eqs. (94)-(95), but now with a
discrete energy spectrum in the transverse directions:

$$E_y+E_z=\frac{\pi^2\hbar^2}{2m}\left(\frac{n_y^2}{a_y^2}+\frac{n_z^2}{a_z^2}\right). \qquad (1.97)$$

If the lateral confinement is tight, a_y, a_z → 0, then there is a large energy gap,

$$\Delta U\sim\frac{\pi^2\hbar^2}{2ma_{y,z}^2}, \qquad (1.98)$$

between the ground-state energy of the lateral motion (with n_y = n_z = 1) and that of all its excited states.
As a result, if the particle is initially placed into the lateral ground state, and its energy E is much
smaller than ΔU, it would stay in such a state, i.e. may be described by a 1D Schrödinger equation similar
to Eq. (92), even in the time-dependent case, if the characteristic frequency of energy variations is
much smaller than ΔU/ħ. Absolutely similarly, a strong lateral confinement in just one dimension (say,
z, see Fig. 9b) enables systems with a robust 2D evolution of the particle’s wavefunction.
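The size of the gap (98) can be read off directly from the transverse spectrum (97). This is a minimal sketch with hypothetical function names and units ħ = m = 1. It lists the lateral levels of a rectangular wire cross-section and takes the difference between the two lowest ones.

```python
import itertools
import math


def lateral_levels(a_y, a_z, n_max=5, hbar=1.0, m=1.0):
    """Sorted transverse levels E_y + E_z of Eq. (97) for a rectangular wire cross-section."""
    levels = []
    for n_y, n_z in itertools.product(range(1, n_max + 1), repeat=2):
        energy = (math.pi * hbar) ** 2 / (2 * m) * ((n_y / a_y) ** 2 + (n_z / a_z) ** 2)
        levels.append((energy, (n_y, n_z)))
    return sorted(levels)


def lateral_gap(a_y, a_z):
    """Gap between the lateral ground state (1, 1) and the lowest excited lateral state."""
    levels = lateral_levels(a_y, a_z)
    return levels[1][0] - levels[0][0]


unit = math.pi**2 / 2          # hbar^2 pi^2 / 2m in these units, the scale of Eq. (98)
for a in (1.0, 0.3, 0.1):
    gap = lateral_gap(a, a)
    print(f"a_y = a_z = {a}: gap = {gap:.2f}, gap * a^2 / (pi^2/2) = {gap * a**2 / unit:.3f}")
```

For a square cross-section the gap is exactly three times the scale in Eq. (98), and it grows as 1/a² when the wire is narrowed. For a rectangular cross-section the first excitation adds a node along the wider side, so the gap is set by the larger of a_y and a_z.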


62 This problem is not specific to quantum mechanics. The classical motion of a particle in a 1D potential may
also be unstable with respect to lateral perturbations, especially if the potential is time-dependent, i.e. capable of
exciting low-energy lateral modes.

63 The term “quantum confinement”, sometimes used to describe this phenomenon, is as unfortunate as the 
“quantum well” term discussed above, because of the same reason: the confinement is a purely classical effect, 
and as we will repeatedly see in this course, the quantum-mechanical effects reduce, rather than enable it. 
The tight lateral confinement may ensure the dimensionality reduction even if the potential well
is not exactly rectangular in the lateral direction(s), as described by Eq. (96), but is described by some
x- and t-independent profile, provided it still gives a sufficiently large energy gap ΔU. For example, many 2D
quantum phenomena, such as the quantum Hall effect,⁶⁴ have been studied experimentally using
electrons confined at semiconductor heterojunctions (e.g., epitaxial interfaces GaAs/Al_xGa_{1−x}As), where
the potential well in the direction perpendicular to the interface has a nearly triangular shape, and
provides an energy gap ΔU of the order of 10⁻² eV. This gap corresponds to k_BT with T ~ 100 K, so that
careful experimentation at liquid helium temperatures (4 K and below) may keep the electrons
performing purely 2D motion in the “lowest subband” (n_z = 1).
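The temperature scale quoted above follows from T = ΔU/k_B. This is a small sketch with hypothetical helper names, using the CODATA value of the Boltzmann constant in eV/K. The gap of 10⁻² eV is the order-of-magnitude figure from the text, not a measured value for any particular structure.

```python
K_B_EV_PER_K = 8.617333262e-5   # Boltzmann constant, eV/K (CODATA 2018)


def gap_temperature(gap_ev):
    """Temperature T (in K) at which k_B T equals an energy gap given in eV."""
    return gap_ev / K_B_EV_PER_K


def gap_to_thermal_ratio(gap_ev, temperature_k):
    """How many times the gap exceeds the thermal energy k_B T."""
    return gap_ev / (K_B_EV_PER_K * temperature_k)


print(f"Delta U = 1e-2 eV  <->  T = {gap_temperature(1e-2):.0f} K")
print(f"Delta U / k_B T at 4.2 K: {gap_to_thermal_ratio(1e-2, 4.2):.0f}")
```

At liquid-helium temperature the gap exceeds k_BT by more than an order of magnitude, which is why higher subbands stay essentially unpopulated.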


Finally, note that in systems with reduced dimensionality, Eq. (90) for the number of states at
large k (i.e., for an essentially free particle motion) should be replaced accordingly: in a 2D system of
area A ≫ 1/k²,

$$dN=\frac{A}{(2\pi)^2}\,d^2k, \qquad (1.99)$$

while in a 1D system of length l ≫ 1/k,

$$dN=\frac{l}{2\pi}\,dk, \qquad (1.100)$$

with the corresponding changes of the summation rule (91). This change has important implications for
the density of states on the energy scale, dN/dE: it is straightforward (and hence left for the reader :-) to
use Eqs. (90), (99), and (100) to show that for free 3D particles the density increases with E
(proportionally to E^{1/2}), for free 2D particles it does not depend on energy at all, while for free 1D
particles it scales as E^{−1/2}, i.e. decreases with energy.
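For readers who want a numerical cross-check of these scalings, here is a sketch under stated assumptions: units ħ = m = 1, a region of unit linear size, and hypothetical function names. It counts states below E from Eqs. (90), (99), and (100), differentiates numerically, and fits the exponent α in dN/dE ∝ E^α.

```python
import numpy as np

HBAR = 1.0   # illustrative units (assumption)
MASS = 1.0


def states_below(energy, dim, size=1.0):
    """Number of traveling-wave states with E_k <= energy in a d-dimensional region of linear
    size `size`, from dN = (size / 2 pi)^d d^d k, i.e. Eqs. (90), (99), and (100)."""
    k = np.sqrt(2 * MASS * energy) / HBAR
    k_ball = {1: 2 * k, 2: np.pi * k**2, 3: 4.0 / 3.0 * np.pi * k**3}[dim]  # measure of |k| <= k
    return (size / (2 * np.pi)) ** dim * k_ball


def dos_exponent(dim, e1=1.0, e2=4.0, h=1e-6):
    """Fit dN/dE ~ E^alpha between two energies, using central differences for dN/dE."""
    def dos(e):
        return (states_below(e + h, dim) - states_below(e - h, dim)) / (2 * h)
    return float(np.log(dos(e2) / dos(e1)) / np.log(e2 / e1))


for d in (1, 2, 3):
    print(f"d = {d}: dN/dE ~ E^{dos_exponent(d):+.3f}")
```

In 1D the "ball" |k| ≤ k is the interval of length 2k, which counts both propagation directions.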


1.9. Exercise problems 


1.1. The actual postulate made by N. Bohr in his original 1913 paper was not directly Eq. (8), but
the assumption that at quantum leaps between adjacent large (quasiclassical) orbits with n ≫ 1, the
hydrogen atom either emits or absorbs energy ΔE = ħω, where ω is its classical radiation frequency,
which according to classical electrodynamics is equal to the angular velocity of the electron’s rotation.⁶⁶ Prove that
this postulate is indeed compatible with Eqs. (7)-(8).


1.2. Use Eq. (53) to prove that the linear operators of quantum mechanics are commutative:

$$\hat A_1+\hat A_2=\hat A_2+\hat A_1,\qquad\text{and associative:}\qquad(\hat A_1+\hat A_2)+\hat A_3=\hat A_1+(\hat A_2+\hat A_3).$$


1.3. Prove that for any time-independent Hamiltonian operator Ĥ and two arbitrary complex
functions f(r) and g(r),

$$\int f^*(\mathbf r)\,\hat Hg(\mathbf r)\,d^3r=\int\left(\hat Hf(\mathbf r)\right)^*g(\mathbf r)\,d^3r.$$


64 To be discussed in Sec. 3.2. 
65 See, e.g., P. Harrison, Quantum Wells, Wires, and Dots, 3rd ed., Wiley, 2010.
66 See, e.g., EM Sec. 8.2. 
1.4. Prove that the Schrödinger equation (25) with the Hamiltonian operator given by Eq. (41) is
Galilean form-invariant, provided that the wavefunction is transformed as

$$\Psi'(\mathbf r',t')=\Psi(\mathbf r,t)\exp\left\{-i\frac{m\mathbf v\cdot\mathbf r}{\hbar}+i\frac{mv^2t}{2\hbar}\right\},$$

where the prime sign marks the variables measured in the reference frame 0′ that moves, without
rotation, with a constant velocity v relative to the “lab” frame 0. Give a physical interpretation of this
transformation.


1.5.* Prove the so-called Hellmann-Feynman theorem:⁶⁷

$$\frac{\partial E_n}{\partial\lambda}=\left\langle\frac{\partial\hat H}{\partial\lambda}\right\rangle_n,$$

where λ is some c-number parameter, on which the time-independent Hamiltonian Ĥ, and hence its
eigenenergies E_n, depend.


1.6. Use Eqs. (73) and (74) to analyze the effect of phase locking of Josephson oscillations on 
the dc current flowing through a weak link between two superconductors (frequently called the 
Josephson junction), assuming that an external source applies to the junction a sinusoidal ac voltage 
with frequency ω and amplitude A.


1.7. Calculate ⟨x⟩, ⟨p_x⟩, δx, and δp_x for the eigenstate {n_x, n_y, n_z} of a particle in a rectangular
hard-wall box described by Eq. (77), and compare the product δxδp_x with Heisenberg’s uncertainty
relation.


1.8. Looking at the lower (red) line in Fig. 8, it seems plausible that the 1D ground-state function
(84) of the simple potential well (77) may be well approximated with an inverted quadratic parabola:

$$\psi_{\rm trial}(x)=Cx(a_x-x),$$

where C is a normalization constant. Explore how good this approximation is.


1.9. A particle placed in a hard-wall rectangular box with sides a_x, a_y, and a_z is in its ground
state. Calculate the average force acting on each face of the box. Can the forces be characterized by a 
certain pressure? 


1.10. A 1D quantum particle was initially in the ground state of a very deep, rectangular
potential well of width a:

$$U(x)=\begin{cases}0, & \text{for } -a/2<x<+a/2,\\ +\infty, & \text{otherwise.}\end{cases}$$


67 Despite this common name, H. Hellmann (in 1937) and R. Feynman (in 1939) were not the first ones in the 
long list of physicists who had (apparently, independently) discovered this equality. Indeed, it has been traced 
back to a 1922 paper by W. Pauli, and was carefully proved by P. Güttinger in 1931.
At some instant, the well’s width is abruptly increased to a new value a’ > a, leaving the potential
symmetric with respect to the point x = 0, and then left constant. Calculate the probability that after the 
change, the particle is still in the ground state of the system. 


1.11. At t = 0, a 1D particle of mass m is placed into a hard-wall, flat-bottom potential well

$$U(x)=\begin{cases}0, & \text{for } 0<x<a,\\ +\infty, & \text{otherwise,}\end{cases}$$

in a 50/50 linear superposition of the lowest (ground) state and the first excited state. Calculate:
(i) the normalized wavefunction Ψ(x, t) for arbitrary time t > 0, and
(ii) the time evolution of the expectation value ⟨x⟩ of the particle’s coordinate.
1.12. Calculate the potential profiles U(x) for which the following wavefunctions,
(i) Ψ = c exp{−ax² − ibt}, and
(ii) Ψ = c exp{−a|x| − ibt}
(with real coefficients a > 0 and b), satisfy the 1D Schrödinger equation for a particle with mass m. For
each case, calculate ⟨x⟩, ⟨p_x⟩, δx, and δp_x, and compare the product δxδp_x with Heisenberg’s
uncertainty relation.


1.13. A 1D particle of mass m, moving in the field of a stationary potential U(x), has the
following eigenfunction

$$\psi(x)=\frac{C}{\cosh\kappa x},$$

where C is the normalization constant, and κ is a real constant. Calculate the function U(x) and the
state’s eigenenergy E.


1.14. Calculate the density dN/dE of traveling-wave quantum states in large rectangular potential 
wells of various dimensions: d= 1, 2, and 3. 


1.15.* Use the finite-difference method⁶⁸ with steps a/2 and a/3 to find as many eigenenergies as
possible for a 1D particle in the infinitely deep, hard-wall 1D potential well of width a. Compare the 
results with each other, and with the exact formula. 
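A minimal Python sketch for exploring Problem 1.15. It assumes units with ħ²/2m = 1 and a = 1, so the exact energies are π²n², and the standard three-point approximation ψ″ ≈ (ψ_{j+1} − 2ψ_j + ψ_{j−1})/h² with ψ = 0 at the walls. For this tridiagonal matrix the eigenvalues are also known in closed form, (4/h²)sin²(πnh/2a), which gives an independent check. The function names are hypothetical.

```python
import numpy as np


def fd_eigenenergies(n_steps, a=1.0):
    """Eigenvalues of -psi'' = E psi on [0, a] with psi(0) = psi(a) = 0, using the
    three-point finite-difference approximation with step h = a / n_steps.
    Units with hbar^2 / 2m = 1 are assumed, so the exact energies are (pi n / a)^2."""
    h = a / n_steps
    n_int = n_steps - 1                   # number of interior grid points (unknowns)
    if n_int < 1:
        return np.array([])
    matrix = (np.diag(np.full(n_int, 2.0))
              - np.diag(np.ones(n_int - 1), 1)
              - np.diag(np.ones(n_int - 1), -1)) / h**2
    return np.linalg.eigvalsh(matrix)     # ascending order


def exact_energies(count, a=1.0):
    return (np.pi * np.arange(1, count + 1) / a) ** 2


def closed_form_fd(n_steps, a=1.0):
    """Known eigenvalues of the same matrix: (4/h^2) sin^2(pi n h / 2a), n = 1 .. n_steps - 1."""
    h = a / n_steps
    n = np.arange(1, n_steps)
    return 4.0 / h**2 * np.sin(np.pi * n * h / (2 * a)) ** 2


for n_steps in (2, 3, 10):
    e_fd = fd_eigenenergies(n_steps)
    exact = exact_energies(min(3, e_fd.size))
    print(f"h = a/{n_steps}: FD {np.round(e_fd[:3], 3)}, exact {np.round(exact, 3)}")
```

With h = a/2 only one interior point remains, so only one eigenenergy (8, versus π² ≈ 9.87) can be found. With h = a/3 there are two (9 and 27, versus 9.87 and 39.5). Since sin y < y, this scheme always underestimates the energies.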


68 You may like to start by reading about the finite-difference method — see, e.g., CM Sec. 8.5 or EM Sec. 2.11. 
Chapter 2. 1D Wave Mechanics


Even the simplest, 1D version of wave mechanics enables quantitative analysis of many important 
quantum-mechanical effects. The order of their discussion in this chapter is dictated mostly by 
mathematical convenience — going from the simplest potential profiles to more complex ones, so that we 
may build upon the previous results. However, I would advise the reader to focus less on the math
and more on the physics of the non-classical phenomena it describes, ranging from particle penetration
into classically-forbidden regions, to quantum-mechanical tunneling, to the metastable state decay, to 
covalent bonding and quantum oscillations, to energy bands and gaps. 


2.1. Basic relations 


As was discussed at the end of Chapter 1, in several cases (in particular, at strong confinement
within the [y, z] plane), the general (3D) Schrödinger equation may be reduced to its 1D version, similar
to Eq. (1.92):

$$i\hbar\frac{\partial\Psi(x,t)}{\partial t}=-\frac{\hbar^2}{2m}\frac{\partial^2\Psi(x,t)}{\partial x^2}+U(x,t)\Psi(x,t). \qquad (2.1)$$

It is important, however, to remember that according to the discussion in Sec. 1.8, U(x, t) in this
equation is generally an effective potential energy, which may include the energy of the lateral motion,
while Ψ(x, t) may be just one factor in the complete wavefunction Ψ(x, t)χ(y, z). If the transverse factor
χ(y, z) is normalized to 1, then the integration of Eq. (1.22a) over the 3D space within a segment [x₁, x₂]
gives the following probability to find the particle on this segment:

$$W(t)=\int_{x_1}^{x_2}\Psi^*(x,t)\Psi(x,t)\,dx. \qquad (2.2)$$


If the particle under analysis is definitely somewhere inside the system, the normalization of its 1D
wavefunction Ψ(x, t) is provided by extending the integral (2) to the whole axis x:

$$\int_{-\infty}^{+\infty}w(x,t)\,dx=1,\qquad\text{where } w(x,t)\equiv\Psi^*(x,t)\Psi(x,t). \qquad (2.3)$$

A similar integration of Eq. (1.23) shows that the expectation value of any observable depending only
on the coordinate x (and possibly time) may be expressed as

$$\langle A\rangle(t)=\int_{-\infty}^{+\infty}\Psi^*(x,t)\,\hat A\,\Psi(x,t)\,dx. \qquad (2.4)$$

It is also useful to introduce the notion of the probability current along the x-axis (a scalar):

$$I(x,t)\equiv\int j_x\,dy\,dz=\frac{\hbar}{m}\,\mathrm{Im}\left(\Psi^*(x,t)\frac{\partial\Psi(x,t)}{\partial x}\right), \qquad (2.5)$$

where j_x is the x-component of the current density vector j(r, t). Then the continuity equation (1.48) for
any segment [x₁, x₂] takes the form


© K. Likharev 
$$\frac{dW}{dt}+I(x_2)-I(x_1)=0. \qquad (2.6)$$


The above formulas are sufficient for the analysis of 1D problems of wave mechanics, but before
proceeding to particular cases, let me deliver on my earlier promise to prove that Heisenberg’s
uncertainty relation (1.35) is indeed valid for any wavefunction Ψ(x, t). For that, let us consider the
following positive (or at least non-negative) integral

$$J(\lambda)\equiv\int_{-\infty}^{+\infty}\left|x\Psi+\lambda\frac{\partial\Psi}{\partial x}\right|^2dx\ge 0, \qquad (2.7)$$

where λ is an arbitrary real constant, and assume that at x → ±∞ the wavefunction vanishes, together
with its first derivative (as we will see below, a very common case). Then the left-hand side of Eq. (7)
may be recast as


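$$J(\lambda)=\int\left(x\Psi+\lambda\frac{\partial\Psi}{\partial x}\right)\left(x\Psi^*+\lambda\frac{\partial\Psi^*}{\partial x}\right)dx=\int x^2\Psi\Psi^*\,dx+\lambda\int x\left(\Psi\frac{\partial\Psi^*}{\partial x}+\Psi^*\frac{\partial\Psi}{\partial x}\right)dx+\lambda^2\int\frac{\partial\Psi}{\partial x}\frac{\partial\Psi^*}{\partial x}\,dx. \qquad (2.8)$$

According to Eq. (4), the first term in the last form of Eq. (8) is just ⟨x²⟩, while the second and the third
integrals may be worked out by parts:

$$\int x\left(\Psi\frac{\partial\Psi^*}{\partial x}+\Psi^*\frac{\partial\Psi}{\partial x}\right)dx=\int x\frac{\partial}{\partial x}\left(\Psi\Psi^*\right)dx=\int_{x=-\infty}^{x=+\infty}x\,d\left(\Psi\Psi^*\right)=-\int\Psi\Psi^*\,dx=-1, \qquad (2.9)$$

$$\int\frac{\partial\Psi}{\partial x}\frac{\partial\Psi^*}{\partial x}\,dx=\int_{x=-\infty}^{x=+\infty}\frac{\partial\Psi}{\partial x}\,d\Psi^*=-\int\Psi^*\frac{\partial^2\Psi}{\partial x^2}\,dx=\frac{1}{\hbar^2}\int\Psi^*\left(-i\hbar\frac{\partial}{\partial x}\right)^2\Psi\,dx=\frac{\langle p_x^2\rangle}{\hbar^2}. \qquad (2.10)$$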


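As a result, Eq. (7) takes the following form:

$$J(\lambda)=\langle x^2\rangle-\lambda+\lambda^2\frac{\langle p_x^2\rangle}{\hbar^2}\ge 0,\quad\text{i.e.}\quad\lambda^2+a\lambda+b\ge 0,\quad\text{with } a\equiv-\frac{\hbar^2}{\langle p_x^2\rangle},\quad b\equiv\frac{\hbar^2\langle x^2\rangle}{\langle p_x^2\rangle}. \qquad (2.11)$$

This inequality should be valid for any real λ, so that the corresponding quadratic equation, λ² + aλ + b
= 0, can have either one (degenerate) real root or no real roots at all. This is only possible if its
discriminant, a² − 4b, is non-positive, leading to the following requirement:

$$\langle x^2\rangle\langle p_x^2\rangle\ge\frac{\hbar^2}{4}. \qquad (2.12)$$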


In particular, if ⟨x⟩ = 0 and ⟨p_x⟩ = 0,¹ then according to Eq. (1.33), Eq. (12) takes the form


1 Eq. (13) may be proved even if ⟨x⟩ and ⟨p_x⟩ are not equal to zero, by making the following replacements: x → x −
⟨x⟩ and ∂/∂x → ∂/∂x − i⟨p_x⟩/ħ in Eq. (7), and then repeating all the calculations, which in this case become
somewhat bulky. In Chapter 4, equipped with the bra-ket formalism, we will derive a more general uncertainty
relation, which includes Heisenberg’s relation (13) as a particular case, in a more efficient way.
$$\left\langle(x-\langle x\rangle)^2\right\rangle\left\langle(p_x-\langle p_x\rangle)^2\right\rangle\ge\frac{\hbar^2}{4}, \qquad (2.13)$$


which, according to the definition (1.34) of the r.m.s. uncertainties, is equivalent to Eq. (1.35). 
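The inequality (12) is easy to test numerically. This sketch uses hypothetical helper names and units ħ = 1. It evaluates ⟨x²⟩ and ⟨p_x²⟩ = ħ²∫|∂Ψ/∂x|²dx (the middle form of Eq. (10)) for two real, even wavefunctions, for which ⟨x⟩ = ⟨p_x⟩ = 0. The Gaussian should reach the bound ħ²/4. The ground state of a hard-wall well of width 1 centered at x = 0 should exceed it; for that state the product equals π²/12 − 1/2 ≈ 0.32.

```python
import numpy as np

HBAR = 1.0   # illustrative units (assumption)


def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y(x)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def uncertainty_product(psi, x):
    """<x^2><p_x^2> for a wavefunction with <x> = <p_x> = 0 (e.g. real and even),
    using <p_x^2> = hbar^2 * integral |dpsi/dx|^2 dx, the middle form of Eq. (10)."""
    psi = psi / np.sqrt(trapezoid(np.abs(psi) ** 2, x))      # normalize as in Eq. (3)
    x2 = trapezoid(x**2 * np.abs(psi) ** 2, x)
    p2 = HBAR**2 * trapezoid(np.abs(np.gradient(psi, x)) ** 2, x)
    return x2 * p2


x = np.linspace(-12.0, 12.0, 48001)
gaussian = np.exp(-x**2 / 4.0)                    # Gaussian with delta x = 1
xb = np.linspace(-0.5, 0.5, 20001)
box_ground = np.cos(np.pi * xb)                  # hard-wall well of width 1, centered at 0

print("Gaussian:        ", uncertainty_product(gaussian, x), " bound hbar^2/4 =", HBAR**2 / 4)
print("box ground state:", uncertainty_product(box_ground, xb), " exact:", np.pi**2 / 12 - 0.5)
```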


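Now let us notice that Heisenberg’s uncertainty relation looks very similar to the
commutation relation between the corresponding operators:

$$\left(\hat x\hat p_x-\hat p_x\hat x\right)\Psi(x,t)=x\left(-i\hbar\frac{\partial\Psi}{\partial x}\right)-\left(-i\hbar\frac{\partial}{\partial x}\right)\left(x\Psi\right)=i\hbar\Psi(x,t). \qquad (2.14a)$$

Since this relation is valid for any wavefunction Ψ(x, t), it may be represented as an operator equality:

$$\left[\hat x,\hat p_x\right]=i\hbar\neq 0. \qquad (2.14b)$$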


In Sec. 4.5 we will see that the relation between Eqs. (13) and (14) is just a particular case of a general 
relation between the expectation values of non-commuting operators, and their commutators. 
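The operator identity (14b) can also be seen on a grid. In this sketch (units ħ = 1, hypothetical helper names) p̂_x = −iħ∂/∂x is approximated by central differences. A short calculation shows that the discrete commutator then returns iħ times the average of the two neighboring samples, so it reproduces iħΨ up to an O(h²) error for any smooth test function.

```python
import numpy as np

HBAR = 1.0   # illustrative units (assumption)


def p_op(psi, x):
    """Momentum operator p = -i hbar d/dx, with the derivative taken by finite differences."""
    return -1j * HBAR * np.gradient(psi, x)


def commutator_x_p(psi, x):
    """Apply [x, p] = x p - p x to psi."""
    return x * p_op(psi, x) - p_op(x * psi, x)


x = np.linspace(-10.0, 10.0, 40001)
psi = np.exp(-x**2 / 2 + 1.5j * x) * (1 + 0.3 * x)     # an arbitrary smooth test function
result = commutator_x_p(psi, x)
interior = slice(1, -1)                                  # skip the one-sided edge derivatives
err = np.max(np.abs(result[interior] - 1j * HBAR * psi[interior]))
print("max |[x,p]psi - i hbar psi| =", err)
```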


2.2. Free particle: Wave packets 


Let us start our discussion of particular problems with the free 1D motion, i.e. with U(x, t) = 0.
From Eq. (1.29), it is evident that in the 1D case, a similar “fundamental” (i.e. a particular but the most
important) solution of the Schrödinger equation (1) is a sinusoidal (“monochromatic”) wave

$$\Psi_0(x,t)=\text{const}\times\exp\{i(k_0x-\omega_0t)\}. \qquad (2.15)$$

According to Eqs. (1.32), it describes a particle with a definite momentum² p₀ = ħk₀ and energy E₀ = ħω₀
= ħ²k₀²/2m. However, for this wavefunction, the product Ψ*Ψ depends on neither x nor t, so that
the particle is completely delocalized, i.e. the probability of finding it is the same along the whole axis x, at all times.


In order to describe a space-localized state, let us form, at the initial moment of time (t = 0), a
wave packet of the type shown in Fig. 1.6, by multiplying the sinusoidal waveform (15) by some smooth
envelope function A(x). As the most important particular example, consider the Gaussian wave packet

$$\Psi(x,0)=A(x)e^{ik_0x},\qquad\text{with } A(x)=\frac{1}{(2\pi)^{1/4}(\delta x)^{1/2}}\exp\left\{-\frac{x^2}{(2\delta x)^2}\right\}. \qquad (2.16)$$

(By the way, Fig. 1.6a shows exactly such a packet.) The pre-exponential factor in this envelope
function has been selected so that the initial probability density,

$$w(x,0)\equiv\Psi^*(x,0)\Psi(x,0)=A^*(x)A(x)=\frac{1}{(2\pi)^{1/2}\delta x}\exp\left\{-\frac{x^2}{2(\delta x)^2}\right\}, \qquad (2.17)$$

is normalized as in Eq. (3), for any parameters δx and k₀.³
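A quick numerical check that the density (17) is normalized and that δx is its r.m.s. width, for several δx and k₀. This sketch assumes a finite integration window wide enough for the Gaussian tails to be negligible and uses the trapezoidal rule; the helper names are hypothetical.

```python
import numpy as np


def gaussian_packet(x, dx, k0):
    """Initial Gaussian packet of Eq. (16): A(x) exp{i k0 x},
    with A(x) = exp{-x^2 / (2 dx)^2} / ((2 pi)^{1/4} dx^{1/2})."""
    envelope = np.exp(-x**2 / (2.0 * dx) ** 2) / ((2 * np.pi) ** 0.25 * np.sqrt(dx))
    return envelope * np.exp(1j * k0 * x)


def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y(x)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def norm_and_width(dx, k0, x=np.linspace(-40.0, 40.0, 80001)):
    w = np.abs(gaussian_packet(x, dx, k0)) ** 2          # probability density, Eq. (17)
    norm = trapezoid(w, x)
    width = np.sqrt(trapezoid(x**2 * w, x) / norm)       # r.m.s. width; <x> = 0 here
    return norm, width


for dx, k0 in [(1.0, 0.0), (2.5, 7.0), (0.3, -3.0)]:
    norm, width = norm_and_width(dx, k0)
    print(f"dx = {dx}, k0 = {k0}: norm = {norm:.8f}, r.m.s. width = {width:.8f}")
```

As expected, k₀ drops out of |Ψ|², since |e^{ik₀x}| = 1.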


2 From this point on to the end of this chapter, I will drop the index x in the x-components of the vectors k and p. 

3 This fact may be readily proven using the well-known integral of the Gaussian function (17), in infinite limits;
see, e.g., MA Eq. (6.9b). It is also straightforward to use MA Eq. (6.9c) to prove that for the wave packet (16), the
parameter δx is indeed the r.m.s. uncertainty (1.34) of the coordinate x, thus justifying its notation.
To explore the evolution of this wave packet in time, we could try to solve Eq. (1) with the initial
condition (16) directly, but in the spirit of the discussion in Sec. 1.5, it is easier to proceed differently.
Let us first represent the initial wavefunction (16) as a sum (1.67) of the eigenfunctions ψ_k(x) of the
corresponding stationary 1D Schrödinger equation (1.60), in our current case

$$-\frac{\hbar^2}{2m}\frac{d^2\psi_k}{dx^2}=E_k\psi_k,\qquad\text{with } E_k=\frac{\hbar^2k^2}{2m}, \qquad (2.18)$$

which are simply monochromatic waves,

$$\psi_k=a_ke^{ikx}. \qquad (2.19)$$

Since (as was discussed in Sec. 1.7) for unconstrained motion the spectrum of possible wave numbers
k is continuous, the sum (1.67) should be replaced with an integral:⁴

$$\Psi(x,0)=\int a_ke^{ikx}\,dk. \qquad (2.20)$$


Now let us notice that from the point of view of mathematics, Eq. (20) is just the usual Fourier
transform from the variable k to the “conjugate” variable x, and we can use the well-known formula of
the reciprocal Fourier transform to write

$$a_k=\frac{1}{2\pi}\int\Psi(x,0)e^{-ikx}\,dx=\frac{1}{2\pi}\frac{1}{(2\pi)^{1/4}(\delta x)^{1/2}}\int\exp\left\{-\frac{x^2}{(2\delta x)^2}-i\tilde kx\right\}dx,\qquad\text{where }\tilde k\equiv k-k_0. \qquad (2.21)$$

This Gaussian integral may be worked out by the following standard method, which will be used many
times in this course. Let us complement the exponent to the full square of a linear combination of x and
k̃, adding a compensating term independent of x:

$$\frac{x^2}{(2\delta x)^2}+i\tilde kx=\frac{1}{(2\delta x)^2}\left[x+2i(\delta x)^2\tilde k\right]^2+\tilde k^2(\delta x)^2. \qquad (2.22)$$

Since the integration on the right-hand side of Eq. (21) should be performed at constant k̃, in the infinite
limits of x, its result would not change if we replace dx by dx′ ≡ d[x + 2i(δx)²k̃]. As a result, we get:⁵

$$a_k=\frac{1}{2\pi}\frac{1}{(2\pi)^{1/4}(\delta x)^{1/2}}\exp\left\{-\tilde k^2(\delta x)^2\right\}\int\exp\left\{-\frac{x'^2}{(2\delta x)^2}\right\}dx'=\left(\frac{1}{2\pi}\right)^{1/2}\frac{1}{(2\pi)^{1/4}(\delta k)^{1/2}}\exp\left\{-\frac{\tilde k^2}{(2\delta k)^2}\right\}, \qquad (2.23)$$

so that a_k also has a Gaussian distribution, now along the k-axis, centered at the value k₀ (Fig. 1.6b),
with the constant δk defined as


4 For the notation brevity, from this point on the infinite limit signs will be dropped in all 1D integrals. 

5 The fact that the argument’s shift is imaginary is not important. Indeed, since the function under the integral
tends to zero at Re x′ = Re x → ±∞, the difference between infinite integrals of this function along the axes of x and
x′ is equal to its contour integral around the rectangular area Im x ≤ Im z ≤ Im x′. Since the function is also analytical, it
obeys the Cauchy theorem MA Eq. (15.1), which says that this contour integral equals zero.
$$\delta k\equiv\frac{1}{2\delta x}. \qquad (2.24)$$
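Eqs. (23)-(24) can be confirmed by evaluating the Fourier integral (21) numerically. This sketch uses the illustrative parameters δx = 1.5 and k₀ = 4, and trapezoidal quadrature on a finite x-window. It compares a_k obtained by quadrature with the closed form (23), and the r.m.s. width of |a_k|² with δk = 1/2δx. The helper name is hypothetical.

```python
import numpy as np


def trapezoid_rows(y, x):
    """Trapezoidal rule along the last axis of y."""
    return np.sum((y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1) / 2.0


dx, k0 = 1.5, 4.0
x = np.linspace(-30.0, 30.0, 3001)
psi0 = np.exp(-x**2 / (2 * dx) ** 2 + 1j * k0 * x) / ((2 * np.pi) ** 0.25 * np.sqrt(dx))   # Eq. (16)

k = np.linspace(k0 - 3.0, k0 + 3.0, 301)
# Eq. (21): a_k = (1/2pi) * integral psi(x, 0) exp{-ikx} dx, evaluated by quadrature
a_numeric = trapezoid_rows(psi0[None, :] * np.exp(-1j * np.outer(k, x)), x) / (2 * np.pi)

dk = 1.0 / (2 * dx)                                                         # Eq. (24)
a_closed = (np.sqrt(1.0 / (2 * np.pi)) / ((2 * np.pi) ** 0.25 * np.sqrt(dk))
            * np.exp(-(k - k0) ** 2 / (2 * dk) ** 2))                       # Eq. (23)

weights = np.abs(a_numeric) ** 2
rms_k = np.sqrt(trapezoid_rows((k - k0) ** 2 * weights, k) / trapezoid_rows(weights, k))
print("max |a_numeric - a_closed| =", np.max(np.abs(a_numeric - a_closed)))
print("r.m.s. width in k:", rms_k, " expected:", dk, " dx * dk =", dx * rms_k)
```

The product δx·δk = 1/2 printed in the last line is the numerical counterpart of the minimal-uncertainty property discussed next.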


Thus we may represent the initial wave packet (16) as

$$\Psi(x,0)=\left(\frac{1}{2\pi}\right)^{1/2}\frac{1}{(2\pi)^{1/4}(\delta k)^{1/2}}\int\exp\left\{-\frac{(k-k_0)^2}{(2\delta k)^2}\right\}e^{ikx}\,dk. \qquad (2.25)$$

From the comparison of this formula with Eq. (16), it is evident that the r.m.s. uncertainty of the wave
number k in this packet is indeed equal to δk defined by Eq. (24), thus justifying the notation. The
comparison of the last relation with Eq. (1.35) shows that the Gaussian packet represents the ultimate
case in which the product δxδp = δx(ħδk) has the lowest possible value (ħ/2); for any other envelope’s
shape, the uncertainty product may only be larger. We could of course get the same result for δk from
Eq. (16) using the definitions (1.23), (1.33), and (1.34); the real advantage of Eq. (25) is that it can be
readily generalized to t > 0. Indeed, we already know that the time evolution of the wavefunction is
always given by Eq. (1.69), for our current case⁶

$$\Psi(x,t)=\left(\frac{1}{2\pi}\right)^{1/2}\frac{1}{(2\pi)^{1/4}(\delta k)^{1/2}}\int\exp\left\{-\frac{(k-k_0)^2}{(2\delta k)^2}\right\}\exp\left\{i\left(kx-\frac{\hbar k^2}{2m}t\right)\right\}dk. \qquad (2.26)$$


Fig. 1 shows several snapshots of the real part of the wavefunction (26), for a particular case δk = 0.1k₀.

Fig. 2.1. Typical time evolution of a 1D wave packet on (a) smaller and (b) larger time scales. The dashed lines show the packet envelopes, i.e. ±|Ψ|. (Panels: Re Ψ versus x/δx.)


6 Note that Eq. (26) differs from Eq. (25) only by an exponent of a purely imaginary number under the integral, and hence this
wavefunction is also properly normalized to 1 (see Eq. (3)). Thus the wave packet introduction offers a natural
solution to the problem of the traveling de Broglie wave’s normalization, which was mentioned in Sec. 1.2.
The plots clearly show the following effects:


(i) the wave packet as a whole (as characterized by its envelope) moves along the x-axis with a
certain group velocity v_gr,


(ii) the “carrier” quasi-sinusoidal wave inside the packet moves with a different, phase velocity
v_ph, which may be defined as the velocity of the spatial points where the wave’s phase φ(x, t) ≡ arg Ψ
takes a certain fixed value (say, φ = π/2, where Re Ψ vanishes), and


(iii) the wave packet’s spatial width gradually increases with time: the packet spreads.


All these effects are common for waves of any physical nature.⁷ Indeed, let us consider a 1D
wave packet of the type (26), but more general:

$$\Psi(x,t)=\int a_ke^{i(kx-\omega t)}\,dk, \qquad (2.27)$$

propagating in a medium with an arbitrary (but smooth!) dispersion relation ω(k), and assume that the
wave number distribution a_k is narrow: δk ≪ ⟨k⟩ = k₀ (see Fig. 1.6b). Then we may expand the function
ω(k) into the Taylor series near the central wave number k₀, and keep only three of its leading terms:

$$\omega(k)\approx\omega_0+\frac{d\omega}{dk}\tilde k+\frac{1}{2}\frac{d^2\omega}{dk^2}\tilde k^2,\qquad\text{where }\tilde k\equiv k-k_0,\quad\omega_0\equiv\omega(k_0), \qquad (2.28)$$


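where both derivatives have to be evaluated at the point k = k₀. In this approximation,⁸ the expression in
the parentheses on the right-hand side of Eq. (27) may be rewritten as

$$kx-\omega(k)t\approx k_0x+\tilde kx-\left(\omega_0+\frac{d\omega}{dk}\tilde k+\frac{1}{2}\frac{d^2\omega}{dk^2}\tilde k^2\right)t=(k_0x-\omega_0t)+\tilde k\left(x-\frac{d\omega}{dk}t\right)-\frac{1}{2}\frac{d^2\omega}{dk^2}t\,\tilde k^2, \qquad (2.29)$$

so that Eq. (27) becomes

$$\Psi(x,t)\approx e^{i(k_0x-\omega_0t)}\int a_k\exp\left\{i\left[\tilde k\left(x-\frac{d\omega}{dk}t\right)-\frac{1}{2}\frac{d^2\omega}{dk^2}t\,\tilde k^2\right]\right\}dk. \qquad (2.30)$$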


First, let us neglect the last term in the square brackets (which is much smaller than the first term if the
dispersion relation is smooth enough and/or the time interval t is sufficiently small), and compare the
result with the initial form of the wave packet (27):

$$\Psi(x,0)=\int a_ke^{ikx}\,dk=A(x)e^{ik_0x},\qquad\text{with } A(x)=\int a_ke^{i\tilde kx}\,dk. \qquad (2.31)$$

The comparison shows that in this approximation, Eq. (30) is reduced to

$$\Psi(x,t)=A(x-v_{\rm gr}t)\,e^{ik_0(x-v_{\rm ph}t)}, \qquad (2.32)$$

where v_gr and v_ph are two constants with the dimension of velocity:


7 See, e.g., brief discussions in CM Sec. 6.3 and EM Sec. 7.2. 
8 By the way, in the particular case of de Broglie waves described by the dispersion relation (1.30), Eq. (28) is 
exact, because ω = E/ħ is a quadratic function of k = p/ħ, and all higher derivatives of ω over k vanish for any k₀.
$$v_{\rm gr}\equiv\left.\frac{d\omega}{dk}\right|_{k=k_0},\qquad v_{\rm ph}\equiv\frac{\omega_0}{k_0}. \qquad (2.33a)$$


Clearly, Eq. (32) describes the effects (i) and (ii) listed above. For the particular case of the de Broglie
waves, whose dispersion law is given by Eq. (1.30),

$$v_{\rm gr}=\frac{d\omega}{dk}=\frac{\hbar k_0}{m}=v_0,\qquad v_{\rm ph}=\frac{\omega_0}{k_0}=\frac{\hbar k_0}{2m}=\frac{v_0}{2}. \qquad (2.33b)$$

We see that (very fortunately :-) the velocity of the wave packet’s envelope is equal to v₀, the classical
velocity of the same particle.
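A short Python check of Eq. (33b), assuming units ħ = m = 1 and the de Broglie dispersion ω = ħk²/2m of Eq. (1.30); the function names are hypothetical. The group velocity is taken by a central-difference derivative, which for this quadratic ω(k) is exact up to rounding (cf. footnote 8).

```python
HBAR = 1.0   # illustrative units (assumption)
MASS = 1.0


def omega(k):
    """de Broglie dispersion relation omega(k) = hbar k^2 / 2m, Eq. (1.30)."""
    return HBAR * k**2 / (2 * MASS)


def group_velocity(k0, h=1e-4):
    """v_gr = d(omega)/dk at k0, by a central difference (exact for a quadratic, up to rounding)."""
    return (omega(k0 + h) - omega(k0 - h)) / (2 * h)


def phase_velocity(k0):
    """v_ph = omega(k0) / k0."""
    return omega(k0) / k0


k0 = 3.0
v0 = HBAR * k0 / MASS          # classical velocity p0 / m
print(f"v_gr = {group_velocity(k0):.6f}, v_ph = {phase_velocity(k0):.6f}, v0 = {v0:.6f}")
```

The carrier wave thus moves at half the speed of the envelope, so that wave crests appear to slide backward through a moving packet.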


Next, the last term in the square brackets of Eq. (30) describes the effect (iii), the wave packet’s
spread. It may be readily evaluated if the packet (27) is initially Gaussian, as in our example (25):

$$a_k=\text{const}\times\exp\left\{-\frac{\tilde k^2}{(2\delta k)^2}\right\}. \qquad (2.34)$$


In this case the integral (30) is Gaussian, and may be worked out exactly as the integral (21), i.e. by
representing the merged exponents under the integral as a full square of a linear combination of x and k̃:

$$-\frac{\tilde k^2}{(2\delta k)^2}+i\tilde k\left(x-v_{\rm gr}t\right)-\frac{i}{2}\frac{d^2\omega}{dk^2}t\,\tilde k^2=-\Delta(t)\left[\tilde k-i\frac{x-v_{\rm gr}t}{2\Delta(t)}\right]^2-\frac{(x-v_{\rm gr}t)^2}{4\Delta(t)}, \qquad (2.35)$$

where I have introduced the following complex function of time:

$$\Delta(t)\equiv\frac{1}{(2\delta k)^2}+\frac{i}{2}\frac{d^2\omega}{dk^2}t=(\delta x)^2+\frac{i}{2}\frac{d^2\omega}{dk^2}t, \qquad (2.36)$$

and used Eq. (24). Now integrating over k̃, we get

$$\Psi(x,t)\propto\exp\left\{-\frac{(x-v_{\rm gr}t)^2}{4\Delta(t)}+i\left(k_0x-\omega_0t\right)\right\}. \qquad (2.37)$$


The imaginary part of the ratio 1/Δ(t) in this exponent gives just an additional contribution to the wave’s
phase and does not affect the resulting probability distribution

$$w(x,t)=\Psi^*\Psi\propto\exp\left\{-\frac{(x-v_{\rm gr}t)^2}{2}\,\mathrm{Re}\frac{1}{\Delta(t)}\right\}. \qquad (2.38)$$

This is again a Gaussian distribution over the axis x, centered at the point ⟨x⟩ = v_gr t, with the r.m.s. width

$$(\delta x')^2=\left[\mathrm{Re}\frac{1}{\Delta(t)}\right]^{-1}=(\delta x)^2+\left(\frac{1}{2}\frac{d^2\omega}{dk^2}t\right)^2\frac{1}{(\delta x)^2}. \qquad (2.39a)$$

In the particular case of de Broglie waves, d²ω/dk² = ħ/m, so that


Chapter 2 Page 7 of 76 


Essential Graduate Physics QM: Quantum Mechanics 


$$(\delta x')^2=(\delta x)^2+\left(\frac{\hbar t}{2m}\right)^2\frac{1}{(\delta x)^2}. \qquad (2.39b)$$
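The three effects listed above can be reproduced by evaluating Eq. (26) numerically. This sketch assumes units ħ = m = 1, hypothetical function names, and a periodic FFT grid wide enough to mimic the infinite axis. It multiplies each plane-wave amplitude by exp{−iħk²t/2m} and then measures the packet's center and r.m.s. width. These should follow ⟨x⟩ = v₀t with v₀ = ħk₀/m, from Eq. (33b), and Eq. (39b).

```python
import numpy as np

HBAR = 1.0   # illustrative units (assumption)
MASS = 1.0


def evolve_free(psi0, x, t):
    """Free-particle evolution: multiply each plane-wave amplitude by exp{-i hbar k^2 t / 2m},
    as in Eq. (26), using the FFT on a periodic grid assumed wide enough that the packet never wraps."""
    k = 2 * np.pi * np.fft.fftfreq(x.size, d=x[1] - x[0])
    return np.fft.ifft(np.fft.fft(psi0) * np.exp(-1j * HBAR * k**2 * t / (2 * MASS)))


def center_and_width(psi, x):
    """Expectation value <x> and r.m.s. width of |psi|^2 on a uniform grid."""
    w = np.abs(psi) ** 2
    w = w / w.sum()
    mean = np.sum(x * w)
    return mean, np.sqrt(np.sum((x - mean) ** 2 * w))


dx0, k0 = 1.0, 5.0
x = np.linspace(-200.0, 600.0, 2**15, endpoint=False)
psi0 = np.exp(-x**2 / (2 * dx0) ** 2 + 1j * k0 * x)      # Eq. (16); normalization irrelevant here

for t in (0.0, 10.0, 40.0):
    mean, width = center_and_width(evolve_free(psi0, x, t), x)
    spread = np.sqrt(dx0**2 + (HBAR * t / (2 * MASS)) ** 2 / dx0**2)   # Eq. (39b)
    print(f"t = {t:4.0f}: <x> = {mean:8.3f} (v0 t = {HBAR * k0 * t / MASS:5.0f}), "
          f"width = {width:7.3f} (Eq. (39b): {spread:7.3f})")
```

Because the FFT applies the exact phase factor to each Fourier component, this approach involves no time-stepping error for a free particle. The only approximations are the finite grid and the periodic boundary.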


The physics of the packet spreading is very simple: if d²ω/dk² ≠ 0, the group velocity dω/dk of
each small group dk of the monochromatic components of the wave is different, resulting in the gradual
(eventually, linear) accumulation of the differences of the distances traveled by the groups. The most
curious feature of Eq. (39) is that the packet width at t > 0 depends on its initial width δx′(0) = δx in a
non-monotonic way, tending to infinity at both δx → 0 and δx → ∞. Because of that, for a given time
interval t, there is an optimal value of δx that minimizes δx′:

$$(\delta x')_{\min}=\sqrt{2}\,(\delta x)_{\rm opt}=\left(\frac{\hbar t}{m}\right)^{1/2}. \qquad (2.40)$$


This expression may be used for estimates of the spreading effect. Due to the smallness of the Planck
constant ħ on the human scale of things, for macroscopic bodies this effect is extremely small even for
very long time intervals; however, for light particles it may be very noticeable: for an electron (m = m_e ≈
10⁻³⁰ kg) and t = 1 s, Eq. (40) yields (δx′)_min ≈ 1 cm.
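The estimate quoted above follows from Eq. (40) with SI constants. This sketch uses CODATA values of ħ and m_e and hypothetical function names. The 1-microgram grain is a hypothetical macroscopic example added only for contrast.

```python
import math

HBAR = 1.054571817e-34    # reduced Planck constant, J s (CODATA 2018)
M_E = 9.1093837015e-31    # electron mass, kg (CODATA 2018)


def min_width_after(t, mass):
    """(delta x')_min = (hbar t / m)^{1/2}, Eq. (40), in metres."""
    return math.sqrt(HBAR * t / mass)


def optimal_initial_width(t, mass):
    """(delta x)_opt = (hbar t / 2m)^{1/2}, the initial width that achieves the minimum."""
    return math.sqrt(HBAR * t / (2 * mass))


print(f"electron, t = 1 s: (dx')_min = {min_width_after(1.0, M_E) * 100:.2f} cm")
print(f"hypothetical 1-microgram grain, t = 1 year: (dx')_min = {min_width_after(3.15e7, 1e-9):.1e} m")
```

Even after a year, the minimal spread for the grain is only about a nanometer, many orders of magnitude below its own size.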


Note also that for any t ≠ 0, the wave packet retains its Gaussian envelope, but the ultimate
relation (24) is not satisfied, δx′δp > ħ/2, due to a gradually accumulated phase shift between the
component monochromatic waves. The last remark on this topic: in quantum mechanics, the wave
packet spreading is not a ubiquitous effect! For example, in Chapter 5 we will see that in a quantum 
oscillator, the spatial width of a Gaussian packet (for that system, called the Glauber state of the 
oscillator) does not grow monotonically but rather either stays constant or oscillates in time. 


Now let us briefly discuss the case when the initial wave packet is not Gaussian but is described 
by an arbitrary initial wavefunction. To make the forthcoming result more aesthetically pleasing, it is 
beneficial to generalize our calculations to an arbitrary initial time t₀; it is evident that if U does not
depend on time explicitly, it is sufficient to replace t with (t − t₀) in all above formulas. With this
replacement, Eq. (27) becomes 


$$\Psi(x,t) = \int a_k \exp\{i[kx - \omega(t - t_0)]\}\,dk, \qquad (2.41)$$

and the reciprocal transform (21) reads

$$a_k = \frac{1}{2\pi}\int \Psi(x,t_0)\,e^{-ikx}\,dx. \qquad (2.42)$$


If we want to express these two formulas with one relation, i.e. plug Eq. (42) into Eq. (41), we should 
give the integration variable x some other name, e.g., x₀. (Such notation is appropriate because this
variable describes the coordinate argument in the initial wave packet.) The result is 


$$\Psi(x,t) = \frac{1}{2\pi}\int dk\int dx_0\,\Psi(x_0,t_0)\exp\{i[k(x - x_0) - \omega(t - t_0)]\}. \qquad (2.43)$$


Changing the order of integration, this expression may be rewritten in the following general form:

$$\Psi(x,t) = \int G(x,t;x_0,t_0)\,\Psi(x_0,t_0)\,dx_0, \qquad (2.44)$$
where the function G, usually called the kernel in mathematics, in quantum mechanics is called the propagator.⁹ Its physical sense may be understood by considering the following special initial condition:¹⁰

$$\Psi(x_0,t_0) = \delta(x_0 - x'), \qquad (2.45)$$


where x' is a certain point within the domain of the particle’s motion. In this particular case, Eq. (44) gives

$$\Psi(x,t) = G(x,t;x',t_0). \qquad (2.46)$$


Hence, the propagator, considered as a function of its arguments x and t only, is just the wavefunction of the particle at the δ-functional initial condition (45). Thus, just as Eq. (41) may be understood as a mathematical expression of the linear superposition principle in the momentum (i.e., reciprocal) space domain, Eq. (44) is an expression of this principle in the direct space domain: the system’s “response” Ψ(x,t) to an arbitrary initial condition Ψ(x₀,t₀) is just a sum of its responses to its elementary spatial “slices” of this initial function, with the propagator G(x,t; x₀,t₀) representing the weight of each slice in the final sum.
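Because Eq. (41) is just a Fourier superposition, the spreading law (39b) can be checked without any Gaussian-integral algebra. The sketch below evaluates Eq. (41) by straightforward quadrature over k for a Gaussian spectrum a_k (the transform (42) of a Gaussian packet), and then computes the first two moments of |Ψ|². It assumes units with ħ = m = 1, t₀ = 0, and the illustrative parameters δx = 1, k₀ = 2, t = 3, none of which come from the text; `packet_moments` is a hypothetical helper name.

```python
import cmath
import math

HBAR = 1.0   # units with hbar = m = 1 (an illustrative assumption)
MASS = 1.0


def packet_moments(dx0, k0, t, dk=0.05, x_min=-8.0, x_max=20.0, step_x=0.05):
    """Evaluate Eq. (41) by quadrature over k and return (<x>, r.m.s. width) at time t."""
    # Gaussian spectrum a_k ~ exp{-(k - k0)^2 dx0^2}: the transform (42) of a packet of r.m.s. width dx0
    n_k = int(round(12.0 / (dx0 * dk))) + 1          # covers k0 +/- 6/dx0
    ks = [k0 - 6.0 / dx0 + i * dk for i in range(n_k)]
    amps = [math.exp(-(((k - k0) * dx0) ** 2)) for k in ks]
    omegas = [HBAR * k * k / (2 * MASS) for k in ks]  # de Broglie dispersion: omega = hbar k^2 / 2m

    n_x = int(round((x_max - x_min) / step_x)) + 1
    xs = [x_min + j * step_x for j in range(n_x)]
    probs = []
    for x in xs:
        psi = dk * sum(a * cmath.exp(1j * (k * x - w * t)) for a, k, w in zip(amps, ks, omegas))
        probs.append(abs(psi) ** 2)

    norm = sum(probs)
    mean = sum(x * p for x, p in zip(xs, probs)) / norm
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, probs)) / norm
    return mean, math.sqrt(var)


dx0, k0, t = 1.0, 2.0, 3.0
mean, width = packet_moments(dx0, k0, t)
predicted = math.sqrt(dx0**2 + (HBAR * t / (2 * MASS * dx0)) ** 2)   # Eq. (39b)
print(f"<x> = {mean:.4f} (expected v_gr*t = {HBAR * k0 / MASS * t:.4f})")
print(f"width = {width:.4f} (Eq. (39b) gives {predicted:.4f})")
```

The default x-window is tuned to these particular parameters; for other values it has to cover several packet widths around ⟨x⟩.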


According to Eqs. (43) and (44), in the particular case of a free particle the propagator is equal to

$$G(x,t;x_0,t_0) = \frac{1}{2\pi}\int \exp\{i[k(x - x_0) - \omega(t - t_0)]\}\,dk. \qquad (2.47)$$


Calculating this integral, one should remember that here ω is not a constant but a function of k, given by the dispersion relation for the partial waves. In particular, for the de Broglie waves, with ħω = ħ²k²/2m,

$$G(x,t;x_0,t_0) = \frac{1}{2\pi}\int \exp\left\{i\left[k(x - x_0) - \frac{\hbar k^2}{2m}(t - t_0)\right]\right\}dk. \qquad (2.48)$$


This is a Gaussian integral again, and may be readily calculated just as it was done (twice) above, by completing the exponent to the full square. The result is

$$G(x,t;x_0,t_0) = \left[\frac{m}{2\pi i\hbar(t - t_0)}\right]^{1/2}\exp\left\{-\frac{m(x - x_0)^2}{2i\hbar(t - t_0)}\right\}. \qquad (2.49)$$


Please note the following features of this complex function (plotted in Fig. 2): 


(i) It depends only on the differences (x − x₀) and (t − t₀). This is natural because the free-particle
propagation problem is translation-invariant both in space and time. 


(ii) The function’s shape does not depend on its arguments — they just rescale the same function: 
its snapshot (Fig. 2), if plotted as a function of un-normalized x, just becomes broader and lower with 
time. It is curious that the spatial broadening scales as (t − t₀)^{1/2}, just as in classical diffusion, as a
result of a deep mathematical analogy between quantum mechanics and classical statistics — to be 
discussed further in Chapter 7. 


9 Its standard notation by letter G stems from the fact that the propagator is essentially the spatial-temporal 
Green’s function, defined very similarly to Green’s functions of other ordinary and partial differential equations 
describing various physics systems — see, e.g., CM Sec. 5.1 and/or EM Sec. 2.7 and 7.3. 

10 Note that such an initial condition is mathematically not equivalent to a δ-functional initial probability density (3).
(iii) In accordance with the uncertainty relation, the ultimately compressed wave packet (45) has an infinite width of momentum distribution, and the quasi-sinusoidal tails of the free-particle propagator, clearly visible in Fig. 2, are the results of the free propagation of the fastest (highest-momentum) components of that distribution, in both directions from the packet center.
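A quick numerical sanity check of Eq. (49) is also possible. The sketch below evaluates the propagator in units with ħ = m = 1 and t₀ = x₀ = 0 (assumptions made only for illustration) and verifies, by central finite differences, that it satisfies the free-particle Schrödinger equation iħ∂G/∂t = −(ħ²/2m)∂²G/∂x² at t > t₀, and that its modulus, [m/2πħ(t − t₀)]^{1/2}, does not depend on x − x₀, so that the oscillating tails do not decay with distance. The helper names are hypothetical.

```python
import cmath
import math

HBAR = 1.0   # units with hbar = m = 1 (assumed for illustration)
MASS = 1.0


def propagator(x, t, x0=0.0, t0=0.0):
    """Free-particle propagator G(x, t; x0, t0) of Eq. (49), for t > t0."""
    tau = t - t0
    prefactor = cmath.sqrt(MASS / (2j * math.pi * HBAR * tau))
    return prefactor * cmath.exp(-MASS * (x - x0) ** 2 / (2j * HBAR * tau))


def schrodinger_residual(x, t, h=1e-3):
    """Relative mismatch between i*hbar*dG/dt and -(hbar^2/2m)*d2G/dx2 (central differences)."""
    dg_dt = (propagator(x, t + h) - propagator(x, t - h)) / (2 * h)
    d2g_dx2 = (propagator(x + h, t) - 2 * propagator(x, t) + propagator(x - h, t)) / h**2
    lhs = 1j * HBAR * dg_dt
    rhs = -(HBAR**2 / (2 * MASS)) * d2g_dx2
    return abs(lhs - rhs) / (abs(lhs) + abs(rhs))


for x, t in ((1.3, 0.7), (-2.0, 1.5)):
    g = propagator(x, t)
    print(f"x = {x}, t = {t}: |G| = {abs(g):.6f}, "
          f"[m/(2 pi hbar t)]^(1/2) = {math.sqrt(MASS / (2 * math.pi * HBAR * t)):.6f}, "
          f"residual = {schrodinger_residual(x, t):.1e}")
```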


Fig. 2.2. The real (solid line) and imaginary (dotted line) parts of the 1D free particle’s propagator (49), normalized to [m/ħ(t − t₀)]^{1/2} and plotted as functions of (x − x₀)/[ħ(t − t₀)/m]^{1/2}.


In the following sections, I will mostly focus on monochromatic wavefunctions (that, for 
unconfined motion, may be interpreted as wave packets of a very large spatial width δx), and only rarely
discuss wave packets. My best excuse is the linear superposition principle, i.e. our conceptual ability to 
restore the general solution from that of monochromatic waves of all possible energies. However, the 
reader should not forget that, as the above discussion has illustrated, mathematically such restoration is 
not always trivial. 


2.3. Particle reflection and tunneling 


Now, let us proceed to the cases when a 1D particle moves in various potential profiles U(x) that 
are constant in time. Conceptually, the simplest of such profiles is a potential step — see Fig. 3. 


Fig. 2.3. Classical 1D motion in a potential profile U(x), showing the classically accessible region, the classical turning point x_c, and the classically forbidden region.


As I am sure the reader knows, in classical mechanics the particle’s kinetic energy p²/2m cannot be negative, so if the particle is incident on such a step (in Fig. 3, from the left), it can only travel through the classically accessible region, where its (conserved) full energy,

$$E = \frac{p^2}{2m} + U(x), \qquad (2.50)$$

is larger than the local value U(x). Let the initial velocity v = p/m be positive, i.e. directed toward the step. Before it has reached the classical turning point x_c, defined by the equality

$$U(x_c) = E, \qquad (2.51)$$


the particle’s kinetic energy p²/2m is positive, so that it continues to move in the initial direction. On the other hand, a classical particle cannot penetrate into the classically forbidden region x > x_c, because there its kinetic energy would be negative. Hence, when the particle reaches the point x = x_c, its velocity has to change its sign, i.e. the particle is reflected back from the classical turning point.


In order to see what wave mechanics says about this situation, let us start from the simplest, sharp potential step shown with the bold black line in Fig. 4:

$$U(x) = U_0\,\theta(x) \equiv \begin{cases} 0, & \text{at } x < 0,\\ U_0, & \text{at } 0 < x. \end{cases} \qquad (2.52)$$

For this choice, and any energy within the interval 0 < E < U₀, the classical turning point is x_c = 0.


Fig. 2.4. The reflection of a monochromatic wave from a potential step U₀ > E. (This particular wavefunction’s shape is for U₀ = 5E.) The wavefunction is plotted with the same schematic vertical offset by E as those in Fig. 1.8.


Let us represent an incident particle with a wave packet so long that the spread δk ~ 1/δx of its wave-number spectrum is sufficiently small to make the energy uncertainty δE = ħδω = ħ(dω/dk)δk negligible in comparison with its average value E < U₀, as well as with (U₀ − E). In this case, E may be considered as a given constant, the time dependence of the wavefunction is given by Eq. (1.62), and we can calculate its spatial factor ψ(x) from the 1D version of the stationary Schrödinger equation (1.65):¹¹

$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + U(x)\psi = E\psi. \qquad (2.53)$$

At x < 0, i.e. at U = 0, the equation is reduced to the Helmholtz equation (1.78), and may be satisfied with either of two traveling waves, proportional to exp{+ikx} and exp{−ikx}, respectively, with k satisfying the dispersion equation (1.30):

$$k^2 = \frac{2mE}{\hbar^2}. \qquad (2.54)$$

Thus the general solution of Eq. (53) in this region may be represented as

$$\psi_-(x) = Ae^{+ikx} + Be^{-ikx}. \qquad (2.55)$$


11 Note that this is not an eigenproblem like the one we have solved in Sec. 1.4 for a potential well. Indeed, now the energy E is considered given, e.g., by the initial conditions that launch a long wave packet upon the potential step (in Fig. 4, from the left).
The second term on the right-hand side of Eq. (55) evidently describes a (formally, infinitely long) wave packet traveling to the left, arising because of the particle’s reflection from the potential step. If B = −A, this solution is reduced to Eq. (1.84) for the potential well with infinitely high walls, but for our current case of a finite step height U₀, the relation between the coefficients B and A may be different.


To show this, let us solve Eq. (53) for x > 0, where U = U₀ > E. In this region the equation may be rewritten as

$$\frac{d^2\psi_+}{dx^2} = \kappa^2\psi_+, \qquad (2.56)$$

where κ is a real and positive constant defined by a formula similar in structure to Eq. (54):

$$\kappa^2 \equiv \frac{2m(U_0 - E)}{\hbar^2} > 0. \qquad (2.57)$$

The general solution of Eq. (56) is the sum of exp{+κx} and exp{−κx}, with arbitrary coefficients. However, in our particular case the wavefunction should be finite at x → +∞, so only the latter exponent is acceptable:

$$\psi_+(x) = Ce^{-\kappa x}. \qquad (2.58)$$


Such penetration of the wavefunction into the classically forbidden region, and hence a non-zero probability to find the particle there, is one of the most fascinating predictions of quantum mechanics, and has been repeatedly observed in experiment, e.g., via tunneling experiments (see the next section).¹² From Eq. (58), it is evident that the constant κ, defined by Eq. (57), may be interpreted as the reciprocal penetration depth. Even for the lightest particles, this depth is usually very small. Indeed, for E ≪ U₀ that relation yields

$$\delta \equiv \frac{1}{\kappa}\bigg|_{E \ll U_0} \to \left(\frac{\hbar^2}{2mU_0}\right)^{1/2}. \qquad (2.59)$$


For example, let us consider a conduction electron in a typical metal, which runs, at the metal’s surface, into a sharp potential step whose height is equal to the metal’s workfunction U₀ ≈ 5 eV (see the discussion of the photoelectric effect in Sec. 1.1). In this case, according to Eq. (59), δ is close to 0.1 nm, i.e. close to a typical size of an atom. For heavier elementary particles (e.g., protons) the penetration depth is correspondingly smaller, and for macroscopic bodies, it is hardly measurable.
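The estimate of δ can be reproduced in a few lines. The sketch below assumes CODATA values of the constants and the 5 eV step height used above, and simply evaluates κ from Eq. (57) (which reduces to Eq. (59) at E ≪ U₀) for an electron and a proton; `penetration_depth` is a hypothetical helper name.

```python
import math

HBAR = 1.054571817e-34      # J*s
EV = 1.602176634e-19        # J per eV
M_E = 9.1093837015e-31      # electron mass, kg
M_P = 1.67262192369e-27     # proton mass, kg


def penetration_depth(mass, u0_ev, e_ev=0.0):
    """delta = 1/kappa with kappa from Eq. (57); for E << U0 this is Eq. (59)."""
    kappa = math.sqrt(2.0 * mass * (u0_ev - e_ev) * EV) / HBAR
    return 1.0 / kappa


for name, mass in (("electron", M_E), ("proton", M_P)):
    print(f"{name}: delta = {penetration_depth(mass, 5.0) * 1e9:.4f} nm for U0 = 5 eV")
```

The electron value comes out near 0.09 nm, and the proton value is smaller by the factor (m_p/m_e)^{1/2} ≈ 43.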


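Returning to Eqs. (55) and (58), we still should relate the coefficients B and C to the amplitude A of the incident wave, using the boundary conditions at x = 0. Since E is a finite constant, and U(x) is a finite function, Eq. (53) says that d²ψ/dx² should be finite as well. This means that the first derivative should be continuous:

$$\lim_{\varepsilon\to 0}\left[\frac{d\psi}{dx}\bigg|_{x=+\varepsilon} - \frac{d\psi}{dx}\bigg|_{x=-\varepsilon}\right] = \lim_{\varepsilon\to 0}\int_{-\varepsilon}^{+\varepsilon}\frac{d^2\psi}{dx^2}\,dx = \frac{2m}{\hbar^2}\lim_{\varepsilon\to 0}\int_{-\varepsilon}^{+\varepsilon}[U(x) - E]\,\psi\,dx = 0. \qquad (2.60)$$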


Repeating such calculation for the wavefunction ψ(x) itself, we see that it also should be continuous at
all points, including the border point x = 0, so that the boundary conditions in our problem are 


12 Note that this effect is pertinent to waves of any type, including mechanical waves (see, e.g., CM Secs. 6.4 and 
7.7) and electromagnetic waves (see, e.g., EM Secs. 7.3-7.7). 
$$\psi_-(0) = \psi_+(0), \qquad \frac{d\psi_-}{dx}(0) = \frac{d\psi_+}{dx}(0). \qquad (2.61)$$


Plugging Eqs. (55) and (58) into Eqs. (61), we get a system of two linear equations

$$A + B = C, \qquad ikA - ikB = -\kappa C, \qquad (2.62)$$

whose (easy :-) solution allows us to express B and C via A:

$$B = \frac{k - i\kappa}{k + i\kappa}A, \qquad C = \frac{2k}{k + i\kappa}A. \qquad (2.63)$$


We immediately see that the numerator and denominator in the first of these fractions have equal moduli, so that |B| = |A|. This means that, as we could expect, a particle with energy E < U₀ is totally reflected from the step, just as in classical mechanics. As a result, at x < 0 our solution (55) may be represented as a standing wave

$$\psi_- = 2iAe^{i\varphi}\sin(kx - \varphi), \qquad \text{with}\quad \varphi \equiv \tan^{-1}\frac{k}{\kappa}. \qquad (2.64)$$

Note that the shift Δx = φ/k = (tan⁻¹ k/κ)/k of the standing wave to the right, due to the partial penetration of the wavefunction under the potential step, is commensurate with, but generally not equal to, the penetration depth δ = 1/κ. The red line in Fig. 4 shows the exact behavior of the wavefunction, for a particular case E = U₀/5, at which k/κ = [E/(U₀ − E)]^{1/2} = 1/2.
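The algebra of Eqs. (62)-(64) is easy to verify numerically. The sketch below takes the case of Fig. 4, E = U₀/5 (so k/κ = 1/2), measures lengths in units of 1/κ (an arbitrary choice), checks the matching conditions and the equality |B| = |A|, confirms that Eq. (55) coincides with the standing-wave form (64), and evaluates the ratio of the shift Δx = φ/k to the penetration depth δ = 1/κ. The function names are hypothetical.

```python
import cmath
import math


def step_coefficients(k, kappa):
    """B/A and C/A from Eq. (63) for a sharp step with E < U0."""
    b = (k - 1j * kappa) / (k + 1j * kappa)
    c = 2 * k / (k + 1j * kappa)
    return b, c


def psi_left(x, k, kappa):
    """Eq. (55) with A = 1 and B taken from Eq. (63)."""
    b, _ = step_coefficients(k, kappa)
    return cmath.exp(1j * k * x) + b * cmath.exp(-1j * k * x)


def psi_standing(x, k, kappa):
    """The standing-wave form (64) with A = 1."""
    phi = math.atan(k / kappa)
    return 2j * cmath.exp(1j * phi) * math.sin(k * x - phi)


# Case of Fig. 4: E = U0/5, i.e. k/kappa = 1/2 (lengths measured in units of 1/kappa)
kappa, k = 1.0, 0.5
b, c = step_coefficients(k, kappa)
shift_over_depth = math.atan(k / kappa) / k * kappa   # (phi/k) / (1/kappa)
print(f"|B/A| = {abs(b):.12f}, Delta x / delta = {shift_over_depth:.4f}")
# prints |B/A| = 1.000000000000, Delta x / delta = 0.9273
```

So for this particular energy the shift is about 93% of δ: close to, but not equal to, the penetration depth.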


According to Eq. (57), as the particle’s energy E is increased to approach U₀, the penetration depth 1/κ diverges. This raises an important issue: what happens at E > U₀, i.e. if there is no classically forbidden region in the problem? In classical mechanics, the incident particle would continue to move to the right, though with a reduced velocity, corresponding to the new kinetic energy E − U₀, so there would be no reflection. In quantum mechanics, however, the situation is different. To analyze it, it is not necessary to re-solve the whole problem; it is sufficient to note that all our calculations, and hence Eqs. (63), are still valid if we take¹³

$$\kappa = -ik', \qquad \text{with}\quad k'^2 \equiv \frac{2m(E - U_0)}{\hbar^2} > 0. \qquad (2.65)$$

With this replacement, Eq. (63) becomes¹⁴

$$B = \frac{k - k'}{k + k'}A, \qquad C = \frac{2k}{k + k'}A. \qquad (2.66)$$


The most important result of this change is that now the particle’s reflection is not total: |B| < |A|. To evaluate this effect quantitatively, it is fairer to use not the B/A or C/A ratios, but rather the ratio of the probability currents (5) carried by the de Broglie waves traveling to the right, with amplitudes C and A, in the corresponding regions (respectively, for x > 0 and x < 0):


13 Our earlier discarding of the particular solution exp{κx}, now becoming exp{−ik'x}, is still valid, but now on different grounds: this term would describe a wave packet incident on the potential step from the right, and this is not the problem under our current consideration.

14 These formulas are completely similar to those describing the partial reflection of classical waves from a sharp 
interface between two uniform media, at normal incidence (see, e.g., CM Sec. 6.4 and EM Sec. 7.4), with the 
effective impedance Z of de Broglie waves being proportional to their wave number k. 
$$\mathcal{T} \equiv \frac{I_C}{I_A} = \frac{k'|C|^2}{k|A|^2} = \frac{4kk'}{(k + k')^2} = \frac{4[E(E - U_0)]^{1/2}}{[E^{1/2} + (E - U_0)^{1/2}]^2}. \qquad (2.67)$$


(The parameter 𝒯 so defined is called the transparency of the system, in our current case of the potential step of height U₀, at the particle’s energy E.) The result given by Eq. (67) is plotted in Fig. 5a as a function of the U₀/E ratio. Note its most important features:

(i) At U₀ = 0, the transparency is full, 𝒯 = 1, naturally, because there is no step at all.

(ii) At U₀ → E, the transparency drops to zero, giving a proper connection to the case E < U₀.

(iii) Nothing in our solution’s procedure prevents us from using Eq. (67) even for U₀ < 0, i.e. for the step-down (or “cliff”) potential profile (see Fig. 5b). Very counter-intuitively, the particle is (partly) reflected even from such a cliff, and the transmission diminishes (though rather slowly) at U₀ → −∞.
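Eqs. (66)-(67) may be tabulated directly. The sketch below works in units with E = 1, so that k = 1 and k' = (1 − U₀/E)^{1/2} (an assumption made only for convenience), and checks that the transmitted and reflected fractions add up to one, 𝒯 + |B/A|² = 1, for steps of both signs, including the “cliff” with U₀ < 0. The function names are hypothetical.

```python
import math


def step_transparency(u0_over_e):
    """Eq. (67) in units with E = 1, so that k = 1 and k' = (1 - U0/E)^{1/2}."""
    if u0_over_e >= 1.0:
        return 0.0                               # E <= U0: total reflection, Eq. (63)
    k_prime = math.sqrt(1.0 - u0_over_e)         # Eq. (65)
    return 4.0 * k_prime / (1.0 + k_prime) ** 2


def step_reflectance(u0_over_e):
    """|B/A|^2 from Eq. (66), or 1 for E <= U0."""
    if u0_over_e >= 1.0:
        return 1.0
    k_prime = math.sqrt(1.0 - u0_over_e)
    return ((1.0 - k_prime) / (1.0 + k_prime)) ** 2


for ratio in (0.0, 0.5, 0.9, 0.99, -3.0, -100.0):
    transp = step_transparency(ratio)
    total = transp + step_reflectance(ratio)
    print(f"U0/E = {ratio:7.2f}:  T = {transp:.4f},  T + |B/A|^2 = {total:.12f}")
```

The slow decrease of 𝒯 for the cliff follows from Eq. (67): at k' ≫ k it behaves as 4k/k', i.e. only as |U₀|^{-1/2}.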


Fig. 2.5. (a) The transparency 𝒯 of a potential step with U₀ < E as a function of its height (plotted versus U₀/E), according to Eq. (67), and (b) the “cliff” potential profile, with U₀ < 0.


The most important conceptual conclusion of this analysis is that the quantum particle is partly reflected from a potential step with U₀ < E, in the sense that there is a non-zero probability 𝒯 < 1 to find it passed over the step, while there is also some probability, (1 − 𝒯) > 0, to have it reflected.


The same property, but now for any relation between E and U₀, is exhibited by another simple potential profile U(x), the famous potential (or “tunnel”) barrier. Fig. 6 shows its simple, “rectangular” version:

$$U(x) = \begin{cases} 0, & \text{for } x < -d/2,\\ U_0, & \text{for } -d/2 < x < +d/2,\\ 0, & \text{for } +d/2 < x. \end{cases} \qquad (2.68)$$


To analyze this problem, it is sufficient to look for the solution to the Schrödinger equation in the form (55) at x < −d/2. At x > +d/2, i.e., behind the barrier, we may use the arguments presented above (no wave source on the right!) to keep just one traveling wave, now with the same wave number:

$$\psi(x) = Fe^{ikx}. \qquad (2.69)$$

However, under the barrier, i.e. at −d/2 < x < +d/2, we should generally keep both exponential terms,

$$\psi(x) = Ce^{-\kappa x} + De^{+\kappa x}, \qquad (2.70)$$

because our previous argument, used in the potential step problem’s solution, is no longer valid. (Here k and κ are still defined, respectively, by Eqs. (54) and (57).) In order to express the coefficients B, C, D, and F via the amplitude A of the incident wave, we need to plug these solutions into the boundary conditions similar to Eqs. (61), but now at two boundary points, x = ±d/2.


Fig. 2.6. A rectangular potential barrier, and the de Broglie waves taken into account in its analysis.


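Solving the resulting system of 4 linear equations, we get four ratios B/A, C/A, etc.; in particular,

$$\frac{F}{A} = \left[\cosh\kappa d + \frac{i}{2}\left(\frac{\kappa}{k} - \frac{k}{\kappa}\right)\sinh\kappa d\right]^{-1}e^{-ikd}, \qquad (2.71a)$$

and hence the barrier’s transparency

$$\mathcal{T} \equiv \left|\frac{F}{A}\right|^2 = \left[\cosh^2\kappa d + \left(\frac{\kappa^2 - k^2}{2k\kappa}\right)^2\sinh^2\kappa d\right]^{-1}. \qquad (2.71b)$$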


So, quantum mechanics indeed allows particles with energies E < U₀ to pass “through” the potential barrier (see Fig. 6 again). This is the famous effect of quantum-mechanical tunneling. Fig. 7a shows the barrier transparency as a function of the particle energy E, for several characteristic values of its thickness d, or rather of the ratio d/δ, with δ defined by Eq. (59).
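The four matching conditions at x = ±d/2 form a small linear system that can also be solved numerically, which gives an independent check of Eqs. (71). The sketch below is a minimal illustration, not the author’s procedure: it measures lengths in units of δ = (ħ²/2mU₀)^{1/2}, so that k = (E/U₀)^{1/2} and κ = (1 − E/U₀)^{1/2}, and for E > U₀ the complex square root turns κ into an imaginary number, as in Eq. (65). The helper names and the parameter values are hypothetical.

```python
import cmath


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small (complex) linear system."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = acc / aug[r][r]
    return sol


def wave_numbers(e_over_u0):
    """k and kappa in units of 1/delta; kappa becomes imaginary for E > U0."""
    return cmath.sqrt(e_over_u0), cmath.sqrt(1.0 - e_over_u0)


def barrier_amplitudes(e_over_u0, d):
    """Solve the matching conditions at x = -d/2 and x = +d/2 for (B, C, D, F), with A = 1."""
    k, kap = wave_numbers(e_over_u0)
    ep, em = cmath.exp(0.5j * k * d), cmath.exp(-0.5j * k * d)
    gp, gm = cmath.exp(0.5 * kap * d), cmath.exp(-0.5 * kap * d)
    matrix = [
        [ep, -gp, -gm, 0],                          # psi continuous at x = -d/2
        [-1j * k * ep, kap * gp, -kap * gm, 0],     # dpsi/dx continuous at x = -d/2
        [0, gm, gp, -ep],                           # psi continuous at x = +d/2
        [0, -kap * gm, kap * gp, -1j * k * ep],     # dpsi/dx continuous at x = +d/2
    ]
    rhs = [-em, -1j * k * em, 0, 0]
    return solve_linear(matrix, rhs)


def f_over_a(e_over_u0, d):
    """Eq. (71a)."""
    k, kap = wave_numbers(e_over_u0)
    denom = cmath.cosh(kap * d) + 0.5j * (kap / k - k / kap) * cmath.sinh(kap * d)
    return cmath.exp(-1j * k * d) / denom


def transparency(e_over_u0, d):
    """Eq. (71b); the result is real, so the vanishing imaginary part is dropped."""
    k, kap = wave_numbers(e_over_u0)
    t_inv = cmath.cosh(kap * d) ** 2 + ((kap**2 - k**2) / (2 * k * kap)) ** 2 * cmath.sinh(kap * d) ** 2
    return (1.0 / t_inv).real


cases = [(0.3, 1.0), (0.5, 3.0), (0.9, 10.0), (1.5, 3.0), (1.0 + (cmath.pi / 3) ** 2, 3.0)]
for e_over_u0, d in cases:
    b, c, dd, f = barrier_amplitudes(e_over_u0, d)
    print(f"E/U0 = {e_over_u0:.4f}, d/delta = {d}: |F/A|^2 = {abs(f) ** 2:.6e}, "
          f"Eq. (71b) = {transparency(e_over_u0, d):.6e}, |B|^2 + |F|^2 = {abs(b) ** 2 + abs(f) ** 2:.10f}")
```

The last case, E/U₀ = 1 + (π/3)² with d = 3δ, is chosen so that the wave number above the barrier satisfies k'd = π; there sinh κd vanishes and the transparency reaches exactly 1.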


Fig. 2.7. The transparency of a rectangular potential barrier as a function of the particle’s energy E: (a) versus E/U₀, and (b) in a semi-log plot versus (1 − E/U₀)^{1/2}.
The plots show that generally, the transparency grows gradually with the particle’s energy. This growth is natural because the penetration constant κ decreases with the growth of E, i.e., the wavefunction penetrates more and more into the barrier, so that more and more of it is “picked up” at the second interface (x = +d/2) and transferred into the wave F exp{ikx} propagating behind the barrier.


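Now let us consider the important limit of a very thin and high rectangular barrier, d ≪ δ, E ≪ U₀, giving k ≪ κ ≪ 1/d. In this limit, Eq. (71) yields

$$\mathcal{T} = \left|\frac{F}{A}\right|^2 = \frac{1}{|1 + i\alpha|^2} = \frac{1}{1 + \alpha^2}, \qquad \text{where}\quad \alpha \equiv \frac{\kappa^2 d}{2k} \approx \frac{m}{\hbar^2 k}U_0 d. \qquad (2.72)$$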


The last product, U₀d, is just the “energy area” (or the “weight”)

$$W \equiv \int_{U(x) > E} U(x)\,dx \qquad (2.73)$$


of the barrier. This fact implies that the very simple result (72) may be correct for a barrier of any shape, 
provided that it is sufficiently thin and high. 


To confirm this guess, let us consider the tunneling problem for a very thin barrier with κd, kd ≪ 1, approximating it with Dirac’s δ-function (Fig. 8):

$$U(x) = W\delta(x), \qquad (2.74)$$

so that the parameter W satisfies Eq. (73).

Fig. 2.8. A delta-functional potential barrier, U(x) = Wδ(x).


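The solutions of the tunneling problem at all points but x = 0 still may be taken in the form of Eqs. (55) and (69), so we only need to analyze the boundary conditions at that point. However, due to the special character of the δ-function, we should be careful here. Indeed, instead of Eq. (60) we now get

$$\lim_{\varepsilon\to 0}\left[\frac{d\psi}{dx}\bigg|_{x=+\varepsilon} - \frac{d\psi}{dx}\bigg|_{x=-\varepsilon}\right] = \lim_{\varepsilon\to 0}\int_{-\varepsilon}^{+\varepsilon}\frac{d^2\psi}{dx^2}\,dx = \frac{2m}{\hbar^2}\lim_{\varepsilon\to 0}\int_{-\varepsilon}^{+\varepsilon}[U(x) - E]\,\psi\,dx = \frac{2m}{\hbar^2}W\psi(0). \qquad (2.75)$$

According to this relation, at a finite W, the derivatives dψ/dx are also finite, so that the wavefunction itself is still continuous:

$$\lim_{\varepsilon\to 0}\left[\psi\big|_{x=+\varepsilon} - \psi\big|_{x=-\varepsilon}\right] = \lim_{\varepsilon\to 0}\int_{-\varepsilon}^{+\varepsilon}\frac{d\psi}{dx}\,dx = 0. \qquad (2.76)$$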
Using these two boundary conditions, we readily get the following system of two linear equations,

$$A + B = F, \qquad ikF - (ikA - ikB) = \frac{2mW}{\hbar^2}F, \qquad (2.77)$$

whose solution yields

$$\frac{B}{A} = \frac{-i\alpha}{1 + i\alpha}, \qquad \frac{F}{A} = \frac{1}{1 + i\alpha}, \qquad \text{where}\quad \alpha \equiv \frac{mW}{\hbar^2 k}. \qquad (2.78)$$

(Taking Eq. (73) into account, this definition of α coincides with that in Eq. (72).) For the barrier transparency 𝒯 = |F/A|², this result again gives the first of Eqs. (72), which is therefore general for such thin barriers. That formula may be recast to give the following simple expression (valid only for E ≪ U₀):

$$\mathcal{T} = \frac{1}{1 + \alpha^2} = \frac{E}{E + E_0}, \qquad \text{where}\quad E_0 \equiv \frac{mW^2}{2\hbar^2}, \qquad (2.79)$$

which shows that as energy becomes larger than the constant E₀, the transparency approaches 1.
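The convergence of a thin rectangular barrier to the δ-function result can be seen numerically: the sketch below keeps the weight W = U₀d fixed while shrinking d, and compares Eq. (71b) with Eqs. (78)-(79). It assumes units with ħ = 1 and m = 1/2 (so that ħ²/2m = 1) and the illustrative values E = 0.25, W = 1, for which α = 1 and E₀ = E, so the δ-barrier transparency is exactly 1/2. The function names are hypothetical.

```python
import math

HBAR, MASS = 1.0, 0.5   # illustrative units in which hbar^2/2m = 1


def delta_transparency(energy, weight):
    """Eqs. (78)-(79): T = 1/(1 + alpha^2) = E/(E + E0) for U(x) = W delta(x)."""
    k = math.sqrt(2 * MASS * energy) / HBAR
    alpha = MASS * weight / (HBAR**2 * k)
    e0 = MASS * weight**2 / (2 * HBAR**2)
    return 1.0 / (1.0 + alpha**2), energy / (energy + e0)


def rectangular_transparency(energy, u0, d):
    """Eq. (71b) for E < U0."""
    k = math.sqrt(2 * MASS * energy) / HBAR
    kappa = math.sqrt(2 * MASS * (u0 - energy)) / HBAR
    s = ((kappa**2 - k**2) / (2 * k * kappa)) ** 2
    return 1.0 / (math.cosh(kappa * d) ** 2 + s * math.sinh(kappa * d) ** 2)


energy, weight = 0.25, 1.0
t_alpha, t_energy = delta_transparency(energy, weight)
print(f"delta barrier: 1/(1+alpha^2) = {t_alpha:.6f}, E/(E+E0) = {t_energy:.6f}")
for d in (1e-1, 1e-2, 1e-3, 1e-4):
    u0 = weight / d                       # keep the "weight" U0*d = W of Eq. (73) fixed
    print(f"d = {d:.0e}: rectangular T = {rectangular_transparency(energy, u0, d):.6f}")
```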


Now proceeding to another important limit of thick barriers (d ≫ δ), Eq. (71) shows that in this case, the transparency is dominated by what is called the tunnel exponent:

$$\mathcal{T} \approx \left(\frac{4k\kappa}{k^2 + \kappa^2}\right)^2 e^{-2\kappa d}, \qquad (2.80)$$

a behavior that may be clearly seen as the straight-line segments in the semi-log plots (Fig. 7b) of 𝒯 as a function of the combination (1 − E/U₀)^{1/2}, which is proportional to κ (see Eq. (57)). This
exponential dependence on the barrier thickness is the most important factor for various applications of 
quantum-mechanical tunneling, from the field emission of electrons into vacuum¹⁵ to scanning tunneling microscopy.¹⁶ Note also the very substantial negative implications of the effect for the progress of electronic technology, most importantly the limits it imposes on the so-called Dennard scaling of field-effect transistors in semiconductor integrated circuits (which is the technological basis of the well-known Moore’s law), due to the increase of tunneling both through the gate oxide and along the channel of the transistors, from source to drain.¹⁷
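To see how quickly the tunnel exponent (80) takes over, one may compare it with the exact result (71b). The sketch below uses the same dimensionless units as Fig. 7 (d measured in δ = (ħ²/2mU₀)^{1/2}) and an arbitrary energy E = 0.3U₀; it prints the ratio of the two expressions, which should approach 1 as d grows, and the test checks that the slope of ln 𝒯 versus d approaches −2κ. The function names are hypothetical.

```python
import math


def exact_transparency(e_over_u0, d):
    """Eq. (71b) for E < U0, with d in units of delta = (hbar^2/2mU0)^{1/2}."""
    k, kappa = math.sqrt(e_over_u0), math.sqrt(1.0 - e_over_u0)
    s = ((kappa**2 - k**2) / (2 * k * kappa)) ** 2
    return 1.0 / (math.cosh(kappa * d) ** 2 + s * math.sinh(kappa * d) ** 2)


def tunnel_exponent(e_over_u0, d):
    """Thick-barrier asymptote, Eq. (80), in the same units."""
    k, kappa = math.sqrt(e_over_u0), math.sqrt(1.0 - e_over_u0)
    return (4 * k * kappa / (k**2 + kappa**2)) ** 2 * math.exp(-2 * kappa * d)


e_over_u0 = 0.3   # an arbitrary illustrative energy
for d in (1.0, 3.0, 10.0, 30.0):
    exact, approx = exact_transparency(e_over_u0, d), tunnel_exponent(e_over_u0, d)
    print(f"d/delta = {d:4.0f}: T = {exact:.3e}, Eq. (80) = {approx:.3e}, ratio = {approx / exact:.6f}")
```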


Finally, one more feature visible in Fig. 7a (for the case d = 3δ) is the oscillation of the transparency as a function of energy at E > U₀, with 𝒯 = 1, i.e. the reflection completely vanishing, at some points.¹⁸ This is our first glimpse at one more interesting quantum effect: resonant tunneling. This effect will be discussed in more detail in Sec. 5 below, using another potential profile where it is more clearly pronounced.


15 See, e.g., G. Fursey, Field Emission in Vacuum Microelectronics, Kluwer, New York, 2005.

16 See, e.g., G. Binnig and H. Rohrer, Helv. Phys. Acta 55, 726 (1982).

17 See, e.g., V. Sverdlov et al., IEEE Trans. on Electron Devices 50, 1926 (2003), and references therein. (A brief
discussion of the field-effect transistors, and literature for further reading, may be found in SM Sec. 6.4.) 

18 Let me mention in passing the curious case of the potential well U(x) = −(ħ²/2ma²)ν(ν + 1)/cosh²(x/a), with any positive integer ν and any real a, which is reflection-free (𝒯 = 1) for the incident de Broglie wave of any energy E, and hence for any incident wave packet. Unfortunately, a proof of this fact would require more time/space than I can afford. (Note that it was first described in a 1930 paper by Paul Sophus Epstein, before the 1933 publication by G. Pöschl and E. Teller, which is responsible for the common name of this Pöschl-Teller potential.)
2.4. Motion in soft potentials


Before moving on to exploring other quantum-mechanical effects, let us see how the results discussed in the previous section are modified in the opposite limit of the so-called soft (also called “smooth”) potential profiles, like the one sketched in Fig. 3.¹⁹ The most efficient analytical tool to study this limit is the so-called WKB (or “JWKB”, or “quasiclassical”) approximation developed by H. Jeffreys, G. Wentzel, A. Kramers, and L. Brillouin in 1925-27. In order to derive its 1D version, let us rewrite the Schrödinger equation (53) in a simpler form


$$\frac{d^2\psi}{dx^2} + k^2(x)\psi = 0, \qquad (2.81)$$

where the local wave number k(x) is defined similarly to Eq. (65),

$$k^2(x) \equiv \frac{2m[E - U(x)]}{\hbar^2}, \qquad (2.82)$$

except that now it may be a function of x. We already know that for k(x) = const, the fundamental solutions of this equation are A exp{+ikx} and B exp{−ikx}, which may be represented in a single form

$$\psi(x) = e^{i\Phi(x)}, \qquad (2.83)$$

where Φ(x) is a complex function, in these two simplest cases being equal, respectively, to (kx − i ln A) and (−kx − i ln B). This is why we may try to use Eq. (83) to look for the solution of Eq. (81) even in the general case, k(x) ≠ const. Differentiating Eq. (83) twice, we get


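$$\frac{d\psi}{dx} = i\frac{d\Phi}{dx}e^{i\Phi}, \qquad \frac{d^2\psi}{dx^2} = \left[i\frac{d^2\Phi}{dx^2} - \left(\frac{d\Phi}{dx}\right)^2\right]e^{i\Phi}. \qquad (2.84)$$

Plugging the last expression into Eq. (81) and requiring the factor before exp{iΦ(x)} to vanish, we get

$$i\frac{d^2\Phi}{dx^2} - \left(\frac{d\Phi}{dx}\right)^2 + k^2(x) = 0. \qquad (2.85)$$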


This is still an exact, general equation. At first sight, it looks harder to solve than the initial equation (81), because Eq. (85) is nonlinear. However, it is ready for simplification in the limit when the potential profile is very soft, dU/dx → 0. Indeed, for a uniform potential, d²Φ/dx² = 0. Hence, in the so-called 0th approximation, Φ(x) → Φ₀(x), we may try to keep that result, so that Eq. (85) is reduced to

$$\left(\frac{d\Phi_0}{dx}\right)^2 = k^2(x), \quad\text{i.e.}\quad \frac{d\Phi_0}{dx} = \pm k(x), \qquad \Phi_0(x) = \pm\int^x k(x')\,dx', \qquad (2.86)$$

so that its general solution is a linear superposition of two functions (83), with Φ replaced with Φ₀:

$$\psi_0(x) = A\exp\left\{i\int^x k(x')\,dx'\right\} + B\exp\left\{-i\int^x k(x')\,dx'\right\}, \qquad (2.87)$$


19 Quantitative conditions of the “softness” will be formulated later in this section. 




where the choice of the lower limits of integration affects only the constants A and B. The physical sense 
of this result is simple: it is a sum of the forward- and back-propagating de Broglie waves, with the 
coordinate-dependent local wave number k(x) that self-adjusts to the potential profile.


Let me emphasize the non-trivial nature of this approximation.²⁰ First, any attempt to address the problem with the standard perturbation approach (say, ψ = ψ₀ + ψ₁ + ..., with ψₙ proportional to the nth power of some small parameter) would fail for most potentials, because as Eq. (86) shows, even a slight but persisting deviation of U(x) from a constant leads to a gradual accumulation of the phase Φ₀, impossible to describe by any small perturbation of ψ. Second, the dropping of the term d²Φ/dx² in Eq. (85) is not too easy to justify. Indeed, since we are committed to the “soft potential limit” dU/dx → 0, we should be ready to assume the characteristic length a of the spatial variation of Φ to be large, and neglect the terms that are the smallest ones in the limit a → ∞. However, both first terms in Eq. (85) are apparently of the same order in a, namely O(a⁻²); why have we neglected just one of them?


The price we have paid for such a “sloppy” treatment is substantial: Eq. (87) does not satisfy the fundamental property of the Schrödinger equation solutions, the probability current’s conservation. Indeed, since Eq. (81) describes a fixed-energy (stationary) spatial part of the general Schrödinger equation, its probability density w = ΨΨ* = ψψ* should not depend on time. Hence, according to Eq. (6), we should have I(x) = const. However, this is not true for any component of Eq. (87); for example, for the first, forward-propagating component on its right-hand side, Eq. (5) yields

$$I_0(x) = \frac{\hbar}{m}|A|^2 k(x), \qquad (2.88)$$

evidently not a constant if k(x) ≠ const. The brilliance of the WKB theory is that the problem may be fixed without a full revision of the 0th approximation, just by amending it. Indeed, let us explore the next, 1st approximation:

$$\Phi(x) \to \Phi_{\rm WKB}(x) \equiv \Phi_0(x) + \Phi_1(x), \qquad (2.89)$$


where Φ₀ still obeys Eq. (86), while Φ₁ describes a 0th approximation’s correction that is small in the following sense:²¹

$$\left|\frac{d\Phi_1}{dx}\right| \ll \left|\frac{d\Phi_0}{dx}\right| = k(x). \qquad (2.90)$$

Plugging Eq. (89) into Eq. (85), with the account of the definition (86), we get

$$i\left(\frac{d^2\Phi_0}{dx^2} + \frac{d^2\Phi_1}{dx^2}\right) - \frac{d\Phi_1}{dx}\left(2\frac{d\Phi_0}{dx} + \frac{d\Phi_1}{dx}\right) = 0. \qquad (2.91)$$

Using the condition (90), we may neglect d²Φ₁/dx² in comparison with d²Φ₀/dx² inside the first parentheses, and dΦ₁/dx in comparison with 2dΦ₀/dx inside the second parentheses. As a result, we get the following (still approximate!) result:


20 Philosophically, this space-domain method is very close to the time-domain van der Pol method in classical 
mechanics, and the very similar rotating wave approximation (RWA) in quantum mechanics — see, e.g., CM Secs. 
5.2-5.5, and also Secs. 6.5, 7.6, 9.2, and 9.4 of this course. 

21 For certainty, I will use the discretion given by Eq. (82) to define k(x) as the positive root of its right-hand side.
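$$\frac{d\Phi_1}{dx} = \frac{i}{2}\,\frac{d^2\Phi_0/dx^2}{d\Phi_0/dx} = \frac{i}{2}\frac{d}{dx}\left(\ln\frac{d\Phi_0}{dx}\right) = \frac{i}{2}\frac{d}{dx}\left[\ln k(x)\right] = i\frac{d}{dx}\left[\ln k^{1/2}(x)\right], \qquad (2.92)$$

$$i\Phi_{\rm WKB} = i(\Phi_0 + \Phi_1) = \pm i\int^x k(x')\,dx' + \ln\frac{1}{k^{1/2}(x)}, \qquad (2.93)$$

$$\psi_{\rm WKB}(x) = \frac{a}{k^{1/2}(x)}\exp\left\{i\int^x k(x')\,dx'\right\} + \frac{b}{k^{1/2}(x)}\exp\left\{-i\int^x k(x')\,dx'\right\}, \qquad \text{for } k^2(x) > 0. \qquad (2.94)$$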


(Again, the lower integration limit is arbitrary, because its choice may be incorporated into the complex constants a and b.) This modified approximation overcomes the problem of current continuity; for example, for the forward-propagating wave, Eq. (5) gives

$$I_{\rm WKB}(x) = \frac{\hbar}{m}|a|^2 = \text{const}. \qquad (2.95)$$
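The difference between Eqs. (88) and (95) can be seen numerically. The sketch below assumes a hypothetical linear potential U(x) = Fx with the illustrative values E = 1 and F = 0.1 in units with ħ = m = 1, takes the phase Φ₀(x) = ∫k(x')dx' analytically, and evaluates the current (5) of the forward-propagating parts of Eqs. (87) and (94) using a finite-difference derivative. The helper names are hypothetical.

```python
import cmath
import math

HBAR = MASS = 1.0            # illustrative units
ENERGY, FORCE = 1.0, 0.1     # hypothetical potential U(x) = FORCE * x, turning point at x_c = 10


def k_local(x):
    """Local wave number (82) for U(x) = FORCE * x; requires x < ENERGY / FORCE."""
    return math.sqrt(2 * MASS * (ENERGY - FORCE * x)) / HBAR


def phase(x):
    """Phi_0(x) = integral of k(x') dx' (upper sign in Eq. (86)), taken analytically for this U(x)."""
    return -((2 * MASS * (ENERGY - FORCE * x)) ** 1.5) / (3 * MASS * FORCE * HBAR)


def psi_0(x):
    """Forward-propagating part of Eq. (87), with A = 1."""
    return cmath.exp(1j * phase(x))


def psi_wkb(x):
    """Forward-propagating part of Eq. (94), with a = 1."""
    return cmath.exp(1j * phase(x)) / math.sqrt(k_local(x))


def current(psi, x, h=1e-5):
    """Probability current (5), I = (hbar/m) Im(psi* dpsi/dx), with a central-difference derivative."""
    dpsi_dx = (psi(x + h) - psi(x - h)) / (2 * h)
    return (HBAR / MASS) * (psi(x).conjugate() * dpsi_dx).imag


for x in (-8.0, -4.0, 0.0, 4.0):
    print(f"x = {x:5.1f}: k(x) = {k_local(x):.6f}, I_0 = {current(psi_0, x):.6f}, "
          f"I_WKB = {current(psi_wkb, x):.6f}")
```

The 0th-approximation current simply follows k(x), as in Eq. (88), while the WKB current stays at ħ|a|²/m = 1, as in Eq. (95).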


Physically, the factor k^{1/2} in the denominator of the WKB wavefunction’s pre-exponent is easy to understand. The smaller the local group velocity (32) of the wave packet, v_gr(x) = ħk(x)/m, the “easier” (more probable) it should be to find the particle within a certain interval dx. This is exactly the result that the WKB approximation gives: w(x) = ψψ* ∝ 1/k(x) ∝ 1/v_gr. Another value of the 1st approximation is a clarification of the WKB theory’s validity condition: it is given by Eq. (90). Plugging into this relation the first form of Eq. (92), and estimating |d²Φ₀/dx²| as |dΦ₀/dx|/a, where a is the spatial scale of a substantial change of |dΦ₀/dx| = k(x), we get |dΦ₁/dx| ≈ 1/(2a), so that the condition may be written as

$$ka \gg 1. \qquad (2.96)$$


In plain English, this means that the region where U(x), and hence k(x), change substantially should contain many de Broglie wavelengths λ = 2π/k.
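Condition (96) can be turned into a local check. The sketch below takes a ≈ k/|dk/dx| as the local scale of variation of k(x), an assumed (though common) way to make the estimate concrete, and evaluates ka for a hypothetical linear ramp with dU/dx = 1 in units with ħ = 1 and m = 1/2 (so that ħ²/2m = 1). In these units the turning point is x_c = 0 at E = 0, and the analytic answer is ka ≈ 2|x|^{3/2}, so the condition inevitably fails close to the classical turning point. The helper names are hypothetical.

```python
import math

HBAR, MASS = 1.0, 0.5   # units with hbar^2/2m = 1 (assumed, so that x0 of a unit-slope ramp is 1)


def wkb_validity(potential, energy, x, h=1e-6):
    """Local estimate of k*a from Eq. (96), taking a ~ k/|dk/dx| (an assumed choice)."""
    def k(y):
        return math.sqrt(2 * MASS * (energy - potential(y))) / HBAR
    dk_dx = (k(x + h) - k(x - h)) / (2 * h)
    return k(x) ** 2 / abs(dk_dx)


def ramp(x):
    """Hypothetical linear potential with dU/dx = 1 and turning point x_c = 0 at E = 0."""
    return x


for x in (-10.0, -3.0, -1.0, -0.3, -0.1):
    print(f"x = {x:6.2f}: k*a ~ {wkb_validity(ramp, 0.0, x):8.3f}  (analytic 2|x|^1.5 = {2 * abs(x) ** 1.5:8.3f})")
```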


So far I have implied that k²(x) ∝ E − U(x) is positive, i.e. the particle moves in the classically accessible region. Now let us extend the WKB approximation to situations where the difference E − U(x) may change sign, for example to the reflection problem sketched in Fig. 3. Just as we did for the sharp potential step, we first need to find the appropriate solution in the classically forbidden region, in this case for x > x_c. For that, there is again no need to redo our calculations, because they are still valid if we, just as in the sharp-step problem, take k(x) = iκ(x), where

$$\kappa^2(x) \equiv \frac{2m[U(x) - E]}{\hbar^2} > 0, \qquad \text{for } x > x_c, \qquad (2.97)$$

and keep just one of two possible solutions (with κ > 0), in analogy with Eq. (58). The result is

$$\psi_{\rm WKB}(x) = \frac{c}{\kappa^{1/2}(x)}\exp\left\{-\int^x \kappa(x')\,dx'\right\}, \qquad \text{for } k^2(x) < 0, \text{ i.e. } \kappa^2(x) > 0, \qquad (2.98)$$

with the lower limit at some point with κ² > 0 as well. This is a really wonderful formula! It describes the quantum-mechanical penetration of the particle into the classically forbidden region and provides a natural generalization of Eq. (58), leaving intact our estimates of the depth δ ~ 1/κ of such penetration.
Now we have to do what we have done for the sharp-step problem in Sec. 3: use the boundary conditions at the classical turning point x = x_c to relate the constants a, b, and c. However, now this operation is a tad more complex, because both WKB functions (94) and (98) diverge, albeit weakly, at that point, because here both k(x) and κ(x) tend to zero. This connection problem may be solved in the following way.²²


Let us use our commitment to the potential’s “softness”, assuming that it allows us to keep just two leading terms in the Taylor expansion of the function U(x) at the point x_c:

$$U(x) \approx U(x_c) + \frac{dU}{dx}\bigg|_{x=x_c}(x - x_c) \equiv E + \frac{dU}{dx}\bigg|_{x=x_c}(x - x_c). \qquad (2.99)$$

Using this truncated expansion, and introducing the following dimensionless variable for the coordinate’s deviation from the classical turning point,

$$\zeta \equiv \frac{x - x_c}{x_0}, \qquad \text{with}\quad x_0 \equiv \left[\frac{\hbar^2}{2m(dU/dx)_{x=x_c}}\right]^{1/3}, \qquad (2.100)$$

we reduce the Schrödinger equation (81) to the so-called Airy equation

$$\frac{d^2\psi}{d\zeta^2} - \zeta\psi = 0. \qquad (2.101)$$

This simple linear, ordinary, homogeneous differential equation of the second order has been very well studied. Its general solution may be represented as a linear combination of two fundamental solutions, the Airy functions Ai(ζ) and Bi(ζ), shown in Fig. 9a.²³


Fig. 2.9. (a) The Airy functions Ai and Bi, and (b) the WKB approximation for the function Ai(ζ).


22 An alternative way to solve the connection problem, without involving the Airy functions but using an 
analytical extension of WKB formulas to the complex-argument plane, may be found, e.g., in Sec. 47 of the 
textbook by L. Landau and E. Lifshitz, Quantum Mechanics, Non-Relativistic Theory, 3rd ed. Pergamon, 1977. 

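23 Note the following (exact) integral formulas,

$$\mathrm{Ai}(\zeta) = \frac{1}{\pi}\int_0^{\infty}\cos\left(\frac{\xi^3}{3} + \zeta\xi\right)d\xi, \qquad \mathrm{Bi}(\zeta) = \frac{1}{\pi}\int_0^{\infty}\left[\exp\left\{-\frac{\xi^3}{3} + \zeta\xi\right\} + \sin\left(\frac{\xi^3}{3} + \zeta\xi\right)\right]d\xi,$$

frequently more convenient for practical calculations of the Airy functions than the differential equation (101).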
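The latter function diverges at ζ → +∞, and thus is not suitable for our current problem (Fig. 3), while the former function has the following asymptotic behaviors at |ζ| >> 1:

$$\mathrm{Ai}(\zeta) \to \begin{cases} \dfrac{1}{2\pi^{1/2}\zeta^{1/4}}\exp\left\{-\dfrac{2}{3}\zeta^{3/2}\right\}, & \text{for } \zeta \to +\infty, \\[2ex] \dfrac{1}{\pi^{1/2}|\zeta|^{1/4}}\sin\left\{\dfrac{2}{3}(-\zeta)^{3/2} + \dfrac{\pi}{4}\right\}, & \text{for } \zeta \to -\infty. \end{cases} \qquad (2.102)$$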


Now let us apply the WKB approximation to the Airy equation (101). Taking the classical turning point (ζ = 0) for the lower limit, for ζ > 0 we get

$$\kappa^2(\zeta) = \zeta, \quad \kappa(\zeta) = \zeta^{1/2}, \quad \int_0^{\zeta}\kappa(\zeta')\,d\zeta' = \int_0^{\zeta}\zeta'^{1/2}\,d\zeta' = \frac{2}{3}\zeta^{3/2}, \qquad (2.103)$$

i.e. exactly the exponent in the top line of Eq. (102). Making a similar calculation for ζ < 0, with the natural assumption |b| = |a| (full reflection from the potential step), we arrive at the following result:


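$$\mathrm{Ai}_{\rm WKB}(\zeta) = \begin{cases} \dfrac{c'}{\zeta^{1/4}}\exp\left\{-\dfrac{2}{3}\zeta^{3/2}\right\}, & \text{for } \zeta > 0, \\[2ex] \dfrac{a'}{|\zeta|^{1/4}}\sin\left\{\dfrac{2}{3}(-\zeta)^{3/2} + \varphi\right\}, & \text{for } \zeta < 0. \end{cases} \qquad (2.104)$$

This approximation differs from the exact solution at small values of ζ, i.e. close to the classical turning point (see Fig. 9b). However, at |ζ| >> 1, Eqs. (104) describe the Airy function exactly, provided that

$$a' = \frac{1}{\pi^{1/2}}, \qquad c' = \frac{a'}{2}, \qquad \varphi = \frac{\pi}{4}. \qquad (2.105)$$

These connection formulas may be used to rewrite Eq. (104) as

$$\mathrm{Ai}_{\rm WKB}(\zeta) = \frac{a'}{2|\zeta|^{1/4}} \times \begin{cases} \exp\left\{-\dfrac{2}{3}\zeta^{3/2}\right\}, & \text{for } \zeta > 0, \\[2ex] \dfrac{1}{i}\left[\exp\left\{i\left(\dfrac{2}{3}(-\zeta)^{3/2} + \dfrac{\pi}{4}\right)\right\} - \exp\left\{-i\left(\dfrac{2}{3}(-\zeta)^{3/2} + \dfrac{\pi}{4}\right)\right\}\right], & \text{for } \zeta < 0, \end{cases} \qquad (2.106)$$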

and hence may be described by the following two simple mnemonic rules:

(i) If the classical turning point is taken for the lower limit in the WKB integrals in the classically allowed and the classically forbidden regions, then the moduli of the quasi-amplitudes of the exponents are equal.

(ii) Reflecting from a "soft" potential step, the wavefunction acquires an additional phase shift Δφ = π/2, if compared with its reflection from a "hard", infinitely high potential wall located at the point x_c (for which, according to Eq. (63) with κ → ∞, we have B = −A).
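A quick numerical check of the connection constants (105) may be helpful. The sketch below is an illustration under stated assumptions, not part of the original derivation: it uses only the Python standard library; the hypothetical helper `airy_ai_series` sums the Maclaurin series of Ai(ζ) that follows from the Airy equation (101), starting from the standard values Ai(0) = 1/[3^{2/3}Γ(2/3)] and Ai'(0) = −1/[3^{1/3}Γ(1/3)], while the hypothetical helper `airy_ai_wkb` evaluates Eq. (104) with a' = π^{−1/2}, c' = a'/2, and φ = π/4. The series is trustworthy only for moderate arguments (|ζ| up to about 6), because of rounding-error cancellation at larger |ζ|.

```python
import math

def airy_ai_series(z, n_terms=120):
    """Ai(z) from its Maclaurin series; adequate in double precision for |z| up to about 6."""
    ai0 = 1.0 / (3.0 ** (2.0 / 3.0) * math.gamma(2.0 / 3.0))    # Ai(0)
    dai0 = -1.0 / (3.0 ** (1.0 / 3.0) * math.gamma(1.0 / 3.0))  # Ai'(0)
    z3 = z ** 3
    f_term, g_term = 1.0, z          # leading terms of the two independent series
    f_sum, g_sum = f_term, g_term
    for k in range(n_terms):
        # y'' = z*y implies the coefficient recurrence a_{m+3} = a_m / ((m+3)(m+2))
        f_term *= z3 / ((3 * k + 3) * (3 * k + 2))
        g_term *= z3 / ((3 * k + 4) * (3 * k + 3))
        f_sum += f_term
        g_sum += g_term
    return ai0 * f_sum + dai0 * g_sum

def airy_ai_wkb(z):
    """Eq. (2.104) with the connection constants (2.105)."""
    a_prime = 1.0 / math.sqrt(math.pi)
    c_prime = a_prime / 2
    phi = math.pi / 4
    if z > 0:
        return c_prime * z ** -0.25 * math.exp(-2.0 / 3.0 * z ** 1.5)
    if z < 0:
        return a_prime * (-z) ** -0.25 * math.sin(2.0 / 3.0 * (-z) ** 1.5 + phi)
    return math.inf    # the WKB form diverges at the turning point itself

for z in (-5.0, -2.0, -0.5, 0.5, 2.0, 5.0):
    print(f"zeta = {z:5.1f}:  series {airy_ai_series(z): .6f},  WKB {airy_ai_wkb(z): .6f}")
```

At |ζ| = 5 the two columns should agree to about one percent, while near ζ = 0 the WKB form deviates noticeably from the exact function, in line with Fig. 9b.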


In order for the connection formulas (105)-(106) to be valid, deviations from the linear approximation (99) of the potential profile should be relatively small within the region where the WKB approximation differs from the exact Airy function: |ζ| ~ 1, i.e. |x − x_c| ~ x_0. These deviations may be estimated using the next term of the Taylor expansion, dropped in Eq. (99): (d²U/dx²)(x − x_c)²/2. As a result, the condition of validity of the connection formulas (i.e. of the "softness" of the reflecting potential profile) may be expressed as |d²U/dx²| x_0 << |dU/dx| at x = x_c (meaning the ~x_0-wide vicinity of the point x_c). With the account of Eq. (100) for x_0, this condition becomes

$$\left|\frac{d^2U}{dx^2}\right|^3 \ll \frac{2m}{\hbar^2}\left(\frac{dU}{dx}\right)^4 \quad \text{at } x \approx x_c. \qquad (2.107)$$


As an example of a very useful application of the WKB approximation, let us use the connection 
formulas to calculate the energy spectrum of a 1D particle in a soft 1D potential well (Fig. 10). 


Fig. 2.10. The WKB treatment of an eigenstate 
of a particle in a soft 1D potential well. 


As was discussed in Sec. 1.7, we may consider the standing wave describing an eigenfunction ψ_n (corresponding to an eigenenergy E_n) as a sum of two traveling de Broglie waves going back and forth between the walls, being sequentially reflected from each of them. Let us apply the WKB approximation to such traveling waves. First, according to Eq. (94), propagating from the left classical turning point x_L to the right such point x_R, the wave acquires the phase change

$$\Delta\varphi_{\to} = \int_{x_L}^{x_R} k(x)\,dx. \qquad (2.108)$$

At the reflection from the soft wall at x_R, according to the mnemonic rule (ii), the wave acquires an additional shift π/2. Now, traveling back from x_R to x_L, the wave gets a shift similar to the one given by Eq. (108): Δφ_← = Δφ_→. Finally, at the reflection from x_L, it gets one more π/2-shift. Summing up all these contributions over the wave's roundtrip, we may write the self-consistency condition (that the wavefunction "catches its own tail with its teeth") in the form

$$\Delta\varphi = \Delta\varphi_{\to} + \frac{\pi}{2} + \Delta\varphi_{\leftarrow} + \frac{\pi}{2} = 2\int_{x_L}^{x_R} k(x)\,dx + \pi = 2\pi n, \quad \text{with } n = 1, 2, \ldots \qquad (2.109)$$


Rewriting this result in terms of the particle's momentum p(x) = ħk(x), we arrive at the so-called Wilson-Sommerfeld (or "Bohr-Sommerfeld") quantization rule

$$\oint_C p(x)\,dx = 2\pi\hbar\left(n - \frac{1}{2}\right), \qquad (2.110)$$

where the closed path C means the full period of classical motion.24


24 Note that at the motion in more than one dimension, a closed classical trajectory may have no classical turning points. In this case, the constant 1/2, arising from the turns, should be dropped from Eq. (110) written for the scalar product p(r)·dr, giving the so-called Bohr quantization rule. It was suggested by N. Bohr as early as 1913 as an interpretation of Eq. (1.8) for the circular motion of the electron around the proton, while its 1D modification (110) is due to W. Wilson (1915) and A. Sommerfeld (1916).
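Let us see what this quantization rule gives for the very important particular case of a quadratic potential profile of a harmonic oscillator of frequency ω_0. In this case,

$$U(x) = \frac{m}{2}\omega_0^2 x^2, \qquad (2.111)$$

and the classical turning points (where U(x) = E) are the roots of a simple equation

$$\frac{m}{2}\omega_0^2 x_c^2 = E_n, \quad \text{so that } x_R = \frac{1}{\omega_0}\left(\frac{2E_n}{m}\right)^{1/2} > 0, \quad x_L = -x_R < 0. \qquad (2.112)$$

Due to the potential's symmetry, the integration required by Eq. (110) is also simple:

$$\oint p(x)\,dx = 2\int_{x_L}^{x_R}\{2m[E_n - U(x)]\}^{1/2}dx = 2(2mE_n)^{1/2}\int_{x_L}^{x_R}\left(1 - \frac{x^2}{x_R^2}\right)^{1/2}dx = 4(2mE_n)^{1/2}x_R\int_0^1\left(1 - \xi^2\right)^{1/2}d\xi = 4(2mE_n)^{1/2}x_R\,\frac{\pi}{4} = \frac{2\pi E_n}{\omega_0}, \qquad (2.113)$$

so that Eq. (110) yields

$$E_n = \hbar\omega_0\left(n - \frac{1}{2}\right) \equiv \hbar\omega_0\left(n' + \frac{1}{2}\right), \quad \text{with } n' \equiv n - 1 = 0, 1, 2, \ldots \qquad (2.114)$$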


To estimate the validity of this result, we have to check the condition (96) at all points of the classically allowed region, and Eq. (107) at the turning points. The checkup shows that both conditions are valid only for n >> 1. However, we will see in Sec. 9 below that Eq. (114) is actually exact for all energy levels, thanks to special properties of the potential profile (111).
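The harmonic-oscillator result (114) and the validity estimate may be reproduced numerically. A minimal sketch, assuming units ħ = m = ω_0 = 1 and hypothetical helper names: `loop_action` evaluates the closed-path integral (113) by quadrature, `bohr_sommerfeld_level` solves Eq. (110) for E_n by bisection, and `softness_ratio` evaluates (2m/ħ²)|dU/dx|⁴/|d²U/dx²|³ at the turning point x_R, a ratio that Eq. (107) requires to be much larger than 1.

```python
import math

# Illustrative units (an assumption): hbar = m = omega_0 = 1.
HBAR, M, OMEGA0 = 1.0, 1.0, 1.0

def potential(x):
    return 0.5 * M * OMEGA0 ** 2 * x ** 2      # Eq. (2.111)

def loop_action(energy, n_nodes=400):
    """Closed-path integral of p dx, Eq. (2.113), by a midpoint rule.

    The substitution x = x_R sin(theta) removes the square-root singularities
    of p(x) at the turning points, so the quadrature converges quickly.
    """
    x_r = math.sqrt(2 * energy / M) / OMEGA0    # turning point, Eq. (2.112)
    h = math.pi / n_nodes
    total = 0.0
    for i in range(n_nodes):
        theta = -math.pi / 2 + (i + 0.5) * h
        x = x_r * math.sin(theta)
        p = math.sqrt(max(2 * M * (energy - potential(x)), 0.0))
        total += p * x_r * math.cos(theta) * h
    return 2 * total                            # there and back again

def bohr_sommerfeld_level(n, e_lo=1e-9, e_hi=100.0):
    """Solve Eq. (2.110), loop_action(E) = 2*pi*hbar*(n - 1/2), by bisection."""
    target = 2 * math.pi * HBAR * (n - 0.5)
    for _ in range(100):
        e_mid = 0.5 * (e_lo + e_hi)
        if loop_action(e_mid) < target:
            e_lo = e_mid
        else:
            e_hi = e_mid
    return 0.5 * (e_lo + e_hi)

def softness_ratio(n):
    """(2m/hbar^2)|U'|^4 / |U''|^3 at x_R for E = E_n; Eq. (2.107) needs it >> 1."""
    x_r = math.sqrt(2 * HBAR * OMEGA0 * (n - 0.5) / M) / OMEGA0
    du = M * OMEGA0 ** 2 * x_r
    d2u = M * OMEGA0 ** 2
    return (2 * M / HBAR ** 2) * du ** 4 / d2u ** 3

for n in (1, 2, 5, 20):
    print(n, bohr_sommerfeld_level(n), softness_ratio(n))
```

For the profile (111) the ratio equals 8(n − 1/2)², i.e. only about 2 for the ground state, so condition (107) holds only for n >> 1, while the computed levels coincide with ħω_0(n − 1/2) for every n.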


Now let us use the mnemonic rule (i) to examine the particle's penetration into the classically forbidden region of an abrupt potential step of a height U_0 > E. For this case, the rule, i.e. the second of Eqs. (105), yields the following relation of the quasi-amplitudes in Eqs. (94) and (98): |c| = |a|/2. If we now naively applied this relation to the sharp step sketched in Fig. 4, forgetting that it does not satisfy Eq. (107), we would get the following relation of the full amplitudes, defined by Eqs. (55) and (58):

$$\left|\frac{C}{A}\right|_{\rm WKB} = \frac{1}{2}\left(\frac{k}{\kappa}\right)^{1/2}. \qquad \text{(WRONG!)} \qquad (2.115)$$

This result differs from the correct Eq. (63), and hence we may expect that the WKB approximation's prediction for more complex potentials, most importantly for tunneling through a soft potential barrier (Fig. 11), should also be different from the exact result (71) for the rectangular barrier shown in Fig. 6.


Fig. 2.11. Tunneling through 
a soft 1D potential barrier. 
In order to analyze tunneling through such a soft barrier, we need (just as in the case of a rectangular barrier) to take into consideration five partial waves, but now they should be taken in the WKB form:

$$\psi_{\rm WKB}(x) = \begin{cases} \dfrac{a}{k^{1/2}(x)}\exp\left\{i\displaystyle\int^x k(x')\,dx'\right\} + \dfrac{b}{k^{1/2}(x)}\exp\left\{-i\displaystyle\int^x k(x')\,dx'\right\}, & \text{for } x < x_c, \\[2ex] \dfrac{c_+}{\kappa^{1/2}(x)}\exp\left\{\displaystyle\int^x \kappa(x')\,dx'\right\} + \dfrac{c_-}{\kappa^{1/2}(x)}\exp\left\{-\displaystyle\int^x \kappa(x')\,dx'\right\}, & \text{for } x_c < x < x_c', \\[2ex] \dfrac{f}{k^{1/2}(x)}\exp\left\{i\displaystyle\int^x k(x')\,dx'\right\}, & \text{for } x_c' < x, \end{cases} \qquad (2.116)$$

where the lower limits of integrals are arbitrary (each within the corresponding range of x). Since on the right of the left classical turning point we have two exponents rather than one, and on the right of the second point, one traveling wave rather than two, the connection formulas (105) have to be generalized, using asymptotic formulas not only for Ai(ζ), but also for the second Airy function, Bi(ζ). The analysis, absolutely similar to that carried out above (though naturally a bit bulkier),25 gives a remarkably simple result:

$$\mathscr{T}_{\rm WKB} \equiv \left|\frac{f}{a}\right|^2 = \exp\left\{-2\int_{x_c}^{x_c'}\kappa(x)\,dx\right\} \equiv \exp\left\{-\frac{2}{\hbar}\int_{x_c}^{x_c'}\left(2m\left[U(x) - E\right]\right)^{1/2}dx\right\}, \qquad (2.117)$$

with the pre-exponential factor equal to 1, the fact which might be readily expected from the mnemonic rule (i) of the connection formulas.


This formula is broadly used in applied quantum mechanics, despite the approximate character of its pre-exponential coefficient for insufficiently soft barriers that do not satisfy Eq. (107). For example, Eq. (80) shows that for a rectangular barrier with thickness d >> δ, the WKB approximation (117) with x_c' − x_c = d differs from the exact 𝒯 by the factor [4kκ/(k² + κ²)]², underestimating it, for example, 4-fold if k = κ, i.e. if U_0 = 2E. However, on the appropriate logarithmic scale (see Fig. 7b), such a factor, smaller than an order of magnitude, is just a small correction.
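The size of this prefactor correction is easy to see numerically. In the sketch below, the units ħ = m = 1 and all function names are assumptions made for illustration; `exact_rectangular` uses the standard exact transparency of a rectangular barrier at E < U_0, 𝒯 = [1 + (k² + κ²)² sinh²(κd)/(4k²κ²)]^{−1}, written out explicitly because Eq. (71) is not reproduced here, and `wkb_rectangular` is Eq. (117) with a constant κ.

```python
import math

# Illustrative units (an assumption): hbar = m = 1, so k^2 = 2E and kappa^2 = 2(U0 - E).

def exact_rectangular(energy, u0, d):
    """Standard exact transparency of a rectangular barrier (height u0, width d), 0 < E < U0."""
    k2, kappa2 = 2 * energy, 2 * (u0 - energy)
    factor = (k2 + kappa2) ** 2 / (4 * k2 * kappa2)
    return 1.0 / (1.0 + factor * math.sinh(math.sqrt(kappa2) * d) ** 2)

def wkb_rectangular(energy, u0, d):
    """Eq. (2.117) with a constant kappa and x_c' - x_c = d."""
    return math.exp(-2 * math.sqrt(2 * (u0 - energy)) * d)

def prefactor(energy, u0):
    """[4 k kappa / (k^2 + kappa^2)]^2, the factor quoted in the text from Eq. (2.80)."""
    k, kappa = math.sqrt(2 * energy), math.sqrt(2 * (u0 - energy))
    return (4 * k * kappa / (k ** 2 + kappa ** 2)) ** 2

for u0 in (1.05, 2.0, 20.0):          # E = 1, barrier width d = 10
    ratio = exact_rectangular(1.0, u0, 10.0) / wkb_rectangular(1.0, u0, 10.0)
    print(f"U0/E = {u0:5.2f}: exact/WKB = {ratio:.4f}, prefactor = {prefactor(1.0, u0):.4f}")
```

The prefactor reaches its maximum value 4 at U_0 = 2E; for U_0 very close to E, or much larger than E (e.g. U_0/E = 1.05 or 20), it drops below 1, so the WKB formula then somewhat overestimates 𝒯. In all cases the exponential factor dominates.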


Note also that when E approaches the barrier's top U_max (Fig. 11), the points x_c and x_c' merge, so that according to Eq. (117), 𝒯_WKB → 1, i.e. the particle reflection vanishes at E = U_max. So, the WKB approximation does not describe the effect of the over-barrier reflection at E > U_max. (This fact could be noticed already from Eq. (95): in the absence of the classical turning points, the WKB probability current is constant for any barrier profile.) This conclusion is incorrect even for apparently smooth barriers where one could naively expect the WKB approximation to work perfectly. Indeed, near the point x = x_m where the potential reaches its maximum (i.e. U(x_m) = U_max), we may always approximate any smooth function U(x) by keeping the quadratic term of its Taylor expansion, i.e. with an inverted parabola:

$$U(x) \approx U_{\max} - \frac{m\omega_0^2(x - x_m)^2}{2}. \qquad (2.118)$$

25 For the most important case 𝒯_WKB << 1, Eq. (117) may be simply derived from Eqs. (105)-(106), an exercise left for the reader.


Chapter 2 Page 25 of 76 


Essential Graduate Physics QM: Quantum Mechanics 


Calculating derivatives dU/dx and d²U/dx² of this function and plugging them into the condition (107), we may see that the WKB approximation is only valid if |U_max − E| >> ħω_0. Just for the reader's reference, an exact analysis of tunneling through the barrier (118) gives the following Kemble formula:26

$$\mathscr{T} = \frac{1}{1 + \exp\left\{-2\pi(E - U_{\max})/\hbar\omega_0\right\}}, \qquad (2.119)$$

valid for any sign of the difference (E − U_max). This formula describes a gradual approach of 𝒯 to 1, i.e. a gradual reduction of reflection, as the particle's energy increases, with 𝒯 = 1/2 at E = U_max.
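The relation between Eqs. (117) and (119) for the parabolic barrier (118) can be illustrated with a short sketch; the units ħ = m = ω_0 = 1 and the function names are assumptions. The hypothetical `wkb_exponent` evaluates 2∫κ dx between the turning points numerically. For this profile the integral equals 2π(U_max − E)/ħω_0 exactly, so 𝒯_WKB = exp{−2π(U_max − E)/ħω_0} reproduces the tail of the Kemble formula at U_max − E >> ħω_0, but tends to 1 rather than 1/2 as E → U_max.

```python
import math

# Illustrative units (an assumption): hbar = m = omega_0 = 1, so energies are in units of hbar*omega_0.
HBAR, M, OMEGA0 = 1.0, 1.0, 1.0

def wkb_exponent(deficit, n_nodes=200):
    """2 * integral of kappa dx between the turning points of the barrier (2.118),
    for E = U_max - deficit with deficit > 0, by a midpoint rule."""
    y_c = math.sqrt(2 * deficit / (M * OMEGA0 ** 2))   # half-distance between turning points
    h = math.pi / n_nodes
    total = 0.0
    for i in range(n_nodes):
        theta = -math.pi / 2 + (i + 0.5) * h
        y = y_c * math.sin(theta)                       # y = x - x_m
        u_minus_e = deficit - 0.5 * M * OMEGA0 ** 2 * y ** 2
        kappa = math.sqrt(max(2 * M * u_minus_e, 0.0)) / HBAR
        total += kappa * y_c * math.cos(theta) * h
    return 2 * total

def t_wkb(deficit):
    return math.exp(-wkb_exponent(deficit))             # Eq. (2.117)

def t_kemble(energy_minus_umax):
    return 1.0 / (1.0 + math.exp(-2 * math.pi * energy_minus_umax / (HBAR * OMEGA0)))   # Eq. (2.119)

for deficit in (0.1, 0.5, 1.0, 2.0, 3.0):
    print(f"(U_max - E)/(hbar*omega_0) = {deficit}: WKB {t_wkb(deficit):.3e}, Kemble {t_kemble(-deficit):.3e}")
```

The two columns merge as the energy deficit grows beyond ħω_0 and separate strongly for deficits well below it, in agreement with the validity condition stated above.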


The last remark of this section: the WKB approximation opens a straight way toward an 
alternative formulation of quantum mechanics, based on the Feynman path integral. However, I will 
postpone its discussion until a more compact notation has been introduced in Chapter 4. 


2.5. Resonant tunneling, and metastable states 


Now let us move to other, conceptually different quantum effects, taking place in more elaborate potential profiles. Neither piecewise-constant nor smooth-potential models of U(x) are convenient for their quantitative description because they both require "stitching" partial de Broglie waves at each classical turning point, which may lead to cumbersome calculations. However, we may get a very good insight into the physics of quantum effects that may take place in such profiles, using their approximation by sets of Dirac's delta functions.

Additional help in studying such effects is provided by the notions of the scattering and transfer matrices, very useful for other cases as well. Consider an arbitrary but finite-length potential "bump" (formally called a scatterer), localized somewhere between points x_1 and x_2 on the flat potential background, say U = 0 (Fig. 12).


Fig. 2.12. De Broglie wave amplitudes near a single 1D scatterer.


From Sec. 2, we know that the general solutions of the stationary Schrödinger equation, with a certain energy E, outside the interval [x_1, x_2] are sets of two sinusoidal waves, traveling in the opposite directions. Let us represent them in the form

$$\psi_j = A_j e^{ik(x - x_j)} + B_j e^{-ik(x - x_j)}, \qquad (2.120)$$


26 This formula was derived (in a more general form, valid for an arbitrary soft potential barrier) by E. Kemble in 1935. In some communities, it is known as the "Hill-Wheeler formula", after D. Hill and J. Wheeler's 1953 paper in which the Kemble formula was spelled out for the quadratic profile (118). Note that mathematically Eq. (119) is similar to the Fermi distribution in statistical physics, with an effective temperature T_ef = ħω_0/2πk_B. This coincidence has some curious implications for the Fermi particle tunneling statistics.
where the index j (for now) is equal to either 1 or 2, and (ħk)²/2m = E. Note that each of the two wave pairs (120) has, in this notation, its own reference point x_j, because this is very convenient for what follows. As we have already discussed, if the de Broglie wave/particle is incident from the left (i.e. B_2 = 0), the solution of the linear Schrödinger equation within the scatterer range (x_1 < x < x_2) can provide only linear expressions for the transmitted (A_2) and reflected (B_1) wave amplitudes via the incident wave amplitude A_1:

$$A_2 = S_{21}A_1, \qquad B_1 = S_{11}A_1, \qquad (2.121)$$

where S_11 and S_21 are certain (generally, complex) coefficients. Alternatively, if a wave, with amplitude B_2, is incident on the scatterer from the right (i.e. if A_1 = 0), it can induce a transmitted wave (B_1) and a reflected wave (A_2), with amplitudes

$$B_1 = S_{12}B_2, \qquad A_2 = S_{22}B_2, \qquad (2.122)$$

where the coefficients S_12 and S_22 are generally different from S_11 and S_21. Now we can use the linear superposition principle to argue that if the waves A_1 and B_2 are simultaneously incident on the scatterer (say, because the wave B_2 has been partly reflected back by some other scatterer located at x > x_2), the resulting scattered wave amplitudes A_2 and B_1 are just the sums of their values for separate incident waves:

$$B_1 = S_{11}A_1 + S_{12}B_2, \qquad A_2 = S_{21}A_1 + S_{22}B_2. \qquad (2.123)$$

These linear relations may be conveniently represented using the so-called scattering matrix S:

$$\begin{pmatrix} B_1 \\ A_2 \end{pmatrix} = \mathrm{S}\begin{pmatrix} A_1 \\ B_2 \end{pmatrix}, \quad \text{with } \mathrm{S} \equiv \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}. \qquad (2.124)$$


Scattering matrices, duly generalized, are an important tool for the analysis of wave scattering in more dimensions than one; for 1D problems, however, another matrix is often more convenient to represent the same linear relations (123). Indeed, let us solve this system for A_2 and B_2. The result is

$$A_2 = T_{11}A_1 + T_{12}B_1, \qquad B_2 = T_{21}A_1 + T_{22}B_1, \qquad (2.125)$$

where T is the transfer matrix, with the following elements:

$$T_{11} = S_{21} - \frac{S_{11}S_{22}}{S_{12}}, \quad T_{12} = \frac{S_{22}}{S_{12}}, \quad T_{21} = -\frac{S_{11}}{S_{12}}, \quad T_{22} = \frac{1}{S_{12}}. \qquad (2.126)$$


The matrices S and T have some universal properties, valid for an arbitrary (but time-independent) scatterer; they may be readily found from the probability current conservation and the time-reversal symmetry of the Schrödinger equation. Let me leave finding these relations for the reader's exercise. The results show, in particular, that the scattering matrix may be rewritten in the following form:

$$\mathrm{S} = e^{i\varphi}\begin{pmatrix} re^{i\theta} & t \\ t & -re^{-i\theta} \end{pmatrix}, \qquad (2.127a)$$

where four real parameters r, t, θ, and φ satisfy the following universal relation:

$$r^2 + t^2 = 1, \qquad (2.127b)$$

so that only 3 of these parameters are independent. As a result of this symmetry, T_11 may be also represented in a simpler form, similar to T_22: T_11 = exp{iφ}/t = 1/S_12* = 1/S_21*. The last form allows a ready expression of the scatterer's transparency via just one coefficient of the transfer matrix:

$$\mathscr{T} \equiv \left|\frac{A_2}{A_1}\right|^2_{B_2 = 0} = |S_{21}|^2 = |T_{22}|^{-2}. \qquad (2.128)$$


In our current context, the most important property of the 1D transfer matrices is that to find the total transfer matrix T of a system consisting of several (say, N) sequential arbitrary scatterers (Fig. 13), it is sufficient to multiply their matrices.

Fig. 2.13. A sequence of several 1D scatterers.


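Indeed, extending the definition (125) to other points x_j (j = 1, 2, ..., N + 1), we can write

$$\begin{pmatrix} A_2 \\ B_2 \end{pmatrix} = \mathrm{T}_1\begin{pmatrix} A_1 \\ B_1 \end{pmatrix}, \quad \begin{pmatrix} A_3 \\ B_3 \end{pmatrix} = \mathrm{T}_2\begin{pmatrix} A_2 \\ B_2 \end{pmatrix} = \mathrm{T}_2\mathrm{T}_1\begin{pmatrix} A_1 \\ B_1 \end{pmatrix}, \quad \text{etc.} \qquad (2.129)$$

(where the matrix indices correspond to the scatterers' order on the x-axis), so that

$$\begin{pmatrix} A_{N+1} \\ B_{N+1} \end{pmatrix} = \mathrm{T}_N\mathrm{T}_{N-1}\cdots\mathrm{T}_1\begin{pmatrix} A_1 \\ B_1 \end{pmatrix}. \qquad (2.130)$$

But we can also define the total transfer matrix similarly to Eq. (125), i.e. as

$$\begin{pmatrix} A_{N+1} \\ B_{N+1} \end{pmatrix} \equiv \mathrm{T}\begin{pmatrix} A_1 \\ B_1 \end{pmatrix}, \qquad (2.131)$$

so that comparing Eqs. (130) and (131) we get

$$\mathrm{T} = \mathrm{T}_N\mathrm{T}_{N-1}\cdots\mathrm{T}_1. \qquad (2.132)$$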


This formula is valid even if the flat-potential gaps between component scatterers are shrunk to zero, so that it may be applied to a scatterer with an arbitrary profile U(x), by fragmenting its length into many small segments Δx = x_{j+1} − x_j, and treating each fragment as a rectangular barrier of the average height (U_j)_ef = [U(x_{j+1}) + U(x_j)]/2 (see Fig. 14). Since very efficient numerical algorithms are readily available for fast multiplication of matrices (especially as small as 2×2 in our case), this approach is broadly used in practice for the computation of transparency of potential barriers with complicated profiles U(x). (Computationally, this procedure is much more efficient than the direct numerical solution of the stationary Schrödinger equation.)
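A minimal sketch of this procedure follows. It is an illustration under stated assumptions, not the author's code: units ħ = m = 1, U = 0 outside the barrier, and the hypothetical helpers `transfer_matrix_profile` and `transparency`. Instead of writing each fragment's rectangular-barrier matrix explicitly, it uses the equivalent device of matching ψ and dψ/dx at every fragment boundary, with plane-wave amplitudes referred to the global coordinate x rather than to the local points x_j; this changes only the phases of the outer amplitudes, not |T_22|, so Eq. (128) still gives 𝒯.

```python
import cmath
import math

def transfer_matrix_profile(u_of_x, x_start, x_end, energy, n_seg=400, m=1.0, hbar=1.0):
    """Total 2x2 transfer matrix of a barrier U(x) on [x_start, x_end], with U = 0 outside.

    Each of the n_seg slices gets the average height [U(x_j) + U(x_{j+1})]/2.
    In every slice psi = A exp(ikx) + B exp(-ikx), with the global coordinate x,
    and psi, dpsi/dx are matched at each slice boundary.
    """
    def wavenumber(u):
        # complex square root: k = i*kappa wherever E < U (breaks down if E equals some U exactly)
        return cmath.sqrt(2 * m * (energy - u)) / hbar

    def boundary(k, x):
        # rows: psi and dpsi/dx at x; columns: the exp(+ikx) and exp(-ikx) waves
        ep, em = cmath.exp(1j * k * x), cmath.exp(-1j * k * x)
        return [[ep, em], [1j * k * ep, -1j * k * em]]

    def inv2(a):
        det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
        return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

    def mul2(p, q):
        return [[p[i][0] * q[0][j] + p[i][1] * q[1][j] for j in range(2)] for i in range(2)]

    dx = (x_end - x_start) / n_seg
    edges = [x_start + j * dx for j in range(n_seg + 1)]
    heights = ([0.0]
               + [0.5 * (u_of_x(edges[j]) + u_of_x(edges[j + 1])) for j in range(n_seg)]
               + [0.0])
    total = [[1, 0], [0, 1]]
    for j, x in enumerate(edges):          # edge j separates regions j and j + 1
        k_left, k_right = wavenumber(heights[j]), wavenumber(heights[j + 1])
        step = mul2(inv2(boundary(k_right, x)), boundary(k_left, x))
        total = mul2(step, total)          # each next factor multiplies from the left, cf. Eq. (2.132)
    return total

def transparency(t):
    """Eq. (2.128); valid here because U = 0 on both sides of the barrier."""
    return 1.0 / abs(t[1][1]) ** 2

# usage: a smooth Gaussian-shaped barrier (illustrative parameters, hbar = m = 1)
print(transparency(transfer_matrix_profile(lambda x: 3.0 * math.exp(-x ** 2), -5.0, 5.0, 1.0)))
```

For a constant profile the slicing is exact, so the result must reproduce the rectangular-barrier formula; for a smooth profile, the result converges as `n_seg` grows. The sketch fails if E coincides exactly with one of the slice heights (k = 0 there), a case a practical implementation would have to treat separately.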
Fig. 2.14. The transfer matrix approach to a potential barrier with an arbitrary profile.


In order to apply this approach to several particular, conceptually important systems, let us calculate the transfer matrices for a few elementary scatterers, starting from the delta-functional barrier located at x = 0 (see Fig. 8). Taking x_1, x_2 → 0, we can merely change the notation of the wave amplitudes in Eq. (78) to get

$$S_{11} = \frac{-i\alpha}{1 + i\alpha}, \qquad S_{21} = \frac{1}{1 + i\alpha}. \qquad (2.133)$$

An absolutely similar analysis of the wave incidence from the right yields

$$S_{22} = \frac{-i\alpha}{1 + i\alpha}, \qquad S_{12} = \frac{1}{1 + i\alpha}, \qquad (2.134)$$

and using Eqs. (126), we get

$$\mathrm{T}_\alpha = \begin{pmatrix} 1 - i\alpha & -i\alpha \\ i\alpha & 1 + i\alpha \end{pmatrix}. \qquad (2.135)$$

As a sanity check, Eq. (128) applied to this result immediately brings us back to Eq. (79).
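The conversion (126) and the results (133)-(135) are easy to cross-check in a few lines of plain Python. In the sketch below, the helper names `s_to_t` and `delta_barrier_s` are hypothetical, and α = 2.5 is an arbitrary illustrative value.

```python
def s_to_t(s11, s12, s21, s22):
    """Transfer-matrix elements from scattering-matrix elements, Eq. (2.126)."""
    return [[s21 - s11 * s22 / s12, s22 / s12],
            [-s11 / s12, 1 / s12]]

def delta_barrier_s(alpha):
    """S-matrix elements (2.133)-(2.134) of a delta-functional barrier."""
    r = -1j * alpha / (1 + 1j * alpha)    # reflection amplitude
    t = 1 / (1 + 1j * alpha)              # transmission amplitude
    return r, t, t, r                     # S11, S12, S21, S22

alpha = 2.5                               # arbitrary illustrative value
T = s_to_t(*delta_barrier_s(alpha))
print(T)                                  # compare with Eq. (2.135)
print(1 / abs(T[1][1]) ** 2, 1 / (1 + alpha ** 2))   # Eq. (2.128) vs. Eq. (2.79)
```

One can also verify that, for this scatterer, T_11 equals 1/S_12*, as stated after Eq. (127b).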


The next example may seem strange at first glance: what if there is no scatterer at all between the points x_1 and x_2? If the points coincide, the answer is indeed trivial and can be obtained, e.g., from Eq. (135) by taking W = 0, i.e. α = 0:

$$\mathrm{T} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \equiv \mathrm{I}, \qquad (2.136)$$

i.e. the so-called identity matrix. However, we are free to choose the reference points x_{1,2} participating in Eq. (120) as we wish. For example, what if x_2 − x_1 = a? Let us first take the forward-propagating wave alone: B_2 = 0 (and hence B_1 = 0); then

$$\psi_2 = \psi_1 = A_1 e^{ik(x - x_1)} \equiv A_1 e^{ik(x_2 - x_1)} e^{ik(x - x_2)}. \qquad (2.137)$$

The comparison of this expression with the definition (120) for j = 2 shows that A_2 = A_1 exp{ik(x_2 − x_1)} = A_1 exp{ika}, i.e. T_11 = exp{ika}. Repeating the calculation for the back-propagating wave, we see that T_22 = exp{−ika}, and since the space interval provides no particle reflection, we finally get

$$\mathrm{T}_a = \begin{pmatrix} e^{ika} & 0 \\ 0 & e^{-ika} \end{pmatrix}, \qquad (2.138)$$

independently of a common shift of points x_1 and x_2. At a = 0, we naturally recover the special case (136).
Now let us use these simple results to analyze the double-barrier system shown in Fig. 15. We could of course calculate its properties as before, writing down explicit expressions for all five traveling waves shown by arrows in Fig. 15, then using the boundary conditions (124) and (125) at each of points x_{1,2} to get a system of four linear equations, and finally, solving it for four amplitude ratios.

Fig. 2.15. The double-barrier system, with the barriers Wδ(x − x_1) and Wδ(x − x_2). The dashed lines show (schematically) the quasi-levels of the metastable-state energies.


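However, the transfer matrix approach simplifies the calculations, because we may immediately use Eqs. (132), (135), and (138) to write

$$\mathrm{T} = \mathrm{T}_\alpha\mathrm{T}_a\mathrm{T}_\alpha = \begin{pmatrix} 1 - i\alpha & -i\alpha \\ i\alpha & 1 + i\alpha \end{pmatrix}\begin{pmatrix} e^{ika} & 0 \\ 0 & e^{-ika} \end{pmatrix}\begin{pmatrix} 1 - i\alpha & -i\alpha \\ i\alpha & 1 + i\alpha \end{pmatrix}. \qquad (2.139)$$

Let me hope that the reader remembers the "row by column" rule of the multiplication of square matrices;27 using it for the last two matrices, we may reduce Eq. (139) to

$$\mathrm{T} = \begin{pmatrix} 1 - i\alpha & -i\alpha \\ i\alpha & 1 + i\alpha \end{pmatrix}\begin{pmatrix} (1 - i\alpha)e^{ika} & -i\alpha e^{ika} \\ i\alpha e^{-ika} & (1 + i\alpha)e^{-ika} \end{pmatrix}. \qquad (2.140)$$

Now there is no need to calculate all elements of the full product T, because, according to Eq. (128), for the calculation of the barrier's transparency 𝒯 we need only one of its elements, T_22:

$$\mathscr{T} = \frac{1}{|T_{22}|^2} = \frac{1}{\left|\alpha^2 e^{ika} + (1 + i\alpha)^2 e^{-ika}\right|^2} \equiv \frac{1}{\left|\alpha^2 e^{-ika} + (1 - i\alpha)^2 e^{ika}\right|^2}. \qquad (2.141)$$

This result is somewhat similar to that following from Eq. (71) for E > U_0: the transparency is a π-periodic function of the product ka, reaching its maximum (𝒯 = 1) at some point of each period (see Fig. 16a). However, Eq. (141) is different in that for α >> 1, the resonance peaks of the transparency are very narrow, reaching their maxima at ka = k_n a ≈ nπ, with n = 1, 2, ...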


The physics of this resonant tunneling effect28 is the so-called constructive interference, absolutely similar to that of electromagnetic waves (for example, light) in a Fabry-Perot resonator formed by two parallel semi-transparent mirrors.29 Namely, the incident de Broglie wave may either

27 In an analytical form: $(\mathrm{AB})_{jj'} = \sum_{j''=1}^{N} A_{jj''}B_{j''j'}$, where N is the matrix size (in our current case, N = 2).
28 In older literature, it is sometimes called the Townsend (or "Ramsauer-Townsend") effect. However, it is more common to use that term only for a similar effect at 3D scattering, to be discussed in Chapter 3.
29 See, e.g., EM Sec. 7.9.
tunnel through the two barriers or undertake, on its way, several sequential reflections from these semi-transparent walls. At k = k_n, i.e. at 2ka = 2k_n a ≈ 2πn, the phase differences between all these partial waves are multiples of 2π, so that they add up in phase, "constructively". Note that the same constructive interference of numerous reflections from the walls may be used to interpret the standing-wave eigenfunctions (1.84), so that the resonant tunneling at α >> 1 may be also considered as a result of the incident wave's resonant induction of such a standing wave, with a very large amplitude, in the space between the barriers, with the transmitted wave's amplitude proportionately increased.


Fig. 2.16. Resonant tunneling through a potential well with delta-functional walls: (a) the system's transparency as a function of ka, and (b) calculating the resonance's FWHM at α >> 1.

As a result of this resonance, the maximum transparency of the system is perfect (𝒯_max = 1) even at α → ∞, i.e. in the case of very low transparency of each of the two component barriers. Indeed, the denominator in Eq. (141) may be interpreted as the squared length of the difference between two 2D vectors, one of length α², and another of length |(1 − iα)²| = 1 + α², with the angle θ = 2ka + const between them (see Fig. 16b). At the resonance, the vectors are aligned, and their difference is smallest (equal to 1), so that 𝒯_max = 1. (This result is exact only if the two barriers are exactly equal.)

The same vector diagram may be used to calculate the so-called FWHM, a common acronym for the Full Width [of the resonance curve at its] Half-Maximum. By definition, this is the difference Δk = k_+ − k_− between the two values of k, on the opposite slopes of the same resonance, at which 𝒯 = 𝒯_max/2 (see the arrows in Fig. 16a). Let the vectors in Fig. 16b, drawn for α >> 1, be misaligned by a small angle φ ~ 1/α² << 1, so that the length of the difference vector is much smaller than the length of each vector. To double its length squared, and hence to reduce 𝒯 by a factor of two in comparison with its maximum value 1, the arc between the vectors, equal to α²φ, should also become equal to ±1, i.e. α²(2k_± a + const) = ±1. Subtracting these two equalities from each other, we get

$$\Delta k \equiv k_+ - k_- = \frac{1}{\alpha^2 a} \ll k_n. \qquad (2.142)$$
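The resonance and its width (142) can be reproduced by literally multiplying the three matrices of Eq. (139). This is a sketch under stated assumptions: α is held fixed as a parameter (its dependence on k is ignored, as in the discussion above), the peak is located on a grid around nπ, and the two half-maximum points are then found by bisection; `mat_mul`, `t_delta`, `t_gap`, `transparency`, and `resonance` are hypothetical helper names.

```python
import cmath
import math

def mat_mul(p, q):
    """'Row by column' product of two 2x2 matrices."""
    return [[p[i][0] * q[0][j] + p[i][1] * q[1][j] for j in range(2)] for i in range(2)]

def t_delta(alpha):
    return [[1 - 1j * alpha, -1j * alpha], [1j * alpha, 1 + 1j * alpha]]   # Eq. (2.135)

def t_gap(ka):
    return [[cmath.exp(1j * ka), 0], [0, cmath.exp(-1j * ka)]]              # Eq. (2.138)

def transparency(ka, alpha):
    """Eqs. (2.139) and (2.128), with alpha held fixed."""
    total = mat_mul(t_delta(alpha), mat_mul(t_gap(ka), t_delta(alpha)))
    return 1.0 / abs(total[1][1]) ** 2

def resonance(n, alpha, half_window=0.5, n_grid=20001):
    """Locate the n-th peak on a grid around n*pi, then bisect for the half-maximum points."""
    grid = [n * math.pi - half_window + 2 * half_window * i / (n_grid - 1) for i in range(n_grid)]
    ka_peak = max(grid, key=lambda ka: transparency(ka, alpha))
    t_peak = transparency(ka_peak, alpha)

    def half_point(inside, outside):
        for _ in range(60):
            mid = 0.5 * (inside + outside)
            if transparency(mid, alpha) > t_peak / 2:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    ka_minus = half_point(ka_peak, n * math.pi - half_window)
    ka_plus = half_point(ka_peak, n * math.pi + half_window)
    return ka_peak, t_peak, ka_plus - ka_minus

alpha = 10.0
ka_n, t_max, width = resonance(1, alpha)
print(ka_n, t_max, width, 1 / alpha ** 2)   # width is (Delta k)*a, to be compared with 1/alpha^2
```

For α = 10, the computed peak lies close to, but slightly below, ka = π (at ka = arctan α + π/2, where α²e^{2ika} is antiparallel to (1 + iα)²), 𝒯_max is practically 1, and the width agrees with 1/α² to within about one percent.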


Now let us use the simple system shown in Fig. 15 to discuss an issue of large conceptual significance. For that, consider what would happen if at some initial moment (say, t = 0) we have placed a 1D quantum particle inside the double-barrier well with α >> 1, and left it there alone, without any incident wave. To simplify the analysis, let us assume that the initial state of the particle coincides with one of the stationary states of the infinite-wall well of the same size (see Eq. (1.84)):

$$\Psi(x, 0) = \psi_n(x) = \left(\frac{2}{a}\right)^{1/2}\sin[k_n(x - x_1)], \quad \text{where } k_n = \frac{\pi n}{a}, \quad n = 1, 2, \ldots \qquad (2.143)$$

At α → ∞, this is just an eigenstate of the system, and from our analysis in Sec. 1.5 we know the time evolution of its wavefunction:

$$\Psi(x, t) = \psi_n(x)\exp\{-i\omega_n t\} = \left(\frac{2}{a}\right)^{1/2}\sin[k_n(x - x_1)]\exp\{-i\omega_n t\}, \quad \text{with } \omega_n \equiv \frac{E_n}{\hbar} = \frac{\hbar k_n^2}{2m}, \qquad (2.144)$$

telling us that the particle remains in the well at all times with a constant probability W(t) = W(0) = 1.


However, if the parameter α is large but finite, the de Broglie wave would slowly "leak out" from the well, so that W(t) would slowly decrease. Such a state is called metastable. Let us derive the law of its time evolution, assuming that the slow leakage, with a characteristic time τ >> 1/ω_n, does not affect the instant wave distribution inside the well, besides the gradual, slow reduction of W.30 Then we can generalize Eq. (144) as

$$\Psi(x, t) = \left(\frac{2W}{a}\right)^{1/2}\sin[k_n(x - x_1)]\exp\{-i\omega_n t\} \equiv A\exp\{i(k_n x - \omega_n t)\} + B\exp\{-i(k_n x + \omega_n t)\}, \qquad (2.145)$$

making the probability of finding the particle in the well equal to W < 1. As the last form of Eq. (145) shows, this function is the sum of two traveling waves, with equal magnitudes of their amplitudes and equal but opposite probability currents (5):

$$|A| = |B| = \left(\frac{W}{2a}\right)^{1/2}, \qquad I_A = -I_B = \frac{\hbar k_n}{m}|A|^2 = \frac{\pi n\hbar}{2ma^2}W. \qquad (2.146)$$

But we already know from Eq. (79) that at α >> 1, the delta-functional wall's transparency 𝒯 equals 1/α², so that the wave carrying current I_A, incident on the right wall from the inside, induces an outgoing wave outside of the well (Fig. 17) with the following probability current:

$$I_R = \mathscr{T} I_A = \frac{1}{\alpha^2}\frac{\pi n\hbar}{2ma^2}W. \qquad (2.147)$$

Fig. 2.17. Metastable state's decay in the simple model of a 1D potential well formed by two low-transparent walls (schematically).

Absolutely similarly,

$$I_L = \mathscr{T} I_B = -\frac{1}{\alpha^2}\frac{\pi n\hbar}{2ma^2}W. \qquad (2.148)$$

30 This virtually evident assumption finds its formal justification in the perturbation theory to be discussed in 
Chapter 6. 
Now we may combine the 1D version (6) of the probability conservation law for the well's interior:

$$\frac{dW}{dt} + I_R - I_L = 0, \qquad (2.149)$$

with Eqs. (147)-(148) to write

$$\frac{dW}{dt} = -\frac{1}{\alpha^2}\frac{\pi n\hbar}{ma^2}W. \qquad (2.150)$$

This is just the standard differential equation,

$$\frac{dW}{dt} = -\frac{1}{\tau}W, \qquad (2.151)$$

of the exponential decay, W(t) = W(0)exp{−t/τ}, where the constant τ, in our case equal to

$$\tau = \frac{ma^2}{\pi n\hbar}\alpha^2, \qquad (2.152)$$

is called the metastable state's lifetime. Using Eq. (2.33b) for the de Broglie waves' group velocity, for our particular wave vector giving v_gr = ħk_n/m = πnħ/ma, Eq. (152) may be rewritten in a more general form,

$$\tau = \frac{t_a}{\mathscr{T}}, \qquad (2.153)$$

where the attempt time t_a is equal to a/v_gr, and (in our particular case) 𝒯 = 1/α². In this form, the result is valid for a broad class of similar metastable systems.31 Equality (153) may be interpreted in the following semi-classical way. The particle travels back and forth between the confining potential barriers, with the time interval t_a between the sequential moments of incidence, each time attempting to leak through the wall with the success probability equal to 𝒯, so that the reduction of W per each incidence is ΔW = −W𝒯. In the limit α >> 1 (i.e. 𝒯 << 1), this immediately leads to the decay equation (151) with the lifetime (153).
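The semi-classical picture of Eq. (153) can be made concrete by comparing the step-by-step survival probability (1 − 𝒯)^N after N wall incidences with the exponential law (151). In this sketch, the attempt time t_a is taken as the time unit, α = 30 is an arbitrary illustrative value, 𝒯 = 1/(1 + α²) is the delta-barrier transparency of Eq. (79), and the function names are hypothetical.

```python
import math

def survival_by_attempts(n_attempts, transparency):
    """Semi-classical picture: each wall incidence removes the fraction T of what is left."""
    return (1.0 - transparency) ** n_attempts

def survival_exponential(t, t_attempt, transparency):
    """Eq. (2.151), with the lifetime tau = t_a / T of Eq. (2.153)."""
    tau = t_attempt / transparency
    return math.exp(-t / tau)

alpha = 30.0                     # illustrative barrier strength, alpha >> 1
T = 1.0 / (1.0 + alpha ** 2)     # delta-barrier transparency, Eq. (2.79)
t_a = 1.0                        # attempt time a / v_gr, used as the time unit
for n in (100, 900, 2700):
    print(n, survival_by_attempts(n, T), survival_exponential(n * t_a, t_a, T))
```

The two columns agree to better than 0.1% for N up to about α², and the agreement improves as 𝒯 decreases, in line with the condition α >> 1.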


Another useful look at Eq. (152) may be taken by returning to the resonant tunneling problem in the same system, and expressing the resonance width (142) in terms of the incident particle's energy:

$$\Delta E = \Delta\left(\frac{\hbar^2k^2}{2m}\right) \approx \frac{\hbar^2 k_n}{m}\Delta k = \frac{\hbar^2 k_n}{m\alpha^2 a} = \frac{\pi n\hbar^2}{m\alpha^2 a^2}. \qquad (2.154)$$

Comparing Eqs. (152) and (154), we get a remarkably simple, parameter-independent formula:32

$$\Delta E \cdot \tau = \hbar. \qquad (2.155)$$
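Relation (155) can also be tested against the exact formula (141) rather than its approximation (142). In the sketch below (units ħ = m = a = 1, α again held fixed, function names hypothetical), the n-th resonance is centered at ka = arctan α + π/2 + (n − 1)π, where 𝒯 = 1 exactly; the half-maximum points are found by bisection, ΔE is computed as ħ²(k_+² − k_−²)/2m, and τ is taken from Eq. (152).

```python
import cmath
import math

# Illustrative units (an assumption): hbar = m = a = 1.
HBAR, M, A = 1.0, 1.0, 1.0

def transparency(ka, alpha):
    """Closed form of Eq. (2.141); alpha is held fixed."""
    t22 = alpha ** 2 * cmath.exp(1j * ka) + (1 + 1j * alpha) ** 2 * cmath.exp(-1j * ka)
    return 1.0 / abs(t22) ** 2

def half_max_point(ka_peak, ka_far, alpha, iters=60):
    """Bisection for the point between ka_peak and ka_far where T drops to 1/2."""
    inside, outside = ka_peak, ka_far
    for _ in range(iters):
        mid = 0.5 * (inside + outside)
        if transparency(mid, alpha) > 0.5:
            inside = mid
        else:
            outside = mid
    return 0.5 * (inside + outside)

def energy_width_times_lifetime(n, alpha):
    # exact peak of Eq. (2.141): alpha^2 e^{2ika} antiparallel to (1 + i alpha)^2, so T = 1 there
    ka_n = math.atan(alpha) + math.pi / 2 + (n - 1) * math.pi
    k_minus = half_max_point(ka_n, ka_n - 0.5, alpha) / A
    k_plus = half_max_point(ka_n, ka_n + 0.5, alpha) / A
    delta_e = HBAR ** 2 * (k_plus ** 2 - k_minus ** 2) / (2 * M)    # FWHM in energy
    tau = M * A ** 2 * alpha ** 2 / (math.pi * n * HBAR)              # Eq. (2.152)
    return delta_e * tau

for n, alpha in ((1, 10.0), (1, 30.0), (3, 30.0), (2, 100.0)):
    print(n, alpha, energy_width_times_lifetime(n, alpha) / HBAR)
```

The product approaches ħ as α and n grow; the residual deviation (about 1% for n = 1, α = 30) comes from the approximations k_n ≈ πn/a and α >> 1 used in Eqs. (142) and (154).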


31 Essentially the only requirement is for the attempt time t_a to be much longer than the effective time (the instanton time, see Sec. 5.3 below) of tunneling through the barrier. In the delta-functional approximation for the barrier, the latter time is equal to zero, so that this requirement is always fulfilled.

32 Note that the metastable state's decay (2.151) may be formally obtained from the basic Schrödinger equation (1.61) by adding an imaginary part, equal to (−ΔE/2), to its eigenenergy E_n. Indeed, in this case Eq. (1.62) becomes a_n(t) = const × exp{−i(E_n − iΔE/2)t/ħ} = const × exp{−iE_n t/ħ} × exp{−ΔE t/2ħ} = const × exp{−iE_n t/ħ} × exp{−t/2τ}, so that W(t) ∝ |a_n(t)|² ∝ exp{−t/τ}. Such formalism, which hides the physical origin of the state's decay, may be convenient for some calculations, but misleading in other cases, and I will not use it in this course.
This energy-time uncertainty relation is certainly more general than our simple model; for example, it is valid for the lifetime and resonance tunneling width of any metastable state in the potential profile of any shape. This seems very natural, since, because of the identification of energy with frequency, E = ħω, typical for quantum mechanics, Eq. (155) may be rewritten as Δω·τ = 1 and seems to follow directly from the Fourier transform in time, just as Heisenberg's uncertainty relation (1.35) follows from the Fourier transform in space. In some cases, these two relations are indeed interchangeable; for example, Eq. (24) for the Gaussian wave packet width may be rewritten as δE·Δt = ħ, where δE = ħ(dω/dk)δk = ħv_gr δk is the r.m.s. spread of energies of monochromatic components of the packet, while Δt = δx/v_gr is the time scale of the packet's passage through a fixed observation point x.

However, Eq. (155) is much less general than Heisenberg's uncertainty relation (1.35). Indeed, in the non-relativistic quantum mechanics we are studying now, the Cartesian coordinates of a particle, the Cartesian components of its momentum, and the energy E are regular observables, represented by operators. In contrast, time is treated as a c-number argument, and is not represented by an operator, so that Eq. (155) cannot be derived under such general assumptions as Eq. (1.35). Thus the time-energy uncertainty relation should be used with caution. Unfortunately, not everybody is so careful. One can find, for example, wrong claims that due to this relation, the energy dissipated by any system performing an elementary (single-bit) calculation during a time interval Δt has to be larger than ħ/Δt.33 Another incorrect statement is that the energy of a system cannot be measured, during a time interval Δt, with an accuracy better than ħ/Δt.34


Now that we have a quantitative mathematical description of the metastable state's decay (valid, again, only if α >> 1, i.e. if τ >> t_a), we may use it for a discussion of two important conceptual issues of quantum mechanics. First, this is one of the simplest examples of systems that may be considered, from two different points of view, as either Hamiltonian (and hence time-reversible), or open (and hence irreversible). Indeed, from the former point of view, our particular system is certainly described by a time-independent Hamiltonian of the type (1.41), with the potential energy

$$U(x) = W\left[\delta(x - x_1) + \delta(x - x_2)\right] \qquad (2.156)$$

(see Fig. 15 again). From this point of view, the total probability of finding the particle somewhere on the axis x remains equal to 1, and the full system's energy, calculated from Eq. (1.23),

$$\langle E\rangle = \int \Psi^*(x, t)\,\hat{H}\,\Psi(x, t)\,dx, \qquad (2.157)$$

remains constant and completely definite (δE = 0). On the other hand, since the "emitted" wave packets would never return to the potential well,35 it makes sense to look at the well's region alone. For such a


33 On this issue, I dare to refer the reader to my own old work K. Likharev, Int. J. Theor. Phys. 21, 311 (1982),
which provided a constructive proof (for a particular system) that in reversible computation, whose idea had been
put forward in 1973 by C. Bennett (see, e.g., SM Sec. 2.3), energy dissipation may be lower than this apparent 
“quantum limit”. 

34 See, e.g., a discussion of this issue in the monograph by V. Braginsky and F. Khalili, Quantum Measurement, 
Cambridge U. Press, 1992. 

35 For more realistic 2D and 3D systems, this statement is true even if the system as a whole is confined inside 
some closed volume, much larger than the potential well housing the metastable states. Indeed, if the walls 
truncated, open system (for which the space beyond the interval [x_1, x_2] serves as its environment), the
probability W of finding the particle inside this interval, and hence its energy ⟨E⟩ = WE_n, decay
exponentially per Eq. (151) — the decay equation typical for irreversible systems. We will return to the 
discussion of the dynamics of such open quantum systems in Chapter 7. 


Second, the same model enables a preliminary discussion of one important aspect of quantum 
measurements. As Eq. (151) and Fig. 17 show, at t >> τ the well becomes virtually empty (W ≈ 0), and
the whole probability is localized in two clearly separated wave packets with equal amplitudes, moving
away from each other with the speed v_0, each “carrying the particle away” with a probability of 50%. Now
assume that an experiment has detected the particle on the left side of the well. Though the formalisms
suitable for quantitative analysis of the detection process will not be discussed until Chapter 7, due to
the wide separation Δx = 2v_0t >> 2v_0τ of the packets, we may safely assume that such detection may be
done without any actual physical effect on the counterpart wave packet.³⁶ But if we know that the
particle has been found on the left side, there is no chance to find it on the right side. If we attributed the
full wavefunction to all stages of this particular experiment, this situation might be rather confusing.
Indeed, that would mean that the wavefunction at the right packet’s location should instantly turn into
zero: the so-called wave packet reduction (or “collapse”), a hypothetical, irreversible process that
cannot be described by the Schrödinger equation for this system, even including the particle detectors.


However, if (as was already discussed in Sec. 1.3) we attribute the wavefunction to a certain 
statistical ensemble of similar experiments, there is no need to involve such an artificial notion. The 
two-packet picture we have calculated (Fig. 17) describes the full ensemble of experiments with all 
systems prepared in the initial state (143), i.e. does not depend on the particle detection results. On the 
other hand, the “reduced packet” picture (with no wave packet on the right of the well) describes only a 
sub-ensemble of such experiments, in which the particles have been detected on the left side. As was 
discussed on classical examples in Sec. 1.3, for such a redefined ensemble the probability distribution is
rather different. So, the “wave packet reduction” is just the result of a purely accounting decision of the
observer.³⁷ I will return to this important discussion in Sec. 10.1, on the basis of the forthcoming
discussion of open systems in Chapters 7 and 8. 


2.6. Localized state coupling, and quantum oscillations 


Now let us discuss one more effect specific to quantum mechanics. Its mathematical description 
may be simplified using a model potential consisting of two very short and deep potential wells. For 
that, let us first analyze the properties of a single well of this type (Fig. 18), which may be modeled 
similarly to the short and high potential barrier — see Eq. (74), but with a negative “weight”: 


$$U(x) = -\mathscr{W}\delta(x), \quad \text{with } \mathscr{W} > 0. \tag{2.158}$$


providing such confinement are even slightly uneven, the emitted plane-wave packets will be reflected from them,
but will never return to the well intact. (See SM Sec. 2.1 for a more detailed discussion of this issue.)

36 This argument is especially convincing if the particle’s detection time is much shorter than the time 2v_0t/c,
where c is the speed of light in vacuum, i.e. the maximum velocity of any information transfer.

37 “The collapse of the wavefunction after measurement represents nothing more than the updating of that 
scientist’s expectations.” N. D. Mermin, Phys. Today, 72, 53 (Jan. 2013). 
In contrast to its tunnel-barrier counterpart (74), such a potential sustains a stationary state with a negative
eigenenergy E < 0, and a localized eigenfunction ψ, with |ψ| → 0 at x → ±∞.


Fig. 2.18. Delta-functional potential well, U(x) = −𝒲δ(x), and its localized eigenstate with E < 0
(schematically).


Indeed, at x ≠ 0, U(x) = 0, so the 1D Schrödinger equation is reduced to the Helmholtz equation
(1.83), whose localized solutions with E < 0 are single exponents, vanishing at large distances:³⁸


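$$\psi(x) = \psi_0(x) \equiv A\times\begin{cases} e^{-\kappa x}, & \text{for } x > 0,\\ e^{+\kappa x}, & \text{for } x < 0, \end{cases} \quad \text{with } \frac{\hbar^2\kappa^2}{2m} = -E, \ \kappa > 0. \tag{2.159}$$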

(The coefficients before the exponents have been selected equal to satisfy the boundary condition (76)
of the wavefunction’s continuity at x = 0.) Plugging Eq. (159) into the second boundary condition, given
by Eq. (75) but now with the negative sign before 𝒲, we get


$$(-\kappa A) - (+\kappa A) = -\frac{2m\mathscr{W}}{\hbar^2}A, \tag{2.160}$$
in which the common factor A ≠ 0 may be canceled. This equation³⁹ has one solution for any 𝒲 > 0:
$$\kappa = \kappa_0 \equiv \frac{m\mathscr{W}}{\hbar^2}, \tag{2.161}$$
and hence the system has only one (ground) localized state, with the following eigenenergy:⁴⁰
$$E = E_0 \equiv -\frac{\hbar^2\kappa_0^2}{2m} = -\frac{m\mathscr{W}^2}{2\hbar^2}. \tag{2.162}$$


Now we are ready to analyze localized states of the two-well potential shown in Fig. 19: 
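$$U(x) = -\mathscr{W}\left[\delta\left(x - \frac{a}{2}\right) + \delta\left(x + \frac{a}{2}\right)\right], \quad \text{with } \mathscr{W} > 0. \tag{2.163}$$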


Here we may still use the single-exponent solutions, similar to Eq. (159), for the wavefunction outside 
the interval [-a/2, +a/2], but inside the interval, we need to take into account both possible exponents: 


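$$\psi = C_+e^{+\kappa x} + C_-e^{-\kappa x} \equiv C_1\sinh\kappa x + C_2\cosh\kappa x, \quad \text{for } -\frac{a}{2} \le x \le +\frac{a}{2}, \tag{2.164}$$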


38 See Eqs. (56)-(58), with U_0 = 0.

39 Such algebraic equations for linear differential equations are frequently called characteristic. 

40 Note that this E is equal, by magnitude, to the constant E_0 that participates in Eq. (79). Note also that this result
was actually already obtained, “backward”, in the solution of Problem 1.12(ii), but that solution did not address 
the issue of whether the calculated potential (158) could sustain any other localized eigenstates. 
with the parameter κ defined as in Eq. (159). The last of these equivalent expressions is more
convenient because, due to the symmetry of the potential (163) with respect to the central point x = 0, the
system’s eigenfunctions should be either symmetric (even) or antisymmetric (odd) functions of x (see Fig. 19),
so that they may be analyzed separately, only for one half of the system, say x ≥ 0, using just one of the
hyperbolic functions (164) in each case.


Fig. 2.19. A system of two coupled 
potential wells, and its localized 
eigenstates (schematically). 


For the antisymmetric eigenfunction, Eqs. (159) and (164) yield 


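$$\psi_A = C_A\times\begin{cases} \sinh\kappa x, & \text{for } 0 \le x \le \dfrac{a}{2},\\[4pt] \sinh\dfrac{\kappa a}{2}\,\exp\left\{-\kappa\left(x - \dfrac{a}{2}\right)\right\}, & \text{for } \dfrac{a}{2} \le x, \end{cases} \tag{2.165}$$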


where the front coefficient in the lower line has been selected to satisfy the condition (76) of the
wavefunction’s continuity at x = +a/2 (and hence at x = −a/2). What remains is to satisfy the condition
(75), with a negative sign before 𝒲, for the derivative’s jump at that point. This condition yields the
following characteristic equation:


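$$\kappa\left(\sinh\frac{\kappa a}{2} + \cosh\frac{\kappa a}{2}\right) = \frac{2m\mathscr{W}}{\hbar^2}\sinh\frac{\kappa a}{2}, \quad \text{i.e.} \quad 1 + \coth\frac{\kappa a}{2} = 2\,\frac{\kappa_0 a}{\kappa a}, \tag{2.166}$$

where κ_0, given by Eq. (161), is the value of κ for a single well, i.e. the reciprocal spatial width of its
localized eigenfunction (see Fig. 18).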
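For readers who want the intermediate step, here is a sketch of how the jump condition turns Eq. (165) into Eq. (166). It uses only the one-sided derivatives of ψ_A at x = a/2 and the single-well value κ_0 = m𝒲/ħ² of Eq. (161).

```latex
\left.\frac{d\psi_A}{dx}\right|_{x = a/2 + 0} - \left.\frac{d\psi_A}{dx}\right|_{x = a/2 - 0}
  = -\frac{2m\mathscr{W}}{\hbar^2}\,\psi_A\!\left(\frac{a}{2}\right)
\;\Longrightarrow\;
C_A\left(-\kappa\sinh\frac{\kappa a}{2} - \kappa\cosh\frac{\kappa a}{2}\right)
  = -2\kappa_0\,C_A\sinh\frac{\kappa a}{2};
\\[6pt]
\text{dividing by } -C_A\,\kappa\sinh(\kappa a/2):\qquad
1 + \coth\frac{\kappa a}{2} = \frac{2\kappa_0}{\kappa} \equiv 2\,\frac{\kappa_0 a}{\kappa a}.
```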


Figure 20a shows both sides of Eq. (166) as functions of the dimensionless product κa, for
several values of the parameter κ_0 a, i.e. of the normalized distance between the two wells. The plots
show, first of all, that as the parameter κ_0 a is decreased, the LHS and RHS plots cross (i.e. Eq. (166) has
a solution) at lower and lower values of κa. At κa << 1, the left-hand side of the last form of this
equation may be approximated as 2/κa. Comparing this expression with the right-hand side of the
characteristic equation, we see that this transcendental equation has a solution (i.e. the system has an
antisymmetric localized state) only if κ_0 a > 1, i.e. if the distance a between the two narrow potential
wells is larger than the following value,

$$a_{\min} = \frac{1}{\kappa_0} = \frac{\hbar^2}{m\mathscr{W}}, \tag{2.167}$$

which is equal to the characteristic spread of the wavefunction in a single well (see Fig. 18). (At
a → a_min, κa → 0, meaning that the state’s localization becomes weaker and weaker.)
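The threshold κ_0 a > 1 can be checked numerically. The sketch below is plain Python, and the helper name `antisymmetric_root` is ours, not the text’s. It multiplies Eq. (166) by x ≡ κa, which gives h(x) = x + x coth(x/2) − 2κ_0 a; since x coth(x/2) grows monotonically from 2 at x → 0, h can change sign only if κ_0 a > 1, and then bisection finds the unique root.

```python
import math

def antisymmetric_root(k0a, iters=200):
    """Solve Eq. (166), 1 + coth(ka/2) = 2*k0a/ka, for x = ka by bisection.

    Multiplying by x gives h(x) = x + x*coth(x/2) - 2*k0a. Because x*coth(x/2)
    increases monotonically from 2 (at x -> 0), h has a root only if k0a > 1.
    Returns None when no localized antisymmetric state exists.
    """
    if k0a <= 1.0:
        return None

    def h(x):
        return x + x / math.tanh(x / 2) - 2 * k0a

    lo, hi = 1e-12, k0a  # h(lo) < 0, while h(k0a) = k0a*(coth(k0a/2) - 1) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for k0a in (0.5, 1.01, 2.0, 10.0):
    print(k0a, antisymmetric_root(k0a))
```

Just above the threshold (κ_0 a = 1.01) the root κa is small, i.e. the state is only weakly localized, while for κ_0 a = 10 the result agrees closely with the weak-coupling estimate κ ≈ κ_0(1 − e^{−κ_0 a}) of Eq. (168).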
Fig. 2.20. Graphical solutions of the characteristic equations of the two-well system (left- and right-hand
sides plotted versus κa), for: (a) the antisymmetric eigenstate (165), and (b) the symmetric eigenstate (171).


In the opposite limit of large distances between the potential wells, i.e. κ_0 a >> 1, Eq. (166)
shows that κa >> 1 as well, so that its left-hand side may be approximated as 2(1 + exp{−κa}), and the
equation yields

$$\kappa \approx \kappa_0\left(1 - \exp\{-\kappa_0 a\}\right) \approx \kappa_0. \tag{2.168}$$


This result means that the eigenfunction is an antisymmetric superposition of two virtually unperturbed 
wavefunctions (159) of each partial potential well: 


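$$\psi_A(x) \approx \frac{1}{\sqrt{2}}\left[\psi_R(x) - \psi_L(x)\right], \quad \text{where } \psi_R(x) \equiv \psi_0\!\left(x - \frac{a}{2}\right), \quad \psi_L(x) \equiv \psi_0\!\left(x + \frac{a}{2}\right), \tag{2.169}$$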


and the front coefficient is selected in such a way that if the eigenfunction ψ_0 of each well is normalized,
so is ψ_A. Plugging the middle (more exact) form of Eq. (168) into the last of Eqs. (159), we can see that
in this limit the antisymmetric state’s energy is only slightly higher than the eigenenergy E_0 of a single
well, given by Eq. (162):

$$E_A \approx E_0\left(1 - 2\exp\{-\kappa_0 a\}\right) \equiv E_0 + \delta, \quad \text{where } \delta \equiv \frac{\hbar^2\kappa_0^2}{m}\exp\{-\kappa_0 a\} > 0. \tag{2.170}$$


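The symmetric eigenfunction has a form similar to Eq. (165), but still different from it:

$$\psi_S = C_S\times\begin{cases} \cosh\kappa x, & \text{for } 0 \le x \le \dfrac{a}{2},\\[4pt] \cosh\dfrac{\kappa a}{2}\,\exp\left\{-\kappa\left(x - \dfrac{a}{2}\right)\right\}, & \text{for } \dfrac{a}{2} \le x, \end{cases} \tag{2.171}$$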


giving a characteristic equation similar in structure to Eq. (166), but with a different left-hand side:

$$1 + \tanh\frac{\kappa a}{2} = 2\,\frac{\kappa_0 a}{\kappa a}. \tag{2.172}$$


Figure 20b shows both sides of this equation for several values of the parameter κ_0 a. It is evident that,
in contrast to Eq. (166), Eq. (172) has a unique solution (and hence the system has a localized symmetric
eigenstate) for any value of the parameter κ_0 a, i.e. for any distance between the partial wells. In the limit
of very close wells (i.e. their strong coupling), κ_0 a << 1, we get κa << 1, tanh(κa/2) → 0, and Eq. (172)
yields κ → 2κ_0, leading to a four-fold increase of the eigenenergy’s magnitude in comparison with that
of the single well:

$$E_S \approx 4E_0 = -\frac{m(2\mathscr{W})^2}{2\hbar^2}, \quad \text{for } \kappa_0 a \ll 1. \tag{2.173}$$

The physical meaning of this result is very simple: two very close potential wells act (on the symmetric
eigenfunction only!) together, so that their “weights” 𝒲 = −∫U(x)dx just add up.


In the opposite, weak-coupling limit, i.e. κ_0 a >> 1, Eq. (172) shows that κa >> 1 as well, so that
its left-hand side may be approximated as 2(1 − exp{−κa}), and the equation yields

$$\kappa \approx \kappa_0\left(1 + \exp\{-\kappa_0 a\}\right) \approx \kappa_0. \tag{2.174}$$


In this limit, the eigenfunction is a symmetric superposition of two virtually unperturbed wavefunctions
(159) of each partial potential well:

$$\psi_S(x) \approx \frac{1}{\sqrt{2}}\left[\psi_R(x) + \psi_L(x)\right], \tag{2.175}$$

and the eigenenergy is also close to the energy E_0 of a partial well, but is slightly lower:
$$E_S \approx E_0\left(1 + 2\exp\{-\kappa_0 a\}\right) \equiv E_0 - \delta, \quad \text{so that } E_A - E_S = 2\delta, \tag{2.176}$$
where δ is again given by the last of Eqs. (170).
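Both characteristic equations can be solved together to see how quickly the exact splitting approaches the weak-coupling value E_A − E_S = 2δ of Eqs. (170) and (176). The sketch below (plain Python; all helper names are ours) solves Eqs. (166) and (172) by bisection in x ≡ κa, converts κ into energy via E/|E_0| = −(κ/κ_0)², which follows from the last of Eqs. (159) and Eq. (162), and compares (E_A − E_S)/|E_0| with 2δ/|E_0| = 4 exp{−κ_0 a}.

```python
import math

def bisect(f, lo, hi, iters=200):
    """Plain bisection for an increasing f with f(lo) < 0 <= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def symmetric_root(k0a):
    # Eq. (172) multiplied by x = ka: x*(1 + tanh(x/2)) - 2*k0a increases from -2*k0a,
    # so a root (a localized symmetric state) exists for every k0a > 0.
    return bisect(lambda x: x * (1 + math.tanh(x / 2)) - 2 * k0a, 0.0, 2 * k0a)

def antisymmetric_root(k0a):
    # Eq. (166) multiplied by x; a root exists only for k0a > 1.
    if k0a <= 1.0:
        raise ValueError("no antisymmetric localized state for k0a <= 1")
    return bisect(lambda x: x * (1 + 1 / math.tanh(x / 2)) - 2 * k0a, 1e-12, k0a)

def splitting_over_E0(k0a):
    """(E_A - E_S)/|E_0|, using E/|E_0| = -(kappa/kappa_0)**2 = -(x/k0a)**2."""
    xs, xa = symmetric_root(k0a), antisymmetric_root(k0a)
    return (xs / k0a) ** 2 - (xa / k0a) ** 2

for k0a in (3.0, 5.0, 10.0):
    exact = splitting_over_E0(k0a)
    estimate = 4 * math.exp(-k0a)  # 2*delta/|E_0| from Eqs. (170) and (176)
    print(f"k0a = {k0a}: exact {exact:.4e}, weak-coupling estimate {estimate:.4e}")
```

The same helper also illustrates the strong-coupling limit: for κ_0 a = 0.001, `symmetric_root(0.001) / 0.001` comes out close to 2, reproducing κ → 2κ_0 and hence E_S ≈ 4E_0 of Eq. (173).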


So, the eigenenergy of the symmetric state is always lower than that of the antisymmetric state. 
The physics of this effect (which remains qualitatively the same in more complex two-component 
systems, most importantly in diatomic molecules such as H₂) is evident from the sketch of the
wavefunctions ψ_A and ψ_S, given by Eqs. (165) and (171), in Fig. 19. In the antisymmetric mode, the
wavefunction has to vanish at the center of the system, so that each of its halves is squeezed into one half of
the system’s spatial extension. Such a squeeze increases the function’s gradient, and hence its kinetic
energy (1.27), and hence its total energy. On the contrary, in the symmetric mode, the wavefunction
effectively spreads into the counterpart well. As a result, it changes in space more slowly, and hence its
kinetic energy is also lower.


Even more importantly, the symmetric state’s energy decreases as the distance a is decreased, 
corresponding to the effective attraction of the partial wells. This is a good toy model of the strongest 
(and most important) type of atomic cohesion, the covalent (or “chemical”) bonding.⁴¹ In the simplest
case of the H₂ molecule, each of the two electrons of the system, in its ground state,⁴² reduces its kinetic
energy by spreading its wavefunction around both hydrogen nuclei (protons), rather than being confined
near one of them, as it had to be in a single atom. The resulting bonding is very strong: in chemical
units, 429 kJ/mol, i.e. about 4.45 eV per molecule. Perhaps counter-intuitively, this quantum-mechanical


41 Historically, the development of the quantum theory of such bonding in the H₂ molecule (by Walter Heinrich
Heitler and Fritz Wolfgang London in 1927) was the breakthrough decisive for the acceptance of the then- 
emerging quantum mechanics by the community of chemists. 

42 Due to the opposite spins of these electrons, the Pauli principle allows them to be in the same orbital ground 
state — see Chapter 8. 
covalent bonding is even stronger than the strongest classical (ionic) bonding due to electron transfer
between atoms, leading to the Coulomb attraction of the resulting ions. (For example, the atomic 
cohesion in the NaCl molecule is just 3.28 eV.) 
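A quick unit conversion shows what the molar figure of 429 kJ/mol means per molecule: divide by Avogadro’s number and convert joules to electron-volts. The minimal sketch below uses the exact SI values of N_A and of the elementary charge; the function name is ours.

```python
# Exact SI values (2019 redefinition) of the Avogadro constant and the elementary charge
AVOGADRO = 6.02214076e23         # 1/mol
JOULES_PER_EV = 1.602176634e-19  # J

def kj_per_mol_to_ev_per_molecule(kj_per_mol):
    """Convert a molar energy (kJ/mol) into energy per molecule (eV)."""
    return kj_per_mol * 1e3 / AVOGADRO / JOULES_PER_EV

print(f"H2 bond: {kj_per_mol_to_ev_per_molecule(429):.2f} eV per molecule")
```

The same helper shows that 1 eV per molecule corresponds to about 96.5 kJ/mol, a handy rule of thumb for comparing chemical and physical units.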


Now let us analyze the dynamic properties of our model system (Fig. 19) because such a pair of 
weakly coupled potential wells is our first example of the very important class of two-level systems.⁴³ This
is easiest to do in the weak-coupling limit κ_0 a >> 1, when the simple results (168)-(170) and (174)-(176)
are quantitatively valid. In particular, Eqs. (169) and (175) enable us to represent the quasi-localized 
states of the particle in each partial well as linear combinations of its two eigenstates: 


$$\psi_R(x) = \frac{1}{\sqrt{2}}\left[\psi_S(x) + \psi_A(x)\right], \qquad \psi_L(x) = \frac{1}{\sqrt{2}}\left[\psi_S(x) - \psi_A(x)\right]. \tag{2.177}$$
Let us perform the following thought (“gedanken”) experiment: place a particle, at t = 0, into one of
these quasi-localized states, say ψ_R(x), and leave the system alone to evolve, so that
$$\Psi(x, 0) = \psi_R(x) = \frac{1}{\sqrt{2}}\left[\psi_S(x) + \psi_A(x)\right]. \tag{2.178}$$
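According to the general solution (1.69) of the Schrödinger equation with a time-independent Hamiltonian,
the time evolution of this wavefunction may be obtained simply by multiplying each eigenfunction by the
corresponding complex-exponential time factor:

$$\Psi(x, t) = \frac{1}{\sqrt{2}}\left[\psi_S(x)\exp\left\{-i\frac{E_S t}{\hbar}\right\} + \psi_A(x)\exp\left\{-i\frac{E_A t}{\hbar}\right\}\right]. \tag{2.179}$$

From here, using Eqs. (170) and (176), and then Eqs. (169) and (175) again, we get

$$\begin{aligned} \Psi(x, t) &= \frac{1}{\sqrt{2}}\left[\psi_S(x)\exp\left\{i\frac{\delta t}{\hbar}\right\} + \psi_A(x)\exp\left\{-i\frac{\delta t}{\hbar}\right\}\right]\exp\left\{-i\frac{E_0 t}{\hbar}\right\} \\ &= \left[\psi_R(x)\cos\frac{\delta t}{\hbar} + i\,\psi_L(x)\sin\frac{\delta t}{\hbar}\right]\exp\left\{-i\frac{E_0 t}{\hbar}\right\}. \end{aligned} \tag{2.180}$$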


This result implies, in particular, that the probabilities W_R and W_L to find the particle, respectively, in
the right and left wells change with time as

$$W_R = \cos^2\frac{\delta t}{\hbar}, \qquad W_L = \sin^2\frac{\delta t}{\hbar}, \tag{2.181}$$


mercifully leaving the total probability constant: W_R + W_L = 1. (If our calculation had not passed this
sanity check, we would be in big trouble.) 
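The two-state bookkeeping behind Eqs. (177)-(181) can be replayed numerically. The sketch below assumes units with ħ = 1 and illustrative values E_0 = −1, δ = 0.05, which are not from the text. It attaches the phase factors of Eq. (179) to the ψ_S and ψ_A components of Ψ(x, 0) = ψ_R, projects back onto ψ_R and ψ_L using Eqs. (169) and (175), and prints W_R and W_L at t = 0, a quarter period, and half a period of the oscillations.

```python
import cmath
import math

# Illustrative two-level parameters (hbar = 1); not taken from the text
E0, delta = -1.0, 0.05
E_S, E_A = E0 - delta, E0 + delta  # Eqs. (176) and (170)
omega = 2 * delta                  # Eq. (182) with hbar = 1

def amplitudes(t):
    """Return (c_R, c_L), the amplitudes of psi_R and psi_L in Psi(x, t), for Psi(x, 0) = psi_R."""
    # Psi(0) = (psi_S + psi_A)/sqrt(2); each eigenstate acquires the factor exp(-i E t)
    a_S = cmath.exp(-1j * E_S * t) / math.sqrt(2)
    a_A = cmath.exp(-1j * E_A * t) / math.sqrt(2)
    # Back to the localized basis: psi_S = (psi_R + psi_L)/sqrt(2), psi_A = (psi_R - psi_L)/sqrt(2)
    c_R = (a_S + a_A) / math.sqrt(2)
    c_L = (a_S - a_A) / math.sqrt(2)
    return c_R, c_L

for t in (0.0, math.pi / (2 * omega), math.pi / omega):
    c_R, c_L = amplitudes(t)
    W_R, W_L = abs(c_R) ** 2, abs(c_L) ** 2
    print(f"t = {t:7.3f}:  W_R = {W_R:.3f}, W_L = {W_L:.3f}, sum = {W_R + W_L:.3f}")
```

Since c_R works out to cos(δt)·exp{−iE_0t} and c_L to i·sin(δt)·exp{−iE_0t}, the particle is fully transferred to the left well after half an oscillation period, t = π/ω.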


This is the famous effect of quantum oscillations⁴⁴ of the particle’s wavefunction between two
similar, coupled subsystems, with the frequency 


43 As we will see later in Chapter 4, these properties are similar to those of spin-½ particles; hence two-level
systems are frequently called the spin-½-like systems.

44 Sometimes they are called the Bloch oscillations, but more commonly the last term is reserved for a related but 
different effect in spatially-periodic systems — to be discussed in Sec. 8 below. 
$$\omega = \frac{2\delta}{\hbar} = \frac{E_A - E_S}{\hbar}. \tag{2.182}$$


In its last form, this result does not depend on the assumption of weak coupling, though the simple form
(181) of the oscillations, with its 100% probability variations, does. (Indeed, at a strong coupling of two
subsystems, the very notion of the quasi-localized states ψ_R and ψ_L is ambiguous.) Qualitatively, this
effect may be interpreted as follows: the particle, placed into one of the potential wells, tries to escape
from it via tunneling through the potential barrier separating the wells. (In our particular system, shown
in Fig. 19, the barrier is formed by the spatial segment of length a, whose potential energy, U = 0, is
higher than the eigenstate energy E_0 = −|E_0|.) However, in the two-well system, the particle can only escape
into the adjacent well. After the tunneling into that counterpart well, the particle tries to escape from it,
and hence comes back, etc., very much like a classical 1D oscillator, initially displaced from its
equilibrium position, oscillates at negligible damping.


Some care is required when using such an interpretation for quantitative conclusions. In particular, let
us compare the period T = 2π/ω of the oscillations (181) with the metastable state’s lifetime τ discussed in
the previous section. For our particular model, we may use the second of Eqs. (170) to write

$$\omega = \frac{4|E_0|}{\hbar}\exp\{-\kappa_0 a\}, \qquad T \equiv \frac{2\pi}{\omega} = \frac{\pi\hbar}{2|E_0|}\exp\{\kappa_0 a\} = \frac{t_a}{4}\exp\{\kappa_0 a\}, \quad \text{for } \kappa_0 a \gg 1, \tag{2.183}$$

where t_a ≡ 2πħ/|E_0| is the effective attempt time. On the other hand, according to Eq. (80), the
transparency 𝒯 of our potential barrier, in this limit, scales as exp{−2κ_0 a},⁴⁵ so that according to the
general relation (153), the lifetime τ is of the order of t_a exp{2κ_0 a} >> T. This is a rather
counter-intuitive result: the speed of particle tunneling into a similar adjacent well is much higher than
that of tunneling through a similar barrier into free space!
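To put numbers on the inequality τ >> T, here is a rough arithmetic sketch. It keeps only the exponential factors: T/t_a from Eq. (183), and τ/t_a ~ 1/𝒯 ~ exp{2κ_0 a} from Eq. (153), with the unknown order-one prefactor of τ simply set to 1 (an assumption). The ratio is therefore meaningful only to within a factor of order unity.

```python
import math

def oscillation_period_over_ta(k0a):
    # Eq. (183): T = (t_a/4) exp(k0a), valid for k0a >> 1
    return 0.25 * math.exp(k0a)

def lifetime_over_ta(k0a):
    # tau ~ t_a / transparency, transparency ~ exp(-2*k0a); order-one prefactor set to 1 (assumption)
    return math.exp(2 * k0a)

def time_ratio_estimate(k0a):
    """Rough estimate of tau/T; analytically this equals 4*exp(k0a)."""
    return lifetime_over_ta(k0a) / oscillation_period_over_ta(k0a)

for k0a in (3, 5, 10):
    print(f"k0a = {k0a:2d}:  tau/T ~ {time_ratio_estimate(k0a):.3g}")
```

Even for a modest κ_0 a = 5 the estimate exceeds 500, so the escape into free space is indeed much slower than the oscillations between the wells.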


In order to show that this important result is not an artifact of our delta-functional model of the
potential barrier, and also to compare T and τ more directly, let us analyze the quantum oscillations
between two weakly coupled wells, now assuming that the (symmetric) potential profile U(x) is
sufficiently soft (Fig. 21), so that all its eigenfunctions ψ_S and ψ_A are at least differentiable at all
points.⁴⁶ If the barrier’s transparency is low, the quasi-localized wavefunctions ψ_R(x) and ψ_L(x) = ψ_R(−x)
and their eigenenergies may be found approximately by solving the Schrödinger equations in one of the
wells, neglecting the tunneling through the barrier, but the calculation of δ requires a little bit more care.
Let us write the stationary Schrödinger equations for the antisymmetric and symmetric solutions in the
form

$$-\frac{\hbar^2}{2m}\frac{d^2\psi_A}{dx^2} + \left[U(x) - E_A\right]\psi_A = 0, \qquad -\frac{\hbar^2}{2m}\frac{d^2\psi_S}{dx^2} + \left[U(x) - E_S\right]\psi_S = 0, \tag{2.184}$$


45 It is hard to use Eq. (80) for a more exact evaluation of 𝒯 in our current system, with its infinitely deep
potential wells, because the meaning of the wave number k is not quite clear. However, this is not too important,
because in the limit κ_0 a >> 1, the tunneling exponent makes the dominant contribution to the transparency
(see, again, Fig. 2.7b).

46 Such a smooth well may have more than one quasi-localized eigenstate, so that the proper state (and energy) 
index n is implied in all remaining formulas of this section. 
multiply the former equation by ψ_S and the latter one by ψ_A, subtract them from each other, and then
integrate the result from 0 to ∞. The result is

$$(E_A - E_S)\int_0^\infty \psi_A\psi_S\,dx = \frac{\hbar^2}{2m}\int_0^\infty\left(\psi_A\frac{d^2\psi_S}{dx^2} - \psi_S\frac{d^2\psi_A}{dx^2}\right)dx. \tag{2.185}$$


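If U(x), and hence d²ψ_{A,S}/dx², are finite for all x, we may integrate the right-hand side by parts to get

$$(E_A - E_S)\int_0^\infty \psi_A\psi_S\,dx = \frac{\hbar^2}{2m}\left(\psi_A\frac{d\psi_S}{dx} - \psi_S\frac{d\psi_A}{dx}\right)\Bigg|_0^\infty. \tag{2.186}$$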


Fig. 2.21. Weak coupling between two 
similar, soft potential wells. 


So far, this result is exact (provided that the derivatives participating in it are finite at each
point); for weakly coupled wells, it may be further simplified. Indeed, in this case, the left-hand side of
Eq. (186) may be approximated as

$$(E_A - E_S)\int_0^\infty \psi_A\psi_S\,dx \approx \frac{E_A - E_S}{2} \equiv \delta, \tag{2.187}$$

because this integral is dominated by the vicinity of the point x = a/2, where the second terms in each of
Eqs. (169) and (175) are negligible, and the integral is equal to 1/2, assuming the proper normalization of
the function ψ_R(x). On the right-hand side of Eq. (186), the substitution at x = ∞ vanishes (due to the
wavefunction’s decay in the classically forbidden region), and so does the first term at x = 0, because for
the antisymmetric solution, ψ_A(0) = 0. As a result, the energy half-split δ may be expressed in any of the
following (equivalent) forms:
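$$\delta = \frac{\hbar^2}{2m}\,\psi_S(0)\frac{d\psi_A}{dx}(0) = \frac{\hbar^2}{m}\,\psi_R(0)\frac{d\psi_R}{dx}(0) = -\frac{\hbar^2}{m}\,\psi_L(0)\frac{d\psi_L}{dx}(0). \tag{2.188}$$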
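Eq. (188) can be checked against the delta-functional result (170). The sketch below assumes units with ħ = m = 1 and illustrative values κ_0 = 1, a = 8 (not from the text). It uses the normalized single-well function ψ_0(x) = κ_0^{1/2} exp{−κ_0|x|}, where A = κ_0^{1/2} follows from normalizing Eq. (159), and evaluates the middle form of Eq. (188) with a central-difference derivative. It also integrates ψ_Aψ_S over x > 0 with the approximations (169) and (175), to confirm the value 1/2 used in Eq. (187).

```python
import math

# Units hbar = m = 1 (an assumption for this check only); illustrative kappa_0 and a
kappa0, a = 1.0, 8.0

def psi_R(x):
    """Normalized single-well function (159), A = sqrt(kappa0), centered at x = +a/2."""
    return math.sqrt(kappa0) * math.exp(-kappa0 * abs(x - a / 2))

def psi_L(x):
    return psi_R(-x)

# delta from the middle form of Eq. (188): (hbar^2/m) * psi_R(0) * dpsi_R/dx(0)
h = 1e-6
dpsi_R0 = (psi_R(h) - psi_R(-h)) / (2 * h)      # central difference; psi_R is smooth at x = 0
delta_188 = psi_R(0.0) * dpsi_R0
delta_170 = kappa0 ** 2 * math.exp(-kappa0 * a)  # last of Eqs. (170)

# Integral in Eq. (187) with approximations (169), (175): psi_A*psi_S = (psi_R**2 - psi_L**2)/2,
# integrated over [0, L] by the trapezoidal rule (the tail beyond L is negligible)
L, n = a / 2 + 40 / kappa0, 100_000
dx = L / n
f = lambda x: 0.5 * (psi_R(x) ** 2 - psi_L(x) ** 2)
overlap = dx * (0.5 * f(0.0) + sum(f(i * dx) for i in range(1, n)) + 0.5 * f(L))

print(f"delta from (188): {delta_188:.6e}, from (170): {delta_170:.6e}, overlap: {overlap:.5f}")
```

For this model both quantities can also be found by hand: the overlap integral equals (1 − e^{−κ_0 a})/2, and the middle form of Eq. (188) gives exactly κ_0² exp{−κ_0 a}, in agreement with the last of Eqs. (170).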


It is straightforward (and hence left for the reader’s exercise) to show that within the limits of the 
WKB approximation’s validity, Eq. (188) may be reduced to 


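$$\delta = \frac{\hbar}{t_c}\exp\left\{-\int_{x_c}^{x_c'}\kappa(x')\,dx'\right\}, \quad \text{so that} \quad T \equiv \frac{2\pi}{\omega} = \frac{\pi\hbar}{\delta} = \pi t_c\exp\left\{\int_{x_c}^{x_c'}\kappa(x')\,dx'\right\}, \tag{2.189}$$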


where t_c is the time period of the classical motion of the particle, with the energy E ≈ E_A ≈ E_S, inside
each well, the function κ(x) is defined by Eq. (82), and x_c and x_c' are the classical turning points limiting
the potential barrier at the level E of the particle’s eigenenergy (see Fig. 21). The result (189) is
evidently a natural generalization of Eq. (183), so that the strong inequality between the times of
particle tunneling into the continuum of states and into a discrete eigenstate is indeed not specific to
the delta-functional model. We will return to this fact, in its more general form, at the end of Chapter 6.


2.7. Periodic systems: Energy bands and gaps 


Let us now proceed to the discussion of one of the most important issues of wave mechanics: 
particle motion through a periodic system. As a precursor to this discussion, let us calculate the 
transparency of the potential profile shown in Fig. 22 (frequently called the Dirac comb): a sequence of 
N similar, equidistant delta-functional potential barriers, separated by (N − 1) potential-free intervals a.


Fig. 2.22. Tunneling through a Dirac comb: a system of N similar, equidistant barriers, i.e. (N − 1) similar
coupled potential wells.


According to Eq. (132), its transfer matrix is the following product

$$\mathrm{T} = \underbrace{\mathrm{T}_\alpha\mathrm{T}_a\mathrm{T}_\alpha\cdots\mathrm{T}_a\mathrm{T}_\alpha}_{(N-1)+N\ \text{terms}}, \tag{2.190}$$

with the component matrices T_α (one barrier) and T_a (one interval) given by Eqs. (135) and (138), and the
barrier height parameter α defined by the last of Eqs. (78). Remarkably, this multiplication may be carried
out analytically,⁴⁷ giving
$$\mathscr{T} = \left[1 + \left(\alpha\,\frac{\sin Nqa}{\sin qa}\right)^2\right]^{-1}, \tag{2.191a}$$

where q is a new parameter, with the wave number dimensionality, defined by the following relation:

$$\cos qa = \cos ka + \alpha\sin ka. \tag{2.191b}$$


For N = 1, Eqs. (191) immediately yield our old result (79), while for N = 2 they may be readily reduced
to Eq. (141) (see Fig. 16a). Fig. 23 shows their predictions for two larger numbers N, and several values of
the dimensionless parameter α.
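Eq. (191a) can be cross-checked by brute force, multiplying the matrices of Eq. (190) numerically. This sketch writes the two component matrices in the explicit form that also appears in Eq. (194). It assumes that the transfer matrix maps the amplitudes (A, B) on the left of the structure onto those on its right; since every factor has unit determinant, the transparency for a wave incident from the left is then 𝒯 = 1/|T_22|². All helper names are ours.

```python
import cmath
import math

def matmul(P, Q):
    """Product of two 2x2 matrices stored as nested tuples ((p11, p12), (p21, p22))."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def barrier_matrix(alpha):
    # One delta-functional barrier (the right-hand factor in Eq. (194))
    return ((1 - 1j * alpha, -1j * alpha), (1j * alpha, 1 + 1j * alpha))

def interval_matrix(ka):
    # One potential-free interval of length a (the left-hand factor in Eq. (194))
    return ((cmath.exp(1j * ka), 0), (0, cmath.exp(-1j * ka)))

def transparency_by_product(N, ka, alpha):
    """Multiply N barrier and N - 1 interval matrices as in Eq. (190); T = 1/|T_22|^2."""
    T = barrier_matrix(alpha)
    for _ in range(N - 1):
        T = matmul(barrier_matrix(alpha), matmul(interval_matrix(ka), T))
    return 1 / abs(T[1][1]) ** 2

def transparency_closed_form(N, ka, alpha):
    """Eqs. (191a)-(191b); assumes |cos qa| < 1, i.e. ka inside an allowed range."""
    qa = math.acos(math.cos(ka) + alpha * math.sin(ka))
    return 1 / (1 + (alpha * math.sin(N * qa) / math.sin(qa)) ** 2)

for N in (1, 2, 3, 10):
    print(N, transparency_by_product(N, 2.0, 1.0), transparency_closed_form(N, 2.0, 1.0))
```

Outside the allowed ranges the closed form would need sinh instead of sin, which is why the comparison is made at ka = 2.0, α = 1.0, where |cos qa| < 1; there the brute-force transparency stays of order one, while inside a gap it collapses quickly as N grows.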


Let us start the discussion of the plots from the case N = 3, when three barriers limit two coupled
potential wells between them. Comparison of Fig. 23a and Fig. 16a shows that the transmission patterns,
and their dependence on the parameter α, are very similar, except that in the coupled-well system, each
resonant tunneling peak splits into two, with the ka-difference between them scaling as 1/α. From the
discussion in the last section, we may now readily interpret this result: each pair of resonance peaks of
transparency corresponds to the alignment of the incident particle’s energy E with the pair of energy
levels E_S, E_A of the symmetric and antisymmetric states of the system. However, in contrast to the


47 This formula will be easier to prove after we have discussed the properties of Pauli matrices in Chapter 4. 
system shown in Fig. 19, these states are metastable, because the particle may leak out from these states
just as it could in the system studied in Sec. 5 (see Fig. 15 and its discussion). As a result, each of the
resonant peaks has a non-zero energy width ΔE, obeying Eq. (155).
Fig. 2.23. The Dirac comb’s transparency as a function of the product ka (plotted versus ka/π) for three values
of α; panels (a) and (b) correspond to two different values of N. Since the function 𝒯(ka) is π-periodic (just
like it is for N = 2, see Fig. 16a), only one period is shown.


A further increase of N (see Fig. 23b) results in the increase of the number of resonant peaks per
period to (N − 1), and at N → ∞ the peaks merge into the so-called allowed energy bands (frequently
called just the “energy bands”) with average transparency 𝒯 ~ 1, separated from similar bands in the
adjacent periods of the function 𝒯(ka) by energy gaps⁴⁸ where 𝒯 → 0. Notice the following important
features of the pattern:


(i) at N → ∞, the band/gap edges become sharp for any α, and tend to fixed positions
(determined by α but independent of N);

(ii) the stronger the well coupling (the smaller α), the broader the allowed energy bands and
the narrower the gaps between them.


Our previous discussion of the resonant tunneling gives us a clue for a semi-quantitative 
interpretation of this pattern: if (N − 1) potential wells are weakly coupled by tunneling through the
potential barriers separating them, the system’s energy spectrum consists of groups of (N − 1) metastable
energy levels, each group being close to one of the unperturbed eigenenergies of the well. (According to
Eq. (1.84), for our current example shown in Fig. 22, with its rectangular potential wells, these
eigenenergies correspond to k_n a = πn.)


Now let us recall that in the two-well case analyzed in the previous section, the eigenfunctions
(169) and (175) differed only by the phase shift Δφ between their localized components ψ_R(x) and
ψ_L(x), with Δφ = 0 for one of them (ψ_S) and Δφ = π for its counterpart. Hence it is natural to expect that
for other N as well, each metastable energy level corresponds to an eigenfunction that is a superposition
of similar localized functions in each potential well, but with certain phase shifts Δφ between them.


Moreover, we may expect that at N → ∞, i.e. for periodic structures,⁴⁹ with


48 In solid-state (especially semiconductor) physics and electronics, the term bandgaps is more common. 
49 This is a reasonable 1D model, for example, for solid-state crystals, whose samples may feature up to ~10° 
similar atoms or molecules in each direction of the crystal lattice. 
$$U(x + a) = U(x), \tag{2.192}$$


when the system has no ends that could affect its properties, the phase shifts Δφ between the
localized wavefunctions in all pairs of adjacent potential wells should be equal, i.e.

$$\psi(x + a) = \psi(x)\,e^{i\Delta\varphi} \tag{2.193a}$$


for all x.⁵⁰ This equality is the (1D version of the) much-celebrated Bloch theorem.⁵¹ Mathematical rigor
aside,⁵² it is a virtually evident fact because the particle’s probability density w(x) = ψ*(x)ψ(x), which has
to be periodic in this a-periodic system, may be so only if Δφ is constant. For what follows, it is more
convenient to represent the real constant Δφ in the form qa, so that the Bloch theorem takes the form

$$\psi(x + a) = \psi(x)\,e^{iqa}. \tag{2.193b}$$


The physical sense of the parameter q will be discussed in detail below, but we may immediately notice
that, according to Eq. (193b), an addition of 2π/a to this parameter yields the same wavefunction;
hence all observables have to be (2π/a)-periodic functions of q.⁵³


Now let us use the Bloch theorem to calculate the eigenfunctions and eigenenergies for the 
infinite version of the system shown in Fig. 22, i.e. for an infinite set of delta-functional potential 
barriers — see Fig. 24. 


Fig. 2.24. The simplest periodic potential: an infinite Dirac comb.


To start, let us consider two points separated by one period a: one of them, x_j, just left of the
position of one of the barriers, and another one, x_{j+1}, just left of the following barrier (see Fig. 24 again).


50 A reasonably fair classical image of Δφ is the geometric angle between similar objects (e.g., similar paper
clips) attached at equal distances to a long, uniform rubber band. If the band’s ends are twisted, the twist is
equally distributed between the structure’s periods, representing the constancy of Δφ. (I have to confess that, due
to the lack of time, this was the only “lecture demonstration” in my Stony Brook QM courses.)

51 Named after F. Bloch who applied this concept to wave mechanics in 1929, i.e. very soon after its
formulation. Note, however, that an equivalent statement in mathematics, called the Floquet theorem, has been 
known since at least 1883. 

52 I will recover this rigor in two steps. Later in this section, we will see that the function obeying Eq. (193) is
indeed a solution to the Schrödinger equation. However, to save time/space, it will be better for us to postpone
until Chapter 4 the proof that any eigenfunction of the equation, with periodic boundary conditions, obeys the 
Bloch theorem. As a partial reward for this delay, that proof will be valid for an arbitrary spatial dimensionality. 

53 The product ħq, which has the dimensionality of linear momentum, is called either the quasimomentum or
(especially in solid-state physics) the “crystal momentum” of the particle. Informally, it is very convenient (and
common) to use the name “quasimomentum” for the bare q as well, despite its evidently different dimensionality.
The eigenfunctions at each of the points may be represented as linear superpositions of two simple
waves exp{±ikx}, and the amplitudes of their components should be related by a 2×2 transfer matrix T
of the potential fragment separating them. According to Eq. (132), this matrix may be found as the
product of the matrix (135) of one delta-functional barrier by the matrix (138) of one zero-potential
interval a:

$$\begin{pmatrix}A_{j+1}\\ B_{j+1}\end{pmatrix} = \begin{pmatrix}e^{ika} & 0\\ 0 & e^{-ika}\end{pmatrix}\begin{pmatrix}1 - i\alpha & -i\alpha\\ i\alpha & 1 + i\alpha\end{pmatrix}\begin{pmatrix}A_j\\ B_j\end{pmatrix}. \tag{2.194}$$


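However, according to the Bloch theorem (193b), the component amplitudes should also be related as

$$\begin{pmatrix}A_{j+1}\\ B_{j+1}\end{pmatrix} = e^{iqa}\begin{pmatrix}A_j\\ B_j\end{pmatrix} \equiv \begin{pmatrix}e^{iqa} & 0\\ 0 & e^{iqa}\end{pmatrix}\begin{pmatrix}A_j\\ B_j\end{pmatrix}. \tag{2.195}$$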


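The condition of self-consistency of these two equations gives the following characteristic equation:

$$\det\left[\begin{pmatrix}e^{ika} & 0\\ 0 & e^{-ika}\end{pmatrix}\begin{pmatrix}1 - i\alpha & -i\alpha\\ i\alpha & 1 + i\alpha\end{pmatrix} - \begin{pmatrix}e^{iqa} & 0\\ 0 & e^{iqa}\end{pmatrix}\right] = 0. \tag{2.196}$$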


In Sec. 5, we have already calculated the matrix product participating in this equation (see the
second operand in Eq. (140)). Using it, we see that Eq. (196) is reduced to the same simple Eq. (191b)
that has jumped at us from the solution of the somewhat different (resonant tunneling) problem. (Indeed,
the one-period matrix in Eq. (194) has unit determinant, so that its eigenvalues are e^{±iqa}, with 2cos qa
equal to its trace, 2(cos ka + α sin ka).) Let us explore that simple result in detail. First of all, the
left-hand side of Eq. (191b) is a sinusoidal function of the product qa with unit amplitude, while its
right-hand side is a sinusoidal function of the product ka, with amplitude (1 + α²)^{1/2} > 1 (see Fig. 25).


Fig. 2.25. The graphical representation of the characteristic equation (191b) for a fixed value of the
parameter α (horizontal axis: ka/π). The ranges of ka that yield |cos qa| ≤ 1 correspond to allowed energy
bands, while those with |cos qa| > 1 correspond to energy gaps between them.


As a result, within each half-period Δ(ka) = π of the right-hand side, there is an interval where
the magnitude of the right-hand side is larger than 1, so that the characteristic equation does not have a
real solution for q. These intervals correspond to the energy gaps (see Fig. 23 again), while the
complementary intervals of ka, where a real solution for q exists, correspond to the allowed energy
bands. In contrast, the parameter q can take any real values, so it is more convenient to plot the
eigenenergy E = ħ²k²/2m as a function of the quasimomentum ħq (or, even more conveniently, of the
dimensionless parameter qa) rather than of ka.⁵⁴ Before doing that, we need to recall that the parameter α,
defined by the last of Eqs. (78), depends on the wave vector k as well, so that if we vary q (and hence k),
it is better to characterize the structure by another, k-independent dimensionless parameter, for example


$$\beta \equiv (ka)\,\alpha = \frac{W}{\hbar^2/ma}, \qquad (2.197)$$

so that our characteristic equation (191b) becomes

$$\cos qa = \cos ka + \beta\,\frac{\sin ka}{ka}. \qquad (2.198)$$


Fig. 26 shows the plots of k and E, following from Eq. (198), as functions of qa, for a particular,
moderate value of the parameter β. The first evident feature of the pattern is its 2π-periodicity in the
argument qa, which we have already predicted from the general Bloch theorem arguments.
(Due to this periodicity, the complete band/gap pattern may be studied, for example, on just one interval
−π ≤ qa ≤ +π, called the 1st Brillouin zone; this is the so-called reduced zone picture. For some applications,
however, it is more convenient to use the extended zone picture with −∞ < qa < +∞; see, e.g., the next
section.)
Fig. 2.26. (a) The "genuine" momentum k of a particle in an infinite Dirac comb (Fig. 24), and (b) its
energy E = ħ²k²/2m (in the units of E₀ = ħ²/2ma²), as functions of normalized quasimomentum, for a
particular value (β = 3) of the dimensionless parameter defined by Eq. (197). The 1st Brillouin zone is
marked in both panels. Arrows in the lower right corner of panel (b) illustrate the definition of energy
band (ΔE_n) and energy gap (Δ_n) widths.


54 A more important reason for taking q as the argument is that for a general periodic potential U(x), the particle's
momentum ħk is not uniquely related to E, while (according to the Bloch theorem) the quasimomentum ħq is.
However, maybe the most important fact, clearly visible in Fig. 26, is that there is an infinite
number of energy bands, with different energies E_n(q) for the same value of q. Mathematically, this is
evident from Eq. (198), or alternatively from Fig. 25. Indeed, for each value of qa, there is a solution
ka to this equation on each half-period Δ(ka) = π. Each of such solutions (see Fig. 26a) gives a specific
value of the particle's energy E = ħ²k²/2m. A continuous set of similar solutions for various qa forms a
particular energy band.
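A minimal numerical sketch of this statement, under my own assumptions: β ≥ 0 (repulsive barriers, as in Fig. 26), and plain bisection as the root finder (an illustrative choice, not the method used to draw the figures). For a given qa, Eq. (198) is solved for ka separately on each half-period lπ ≤ ka ≤ (l + 1)π, which gives one point of each energy band; the energy then follows as E/E₀ = (ka)², with E₀ = ħ²/2ma² as in Fig. 26b. The function names comb_rhs and comb_ka are hypothetical.

```python
import math


def comb_rhs(ka: float, beta: float) -> float:
    """Right-hand side of Eq. (198): cos(ka) + beta*sin(ka)/(ka)."""
    sinc = 1.0 if ka == 0.0 else math.sin(ka) / ka
    return math.cos(ka) + beta * sinc


def comb_ka(qa: float, beta: float, band: int, tol: float = 1e-12) -> float:
    """Solve Eq. (198) for ka on the half-period band*pi <= ka <= (band+1)*pi.

    band = 0 is the lowest energy band; beta >= 0 is assumed. The search is
    anchored at the right end of the half-period (a band edge, never inside a
    gap), so the left end, which is the top edge of the band below, is skipped.
    """
    target = math.cos(qa)
    lo, hi = band * math.pi, (band + 1) * math.pi
    f_hi = comb_rhs(hi, beta) - target
    if abs(f_hi) < 1e-12:              # band edge exactly at ka = (band+1)*pi
        return hi
    hi_positive = f_hi > 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (comb_rhs(mid, beta) - target > 0.0) == hi_positive:
            hi = mid                   # mid is on the same side of the root as hi
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    beta = 3.0                         # the value used in Fig. 26
    for qa in (0.0, 0.5 * math.pi, math.pi):
        energies = [comb_ka(qa, beta, band) ** 2 for band in range(3)]  # E/E0
        print(f"qa = {qa:.3f}: " + ", ".join(f"{e:.4f}" for e in energies))
```

Scanning qa over the 1st Brillouin zone with this function traces each band of Fig. 26b; for β = 0 it returns the free-particle branches ka = |qa|, 2π − |qa|, and so on.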


Since the energy band picture is one of the most practically important results of quantum 
mechanics, it is imperative to understand its physics. It is natural to describe this physics in two different 
ways in two opposite potential strength limits. In parallel, we will use this discussion to obtain simpler 
expressions for the energy band/gap structure in each limit. An important advantage of this approach is 
that both analyses may be carried out for an arbitrary periodic potential U(x) rather than for the 
particular model shown in Fig. 24. 


(i) Tight-binding approximation. This approximation works well when the eigenenergy E_n of the
states quasi-localized at the energy profile minima is much lower than the height of the potential barriers
separating them (see Fig. 27). As should be clear from our discussion in Sec. 6, essentially the only role
of coupling between these states (via tunneling through the potential barriers separating the minima) is
to establish a certain phase shift Δφ = qa between the adjacent quasi-localized wavefunctions u_n(x − x_j)
and u_n(x − x_{j+1}).


Fig. 2.27. The tight-binding
approximation (schematically).


To describe this effect quantitatively, let us first return to the problem of two coupled wells
considered in Sec. 6, and recast the result (180), with restored eigenstate index n, as

$$\Psi_n(x,t) = \left[a_L(t)\,\psi_L(x) + a_R(t)\,\psi_R(x)\right]\exp\left\{-i\frac{E_n}{\hbar}t\right\}, \qquad (2.199)$$


where the probability amplitudes a_L and a_R oscillate sinusoidally in time:

$$a_L(t) = \cos\frac{\delta_n}{\hbar}t, \qquad a_R(t) = i\sin\frac{\delta_n}{\hbar}t. \qquad (2.200)$$


This evolution satisfies the following system of two equations whose structure is similar to Eq. (1.61a):

$$i\hbar\dot a_L = -\delta_n a_R, \qquad i\hbar\dot a_R = -\delta_n a_L. \qquad (2.201)$$

Eq. (199) may be readily generalized to the case of many similar coupled wells:

$$\Psi_n(x,t) = \sum_j a_j(t)\,u_n(x - x_j)\exp\left\{-i\frac{E_n}{\hbar}t\right\}, \qquad (2.202)$$
where E_n are the eigenenergies and u_n the eigenfunctions of each well. In the tight-binding limit, only
the adjacent wells are coupled, so that instead of Eq. (201) we should write an infinite system of similar
equations

$$i\hbar\dot a_j = -\delta_n a_{j-1} - \delta_n a_{j+1}, \qquad (2.203)$$

for each well number j, where the parameters δ_n describe the coupling between two adjacent potential wells.
Repeating the calculation outlined at the end of the last section for our new situation, for a smooth
potential we may get an expression essentially similar to the last form of Eq. (188):

$$\delta_n = \frac{\hbar^2}{m}\,u_n(x_0)\,\frac{du_n}{dx}(x_0 - a), \qquad (2.204)$$
where x_0 is the distance between the well bottom and the middle of the potential barrier on the right of it
(see Fig. 27). The only substantial new feature of this expression in comparison with Eq. (188) is that
the sign of δ_n alternates with the level number n: δ_1 > 0, δ_2 < 0, δ_3 > 0, etc. Indeed, the number of zeros
(and hence "wiggles") of the eigenfunctions u_n(x) of any potential well increases with n (see, e.g.,
Fig. 1.8),⁵⁵ so that the relative sign of the exponential tails of the functions, sneaking under the left and
right barriers limiting the well, also alternates with n.


The infinite system of ordinary differential equations (203) enables solutions of many important
problems (such as the spread of a wavefunction initially localized in one well), but our
task right now is just to find its stationary states, i.e. the solutions proportional to exp{−i(ε_n/ħ)t}, where
ε_n is a still unknown, q-dependent addition to the background energy E_n of the nth energy level. To
satisfy the Bloch theorem (193) as well, such a solution should have the following form:

$$a_j(t) = a\exp\left\{iqx_j - i\frac{\varepsilon_n}{\hbar}t + \text{const}\right\}. \qquad (2.205)$$


Plugging this solution into Eq. (203) and canceling the common exponent, we get

$$E = E_n + \varepsilon_n = E_n - \delta_n\left(e^{-iqa} + e^{iqa}\right) = E_n - 2\delta_n\cos qa, \qquad (2.206)$$

so that in this approximation, the energy band width ΔE_n (see Fig. 26b) equals 4|δ_n|.
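The step from Eq. (205) to Eq. (206) can be checked numerically. The sketch below closes a chain of n_wells wells into a ring (my assumption, which makes the allowed values qa = 2πk/n_wells discrete), plugs the Bloch amplitudes a_j ∝ exp{iqx_j} into the right-hand side of Eq. (203), and verifies that the result is the same multiple ε_n of a_j for every well. The common factor exp{−iε_n t/ħ} cancels and is omitted; the function names are hypothetical.

```python
import cmath
import math


def chain_rhs(amplitudes, delta):
    """Right-hand side of Eq. (203), -delta*(a_{j-1} + a_{j+1}), on a ring."""
    n = len(amplitudes)
    # amplitudes[j - 1] wraps around to the last well for j = 0
    return [-delta * (amplitudes[j - 1] + amplitudes[(j + 1) % n]) for j in range(n)]


def bloch_band(delta, n_wells):
    """Return (qa, eps_n) for every allowed qa = 2*pi*k/n_wells of the ring."""
    band = []
    for k in range(n_wells):
        qa = 2.0 * math.pi * k / n_wells
        a = [cmath.exp(1j * qa * j) for j in range(n_wells)]   # a_j ~ exp(i q x_j)
        rhs = chain_rhs(a, delta)
        ratios = [rhs[j] / a[j] for j in range(n_wells)]
        # a stationary state needs the same eps_n for every well j
        assert max(abs(r - ratios[0]) for r in ratios) < 1e-12
        band.append((qa, ratios[0].real))
    return band


if __name__ == "__main__":
    for qa, eps in bloch_band(delta=0.05, n_wells=8):
        print(f"qa = {qa:.3f}   eps = {eps:+.4f}   -2*delta*cos(qa) = {-0.1 * math.cos(qa):+.4f}")
```

The printed energies fill the interval from −2δ_n to +2δ_n, i.e. a band of width 4|δ_n|; a negative coupling (as for δ_2) simply turns the band upside down, moving its minimum from qa = 0 to qa = π.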


The relation (206), whose validity is restricted to |δ_n| ≪ E_n, describes the lowest energy bands
plotted in Fig. 26b reasonably well. (For larger β, the agreement would be even better.) So, this
calculation explains what the energy bands really are: in the tight-binding limit they are best interpreted
as the isolated wells' energy levels E_n, broadened into bands by the interwell interaction. Also, this result
gives clear proof that the energy band extremes correspond to qa = 2πl and qa = 2π(l + ½), with integer
l. Finally, the sign alternation of the coupling coefficient δ_n (204) explains why the energy maxima of one
band are aligned, on the qa axis, with the energy minima of the adjacent bands (see Fig. 26).


(ii) Weak-potential limit. Amazingly, the energy-band structure is also compatible with a
completely different physical picture that may be developed in the opposite limit. Let the particle's
energy E be so high that the periodic potential U(x) may be treated as a small perturbation. Naively, in


55 Below, we will see several other examples of this behavior. This alternation rule is also described by the 
Wilson-Sommerfeld quantization condition (110). 
this limit we could expect a slightly and smoothly deformed parabolic dispersion relation E = ħ²k²/2m.
However, if we are plotting the stationary-state energy as a function of q rather than k, we need to add
2πl/a, with an arbitrary integer l, to the argument. Let us show this by expanding all variables into the
1D spatial Fourier series. For the potential energy U(x) that obeys Eq. (192), such an expansion is
straightforward:⁵⁶


$$U(x) = \sum_{l''} U_{l''}\exp\left\{-i\frac{2\pi l''}{a}x\right\}, \qquad (2.207)$$

where the summation is over all integers l'', from −∞ to +∞. However, for the wavefunction we should
show due respect to the Bloch theorem (193), which shows that, strictly speaking, ψ(x) is not periodic.


To overcome this difficulty, let us define another function:

$$u(x) \equiv \psi(x)\,e^{-iqx}, \qquad (2.208)$$

and study its periodicity:

$$u(x+a) = \psi(x+a)\,e^{-iq(x+a)} = \psi(x)\,e^{-iqx} = u(x). \qquad (2.209)$$

We see that the new function is a-periodic, and hence we can use Eqs. (208)-(209) to rewrite the Bloch
theorem in a different form:

$$\psi(x) = u(x)\,e^{iqx}, \quad \text{with } u(x+a) = u(x). \qquad (2.210)$$


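Now it is safe to expand the periodic function u(x) exactly as U(x):

$$u(x) = \sum_{l'} u_{l'}\exp\left\{-i\frac{2\pi l'}{a}x\right\}, \qquad (2.211)$$

so that, according to Eq. (210),

$$\psi(x) = e^{iqx}\sum_{l'} u_{l'}\exp\left\{-i\frac{2\pi l'}{a}x\right\} = \sum_{l'} u_{l'}\exp\left\{i\left(q - \frac{2\pi l'}{a}\right)x\right\}. \qquad (2.212)$$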


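The only nontrivial part of plugging Eqs. (207) and (212) into the stationary Schrödinger
equation (53) is how to handle the product term,

$$U(x)\psi(x) = \sum_{l',\,l''} U_{l''}\,u_{l'}\exp\left\{i\left[q - \frac{2\pi(l' + l'')}{a}\right]x\right\}. \qquad (2.213)$$

At fixed l', we may change the summation over l'' to that over l ≡ l' + l'' (so that l'' = l − l'), and write:

$$U(x)\psi(x) = \sum_{l}\exp\left\{i\left(q - \frac{2\pi l}{a}\right)x\right\}\sum_{l'} U_{l-l'}\,u_{l'}. \qquad (2.214)$$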
i a 


Now plugging Eq. (212) (with the summation index l' replaced with l) and Eq. (214) into the stationary
Schrödinger equation (53), and requiring the coefficients of each spatial exponent to match, we get an
infinite system of linear equations for u_l:


56 The benefits of such an unusual notation of the summation index (l'' instead of, say, l) will be clear in a few
lines.
$$\sum_{l'} U_{l-l'}\,u_{l'} = \left[E - \frac{\hbar^2}{2m}\left(q - \frac{2\pi l}{a}\right)^2\right]u_l. \qquad (2.215)$$


(Note that by this calculation we have essentially proved that the Bloch wavefunction (210) is indeed a
solution of the Schrödinger equation, provided that the quasimomentum q is selected in a way to make
the system of linear equations (215) compatible, i.e. is a solution of its characteristic equation.)


So far, the system of equations (215) is an equivalent alternative to the initial Schrödinger
equation, for any potential's strength.⁵⁷ In the weak-potential limit, i.e. if all Fourier coefficients U_n are
small,⁵⁸ we can complete all the calculations analytically.⁵⁹ Indeed, in the so-called 0th approximation
we can ignore all U_n, so that in order to have at least one u_l different from 0, Eq. (215) requires that

$$E = E_l \equiv \frac{\hbar^2}{2m}\left(q - \frac{2\pi l}{a}\right)^2 \qquad (2.216)$$


(u_l itself should be obtained from the normalization condition). This result means that in this
approximation, the dispersion relation E(q) has an infinite number of similar quadratic branches
numbered by integer l (see Fig. 28).


Fig. 2.28. The energy band/gap picture in the weak potential limit (Δ_n ≪ E^(n)), with the shading
showing the 1st Brillouin zone. (Plotted as E versus qa/2π; the branches are labeled by l, and the gaps
Δ_1, Δ_2 open at the energies E^(1), E^(2).)


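On every branch, such an eigenfunction has just one Fourier coefficient, i.e. is a monochromatic traveling
wave

$$\psi_l = u_l\,e^{iqx}\exp\left\{-i\frac{2\pi l}{a}x\right\} = u_l\exp\left\{i\left(q - \frac{2\pi l}{a}\right)x\right\}. \qquad (2.217)$$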


Next, the above definition of E_l allows us to rewrite Eq. (215) in a more transparent form


57 By the way, the system is very efficient for fast numerical solution of the stationary Schrödinger equation for
any periodic profile U(x), even though to describe potentials with large U_n, this approach may require taking into
account a correspondingly large number of Fourier amplitudes u_l.

58 Besides, possibly, a constant potential U_0, which, as was discussed in Chapter 1, may always be taken for the
energy reference. As a result, in the following calculations, I will take U_0 = 0 to simplify the formulas.

59 This method is so powerful that its multi-dimensional version is not much more complex than the 1D version 
described here; see, e.g., Sec. 3.2 in the classical textbook by J. Ziman, Principles of the Theory of Solids, 2nd
ed., Cambridge U. Press, 1979.
$$\sum_{l' \neq l} U_{l-l'}\,u_{l'} = (E - E_l)\,u_l, \qquad (2.218)$$

which may be formally solved for u_l:

$$u_l = \frac{1}{E - E_l}\sum_{l' \neq l} U_{l-l'}\,u_{l'}. \qquad (2.219)$$


This formula shows that if the Fourier coefficients U_n are non-zero but small, the wavefunctions do
acquire other Fourier components (besides the main one, with the index corresponding to the branch
number), but these additions are all small, except in narrow regions near the points E_l = E_{l'} where two
branches (216) of the dispersion relation E(q), with some specific numbers l and l', cross. According to
Eq. (216), this happens when

$$\left(q - \frac{2\pi l}{a}\right)^2 = \left(q - \frac{2\pi l'}{a}\right)^2, \qquad (2.220)$$

i.e. at q = q_m ≡ πm/a (with the integer m = l + l'),⁶⁰ corresponding to


$$E_l = E_{l'} = \frac{\hbar^2}{2ma^2}\left[\pi(l + l') - 2\pi l\right]^2 = \frac{\pi^2\hbar^2 n^2}{2ma^2} \equiv E^{(n)}, \qquad (2.221)$$

with integer n ≡ l − l'. (According to their definitions, the index n is just the number of the branch
crossing on the energy scale, while the index m numbers the position of the crossing points on the q-axis;
see Fig. 28.) In such a region, E has to be close to both E_l and E_{l'}, so that the denominator in just one
of the infinite number of terms in Eq. (219) is very small, making the term substantial despite the
smallness of U_n. Hence we can take into account only one term in each of the sums (written for l and l'):


$$U_n\,u_{l'} = (E - E_l)\,u_l, \qquad U_{-n}\,u_l = (E - E_{l'})\,u_{l'}. \qquad (2.222)$$


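Taking into account that for any real function U(x), the Fourier coefficients in its Fourier expansion
(207) have to be related as U_{−n} = U_n*, Eq. (222) yields the following simple characteristic equation

$$\begin{vmatrix} E_l - E & U_n \\ U_n^* & E_{l'} - E \end{vmatrix} = 0, \qquad (2.223)$$

with the following solution:

$$E_\pm = \frac{E_l + E_{l'}}{2} \pm \left[\left(\frac{E_l - E_{l'}}{2}\right)^2 + |U_n|^2\right]^{1/2}. \qquad (2.224)$$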


According to Eq. (216), close to the branch crossing point q_m = π(l + l')/a, the fraction
participating in this result may be approximated as⁶¹

$$\frac{E_l - E_{l'}}{2} \approx -\hbar v\,\tilde q, \quad \text{with } \hbar v \equiv \frac{\hbar^2\pi n}{ma} = \frac{2aE^{(n)}}{\pi n}, \quad \text{and } \tilde q \equiv q - q_m, \qquad (2.225)$$


60 Let me hope that the difference between this new integer and the particle’s mass, both called m, is absolutely 
clear from the context. 
61 Physically, v = ħ(πn/a)/m = ħk^(n)/m is just the velocity of a free classical particle with energy E^(n).
while the parameters (E_l + E_{l'})/2 ≈ E^(n) and U_nU_{−n} = |U_n|² do not depend on q̃, i.e. on the distance from the
central point q_m. This is why Eq. (224) may be plotted as the famous level anticrossing (also called
"avoided crossing", or "intended crossing", or "non-crossing") diagram (Fig. 29), with the energy gap
width Δ_n equal to 2|U_n|, i.e. just twice the magnitude of the nth Fourier harmonic of the periodic
potential U(x). Such anticrossings are also clearly visible in Fig. 28, which shows the result of the exact
solution of Eq. (198) for the particular case β = 0.5.⁶²
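A minimal sketch of Eqs. (224) and (225), with illustrative numbers in arbitrary units (my assumption): it returns the two anticrossing levels for a given distance q̃ from the crossing point, and shows that only the magnitude |U_n| of the (generally complex) Fourier harmonic enters the result. The function name is hypothetical.

```python
import math


def anticrossing_levels(q_tilde, hbar_v, e_n, u_n):
    """Return (E_minus, E_plus) from Eq. (224) with the linearization (225).

    q_tilde = q - q_m is the distance from the crossing point, hbar_v the slope
    parameter of Eq. (225), e_n the crossing energy E^(n), and u_n the (possibly
    complex) Fourier harmonic U_n of the potential.
    """
    half_split = -hbar_v * q_tilde            # (E_l - E_l')/2, Eq. (225)
    root = math.sqrt(half_split ** 2 + abs(u_n) ** 2)
    return e_n - root, e_n + root


if __name__ == "__main__":
    # Illustrative values only: E^(n) = 10, hbar*v = 2, U_n = 0.3 + 0.4i
    for q_tilde in (-1.0, -0.2, 0.0, 0.2, 1.0):
        lo, hi = anticrossing_levels(q_tilde, 2.0, 10.0, 0.3 + 0.4j)
        print(f"q~ = {q_tilde:+.1f}:  E- = {lo:.4f},  E+ = {hi:.4f}")
```

At q̃ = 0 the printed levels are separated by 2|U_n| = 1.0, while far from the crossing they approach the unperturbed straight branches E^(n) ± ħv|q̃|, which is the shape of Fig. 29.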


Fig. 2.29. The level anticrossing diagram. 


We will run into the anticrossing diagram again and again in the course, notably at the discussion
of spin-½ and other two-level systems. It is also repeatedly met in classical mechanics, for example at
the calculation of frequencies of coupled oscillators.⁶³,⁶⁴ In our current case of the weak potential limit
of the band theory, the diagram describes the interaction of two traveling de Broglie waves (217), with
oppositely directed wave vectors, k^(n) and −k^(n), via the (l − l')th (i.e. the nth) Fourier harmonic of the
potential profile U(x).⁶⁵ This effect exists also in the classical wave theory and is known as the Bragg
reflection, describing, for example, the 1D model of the X-ray reflection by a crystal lattice (see, e.g.,
Fig. 1.5) in the limit of weak interaction between the incident wave and each atom.


The anticrossing diagram shows that, rather counter-intuitively, even a weak periodic potential
radically changes the topology of the initially parabolic dispersion relation, connecting its different
branches and thus creating the energy gaps. Let me hope that the reader has enjoyed the elegant
description of this effect, discussed above, as well as one more illustration of the wonderful ability of
physics to give completely different interpretations (and different approximate approaches) to the same
effect in opposite limits.


So, we have explained analytically (though only in two limits) the particular band structure 
shown in Fig. 26. Now we may wonder how general this structure is, i.e. how much of it is independent 
of the Dirac comb model (Fig. 24). For that, let us represent the band pattern, such as that shown in Fig. 


62 From that figure, it is also clear that in the weak potential limit, the width ΔE_n of the nth energy band is just
E^(n) − E^(n−1) (see Eq. (221)). Note that this is exactly the distance between the adjacent energy levels of the
simplest 1D potential well of infinite depth (cf. Eq. (1.85)).

63 See, e.g., CM Sec. 6.1 and in particular Fig. 6.2. 

64 Actually, we could readily obtain this diagram in the previous section, for the system of two weakly coupled 
potential wells (Fig. 21), if we assumed the wells to be slightly dissimilar. 

65 In the language of the de Broglie wave scattering, to be discussed in Sec. 3.3, Eq. (220) may be interpreted as
the condition that each of these waves, scattered on the nth Fourier harmonic of the potential profile,
constructively interferes with its counterpart, leading to a strong enhancement of their interaction.
26b (plotted for a particular value of the parameter β, characterizing the potential barrier strength) in a
more condensed form, which would allow us to place the results for a range of β values on a single
comprehensible plot. The way to do this should be clear from Fig. 26b: since the dependence of energy
on the quasimomentum in each energy band is not too eventful, we may plot just the highest and the
lowest values of the particle's energy E = ħ²k²/2m as functions of β = maW/ħ² (see Fig. 30); these values
may be obtained from Eq. (198) with qa = 0 and qa = π.


Fig. 2.30. Characteristic curves of the
Schrödinger equation for the infinite
Dirac comb (Fig. 24).


These plots (in mathematics, commonly called characteristic curves, while in applied physics,
band-edge diagrams) show, first of all, that at small β all energy gap widths are equal and proportional
to this parameter, and hence to W. This feature is in full agreement with the main conclusion (224) of
our general analysis of the weak-potential limit, because for the Dirac comb potential (Fig. 24),

$$U(x) = W\sum_{j=-\infty}^{+\infty}\delta(x - ja + \text{const}), \qquad (2.226)$$

all Fourier harmonic amplitudes, defined by Eq. (207), are equal in magnitude: |U_l| = W/a. As β is
further increased, the gaps grow and the allowed energy bands shrink, but rather slowly. This is also
natural, because, as Eq. (79) shows, the transparency 𝒯 of the delta-functional barriers separating the
quasi-localized states (and hence the coupling parameters δ_n ∝ 𝒯^{1/2} participating in the general
tight-binding limit's theory) decreases with W ∝ β very gradually.
These features may be compared with those for more realistic and relatively simple periodic
functions U(x), for example the sinusoidal potential U(x) = A cos(2πx/a) (see Fig. 31a). For this
potential, the stationary Schrödinger equation (53) takes the following form:

$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + A\cos\frac{2\pi x}{a}\,\psi = E\psi. \qquad (2.227)$$

By introduction of dimensionless variables⁶⁶

$$\xi \equiv \frac{\pi x}{a}, \qquad \alpha \equiv \frac{E}{E^{(1)}}, \qquad 2\beta \equiv \frac{A}{E^{(1)}}, \qquad (2.228)$$


66 Note that this definition of β is quantitatively different from that for the Dirac comb (226), but in both cases,
this parameter is proportional to the amplitude of the potential modulation.


Chapter 2 Page 54 of 76 




Essential Graduate Physics QM: Quantum Mechanics 


where E^(1) is defined by Eq. (221) with n = 1, Eq. (227) is reduced to the canonical form of the
well-studied Mathieu equation⁶⁷

$$\frac{d^2\psi}{d\xi^2} + (\alpha - 2\beta\cos 2\xi)\,\psi = 0. \qquad (2.229)$$

Fig. 2.31. Two other simple periodic potential profiles U(x): (a) the sinusoidal ("Mathieu") potential and
(b) the Kronig-Penney potential.


Figure 32 shows the characteristic curves of this equation. We see that now at small β the first
energy gap grows much faster than the higher ones: Δ_n ∝ β^n. This feature is in accord with the weak-
coupling result Δ_n = 2|U_n|, which is valid only in the linear approximation in U_n, because for the Mathieu
potential, U_l = A(δ_{l,+1} + δ_{l,−1})/2. Another clearly visible feature is the exponentially fast shrinkage of the
allowed energy bands at 2β > α (in Fig. 32, on the right from the dashed line), i.e. at E < A. It may be
readily explained by our tight-binding approximation result (206): as soon as the eigenenergy drops
significantly below the potential maximum U_max = A (see Fig. 31a), the quantum states in the adjacent
potential wells are connected only by tunneling through relatively high potential barriers separating
these wells, so that the coupling amplitudes δ_n become exponentially small (see, e.g., Eq. (189)).
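Footnote 57 above notes that the system (215) is convenient for numerics. As an illustration of the Mathieu results just discussed, here is a minimal sketch under my own assumptions: the Fourier index is truncated to |l| ≤ l_max, energies are measured in units of E^(1), so that Eq. (216) gives E_l/E^(1) = (qa/π − 2l)², and the only nonzero harmonics are U_{±1} = A/2. The eigenvalues are found with a hand-written cyclic Jacobi routine to stay within the standard library (with NumPy, numpy.linalg.eigvalsh would do the same job). The gap at qa = π should be close to 2|U_1| = A, while the second gap, at qa = 0, should be only of the second order in A. All function names are hypothetical.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-13, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by the cyclic Jacobi method."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol ** 2:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen to zero the (p, q) element
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):          # columns p, q: A -> A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # rows p, q: A P -> P^T A P
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def mathieu_levels(qa_over_pi, amplitude, l_max=6):
    """Energies E/E^(1) at one q from Eq. (215), truncated to |l| <= l_max,
    for U(x) = A cos(2 pi x/a); amplitude = A/E^(1), so U_{+1} = U_{-1} = A/2."""
    ls = list(range(-l_max, l_max + 1))
    h = [[0.0] * len(ls) for _ in ls]
    for i, l in enumerate(ls):
        h[i][i] = (qa_over_pi - 2 * l) ** 2       # E_l / E^(1), Eq. (216)
        for j, lp in enumerate(ls):
            if abs(l - lp) == 1:
                h[i][j] = amplitude / 2.0          # U_{l-l'}
    return jacobi_eigenvalues(h)


if __name__ == "__main__":
    A = 0.1
    edge, center = mathieu_levels(1.0, A), mathieu_levels(0.0, A)
    print("Delta_1 =", edge[1] - edge[0], "  Delta_2 =", center[2] - center[1])
```

With A = 0.1 E^(1), the first gap comes out very close to A, while the second one is roughly two orders of magnitude smaller, in line with Δ_n ∝ β^n.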


Fig. 2.32. Characteristic curves of the
Mathieu equation. The dashed line
corresponds to the equality α = 2β, i.e.
E = A = U_max, separating the regions of
under-barrier tunneling and over-barrier
motion. Adapted from Fig. 28.2.1 at
http://dlmf.nist.gov. (Contribution by US
Government, not subject to copyright).


Another simple periodic profile is the Kronig-Penney potential, shown in Fig. 31b, which gives
relatively simple analytical expressions for the characteristic curves. Its advantage is a more realistic law
of the decrease of the Fourier harmonics U_l at l ≫ 1, and hence of the energy gaps in the weak-potential
limit:


67 This equation, first studied in the 1860s by E. Mathieu in the context of a rather practical problem of vibrating 
elliptical drumheads (!), has many other important applications in physics and engineering, notably including the 
parametric excitation of oscillations — see, e.g., CM Sec. 5.5. 
$$\Delta_n \approx 2|U_n| \propto \frac{1}{n}, \quad \text{for } E^{(n)} \gg U_0. \qquad (2.230)$$


Leaving a detailed analysis of the Kronig-Penney potential for the reader's exercise, let me
conclude this section by addressing the effect of potential modulation on the number of eigenstates in
1D systems of a large but finite length l ≫ a, k⁻¹. Surprisingly, the Bloch theorem makes the analysis of
this problem elementary, for arbitrary U(x). Indeed, let us assume that l is comprised of an integer
number of periods a, and that its ends are described by similar boundary conditions; both assumptions are
evidently inconsequential for l ≫ a. Then, according to Eq. (210), the boundary conditions impose, on
the quasimomentum q, exactly the same quantization condition as we had for k for a free 1D motion.
Hence, instead of Eq. (1.100), we can write

$$dN = \frac{l}{2\pi}\,dq, \qquad (2.231)$$

with the corresponding change of the summation rule:

$$\sum_q f(q) \to \frac{l}{2\pi}\int f(q)\,dq. \qquad (2.232)$$


As a result, the density of states in the 1D q-space, dN/dq = l/2π, does not depend on the
potential profile at all! Note, however, that the profile does affect the density of states on the energy
scale, dN/dE. As an extreme example, at the bottom and at the top of each energy band we have
dE/dq → 0, and hence

$$\frac{dN}{dE} = \frac{dN/dq}{dE/dq} = \frac{l}{2\pi}\,\frac{1}{dE/dq} \to \infty. \qquad (2.233)$$


This effect of state concentration at the band/gap edges (which survives in higher spatial 
dimensionalities as well) has important implications for the operation of several important electronic 
and optical devices, in particular semiconductor lasers and light-emitting diodes. 
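The concentration of states near the band edges, described by Eq. (233), is easy to see by simply counting states. A minimal sketch, assuming for concreteness the tight-binding band (206) (with E_n taken as the energy origin) on a ring of N periods, so that the allowed values qa = 2πk/N are uniformly spaced, as Eq. (231) requires: equal-width energy bins at the band bottom and at the band center then hold very different numbers of states. The helper names are hypothetical.

```python
import math


def band_energies(n_periods, delta):
    """Band (206), measured from E_n, at the allowed points qa = 2*pi*k/n_periods."""
    return [-2.0 * delta * math.cos(2.0 * math.pi * k / n_periods) for k in range(n_periods)]


def count_in(energies, e_lo, e_hi):
    """Number of states with e_lo <= E < e_hi."""
    return sum(1 for e in energies if e_lo <= e < e_hi)


if __name__ == "__main__":
    n, delta, width = 2000, 1.0, 0.04
    energies = band_energies(n, delta)
    print("states in the band:", len(energies))                 # one per period
    print("bin at band center:", count_in(energies, -width / 2, width / 2))
    print("bin at band bottom:", count_in(energies, -2 * delta, -2 * delta + width))
```

With these numbers, the bin at the band bottom holds roughly an order of magnitude more states than the central one, although the states are distributed uniformly in q.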


2.8. Periodic systems: Particle dynamics 
The band structure of the energy spectrum of a particle moving in a periodic potential has
profound implications not only for its density of states but also for its dynamics. Indeed, let us consider
the simplest case of a wave packet composed of the Bloch functions (210), all belonging to the same
(say, nth) energy band. Similarly to Eq. (27) for a free particle, we can describe such a packet as

$$\Psi(x,t) = \int a_q\,u_q(x)\,e^{i[qx - \omega(q)t]}\,dq, \qquad (2.234)$$


where the a-periodic functions u_q(x), defined by Eq. (208), are now indexed to emphasize their
dependence on the quasimomentum, and ω(q) = E_n(q)/ħ is the function of q describing the shape of the
corresponding energy band (see, e.g., Fig. 26b or Fig. 28). If the packet is narrow in the q-space, i.e. if
the width δq of the distribution a_q is much smaller than all the characteristic q-scales of the dispersion
relation ω(q), in particular of π/a, we may simplify Eq. (234) exactly as it was done in Sec. 2 for a free
particle, despite the presence of the periodic factors u_q(x) under the integral. In the linear approximation
of the Taylor expansion, we get a full analog of Eq. (32), but now with q rather than k, and

$$v_{gr} = \left.\frac{d\omega}{dq}\right|_{q=q_0} = \frac{1}{\hbar}\left.\frac{dE_n}{dq}\right|_{q=q_0}, \qquad (2.235)$$


where q_0 is the central point of the quasimomentum distribution. Despite the formal similarity with Eqs.
(33) for the free particle, this result is much more eventful. For example, as evident from the dispersion
relation's topology (see Figs. 26b, 28), the group velocity vanishes not only at q = 0, but at all values of
q that are multiples of π/a, i.e. at the bottom and at the top of each energy band. Even more intriguing is
that the group velocity's sign changes periodically with q.


This group velocity alternation leads to fascinating, counter-intuitive phenomena if a particle
placed in a periodic potential is subjected to an additional external force F(t). (For electrons in a
crystal, this may be, for example, the force of the applied electric field.) Let the force be relatively weak,
so that the product Fa (i.e. the scale of the energy increment from the additional force per one lattice
period) is much smaller than both relevant energy scales of the dispersion relation E(q) (see Fig. 26b):

$$Fa \ll \Delta E_n,\ \Delta_n. \qquad (2.236)$$

This strong inequality allows one to neglect the force-induced interband transitions, so that the wave
packet (234) includes the Bloch eigenfunctions belonging to only one (initial) energy band at all times.
For the time evolution of its center q_0, theory yields⁶⁸ an extremely simple equation of motion

$$\hbar\dot q_0 = F(t). \qquad (2.237)$$
This equation is physically very transparent: it is essentially the 2nd Newton law for the time evolution
of the quasimomentum ħq under the effect of the additional force F(t) only, excluding the periodic force
−∂U(x)/∂x of the background potential U(x). This is very natural, because, as Eq. (210) implies, ħq is
essentially the particle's momentum averaged over the potential's period, and the periodic force effect
drops out at such an averaging.


Despite the simplicity of Eq. (237), the results of its solution may be highly nontrivial. First, let
us use Eqs. (235) and (237) to find the instant group acceleration of the particle (i.e. the acceleration of
its wave packet's envelope):

$$\frac{dv_{gr}}{dt} = \frac{d}{dt}\frac{d\omega(q_0)}{dq_0} = \frac{d^2\omega(q_0)}{dq_0^2}\frac{dq_0}{dt} = \frac{1}{\hbar}\frac{d^2\omega(q_0)}{dq_0^2}F(t). \qquad (2.238)$$

This means that the second derivative of the dispersion relation ω(q) (specific for each energy band),
divided by ħ, plays the role of the reciprocal effective mass of the particle at this particular value of q_0:

$$m_{ef} = \frac{\hbar}{d^2\omega/dq^2} \equiv \frac{\hbar^2}{d^2E_n/dq^2}. \qquad (2.239)$$


For the particular case of a free particle, for which Eq. (216) is exact, this expression is reduced to the 
original (and constant) mass m, but generally, the effective mass depends on the wave packet’s 


68 The proof of Eq. (237) is not difficult, but becomes more compact in the bra-ket formalism, to be discussed in
Chapter 4. This is why I recommend that the reader prove it as an exercise after reading that chapter. For a
generalization of this theory to the case of essential interband transitions see, e.g., Sec. 55 in E. Lifshitz and L.
Pitaevskii, Statistical Physics, Part 2, Pergamon, 1980.
momentum. According to Eq. (239), at the bottom of any energy band, m_ef is always positive but
depends on the strength of the particle's interaction with the periodic potential. In particular, according
to Eq. (206), in the tight-binding limit, the effective mass is very large:

$$m_{ef}\big|_{\text{band bottom}} = \frac{\hbar^2}{2|\delta_n|a^2} = \frac{E^{(1)}}{\pi^2|\delta_n|}\,m \gg m. \qquad (2.240)$$


On the contrary, in the weak-potential limit, the effective mass is close to m at most points of each
energy band, but at the edges of the (narrow) bandgaps, it is much smaller. Indeed, expanding Eq. (224)
in the Taylor series near the point q = q_m, we get

$$E_\pm \approx E^{(n)} \pm \left[|U_n| + \frac{(\hbar v)^2}{2|U_n|}\tilde q^2\right], \qquad (2.241)$$

where v and q̃ are defined by Eq. (225), so that

$$|m_{ef}| = \frac{|U_n|}{v^2} = \frac{|U_n|}{2E^{(n)}}\,m \ll m. \qquad (2.242)$$
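Eqs. (240) and (242) can be cross-checked by differentiating the band shapes numerically, i.e. by applying Eq. (239) directly. In the sketch below, the units ħ = m = a = 1 and the sample values of δ_n and U_n are my own illustrative assumptions, and the weak-potential branch is taken from Eq. (224) with the exact free branches (216) for l = n, l' = 0 (so that q_m = πn/a), without the linearization (225). The function names are hypothetical.

```python
import math

HBAR = MASS = PERIOD = 1.0      # illustrative units (assumption): hbar = m = a = 1


def effective_mass(energy, q, h=1e-4):
    """Eq. (239): m_ef = hbar^2 / (d^2E/dq^2), by a central finite difference."""
    d2e = (energy(q + h) - 2.0 * energy(q) + energy(q - h)) / h ** 2
    return HBAR ** 2 / d2e


def tight_binding_band(delta):
    """Eq. (206), with E_n taken as the energy origin."""
    return lambda q: -2.0 * delta * math.cos(q * PERIOD)


def weak_potential_upper_branch(n, u_n):
    """Upper sign of Eq. (224) for the branches l = n and l' = 0 of Eq. (216)."""
    def energy(q):
        e_l = HBAR ** 2 / (2 * MASS) * (q - 2 * math.pi * n / PERIOD) ** 2
        e_lp = HBAR ** 2 / (2 * MASS) * q ** 2
        return 0.5 * (e_l + e_lp) + math.sqrt((0.5 * (e_l - e_lp)) ** 2 + abs(u_n) ** 2)
    return energy


if __name__ == "__main__":
    delta, u1 = 0.05, 0.1
    print("tight binding, band bottom:", effective_mass(tight_binding_band(delta), 0.0),
          "  Eq. (240):", HBAR ** 2 / (2 * delta * PERIOD ** 2))
    print("tight binding, band top:   ", effective_mass(tight_binding_band(delta), math.pi / PERIOD))
    v = HBAR * math.pi / (MASS * PERIOD)          # Eq. (225) with n = 1
    print("weak potential, gap edge:  ",
          effective_mass(weak_potential_upper_branch(1, u1), math.pi / PERIOD),
          "  Eq. (242):", u1 / v ** 2)
```

The tight-binding value is reproduced essentially exactly and changes sign at the band top; the weak-potential value agrees with Eq. (242) to about 1%, the residual difference coming from the slow q̃-dependence of (E_l + E_{l'})/2, neglected in Eq. (241).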


The effective mass effects in real atomic crystals may be very significant. For example, in silicon, the
effective mass of electrons in the lowest, normally-empty energy band (traditionally called the conduction
band) is anisotropic, with m_ef ≈ 0.19 m_e and m_ef ≈ 0.98 m_e for motion in different crystallographic
directions. In some semiconducting compounds, the conduction-band mass may be even smaller, down to
0.0145 m_e in InSb!


However, the effective mass's magnitude is not the most surprising effect. A more fascinating
corollary of Eq. (239) is that at the top of each energy band the effective mass is negative (please
revisit Figs. 26b, 28, and 29). This means that the particle (or, more strictly, its wave packet's
envelope) is accelerated in the direction opposite to the applied force. This is exactly what electronic
engineers, working with electrons in semiconductors, call holes, characterizing them by a positive mass
|m_ef|, but compensating this sign change by taking their charge e positive. If the particle stays in close
vicinity of the energy band's top (say, due to frequent scattering effects, typical for the semiconductors
used in engineering practice), such a double sign flip does not lead to an error in calculations of the hole's
dynamics, because the electric field's force is proportional to the particle's charge, so that the particle's
acceleration dv_gr/dt is proportional to the charge-to-mass ratio.


However, in some phenomena such a simple representation is unacceptable.⁷⁰ For example, let us
form a narrow wave packet at the bottom of the lowest energy band,⁷¹ and then exert on it a constant
force F > 0, say, due to a constant external electric field directed along the x-axis. According to Eq.
(237), this force would lead to linear growth of q_0 in time, so that in the quasimomentum space, the


69 More discussion of this issue may be found in SM Sec. 6.4. 

70 The balance of this section describes effects that are not discussed in most quantum mechanics textbooks. 
Though, in my opinion, every educated physicist should be aware of them, some readers may skip them at the 
first reading, jumping directly to the next Sec. 9. 

71 Physical intuition tells us (and the theory of open systems, to be discussed in Chapter 7, confirms) that this may
be readily done, for example, by weakly coupling the system to a relatively low-temperature environment, and
letting it relax to the lowest possible energy.
packet's center would slide, with a constant speed, along the q axis (see Fig. 33a). Close to the energy
band's bottom, this motion would correspond to a positive effective mass (possibly, somewhat different
from the genuine particle's mass m), and hence to an acceleration close to that of a free particle. However,
as soon as q_0 has reached the inflection point where d²E_n/dq² = 0, the effective mass, and hence the
acceleration (238), change sign to negative, i.e. the packet starts to slow down (in the direct space),
while still moving ahead with the same velocity in the quasimomentum space. Finally, at the energy
band's top, the particle stops at a certain x_max, while continuing to move forward in the q-space.


[Figure 2.33 (graphics not reproduced); panel (b) marks the Bloch oscillation swing $\Delta x_{max} = \Delta E_n/F$.]


Fig. 2.33. The Bloch oscillations (red lines) and the Landau-Zener tunneling (blue arrows) represented in: (a) the reciprocal space of $q$, and (b) the direct space. On panel (b), the tilted gray strips show the allowed energy bands, while the bold red lines show the Wannier-Stark ladder’s steps.


Now we have two alternative ways to look at the further time evolution of the wave packet along the quasimomentum’s axis. From the extended zone picture (which is the simplest for this analysis, see Fig. 33a),⁷² we may say that the particle crosses the 1st Brillouin zone’s boundary and continues to go forward in the $q$-space, i.e. down the lowest energy band. According to Eq. (235), this region (up to the next energy minimum at $q_0a = 2\pi$) corresponds to a negative group velocity. After $q_0$ has reached that minimum, the whole process repeats again, and again, and again.


These are the famous Bloch oscillations, the effect that had been predicted by the same F. Bloch as early as 1929, but evaded experimental observation until the 1980s (see below) due to the strong scattering effects in real solid-state crystals. The time period of the oscillations may be readily found from Eq. (237):

$$\Delta t_B = \frac{\Delta q}{dq_0/dt} = \frac{2\pi/a}{F/\hbar} = \frac{2\pi\hbar}{Fa}, \tag{2.243}$$


so that their frequency may be expressed by a very simple formula 


72 This phenomenon may also be discussed from the point of view of the reduced zone picture, but then it requires the introduction of instant jumps between the Brillouin zone boundary points (see the dashed red line in Fig. 33) that correspond to physically equivalent states of the particle. Evidently, for the description of this particular phenomenon, this language is more artificial.


Chapter 2 Page 59 of 76 


Essential Graduate Physics QM: Quantum Mechanics 


$$\omega_B \equiv \frac{2\pi}{\Delta t_B} = \frac{Fa}{\hbar}, \tag{2.244}$$


and hence is independent of any peculiarities of the energy band/gap structure. 


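The direct-space motion of the wave packet’s center $x_0(t)$ during the Bloch oscillation process may be analyzed by integrating the first of Eqs. (235) over some time interval $\Delta t$, and using Eq. (237):

$$\Delta x_0(t) = \int_0^{\Delta t} v_{gr}\,dt = \frac{1}{\hbar}\int_0^{\Delta t}\frac{dE(q_0)}{dq_0}\,dt = \frac{1}{F}\int_{q_0(0)}^{q_0(\Delta t)}\frac{dE(q_0)}{dq_0}\,dq_0 = \frac{1}{F}\,E(q_0)\Big|_{t=0}^{t=\Delta t}. \tag{2.245}$$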
If the interval $\Delta t$ is equal to the Bloch oscillation period $\Delta t_B$ (243), the initial and final values of $E(q_0) = \hbar\omega(q_0)$ are equal, giving $\Delta x_0 = 0$: at the end of the period, the wave packet returns to its initial position in space. However, if we carry out this integration only from the smallest to the largest values of $\omega(q_0)$, i.e. between the adjacent points where the group velocity vanishes, we get the following Bloch oscillation swing:


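$$\Delta x_{max} = \frac{E_{max} - E_{min}}{F} \equiv \frac{\Delta E_n}{F}, \tag{2.246}$$

where $\Delta E_n$ is the width of the allowed energy band.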


This simple result may be interpreted using an alternative energy diagram (Fig. 33b), which results from the following arguments. The additional force $F$ may be described not only via the 2nd Newton law’s version (237), but, alternatively, by its contribution $-Fx$ to the Gibbs potential energy⁷³

$$U_{ef}(x) = U(x) - Fx. \tag{2.247}$$


The exact solution of the Schrödinger equation (61) with such a potential may be hard to find directly, but if the force $F$ is sufficiently weak, as we are assuming throughout this discussion, the second term in Eq. (247) may be considered as a constant on the scale of $a \ll \Delta x_{max}$. In this case, our quantum-mechanical treatment of the periodic potential $U(x)$ is still virtually correct, but with an energy shift depending on the “global” position $x_0$ of the packet’s center. In this approximation, the total energy of the wave packet is

$$E_0 = E(q_0) - Fx_0. \tag{2.248}$$


In a plot of such energy as a function of $x_0$ (Fig. 33b), the energy dependence on $q_0$ is hidden, but as was discussed above, it is rather uneventful and may be well characterized by the position of the band-gap edges on the energy axis.⁷⁴ In this representation, the Bloch oscillations keep the full energy $E_0$ of the particle constant, i.e. follow a horizontal line in Fig. 33b, limited by the classical turning points corresponding to the bottom and the top of the allowed energy band. The distance $\Delta x_{max}$ between these points is evidently given by Eq. (246).
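Eqs. (243), (245), and (246) can be checked with a short semiclassical calculation. The sketch below assumes a hypothetical cosine-shaped band $E(q) = -(\Delta E_n/2)\cos qa$ (a tight-binding-like form chosen only for illustration) and units with $\hbar = a = 1$. It integrates the group velocity over time while $q_0$ grows according to Eq. (237), then compares the swing and the return to the starting point with Eqs. (246) and (245).

```python
import numpy as np

# Illustrative units and parameters (assumptions, not taken from the text)
HBAR = 1.0         # reduced Planck constant
A = 1.0            # lattice period a
BAND_WIDTH = 2.0   # Delta E_n, width of the model band
FORCE = 0.1        # constant force F


def energy(q):
    """Hypothetical cosine band: minimum at q = 0, maximum at q = pi/a."""
    return -0.5 * BAND_WIDTH * np.cos(q * A)


def group_velocity(q):
    """v_gr = (1/hbar) dE/dq, the first of Eqs. (235), for the model band."""
    return 0.5 * BAND_WIDTH * A * np.sin(q * A) / HBAR


def simulate(t_max, n_steps):
    """Return t, q0(t), x0(t) for a packet starting at q0 = 0, x0 = 0."""
    t = np.linspace(0.0, t_max, n_steps + 1)
    q = FORCE * t / HBAR                      # Eq. (237): dq0/dt = F/hbar
    v = group_velocity(q)
    dt = t[1] - t[0]
    # trapezoidal time integration of v_gr gives the packet center x0(t)
    x = np.concatenate(([0.0], np.cumsum(0.5 * (v[1:] + v[:-1]) * dt)))
    return t, q, x


period = 2 * np.pi * HBAR / (FORCE * A)       # Eq. (243)
t, q, x = simulate(3 * period, 30_000)        # three Bloch periods
swing = x.max() - x.min()
x_eq245 = (energy(q) - energy(q[0])) / FORCE  # closed form from Eq. (245)

print(f"period = {period:.3f}, swing = {swing:.4f}, Eq. (246): {BAND_WIDTH / FORCE:.4f}")
print(f"x0 after one period: {x[10_000]:.2e}")  # index 10_000 is t = period
print(f"max |x0 - Eq. (245)|: {np.max(np.abs(x - x_eq245)):.2e}")
```

The packet oscillates in real space with amplitude $\Delta E_n/F$ while $q_0$ grows without bound, which is the behavior sketched in Fig. 33.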


Besides this alternative look at the Bloch oscillation swing, the total energy diagram shown in Fig. 33b enables one more remarkable result. Let a wave packet be so narrow in the momentum space that $\delta x \sim 1/\delta q \gg \Delta x_{max}$; then it may be well represented by a definite energy, i.e. by a horizontal line in Fig. 33b. But the potential (247), and hence the corresponding Schrödinger equation, is exactly invariant with respect to the following simultaneous translation of the coordinate and the energy:

$$x \to x + a, \qquad E \to E - Fa. \tag{2.249}$$

This means that the equation is satisfied by an infinite set of similar solutions, each corresponding to one of the horizontal red lines shown in Fig. 33b. This is the famous Wannier-Stark ladder,⁷⁵ with the step height

$$\Delta E_{WS} = Fa. \tag{2.250}$$

73 Physically, this is just the relevant part of the potential energy of the total system comprised of our particle (in the periodic potential) and the source of the force $F$; see, e.g., CM Sec. 1.4.

74 In semiconductor physics and engineering, such spatial band-edge diagrams are virtually unavoidable components of almost every discussion/publication. In this series, a few more examples of such diagrams may be found in SM Sec. 6.4.


The importance of this alternative representation of the Bloch oscillations is due to the following fact. In most experimental realizations, the power of the electromagnetic radiation with frequency (244) that may be extracted from the oscillations of a charged particle is very low, so that its direct detection represents a hard problem.⁷⁶ However, let us apply to a Bloch oscillator an additional ac field at a frequency $\omega \approx \omega_B$. As these frequencies are brought close together, the external signal should synchronize (“phase-lock”) the Bloch oscillations,⁷⁷ resulting in certain changes of time-independent observables, for example, a resonant change of absorption of the external radiation. Now let us notice that the combination of Eqs. (244) and (250) yields the following simple relation:

$$\Delta E_{WS} = \hbar\omega_B. \tag{2.251}$$


This means that the phase locking at $\omega = \omega_B$ allows for an alternative (but equivalent) interpretation: as the result of ac-field-induced quantum transitions⁷⁸ between the steps of the Wannier-Stark ladder. (Again, such occasions, when two very different languages may be used for alternative interpretations of the same effect, are among the most beautiful features of physics.)
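The equidistant ladder (250) can also be seen numerically. The sketch below relies on assumptions the text does not make: a hypothetical nearest-neighbor tight-binding chain (hopping $J$, band width $4J$), with the potential $-Fx$ of Eq. (247) added at the sites $x = ja$. It diagonalizes the resulting Hamiltonian matrix and checks that the eigenvalues of states localized far from the chain’s ends are spaced by exactly $Fa$.

```python
import numpy as np


def wannier_stark_levels(n_sites=201, hopping=1.0, force=0.2, a=1.0):
    """Eigenvalues of a hypothetical tight-binding chain in a uniform force field."""
    j = np.arange(n_sites)
    h = np.diag(-force * a * j)                 # the -F x term of Eq. (247), with x = j a
    off = -hopping * np.ones(n_sites - 1)
    h += np.diag(off, 1) + np.diag(off, -1)     # nearest-neighbor hopping
    return np.linalg.eigvalsh(h)                # sorted in ascending order


levels = wannier_stark_levels()
middle = levels[50:151]       # states centered roughly 50 or more sites away from the ends
steps = np.diff(middle)
print(f"ladder steps: min {steps.min():.10f}, max {steps.max():.10f}")  # both equal F*a
```

Only the states near the chain’s ends deviate from the ladder; away from them, the spectrum is equidistant regardless of the band’s shape, in line with Eq. (250).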


This phase-locking effect has been used for the first experimental confirmations of the Bloch oscillation theory.⁷⁹ For this purpose, the natural periodic structures, solid-state crystals, are inconvenient due to their very small period $a \sim 10^{-10}$ m. Indeed, according to Eq. (244), such structures require very high forces $F$ (and hence very high electric fields $\mathcal{E} = F/e$) to bring $\omega_B$ to an experimentally convenient range. This problem has been overcome using artificial periodic structures (superlattices) of certain semiconductor compounds, such as Ga$_{1-x}$Al$_x$As with various degrees $x$ of the gallium-to-aluminum atom replacement, whose layers may be grown over each other epitaxially, i.e. with very few crystal structure violations. Such superlattices, with periods $a \sim 10$ nm, have enabled a clear observation of the resonance at $\omega \approx \omega_B$, and hence a measurement of the Bloch oscillation frequency, in particular its proportionality to the applied dc electric field, predicted by Eq. (244).


75 This effect was first discussed in detail by Gregory Hugh Wannier in his 1959 monograph on solid-state
physics, while the name of Johannes Stark is traditionally associated with virtually any electric field effect on 
atomic systems, after he had discovered the first of such effects in 1913. 

76 In systems with many independent particles (such as electrons in semiconductors), the detection problem is 
exacerbated by the phase incoherence of the Bloch oscillations performed by each particle. This drawback is 
absent in atomic Bose-Einstein condensates whose Bloch oscillations (in a periodic potential created by standing 
optical waves) were eventually observed by M. Ben Dahan et al., Phys. Rev. Lett. 76, 4508 (1996). 

77 A simple analysis of phase locking of a classical oscillator may be found, e.g., in CM Sec. 5.4. (See also the 
brief discussion of the phase locking of the Josephson oscillations at the end of Sec. 1.6 of this course.) 

78 A quantitative theory of such transitions will be discussed in Sec. 6.6 and then in Chapter 7. 

79 E. Mendez et al., Phys. Rev. Lett. 60, 2426 (1988).
Very soon after this discovery, the Bloch oscillations were observed⁸⁰ in small Josephson junctions, where they result from the quantum dynamics of the Josephson phase difference $\varphi$ in a $2\pi$-periodic potential profile, created by the junction. A straightforward translation of Eq. (244) to this case (left for the reader’s exercise) shows that the frequency of such Bloch oscillations is

$$\omega_B = \frac{\pi I}{e}, \quad \text{i.e.} \quad f_B \equiv \frac{\omega_B}{2\pi} = \frac{I}{2e}, \tag{2.252}$$


where $I$ is the dc current passed through the junction; this effect should not be confused with the “classical” Josephson oscillations with frequency (1.75). It is curious that Eq. (252) may be legitimately interpreted as the result of a periodic transfer, through the Josephson junction, of discrete Cooper pairs (of charge $-2e$) between two coherent Bose-Einstein condensates in the superconducting electrodes of the junction.⁸¹
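Eq. (252) gives a handy rule of thumb. The short sketch below evaluates it for a hypothetical dc current of 1 nA, using the exact SI value of the elementary charge.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C (exact SI value)


def josephson_bloch_frequency(current_amps):
    """Return (omega_B, f_B) from Eq. (2.252): omega_B = pi I / e, f_B = I / 2e."""
    omega_b = math.pi * current_amps / E_CHARGE
    return omega_b, omega_b / (2 * math.pi)


omega_b, f_b = josephson_bloch_frequency(1e-9)   # hypothetical 1 nA dc current
print(f"f_B = {f_b / 1e9:.2f} GHz")              # about 3.12 GHz
```

So even nanoampere-scale currents put the Josephson Bloch oscillations in the microwave range; each period corresponds to the transfer of one Cooper pair.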


Next, our discussion of the Bloch oscillations was based on the premise that the wave packet of the particle stays within one (say, the lowest) energy band. However, just one look at Fig. 28 shows that this assumption becomes unrealistic if the energy gap separating this band from the next one becomes very small, $\Delta_n \to 0$. Indeed, in the weak-potential approximation, which is adequate in this limit, $|U_n| \to 0$, the two dispersion curve branches (216) cross without any interaction, so that if our particle (meaning its wave packet) is driven to approach that point, it should continue to move up in energy; see the dashed blue arrow in Fig. 33a. Similarly, in the real-space representation shown in Fig. 33b, it is intuitively clear that at $\Delta_n \to 0$, the particle residing at one of the steps of the Wannier-Stark ladder should be able to somehow overcome the vanishing spatial gap $\Delta x_0 = \Delta_n/F$ and to “leak” into the next band; see the horizontal dashed blue arrow on that panel.


This process, called the Landau-Zener (or “interband”, or “band-to-band”) tunneling,⁸² is indeed possible. To analyze it, let us first take $F = 0$, and consider what happens if a quantum particle, described by an $x$-long (i.e. $E$-narrow) wave packet, is incident from free space upon a periodic structure of a large but finite length $l = Na \gg a$ (see, e.g., Fig. 22). If the packet’s energy $E$ is within one of the energy bands, it may evidently propagate through the structure (though it may be partly reflected from its ends). The corresponding quasimomentum may be found by solving the dispersion relation for $q$; for example, in the weak-potential limit, Eq. (224) (which is valid near the gap) yields

$$q = q_n + \tilde q, \quad \text{with} \quad \tilde q = \pm\frac{1}{\gamma}\left[\tilde E^2 - |U_n|^2\right]^{1/2}, \quad \text{for } |U_n| < |\tilde E|, \tag{2.253}$$

where $\tilde E \equiv E - E_n$ and $\gamma = 2aE_n/\pi n$; see the second of Eqs. (225).


Now, if the energy $E$ is inside one of the energy gaps $\Delta_n$, the wave packet’s propagation in an infinite periodic lattice is impossible, so that it is completely reflected from it. However, our analysis of the potential step problem in Sec. 3 implies that the packet’s wavefunction should still have an exponential tail protruding into the structure and decaying on some length $\delta$; see Eq. (58) and Fig. 2.4.


80 D. Haviland et al., Z. Phys. B 85, 339 (1991).

81 See, e.g., D. Averin et al., Sov. Phys. JETP 61, 407 (1985). This effect is qualitatively similar to the transfer of single electrons, with a similar frequency $f = I/e$, in tunnel junctions between normal (non-superconducting) metals; see, e.g., EM Sec. 2.9 and references therein.

82 It was predicted independently by L. Landau, C. Zener, E. Stueckelberg, and E. Majorana in 1932. 
Indeed, a straightforward review of the calculations leading to Eq. (253) shows that it remains valid for energies within the gap as well, if the quasimomentum is understood as a purely imaginary number:

$$\tilde q \to \pm i\kappa, \quad \text{where} \quad \kappa = \frac{1}{\gamma}\left[|U_n|^2 - \tilde E^2\right]^{1/2}, \quad \text{for } \tilde E^2 < |U_n|^2. \tag{2.254}$$

With this replacement, the Bloch solution (193b) indeed describes an exponential decay of the wavefunction at length $\delta \sim 1/\kappa$.
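A small helper makes the continuation from Eq. (253) to Eq. (254) explicit. The same square-root expression gives a real $\tilde q$ inside a band and an imaginary one, $i\kappa$, inside the gap. This is only an illustrative sketch. The parameter values are arbitrary, and NumPy’s `np.emath.sqrt`, which returns complex results for negative arguments, picks only the upper sign of the $\pm$.

```python
import numpy as np


def q_tilde(e_tilde, u_abs, gamma):
    """Quasimomentum offset q - q_n near the n-th gap, Eqs. (2.253)-(2.254).

    Real outside the gap (|E~| > |U_n|), purely imaginary (i*kappa) inside it.
    Only the '+' root of the +- in those equations is returned."""
    return np.emath.sqrt(e_tilde**2 - u_abs**2) / gamma


u_abs, gamma = 1.0, 2.0   # illustrative values in arbitrary units
for e_tilde in (-2.0, -0.5, 0.0, 0.5, 2.0):
    q = q_tilde(e_tilde, u_abs, gamma)
    if np.iscomplexobj(q):
        print(f"E~ = {e_tilde:+.1f}: in the gap, kappa = {q.imag:.3f}, delta ~ {1 / q.imag:.2f}")
    else:
        print(f"E~ = {e_tilde:+.1f}: in a band, q~ = {q:.3f}")
```

The decay constant $\kappa$ peaks at mid-gap ($\tilde E = 0$), where it equals $|U_n|/\gamma$, and vanishes at the gap edges, where the tail length $\delta$ diverges.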


Returning to the effects of weak force $F$, in the real-space approach described by Eq. (248) and illustrated in Fig. 33b, we may recast Eq. (254) as

$$\kappa(x) = \frac{1}{\gamma}\left[|U_n|^2 - (Fx)^2\right]^{1/2}, \tag{2.255}$$

where $x$ is the particle’s (i.e. its wave packet center’s) deviation from the mid-gap point. Thus the gap creates a potential barrier of a finite width $\Delta x_0 = 2|U_n|/F$, through which the wave packet may tunnel with a non-zero probability. As we already know, in the WKB approximation (in our case requiring $\kappa\Delta x_0 \gg 1$) this probability is just the potential barrier’s transparency $\mathcal{T}$, which may be calculated from Eq. (117):


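$$-\ln\mathcal{T} = 2\int_{\kappa^2(x)>0}\kappa(x)\,dx = \frac{2}{\gamma}\int_{-x_c}^{+x_c}\left[|U_n|^2 - F^2x^2\right]^{1/2}dx = \frac{4|U_n|^2}{\gamma F}\int_0^1\left(1 - \xi^2\right)^{1/2}d\xi, \tag{2.256}$$

where $\pm x_c = \pm\Delta x_0/2 = \pm|U_n|/F$ are the classical turning points. Working out this simple integral (or just noticing that it is a quarter of the unit circle’s area, and hence is equal to $\pi/4$), we get

$$\mathcal{T} = \exp\left\{-\frac{\pi|U_n|^2}{\gamma F}\right\}. \tag{2.257}$$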
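The quarter-circle shortcut in Eq. (256) is easy to confirm numerically. The sketch below uses arbitrary illustrative values of $|U_n|$, $\gamma$, and $F$. It evaluates $2\int\kappa(x)\,dx$ over the barrier with the midpoint rule, which avoids evaluating the square root exactly at the turning points, and compares the result with the exponent $\pi|U_n|^2/\gamma F$ of Eq. (257).

```python
import numpy as np


def minus_log_transparency(u_abs, gamma, force, n=1_000_000):
    """-ln T of Eq. (2.256), by the midpoint rule over -x_c < x < +x_c."""
    x_c = u_abs / force                                    # turning points +-|U_n|/F
    dx = 2 * x_c / n
    x = -x_c + (np.arange(n) + 0.5) * dx                   # midpoints of the subintervals
    kappa = np.sqrt(u_abs**2 - (force * x) ** 2) / gamma   # Eq. (2.255)
    return 2 * kappa.sum() * dx


u_abs, gamma, force = 0.3, 1.5, 0.02                       # arbitrary illustrative values
numeric = minus_log_transparency(u_abs, gamma, force)
closed_form = np.pi * u_abs**2 / (gamma * force)           # exponent of Eq. (2.257)
print(f"numeric {numeric:.8f}, closed form {closed_form:.8f}")
```

This checks only the integral; the applicability of the WKB formula itself still requires $\kappa\Delta x_0 \gg 1$.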


This famous result may also be obtained in a more complex way, whose advantage is a constructive proof that Eq. (257) is valid for an arbitrary relation between $\gamma F$ and $|U_n|^2$, i.e. for arbitrary $\mathcal{T}$, while our simple derivation was limited to the WKB approximation, valid only at $\mathcal{T} \ll 1$.⁸³ Using Eq. (225), we may rewrite the product $\gamma F$ participating in Eq. (257) as


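$$\gamma F = \frac{1}{2}\frac{d(E_+ - E_-)}{dq_0}\bigg|_{E_+ = E_- = E_n}\hbar\frac{dq_0}{dt} = \frac{\hbar}{2}\frac{d(E_+ - E_-)}{dt}\bigg|_{E_+ = E_- = E_n} \equiv \frac{\hbar u}{2}, \tag{2.258}$$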


where $u$ has the meaning of the “speed” of the energy level crossing in the absence of the gap. Hence, Eq. (257) may be rewritten in the form

$$\mathcal{T} = \exp\left\{-\frac{2\pi|U_n|^2}{\hbar u}\right\} \equiv \exp\left\{-\frac{\pi\Delta_n^2}{2\hbar u}\right\}, \tag{2.259}$$

which is more transparent physically. Indeed, the fraction $2|U_n|/u = \Delta_n/u$ gives the time scale $\Delta t$ of the energy’s crossing the gap region, and according to the Fourier transform, its reciprocal, $\omega_{max} \sim 1/\Delta t$, gives the upper cutoff of the frequencies essentially involved in the Bloch oscillation process. Hence Eq. (259) means that

$$-\ln\mathcal{T} \sim \frac{\Delta_n}{\hbar\omega_{max}}. \tag{2.260}$$

83 In Chapter 6 below, Eq. (257) will be derived using a different method, based on the so-called Golden Rule of quantum mechanics, but also in the weak-potential limit, i.e. for the hyperbolic dispersion law (253).


This formula allows us to interpret the Landau-Zener tunneling as the system’s excitation across the energy gap $\Delta_n$ by the highest-energy quantum $\hbar\omega_{max}$ available from the Bloch oscillation process. This interpretation remains valid even in the opposite, tight-binding limit, in which, according to Eqs. (206) and (237), the Bloch oscillations are purely sinusoidal, so that the Landau-Zener tunneling is completely suppressed at $\hbar\omega_B < \Delta_n$.
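The statement that Eq. (259) holds for arbitrary $\mathcal{T}$, and not only in the WKB limit, can be probed numerically. The sketch below is not the derivation the text refers to. It assumes the standard two-level model of a linear level crossing: diabatic energies $\pm ut/2$, which play the role of the two gapless branches (216), coupled by a constant $|U_n|$, in units with $\hbar = 1$. The time-dependent Schrödinger equation is integrated with the exponential midpoint rule (the exact exponential of the $2\times2$ Hamiltonian, frozen at each step’s midpoint). The probability of staying on the initial branch, which corresponds to tunneling into the next band, is then compared with Eq. (259).

```python
import math

HBAR = 1.0  # illustrative units


def landau_zener_probability(u_abs, rate, t_max=100.0, dt=0.01):
    """Probability of staying on the initial diabatic branch after one sweep.

    Hypothetical model: H(t) = hz*sigma_z + hx*sigma_x, with hz = rate*t/2 and
    hx = u_abs, so that u = d(E_+ - E_-)/dt = rate and the minimum gap is 2*u_abs.
    """
    c1, c2 = 1.0 + 0j, 0.0 + 0j            # start on branch 1, long before the crossing
    for step in range(int(round(2 * t_max / dt))):
        t_mid = -t_max + (step + 0.5) * dt
        hz, hx = 0.5 * rate * t_mid, u_abs
        norm = math.hypot(hz, hx)
        c = math.cos(norm * dt / HBAR)
        s = math.sin(norm * dt / HBAR) / norm
        # exact exp(-i H dt / hbar) = cos(|h| dt) I - i sin(|h| dt) H / |h|
        c1, c2 = ((c - 1j * s * hz) * c1 - 1j * s * hx * c2,
                  -1j * s * hx * c1 + (c + 1j * s * hz) * c2)
    return abs(c1) ** 2


for u_abs in (0.2, 0.5):
    simulated = landau_zener_probability(u_abs, rate=1.0)
    formula = math.exp(-2 * math.pi * u_abs**2 / (HBAR * 1.0))   # Eq. (2.259)
    print(f"|U_n| = {u_abs}: simulated {simulated:.3f}, Eq. (2.259) gives {formula:.3f}")
```

Both cases lie well outside the $\mathcal{T} \ll 1$ regime, yet the simulated probabilities should agree with Eq. (259) up to small finite-time corrections of order $|U_n|/(u\,t_{max})$.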


Interband tunneling is an important ingredient of several physical phenomena and even some practical electron devices, for example, the tunneling (or “Esaki”) diodes. This simple device is just a junction of two semiconductor electrodes, one of them so strongly n-doped by electron donors that some electrons form a degenerate Fermi gas at the bottom of the conduction band.⁸⁴ Similarly, the counterpart semiconductor electrode is p-doped so strongly that the Fermi level in the valence band is shifted below the band edge; see Fig. 34.




Fig. 2.34. The tunneling (“Esaki”) diode: (a) the band-edge diagram of the device at zero bias; (b) the same diagram at a modest positive bias $eV \approx \Delta/2$, and (c) the $I$-$V$ curve of the device (schematically). Dashed lines on panels (a) and (b) show the Fermi level positions.


In thermal equilibrium, and in the absence of external voltage bias, the Fermi levels of the two electrodes self-align, leading to the build-up of the contact potential difference $\phi/e$, with $\phi$ a bit larger than the energy bandgap $\Delta$ (see Fig. 34a). This potential difference creates an internal electric field that tilts the energy bands (just as the external field did in Fig. 33b), and leads to the formation of the so-called depletion layer, in which the Fermi level is located within the energy gap, and hence there are no charge carriers ready to move. In the usual p-n junctions, this layer is broad and prevents any current at applied voltages $V$ lower than $\sim\Delta/e$. In contrast, in a tunneling diode the depletion layer is so thin (below $\sim$10 nm) that interband tunneling is possible and provides a substantial Ohmic current at small applied voltages (see Fig. 34c). However, at larger positive biases, with $eV \approx \Delta/2$, the conduction band is aligned with the middle of the energy gap in the p-doped electrode, and electrons cannot tunnel there. Similarly, there are no electrons in the n-doped semiconductor to tunnel into the available states just above the Fermi level in the p-doped electrode (see Fig. 34b). As a result, at such voltages the current drops significantly, to grow again only when $eV$ exceeds $\sim\Delta$, enabling electron motion within each energy band. Thus the tunnel junction’s $I$-$V$ curve has a part with a negative differential resistance ($dV/dI < 0$); see Fig. 34c. This phenomenon, equivalent in its effect to negative kinematic friction in mechanics, may be used for the amplification of weak analog signals, for the self-excitation of electronic oscillators⁸⁵ (i.e. ac signal generation), and for signal swing restoration in digital electronics.

84 Here I have to rely on the reader’s background knowledge of basic semiconductor physics; it will be discussed in more detail in SM Sec. 6.4.


2.9. Harmonic oscillator: Brute force approach 
To complete our review of the basic 1D wave mechanics, we have to consider the famous harmonic oscillator, i.e. a 1D particle moving in the quadratic-parabolic potential (111), so that the stationary Schrödinger equation (53) is

$$-\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + \frac{m\omega_0^2x^2}{2}\psi = E\psi. \tag{2.261}$$

Conceptually, against the background of the fascinating quantum effects discussed in the previous sections, this is not a very interesting system: Eq. (261) is a standard 1D eigenproblem, resulting in a discrete energy spectrum $E_n$, with smooth eigenfunctions $\psi_n(x)$ vanishing at $x \to \pm\infty$ (because the potential energy tends to infinity there).⁸⁶ However, as we will repeatedly see later in the course, this problem’s solutions have an enormous range of applications, so we have to know their basic properties.


The direct analytical solution of the problem is not very simple (see below), so let us start by 
trying some indirect approaches to it. First, as was discussed in Sec. 4, the WKB-approximation-based 
Wilson-Sommerfeld quantization rule (110), applied to this potential, yields the eigenenergy spectrum 
(114). With the common quantum number convention, this result is 


$$E_n = \hbar\omega_0\left(n + \frac{1}{2}\right), \quad \text{with } n = 0, 1, 2, \ldots, \tag{2.262}$$


so that (in contrast to the 1D rectangular potential well) the ground-state energy corresponds to $n = 0$. However, as was discussed at the end of Sec. 4, for the quadratic potential (111) the WKB approximation’s conditions are strictly satisfied only at $E_n \gg \hbar\omega_0$, so that so far we can only trust Eq. (262) for high levels, with $n \gg 1$, rather than for the (most important) ground state.


Let me therefore use Eq. (261) to demonstrate another approximate approach, called the variational method, whose simplest form is aimed at finding ground states. The method is based on the following observation. (Here I am presenting its 1D wave mechanics form, though the method is much more general.) Let $\psi_n$ be the exact, full, and orthonormal set of stationary wavefunctions of the system under study, and $E_n$ the set of the corresponding energy levels, satisfying Eq. (1.60):

$$\hat H\psi_n = E_n\psi_n. \tag{2.263}$$

Then we may use this set for the unique expansion of an arbitrary trial wavefunction $\psi_{trial}$:


85 See, e.g., CM Sec. 5.4. 

86 The stationary states of the harmonic oscillator (which, as will be discussed in Secs. 5.4 and 7.1, may be considered as the states with a definite number of identical bosonic excitations) are sometimes called Fock states, after Vladimir Aleksandrovich Fock. (This term is also used in a more general sense, for definite-particle-number states of systems with indistinguishable bosons of any kind; see Sec. 8.3.)
$$\psi_{trial} = \sum_n \alpha_n\psi_n, \quad \text{so that} \quad \psi^*_{trial} = \sum_n \alpha^*_n\psi^*_n, \tag{2.264}$$


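where $\alpha_n$ are some (generally, complex) coefficients. Let us require the trial function to be normalized, using the condition (1.66) of orthonormality of the eigenfunctions $\psi_n$:

$$\int\psi^*_{trial}\psi_{trial}\,dx = \sum_{n,n'}\int\alpha^*_n\psi^*_n\alpha_{n'}\psi_{n'}\,dx = \sum_{n,n'}\alpha^*_n\alpha_{n'}\int\psi^*_n\psi_{n'}\,dx = \sum_{n,n'}\alpha^*_n\alpha_{n'}\delta_{nn'} = \sum_n W_n = 1, \tag{2.265}$$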


where each of the coefficients $W_n$, defined as

$$W_n \equiv \alpha^*_n\alpha_n = |\alpha_n|^2 \geq 0, \tag{2.266}$$

may be interpreted as the probability for the particle, in the trial state, to be found in the $n$th genuine stationary state. Now let us use Eq. (1.23) for a similar calculation of the expectation value of the system’s Hamiltonian in the trial state:⁸⁷


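$$\langle H\rangle_{trial} = \int\psi^*_{trial}\hat H\psi_{trial}\,dx = \sum_{n,n'}\int\alpha^*_n\psi^*_n\hat H\alpha_{n'}\psi_{n'}\,dx = \sum_{n,n'}\alpha^*_n\alpha_{n'}E_{n'}\int\psi^*_n\psi_{n'}\,dx = \sum_{n,n'}\alpha^*_n\alpha_{n'}E_{n'}\delta_{nn'} = \sum_n W_nE_n. \tag{2.267}$$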


Since the exact ground state energy $E_g$ is, by definition, the lowest one of the set $E_n$, i.e. $E_n \geq E_g$, Eqs. (265) and (267) yield the following inequality:

$$\langle H\rangle_{trial} = \sum_n W_nE_n \geq \sum_n W_nE_g = E_g\sum_n W_n = E_g. \tag{2.268}$$


Thus, the genuine ground state energy of the system is always lower than (or equal to) its energy 
in any trial state. Hence, if we make several attempts with reasonably selected trial wavefunctions, we 
may expect the lowest of the results to approximate the genuine ground state energy reasonably well. 
Even more conveniently, if we select some reasonable class of trial wavefunctions dependent on a free 
parameter $\lambda$, then we may use the necessary condition of the minimum of $\langle H\rangle_{trial}$,

$$\frac{\partial\langle H\rangle_{trial}}{\partial\lambda} = 0, \tag{2.269}$$

to find the closest of them to the genuine ground state. Even better results may be obtained using trial wavefunctions dependent on several parameters. Note, however, that the variational method does not tell us how exactly the trial function should be selected, or how close its final result is to the genuine ground-state function. In this sense, this method has “uncontrollable accuracy”, and differs from both the WKB approximation and the perturbation methods (to be discussed in Chapter 6), for which we have certain accuracy criteria. Because of this drawback, the variational method is typically used as the last resort, though sometimes (as in the example that follows) it works remarkably well.⁸⁸
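The inequality (268) holds for any trial function, however poorly chosen. The sketch below illustrates this under assumptions that are not in the text. It replaces Eq. (261) with a finite-difference matrix in units with $\hbar = m = \omega_0 = 1$ on a hypothetical grid $|x| \leq 8$, with $\psi = 0$ just outside the grid. It then draws a random normalized trial vector and checks the discrete forms of Eqs. (265), (267), and (268).

```python
import numpy as np


def oscillator_matrix(n=400, half_width=8.0):
    """Finite-difference version of Eq. (2.261), units hbar = m = omega_0 = 1."""
    x, dx = np.linspace(-half_width, half_width, n, retstep=True)
    # -(1/2) d^2/dx^2 via the 3-point stencil; psi assumed zero outside the grid
    kinetic = (np.eye(n) - 0.5 * np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)) / dx**2
    return x, kinetic + np.diag(0.5 * x**2)


x, h = oscillator_matrix()
energies, states = np.linalg.eigh(h)     # columns of `states`: orthonormal psi_n
e_ground = energies[0]                   # close to 1/2, cf. Eq. (2.262)

rng = np.random.default_rng(seed=1)
trial = rng.normal(size=x.size)          # a deliberately poor trial function
trial /= np.linalg.norm(trial)           # discrete analog of the normalization (2.265)

alpha = states.T @ trial                 # expansion coefficients of Eq. (2.264)
weights = alpha**2                       # W_n of Eq. (2.266)
h_trial = trial @ h @ trial              # <H>_trial

print(f"sum of W_n = {weights.sum():.12f}")
print(f"<H>_trial = {h_trial:.3f}, sum W_n E_n = {weights @ energies:.3f}, E_g = {e_ground:.5f}")
```

A random vector is a very bad trial function, so $\langle H\rangle_{trial}$ ends up far above $E_g$. The inequality still holds, as Eq. (268) guarantees.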


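87 It is easy (and hence left for the reader) to show that the energy uncertainty $\delta H$ in the trial state (264), given by $(\delta H)^2 = \sum_n W_nE_n^2 - \left(\sum_n W_nE_n\right)^2$, does not vanish unless all the states with nonzero $W_n$ have the same energy, so that $\langle H\rangle_{trial}$ should be interpreted as the expectation value of the energy rather than a definite energy of the state. For our current goals, however, this fact is not important.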

88 The variational method may be used also to estimate the first excited state (or even a few lowest excited states) 
of the system, by requiring the new trial function to be orthogonal to the previously calculated eigenfunctions of 
the lower-energy states. However, the method’s error typically grows with the state number. 
Let us apply this method to the harmonic oscillator. Since the potential (111) is symmetric with respect to the point $x = 0$, and continuous at all points (so that, according to Eq. (261), $d^2\psi/dx^2$ has to be continuous as well), the most natural selection of the ground-state trial function is the Gaussian function

$$\psi_{trial}(x) = C\exp\{-\lambda x^2\}, \tag{2.270}$$

with some real $\lambda > 0$. The normalization coefficient $C$ may be immediately found either from the standard Gaussian integration of $|\psi_{trial}|^2$, or just from the comparison of this expression with Eq. (16), in which $\lambda = 1/(2\delta x)^2$, i.e. $\delta x = 1/2\lambda^{1/2}$, giving $|C|^2 = (2\lambda/\pi)^{1/2}$. Now the expectation value of the particle’s Hamiltonian,


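$$\hat H = \frac{\hat p^2}{2m} + \frac{m\omega_0^2\hat x^2}{2} = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + \frac{m\omega_0^2x^2}{2}, \tag{2.271}$$

in the trial state, may be calculated as

$$\langle H\rangle_{trial} = \int\psi^*_{trial}\left[-\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + \frac{m\omega_0^2x^2}{2}\right]\psi_{trial}\,dx = \left(\frac{2\lambda}{\pi}\right)^{1/2}\left[\frac{\hbar^2\lambda}{m}\int\left(1 - 2\lambda x^2\right)\exp\{-2\lambda x^2\}\,dx + \frac{m\omega_0^2}{2}\int x^2\exp\{-2\lambda x^2\}\,dx\right]. \tag{2.272}$$

Both involved integrals are of the same well-known Gaussian type,⁸⁹ giving

$$\langle H\rangle_{trial} = \frac{\hbar^2\lambda}{2m} + \frac{m\omega_0^2}{8\lambda}. \tag{2.273}$$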


As a function of $\lambda$, this expression has a single minimum at the value $\lambda_{opt}$ that may be found from the requirement (269), giving $\lambda_{opt} = m\omega_0/2\hbar$. The resulting minimum of $\langle H\rangle_{trial}$ is exactly equal to the ground-state energy following from Eq. (262),

$$E_g = \langle H\rangle_{trial}\big|_{\lambda=\lambda_{opt}} = \frac{\hbar\omega_0}{2}. \tag{2.274}$$

Such a coincidence of results of the WKB approximation and the variational method is rather unusual, and implies (though does not strictly prove) that Eq. (274) is exact. As a minimum, this coincidence gives a strong motivation to plug the trial wavefunction (270), with $\lambda = \lambda_{opt}$, i.e.

$$\psi_0(x) = \left(\frac{m\omega_0}{\pi\hbar}\right)^{1/4}\exp\left\{-\frac{m\omega_0x^2}{2\hbar}\right\}, \tag{2.275}$$

and the energy (274), into the Schrödinger equation (261). Such a substitution⁹⁰ shows that the equation is indeed exactly satisfied.
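The variational calculation (270)-(274) can be repeated numerically without doing the Gaussian integrals by hand. In the sketch below (units $\hbar = m = \omega_0 = 1$; the grid and the scanned range of $\lambda$ are arbitrary choices), the integral in Eq. (272) is evaluated as a plain Riemann sum for each $\lambda$, and the minimum is found by scanning. In these units, Eq. (273) reads $\langle H\rangle_{trial} = \lambda/2 + 1/8\lambda$.

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 4001)   # grid wide enough for all scanned widths
dx = x[1] - x[0]


def energy_of_gaussian(lam):
    """<H>_trial of Eq. (2.272) for psi = C exp(-lam x^2), hbar = m = omega_0 = 1."""
    psi = (2 * lam / np.pi) ** 0.25 * np.exp(-lam * x**2)   # normalized, Eq. (2.270)
    d2psi = (4 * lam**2 * x**2 - 2 * lam) * psi             # exact second derivative
    integrand = psi * (-0.5 * d2psi + 0.5 * x**2 * psi)     # psi* H psi
    return integrand.sum() * dx


lams = np.linspace(0.1, 2.0, 1901)
values = np.array([energy_of_gaussian(lam) for lam in lams])
best = lams[values.argmin()]
print(f"lambda_opt ~ {best:.3f}, min <H>_trial ~ {values.min():.6f}")   # expect 0.5 and 0.5
```

The scan reproduces $\lambda_{opt} = m\omega_0/2\hbar$ and the minimum $\hbar\omega_0/2$ of Eq. (274).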


According to Eq. (275), the characteristic scale of the wavefunction’s spatial spread⁹¹ is

$$x_0 \equiv \left(\frac{\hbar}{m\omega_0}\right)^{1/2}. \tag{2.276}$$

89 See, e.g., MA Eqs. (6.9b) and (6.9c).

90 Actually, this is a twist of one of the tasks of Problem 1.12.

91 Quantitatively, as was already mentioned in Sec. 2.1, $x_0 = \sqrt{2}\,\delta x = (2\langle x^2\rangle)^{1/2}$.


Due to the importance of this scale, let us give its crude estimates for several representative systems:⁹²


(i) For atom-bound electrons in solids and fluids, $m \sim 10^{-30}$ kg, and $\omega_0 \sim 10^{15}$ s$^{-1}$, giving $x_0 \sim 0.3$ nm, of the order of the typical inter-atomic distances in condensed matter. As a result, classical mechanics is not valid at all for the analysis of their motion.


(ii) For atoms in solids, $m \sim 10^{-27}$ to $10^{-25}$ kg, and $\omega_0 \sim 10^{13}$ s$^{-1}$, giving $x_0 \sim 0.01$ to $0.1$ nm, i.e. somewhat smaller than inter-atomic distances. Because of that, the methods based on classical mechanics (e.g., molecular dynamics) are approximately valid for the analysis of atomic motion, though they may miss some fine effects exhibited by lighter atoms, e.g., the so-called quantum diffusion of hydrogen atoms, due to their tunneling through the energy barriers of the potential profile created by other atoms.


(iii) Recently, the progress of patterning technologies has enabled the fabrication of high-quality micromechanical oscillators consisting of zillions of atoms. For example, the oscillator used in one of the pioneering experiments in this field⁹³ was a ~1-μm-thick membrane with a 60-μm diameter, and had $m \sim 2\times10^{-14}$ kg and $\omega_0 \sim 3\times10^{10}$ s$^{-1}$, so that $x_0 \sim 4\times10^{-16}$ m. It is remarkable that despite such extreme smallness of $x_0$ (much smaller than not only any atom but even any atomic nucleus!), quantum states of such oscillators may be manipulated and measured, using their coupling to electromagnetic (in particular, optical) resonant cavities.⁹⁴
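The three estimates above are one-line applications of Eq. (276). The sketch below reproduces them. For case (ii) it takes a mid-range mass of $10^{-26}$ kg, which is an assumption within the quoted range; the other values are those given in the text.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s (rounded CODATA value)


def oscillator_scale(mass_kg, omega_per_s):
    """Spatial scale x0 = (hbar / m omega_0)^(1/2) of Eq. (2.276), in meters."""
    return math.sqrt(HBAR / (mass_kg * omega_per_s))


# Representative (m, omega_0) pairs for the estimates (i)-(iii)
cases = {
    "(i) electron in a solid": (1e-30, 1e15),
    "(ii) atom in a solid": (1e-26, 1e13),
    "(iii) micromechanical membrane": (2e-14, 3e10),
}
for name, (mass, omega) in cases.items():
    print(f"{name}: x0 ~ {oscillator_scale(mass, omega):.1e} m")
```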


Returning to the Schrödinger equation (261), in order to analyze its higher eigenstates, we will need more help from mathematics. Let us recast this equation into a dimensionless form by introducing the dimensionless variable $\xi \equiv x/x_0$. This gives

$$-\frac{d^2\psi}{d\xi^2} + \xi^2\psi = \varepsilon\psi, \tag{2.277}$$

where $\varepsilon \equiv 2E/\hbar\omega_0 = E/E_0$. In this notation, the ground-state wavefunction (275) is proportional to $\exp\{-\xi^2/2\}$. Using this clue, let us look for solutions to Eq. (277) in the form

$$\psi = C\exp\left\{-\frac{\xi^2}{2}\right\}H(\xi), \tag{2.278}$$

where $H(\xi)$ is a new function, and $C$ is the normalization constant. With this substitution, Eq. (277) yields

$$\frac{d^2H}{d\xi^2} - 2\xi\frac{dH}{d\xi} + (\varepsilon - 1)H = 0. \tag{2.279}$$


92 By order of magnitude, such estimates are valid even for the systems whose dynamics is substantially different from that of harmonic oscillators, if a typical frequency of quantum transitions is taken for $\omega_0$.

93 A. O’Connell et al., Nature 464, 697 (2010).

94 See a review of such experiments by M. Aspelmeyer et al., Rev. Mod. Phys. 86, 1391 (2014), and also recent experiments with nanoparticles placed in much “softer” potential wells, e.g., by U. Delić et al., Science 367, 892 (2020).
It is evident that $H = $ const and $\varepsilon = 1$ is one of its solutions, describing the ground-state eigenfunction (275) and energy (274), but what are the other eigenstates and eigenvalues? Fortunately, the linear differential equation (279) was studied in detail in the mid-1800s by C. Hermite, who has shown that all its eigenvalues are given by the set

$$\varepsilon_n - 1 = 2n, \quad \text{with } n = 0, 1, 2, \ldots, \tag{2.280}$$

so that Eq. (262) is indeed exact for any $n$.⁹⁵ The eigenfunction of Eq. (279), corresponding to the eigenvalue $\varepsilon_n$, is a polynomial (called the Hermite polynomial) of degree $n$, which may be most conveniently calculated using the following explicit formula:

$$H_n(\xi) = (-1)^n\exp\{\xi^2\}\frac{d^n}{d\xi^n}\exp\{-\xi^2\}. \tag{2.281}$$

It is easy to use this formula to spell out several lowest-degree polynomials (see Fig. 35a):

$$H_0 = 1, \quad H_1 = 2\xi, \quad H_2 = 4\xi^2 - 2, \quad H_3 = 8\xi^3 - 12\xi, \quad H_4 = 16\xi^4 - 48\xi^2 + 12, \quad \ldots \tag{2.282}$$


Fig. 2.35. (a) A few lowest Hermite polynomials and (b) the corresponding eigenenergies (horizontal dashed lines) and eigenfunctions (solid lines) of the harmonic oscillator. The black dashed curve shows the potential profile $U(x)$ drawn on the same scale as the energies $E_n$, so that its crossings with the energy levels correspond to classical turning points.


95 Perhaps the most important property of this energy spectrum is that it is equidistant: $E_{n+1} - E_n = \hbar\omega_0 =$ const.
The properties of these polynomials that are most important for applications are as follows:

(i) the function $H_n(\xi)$ has exactly $n$ zeros (its plot crosses the $\xi$-axis exactly $n$ times); as a result, the “parity” (symmetry-antisymmetry) of these functions alternates with $n$, and
(ii) the polynomials are mutually orthogonal in the following sense:

$$\int_{-\infty}^{+\infty}H_n(\xi)H_{n'}(\xi)\exp\{-\xi^2\}\,d\xi = \pi^{1/2}2^n n!\,\delta_{nn'}. \tag{2.283}$$


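Using the last property, we may readily calculate, from Eq. (278), the normalized eigenfunctions $\psi_n(x)$ of the harmonic oscillator (see Fig. 35b):

$$\psi_n(x) = \frac{1}{(2^n n!)^{1/2}\pi^{1/4}x_0^{1/2}}\exp\left\{-\frac{x^2}{2x_0^2}\right\}H_n\!\left(\frac{x}{x_0}\right). \tag{2.284}$$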
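Eqs. (282) to (284) are easy to spot-check numerically. For convenience, the sketch below generates $H_n(\xi)$ from the three-term recurrence $H_{n+1} = 2\xi H_n - 2nH_{n-1}$ rather than from the derivative formula (281); the recurrence is a standard identity that the text does not derive. The sketch compares the result with NumPy’s `numpy.polynomial.hermite.hermval`, verifies the orthogonality relation (283) with Gauss-Hermite quadrature (`hermgauss`, whose weight function is $\exp\{-\xi^2\}$), and checks the normalization of Eq. (284) in units with $x_0 = 1$.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval


def hermite(n, xi):
    """Physicists' Hermite polynomial H_n(xi), via H_{k+1} = 2 xi H_k - 2 k H_{k-1}."""
    xi = np.asarray(xi, dtype=float)
    h_prev, h = np.ones_like(xi), 2.0 * xi
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * xi * h - 2.0 * k * h_prev
    return h


def psi(n, x, x0=1.0):
    """Normalized oscillator eigenfunction, Eq. (2.284)."""
    norm = 1.0 / math.sqrt(2**n * math.factorial(n) * math.sqrt(math.pi) * x0)
    return norm * np.exp(-x**2 / (2 * x0**2)) * hermite(n, x / x0)


# Orthogonality (2.283): Gauss-Hermite quadrature is exact for these polynomial degrees
nodes, weights = hermgauss(30)
assert np.allclose(hermite(3, nodes), hermval(nodes, [0, 0, 0, 1]))  # agrees with NumPy
for n in range(6):
    for m in range(6):
        integral = np.sum(weights * hermite(n, nodes) * hermite(m, nodes))
        expected = math.sqrt(math.pi) * 2**n * math.factorial(n) if n == m else 0.0
        assert abs(integral - expected) < 1e-6 * max(1.0, expected)

# Normalization of Eq. (2.284), checked on a plain grid
x = np.linspace(-15.0, 15.0, 6001)
dx = x[1] - x[0]
print([round(float(np.sum(psi(n, x) ** 2) * dx), 10) for n in range(6)])   # all 1.0
```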


At this point, it is instructive to compare these eigenfunctions with those of a 1D rectangular potential well, with its ultimately hard walls (see Fig. 1.8). Let us list their common features:


(i) The wavefunctions oscillate in the classically allowed regions with $E_n > U(x)$, while dropping exponentially beyond the boundaries of that region. (For the rectangular well with infinite walls, the latter regions are infinitesimally narrow.)

(ii) Each step up the energy level ladder increases the number of the oscillation half-waves (and hence the number of the wavefunction’s zeros) by one.⁹⁶


And here are the major features specific for soft (e.g., the quadratic-parabolic) confinement: 


(i) The spatial spread of the wavefunction grows with n, following the gradual widening 
of the classically allowed region. 

(ii) Correspondingly, E_n exhibits a slower growth than the E_n ∝ n² law given by Eq. (1.85), because the gradual reduction of spatial confinement moderates the kinetic energy's growth.
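Both common and specific features can be illustrated with a short numerical sketch of Eq. (284), with x measured in units of x_0 (an illustrative choice). It counts the sign changes of ψ_n(x) on a grid, which should equal n, and evaluates ⟨x²⟩, which for the oscillator should come out as (n + 1/2)x_0², a standard result not derived in this section; its growth with n reflects the gradual widening of the classically allowed region. The helper names are hypothetical.

```python
import math


def hermite(n, xi):
    """Physicists' Hermite polynomial H_n(xi) from the standard three-term recurrence."""
    h_prev, h = 0.0, 1.0
    for k in range(n):
        h_prev, h = h, 2 * xi * h - 2 * k * h_prev
    return h


def psi(n, x):
    """Eigenfunction of Eq. (2.284), with x in units of x_0 (i.e., x_0 = 1)."""
    norm = 1.0 / math.sqrt(2**n * math.factorial(n) * math.sqrt(math.pi))
    return norm * math.exp(-x * x / 2) * hermite(n, x)


POINTS = 4000   # an even count, so that x = 0 is not a grid node
xs = [-10.0 + 20.0 * i / (POINTS - 1) for i in range(POINTS)]
dx = xs[1] - xs[0]


def node_count(n):
    """Number of sign changes of psi_n on the grid, i.e. the number of its zeros."""
    vals = [psi(n, x) for x in xs]
    return sum(1 for a, b in zip(vals, vals[1:]) if a * b < 0)


def moment(n, power):
    """Riemann-sum estimate of the integral of x**power * psi_n(x)**2."""
    return sum(x**power * psi(n, x) ** 2 for x in xs) * dx


for n in range(6):
    # columns: n, norm (should be ~1), number of zeros, <x^2> in units of x_0^2
    print(n, round(moment(n, 0), 6), node_count(n), round(moment(n, 2), 6))
```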


Unfortunately, the "brute-force" approach to the harmonic oscillator problem, discussed above, is not too appealing. First, the proof of Eq. (281) is rather lengthy, so I do not have time/space for it. More importantly, it is hard to use Eq. (284) for the calculation of the expectation values of observables, including the so-called matrix elements of the system, which, as we will see in Chapter 4, are virtually the only numbers important for most applications. Finally, it is almost evident that there has to be some straightforward math leading to a formula as simple as Eq. (262) for E_n. Indeed, there is a much more efficient, operator-based approach to this problem; it will be described in Sec. 5.4.


2.10. Exercise problems 


2.1. The initial wave packet of a free 1D particle is described by Eq. (20): 


$$\Psi(x, 0) = \int a_k\, e^{ikx}\, dk .$$


96 In mathematics, a slightly more general statement, valid for a broader class of ordinary linear differential 
equations, is frequently called the Sturm oscillation theorem, and is a part of the Sturm-Liouville theory of such 
equations — see, e.g., Chapter 10 in the handbook by G. Arfken et al., cited in MA Sec. 16. 
(i) Obtain a compact expression for the expectation value ⟨p⟩ of the particle's momentum at an arbitrary moment t > 0.
(ii) Calculate ⟨p⟩ for the case when the function |a_k|² is symmetric with respect to some value k_0.


2.2. Calculate the function a_k defined by Eq. (20), for the wave packet with a rectangular spatial envelope:

$$\Psi(x, 0) = \begin{cases} C\exp\{ik_0 x\}, & \text{for } -a/2 < x < +a/2, \\ 0, & \text{otherwise}. \end{cases}$$

Analyze the result in the limit k_0 a → ∞.
2.3. Prove Eq. (49) for the 1D propagator of a free quantum particle, starting from Eq. (48). 


2.4. Express the 1D propagator defined by Eq. (44), via the eigenfunctions and eigenenergies of 
a particle moving in an arbitrary stationary potential U(x). 


2.5. Calculate the change of the wavefunction of a 1D particle, resulting from a short pulse of an 
external classical force that may be approximated by the delta function:97

$$F(t) = P\,\delta(t).$$


2.6. Calculate the transparency 𝒯 of the rectangular potential barrier (68),

$$U(x) = \begin{cases} 0, & \text{for } x < -d/2, \\ U_0, & \text{for } -d/2 < x < +d/2, \\ 0, & \text{for } +d/2 < x, \end{cases}$$

for a particle of energy E > U_0. Analyze and interpret the result, taking into account that U_0 may be either positive or negative. (In the latter case, we are speaking about the particle's passage over a rectangular potential well of a finite depth |U_0|.)


2.7. Prove Eq. (117) for the case 𝒯_WKB << 1, using the connection formulas (105).


2.8. Spell out the stationary wavefunctions of a harmonic oscillator in the WKB approximation, and use them to calculate the expectation values ⟨x²⟩ and ⟨x⁴⟩ for the eigenstate number n >> 1.


2.9. Use the WKB approximation to express the expectation value of the kinetic energy of a 1D particle confined in a soft potential well, in its n-th stationary state, via the derivative dE_n/dn, for n >> 1.


2.10. Use the WKB approximation to calculate the transparency 𝒯 of the following triangular potential barrier:

$$U(x) = \begin{cases} 0, & \text{for } x < 0, \\ U_0 - Fx, & \text{for } x > 0, \end{cases}$$

with F, U_0 > 0, as a function of the incident particle's energy E.
Hint: Be careful with the sharp potential step at x = 0.

97 The constant P is called the force's impulse. (In higher dimensionalities, it is a vector, just as the force is.)


2.11.* Prove that the symmetry of the 1D scattering matrix S describing an arbitrary time-independent scatterer allows its representation in the form (127).


2.12. Prove the universal relations between elements of the 1D transfer matrix T of a stationary 
(but otherwise arbitrary) scatterer, mentioned in Sec. 5. 


2.13. A 1D particle had been localized in a very narrow and deep potential well, with the "energy area" ∫U(x)dx equal to −𝒲, where 𝒲 > 0. Then (say, at t = 0) the well's bottom is suddenly lifted up, so that the particle becomes completely free. Calculate the probability density, w(k), to find the particle in a state with a certain wave number k at t > 0, and the total final energy of the system.
2.14. Calculate the lifetime of the metastable localized state of a 1D particle in the potential

$$U(x) = -\mathcal{W}\delta(x) - Fx, \quad \text{with } \mathcal{W} > 0,$$

using the WKB approximation. Formulate the condition of validity of the result.


2.15. Calculate the energy levels and the corresponding eigenfunctions of a 1D particle placed into a flat-bottom potential well of width 2a, with infinitely-high hard walls, and a transparent, short potential barrier in the middle (see the figure on the right). Discuss particle dynamics in the limit when 𝒲 is very large but still finite.
(Figure: potential profile with hard walls at x = −a and x = +a, and a short barrier 𝒲δ(x) at x = 0.)


2.16. Consider a symmetric system of two potential wells of the type shown in Fig. 21, but with U(0) = U(±∞) = 0 (see the figure on the right). What is the sign of the well interaction force due to their sharing a quantum particle of mass m, for the cases when the particle is in:

(i) a symmetric localized eigenstate: ψ_S(−x) = ψ_S(x)?
(ii) an antisymmetric localized eigenstate: ψ_A(−x) = −ψ_A(x)?


Use an alternative approach to verify your result for the particular case of delta-functional wells. 


2.17. Derive and analyze the characteristic equation for localized eigenstates of a 1D particle in a rectangular potential well of a finite depth (see the figure on the right):

$$U(x) = \begin{cases} -U_0, & \text{for } |x| < a/2, \\ 0, & \text{otherwise}. \end{cases}$$

In particular, calculate the number of localized states as a function of the well's width a, and explore the limit U_0 << ħ²/2ma².
2.18. Calculate the energy of a 1D particle localized in a potential well of an arbitrary shape U(x), provided that its width a is finite, and the average depth is very small:

$$|\bar U| \ll \frac{\hbar^2}{2ma^2}, \quad \text{where } \bar U \equiv \frac{1}{a}\int_{\text{well}} U(x)\, dx .$$


2.19. A particle of mass m is moving in a field with the following potential:

$$U(x) = U_0(x) + \mathcal{W}\delta(x),$$

where U_0(x) is a smooth, symmetric function with U_0(0) = 0, growing monotonically at x → ±∞.

(i) Use the WKB approximation to derive the characteristic equation for the particle's energy spectrum, and
(ii) semi-quantitatively describe the spectrum's evolution at the increase of |𝒲|, for both signs of this parameter.

Make both results more specific for the quadratic-parabolic potential (111): U_0(x) = mω_0²x²/2.
2.20. Prove Eq. (189), starting from Eq. (188). 


2.21. For the problem discussed at the beginning of Sec. 7, i.e. the 1D particle's motion in an infinite Dirac comb potential shown in Fig. 24,

$$U(x) = \mathcal{W}\sum_{j=-\infty}^{+\infty}\delta(x - ja), \quad \text{with } \mathcal{W} > 0$$

(where j takes integer values), write explicit expressions for the eigenfunctions at the very bottom and at the very top of the lowest energy band. Sketch both functions.


2.22. A 1D particle of mass m moves in an infinite periodic system of very narrow and deep potential wells that may be described by delta functions:

$$U(x) = \mathcal{W}\sum_{j=-\infty}^{+\infty}\delta(x - ja), \quad \text{with } \mathcal{W} < 0.$$

(i) Sketch the energy band structure of the system for very small and very large values of the potential well's "weight" |𝒲|, and
(ii) calculate explicitly the ground state energy of the system in these two limits.


2.23. For the system discussed in the previous problem, write explicit expressions for the 
eigenfunctions of the system, corresponding to: 


(i) the bottom of the lowest energy band,
(ii) the top of that band, and
(iii) the bottom of each higher energy band.


Sketch these functions. 


2.24.* The 1D "crystal" analyzed in the last two problems now extends only to x > 0, with a sharp step to a flat potential plateau at x < 0:

$$U(x) = \begin{cases} \mathcal{W}\sum_{j=1}^{\infty}\delta(x - ja), \ \text{with } \mathcal{W} < 0, & \text{for } x > 0, \\ U_0, & \text{for } x < 0. \end{cases}$$

Prove that the system can have a set of the so-called Tamm states, localized near the "surface" x = 0, and calculate their energies in the limit when U_0 is very large but finite. (Quantify this condition.)98


2.25. Calculate the whole transfer matrix of the rectangular potential barrier, specified by Eq. (68), for particle energies both below and above U_0.


2.26. Use the results of the previous problem to calculate the transfer matrix of one period of the 
periodic Kronig-Penney potential shown in Fig. 31b. 


2.27. Using the results of the previous problem, derive the characteristic equations for the particle's motion in the periodic Kronig-Penney potential, for both E < U_0 and E > U_0. Try to bring the equations
to a form similar to that obtained in Sec. 7 for the delta-functional barriers — see Eq. (198). Use the 
equations to formulate the conditions of applicability of the tight-binding and weak-potential 
approximations, in terms of the system’s parameters, and the particle’s energy E. 


2.28. For the Kronig-Penney potential, use the tight-binding approximation to calculate the 
widths of the allowed energy bands. Compare the results with those of the previous problem (in the 
corresponding limit). 


2.29. For the same Kronig-Penney potential, use the weak-potential limit formulas to calculate 
the energy gap widths. Again, compare the results with those of Problem 27, in the corresponding limit. 


2.30. 1D periodic chains of atoms may exhibit what is called the Peierls instability, leading to the Peierls transition to a phase in which atoms are slightly displaced from the exact periodicity, by alternating displacements Δx_j = (−1)^j Δx, with Δx << a, where j is the atom's number in the chain, and a is its initial period. These displacements lead to the alternation of the coupling amplitudes δ_n (see Eq. (204)) between close values δ_n⁺ and δ_n⁻. Use the tight-binding approximation to calculate the resulting change of the n-th energy band, and discuss the result.


2.31. Use Eqs. (1.73)-(1.74) of the lecture notes to derive Eq. (252), and discuss the relation 
between these Bloch oscillations and the Josephson oscillations of frequency (1.75). 


2.32. A 1D particle of mass m is placed into the following triangular potential well:

$$U(x) = \begin{cases} +\infty, & \text{for } x < 0, \\ Fx, & \text{for } x > 0, \end{cases} \quad \text{with } F > 0.$$


(i) Calculate its energy spectrum using the WKB approximation. 


98 In applications to electrons in solid-state crystals, the delta-functional potential wells model the attractive potentials of atomic nuclei, while U_0 represents the workfunction, i.e. the energy necessary for the extraction of an electron from the crystal to the free space; see, e.g., Sec. 1.1(ii), and also EM Sec. 2.6 and SM Sec. 6.3.
(ii) Estimate the ground state energy using the variational method, with two different trial functions.

(iii) Calculate the three lowest energy levels, and also the 10th level, with an accuracy better than 0.1%, from the exact solution of the problem.

(iv) Compare and discuss the results. 


Hint: The values of the first zeros of the Airy function, necessary for Task (iii), may be found in 
many math handbooks, for example, in Table 9.9.1 of the online version of the collection edited by 
Abramowitz and Stegun — see MA Sec. 16(i). 


2.33. Use the variational method to estimate the ground state energy E_g of a particle in the following potential well:

$$U(x) = -U_0\exp\{-\alpha x^2\}, \quad \text{with } \alpha > 0 \text{ and } U_0 > 0.$$

Spell out the results in the limits of small and large U_0, and give their interpretation.


2.34. For a 1D particle of mass m, placed into a potential well with the following profile,

$$U(x) = a x^{2s}, \quad \text{with } a > 0 \text{ and } s > 0,$$


(i) calculate its energy spectrum using the WKB approximation, and 
(ii) estimate the ground state energy using the variational method. 


Compare the ground-state energy results for the parameter s equal to 1, 2, 3, and 100. 


2.35. Use the variational method to estimate the energy of the 1st excited state of the 1D harmonic oscillator.

2.36. Assuming the quantum effects to be small, calculate the lower part of the energy spectrum of the following system: a small bead of mass m, free to move without friction along a ring of radius R, which is rotated about its vertical diameter with a constant angular velocity ω (see the figure on the right). Formulate a quantitative condition of validity of your results.

Hint: This system was used as the analytical mechanics' "testbed problem" in the CM part of this series, and the reader is welcome to use any relations derived there.


2.37. A 1D harmonic oscillator, with mass m and frequency ω_0, had been in its ground state; then an additional force F was suddenly applied, and after that kept constant. Calculate the probability of the oscillator staying in its ground state.


2.38. A 1D particle of mass m has been placed into a quadratic potential well (111),

$$U(x) = \frac{m\omega_0^2 x^2}{2},$$

and allowed to relax into the ground state. At t = 0, the well was fast accelerated to move with velocity v, without changing its profile, so that at t > 0 the above formula for U is valid with the replacement x → x' ≡ x − vt. Calculate the probability for the system to still be in the ground state at t > 0.
2.39. Initially, a 1D harmonic oscillator was in its ground state. At a certain moment of time, its spring constant κ is abruptly increased so that its frequency ω_0 = (κ/m)^{1/2} is increased by a factor of α,
and then is kept constant at the new value. Calculate the probability that after the change, the oscillator 
is still in its ground state. 


2.40. A 1D particle is placed into the following potential well:

$$U(x) = \begin{cases} +\infty, & \text{for } x < 0, \\ m\omega_0^2 x^2/2, & \text{for } x \ge 0. \end{cases}$$

(i) Find its eigenfunctions and eigenenergies.
(ii) This system had been let to relax into its ground state, and then the potential wall at x < 0 was rapidly removed so that the system was instantly turned into the usual harmonic oscillator (with the same m and ω_0). Find the probability for the oscillator to remain in its ground state.


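2.41. Prove the following formula for the propagator of the 1D harmonic oscillator:

$$G(x, t; x_0, t_0) = \left(\frac{m\omega_0}{2\pi i\hbar\sin[\omega_0(t - t_0)]}\right)^{1/2} \exp\left\{\frac{im\omega_0}{2\hbar\sin[\omega_0(t - t_0)]}\left[(x^2 + x_0^2)\cos[\omega_0(t - t_0)] - 2xx_0\right]\right\}.$$

Discuss the relation between this formula and the propagator of a free 1D particle.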


2.42. In the context of the Sturm oscillation theorem mentioned in Sec. 9, prove that the number of zeros of the eigenfunction of a particle confined in an arbitrary but finite potential well always increases with the corresponding eigenenergy.


Hint: You may like to use the suitably modified Eq. (186). 


2.43.* Use the WKB approximation to calculate the lifetime of the metastable ground state of a 1D particle of mass m in the "pocket" of the potential profile

$$U(x) = \frac{m\omega_0^2}{2}x^2 - \alpha x^3 .$$

Contemplate the significance of this problem.
Chapter 3. Higher Dimensionality Effects


The description of the basic quantum-mechanical effects, given in the previous chapter, may be extended 
to higher dimensions in an obvious way. This is why this chapter is focused on the phenomena (such as 
the AB effect and the Landau levels) that cannot take place in one dimension due to topological reasons, 
and also on a few key 3D problems (such as the Born approximation in the scattering theory, and the 
axially- and spherically-symmetric systems) that are important for numerous applications. 


3.1. Quantum interference and the AB effect 


In the past two chapters, we have already discussed some effects of the de Broglie wave 
interference. For example, standing waves inside a potential well, or even on the top of a potential 
barrier, may be considered as a result of interference of incident and reflected waves. However, there are 
some remarkable new effects made possible by spatial separation of such waves, and such separation 
requires a higher (either 2D or 3D) dimensionality. A good example of wave separation is provided by 
the Young-type experiment (Fig. 1) in which particles, emitted by the same source, are passed through two narrow holes (or slits) in an otherwise opaque partition.


Fig. 3.1. The scheme of the “two-slit” (Young-type) interference experiment. (Figure labels: particle source, partition with 2 slits, particle detector.)


According to Eq. (1.22), if particle interactions are negligible (which is always true if the emission rate is sufficiently low), the average rate of particle counting by the detector is proportional to the probability density w(r, t) = Ψ(r, t)Ψ*(r, t) to find a single particle at the detector's location r, where Ψ(r, t) is the solution of the single-particle Schrödinger equation (1.25) for the system. Let us calculate the rate for the case when the incident particles may be represented by virtually-monochromatic waves of energy E (e.g., very long wave packets), so that their wavefunction may be taken in the form given by Eqs. (1.57) and (1.62): Ψ(r, t) = ψ(r) exp{−iEt/ħ}. In this case, in the free-space parts of the system, where U(r) = 0, ψ(r) satisfies the stationary Schrödinger equation (1.78a):


$$-\frac{\hbar^2}{2m}\nabla^2\psi = E\psi . \qquad (3.1a)$$

With the standard definition k ≡ (2mE)^{1/2}/ħ, it may be rewritten as the 3D Helmholtz equation:

$$\nabla^2\psi + k^2\psi = 0 . \qquad (3.1b)$$
The opaque parts of the partition may be well described as classically forbidden regions, so if their size scale a is much larger than the wavefunction penetration depth δ described by Eq. (2.59), we may use on their surface S the same boundary conditions as for the well's walls of infinite height:


$$\psi\big|_S = 0 . \qquad (3.2)$$


Eqs. (1) and (2) describe the standard boundary problem of the theory of propagation of scalar waves of 
any nature. For an arbitrary geometry, this problem does not have a simple analytical solution. However, 
for a conceptual discussion of wave interference, we may use certain natural assumptions that will allow 
us to find its particular, approximate solution. 


First, let us discuss the wave emission, into free space, by a small-size, isotropic source located at the origin of our reference frame. Naturally, the emitted wave should be spherically symmetric: $\psi(\mathbf{r}) = \psi(r)$. Using the well-known expression for the Laplace operator in spherical coordinates,1 we may reduce Eq. (1) to the following ordinary differential equation:

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\psi}{dr}\right) + k^2\psi = 0 . \qquad (3.3)$$

Let us introduce a new function, f(r) ≡ rψ(r). Plugging the reciprocal relation ψ = f/r into Eq. (3), we see that it is reduced to the 1D wave equation,

$$\frac{d^2 f}{dr^2} + k^2 f = 0 . \qquad (3.4)$$
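The reduction of Eq. (3) to Eq. (4) rests on a one-line identity for the radial part of the Laplace operator; written out, it reads as follows, and multiplying Eq. (3) by r then gives Eq. (4) for f ≡ rψ:

$$\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\psi}{dr}\right) = \frac{d^2\psi}{dr^2} + \frac{2}{r}\frac{d\psi}{dr} = \frac{1}{r}\frac{d^2(r\psi)}{dr^2}, \quad \text{since} \quad \frac{d^2(r\psi)}{dr^2} = r\frac{d^2\psi}{dr^2} + 2\frac{d\psi}{dr}.$$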


As was discussed in Sec. 2.2, for a fixed k, the general solution of Eq. (4) may be represented in the 
form of two traveling waves: 


$$f = f_+ e^{ikr} + f_- e^{-ikr}, \qquad (3.5)$$


so that the full wavefunction is 


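$$\psi(r) = \frac{f_+}{r}e^{ikr} + \frac{f_-}{r}e^{-ikr}, \quad \Psi(r, t) = \frac{f_+}{r}e^{i(kr - \omega t)} + \frac{f_-}{r}e^{-i(kr + \omega t)}, \quad \text{with } \omega = \frac{E}{\hbar} = \frac{\hbar k^2}{2m}. \qquad (3.6)$$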
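If the source is located at point r' ≠ 0, the obvious generalization of Eq. (6) is

$$\Psi(\mathbf{r}, t) = \frac{f_+}{R}e^{i(kR - \omega t)} + \frac{f_-}{R}e^{-i(kR + \omega t)}, \quad \text{with } R \equiv |\mathbf{R}|, \ \mathbf{R} \equiv \mathbf{r} - \mathbf{r}'. \qquad (3.7)$$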


The first term of this solution describes a spherically-symmetric wave propagating from the source outward, while the second one, a wave converging onto the source point r' from large distances. Though the latter solution is possible in some very special circumstances (say, when the outgoing wave is reflected back from a spherical shell), for our current problem, only the outgoing waves are relevant, so that we may keep only the first term (proportional to f_+) in Eq. (7). Note that the factor R in the denominator (which was absent in the 1D geometry) has a simple physical sense: it makes the full probability current J = 4πR²j(R), with j(R) ∝ kΨΨ* ∝ 1/R², independent of the distance R between the observation point and the source.
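This R-independence may be checked numerically. The sketch below works in illustrative units ħ = m = 1 (an assumption, not from the text): it evaluates the radial probability current density j_R = (ħ/m) Im(ψ* ∂ψ/∂R) of the outgoing term of Eq. (7) with a central finite difference, and multiplies it by 4πR². For this wave, the result should stay close to 4π|f_+|²ħk/m at any R. The function names are hypothetical.

```python
import cmath
import math

HBAR = 1.0   # illustrative units (assumption): hbar = m = 1
MASS = 1.0


def outgoing_wave(r, k, f_plus):
    """Outgoing term of Eq. (3.7), f_+ exp(ikR)/R, with the time factor dropped."""
    return f_plus * cmath.exp(1j * k * r) / r


def total_radial_current(r, k, f_plus, h=1e-5):
    """4 pi R^2 j_R, with j_R = (hbar/m) Im(psi* dpsi/dR) from a central difference."""
    psi = outgoing_wave(r, k, f_plus)
    dpsi = (outgoing_wave(r + h, k, f_plus) - outgoing_wave(r - h, k, f_plus)) / (2 * h)
    j_r = (HBAR / MASS) * (psi.conjugate() * dpsi).imag
    return 4 * math.pi * r**2 * j_r


k, f_plus = 2.0, 0.7 + 0.3j
for r in (0.5, 1.0, 10.0, 100.0):
    print(r, total_radial_current(r, k, f_plus))
print("analytic:", 4 * math.pi * abs(f_plus) ** 2 * HBAR * k / MASS)
```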


1 See, e.g., MA Eq. (10.9) with ∂/∂θ = ∂/∂φ = 0.
Now let us assume that the partition's geometry is not too complicated (for example, it is either planar as shown in Fig. 1, or nearly-planar), and consider the region of the particle detector location far behind the partition (at z >> 1/k), and at a relatively small angle to the z-axis: |x| << z. Then it should be
physically clear that the spherical waves (7) emitted by each point inside the slit cannot be perturbed too 
much by the opaque parts of the partition, and their only role is the restriction of the set of such emitting 
points to the area of the slits. Hence, an approximate solution of the boundary problem is given by the 
following Huygens principle: the wave behind the partition looks as if it was the sum of contributions 
(7) of point sources located in the slits, with each source’s strength f, proportional to the amplitude of 
the wave arriving at this pseudo-source from the real source — see Fig. 1. This principle finds its 
confirmation in the strict wave theory, which shows that with our assumptions, the solution of the 
boundary problem (1)-(2) may be represented as the following Kirchhoff integral: 


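$$\psi(\mathbf{r}) = c\int_{\text{slits}} \psi(\mathbf{r}')\,\frac{e^{ikR}}{R}\, d^2r', \quad \text{with } R \equiv |\mathbf{r} - \mathbf{r}'| \text{ and } c = \frac{k}{2\pi i}. \qquad (3.8)$$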


If the source is also far from the partition, its wave's front is almost parallel to the slit plane, and if the slits are not too broad, we can take ψ(r') constant (ψ_{1,2}) at each slit, so that Eq. (8) is reduced to

$$\psi(\mathbf{r}) = a''_1\exp\{ikl''_1\} + a''_2\exp\{ikl''_2\}, \quad \text{with } a''_{1,2} = \frac{c\,\psi_{1,2}A_{1,2}}{l''_{1,2}}, \qquad (3.9)$$

where A_{1,2} are the slit areas, and l''_{1,2} are the distances from the slits to the detector. The wavefunctions on the slits may be calculated approximately3 by applying the same Eq. (7) to the region before the slits: ψ_{1,2} ≈ (f_+/l'_{1,2}) exp{ikl'_{1,2}}, where l'_{1,2} are the distances from the source to the slits (see Fig. 1). As a result, Eq. (9) may be rewritten as

$$\psi(\mathbf{r}) = a_1\exp\{ikl_1\} + a_2\exp\{ikl_2\}, \quad \text{with } l_{1,2} = l'_{1,2} + l''_{1,2}, \quad a_{1,2} = \frac{c f_+ A_{1,2}}{l'_{1,2}\, l''_{1,2}}. \qquad (3.10)$$


(As Fig. 1 shows, each of /;2 is the full length of the classical path of the particle from the source, 
through the corresponding slit, and further to the observation point r.) 


According to Eq. (10), the resulting rate of particle counting at point r is proportional to

$$w(\mathbf{r}) = \psi(\mathbf{r})\psi^*(\mathbf{r}) = |a_1|^2 + |a_2|^2 + 2|a_1 a_2|\cos\varphi_{12}, \qquad (3.11)$$

where

$$\varphi_{12} \equiv k(l_2 - l_1) \qquad (3.12)$$


is the difference of the total wave phase accumulations along each of two alternative paths. The last 
expression may be evidently generalized as 


2 For the proof and a detailed discussion of Eq. (8), see, e.g., EM Sec. 8.5. 

3 A possible (and reasonable) concern about the application of Eq. (7) to the field in the slits is that it ignores the effect of opaque parts of the partition. However, as we know from Chapter 2, the main role of the classically forbidden region is reflecting the incident wave toward its source (i.e. to the left in Fig. 1). As a result, the contribution of this reflection to the field inside the slits is insignificant if A_{1,2} >> λ², and even in the opposite case provides just some rescaling of the amplitudes a_{1,2}, which is not important for our conceptual discussion.
$$\varphi_{12} = \oint_C \mathbf{k}(\mathbf{r})\cdot d\mathbf{r}, \qquad (3.13)$$


with integration along the virtually closed contour C (see the dashed line in Fig. 1), i.e. from point 1, in the positive (i.e. counterclockwise) direction all the way to point 2. (From our discussion of the 1D
WKB approximation in Sec. 2.4, we may expect such generalization to be valid even if k changes, 
sufficiently slowly, along the paths.) 


Our result (11)-(12) shows that the particle counting rate oscillates as a function of the difference (l_2 − l_1), which in turn changes with the detector's position, giving the famous interference pattern, with the amplitude proportional to the product |a_1 a_2|, and hence vanishing if either of the slits is closed. For the wave theory, this is a well-known result,4 but for particle physics, it was (and still is :-) rather shocking. Indeed, our analysis is valid for a very low particle emission rate, so that the only way to interpret the pattern is as a result of a particle's interference with itself, or rather the interference of its de Broglie waves passing through each of the two slits.5 Nowadays, such interference is reliably observed not only for electrons but also for much heavier particles: atoms and molecules, including very complex organic ones;6 moreover, atomic interferometers are used as ultra-sensitive instruments for measurements of gravity, rotation, and tilt.7
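Eqs. (10)-(12) can be evaluated directly for a simple planar geometry. In the sketch below (an illustration, not the text's own calculation), slit 1 sits at x = −d/2 and slit 2 at x = +d/2 in the plane z = 0; the source is assumed to lie on the symmetry axis, so that the equal distances l'_1 = l'_2 drop out of φ_12; and the amplitudes a_{1,2} are taken real and positive, which is reasonable for |x| << z. The code compares Eq. (11) with the direct |ψ|² of the superposition (10) and shows the paraxial fringe period 2πz/kd = λz/d; setting a2 = 0 (one slit closed) removes the oscillating term. All parameter values are arbitrary.

```python
import cmath
import math


def path_lengths(x, z, d):
    """Distances l''_1, l''_2 from slits at x = -d/2 (slit 1) and x = +d/2 (slit 2),
    both in the plane z = 0, to a detector at (x, z)."""
    return math.hypot(z, x + d / 2), math.hypot(z, x - d / 2)


def rate_eq_3_11(x, z, d, k, a1=1.0, a2=1.0):
    """Counting rate from Eqs. (3.11)-(3.12) for real positive a1, a2.
    The equal source-to-slit distances l'_1 = l'_2 cancel in l_2 - l_1."""
    l1, l2 = path_lengths(x, z, d)
    phi12 = k * (l2 - l1)
    return a1**2 + a2**2 + 2 * a1 * a2 * math.cos(phi12)


def rate_direct(x, z, d, k, a1=1.0, a2=1.0):
    """|psi|^2 computed directly from the two-wave superposition of Eq. (3.10)."""
    l1, l2 = path_lengths(x, z, d)
    psi = a1 * cmath.exp(1j * k * l1) + a2 * cmath.exp(1j * k * l2)
    return abs(psi) ** 2


z, d, k = 1.0e4, 1.0, 100.0          # illustrative values, arbitrary length units
fringe = 2 * math.pi * z / (k * d)   # paraxial fringe period, lambda * z / d
for x in (0.0, fringe / 2, fringe):
    print(round(x, 3), rate_eq_3_11(x, z, d, k), rate_direct(x, z, d, k))
```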


Let us now discuss a very interesting effect of magnetic field on quantum interference. To simplify our discussion, let us consider a slightly different version of the two-slit experiment, in which each of the two alternative paths is constrained to a narrow channel by partial confinement (see Fig. 2). (In this arrangement, moving the particle detector without changing the channels' geometry, and hence local values of k, may be more problematic experimentally, so let us think about its position r as fixed.) In this case, because of the effect of the walls providing the path confinement, we cannot use Eqs. (10) for the amplitudes a_{1,2}. However, from the discussions in Sec. 1.6 and Sec. 2.2, it should be clear that the first of the expressions (10) remains valid, though maybe with a value of k specific for each channel.


Fig. 3.2. The AB effect. (Figure labels: region with B ≠ 0, channel 1, channel 2, detector with w = w(r).)


In this geometry, we can apply some local magnetic field B, say normal to the plane of particle motion, whose lines would pierce, but not touch, the contour C drawn along the particle propagation

4 See, e.g., a detailed discussion in EM Sec. 8.4. 

5 Here I have to mention the fascinating experiments (first performed in 1987 by C. Hong et al. with photons, and recently, in 2015, by R. Lopes et al., with non-relativistic particles, helium atoms) on the interference of de Broglie waves of independent but identical particles, in the same internal quantum state and virtually the same values of E and k. These experiments raise the important issue of particle indistinguishability, which will be discussed in Sec. 8.1.

6 See, e.g., the recent demonstration of the quantum interference of oligo-porphyrin molecules, consisting of ~2,000 atoms, with a total mass above 25,000 m_p: Y. Fein et al., Nature Physics 15, 1242 (2019).

7 See, e.g., the review paper by A. Cronin, J. Schmiedmayer, and D. Pritchard, Rev. Mod. Phys. 81, 1051 (2009). 
channels (see the dashed line in Fig. 2). In classical electrodynamics,8 the external magnetic field's effect on a particle with electric charge q is described by the Lorentz force


$$\mathbf{F}_B = q\mathbf{v}\times\mathbf{B}, \qquad (3.14)$$


where B is the field value at the point of the particle's location, so that for the experiment shown in Fig. 2, F_B = 0, and the field would not affect the particle motion at all. In quantum mechanics, this is not so, and the field does affect the probability density w, even if B = 0 at all points where the wavefunction ψ(r) is not equal to zero.


In order to describe this surprising effect, let us first develop a general framework for an account 
of electromagnetic field effects on a quantum particle, which will also give us some by-product results 
important for forthcoming discussions. To do that, we need to calculate the Hamiltonian of a charged 
particle in electric and magnetic fields. For an electrostatic field, this is easy. Indeed, from classical 
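electrodynamics we know that such a field may be represented as minus the gradient of its electrostatic potential ϕ,

$$\boldsymbol{\mathcal{E}} = -\nabla\phi(\mathbf{r}), \qquad (3.15)$$

so that the force exerted by the field on a particle with electric charge q,

$$\mathbf{F}_{\mathcal{E}} = q\boldsymbol{\mathcal{E}}, \qquad (3.16)$$

may be described by adding the field-induced potential energy,

$$U(\mathbf{r}) = q\phi(\mathbf{r}), \qquad (3.17)$$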


to other (possible) components of the full potential energy of the particle. As was already discussed in 
Sec. 1.4, such potential energy may be included in the particle’s Hamiltonian operator just by adding it 
to the kinetic energy operator — see Eq. (1.41). 


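However, the magnetic field's effect is peculiar: since its Lorentz force (14) cannot do any work on a classical particle,

$$dW_B = \mathbf{F}_B\cdot d\mathbf{r} = \mathbf{F}_B\cdot\mathbf{v}\,dt = q(\mathbf{v}\times\mathbf{B})\cdot\mathbf{v}\,dt = 0, \qquad (3.18)$$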


the field cannot be represented by any potential energy, so it may not be immediately clear how to 
account for it in the Hamiltonian. The crucial help comes from the analytical-mechanics approach to 
classical electrodynamics: in the non-relativistic limit, the Hamiltonian function of a particle in an 
electromagnetic field looks like that in the electric field only: 


$$H = \frac{p^2}{2m} + U = \frac{p^2}{2m} + q\phi; \qquad (3.19)$$

however, the momentum p = mv that participates in this expression is now the difference

$$\mathbf{p} = \mathbf{P} - q\mathbf{A}. \qquad (3.20)$$


Here A is the vector potential, defined by the well-known relations for the electric and magnetic fields:10


8 See, e.g., EM Sec. 5.1. Note that Eq. (14), as well as all other formulas of this course, is in SI units.
9 See, e.g., EM Sec. 9.7, in particular Eq. (9.196). 
10 See, e.g., EM Sec. 6.1, in particular Eqs. (6.7). 
$$\boldsymbol{\mathcal{E}} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t}, \quad \mathbf{B} = \nabla\times\mathbf{A}, \qquad (3.21)$$
while P is the canonical momentum, whose Cartesian components may be calculated (in classics) from the Lagrangian function L using the standard formula of analytical mechanics,

$$P_j \equiv \frac{\partial L}{\partial v_j}. \qquad (3.22)$$


To emphasize the difference between the two momenta, p = mv is frequently called the kinematic momentum (or "mv-momentum"). The distinction between p and P = p + qA becomes clearer if we notice that the vector potential is not gauge-invariant: according to the second of Eqs. (21), at the so-called gauge transformation

$$\mathbf{A} \to \mathbf{A} + \nabla\chi, \qquad (3.23)$$

with an arbitrary single-valued scalar gauge function χ = χ(r, t), the magnetic field does not change. Moreover, according to the first of Eqs. (21), if we make the simultaneous replacement

$$\phi \to \phi - \frac{\partial\chi}{\partial t}, \qquad (3.24)$$

the gauge transformation does not affect the electric field either. With that, the gauge function's choice does not affect the classical particle's equation of motion, and hence the velocity v and momentum p. Hence, the kinematic momentum is gauge-invariant, while P is not, because according to Eqs. (20) and (23), the introduction of χ changes it by q∇χ.
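The invariance of B under Eq. (23) follows from the identity ∇×∇χ = 0. Below is a quick numerical sketch for a 2D case (B along z, so that B_z = ∂A_y/∂x − ∂A_x/∂y). The uniform-field potential A = (B_0/2)(−y, x, 0) and the gauge function χ = x³ + y² sin x are arbitrary illustrative choices, and the curl is evaluated with central finite differences, so the agreement holds only up to discretization error. The time-dependent part of the transformation, Eq. (24), is not covered by this sketch.

```python
import math

B0 = 1.5   # uniform field along z (illustrative value)


def curl_z(ax, ay, x, y, h=1e-4):
    """B_z = dA_y/dx - dA_x/dy by central differences: 2D form of B = curl A, Eq. (21)."""
    return ((ay(x + h, y) - ay(x - h, y)) - (ax(x, y + h) - ax(x, y - h))) / (2 * h)


def ax(x, y):
    """Symmetric-gauge potential of the uniform field: A = (B0/2)(-y, x, 0)."""
    return -0.5 * B0 * y


def ay(x, y):
    return 0.5 * B0 * x


def ax_new(x, y):
    """A'_x = A_x + d(chi)/dx for the hypothetical gauge function chi = x**3 + y**2 * sin(x)."""
    return ax(x, y) + 3 * x**2 + y**2 * math.cos(x)


def ay_new(x, y):
    """A'_y = A_y + d(chi)/dy for the same chi."""
    return ay(x, y) + 2 * y * math.sin(x)


for x, y in [(0.3, -1.2), (2.0, 0.5), (-1.1, 3.3)]:
    print(x, y, curl_z(ax, ay, x, y), curl_z(ax_new, ay_new, x, y))
```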


Now the standard way of transfer to quantum mechanics is to treat the canonical rather than kinematic momentum as prescribed by the correspondence postulate discussed in Sec. 1.2. This means that in the wave mechanics, the operator of this variable is still given by Eq. (1.26):11

$$\hat{\mathbf{P}} = -i\hbar\nabla. \qquad (3.25)$$

Hence the Hamiltonian operator corresponding to the classical function (19) is

$$\hat{H} = \frac{1}{2m}\left(-i\hbar\nabla - q\mathbf{A}\right)^2 + q\phi, \qquad (3.26)$$

so that the stationary Schrödinger equation (1.60) of a particle moving in an electromagnetic field (but otherwise free) is

$$\frac{1}{2m}\left(-i\hbar\nabla - q\mathbf{A}\right)^2\psi + q\phi\,\psi = E\psi. \qquad (3.27)$$


We may now repeat all the calculations of Sec. 1.4 for the case A ≠ 0, and get the following
generalized expression for the probability current density: 


11 The validity of this choice is clear from the fact that if the kinematic momentum were described by this differential operator, the Hamiltonian operator corresponding to the classical Hamiltonian function (19), and the corresponding Schrödinger equation would not describe the magnetic field effects at all.
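$$\mathbf{j} = \frac{\hbar}{2im}\left[\psi^*\left(\nabla - i\frac{q}{\hbar}\mathbf{A}\right)\psi - \text{c.c.}\right] = \frac{1}{2m}\left[\psi^*\hat{\mathbf{p}}\psi + \text{c.c.}\right] = \frac{\hbar}{m}\, w\left(\nabla\varphi - \frac{q}{\hbar}\mathbf{A}\right), \qquad (3.28)$$

where p̂ = −iħ∇ − qA is the kinematic momentum operator, and the last form uses the representation ψ = w^{1/2}e^{iφ}.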
We see that the current density is gauge-invariant (as required for any observable) only if the wavefunction's phase φ changes as

$$\varphi \to \varphi + \frac{q}{\hbar}\chi. \qquad (3.29)$$
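Eq. (29) can be checked numerically in one dimension. The following minimal sketch works in units ħ = m = q = 1 (an assumption made for illustration). It evaluates the 1D form of Eq. (28), j = (ħ/m) Im(ψ*∂ψ/∂x) − (q/m)A|ψ|², for a hypothetical wave packet, vector potential, and gauge function, before and after the simultaneous replacements (23) and (29). Changing only the phase, without the accompanying change of A, does alter j.

```python
import cmath
import math

HBAR = MASS = CHARGE = 1.0   # illustrative units (assumption): hbar = m = q = 1


def current(psi, a, x, h=1e-5):
    """1D form of Eq. (28): j = (hbar/m) Im(psi* dpsi/dx) - (q/m) A |psi|^2."""
    p = psi(x)
    dp = (psi(x + h) - psi(x - h)) / (2 * h)   # central-difference derivative
    return (HBAR / MASS) * (p.conjugate() * dp).imag - (CHARGE / MASS) * a(x) * abs(p) ** 2


def psi_old(x):
    """Hypothetical wave packet."""
    return cmath.exp(-x**2 / 4 + 1.3j * x)


def a_old(x):
    """Hypothetical vector potential (its x-component)."""
    return 0.7 + 0.2 * math.sin(x)


def chi(x):
    """Hypothetical single-valued gauge function."""
    return 0.5 * x**2 + math.cos(3 * x)


def psi_new(x):
    """Phase change of Eq. (29): psi -> psi * exp(i q chi / hbar)."""
    return psi_old(x) * cmath.exp(1j * CHARGE * chi(x) / HBAR)


def a_new(x):
    """Gauge transformation (23): A -> A + d(chi)/dx."""
    return a_old(x) + x - 3 * math.sin(3 * x)


for x in (-2.0, -0.3, 1.7):
    # columns: original j, fully gauge-transformed j, j with the phase change only
    print(x, current(psi_old, a_old, x), current(psi_new, a_new, x), current(psi_new, a_old, x))
```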


This may be a point of conceptual concern: since quantum interference is described by the spatial dependence of the phase $\varphi$, can the observed interference pattern depend on the gauge function’s choice? (That would not make any sense, because we may change the gauge in our mind.) Fortunately, it cannot, because the phase difference $\varphi_{12}$ between two interfering paths, which participates in Eq. (12), is gauge-transformed as

$$\varphi_{12} \to \varphi_{12} + \frac{q}{\hbar}\left(\chi_2 - \chi_1\right). \qquad (3.30)$$


But $\chi$ has to be a single-valued function of coordinates; hence, in the limit when points 1 and 2 coincide, $\chi_1 = \chi_2$, so that $\varphi_{12}$ is gauge-invariant, and so is the interference pattern.


However, the difference $\varphi_{12}$ may be affected by the magnetic field, even if the field is localized outside the channels in which the particle propagates. Indeed, in this case, the field cannot affect the particle’s velocity $\mathbf v$ or the probability current density $\mathbf j$:


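$$\mathbf j(\mathbf r)\big|_{B\neq0} = \mathbf j(\mathbf r)\big|_{B=0}, \qquad (3.31)$$

so that the last form of Eq. (28) yields

$$\nabla\varphi(\mathbf r)\big|_{B\neq0} = \nabla\varphi(\mathbf r)\big|_{B=0} + \frac{q}{\hbar}\mathbf A(\mathbf r). \qquad (3.32)$$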


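Integrating this equation along the contour $C$ (Fig. 2), for the phase difference between points 1 and 2 we get

$$\varphi_{12}\big|_{B\neq0} = \varphi_{12}\big|_{B=0} + \frac{q}{\hbar}\int_C \mathbf A(\mathbf r)\cdot d\mathbf r, \qquad (3.33)$$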


where the integral should be taken along the same contour $C$ as before (in Fig. 2, from point 1, counterclockwise along the dashed line to point 2). But from classical electrodynamics we know12 that as points 1 and 2 tend to each other, i.e. as the contour $C$ becomes closed, the last integral is just the magnetic flux $\Phi \equiv \int_S B_n\,d^2r$ through any smooth surface $S$ limited by the contour, so that Eq. (33) may be rewritten as

$$\varphi_{12}\big|_{B\neq0} = \varphi_{12}\big|_{B=0} + \frac{q}{\hbar}\Phi. \qquad (3.34a)$$
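The step from Eq. (33) to Eq. (34a) relies on the closed-contour integral of $\mathbf A$ being equal to the enclosed flux, whatever gauge is used. A small numerical sketch, assuming for simplicity a uniform field filling a circular loop (the values of $B$, the loop radius, and its position are hypothetical): the integral is evaluated for two vector potentials of the same field, the axially-symmetric one, $\mathbf A = \mathbf n_\varphi\rho B/2$, and $A_x = 0$, $A_y = B(x - x_0)$.

```python
import math


def loop_integral(A, center, radius, n=10000):
    """Midpoint-rule estimate of the closed-contour integral of A . dr
    around a circle traversed counterclockwise; A maps (x, y) -> (A_x, A_y)."""
    x0, y0 = center
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        x = x0 + radius * math.cos(t)
        y = y0 + radius * math.sin(t)
        ax, ay = A(x, y)
        # dr = radius * (-sin t, cos t) * dt
        total += (-ax * math.sin(t) + ay * math.cos(t)) * radius * dt
    return total


B = 1e-3                    # field inside the loop, T (hypothetical value)
R = 1e-6                    # loop radius, m (hypothetical value)
center = (0.2e-6, -0.1e-6)  # the loop need not be centered at the origin


def a_symmetric(x, y):
    """A = n_phi * B * rho / 2, i.e. (-B y / 2, B x / 2)."""
    return (-B * y / 2, B * x / 2)


def a_shifted(x, y, x0=0.3e-6):
    """A_x = 0, A_y = B (x - x0): another gauge of the same uniform field."""
    return (0.0, B * (x - x0))


flux_exact = math.pi * R**2 * B
flux_sym = loop_integral(a_symmetric, center, R)
flux_shift = loop_integral(a_shifted, center, R)
print(flux_exact, flux_sym, flux_shift)
```

Both gauges give the same loop integral, equal to the flux $\pi R^2 B$, even though the two vector potentials differ at every point of the contour.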


In terms of the interference pattern, this means a shift of interference fringes, proportional to the 
magnetic flux (Fig. 3). 


12 See, e.g., EM Sec. 5.3. 


Fig. 3.3. Typical results of a two-path interference experiment by A. Tonomura et al., Phys. Rev. Lett. 56, 792 (1986), showing the AB effect for electrons well shielded from the applied magnetic field. In this particular experimental geometry, the AB effect produces a relative shift of the interference patterns inside and outside the dark ring. (a) $\Phi = \Phi_0'/2$, (b) $\Phi = \Phi_0'$. © 1986 APS.


This phenomenon is usually called the “Aharonov-Bohm” (or just the AB) effect.13 For particles with a single elementary charge, $q = \mp e$, this result is frequently represented as

$$\varphi_{12}\big|_{B\neq0} = \varphi_{12}\big|_{B=0} \mp 2\pi\frac{\Phi}{\Phi_0'}, \qquad (3.34b)$$

where the fundamental constant $\Phi_0' \equiv 2\pi\hbar/e \approx 4.14\times10^{-15}$ Wb has the meaning of the magnetic flux necessary to change $\varphi_{12}$ by $2\pi$, i.e. to shift the interference pattern (11) by one period. This constant is called the normal magnetic flux quantum, “normal” for reasons we will soon discuss.
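For orientation, the flux quanta and the AB phase shift (34a) can be evaluated directly. A sketch, assuming the exact 2019 SI values of $h$ and $e$; the helper names are illustrative only.

```python
import math

H = 6.62607015e-34      # Planck constant, J s (exact in the 2019 SI)
E = 1.602176634e-19     # elementary charge, C (exact in the 2019 SI)
HBAR = H / (2 * math.pi)

PHI0_NORMAL = 2 * math.pi * HBAR / E        # Phi_0' = 2 pi hbar / e
PHI0_SC = 2 * math.pi * HBAR / (2 * E)      # Phi_0 for the pair charge 2e, Eq. (38)


def ab_phase_shift(flux, charge=-E):
    """Extra phase difference (q/hbar) * Phi of Eq. (34a), in radians."""
    return charge * flux / HBAR


def fringe_shift(flux):
    """Shift of the interference pattern in units of its period, |Phi| / Phi_0'."""
    return abs(flux) / PHI0_NORMAL


print(f"Phi_0' = {PHI0_NORMAL:.4e} Wb")
print(f"Phi_0  = {PHI0_SC:.4e} Wb")
print(ab_phase_shift(PHI0_NORMAL / 2))   # approximately -pi for an electron
print(fringe_shift(PHI0_NORMAL / 2))     # approximately 0.5, half a period
```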


The AB effect may be “almost explained” classically, in terms of Faraday’s electromagnetic induction. Indeed, a change $\Delta\Phi$ of magnetic flux in time induces a vortex-like electric field $\Delta\boldsymbol{\mathcal E}$ around it. That field is not restricted to the magnetic field’s location, i.e. it may reach the particle’s trajectories. The field’s magnitude (or rather its integral along the contour $C$) may be readily calculated by integrating the first of Eqs. (21):

$$\Delta V \equiv \oint_C \Delta\boldsymbol{\mathcal E}\cdot d\mathbf r = -\frac{d\Phi}{dt}. \qquad (3.35)$$


I hope that in this expression the reader readily recognizes the integral (“undergraduate”) form of Faraday’s induction law.14 To calculate the effect of this electric field on the particles, let us assume that the variable separation described by Eq. (1.57) may be applied to the end points 1 and 2 of the particle’s alternative trajectories as two independent systems,15 and that the change of the magnetic flux by a certain amount $\Delta\Phi$ does not change the spatial factors $\psi_{1,2}$, with the phases $\varphi_{1,2}$ included in the time-dependent factors $a_{1,2}$. Then we may repeat the arguments that were used in Sec. 1.6 at the discussion of


13 I prefer the latter, less personable name, because the effect had actually been predicted by Werner Ehrenberg and Raymond Siday in 1949, before it was rediscovered (also theoretically) by Y. Aharonov and D. Bohm in 1959. To be fair to Aharonov and Bohm, it was their work that triggered a wave of interest in the phenomenon, resulting in its first experimental observation by Robert G. Chambers in 1960 and by several other groups soon after that. Later, the experiments were improved using ferromagnetic cores and/or superconducting shielding to provide a better separation between the electrons and the applied field, as in the work whose result is shown in Fig. 3.

14 See, e.g., EM Sec. 6.1. 

15 This assumption may seem a bit of a stretch, but the resulting relation (37) may indeed be proven for a rather realistic model, though that would take more time/space than I can afford.
the Josephson effect, and since the change (35) leads to a change $\Delta U = q\Delta V$ of the potential energy difference between the two points, we may rewrite Eq. (1.72) as

$$\frac{d\varphi_{12}}{dt} = -\frac{\Delta U}{\hbar} = -\frac{q\,\Delta V}{\hbar} = \frac{q}{\hbar}\frac{d\Phi}{dt}. \qquad (3.36)$$

Integrating this relation over the time of the magnetic field’s change, we get

$$\Delta\varphi_{12} = \frac{q}{\hbar}\Delta\Phi, \qquad (3.37)$$

which is, superficially, the same result as Eq. (34).


However, this interpretation of the AB effect is limited. Indeed, it requires the particle to be in the system (on its way from the source to the detector) during the flux change, i.e. when the induced electric field $\mathcal E$ may affect its dynamics. On the contrary, Eq. (34) predicts that the interference pattern would shift even if the field change had been made when there was no particle in the system, so that the field $\mathcal E$ could not be felt by it. Experiment confirms the latter conclusion. Hence, there is something in the space where a particle propagates (i.e., outside of the magnetic field region) that transfers the information about even the static magnetic field to the particle. The standard interpretation of this surprising fact is as follows: the vector potential $\mathbf A$ is not just a convenient mathematical tool, but a physical reality (just as its scalar counterpart $\phi$), despite the large freedom of choice we have in prescribing specific spatial and temporal dependences of these potentials without affecting any observable; see Eqs. (23)-(24).


To conclude this section, let me briefly discuss the very interesting form taken by the AB effect in superconductivity. To be applied to this case, our results require two changes. The first one is simple: since superconductivity may be interpreted as the Bose-Einstein condensate of Cooper pairs with electric charge $q = -2e$, $\Phi_0'$ has to be replaced by the so-called superconducting flux quantum16

$$\Phi_0 \equiv \frac{2\pi\hbar}{2e} = \frac{\pi\hbar}{e} \approx 2.07\times10^{-15}\ \text{Wb} \equiv 2.07\times10^{-7}\ \text{Gs}\cdot\text{cm}^2. \qquad (3.38)$$


Second, since the pairs are Bose particles and are all condensed in the same (ground) quantum state, described by the same wavefunction, the total electric current density, proportional to the probability current density $\mathbf j$, may be extremely large: in practical superconducting materials, up to $\sim10^{12}$ A/m². In these conditions, one cannot neglect the contribution of that current to the magnetic field, and hence to its flux $\Phi$, which (according to the Lenz rule of the Faraday induction law) tries to compensate for changes in the external flux. To see possible results of this contribution, let us consider a closed superconducting loop (Fig. 4). Due to the Meissner effect (which is just another version of the flux self-compensation), the current and magnetic field penetrate into a superconductor only by a small distance (called the London penetration depth) $\delta_L \sim 10^{-7}$ m.17 If the loop is made of a superconducting “wire” that is considerably thicker than $\delta_L$, we may draw a contour deep inside the wire, where the current density is negligible. According to the last form of Eq. (28), everywhere on the contour,


$$\nabla\varphi - \frac{q}{\hbar}\mathbf A = 0. \qquad (3.39)$$


16 One more bad, though common term: a metallic wire may (super)conduct, but a quantum hardly can! 
17 For more detail, see EM Sec. 6.4. 
Integrating this equation along the contour as before (in Fig. 4, from some point 1, all the way around the ring to the virtually coinciding point 2), we need to have the phase difference $\varphi_{12}$ equal to $2\pi n$, with integer $n$, because the wavefunction $\psi \propto \exp\{i\varphi\}$ at the initial and final points 1 and 2 should be “essentially” the same, i.e. produce the same observables. As a result, we get

$$\Phi = \frac{2\pi\hbar}{|q|}\,n = \frac{2\pi\hbar}{2e}\,n \equiv n\Phi_0, \qquad n = 0, \pm1, \pm2, \ldots \qquad (3.40)$$


This is the famous flux quantization effect,18 which justifies the term “magnetic flux quantum” for the constant $\Phi_0$ given by Eq. (38).


Fig. 3.4. The magnetic flux quantization in a superconducting loop (schematically).


Unfortunately, in this course I have no space/time to discuss the very interesting effects of “partial flux quantization” that arise when a superconductor loop is closed with a Josephson junction, forming the so-called Superconducting QUantum Interference Device (“SQUID”). Such devices are used, in particular, for supersensitive magnetometry and ultrafast, low-power computing.19


3.2. Landau levels and quantum Hall effect 


In the last section, we used the Schrödinger equation (27) for an analysis of static magnetic field effects in the “almost-1D”, circular geometries shown in Figs. 1, 2, and 4. However, this equation describes very interesting effects in higher dimensions as well, especially in the 2D case. Let us consider a quantum particle free to move only in the [x, y] plane (say, due to its strong confinement in the perpendicular direction z; see the discussion in Sec. 1.8). Taking the confinement energy as the reference, we may reduce Eq. (27) to a similar equation, but with the Laplace operator acting only in the directions x and y:


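$$\frac{\hbar^2}{2m}\left(-i\mathbf n_x\frac{\partial}{\partial x} - i\mathbf n_y\frac{\partial}{\partial y} - \frac{q}{\hbar}\mathbf A\right)^2\psi = E\psi. \qquad (3.41)$$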


Let us find its solutions for the simplest case when the applied static magnetic field is uniform and perpendicular to the motion plane:

$$\mathbf B = B\mathbf n_z. \qquad (3.42)$$


18 It was predicted in 1949 by Fritz London and discovered (independently and virtually simultaneously) in 1961 by two experimental groups: B. Deaver and W. Fairbank, and R. Doll and M. Näbauer.
19 A brief review of these effects, and recommendations for further reading may be found in EM Sec. 6.5. 
According to the second of Eqs. (21), this relation imposes the following restriction on the choice of the vector potential:

$$\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y} = B, \qquad (3.43)$$


but the gauge transformations still give us a lot of freedom in its choice. The “natural” axially-symmetric form, $\mathbf A = \mathbf n_\varphi\rho B/2$, where $\rho = (x^2 + y^2)^{1/2}$ is the distance from some z-axis, leads to cumbersome math. In 1930, L. Landau realized that the energy spectrum of Eq. (41) may be obtained by making a much simpler, though very counter-intuitive choice:


$$A_x = 0, \qquad A_y = B(x - x_0) \qquad (3.44)$$


(with an arbitrary $x_0$), which evidently satisfies Eq. (43), though it ignores the physical symmetry between the x- and y-directions for the field (42).


Now, expanding the eigenfunction into the Fourier integral in the y-direction,

$$\psi(x, y) = \int X_k(x)\exp\{ik(y - y_0)\}\,dk, \qquad (3.45)$$

we see that for each component of this integral, Eq. (41) yields a specific equation

$$\frac{\hbar^2}{2m}\left\{-i\mathbf n_x\frac{d}{dx} + \mathbf n_y\left[k - \frac{q}{\hbar}B(x - x_0)\right]\right\}^2 X_k = E X_k. \qquad (3.46)$$


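Since the two vectors inside the curly brackets are mutually perpendicular, the square has no cross-terms, so that Eq. (46) reduces to

$$-\frac{\hbar^2}{2m}\frac{d^2X_k}{dx^2} + \frac{q^2B^2}{2m}\left(x - x_0'\right)^2 X_k = E X_k, \qquad \text{where } x_0' \equiv x_0 + \frac{\hbar k}{qB}. \qquad (3.47)$$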
But this 1D Schrödinger equation is identical to Eq. (2.261) for a 1D harmonic oscillator,20 with the center at point $x_0'$, and frequency $\omega$ equal to

$$\omega_c \equiv \frac{|qB|}{m}. \qquad (3.48)$$


In the last expression, it is easy to recognize the cyclotron frequency of the classical particle’s rotation in the magnetic field. (It may be readily obtained using the 2nd Newton law for a circular orbit of radius $r$,

$$m\frac{v^2}{r} = F_L = |q|vB, \qquad (3.49)$$

and noting that the resulting ratio $v/r = |qB|/m$ is just the radius-independent angular velocity $\omega_c$ of the particle’s rotation.) Hence, the energy spectrum for each Fourier component of the expansion (45) is the same:


20 This result may become a bit less puzzling if we recall that in the classical circular cyclotron motion of a particle, each of its Cartesian coordinates, including x, performs sinusoidal oscillations with frequency (48), just as a 1D harmonic oscillator with this frequency.
$$E_n = \hbar\omega_c\left(n + \frac{1}{2}\right), \qquad n = 0, 1, 2, \ldots, \qquad (3.50)$$

independent of $x_0$, $y_0$, and $k$.
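The claim that every Fourier component has the same spectrum (50) can be checked numerically. In units $\hbar = m = \omega_c = 1$ (so that lengths are measured in $(\hbar/m\omega_c)^{1/2}$), Eq. (47) becomes $-\tfrac12 X'' + \tfrac12(\xi - \xi_0')^2X = \varepsilon X$, where the shift $\xi_0'$ carries the dependence on $k$. The sketch below, an illustration under these assumptions rather than anything from the original text, discretizes this equation with central finite differences (with $X = 0$ at the ends of a finite box) and finds the lowest eigenvalues of the resulting tridiagonal matrix by Sturm-sequence bisection, in plain Python.

```python
def sturm_count(diag, off, lam):
    """Number of eigenvalues below lam of a symmetric tridiagonal matrix with
    diagonal `diag` and constant off-diagonal element `off` (Sturm sequence:
    the count of negative pivots in the LDL^T factorization of T - lam*I)."""
    count = 0
    q = diag[0] - lam
    if q < 0:
        count += 1
    for d in diag[1:]:
        if q == 0.0:
            q = 1e-300  # nudge an exact zero pivot to avoid division by zero
        q = d - lam - off * off / q
        if q < 0:
            count += 1
    return count


def lowest_levels(shift, n_levels=4, half_width=10.0, h=0.02, tol=1e-10):
    """Lowest eigenvalues of -X''/2 + (xi - shift)^2 X/2 = eps X, i.e. Eq. (47)
    in units hbar = m = omega_c = 1, with X = 0 at xi = +-half_width."""
    npts = int(round(2 * half_width / h)) - 1
    xi = [-half_width + (i + 1) * h for i in range(npts)]
    off = -0.5 / h**2                      # from -X''/2 by the 3-point formula
    diag = [1.0 / h**2 + 0.5 * (x - shift) ** 2 for x in xi]
    levels = []
    for n in range(n_levels):
        # Gershgorin bounds enclose the whole spectrum
        lo, hi = min(diag) - 2 * abs(off), max(diag) + 2 * abs(off)
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if sturm_count(diag, off, mid) > n:
                hi = mid   # at least n + 1 eigenvalues lie below mid
            else:
                lo = mid
        levels.append(0.5 * (lo + hi))
    return levels


levels_a = lowest_levels(shift=0.0)   # one value of x_0' (i.e. of k)
levels_b = lowest_levels(shift=1.5)   # another value
print([round(e, 4) for e in levels_a])
print([round(e, 4) for e in levels_b])
```

Both lists come out close to $n + 1/2$; the small deviation, of the order of $h^2$, is the discretization error of the three-point second derivative.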


This is a good example of a highly degenerate system: for each eigenvalue $E_n$, there are many similar eigenfunctions that differ only by the positions $\{x_0, y_0\}$ of their centers, and the rate $k$ of their phase change along the y-axis. They may be used to assemble a large variety of linear combinations, including 2D wave packets whose centers move along classical circular orbits. Note, however, that the radius of such rotation cannot be smaller than the so-called Landau radius,

$$r_c \equiv \left(\frac{\hbar}{m\omega_c}\right)^{1/2} = \left(\frac{\hbar}{|qB|}\right)^{1/2}, \qquad (3.51)$$

which characterizes the minimum size of the wave packet, and follows from Eq. (2.276) after the replacement $\omega \to \omega_c$. This radius is remarkably independent of the particle’s mass, and may be interpreted in the following way: the scale $BA_{\min}$ of the applied magnetic field’s flux through the effective area $A_{\min} \equiv 2\pi r_c^2$ of the smallest wave packet is just one normal flux quantum, $\Phi_0' = 2\pi\hbar/|q|$.


A detailed analysis of such wave packets (for which we do not have time in this course), in particular, proves the virtually evident fact: the applied magnetic field does not change the average density $dN_2/dE$ of different 2D states on the energy scale, following from Eq. (1.99), but just “assembles” them on the Landau levels (see Fig. 5a), so that the number of different orbital states on each Landau level (per unit area) is

$$n_L \equiv \frac{N_L}{A} = \frac{1}{A}\frac{dN_2}{dE}\Delta E = \frac{1}{A}\frac{dN_2}{dk}\frac{1}{dE/dk}\hbar\omega_c = \frac{1}{A}\frac{A\,2\pi k}{(2\pi)^2}\frac{1}{\hbar^2k/m}\frac{\hbar|qB|}{m} = \frac{|qB|}{2\pi\hbar}. \qquad (3.52)$$


This expression may again be interpreted in terms of magnetic flux quanta: $n_L\Phi_0' = B$, i.e. there is one orbital state on each Landau level per normal flux quantum.
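To get a feeling for the scales in Eqs. (48) and (50)-(52), here is a sketch evaluating them for a free electron in a field of 10 T. Both the field value and the use of the free-electron mass are assumptions; in semiconductor 2D structures the effective mass of electrons is different, which changes $\omega_c$ but not $r_c$ or $n_L$.

```python
import math

H = 6.62607015e-34        # Planck constant, J s (exact, 2019 SI)
E = 1.602176634e-19       # elementary charge, C (exact, 2019 SI)
M_E = 9.1093837015e-31    # free-electron mass, kg (CODATA 2018)
HBAR = H / (2 * math.pi)


def landau_scales(B, q=E, m=M_E):
    """Return (omega_c, hbar*omega_c, r_c, n_L) for charge q and mass m in field B."""
    omega_c = abs(q * B) / m                    # Eq. (48)
    r_c = math.sqrt(HBAR / abs(q * B))          # Eq. (51), mass-independent
    n_L = abs(q * B) / (2 * math.pi * HBAR)     # Eq. (52), states per m^2
    return omega_c, HBAR * omega_c, r_c, n_L


B = 10.0   # tesla (hypothetical field)
omega_c, gap, r_c, n_L = landau_scales(B)
phi0_normal = 2 * math.pi * HBAR / E
print(f"omega_c      = {omega_c:.4e} rad/s")
print(f"hbar*omega_c = {gap / E * 1e3:.3f} meV")
print(f"r_c          = {r_c * 1e9:.2f} nm")
print(f"n_L          = {n_L:.4e} m^-2; n_L * Phi_0' / B = {n_L * phi0_normal / B:.6f}")
```

The last printed ratio restates the flux-quantum interpretation of Eq. (52): one orbital state per Landau level per normal flux quantum.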


Fig. 3.5. (a) The “assembly” of 2D states on Landau levels, and (b) filling the levels with electrons in the quantum Hall effect.


The most famous application of the Landau level picture is the explanation of the quantum Hall effect.21 It is usually observed in the “Hall bar” geometry sketched in Fig. 6, where an electric current $I$ is passed through a rectangular conducting sample placed into a magnetic field $B$ perpendicular to the


21 It was first observed in 1980 by a group led by Klaus von Klitzing, while the classical version (54) of the effect was first observed by Edwin Hall a century earlier, in 1879.
sample’s plane. The classical analysis of the effect is based on the notion of the Lorentz force (14). As the magnetic field is turned on, this force starts to deflect the effective charge carriers (electrons or holes) from their straight motion between the electrodes, bending them toward the insulated sides of the bar (in Fig. 6, parallel to the x-axis). Here the carriers accumulate, generating a gradually increasing electric field $\mathcal E$, until its force (16) exactly balances the Lorentz force (14):

$$q\mathcal E_y = qv_xB, \qquad (3.53)$$

where $v_x$ is the drift velocity of the carriers along the bar (Fig. 6), providing the sustained balance condition $\mathcal E_y/v_x = B$ at each point of the sample.


Fig. 3.6. The Hall bar geometry. Darker rectangles show external (3D) electrodes.


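With $n_2$ carriers per unit area, in a sample of width $w$, this condition yields the following classical expression for the so-called Hall resistance $R_H$, remarkably independent of $w$ and $l$:

$$R_H \equiv \frac{V_H}{I} = \frac{\mathcal E_y w}{qv_xn_2w} = \frac{B}{qn_2}. \qquad (3.54)$$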


This formula is broadly used in practice for the measurement of the 2D density $n_2$ of the charge carriers, and of the carrier type: electrons with $q = -e < 0$, or holes with the effective charge $q = +e > 0$.
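Eq. (54) is what makes the Hall effect a practical probe of $n_2$ and of the carrier sign. A sketch of that inversion; the function name and the "measured" values are hypothetical.

```python
E = 1.602176634e-19   # elementary charge, C (exact, 2019 SI)


def carriers_from_hall(R_H, B):
    """Invert Eq. (54), R_H = B / (q n2): return (n2 in m^-2, carrier type).
    Since n2 > 0, the sign of R_H * B equals the sign of the carrier charge q."""
    if R_H == 0 or B == 0:
        raise ValueError("R_H and B must be nonzero")
    q = E if R_H * B > 0 else -E
    n2 = B / (q * R_H)
    return n2, ("holes" if q > 0 else "electrons")


# Hypothetical measurement: B = 1 T, R_H = -1.0 kOhm
n2, kind = carriers_from_hall(-1.0e3, 1.0)
print(f"n2 = {n2:.3e} m^-2 ({kind})")
```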


However, in experiments with high-quality (low-defect) 2D well structures, at sub-kelvin temperatures22 and high magnetic fields, the linear growth of $R_H$ with $B$, described by Eq. (54), is interrupted by virtually horizontal plateaus (Fig. 7). Most remarkably, the experimental values of $R_H$ on
these plateaus are reproduced with extremely high accuracy (up to ~10”) from experiment to experiment 
and, even more remarkably, from sample to sample.23 They are described by the following formula: 


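$$R_H = \frac{R_K}{i}, \qquad \text{where } R_K \equiv \frac{2\pi\hbar}{e^2}, \qquad (3.55)$$

so that

$$R_K \approx 25.812\,807\,459\,304\ldots\ \text{k}\Omega, \qquad (3.56)$$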


and $i$ is (only until the end of this section, following tradition!) the plateau number, i.e. a real integer.


22 In some systems, such as graphene (virtually perfect 2D sheets of carbon atoms; see Sec. 4 below), the effect may be more robust against thermal fluctuations, due to their topological properties, so that it may be observed even at room temperature; see, e.g., K. Novoselov et al., Science 315, 1379 (2007). Also note that in some thin ferromagnetic layers, the quantum Hall effect may be observed in the absence of an external magnetic field; see, e.g., M. Götz et al., Appl. Phys. Lett. 112, 072102 (2018) and references therein.

23 Due to this high accuracy (which is a rare exception in solid-state physics!), since 2018 the von Klitzing constant $R_K$ has been used in metrology for the “legal” definition of the ohm, with its value (56) considered fixed; see Appendix CA: Selected Physical Constants.
Fig. 3.7. A typical record of the integer quantum Hall effect. The lower trace (with sharp peaks) shows the diagonal element, $V_x/I_x$, of the resistance tensor. (Adapted from https://www.nobelprize.org/nobel_prizes/physics/laureates/1998/press.html.)
This effect may be explained using the Landau level picture. The 2D sample is typically in weak contact with 3D electrodes whose conduction electrons, at low temperatures, fill all states with energies below a certain Fermi energy $E_F$; see Fig. 5b. According to Eqs. (48) and (50), as $B$ is increased, the spacing $\hbar\omega_c$ between the Landau levels increases proportionately, so that fewer and fewer of these levels are below $E_F$ (and hence have all their states filled in equilibrium), and within certain ranges of field variation, the number $i$ of the filled levels is constant. (In Fig. 5b, $i = 2$.) So, plugging $n_2 = in_L$ and $q = -e$ into Eq. (54), and using Eq. (52) for $n_L$, we get

$$|R_H| = \frac{B}{in_Le} = \frac{B}{ie}\frac{2\pi\hbar}{eB} = \frac{2\pi\hbar}{e^2}\frac{1}{i}, \qquad (3.57)$$


i.e. exactly the experimental result (55). 
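The oversimplified picture above can be turned into a few lines of code: count the Landau levels (50) lying below $E_F$ and divide $R_K$ by that number. A sketch, assuming the exact 2019 SI values of $h$ and $e$, the free-electron mass, a hypothetical Fermi energy of 5 meV, and no spin splitting; it reproduces only the plateau values (55), not the transitions between them.

```python
import math

H = 6.62607015e-34        # Planck constant, J s (exact, 2019 SI)
E = 1.602176634e-19       # elementary charge, C (exact, 2019 SI)
M_E = 9.1093837015e-31    # free-electron mass, kg (CODATA 2018); an assumption here
HBAR = H / (2 * math.pi)

R_K = 2 * math.pi * HBAR / E**2   # von Klitzing constant, Eq. (55)


def filled_levels(B, E_F, mass=M_E):
    """Number i of Landau levels hbar*omega_c*(n + 1/2) below E_F (spin ignored)."""
    hw = HBAR * E * B / mass                   # level spacing, Eqs. (48) and (50)
    return max(0, math.ceil(E_F / hw - 0.5))   # count of n >= 0 with n + 1/2 < E_F/hw


def hall_resistance(B, E_F, mass=M_E):
    """Plateau value R_K / i of Eq. (55); infinite if no level is filled."""
    i = filled_levels(B, E_F, mass)
    return R_K / i if i > 0 else math.inf


E_F = 5e-3 * E   # hypothetical Fermi energy, 5 meV
print(f"R_K = {R_K:.6f} Ohm")
for B in (6.0, 8.0, 10.0, 12.0, 16.0):
    i = filled_levels(B, E_F)
    print(f"B = {B:4.1f} T: i = {i}, R_H = {hall_resistance(B, E_F):.3f} Ohm")
```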


This admittedly oversimplified explanation of the quantum Hall effect does not take into account 
at least two important factors: 


(i) the nonuniformity of the background potential $U(x, y)$ in realistic Hall bar samples, and the role of the quasi-1D edge channels this nonuniformity produces;24 and

(ii) the Coulomb interaction of the electrons, which in high-quality samples leads to the formation of $R_H$ plateaus with not only integer but also fractional values of $i$ (1/3, 2/5, 3/7, etc.).25


Unfortunately, a thorough discussion of these very interesting features is well beyond the framework of this course.26,27


24 Such quasi-1D regions, with a width of the order of $r_c$, form along the lines where the Landau levels cross the Fermi surface, and are actually responsible for all the electron transfer in the quantum Hall effect (giving the pioneering example of what is nowadays called topological insulators). The particle motion along these channels is effectively one-dimensional; because of this, it cannot be affected by modest unintentional nonuniformities of the potential $U(x, y)$. This fact is responsible for the extraordinary accuracy of Eq. (55).

25 This fractional quantum Hall effect was discovered in 1982 by D. Tsui, H. Stormer, and A. Gossard. In 
contrast, the effect described by Eq. (55) with an integer i (Fig. 7) is now called the integer quantum Hall effect. 

26 For a comprehensive discussion of these effects, I can recommend, e.g., either the monograph by D. Yoshioka, 
The Quantum Hall Effect, Springer, 1998, or the review by D. Yennie, Rev. Mod. Phys. 59, 781 (1987). (See also 
the later publications cited above.) 

27 Note also that the quantum Hall effect is sometimes discussed in terms of the so-called Berry phase, one of the 
geometric phases — the notion apparently pioneered by S. Pancharatnam in 1956. However, in the “usual” 
3.3. Scattering and diffraction


The second class of quantum effects, which becomes richer in multi-dimensional spaces, is typically referred to as either diffraction or scattering, depending on the context. In classical physics, these two terms are used to describe very different effects. The term “diffraction” is used for the interference of the waves re-emitted by elementary components of extended objects, under the effect of a single incident wave.28 On the other hand, the term “scattering” is used in classical mechanics to describe the result of the interaction of a beam of incident particles29 with such an extended object, called the scatterer; see Fig. 8.


Fig. 3.8. Scattering (schematically): incident particles, the scatterer (of size $a$), scattered particles, and a detector at a distance $r \gg a, k^{-1}$.


Most commonly, the detector of the scattered particles is located at a large distance $r \gg a$ from the scatterer. In this case, the main observable independent of $r$ is the flux (the number per unit time) of particles scattered in a certain direction, i.e. their flux per unit solid angle $\Omega$. Since it is proportional to the incident flux of particles per unit area, the efficiency of scattering in a particular direction may be characterized by the ratio of these two fluxes. This ratio is called the differential cross-section of the scatterer:


$$\frac{d\sigma}{d\Omega} \equiv \frac{\text{flux of scattered particles per unit solid angle}}{\text{flux of incident particles per unit area}}. \qquad (3.58)$$


Such terminology and notation stem from the fact that the integral of $d\sigma/d\Omega$ over all scattering angles,

$$\sigma \equiv \oint_{4\pi}\frac{d\sigma}{d\Omega}\,d\Omega = \frac{\text{total flux of scattered particles}}{\text{incident flux per unit area}}, \qquad (3.59)$$


evidently having the dimensionality of area, has a simple interpretation as the total cross-section of scattering. For the simplest case when a solid object scatters all classical particles hitting its surface, but does not affect the particles flying by it, $\sigma$ is just the geometric area of the scatterer, as observed from the direction of the incident particles. In classical mechanics, we first calculate the particle’s scattering

quantum Hall effect the Berry phase equals zero, and I believe that this concept should be saved for the discussion 
of more topologically involved systems. Unfortunately, I will have no time/space for a discussion of such systems 
in this course, and have to refer the interested reader to special literature — see, e.g., either the key papers collected 
by A. Shapere and F. Wilczek, Geometric Phases in Physics, World Scientific, 1992, or the monograph by A. 
Bohm et al., The Geometric Phase in Quantum Systems, Springer, 2003. 

28 The notion of interference is very close to diffraction, but the former term is typically reserved for the wave re-emission by just a few components, such as two slits in the Young experiment; see Figs. 1 and 2. A detailed discussion of diffraction and interference of electromagnetic waves may be found in EM Secs. 8.3-8.8.

29 In the context of classical waves, the term “scattering” is typically reserved for wave interaction with 
disordered sets of small objects — see, e.g., EM Sec. 8.3. 
angle as a function of its impact parameter $b$ and then average the result over all values of $b$, considered random.30


In quantum mechanics, due to the particle/wave duality, a relatively broad, parallel beam of 
incident particles of the same energy E may be fairly represented with a plane de Broglie wave (1.88): 


$$\psi_i = a_i\exp\{i\mathbf k_i\cdot\mathbf r\}, \qquad (3.60)$$

with a constant amplitude $a_i$ and the free-space wave number $k_i = k = (2mE)^{1/2}/\hbar$. As a result, particle scattering becomes a synonym of de Broglie wave diffraction, and (somewhat counter-intuitively) the description of the effect becomes simpler, excluding the notion of the impact parameter. Indeed, the wave (60) corresponds to a constant probability current density (1.49):


$$\mathbf j_i = |\psi_i|^2\frac{\hbar}{m}\mathbf k_i, \qquad (3.61)$$

which is exactly the flux of incident particles per unit area that is used in the denominator of Eq. (58), while the numerator of that fraction may be simply expressed via the probability current density $j_s$ of the scattered de Broglie waves:

$$\frac{d\sigma}{d\Omega} = \frac{j_s r^2}{j_i}. \qquad (3.62)$$


Hence our task is reduced to the calculation of $j_s$ at sufficiently large distances $r$ from the scatterer. For that, let us rewrite the stationary Schrödinger equation for the elastic scattering problem (when the energy $E$ of the scattered particles is the same as that of the incident particles) in the form

$$(E - \hat H_0)\psi = U(\mathbf r)\psi, \qquad \text{with } \hat H_0 \equiv -\frac{\hbar^2}{2m}\nabla^2 \text{ and } E = \frac{\hbar^2k^2}{2m}, \qquad (3.63)$$
2m 2m 


where the potential energy $U(\mathbf r)$ describes the effect of the scatterer. Looking for the solution of Eq. (63) in the natural form

$$\psi = \psi_i + \psi_s, \qquad (3.64)$$


where $\psi_i$ is the incident wave (60) and $\psi_s$ has the sense of the scattered wave, and taking into account that the former wave satisfies the free-space Schrödinger equation

$$\hat H_0\psi_i = E\psi_i, \qquad (3.65)$$


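we may reduce Eq. (63) to either of the following equivalent forms:

$$(E - \hat H_0)\psi_s = U(\mathbf r)(\psi_i + \psi_s), \qquad \left(\nabla^2 + k^2\right)\psi_s = \frac{2m}{\hbar^2}U(\mathbf r)\psi. \qquad (3.66)$$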


For applications, an integral form of this equation is more convenient. To derive it, we may look at the second of Eqs. (66) as a linear, inhomogeneous differential equation for the function $\psi_s$, thinking of its right-hand side as a known “source”. The solution of such an equation obeys the linear superposition principle, i.e. we may represent it as the sum of the waves outgoing from all elementary volumes $d^3r'$ of the scatterer. Mathematically, this sum may be expressed as either


30 See, e.g., CM Sec. 3.5. 
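$$\psi_s(\mathbf r) = \frac{2m}{\hbar^2}\int U(\mathbf r')\psi(\mathbf r')G(\mathbf r, \mathbf r')\,d^3r', \qquad (3.67a)$$

or, equivalently, as31

$$\psi(\mathbf r) = \psi_i(\mathbf r) + \frac{2m}{\hbar^2}\int U(\mathbf r')\psi(\mathbf r')G(\mathbf r, \mathbf r')\,d^3r', \qquad (3.67b)$$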


where $G(\mathbf r, \mathbf r')$ is the spatial Green’s function, defined as the elementary, spherically-symmetric response of the 3D Helmholtz equation to a point source, i.e. the outward-propagating solution of the following equation:32


$$\left(\nabla^2 + k^2\right)G = \delta(\mathbf r - \mathbf r'). \qquad (3.68)$$
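But we already know such a solution of this equation; see Eq. (7) and its discussion:

$$G(\mathbf r, \mathbf r') = \frac{f_G}{R}e^{ikR}, \qquad \text{where } R \equiv |\mathbf R|, \quad \mathbf R \equiv \mathbf r - \mathbf r', \qquad (3.69)$$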


so that we just need to calculate the coefficient $f_G$ for Eq. (68). This can be done in several ways, for example by noticing that at $R \ll k^{-1}$, the second term on the left-hand side of Eq. (68) is negligible, so that the equation reduces to the well-known Poisson equation with a delta-functional right-hand side, which describes, for example, the electrostatic potential induced by a point electric charge. Either recalling the Coulomb law or applying the Gauss theorem, we readily get the asymptote


$$G \to -\frac{1}{4\pi R} \quad \text{at } kR \ll 1, \qquad (3.70)$$

which is compatible with Eq. (69) only if $f_G = -1/4\pi$, i.e. if

$$G(\mathbf r, \mathbf r') = -\frac{1}{4\pi R}e^{ikR}. \qquad (3.71)$$
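The Green’s function (71) can be checked numerically on both counts used above: away from $R = 0$ it should satisfy the homogeneous Helmholtz equation, and the outward flux of $\nabla G$ through a small sphere around $R = 0$ should tend to 1, the weight of the delta function in Eq. (68). A sketch with finite differences, assuming arbitrary (hypothetical) values of $k$ and $R$; it uses the radial form of the Laplacian, valid for spherically symmetric functions.

```python
import cmath
import math


def G(R, k):
    """Outgoing Green's function of Eq. (71): -exp(ikR) / (4 pi R)."""
    return -cmath.exp(1j * k * R) / (4 * math.pi * R)


def helmholtz_residual(R, k, h=1e-4):
    """(nabla^2 + k^2) G at R > 0, using nabla^2 G = (1/R) d^2(R G)/dR^2
    and a central second difference."""
    def u(r):
        return r * G(r, k)
    d2u = (u(R + h) - 2 * u(R) + u(R - h)) / h**2
    return d2u / R + k**2 * G(R, k)


def flux_through_sphere(eps, k):
    """Outward flux of grad G through a sphere of radius eps, 4 pi eps^2 dG/dR;
    it should tend to 1 as eps -> 0."""
    h = 1e-3 * eps
    dG = (G(eps + h, k) - G(eps - h, k)) / (2 * h)
    return 4 * math.pi * eps**2 * dG


k = 2.0                                  # hypothetical wave number
residual = helmholtz_residual(1.3, k)    # should be ~0 (finite-difference error only)
flux = flux_through_sphere(1e-4, k)      # should be close to 1
near = 1e-7 * G(1e-7, k)                 # R*G -> -1/(4 pi) as kR -> 0, Eq. (70)
print(abs(residual), flux, near * 4 * math.pi)
```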


Plugging this result into Eq. (67a), we get the following formal solution of Eq. (66):

$$\psi_s(\mathbf r) = -\frac{m}{2\pi\hbar^2}\int U(\mathbf r')\psi(\mathbf r')\frac{e^{ikR}}{R}\,d^3r'. \qquad (3.72)$$


Note that if the function $U(\mathbf r)$ is smooth, the singularity in the denominator is integrable (i.e. not dangerous); indeed, the contribution of a sphere of some radius $\varepsilon > 0$, centered at the point $\mathbf r' = \mathbf r$, to this integral scales as


31 This formula is sometimes called the Lippmann-Schwinger equation, though more frequently this term is
reserved for either its operator form or the resulting equation for the spatial Fourier components of y and yw. 

32 Please notice both the similarity and difference between this Green’s function and the propagator discussed in 
Sec. 2.1. In both cases, we use the linear superposition principle to solve wave equations, but while Eq. (67) gives 
the solution of the inhomogeneous equation (66), Eq. (2.44) does that for a homogeneous Schrödinger equation.
In the latter case, the elementary wave sources are the elementary parts of the initial wavefunction, rather than of 
the equation’s right-hand side as in our current problem. 

33 See, e.g., EM Sec. 1.2. 
$$\int_{R<\varepsilon}\frac{d^3R}{R} = 4\pi\int_0^\varepsilon\frac{R^2\,dR}{R} = 4\pi\int_0^\varepsilon R\,dR = 2\pi\varepsilon^2 \to 0. \qquad (3.73)$$


So far, our result (72) is exact, but its apparent simplicity is deceiving, because the wavefunction $\psi$ on its right-hand side generally includes not only the incident wave $\psi_i$ but also the scattered wave $\psi_s$; see Eq. (64). The most straightforward, and most common, simplification of this problem, called the Born approximation,34 is possible if the scattering potential $U(\mathbf r)$ is in some sense small. (We will derive the quantitative condition of this smallness in a minute.) Since at $U(\mathbf r) = 0$ the scattered wave $\psi_s$ has to disappear, at small but nonzero $U(\mathbf r)$, $|\psi_s|$ has to be much smaller than $|\psi_i|$. In this case, on the right-hand side of Eq. (72) we may ignore $\psi_s$ in comparison with $\psi_i$, getting


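$$\psi_s(\mathbf r) \approx -\frac{m}{2\pi\hbar^2}\int U(\mathbf r')\psi_i(\mathbf r')\frac{e^{ikR}}{R}\,d^3r', \qquad \psi_i(\mathbf r') \propto \exp\{i\mathbf k_i\cdot\mathbf r'\}. \qquad (3.74)$$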


Actually, Eq. (74) gives us even more than we wanted: it evaluates the scattered wave at any point, including those within the scattering object, while to spell out Eq. (62), we only need to find the wave far from the scatterer, at $r \to \infty$. However, before going to that limit, we can use this general formula to find a quantitative criterion of the Born approximation’s validity. For that, let us estimate the magnitude of the right-hand side of this equation for a scatterer of a linear size $\sim a$, and the potential magnitude’s scale $U_0$. The results are different in the following two limits:


(i) If $ka \ll 1$, then inside the scatterer (i.e., at distances $\Delta r' \sim a$), both $\exp\{i\mathbf k_i\cdot\mathbf r'\}$ and the second exponent under the integral in Eq. (74) change little, so that a crude but fair estimate of the solution’s magnitude is


$$|\psi_s| \sim \frac{m}{2\pi\hbar^2}U_0a^2|\psi_i|. \qquad (3.75)$$

(ii) In the opposite limit $ka \gg 1$, the function under the integral is nearly periodic in one of the spatial directions (that of the scattered wave propagation), so that the net integral accumulates only over distances of the order of the de Broglie wavelength, $\sim k^{-1}$, and the integral is correspondingly smaller:


$$|\psi_s| \sim \frac{m}{2\pi\hbar^2}U_0\frac{a}{k}|\psi_i|. \qquad (3.76)$$

These relations allow us to spell out the Born approximation’s condition, $|\psi_s| \ll |\psi_i|$, as


$$U_0 \ll \frac{\hbar^2}{ma^2}\max[ka, 1]. \qquad (3.77)$$


In the fraction on the right-hand side, we may readily recognize the scale of the kinetic (quantum-confinement) energy $E_a$ of the particle inside a potential well of a size of the order of $a$, so that the Born approximation is valid essentially if the potential energy of the particle’s interaction with the scatterer is


34 Named after M. Born, who was the first to apply this approximation in quantum mechanics. However, the 
basic idea of this approach had been developed much earlier (in 1881) by Lord Rayleigh in the context of 
electromagnetic wave scattering — see, e.g., EM Sec. 8.3. Note also that the contents of that section repeat some 
aspects of our current discussion — perhaps regrettably but unavoidably so, because the Born approximation is a 
centerpiece of the theory of scattering/diffraction for both the electromagnetic waves and the de Broglie waves. 
Hence I felt I had to cover it in this course for the benefit of the readers who skipped the EM part of my series. 
smaller than $E_a$. Note, however, that the estimates (75) and (76) are not valid in some special situations when the effects of scattering accumulate in some direction. This is frequently the case for small angles $\theta$ of scattering by extended objects, when $ka \gg 1$, but $ka\theta \lesssim 1$.
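Condition (77) is easy to tabulate. A sketch in units $\hbar = m = a = 1$, with hypothetical values of $U_0$ and $k$, returning the ratio of $U_0$ to the right-hand side of Eq. (77); values much smaller than 1 indicate that the Born approximation should work (with the caveat about small-angle scattering noted above).

```python
def born_ratio(U0, a, k, m=1.0, hbar=1.0):
    """U0 divided by the right-hand side of Eq. (77), (hbar^2 / (m a^2)) max(ka, 1).
    The Born approximation requires this ratio to be much less than 1."""
    return U0 / (hbar**2 / (m * a**2) * max(k * a, 1.0))


# Units hbar = m = a = 1 (assumed); hypothetical potential scales and wave numbers
for U0, k in [(0.1, 0.1), (0.1, 50.0), (5.0, 0.5), (5.0, 100.0)]:
    print(f"U0 = {U0}, ka = {k}: ratio = {born_ratio(U0, 1.0, k):.3g}")
```

The last two lines illustrate a consequence of the $\max[ka, 1]$ factor: a potential too strong for the Born approximation at low energies may become weak enough at $ka \gg 1$.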


Now let us proceed to large distances $r \gg r' \sim a$, and simplify Eq. (74) using an approximation similar to the dipole expansion in electrodynamics.35 Namely, in the denominator’s $R$, we may ignore $r'$ in comparison with the much larger $r$, but the exponents require more care, because even if $r' \sim a \ll r$, the product $kr' \sim ka$ may still be of the order of 1. In the first approximation in $r'$, we can take (Fig. 9a):


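$$R \equiv |\mathbf r - \mathbf r'| \approx r - \mathbf n_r\cdot\mathbf r', \tag{3.78}$$

and since the directions of the vectors k and r coincide, i.e. k = k n_r, we get

$$kR \approx kr - \mathbf k\cdot\mathbf r', \quad\text{so that}\quad e^{ikR} \approx e^{ikr}e^{-i\mathbf k\cdot\mathbf r'}. \tag{3.79}$$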


[Figure: panels (a) and (b); panel (a) shows R, r, and the direction toward the detector.]

Fig. 3.9. (a) The long-range expansion of R, and (b) the definitions of q, γ, and θ.


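With this replacement, Eq. (74) yields

$$\psi_s(\mathbf r) \approx -\frac{m}{2\pi\hbar^2}A_i\frac{e^{ikr}}{r}\int U(\mathbf r')\exp\{-i(\mathbf k - \mathbf k_i)\cdot\mathbf r'\}\,d^3r'. \tag{3.80}$$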


This relation is a particular case of a more general formula³⁶

$$\psi(\mathbf r) \approx A_i\left[e^{i\mathbf k_i\cdot\mathbf r} + f(\mathbf k,\mathbf k_i)\frac{e^{ikr}}{r}\right], \tag{3.81}$$


where f(k, k_i) is called the scattering function.³⁷ The physical sense of this function becomes clear from the calculation of the corresponding probability current density j_s. For that, generally, we need to use Eq. (1.47) with the gradient operator having all spherical-coordinate components.³⁸ However, at kr >> 1, the main contribution to ∇ψ_s, proportional to k >> 1/r, is provided by differentiating the factor e^{ikr}, which changes in the common direction of the vectors r and k, so that


35 See, e.g., EM Sec. 8.2. 

36 It is easy to prove that this form is an asymptotic form of any solution ψ_s of the scattering problem (even beyond the Born approximation) at sufficiently large distances r >> a, 1/k.

37 Note that the function f has the dimension of length, and does not account for the incident wave. This is why 
sometimes a dimensionless function, S = 1 + 2ikf, is used instead. This function S is called the scattering matrix, 
because it may be considered a natural generalization of the 1D matrix S defined by Eq. (2.124), to higher 
dimensionality. 

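38 See, e.g., MA Eq. (10.8).

$$\nabla\psi_s \approx \mathbf n_r\frac{\partial\psi_s}{\partial r} \approx i\mathbf k\psi_s, \quad\text{at } kr \gg 1, \tag{3.82}$$

and Eq. (1.47) yields

$$\mathbf j_s \approx \frac{\hbar\mathbf k}{m}|\psi_s|^2 = \frac{\hbar\mathbf k}{m}\frac{|A_i|^2}{r^2}|f(\mathbf k,\mathbf k_i)|^2. \tag{3.83}$$

Plugging this expression, and also Eq. (61), into Eq. (62), for the differential cross-section we get simply

$$\frac{d\sigma}{d\Omega} = |f(\mathbf k,\mathbf k_i)|^2, \tag{3.84}$$

while the total cross-section is

$$\sigma = \oint_{4\pi}|f(\mathbf k,\mathbf k_i)|^2\,d\Omega, \tag{3.85}$$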


so that the scattering function f(k, k_i) gives us everything we need, and in fact more, because the
function also contains information about the phase of the scattered wave. 


According to Eq. (80), in the Born approximation the scattering function is reduced to the so-called Born integral

$$f(\mathbf k,\mathbf k_i) = -\frac{m}{2\pi\hbar^2}\int U(\mathbf r)\,e^{-i\mathbf q\cdot\mathbf r}\,d^3r, \tag{3.86}$$

where for the notation simplicity r' is replaced with r, and the following scattering vector is introduced:

$$\mathbf q \equiv \mathbf k - \mathbf k_i, \tag{3.87}$$

with the length q = 2k sin(θ/2), where θ is the scattering angle between the vectors k and k_i (see Fig. 9b). For the differential cross-section, Eqs. (84) and (86) yield³⁹
$$\frac{d\sigma}{d\Omega} = \left(\frac{m}{2\pi\hbar^2}\right)^2\left|\int U(\mathbf r)\,e^{-i\mathbf q\cdot\mathbf r}\,d^3r\right|^2. \tag{3.88}$$


This is the main result of this section; it may be further simplified for spherically-symmetric scatterers, with

$$U(\mathbf r) = U(r). \tag{3.89}$$


In this case, it is convenient to represent the exponent in the Born integral as exp{−iqr' cos γ}, where γ is the angle between the vectors r' and q (rather than the incident wave vector k_i!); for the definition of q, see Fig. 9b. Now, for a fixed q, we can take this vector's direction for the polar axis of a spherical coordinate system, and reduce Eq. (86) to a 1D integral:


$$\begin{aligned}f(\mathbf k,\mathbf k_i) &= -\frac{m}{2\pi\hbar^2}\int_0^\infty r'^2dr'\,U(r')\int_0^{2\pi}d\varphi\int_0^\pi\sin\gamma\,d\gamma\,\exp\{-iqr'\cos\gamma\} \\ &= -\frac{m}{2\pi\hbar^2}\int_0^\infty r^2dr\,U(r)\,2\pi\,\frac{2\sin qr}{qr} = -\frac{2m}{\hbar^2q}\int_0^\infty U(r)\sin(qr)\,r\,dr.\end{aligned} \tag{3.90}$$

39 Note that according to Eq. (88), in the Born approximation the scattering intensity does not depend on the sign of the potential U, and also that scattering in a certain direction is completely determined by a specific Fourier component of the function U(r), namely by its harmonic with the wave vector equal to the scattering vector q.


As a simple example, let us use the Born approximation to analyze scattering on the following spherically-symmetric potential:

$$U(r) = U_0\exp\left\{-\frac{r^2}{2a^2}\right\}. \tag{3.91}$$


In this particular case, it is better to avoid the temptation to exploit the spherical symmetry by using Eq. 
(90), and instead, use the general Eq. (88), because it may be represented as a product of three similar 
Cartesian factors: 


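$$f(\mathbf k,\mathbf k_i) = -\frac{mU_0}{2\pi\hbar^2}I_xI_yI_z, \quad\text{with}\quad I_x = \int_{-\infty}^{+\infty}\exp\left\{-\frac{x^2}{2a^2} - iq_xx\right\}dx, \tag{3.92}$$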


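and similar integrals for I_y and I_z. From Chapter 2, we already know that Gaussian integrals like I_x may be readily worked out by completing the exponent to a full square, in our current case giving

$$I_x = (2\pi)^{1/2}a\exp\left\{-\frac{q_x^2a^2}{2}\right\}, \quad\text{so that}\quad f(\mathbf k,\mathbf k_i) = -(2\pi)^{1/2}\frac{mU_0a^3}{\hbar^2}\exp\left\{-\frac{q^2a^2}{2}\right\}, \quad \frac{d\sigma}{d\Omega} = 2\pi\left(\frac{mU_0a^3}{\hbar^2}\right)^2\exp\{-q^2a^2\}. \tag{3.93}$$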
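As a numerical cross-check of the radial formula (90), here is a minimal Python sketch (an illustration, not part of the original derivation). It assumes units with ħ = m = 1, illustrative values of U_0 and a, a hypothetical cutoff `r_max` for the r-integral, and a composite Simpson rule. For the Gaussian potential (91), the result should reproduce the closed form f = −(2π)^{1/2}(mU_0a³/ħ²)exp{−q²a²/2} that follows from completing the square.

```python
import math

def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3

def born_amplitude_radial(U, q, r_max):
    """Eq. (90) with hbar = m = 1: f = -(2/q) * int_0^r_max U(r) sin(qr) r dr, for q > 0."""
    integral = simpson(lambda r: U(r) * math.sin(q * r) * r, 0.0, r_max)
    return -2.0 / q * integral

U0, a = 0.1, 1.0   # illustrative values, not taken from the text

def gaussian_potential(r):
    """Eq. (91): U(r) = U0 exp{-r^2 / 2a^2}."""
    return U0 * math.exp(-r**2 / (2 * a**2))

for q in (0.5, 1.0, 2.0):
    numeric = born_amplitude_radial(gaussian_potential, q, r_max=12 * a)
    closed = -math.sqrt(2 * math.pi) * U0 * a**3 * math.exp(-(q * a) ** 2 / 2)
    print(f"qa = {q * a:.1f}: numeric {numeric:.8f}, closed form {closed:.8f}")
```

The cutoff 12a is arbitrary; it only has to be large enough for the Gaussian tail to be negligible.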


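Now, the total cross-section σ is an integral of dσ/dΩ over all directions of the vector k. Since in our case the scattering intensity does not depend on the azimuthal angle φ, the only nontrivial integration is over the scattering angle θ (see Fig. 9b):

$$\begin{aligned}\sigma &= \oint_{4\pi}\frac{d\sigma}{d\Omega}d\Omega = 2\pi\int_0^\pi\frac{d\sigma}{d\Omega}\sin\theta\,d\theta = 4\pi^2a^2\left(\frac{mU_0a^2}{\hbar^2}\right)^2\int_0^\pi\exp\left\{-\left(2ka\sin\frac{\theta}{2}\right)^2\right\}\sin\theta\,d\theta \\ &= 4\pi^2a^2\left(\frac{mU_0a^2}{\hbar^2}\right)^2\int_{\theta=0}^{\theta=\pi}\exp\{-2k^2a^2(1-\cos\theta)\}\,d(1-\cos\theta) = \frac{2\pi^2}{k^2}\left(\frac{mU_0a^2}{\hbar^2}\right)^2\left[1 - e^{-4k^2a^2}\right].\end{aligned} \tag{3.94}$$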
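The closed form (94) can be checked in the same spirit. The hedged sketch below (again assuming ħ = m = 1 and illustrative parameter values) integrates dσ/dΩ = 2π(mU_0a³/ħ²)²exp{−q²a²}, with q = 2k sin(θ/2), over the full solid angle, and compares the result with (2π²/k²)(mU_0a²/ħ²)²[1 − e^{−4k²a²}] and with its low-energy limit 8π²a²(mU_0a²/ħ²)².

```python
import math

def dsigma_domega(theta, k, U0, a):
    """Born cross-section for the Gaussian potential (91), hbar = m = 1."""
    q = 2 * k * math.sin(theta / 2)            # length of the scattering vector q
    return 2 * math.pi * (U0 * a**3) ** 2 * math.exp(-(q * a) ** 2)

def sigma_numeric(k, U0, a, n=4000):
    """sigma = 2*pi * int_0^pi (dsigma/dOmega) sin(theta) dtheta, Simpson rule (n even)."""
    h = math.pi / n
    total = 0.0
    for i in range(n + 1):
        theta = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * dsigma_domega(theta, k, U0, a) * math.sin(theta)
    return 2 * math.pi * total * h / 3

def sigma_closed(k, U0, a):
    """Closed form; -expm1(-x) gives 1 - exp(-x) without cancellation at small ka."""
    return 2 * math.pi**2 / k**2 * (U0 * a**2) ** 2 * -math.expm1(-4 * (k * a) ** 2)

U0, a = 0.1, 1.0   # illustrative values
for k in (0.1, 1.0, 5.0):
    print(f"ka = {k * a}: numeric {sigma_numeric(k, U0, a):.6e}, closed {sigma_closed(k, U0, a):.6e}")
print("low-energy limit:", 8 * math.pi**2 * a**2 * (U0 * a**2) ** 2)
```

At ka = 5 almost all of the integral comes from θ ≲ 1/ka, which is why a fairly fine angular grid is used.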


Let us analyze these results. In the low-energy limit, ka << 1 (and hence qa << 1 for any scattering angle), the scattered wave is virtually isotropic: dσ/dΩ ≈ const, a very typical feature of scalar-wave scattering⁴⁰ by small objects, in any approximation. Note that according to Eq. (77), the Born expression for σ, following from Eq. (94) in this limit,

$$\sigma \approx 8\pi^2a^2\left(\frac{mU_0a^2}{\hbar^2}\right)^2, \tag{3.95}$$


40 Note that this is only true for scalar (e.g., the de Broglie) waves, and different for vector ones, in particular the 
electromagnetic waves, where the intensity of the dipole radiation, and hence the scattering by small objects 
vanishes in the direction of the incident field's polarization; see, e.g., EM Eqs. (8.26) and (8.139).

is only valid if σ is much smaller than the scale a² of the physical cross-section of the scatterer. In the opposite, high-energy limit ka >> 1, the scattering is dominated by small angles θ ~ q/k ~ 1/ka:

$$\frac{d\sigma}{d\Omega} \approx 2\pi\left(\frac{mU_0a^3}{\hbar^2}\right)^2\exp\{-k^2a^2\theta^2\}. \tag{3.96}$$


This is, again, very typical for diffraction. Note, however, that due to the smooth character of the Gaussian potential (91), the diffraction pattern (96) exhibits no oscillations of dσ/dΩ as a function of the diffraction angle θ.


Such oscillations naturally appear for scatterers with sharp borders. Indeed, let us consider a 
uniform spherical scatterer, described by the potential 


$$U(r) = \begin{cases}U_0, & \text{for } r \lt R,\\ 0, & \text{otherwise.}\end{cases} \tag{3.97}$$


In this case, integration by parts of Eq. (90) readily yields

$$f(\mathbf k,\mathbf k_i) = \frac{2mU_0}{\hbar^2q^3}(qR\cos qR - \sin qR), \quad\text{so that}\quad \frac{d\sigma}{d\Omega} = \left(\frac{2mU_0}{\hbar^2q^3}\right)^2(qR\cos qR - \sin qR)^2. \tag{3.98}$$

According to this result, the scattered wave's intensity drops very fast with q, so that one needs a semi-log plot (such as shown in Fig. 10) to reveal small diffraction fringes,⁴¹ with the n-th destructive interference (zero-intensity) point tending to qR = π(n + 1/2) at n → ∞. Since, as Fig. 9b shows, q may only change from 0 to 2k, these intensity minima are only observable at sufficiently large values of the parameter kR, when they correspond to real values of the scattering angle θ. (At kR >> 1, approximately 2kR/π of these minima, i.e. "dark rings" of low scattering probability, are observable.) On the contrary, at kR << 1 all allowed values of qR are much smaller than 1, and in this limit the differential cross-section does not depend on qR, i.e. the scattering by the sphere (as by any object in this limit) is isotropic.
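The positions of the "dark rings" follow from the zeros of the bracket qR cos qR − sin qR in Eq. (98), i.e. from tan(qR) = qR. The following minimal sketch (not from the original text; function names are illustrative) locates them by bisection on the intervals [nπ, (n + 1/2)π], each of which contains exactly one zero because the bracket's derivative, −qR sin qR, keeps its sign there, and counts the zeros that fall within the kinematically allowed range qR ≤ 2kR.

```python
import math

def bracket(x):
    """The factor qR cos(qR) - sin(qR) of Eq. (98), with x = qR."""
    return x * math.cos(x) - math.sin(x)

def nth_zero(n, tol=1e-12):
    """The n-th positive zero (n >= 1), by bisection on [n*pi, (n + 1/2)*pi]."""
    lo, hi = n * math.pi, (n + 0.5) * math.pi
    lo_positive = bracket(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (bracket(mid) > 0) == lo_positive:
            lo = mid            # same sign as the left end: the zero lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def observable_minima(kR):
    """Number of zeros with qR <= 2kR, i.e. at real scattering angles."""
    count = 0
    while nth_zero(count + 1) <= 2 * kR:
        count += 1
    return count

print([round(nth_zero(n), 4) for n in (1, 2, 3)])   # approx. 4.4934, 7.7253, 10.9041
for kR in (1, 10, 50):
    print(kR, observable_minima(kR), round(2 * kR / math.pi, 1))
```

The last loop compares the exact count with the estimate 2kR/π for a few values of kR.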


[Figure: semi-log plot of (1/σ_g u_0²) dσ/dΩ (logarithmic vertical axis) versus qR/π (horizontal axis, from 0 to 5).]

Fig. 3.10. The differential cross-section of the Born scattering of a particle by a "hard" (sharp-border) sphere (97), normalized to its geometric cross-section σ_g = πR² and the square of the potential's magnitude parameter u_0 ≡ U_0/(ħ²/2mR²), as a function of the normalized magnitude of the scattering vector q.


This example shows that in quantum mechanics the notions of particle scattering and diffraction 
are essentially inseparable. 


41 Their physics is very similar to that of the Fraunhofer diffraction on a 1D scatterer; see, e.g., EM Sec. 8.4.

The Born approximation, while being very simple and used more than any other scattering theory, is not without substantial shortcomings, as becomes clear from the following example. It is not too difficult to prove the so-called optical theorem, strictly valid for an arbitrary scatterer:

$$\operatorname{Im}f(\mathbf k_i,\mathbf k_i) = \frac{k}{4\pi}\sigma. \tag{3.99}$$

However, Eq. (86) shows that in the Born approximation, the function f is purely real at q = 0 (i.e. for k = k_i), and hence cannot satisfy the optical theorem. Even more evidently, it cannot describe such a simple effect as a dark shadow (ψ ≈ 0) cast by a virtually opaque object (say, with U >> E). There are several ways to improve the Born approximation, while still keeping its general idea of an approximate treatment of U.


(i) Instead of the main assumption ψ_s ∝ U_0, we may use a complete perturbation series:

$$\psi_s = \psi_1 + \psi_2 + \ldots, \tag{3.100}$$

with ψ_n ∝ U_0ⁿ, and find successive approximations ψ_n one by one. In the 1st approximation we return to the Born formula, but already the 2nd approximation yields

$$\operatorname{Im}f_2(\mathbf k_i,\mathbf k_i) = \frac{k}{4\pi}\sigma_1, \tag{3.101}$$

where σ_1 is the total cross-section calculated in the 1st approximation, so that the optical theorem (99) is "almost satisfied".


(ii) As was mentioned above, the Born approximation does not work very well for objects stretching along the direction (say, x) of the initial wave vector k_i. This deficiency may be corrected by the so-called eikonal⁴² approximation, which replaces the plane-wave representation (60) of the incident wave with a WKB-like exponent, though still in the 1st approximation in U → 0:

$$\exp\{ik_ix\} \to \exp\left\{i\int^x k(x')\,dx'\right\} = \exp\left\{i\int^x\frac{[2m(E - U(x',y,z))]^{1/2}}{\hbar}dx'\right\} \approx \exp\left\{ik_ix - i\frac{m}{\hbar^2k_i}\int^x U(x',y,z)\,dx'\right\}. \tag{3.102}$$

Results of this approach satisfy the optical theorem (99) already in the 1st approximation.
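The last step of Eq. (102) is just the first-order expansion k(x) ≈ k_i − mU/ħ²k_i of the local wave number. A minimal sketch checking it assumes ħ = m = 1, a hypothetical 1D Gaussian profile of U along the particle's path, and U < E everywhere; it compares the WKB-like phase shift ∫[k(x) − k_i]dx with the eikonal estimate −(m/ħ²k_i)∫U dx.

```python
import math

def phase_shifts(U, E, x_max, n=4000):
    """(WKB-like, eikonal) phase shifts accumulated over [-x_max, x_max],
    relative to the free wave exp{i k_i x}; hbar = m = 1, requires U < E."""
    k_i = math.sqrt(2 * E)
    h = 2 * x_max / n
    exact = eikonal = 0.0
    for i in range(n + 1):
        x = -x_max + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)   # Simpson weights (n even)
        exact += w * (math.sqrt(2 * (E - U(x))) - k_i)   # k(x) - k_i
        eikonal += w * (-U(x) / k_i)                     # -(m / hbar^2 k_i) U(x)
    return exact * h / 3, eikonal * h / 3

U0, a, E = 0.01, 1.0, 1.0      # illustrative values with U0 << E

def profile(x):
    """Hypothetical potential profile along the path."""
    return U0 * math.exp(-x**2 / (2 * a**2))

exact, eikonal = phase_shifts(profile, E, x_max=10 * a)
print(exact, eikonal, abs(exact - eikonal) / abs(exact))
```

The relative difference printed last is of the order of U_0/E, as expected for a first-order expansion.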


Another way toward quantitative results in the theory of scattering, beyond the Born 
approximation, may be pursued for spherically-symmetric potentials (89); I will discuss it in Sec. 8, 
after a general discussion of particle motion in such potentials in Sec. 7. 


3.4. Energy bands in higher dimensions 


In Sec. 2.7, we have discussed the 1D band theory for potential profiles U(x) that obey the 
periodicity condition (2.192). For what follows, let us notice that the condition may be rewritten as 


$$U(x + X) = U(x), \tag{3.103}$$


42 From the Greek word εἰκών, meaning “image”. In our current context, this term is purely historic.

where X = τa, with τ being an arbitrary integer. One may say that the set of points X forms a periodic 1D lattice in the direct (r-) space. We have also seen that each Bloch state (i.e., each eigenstate of the Schrödinger equation for such a periodic potential) is characterized by the quasimomentum ħq, and its energy does not change if q is changed by a multiple of 2π/a. Hence if we form, in the reciprocal (q-) space, a 1D lattice of points Q = lb, with b = 2π/a and integer l, any pair of points from these two mutually reciprocal lattices satisfies the following rule:

$$\exp\{iQX\} = \exp\left\{i\,2\pi\frac{l}{a}\,\tau a\right\} \equiv e^{2\pi il\tau} = 1. \tag{3.104}$$


In this form, the results of Sec. 2.7 may be readily extended to d-dimensional periodic potentials whose translational symmetry obeys the following natural generalization of Eq. (103):

$$U(\mathbf r + \mathbf R) = U(\mathbf r), \tag{3.105}$$

where the points R, which may be numbered by d integers τ_j, form the so-called Bravais lattice:⁴³

$$\mathbf R = \sum_{j=1}^d\tau_j\mathbf a_j, \tag{3.106}$$


with d primitive vectors a_j. The simplest example of a 3D Bravais lattice is given by the simple cubic lattice (Fig. 11a), which may be described by a system of mutually perpendicular primitive vectors a_j of equal length. However, these vectors are not perpendicular in every lattice; for example, Figs. 11b and 11c show possible sets of the primitive vectors describing, respectively, the face-centered cubic (fcc) lattice and the body-centered cubic (bcc) lattice. In 3D, the science of crystallography, based on group theory, distinguishes, by their symmetry properties, 14 Bravais lattices grouped into 7 different lattice systems.⁴⁴




Fig. 3.11. The simplest (and most common) 3D Bravais lattices: (a) simple cubic, (b) face-centered cubic 
(fcc), and (c) body-centered cubic (bcc), and possible choices of their primitive vector sets (blue arrows). 


Note, however, that not all highly symmetric sets of points form Bravais lattices. As probably the most striking example, the nodes of a very simple 2D honeycomb lattice (Fig. 12a)⁴⁵ cannot be


43 Named after Auguste Bravais, the crystallographer who introduced this notion in 1850. 

44 A very clear, well-illustrated introduction to the Bravais lattices is given in Chapters 4 and 7 of the famous 
textbook by N. Ashcroft and N. Mermin, Solid State Physics, Saunders College, 1976. 

45 This structure describes, for example, the now-famous graphene: isolated monolayer sheets of carbon atoms 
arranged in a honeycomb lattice with an interatomic distance of 0.142 nm.

described by a Bravais lattice, while those of the 2D hexagonal lattice shown in Fig. 12b can. The most prominent 3D case of such a non-Bravais structure is the diamond lattice (Fig. 12c), which describes, in particular, silicon crystals.⁴⁶ In cases like these, the band theory is much facilitated by the fact that these systems may still be described by Bravais lattices, if each lattice node is associated with a small group of points, called a primitive unit cell.⁴⁷ For example, Fig. 12a shows a possible choice of the primitive vectors for the honeycomb lattice, with the primitive unit cell formed by any two adjacent points of the original lattice (say, within the dashed ovals on that panel). Similarly, the diamond lattice may be described as an fcc Bravais lattice with a two-point primitive unit cell; see Fig. 12c.


Fig. 3.12. Two important periodic structures that require two-point primitive cells for their Bravais lattice 
representation: (a) 2D honeycomb lattice and (c) 3D diamond lattice, and their primitive vectors. For contrast, 
panel (b) shows the 2D hexagonal structure which forms a Bravais lattice with a single-point primitive cell. 


Now we are ready for the following generalization of the 1D Bloch theorem, given by Eqs. (2.193) and (2.210), to higher dimensions: any eigenfunction of the Schrödinger equation describing a particle's motion in the spatially-unlimited periodic potential (105) may be represented either as

$$\psi(\mathbf r + \mathbf R) = \psi(\mathbf r)\,e^{i\mathbf q\cdot\mathbf R}, \tag{3.107}$$

or as

$$\psi(\mathbf r) = u(\mathbf r)\,e^{i\mathbf q\cdot\mathbf r}, \quad\text{with}\quad u(\mathbf r + \mathbf R) = u(\mathbf r), \tag{3.108}$$


where the quasimomentum ħq is again a constant of motion, but now it is a vector. The key notion of the band theory in d dimensions is the reciprocal lattice in the wave-vector (q) space, formed as

$$\mathbf Q = \sum_{j=1}^d l_j\mathbf b_j, \tag{3.109}$$

with integer l_j, and vectors b_j selected in such a way that the following natural generalization of Eq. (104) is valid for any pair of points of the direct and reciprocal lattices:


46 This diamond structure may be best understood as an overlap of two fcc lattices of side a, mutually shifted by 
the vector {1, 1, 1}xa/4, so that the distances between each point of the combined lattice and its 4 nearest 
neighbors (see the solid gray lines in Fig. 12c) are all equal. 

47 A harder case is presented by so-called quasicrystals (whose idea may be traced back to medieval Islamic tilings, but which were discovered in real crystals, by D. Shechtman et al., only in 1984), which obey a high (say, the 5-fold) rotational symmetry, but cannot be described by a Bravais lattice with any finite primitive unit cell. For a popular review of quasicrystals see, for example, P. Stephens and A. Goldman, Sci. Amer. 264, #4, 24 (1991).

$$e^{i\mathbf Q\cdot\mathbf R} = 1. \tag{3.110}$$


One way to describe the physical sense of the lattice Q is to say that according to Eqs. (80) and/or (86), it gives the set of the vectors q = k − k_i for which the interference of the waves scattered by all Bravais lattice points is constructive, and hence strongly enhanced.⁴⁸ Another way to look at the reciprocal lattice follows from the first formulation of the Bloch theorem, given by Eq. (107): if we add to the quasimomentum q of a particle any vector Q of the reciprocal lattice, the wavefunction does not change. This means, in particular, that all information about the system's eigenfunctions is contained in just one elementary cell of the reciprocal space q. Its most frequent choice, called the 1st Brillouin zone, is the set of all points q that are closer to the origin than to any other point of the lattice Q. (Evidently, the 1st Brillouin zone in one dimension, discussed in Sec. 2.7, falls under this definition; see, e.g., Figs. 2.26 and 2.28.)


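It is easy to see that the primitive vectors b_j of the reciprocal lattice may be constructed as

$$\mathbf b_1 = 2\pi\frac{\mathbf a_2\times\mathbf a_3}{\mathbf a_1\cdot(\mathbf a_2\times\mathbf a_3)}, \quad \mathbf b_2 = 2\pi\frac{\mathbf a_3\times\mathbf a_1}{\mathbf a_1\cdot(\mathbf a_2\times\mathbf a_3)}, \quad \mathbf b_3 = 2\pi\frac{\mathbf a_1\times\mathbf a_2}{\mathbf a_1\cdot(\mathbf a_2\times\mathbf a_3)}. \tag{3.111}$$

Indeed, from the "operand rotation rule" of vector algebra⁴⁹ it is evident that a_j·b_j' = 2πδ_jj'. Hence, with the account of Eq. (109), the exponent on the left-hand side of Eq. (110) is reduced to

$$e^{i\mathbf Q\cdot\mathbf R} = \exp\{2\pi i(l_1\tau_1 + l_2\tau_2 + l_3\tau_3)\}. \tag{3.112}$$

Since all l_j and all τ_j are integers, the expression in the parentheses is also an integer, so that the exponent indeed equals 1, thus satisfying the definition of the reciprocal lattice given by Eq. (110).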


As the simplest example, let us return to the simple cubic lattice of a period a (Fig. 11a), oriented in space so that

$$\mathbf a_1 = a\mathbf n_x, \quad \mathbf a_2 = a\mathbf n_y, \quad \mathbf a_3 = a\mathbf n_z. \tag{3.113}$$

According to Eq. (111), its reciprocal lattice is also cubic:

$$\mathbf Q = \frac{2\pi}{a}(l_1\mathbf n_x + l_2\mathbf n_y + l_3\mathbf n_z), \tag{3.114}$$

so that the 1st Brillouin zone is a cube with the side b = 2π/a.


Almost equally simple calculations show that the reciprocal lattice of fcc is bcc, and vice versa. Figure 13 shows the resulting 1st Brillouin zone of the fcc lattice.
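Eq. (111) is easy to turn into code. The following sketch (plain Python; the helper names are illustrative) builds the b_j from given a_j, checks that a_j·b_j' = 2πδ_jj', and applies the construction to the simple cubic lattice (113) and to one common choice of fcc primitive vectors, a_1 = (0, 1/2, 1/2)a, a_2 = (1/2, 0, 1/2)a, a_3 = (1/2, 1/2, 0)a. That fcc set is an assumption; Fig. 11b may use another, equivalent set. The resulting b_1 = (2π/a)(−1, 1, 1), b_2 = (2π/a)(1, −1, 1), b_3 = (2π/a)(1, 1, −1) are primitive vectors of a bcc lattice with cube side 4π/a.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def reciprocal_vectors(a1, a2, a3):
    """b_j of Eq. (111): b_1 = 2*pi (a_2 x a_3) / [a_1 . (a_2 x a_3)], and cyclically."""
    scale = 2 * math.pi / dot(a1, cross(a2, a3))
    return [tuple(scale * c for c in cross(u, v))
            for u, v in ((a2, a3), (a3, a1), (a1, a2))]

def check_duality(a_vecs, b_vecs, tol=1e-12):
    """True if a_j . b_j' = 2*pi*delta_jj' for all pairs."""
    return all(abs(dot(a, b) - (2 * math.pi if i == j else 0.0)) < tol
               for i, a in enumerate(a_vecs) for j, b in enumerate(b_vecs))

a = 1.0
simple_cubic = [(a, 0, 0), (0, a, 0), (0, 0, a)]
fcc = [(0, a / 2, a / 2), (a / 2, 0, a / 2), (a / 2, a / 2, 0)]   # assumed primitive set

for name, a_vecs in (("simple cubic", simple_cubic), ("fcc", fcc)):
    b_vecs = reciprocal_vectors(*a_vecs)
    print(name, check_duality(a_vecs, b_vecs),
          [tuple(round(c / (2 * math.pi / a), 6) for c in b) for b in b_vecs])
```

The printed components are in units of 2π/a.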


The notion of the reciprocal lattice makes the multi-dimensional band theory not much more 
complex than that in 1D, especially for numerical calculations, at least for the single-point Bravais 
lattices. Indeed, repeating all the steps that have led us to Eq. (2.218), but now with a d-dimensional 
Fourier expansion of the functions U(r) and u(r), we readily get its generalization: 


48 This is why the notion of the Q-lattice is also the main starting point of X-ray diffraction studies of crystals. 
Indeed, it allows rewriting the well-known Bragg condition for diffraction peaks in an extremely simple form: k = k_i + Q, where k_i and k are the wave vectors of, respectively, the incident and diffracted waves; see, e.g., EM Sec. 8.4 (where it was more convenient for me to use the notation k_0 for k_i).

49 See, e.g., MA Eq. (7.6).

$$\sum_{\mathbf l'}U_{\mathbf l-\mathbf l'}u_{\mathbf l'} = (E - E_{\mathbf l})u_{\mathbf l}, \tag{3.115}$$

where l is now a d-dimensional vector of integer indices l_j. The summation in Eq. (115) should be carried over all essential components of this vector (i.e. over all relevant nodes of the reciprocal lattice), so that writing a corresponding computer code requires a bit more care than in 1D. However, this is just a homogeneous system of linear equations, and numerous routines for finding its eigenvalues E are readily available from both public sources and commercial software packages.
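To make the last statement concrete, here is a hedged sketch of Eq. (115) for the 2D square lattice, using NumPy's `numpy.linalg.eigvalsh`. It assumes ħ = m = 1, the Fourier convention U(r) = Σ_l U_l exp{−iQ_l·r} (with which the matrix element between plane waves exp{i(q − Q_l)·r} and exp{i(q − Q_l')·r} is U_{l−l'}), kinetic energies E_l = ħ²(q − Q_l)²/2m, and a truncation |l_x|, |l_y| ≤ N of the basis. For a weak potential whose only nonzero harmonics are the four lowest ones, all equal to V, first-order perturbation theory predicts a gap close to 2|V| at the zone-edge point q = (π/a, 0).

```python
import numpy as np

def plane_wave_bands(q, U_fourier, N=3, a=1.0, n_bands=4):
    """Lowest eigenvalues E of the system (115) for a 2D square lattice (hbar = m = 1).

    q         -- quasimomentum (qx, qy)
    U_fourier -- dict {(lx, ly): U_l}; harmonics absent from the dict are zero
    N         -- basis truncation |lx|, |ly| <= N
    """
    b = 2 * np.pi / a
    ls = [(lx, ly) for lx in range(-N, N + 1) for ly in range(-N, N + 1)]
    H = np.zeros((len(ls), len(ls)), dtype=complex)
    for i, (lx, ly) in enumerate(ls):
        H[i, i] += 0.5 * ((q[0] - lx * b) ** 2 + (q[1] - ly * b) ** 2)   # E_l
        for j, (mx, my) in enumerate(ls):
            H[i, j] += U_fourier.get((lx - mx, ly - my), 0.0)          # U_{l-l'}
    return np.linalg.eigvalsh(H)[:n_bands]

V = 0.05   # illustrative weak potential: U(r) = 2V [cos(2 pi x / a) + cos(2 pi y / a)]
U = {(1, 0): V, (-1, 0): V, (0, 1): V, (0, -1): V}
X = (np.pi, 0.0)   # middle of a zone edge, for a = 1
print("free:", plane_wave_bands(X, {})[:2])   # both equal pi^2/2, the zone-edge energy
E = plane_wave_bands(X, U)
print("gap at X:", E[1] - E[0], "vs 2V =", 2 * V)
```

Increasing N improves convergence for stronger potentials; for the weak potential used here, N = 3 is already ample.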


Fig. 3.13. The 1st Brillouin zone of the fcc lattice, and the traditional notation of its main directions. Adapted from http://en.wikipedia.org/wiki/Band_structure, as a public domain material.


What is indeed more complex than in 1D is the representation (and hence comprehension :-) of the calculated results and experimental data. Typically, the representation is limited to plotting the Bloch state eigenenergy as a function of the components of the vector q along certain special directions in the reciprocal space of quasimomentum (see, e.g., the red lines in Fig. 13), typically on a single panel. Fig. 14 shows perhaps the most famous (and certainly the most practically important) of such plots, the band structure of electrons in crystalline silicon. The dashed horizontal lines mark the so-called indirect gap of the width ~1.12 eV between the "valence" (nominally occupied) and the next "conduction" (nominally unoccupied) energy bands.


[Figure: electron energy E (eV) plotted along the special directions of Fig. 13.]


Fig. 3.14. The band structure of silicon, plotted along 
the special directions shown in Fig. 13. (Adapted from 
http://www.tf.uni-kiel.de/matwis/amat/semi_en/.) 


In order to understand the reason for such complexity, let us see how we would start to calculate such a picture in the weak-potential approximation, for the simplest case of a 2D square lattice, which is a subset of the cubic lattice (106), with τ_3 = 0. Its 1st Brillouin zone is of course also a square, of the area (2π/a)² (see the dashed lines in Fig. 15). Let us draw the lines of constant energy of a free particle (U = 0) in this zone. Repeating the arguments of Sec. 2.7 (see especially Fig. 2.28 and its discussion), we may conclude that Eq. (2.216) should now be generalized as follows:

$$E = \frac{\hbar^2}{2m}\left[\left(q_x - \frac{2\pi l_x}{a}\right)^2 + \left(q_y - \frac{2\pi l_y}{a}\right)^2\right], \tag{3.116}$$


with all possible integers l_x and l_y. Considering this result only within the 1st Brillouin zone, we see that as the particle's energy E grows, the lines of equal energy, for the lowest energy band, evolve as shown in Fig. 15. Just like in 1D, the weak-potential effects are only important at the Brillouin zone boundaries, and may be crudely considered as the appearance of narrow energy gaps, but one can see that the band structure in q-space is complex enough even without these effects, and becomes even more involved at higher E.
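A short sketch of Eq. (116) (plain Python; energies in units of E_1 = π²ħ²/2ma², with a hypothetical truncation |l_x|, |l_y| ≤ L) lists the lowest free-particle branches at three symmetric points of the 1st Brillouin zone. At the zone corner q = (π/a, π/a), four branches meet at E = 2E_1, which is the maximum of the lowest free-particle band, so a constant-energy line at E = 2.05E_1 already belongs to higher branches.

```python
import math

def free_branches(qx, qy, a=1.0, L=2, n=6):
    """Lowest n values of Eq. (116) at (qx, qy), in units of E_1 = pi^2 hbar^2 / 2 m a^2."""
    b = 2 * math.pi / a
    k_edge_sq = (math.pi / a) ** 2           # E_1 corresponds to |q - Q_l| = pi / a
    levels = sorted(((qx - lx * b) ** 2 + (qy - ly * b) ** 2) / k_edge_sq
                    for lx in range(-L, L + 1) for ly in range(-L, L + 1))
    return levels[:n]

pi = math.pi
for name, (qx, qy) in {"center": (0, 0), "edge middle": (pi, 0), "corner": (pi, pi)}.items():
    print(f"{name:12s}", [round(E, 3) for E in free_branches(qx, qy)])
# expected (units of E_1): center 0,4,4,4,4,8; edge middle 1,1,5,5,5,5; corner 2,2,2,2,10,10
```

The twofold and fourfold coincidences at the zone edge and corner are exactly the places where a weak periodic potential opens the gaps mentioned above.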


[Figure: panels (a), (b), (c) showing constant-energy lines inside the square zone of side 2π/a.]

Fig. 3.15. The lines of constant energy E of a free particle, within the 1st Brillouin zone of a square Bravais lattice, for: (a) E/E_1 ≈ 0.95, (b) E/E_1 = 1.05, and (c) E/E_1 = 2.05, where E_1 = π²ħ²/2ma².


The tight-binding approximation is usually easier to follow. For example, for the same square 2D lattice, we may repeat the arguments that have led us to Eq. (2.203), to write⁵⁰

$$i\hbar\dot a_{0,0} = -\delta_n\left(a_{-1,0} + a_{+1,0} + a_{0,-1} + a_{0,+1}\right), \tag{3.117}$$

where the indices correspond to the deviations of the integers τ_x and τ_y from an arbitrarily selected minimum of the potential energy, and hence of the wavefunction's "hump" quasi-localized at this minimum. Now, looking for the stationary solution of these equations that would obey the Bloch theorem (107), instead of Eq. (2.206) we get

$$E = E_n + \varepsilon_n = E_n - \delta_n\left(e^{iq_xa} + e^{-iq_xa} + e^{iq_ya} + e^{-iq_ya}\right) \equiv E_n - 2\delta_n(\cos q_xa + \cos q_ya). \tag{3.118}$$


Figure 16 shows this result, within the 1st Brillouin zone, in two forms: as color-coded lines of equal energy, and as a 3D plot (also enhanced by color). It is evident that the plots of this function along different lines on the q-plane, for example along one of the axes (say, q_x) and along a diagonal of the 1st Brillouin zone (say, with q_x = q_y), give different curves E(q), qualitatively similar to those of silicon (Fig. 14). However, the latter structure is further complicated by the fact that the primitive cell of its Bravais lattice contains 2 atoms (see Fig. 12c and its discussion). In this case, even the tight-binding picture becomes more complex. Indeed, even if the atoms at different positions of the primitive unit cell are similar (as they are, for example, in both graphene and silicon), and hence the potential wells near those points and the corresponding local wavefunctions u(r) are similar as well, the Bloch theorem (which only pertains to Bravais lattices!) does not forbid them to have different complex probability amplitudes a(t), whose time evolution should be described by separate differential equations.

50 Actually, using the same values of δ_n in both directions (x and y) implies some sort of symmetry of the quasi-localized states. For example, the s-states of axially-symmetric potentials (see the next section) always have such symmetry.


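[Figure: ε_n over the (q_x, q_y) plane within the zone of side 2π/a, as color-coded equal-energy lines and as a 3D surface.]

Fig. 3.16. The allowed band energy ε_n ≡ E − E_n, for a square 2D lattice, in the tight-binding approximation.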
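The passage from Eq. (117) to Eq. (118) amounts to substituting Bloch-type amplitudes a_{τx,τy} ∝ exp{i(q_xτ_x + q_yτ_y)a − iε_nt/ħ}. The hedged sketch below (energies in units of δ_n, a = 1; function names are illustrative) evaluates ε_n both directly from the neighbor sum and from the cosine form, and samples it along an axis and along a diagonal of the 1st Brillouin zone. Along the axis, the band runs from −4δ_n at the zone center to 0 at the middle of a zone edge; along the diagonal, it runs all the way to +4δ_n at the zone corner.

```python
import cmath
import math

NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # the four terms on the right of Eq. (117)

def eps_from_neighbors(qx, qy, delta=1.0, a=1.0):
    """-delta * sum over neighbors of exp{i q . dR}: the Bloch ansatz plugged into Eq. (117)."""
    s = sum(cmath.exp(1j * (qx * dx + qy * dy) * a) for dx, dy in NEIGHBORS)
    return (-delta * s).real

def eps_cosine(qx, qy, delta=1.0, a=1.0):
    """Eq. (118): eps_n = -2 delta (cos qx a + cos qy a)."""
    return -2 * delta * (math.cos(qx * a) + math.cos(qy * a))

for i in range(5):
    t = i / 4 * math.pi          # from 0 to pi/a
    print(f"q/(pi/a) = {i / 4:.2f}   axis: {eps_cosine(t, 0):+.3f}   diagonal: {eps_cosine(t, t):+.3f}")
```

The imaginary parts of the neighbor sum cancel pairwise, which is why taking the real part loses nothing.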


As the simplest example, to describe the honeycomb lattice shown in Fig. 12a, we have to prescribe different probability amplitudes to the "top" and "bottom" points of its primitive cell, say, α and β, respectively. Since each of these points is surrounded by (and hence weakly interacts with) three neighbors of the opposite type, instead of Eq. (117) we have to write two equations:

$$i\hbar\dot\alpha = -\delta_n\sum_{j=1}^3\beta_j, \qquad i\hbar\dot\beta = -\delta_n\sum_{j'=1}^3\alpha_{j'}, \tag{3.119}$$


where each summation is over the three nearest-neighbor points. (In these two sums, I am using different summation indices just to emphasize that these directions are different for the "top" and "bottom" points of the primitive cell; see Fig. 12a.) Now using the Bloch theorem (107) in the form similar to Eq. (2.205), we get a system of two coupled linear algebraic equations:

$$(E - E_n)\alpha = -\delta_n\beta\sum_{j=1}^3e^{i\mathbf q\cdot\mathbf r_j}, \qquad (E - E_n)\beta = -\delta_n\alpha\sum_{j'=1}^3e^{i\mathbf q\cdot\mathbf r'_{j'}}, \tag{3.120}$$

where r_j and r'_j' are the nearest-neighbor positions, as seen from the top and bottom points, respectively. Writing the condition of consistency of this system of homogeneous linear equations, we get two equal and opposite values of the energy correction for each value of q:

$$E_\pm = E_n \pm \delta_n\Sigma^{1/2}, \quad\text{where}\quad \Sigma \equiv \sum_{j,j'=1}^3\exp\{i\mathbf q\cdot(\mathbf r_j + \mathbf r'_{j'})\}. \tag{3.121}$$

According to Eq. (120), these two energy bands correspond to the phase shifts (on top of the regular Bloch shift q·Δr) of either 0 or π between the adjacent quasi-localized wavefunctions u(r).

The most interesting corollary of such energy symmetry, augmented by the honeycomb lattice's symmetry, is that for certain values q_D of the vector q (which turn out to be at each of the six corners of the hexagonal 1st Brillouin zone), the double sum Σ vanishes, i.e. the two band surfaces E_±(q) touch each other. As a result, in the vicinities of these so-called Dirac points,⁵¹ the dispersion relation is linear:

$$E_\pm\big|_{\mathbf q\approx\mathbf q_D} \approx E_n \pm \hbar v_n|\tilde{\mathbf q}|, \quad\text{where}\quad \tilde{\mathbf q} \equiv \mathbf q - \mathbf q_D, \tag{3.122}$$

with v_n ∝ δ_n being a constant with the dimension of velocity; for graphene, it is close to 10⁶ m/s. Such a linear dispersion relation ensures several interesting transport properties of graphene, in particular the quantum Hall effect in it, as was already mentioned in Sec. 2. For their more detailed discussion, I have to refer the reader to special literature.⁵²
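Eq. (121) and the Dirac points can be explored numerically. In the hedged sketch below, the three nearest-neighbor vectors r_j of a "top" point follow an assumed orientation of Fig. 12a (one vector along +y), the "bottom" points see r'_j = −r_j, and energies are in units of δ_n with the interatomic distance d = 1. With this orientation, one Dirac point is q_D = (4π/3√3d, 0); the script checks that Σ vanishes there and that the slope of E_± near q_D is 3δ_nd/2 in all directions, i.e. v_n = 3δ_nd/2ħ. The final line uses d = 0.142 nm (footnote 45) together with δ_n ≈ 2.8 eV, a commonly quoted hopping energy for graphene that is an assumption here, not a value from the text.

```python
import cmath
import math

d, delta = 1.0, 1.0                                     # units: interatomic distance, coupling delta_n
r = [(0.0, d), (math.sqrt(3) / 2 * d, -d / 2), (-math.sqrt(3) / 2 * d, -d / 2)]
r_prime = [(-x, -y) for x, y in r]                      # neighbors seen from a "bottom" point

def Sigma(qx, qy):
    """Double sum of Eq. (121); here it equals |sum_j exp(i q.r_j)|^2, so it is real."""
    s = sum(cmath.exp(1j * (qx * (x + xp) + qy * (y + yp)))
            for x, y in r for xp, yp in r_prime)
    return max(s.real, 0.0)                             # clip tiny negative round-off

def bands(qx, qy):
    """(E_- - E_n, E_+ - E_n) = (-delta * Sigma^(1/2), +delta * Sigma^(1/2))."""
    root = delta * math.sqrt(Sigma(qx, qy))
    return -root, root

qD = (4 * math.pi / (3 * math.sqrt(3) * d), 0.0)       # one of the six zone corners
print("Sigma at q_D:", Sigma(*qD))
kappa = 1e-4
for angle in (0.0, 1.0, 2.5):                           # the cone is isotropic to first order
    qx, qy = qD[0] + kappa * math.cos(angle), qD[1] + kappa * math.sin(angle)
    print("slope:", bands(qx, qy)[1] / kappa)           # approaches 3 d delta / 2 = 1.5

hbar, d_gr, delta_gr = 1.054571817e-34, 0.142e-9, 2.8 * 1.602176634e-19   # delta_gr assumed
print(f"graphene estimate: v ~ {3 * delta_gr * d_gr / (2 * hbar):.2e} m/s")
```

With the assumed hopping energy, the estimate comes out on the order of 10⁶ m/s, consistent with the value quoted above.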


3.5. Axially-symmetric systems 


I cannot conclude this chapter (and hence our review of wave mechanics) without addressing the exact solutions of the stationary Schrödinger equation⁵³ possible in the cases of highly symmetric functions U(r). Such solutions are very important, in particular, for atomic and nuclear physics, and will be used in the later chapters of this course.


In some rare cases, such symmetries may be exploited by the separation of variables in Cartesian coordinates. The most famous (and rather important) example is the d-dimensional harmonic oscillator, a particle moving inside the potential

$$U = \frac{m\omega_0^2}{2}\sum_{j=1}^d r_j^2 \equiv \frac{m\omega_0^2r^2}{2}. \tag{3.123}$$

Separating the variables exactly as we did in Sec. 1.7 for the rectangular hard-wall box (1.77), for each degree of freedom we get the Schrödinger equation (2.261) of a 1D oscillator, whose eigenfunctions are


51 This term is based on a (rather indirect) analogy with the Dirac theory of relativistic quantum mechanics, to be discussed in Chapter 9 below.

52 See, e.g., the reviews by A. Castro Neto et al., Rev. Mod. Phys. 81, 109 (2009) and by X. Lu et al., Appl. Phys. Rev. 4, 021306 (2017). Note that the transport properties of graphene are determined by the coupling of 2p-state electrons of its carbon atoms (see Secs. 6 and 7 below), whose wavefunctions are proportional to exp{±iφ} rather than being axially-symmetric as implied by Eqs. (120). However, due to the lattice symmetry, this fact does not affect the above dispersion relation E(q).

53 This is my best chance to mention, in passing, that the eigenfunctions ψ_n(r) of any such problem do not feature the instabilities typical for the deterministic chaos effects of classical mechanics; see, e.g., CM Chapter 9. (This is why the term quantum mechanics of classically chaotic systems is preferable to the occasionally used term "quantum chaos".) It is curious that at the initial stages of the time evolution of the wavefunctions of such systems, some of their correlation functions still grow exponentially, reminiscent of the Lyapunov exponents λ of their classical chaotic dynamics. This growth stops at the so-called Ehrenfest times t_E ~ λ⁻¹ln(S/ħ), where S is the action scale of the problem; see, e.g., I. Aleiner and A. Larkin, Phys. Rev. E 55, R1243 (1997). In a stationary quantum state, the most essential trace of the classical chaos in a system is an unusual statistics of its eigenvalues, in particular of the energy spectra. We will have a chance for a brief look at such statistics in Chapter 5, but unfortunately, I will not have time/space to discuss this field in much detail. Perhaps the best available book for further reading is the monograph by M. Gutzwiller, Chaos in Classical and Quantum Mechanics, Springer, 1991.

given by Eq. (2.284), and the energy spectrum is described by Eq. (2.162). As a result, the total energy spectrum may be indexed by a vector n = {n_1, n_2, ..., n_d} of d independent integer quantum numbers n_j:

$$E_{\mathbf n} = \hbar\omega_0\sum_{j=1}^d\left(n_j + \frac{1}{2}\right), \tag{3.124}$$

each ranging from 0 to ∞. Note that every energy level of this system, with the only exception of its ground state,

$$\psi_g = \prod_{j=1}^d\psi_0(r_j) = \prod_{j=1}^d\frac{1}{\pi^{1/4}x_0^{1/2}}\exp\left\{-\frac{r_j^2}{2x_0^2}\right\} = \frac{1}{\pi^{d/4}x_0^{d/2}}\exp\left\{-\frac{r^2}{2x_0^2}\right\}, \quad x_0 \equiv \left(\frac{\hbar}{m\omega_0}\right)^{1/2}, \tag{3.125}$$

is degenerate (for d > 1): several different wavefunctions, each with its own set of quantum numbers n_j, but the same value of their sum, have the same energy.
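The degeneracy can be counted explicitly: the number of ways to write N = Σ_j n_j with non-negative integers n_j is the binomial coefficient C(N + d − 1, d − 1), a standard combinatorial identity that is not stated in the text. A brute-force sketch (plain Python) confirming it for small N and d:

```python
from itertools import product
from math import comb

def degeneracy(N, d):
    """Number of vectors n = (n_1, ..., n_d), n_j >= 0, with n_1 + ... + n_d = N,
    i.e. of eigenstates (3.124) sharing the energy hbar * omega_0 * (N + d/2)."""
    return sum(1 for n in product(range(N + 1), repeat=d) if sum(n) == N)

for d in (1, 2, 3):
    brute = [degeneracy(N, d) for N in range(6)]
    formula = [comb(N + d - 1, d - 1) for N in range(6)]
    print(f"d = {d}: {brute}  (formula: {formula})")
```

For d = 3, the degeneracies grow as 1, 3, 6, 10, ..., which is the pattern behind the shell structure of the 3D oscillator.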


However, the harmonic oscillator problem is an exception: for other central- and spherically-symmetric problems the solution is made easier by using more appropriate curvilinear coordinates. Let us start with the simplest axially-symmetric problem: the so-called planar rigid rotator (or "rotor"), i.e. a particle of mass 𝓂⁵⁴ constrained to move along a plane circle of radius R (Fig. 17).⁵⁵


Fig. 3.17. A planar rigid rotator. 


The classical planar rotator may be described by just one degree of freedom, say the angular displacement φ (or equivalently the arc displacement l = Rφ) from some reference direction, with the energy (and the Hamiltonian function) H = p²/2𝓂, where p = 𝓂v = 𝓂n_φ(dl/dt), n_φ being the unit vector in the azimuthal direction (see Fig. 17). This function is similar to that of a free 1D particle (with the replacement x → l = Rφ), and hence the rotator's quantum properties may be described by a similar Hamiltonian operator:

$$\hat H = \frac{\hat{\mathbf p}^2}{2𝓂} = \frac{1}{2𝓂}\left(-i\hbar\mathbf n_\varphi\frac{\partial}{\partial l}\right)^2 = \frac{1}{2𝓂}\left(-i\hbar\mathbf n_\varphi\frac{1}{R}\frac{\partial}{\partial\varphi}\right)^2, \tag{3.126}$$

whose eigenfunctions have a similar structure:

$$\psi = Ce^{ikl} = Ce^{ikR\varphi}. \tag{3.127}$$


54 From this point on (until the chapter’s end), I will use this exotic font for the particle’s mass, to avoid any 
chance of its confusion with the impending “magnetic” quantum number m, traditionally used in axially- 
symmetric problems. 

55 This is a reasonable model for the confinement of light atoms, notably hydrogen, in some organic compounds, 
but I am addressing this system mostly as the basis for the following, more complex problems. 
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The “only” new feature is that in the rotator, all observables should be 27-periodic functions of 
the angle g. Hence, as we have already discussed in the context of the magnetic flux quantization (see 
Fig. 4 and its discussion), as the particle makes one turn around the central point 0, its wavefunction’s 
phase AR@ may only change by 27m, with an arbitrary integer m (ranging from —co to +00): 


$$\psi_m(\varphi + 2\pi) = \psi_m(\varphi)\,e^{2\pi im}. \tag{3.128}$$


With the eigenfunctions (127), this periodicity condition immediately gives 2πkR = 2πm. Thus, the wave number k can take only quantized values k_m = m/R, so that the eigenfunctions should be indexed by this magnetic quantum number m:


$$\psi_m = C_m\exp\left\{im\frac{l}{R}\right\} \equiv C_m\exp\{im\varphi\}, \tag{3.129}$$


and the energy spectrum is discrete: 


$$E_m = \frac{\hbar^2k_m^2}{2\mathscr{m}} = \frac{\hbar^2m^2}{2\mathscr{m}R^2}. \tag{3.130}$$


This simple model allows an exact analysis of the effects of an external magnetic field on the confined motion of an electrically charged particle. Indeed, in the simplest case when this field is axially symmetric (or just uniform) and directed normally to the rotator’s plane, it does not violate the axial symmetry of the system. According to Eq. (26), in this case, we have to generalize Eq. (126) as


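$$\hat H = \frac{1}{2\mathscr{m}}\left(-i\hbar\frac{\partial}{\partial l} - qA\right)^2 \equiv \frac{1}{2\mathscr{m}}\left(-i\frac{\hbar}{R}\frac{\partial}{\partial\varphi} - qA\right)^2. \tag{3.131}$$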


Here, in contrast to the Cartesian gauge choice (44), which was so instrumental for the solution of the Landau level problem, it is beneficial to take the vector potential in the axially-symmetric form A = A(ρ)n_φ, where ρ = {x, y} is the 2D radius-vector, with the magnitude ρ = (x² + y²)^{1/2}. Using the well-known expression for the curl operator in cylindrical coordinates,⁵⁶ we can readily check that the requirement ∇×A = ℬn_z, with ℬ = const, is satisfied by the following function:


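$$\mathbf{A} = \mathbf{n}_\varphi\frac{\mathscr{B}\rho}{2}. \tag{3.132}$$

For the planar rotator, ρ = R = const, so that the stationary Schrödinger equation becomes

$$\frac{1}{2\mathscr{m}}\left(-i\frac{\hbar}{R}\frac{\partial}{\partial\varphi} - q\frac{\mathscr{B}R}{2}\right)^2\psi_m = E_m\psi_m. \tag{3.133}$$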


Somewhat surprisingly, this equation is still satisfied by the eigenfunctions (127). Moreover, since the periodicity condition (128) is also unaffected by the applied magnetic field, we return to the periodic eigenfunctions (129), independent of ℬ. However, the field does affect the system’s eigenenergies:


56 See, e.g., MA Eq. (10.5). 


Chapter 3 Page 32 of 64 




Essential Graduate Physics QM: Quantum Mechanics 


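$$E_m = \frac{1}{2\mathscr{m}}\left(\frac{\hbar m}{R} - \frac{q\mathscr{B}R}{2}\right)^2 \equiv \frac{\hbar^2}{2\mathscr{m}R^2}\left(m - \frac{\Phi}{\Phi_0'}\right)^2, \tag{3.134}$$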


where Φ = πR²ℬ is the magnetic flux through the area limited by the particle’s trajectory, and Φ₀' ≡ 2πħ/q is the “normal” magnetic flux quantum we have already met in the AB effect’s context (see Eq. (34) and its discussion). The field also changes the electric current of the particle in each eigenstate:


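$$I = \frac{q}{2\mathscr{m}R}\left[\psi_m^*\left(-i\frac{\hbar}{R}\frac{\partial}{\partial\varphi} - q\frac{\mathscr{B}R}{2}\right)\psi_m + \text{c.c.}\right] = |C_m|^2\frac{q\hbar}{\mathscr{m}R^2}\left(m - \frac{\Phi}{\Phi_0'}\right). \tag{3.135}$$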


Normalizing the wavefunction (129) to have W_m ≡ ∫₀^{2π}|ψ_m|²dφ = 1, we get |C_m|² = 1/2π, so that Eq. (135) becomes


$$I_m = I_0\left(m - \frac{\Phi}{\Phi_0'}\right), \quad\text{with } I_0 \equiv \frac{\hbar q}{2\pi\mathscr{m}R^2}. \tag{3.136}$$

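Eqs. (134) and (136) are simple enough to explore numerically. The minimal sketch below uses reduced units (energies in ħ²/2𝓂R², currents in I₀, flux in Φ₀') and assumes, as an idealization, that weak coupling to the environment always returns the rotator to its instantaneous ground state, i.e. that it follows the dashed-arrow transitions of Fig. 18. The function names and the sampled flux values are illustrative choices, not part of the original text.

```python
# Charged planar rotator in a magnetic field: Eqs. (134) and (136) in reduced units.
# Energies in units of hbar^2/(2*M*R^2), currents in units of I_0, flux in units of Phi_0'.


def level(m, flux):
    """Return (E_m, I_m) for the eigenstate with magnetic quantum number m."""
    return (m - flux) ** 2, m - flux


def ground_state(flux, m_range=range(-10, 11)):
    """Return (m, E, I) of the lowest level at the given flux Phi/Phi_0'.

    Assumes (idealization) that the system always relaxes to its ground state.
    """
    m = min(m_range, key=lambda k: level(k, flux)[0])
    energy, current = level(m, flux)
    return m, energy, current


for flux in [0.0, 0.3, 0.7, 1.2, 1.8]:
    m, energy, current = ground_state(flux)
    print(f"Phi/Phi0' = {flux:.1f}: m = {m:+d}, E = {energy:.2f}, I/I0 = {current:+.2f}")
```

Along each fixed level, I_m decreases linearly with the flux (the diamagnetic slope), while the ground-state current traces a sawtooth bounded by |I| ≤ |I₀|/2.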

The functions E_m(Φ) and I_m(Φ) are shown in Fig. 18. Note that since Φ₀' ∝ 1/q, for any sign of the particle’s charge q, dI_m/dΦ < 0. It is easy to verify that this means that the current is diamagnetic for any sign of q:⁵⁷ the field-induced current flows in the direction in which its own magnetic field tends to compensate the external magnetic flux applied to the loop. This result may be interpreted as one more manifestation of the AB effect.⁵⁸ In contrast to the interference experiment discussed in Sec. 1, in the situation shown in Fig. 17 the particle is not absorbed by a detector but travels around the ring continuously. As a result, its wavefunction is “rigid”: due to the periodicity condition (128), the quantum number m is discrete, and the applied magnetic field cannot change the wavefunction gradually. In this sense, the system is similar to a superconducting loop (see Fig. 4 and its discussion). The difference between these systems is two-fold:
The difference between these systems is two-fold: 


Fig. 3.18. The magnetic field effect on a 
charged planar rotator. Dashed arrows show 
possible inelastic transitions between 
metastable and ground states, due to weak 
interaction with the environment, as the 
external magnetic field is slowly increased. 


57 This effect, whose qualitative features remain the same for all 2D or 3D localized states (see Chapter 6 below), 
is frequently referred to as orbital diamagnetism. In magnetic materials consisting of particles with 
uncompensated spins, this effect competes with an opposite effect, spin paramagnetism — see, e.g., EM Sec. 5.5. 
58 It is straightforward to check that the final forms of Eqs. (134)-(136) remain valid even if the magnetic field is 
localized well inside the rotator’s circumference so that its lines do not touch the particle’s trajectory. 
(i) For a single charged particle, in macroscopic systems with practicable values of q, R, and 𝓂, the scale I₀ of the induced current is very small. For example, for 𝓂 = m_e, q = −e, and R = 1 μm, Eq. (136) yields |I₀| ~ 3 pA.⁵⁹ With the ring’s inductance 𝓛 of the order of μ₀R,⁶⁰ the contribution Φ_I = 𝓛I ~ μ₀R|I₀| ~ 10⁻²⁴ Wb of such a small current to the net magnetic flux Φ is negligible in comparison with Φ₀' ~ 10⁻¹⁵ Wb, so that the wavefunction quantization does not lead to the constancy of the total magnetic flux.


(ii) As soon as the magnetic field raises the eigenstate energy E_m above that of another eigenstate E_m', the former state becomes metastable, and a weak interaction of the system with its environment (which is neglected in our simple model, but will be discussed in Chapter 7) may induce a quantum transition of the system to the lower-energy state, thus reducing the diamagnetic current’s magnitude (see the dashed arrows in Fig. 18). The flux quantization in superconductors is much more robust to such perturbations.⁶¹
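The order-of-magnitude estimates of item (i) are easy to reproduce. The sketch below assumes CODATA 2018 values of the constants and, like the text, takes the ring inductance simply as 𝓛 ≈ μ₀R, which is a rough estimate rather than an exact formula for a real ring; the function name is hypothetical.

```python
import math

# CODATA 2018 values, SI units
HBAR = 1.054571817e-34       # reduced Planck constant, J*s
M_E = 9.1093837015e-31       # electron mass, kg
E_CHARGE = 1.602176634e-19   # elementary charge, C
MU_0 = 1.25663706212e-6      # magnetic constant, H/m


def current_scale(mass, charge, radius):
    """I_0 = hbar*q/(2*pi*mass*R^2), Eq. (136); the sign follows the charge."""
    return HBAR * charge / (2 * math.pi * mass * radius ** 2)


R = 1e-6                                       # ring radius: 1 micrometer
I0 = current_scale(M_E, -E_CHARGE, R)          # electron: q = -e
flux_from_current = MU_0 * R * abs(I0)         # Phi_I ~ L*I with L ~ mu_0*R (rough estimate)
flux_quantum = 2 * math.pi * HBAR / E_CHARGE   # |Phi_0'| = 2*pi*hbar/|q|

print(f"|I_0|  ~ {abs(I0):.1e} A")
print(f"Phi_I  ~ {flux_from_current:.1e} Wb")
print(f"Phi_0' ~ {flux_quantum:.1e} Wb")
```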


Now let us return, once again, to the key Eq. (129), and see what it gives for one more important observable, the particle’s angular momentum


$$\mathbf{L} = \mathbf{r}\times\mathbf{p}. \tag{3.137}$$

In this particular geometry, the vector L has just one component, normal to the rotator plane:

$$L_z = Rp. \tag{3.138}$$


In classical mechanics, L_z of the rotator should be conserved (due to the absence of external torque), but it may take arbitrary values. In quantum mechanics, the situation changes: with p = ħk, our result k_m = m/R for the mth eigenstate may be rewritten as


$$(L_z)_m = R\hbar k_m = \hbar m. \tag{3.139}$$


Thus, the angular momentum is quantized: it may only be a multiple of the Planck constant ħ, confirming N. Bohr’s guess (see Eq. (1.8)). As we will see in Chapter 5, this result is very general (though it may be modified by spin effects), and the wavefunctions (129) may be interpreted as eigenfunctions of the angular momentum operator.


Let us see whether this quantization persists in more general, but still axially-symmetric systems. To implement the planar rotator in our 3D world, we needed to provide rigid confinement of the particle both in the motion plane and along the 2D radius ρ. Let us consider a more general situation when only the former confinement is strict, i.e. the case when a 2D particle moves in an arbitrary centrally-symmetric potential


$$U(\boldsymbol{\rho}) = U(\rho). \tag{3.140}$$


59 Such weak persistent diamagnetic currents in macroscopic, non-superconducting systems have been observed experimentally by measuring the weak magnetic field they induce, in systems of a large number (~10⁷) of similar conducting rings; see, e.g., L. Lévy et al., Phys. Rev. Lett. 64, 2074 (1990). Due to the dephasing effects of electron scattering by phonons and other electrons (unaccounted for in our simple theory), the effect’s observation requires submicron rings and millikelvin temperatures.

60 See, e.g., EM Sec. 5.3. 

61 Interrupting a superconducting ring with a weak link (Josephson junction), i.e. forming a SQUID, we may get a switching behavior similar to that shown with dashed arrows in Fig. 18; see, e.g., EM Sec. 6.5.
Using the well-known expression for the 2D Laplace operator in polar coordinates,⁶² we may represent the 2D stationary Schrödinger equation in the form

$$-\frac{\hbar^2}{2\mathscr{m}}\left[\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\psi}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\psi}{\partial\varphi^2}\right] + U(\rho)\psi = E\psi. \tag{3.141}$$


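Separating the radial and angular variables as⁶³

$$\psi = \mathcal{R}(\rho)\mathcal{F}(\varphi), \tag{3.142}$$

we get, after the division of all terms by ψ and their multiplication by ρ², the following equation:

$$-\frac{\hbar^2}{2\mathscr{m}}\left[\frac{\rho}{\mathcal{R}}\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) + \frac{1}{\mathcal{F}}\frac{d^2\mathcal{F}}{d\varphi^2}\right] + \rho^2U(\rho) = \rho^2E. \tag{3.143}$$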


The fraction (d²ℱ/dφ²)/ℱ should be a constant (because all other terms of the equation may be functions of ρ only), so that for the function ℱ(φ) we get an ordinary differential equation,

$$\frac{d^2\mathcal{F}}{d\varphi^2} + \nu^2\mathcal{F} = 0, \tag{3.144}$$


where ν² is the variable separation constant. The fundamental solutions of Eq. (144) are evidently ℱ ∝ exp{±iνφ}. Now requiring, as we did for the planar rotator, the 2π-periodicity of any observable, i.e.

$$\mathcal{F}(\varphi + 2\pi) = \mathcal{F}(\varphi)\,e^{2\pi im}, \tag{3.145}$$


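where m is an integer, we see that the constant ν has to be equal to m, and get, for the angular factor, the same result as for the full wavefunction of the planar rotator (cf. Eq. (129)):

$$\mathcal{F}_m = C_me^{im\varphi}, \quad\text{with } m = 0, \pm1, \pm2, \ldots \tag{3.146}$$

Plugging the resulting relation (d²ℱ/dφ²)/ℱ = −m² back into Eq. (143), we may rewrite it as

$$-\frac{\hbar^2}{2\mathscr{m}}\left[\frac{1}{\rho\mathcal{R}}\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) - \frac{m^2}{\rho^2}\right] + U(\rho) = E. \tag{3.147}$$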


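The physical interpretation of this equation is that the full energy is a sum,

$$E = E_\rho + E_\varphi, \tag{3.148}$$

of the radial-motion part

$$E_\rho = -\frac{\hbar^2}{2\mathscr{m}}\frac{1}{\rho\mathcal{R}}\frac{d}{d\rho}\left(\rho\frac{d\mathcal{R}}{d\rho}\right) + U(\rho), \tag{3.149}$$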


and the angular-motion part 


62 See, e.g., MA Eq. (10.3) with ∂/∂z = 0.

63 At this stage, I do not want to mark the particular solution (eigenfunction) ψ and the corresponding eigenenergy E with any single index, because, based on our experience in Sec. 1.7, we may already expect that in a 2D problem the role of this index will be played by two integers: two quantum numbers.






$$E_\varphi = \frac{\hbar^2m^2}{2\mathscr{m}\rho^2}. \tag{3.150}$$


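Now let us recall that a similar separation exists in classical mechanics,⁶⁴ because the total energy of a particle moving in a central field may be represented as

$$E = \frac{\mathscr{m}}{2}v^2 + U(\rho) = \frac{\mathscr{m}}{2}\left(\dot\rho^2 + \rho^2\dot\varphi^2\right) + U(\rho) = E_\rho + E_\varphi, \tag{3.151}$$

with

$$E_\rho = \frac{\mathscr{m}}{2}\dot\rho^2 + U(\rho) \equiv \frac{p_\rho^2}{2\mathscr{m}} + U(\rho), \quad\text{and}\quad E_\varphi = \frac{\mathscr{m}}{2}\rho^2\dot\varphi^2 \equiv \frac{L_z^2}{2\mathscr{m}\rho^2}. \tag{3.152}$$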

The comparison of the latter relation with Eqs. (139) and (150) gives us grounds to expect that the quantization rule L_z = mħ may be valid not only for this 2D problem but in 3D cases as well. In Sec. 5.6, we will see that this is indeed the case.


Returning to Eq. (147), with our 1D wave mechanics experience we may expect that for any fixed m, this ordinary, linear, second-order differential equation should have (for a motion confined to a certain finite region of its argument ρ) a discrete energy spectrum described by another integer quantum number, say, n. This means that the eigenfunctions (142), including their radial factors ℛ(ρ), and the corresponding eigenenergies (148) should be indexed by two quantum numbers, m and n. So, the variable separation is not as “clean” as it was for the rectangular potential well. Normalizing the angular function ℱ to the full circle, Δφ = 2π, we may rewrite Eq. (142) as

$$\psi_{m,n} = \mathcal{R}_{m,n}(\rho)\mathcal{F}_m(\varphi) = \frac{1}{(2\pi)^{1/2}}\mathcal{R}_{m,n}(\rho)\,e^{im\varphi}. \tag{3.153}$$


A good (and important) example of an analytically solvable problem of this type is a 2D particle 
whose motion is rigidly confined to a disk of radius R, but otherwise free: 


$$U(\rho) = \begin{cases} 0, & \text{for } 0 \le \rho < R,\\ +\infty, & \text{for } R < \rho.\end{cases} \tag{3.154}$$


In this case, the solutions ℛ_{m,n}(ρ) of Eq. (147) are proportional to the Bessel functions of the first kind, J_m(k_{m,n}ρ),⁶⁵ with the spectrum of possible values k_{m,n} following from the boundary condition ℛ_{m,n}(R) = 0. Let me leave a detailed analysis of this problem for the reader’s exercise.
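For a numerical preview of this exercise: the boundary condition gives k_{m,n} = x_{m,n}/R, where x_{m,n} is the nth positive zero of J_m, so that E_{m,n} = (ħ²/2𝓂R²)x_{m,n}². The minimal sketch below assumes nothing beyond the standard power series of J_m and a simple sign-scan-plus-bisection root search (a special-function library could do the same job); the notation x_{m,n} and the function names are hypothetical.

```python
import math


def bessel_j(m, x, terms=60):
    """J_m(x) for integer m >= 0, summed from its power series (adequate for x below ~15)."""
    return sum(
        (-1) ** k * (x / 2) ** (2 * k + m) / (math.factorial(k) * math.factorial(k + m))
        for k in range(terms)
    )


def bessel_zeros(m, count, x_max=15.0, step=0.01):
    """First `count` positive zeros of J_m, by scanning for sign changes and bisecting."""
    zeros = []
    x, f = step, bessel_j(m, step)
    while len(zeros) < count and x < x_max:
        x_new = x + step
        f_new = bessel_j(m, x_new)
        if f * f_new < 0:
            a, b = x, x_new
            for _ in range(60):
                mid = 0.5 * (a + b)
                if bessel_j(m, a) * bessel_j(m, mid) <= 0:
                    b = mid
                else:
                    a = mid
            zeros.append(0.5 * (a + b))
        x, f = x_new, f_new
    return zeros


# E_{m,n} in units of hbar^2/(2*M*R^2) equals x_{m,n}^2; levels with m != 0 are doubly
# degenerate, because +m and -m give the same radial equation (147).
levels = sorted(
    (x ** 2, m, n)
    for m in range(4)
    for n, x in enumerate(bessel_zeros(m, 2), start=1)
)
for energy, m, n in levels[:5]:
    print(f"|m| = {m}, n = {n}: E/E0 = {energy:.3f}, degeneracy = {1 if m == 0 else 2}")
```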


3.6. Spherically-symmetric systems: Brute force approach 


Now let us proceed to the mathematically more involved, but practically even more important 
case of the 3D motion, in a spherically-symmetric potential 


$$U(\mathbf{r}) = U(r). \tag{3.155}$$


64 See, e.g., CM Sec. 3.5. 
65 A short summary of properties of these functions, including the most important plots and a useful table of 
values, may be found in EM Sec. 2.7. 
Let us start, again, with solving the eigenproblem for a rigid rotator, now a spherical rotator, i.e. a particle confined to move on the spherical surface of radius R. The rotator has two degrees of freedom because its position on the surface is completely described by two coordinates, say, the polar angle θ and the azimuthal angle φ. In this case, the kinetic energy we need to consider is limited to its angular part, so that in the Laplace operator in spherical coordinates⁶⁶ we may keep only the angular terms, with fixed r = R. Because of this, the stationary Schrödinger equation becomes


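$$-\frac{\hbar^2}{2\mathscr{m}R^2}\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\psi}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2\psi}{\partial\varphi^2}\right] = E\psi. \tag{3.156}$$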


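(Again, we will attach indices to ψ and E in a minute.) With the natural variable separation,

$$\psi = \Theta(\theta)\mathcal{F}(\varphi), \tag{3.157}$$

Eq. (156), with all terms multiplied by sin²θ/Θℱ, yields

$$-\frac{\hbar^2}{2\mathscr{m}R^2}\left[\frac{\sin\theta}{\Theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \frac{1}{\mathcal{F}}\frac{d^2\mathcal{F}}{d\varphi^2}\right] = E\sin^2\theta. \tag{3.158}$$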


Just as in Eq. (143), the fraction (d²ℱ/dφ²)/ℱ may be a function of φ only, and hence has to be constant, giving Eq. (144) for it. So, with the same periodicity condition (145), the azimuthal functions are expressed by (146) again; in the normalized form,


$$\mathcal{F}_m(\varphi) = \frac{1}{(2\pi)^{1/2}}e^{im\varphi}. \tag{3.159}$$
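With that, the fraction (d²ℱ/dφ²)/ℱ in Eq. (158) equals (−m²), and after the multiplication of all terms of that equation by Θ/sin²θ, it is reduced to the following ordinary linear differential equation for the polar eigenfunctions Θ(θ):

$$\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \left(\varepsilon - \frac{m^2}{\sin^2\theta}\right)\Theta = 0, \quad\text{with } \varepsilon \equiv \frac{E}{\hbar^2/2\mathscr{m}R^2}. \tag{3.160}$$

It is common to recast it into an equation for a new function P(ξ) ≡ Θ(θ), with ξ ≡ cos θ:

$$\frac{d}{d\xi}\left[\left(1-\xi^2\right)\frac{dP}{d\xi}\right] + \left[l(l+1) - \frac{m^2}{1-\xi^2}\right]P = 0, \tag{3.161}$$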


where a new notation for the normalized energy is introduced: l(l + 1) ≡ ε. The motivation for such notation is that, according to the mathematical analysis of Eq. (161) with integer m,⁶⁷ it has solutions that remain finite on the whole segment −1 ≤ ξ ≤ +1 only if the parameter l is an integer, l = 0, 1, 2,..., and only if that integer is not smaller than |m|, i.e. if


$$-l \le m \le +l. \tag{3.162}$$


This fact immediately gives the following spectrum of the spherical rotator’s energy E — and, as we will 
see later, the angular part of the energy of any spherically-symmetric system: 


66 See, e.g., MA Eq. (10.9). 

67 This analysis was first carried out by A.-M. Legendre (1752-1833). As a historical note: besides many original mathematical achievements, Dr. Legendre authored a famous textbook, Éléments de Géométrie, which dominated the teaching of geometry through the 19th century.
$$E_l = \frac{\hbar^2}{2\mathscr{m}R^2}\,l(l+1), \tag{3.163}$$


so that the only effect of the magnetic quantum number m here is imposing the restriction (162) on the non-negative integer l, the so-called orbital quantum number. This means, in particular, that each energy (163) corresponds to (2l + 1) different values of m, i.e. is (2l + 1)-degenerate.


To understand the nature of this degeneracy, we need to explore the corresponding eigenfunctions of Eq. (161). They are naturally numbered by two integers, m and l, and are called the associated Legendre functions P_l^m(ξ). (Note that here m is an upper index, not a power!) For the particular, simplest case m = 0, these functions are the so-called Legendre polynomials P_l(ξ) ≡ P_l^0(ξ), which may be defined as the solutions of the following Legendre equation, resulting from Eq. (161) at m = 0:


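$$\frac{d}{d\xi}\left[\left(1-\xi^2\right)\frac{dP_l}{d\xi}\right] + l(l+1)P_l = 0, \tag{3.164}$$

but may also be calculated explicitly from the following Rodrigues formula:⁶⁸

$$P_l(\xi) = \frac{1}{2^l\,l!}\frac{d^l}{d\xi^l}\left(\xi^2 - 1\right)^l, \quad l = 0, 1, 2, \ldots \tag{3.165}$$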


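Using this formula, it is easy to spell out a few lowest Legendre polynomials:

$$P_0(\xi) = 1, \quad P_1(\xi) = \xi, \quad P_2(\xi) = \frac{1}{2}\left(3\xi^2 - 1\right), \quad P_3(\xi) = \frac{1}{2}\left(5\xi^3 - 3\xi\right), \ \ldots \tag{3.166}$$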


though such explicit expressions become bulkier and bulkier as l is increased. As these expressions (and Fig. 19) show, as the argument ξ is increased, all these functions end up at the same point, P_l(+1) = +1, while starting either at the same point or at the opposite point: P_l(−1) = (−1)^l. On the way between these two end points, the lth polynomial crosses the horizontal axis exactly l times, i.e. the polynomial P_l(ξ) has l roots on this segment.⁶⁹


Fig. 3.19. A few lowest Legendre polynomials P_l(ξ), plotted as functions of ξ ≡ cos θ on the segment −1 ≤ ξ ≤ +1.


68 This wonderful formula may be readily proved by plugging it into Eq. (164), but was not so easy to discover! 
This was done (independently) by B. O. Rodrigues in 1816, J. Ivory in 1824, and C. Jacobi in 1827. 

69 In this behavior, we may readily recognize the “standing wave” pattern typical for all 1D eigenproblems — cf. 
Figs. 1.8 and 2.35, as well as the discussion of the Sturm oscillation theorem at the end of Sec. 2.9. 
It is also easy to use the Rodrigues formula (165) and integration by parts to show that on the segment −1 ≤ ξ ≤ +1, the Legendre polynomials form a full orthogonal set of functions, with the following normalization rule:

$$\int_{-1}^{+1}P_l(\xi)P_{l'}(\xi)\,d\xi = \frac{2}{2l+1}\delta_{ll'}. \tag{3.167}$$
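Both the Rodrigues formula (165) and the normalization rule (167) can be checked with exact rational arithmetic, since everything involved is a polynomial. A minimal sketch, assuming polynomials stored as Python lists of `Fraction` coefficients (list index = power of ξ); the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (index = power)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_deriv(a, times=1):
    """Derivative of the given order of a polynomial."""
    for _ in range(times):
        a = [k * a[k] for k in range(1, len(a))] or [Fraction(0)]
    return a


def legendre(l):
    """Coefficients of P_l(xi), from the Rodrigues formula (165)."""
    base = [Fraction(0)] * (2 * l + 1)          # (xi^2 - 1)^l via the binomial theorem
    for k in range(l + 1):
        base[2 * k] = Fraction(comb(l, k) * (-1) ** (l - k))
    return [c / (2 ** l * factorial(l)) for c in poly_deriv(base, l)]


def integrate_pm1(a):
    """Exact integral of a polynomial over the segment [-1, +1]."""
    return sum((c * Fraction(2, k + 1) for k, c in enumerate(a) if k % 2 == 0), Fraction(0))


print([str(c) for c in legendre(3)])   # ['0', '-3/2', '0', '5/2'], i.e. (5 xi^3 - 3 xi)/2
for l in range(4):
    norms = [integrate_pm1(poly_mul(legendre(l), legendre(lp))) for lp in range(4)]
    print(l, [str(v) for v in norms])  # only the diagonal entry, 2/(2l+1), is nonzero
```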


For m > 0, the associated Legendre functions (now not necessarily polynomials!) may be expressed via the Legendre polynomials (165) using the following formula:⁷⁰

$$P_l^m(\xi) = (-1)^m\left(1-\xi^2\right)^{m/2}\frac{d^m}{d\xi^m}P_l(\xi), \tag{3.168}$$

while the functions with a negative magnetic quantum number may be found as

$$P_l^{-m}(\xi) = (-1)^m\frac{(l-m)!}{(l+m)!}P_l^m(\xi), \quad\text{for } m > 0. \tag{3.169}$$


On the segment −1 ≤ ξ ≤ +1, the associated Legendre functions with a fixed index m form a full orthogonal set, with the normalization relation

$$\int_{-1}^{+1}P_l^m(\xi)P_{l'}^m(\xi)\,d\xi = \frac{2}{2l+1}\frac{(l+m)!}{(l-m)!}\delta_{ll'}, \tag{3.170}$$

which is evidently a generalization of Eq. (167) for arbitrary m.
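The same exact-arithmetic approach can verify Eqs. (168) and (170). In the product of two functions with the same m, the factors (−1)^m cancel and the factors (1 − ξ²)^{m/2} combine into the polynomial (1 − ξ²)^m, so the integrand stays polynomial even for odd m. A minimal sketch with hypothetical helper names:

```python
from fractions import Fraction
from math import comb, factorial


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (index = power)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_deriv(a, times=1):
    """Derivative of the given order of a polynomial."""
    for _ in range(times):
        a = [k * a[k] for k in range(1, len(a))] or [Fraction(0)]
    return a


def legendre(l):
    """P_l(xi) from the Rodrigues formula (165)."""
    base = [Fraction(0)] * (2 * l + 1)
    for k in range(l + 1):
        base[2 * k] = Fraction(comb(l, k) * (-1) ** (l - k))
    return [c / (2 ** l * factorial(l)) for c in poly_deriv(base, l)]


def one_minus_xi2_power(m):
    """Coefficients of (1 - xi^2)^m."""
    out = [Fraction(0)] * (2 * m + 1)
    for k in range(m + 1):
        out[2 * k] = Fraction(comb(m, k) * (-1) ** k)
    return out


def overlap(l, lp, m):
    """Exact integral of P_l^m(xi) P_l'^m(xi) over [-1, +1], with P_l^m from Eq. (168)."""
    integrand = poly_mul(
        one_minus_xi2_power(m),
        poly_mul(poly_deriv(legendre(l), m), poly_deriv(legendre(lp), m)),
    )
    return sum((c * Fraction(2, k + 1) for k, c in enumerate(integrand) if k % 2 == 0), Fraction(0))


for m in range(3):
    for l in range(m, 4):
        expected = Fraction(2, 2 * l + 1) * Fraction(factorial(l + m), factorial(l - m))
        print(m, l, overlap(l, l, m) == expected, overlap(l, l + 1, m) == 0)
```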

Since the difference between the angles θ and φ is to a large extent artificial (due to an arbitrary direction of the polar axis), physicists prefer to use not the functions Θ(θ) ∝ P_l^m(cos θ) and ℱ_m(φ) ∝ e^{imφ} separately, but normalized products of the type (157), which are called the spherical harmonics:

$$Y_l^m(\theta,\varphi) = \left[\frac{2l+1}{4\pi}\,\frac{(l-m)!}{(l+m)!}\right]^{1/2}P_l^m(\cos\theta)\,e^{im\varphi}. \tag{3.171}$$


The specific front factor in Eq. (171) is chosen to simplify the following two expressions: the relation between the spherical harmonics with opposite signs of the magnetic quantum number,

$$Y_l^{-m}(\theta,\varphi) = (-1)^m\left[Y_l^m(\theta,\varphi)\right]^*, \tag{3.172}$$

and the following normalization relation:

$$\oint_{4\pi}\left[Y_l^m(\theta,\varphi)\right]^*Y_{l'}^{m'}(\theta,\varphi)\,d\Omega = \delta_{ll'}\delta_{mm'}, \tag{3.173}$$

with the integration over the whole solid angle. The last formula shows that on a spherical surface, the spherical harmonics form an orthonormal set of functions. This set is also full, so that any function defined on the surface may be uniquely represented as a linear combination of Y_l^m.


Despite the somewhat intimidating character of the formulas given above, they yield quite simple expressions for the lowest spherical harmonics, which are the most important for applications:


70 Note that some texts use different choices for the front factor (called the Condon-Shortley phase) in the functions P_l^m, which do not affect the final results for the spherical harmonics Y_l^m.
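$$l = 0:\quad Y_0^0 = \left(\frac{1}{4\pi}\right)^{1/2}, \tag{3.174}$$

$$l = 1:\quad \begin{cases} Y_1^{-1} = \left(\dfrac{3}{8\pi}\right)^{1/2}\sin\theta\,e^{-i\varphi},\\ Y_1^{0} = \left(\dfrac{3}{4\pi}\right)^{1/2}\cos\theta,\\ Y_1^{1} = -\left(\dfrac{3}{8\pi}\right)^{1/2}\sin\theta\,e^{i\varphi},\end{cases} \tag{3.175}$$

$$l = 2:\quad \begin{cases} Y_2^{-2} = \left(\dfrac{15}{32\pi}\right)^{1/2}\sin^2\theta\,e^{-2i\varphi},\\ Y_2^{-1} = \left(\dfrac{15}{8\pi}\right)^{1/2}\sin\theta\cos\theta\,e^{-i\varphi},\\ Y_2^{0} = \left(\dfrac{5}{16\pi}\right)^{1/2}\left(3\cos^2\theta - 1\right),\\ Y_2^{1} = -\left(\dfrac{15}{8\pi}\right)^{1/2}\sin\theta\cos\theta\,e^{i\varphi},\\ Y_2^{2} = \left(\dfrac{15}{32\pi}\right)^{1/2}\sin^2\theta\,e^{2i\varphi},\end{cases}\quad\text{etc.} \tag{3.176}$$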
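As a cross-check of Eqs. (168), (169), (171), and (172) against the explicit expressions (174)-(176), one may evaluate Y_l^m numerically at an arbitrary point of the sphere. A minimal floating-point sketch; the function names and the test point θ = 0.7, φ = 1.3 are arbitrary choices.

```python
import cmath
import math


def assoc_legendre(l, m, x):
    """P_l^m(x), with the (-1)^m phase of Eq. (168) and Eq. (169) for m < 0."""
    if m < 0:
        k = -m
        return (-1) ** k * math.factorial(l - k) / math.factorial(l + k) * assoc_legendre(l, k, x)
    coeffs = [0.0] * (2 * l + 1)                 # (x^2 - 1)^l
    for k in range(l + 1):
        coeffs[2 * k] = math.comb(l, k) * (-1) ** (l - k)
    for _ in range(l + m):                       # Rodrigues (165), then d^m/dx^m of Eq. (168)
        coeffs = [k * coeffs[k] for k in range(1, len(coeffs))] or [0.0]
    poly = sum(c * x ** k for k, c in enumerate(coeffs)) / (2 ** l * math.factorial(l))
    return (-1) ** m * (1 - x * x) ** (m / 2) * poly


def spherical_harmonic(l, m, theta, phi):
    """Y_l^m(theta, phi) as defined by Eq. (171)."""
    norm = math.sqrt((2 * l + 1) / (4 * math.pi) * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, math.cos(theta)) * cmath.exp(1j * m * phi)


theta, phi = 0.7, 1.3
s, c = math.sin(theta), math.cos(theta)
explicit = {                                     # from Eqs. (175) and (176)
    (1, 1): -math.sqrt(3 / (8 * math.pi)) * s * cmath.exp(1j * phi),
    (2, -1): math.sqrt(15 / (8 * math.pi)) * s * c * cmath.exp(-1j * phi),
    (2, 0): math.sqrt(5 / (16 * math.pi)) * (3 * c * c - 1),
    (2, 2): math.sqrt(15 / (32 * math.pi)) * s * s * cmath.exp(2j * phi),
}
for (l, m), value in explicit.items():
    # the difference should be at the level of floating-point rounding errors
    print(l, m, abs(spherical_harmonic(l, m, theta, phi) - value))

# Eq. (172) for l = 3, m = 2: Y_l^{-m} = (-1)^m [Y_l^m]^*
print(abs(spherical_harmonic(3, -2, theta, phi) - spherical_harmonic(3, 2, theta, phi).conjugate()))
```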
It is important to understand the general structure and symmetry of these functions. Since the spherical harmonics with m ≠ 0 are complex, the most popular way of their graphical representation is to normalize their real and imaginary parts as⁷¹

$$Y_{l,m} = \begin{cases}\sqrt{2}\,(-1)^m\,\mathrm{Im}\,Y_l^{|m|} \propto \sin|m|\varphi, & \text{for } m < 0,\\ \sqrt{2}\,(-1)^m\,\mathrm{Re}\,Y_l^{m} \propto \cos m\varphi, & \text{for } m > 0,\end{cases} \tag{3.177}$$

(for m = 0, Y_{l,0} ≡ Y_l^0), and then plot the magnitude of these real functions in spherical coordinates as the distance from the origin, while using two colors to show their sign (see Fig. 20).


Let us start from the simplest case l = 0. According to Eq. (162), for this lowest orbital quantum number, there may be only one magnetic quantum number, m = 0. According to Eq. (174), the spherical harmonic corresponding to that state is just a constant, so that the wavefunction of this so-called s state⁷² is uniformly distributed over the sphere. Since this function has no gradient in any angular direction, it is only natural that the angular kinetic energy (163) of the particle equals zero.


According to the same Eq. (162), for l = 1, there are 3 different p states, with m = −1, m = 0, and m = +1 (see Eq. (175)). As the second row of Fig. 20 shows, these states are essentially identical in structure and are just differently oriented in space, thus readily explaining the 3-fold degeneracy of the kinetic energy (163). Such a simple explanation, however, is not valid for the 5 different d states (l = 2), shown in the third row of Fig. 20, or for the states with higher l: despite their equal energies, they differ not only by their spatial orientation but by their structure as well. All states with m = 0 have a nonzero gradient only in the θ direction. On the contrary, the states with the extreme values of m (±l) change only smoothly (as sin^l θ, with a single maximum at the equator) in the polar direction, while oscillating in the azimuthal direction. The states with intermediate values of m provide a gradual transition between these two extremes, oscillating in both directions, more and more strongly in the azimuthal direction as |m| is increased. Still, surprisingly, the magnetic quantum number does not affect the angular energy for any l.


71 Such real functions Y_{l,m}, which also form a full orthonormal set and are frequently called the real (or “tesseral”) spherical harmonics, are more convenient than the complex harmonics Y_l^m for several applications, especially when the variables of interest are real by definition.

72 The letter names for the states with various values of l stem from the history of optical spectroscopy; for example, the letter “s”, used for states with l = 0, originally denoted the “sharp” optical line series, etc. The sequence of the letters is as follows: s, p, d, f, g, and then continuing in alphabetical order.
Fig. 3.20. Radial plots of several lowest real spherical harmonics Y_{l,m}; the rows correspond to l = 0 (s state), l = 1 (p states), l = 2 (d states), and l = 3 (f states). (Adapted from https://en.wikipedia.org/wiki/Spherical_harmonics under the CC BY-SA 3.0 license.)


Another counter-intuitive feature of the spherical harmonics follows from the comparison of Eq. (163) with the second of the classical relations (152). These expressions coincide if we interpret the constant

$$L^2 = \hbar^2l(l+1) \tag{3.178}$$

as the value of the full angular momentum squared, L² ≡ |L|² (including both its θ and φ components), in the eigenstate with eigenfunction Y_l^m. On the other hand, the structure (159) of the azimuthal component ℱ(φ) of the wavefunction is exactly the same as in 2D axially-symmetric problems, implying that Eq. (139) still gives correct values L_z = mħ for the z-component of the angular momentum. This fact invites a question: why, for any state with l > 0, is (L_z)² = m²ħ² ≤ l²ħ² always less than L² = l(l + 1)ħ²? In other words, what prevents the angular momentum vector from being fully aligned with the z-axis?


Besides the difficulty of answering this question using the above formulas, this analysis (though mathematically complete) is as intellectually unsatisfactory as the harmonic oscillator analysis in Sec. 2.9. In particular, it does not explain the meaning of the extremely simple relations for the eigenvalues of the energy and the angular momentum, which coexist with rather complicated eigenfunctions.


We will obtain natural answers to all these questions and concerns in Sec. 5.6 below, and now proceed to the extension of our wave-mechanical analysis to the 3D motion in an arbitrary spherically-symmetric potential (155). In this case, we have to use the full form of the Laplace operator in spherical coordinates.⁷³ The variable separation procedure is an evident generalization of what we have done before, with the particular solutions of the type

$$\psi = \mathcal{R}(r)\Theta(\theta)\mathcal{F}(\varphi), \tag{3.179}$$


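whose substitution into the stationary Schrödinger equation yields

$$-\frac{\hbar^2}{2\mathscr{m}r^2}\left[\frac{1}{\mathcal{R}}\frac{d}{dr}\left(r^2\frac{d\mathcal{R}}{dr}\right) + \frac{1}{\Theta\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \frac{1}{\sin^2\theta}\frac{1}{\mathcal{F}}\frac{d^2\mathcal{F}}{d\varphi^2}\right] + U(r) = E. \tag{3.180}$$

It is evident that the angular part of the left-hand side (the last two terms in the square brackets) separates from the radial part, and that for the former part we get Eq. (156) again, with the only change R → r. This change does not affect the fact that the eigenfunctions of that equation are still the spherical harmonics (171), so that, in accordance with Eq. (163), the two angular terms in the square brackets add up to −l(l + 1). As a result, Eq. (180) gives the following equation for the radial function ℛ(r):

$$-\frac{\hbar^2}{2\mathscr{m}r^2}\left[\frac{1}{\mathcal{R}}\frac{d}{dr}\left(r^2\frac{d\mathcal{R}}{dr}\right) - l(l+1)\right] + U(r) = E. \tag{3.181}$$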


Note that no information about the magnetic quantum number m has crept into this radial equation (besides setting the limitation (162) for the possible values of l), so that it includes only the orbital quantum number l.


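Let us explore the radial equation for the simplest case when U(r) = 0, for example, to solve the eigenproblem for a 3D particle free to move only inside the sphere of radius R, say, confined there by the potential⁷⁴

$$U(r) = \begin{cases}0, & \text{for } 0 \le r < R,\\ +\infty, & \text{for } R < r.\end{cases} \tag{3.182}$$

In this case, Eq. (181) is reduced to

$$-\frac{\hbar^2}{2\mathscr{m}r^2}\left[\frac{1}{\mathcal{R}}\frac{d}{dr}\left(r^2\frac{d\mathcal{R}}{dr}\right) - l(l+1)\right] = E. \tag{3.183}$$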


Multiplying both parts of this equality by −2𝓂r²ℛ/ħ², and introducing the dimensionless argument ξ ≡ kr, where k² is defined by the usual relation ħ²k²/2𝓂 = E, we obtain the canonical form of this equation,

$$\xi^2\frac{d^2\mathcal{R}}{d\xi^2} + 2\xi\frac{d\mathcal{R}}{d\xi} + \left[\xi^2 - l(l+1)\right]\mathcal{R} = 0, \tag{3.184}$$

satisfied by the so-called spherical Bessel functions of the first and second kind, j_l(ξ) and y_l(ξ).⁷⁵ These functions are directly related to the Bessel functions of semi-integer order,⁷⁶


73 Again, see MA Eq. (10.9). 

74 This problem, besides giving a simple example of the quantization in spherically-symmetric systems, is also an 
important precursor for the discussion of scattering by spherically-symmetric potentials in Sec. 8. 

75 Alternatively, y_l(ξ) are called “spherical Weber functions” or “spherical Neumann functions”.

76 Note that the Bessel functions J_ν(ξ) and Y_ν(ξ) of any order ν obey the universal recurrence relations and asymptotic formulas (discussed, e.g., in EM Sec. 2.7), so that many properties of the functions j_l(ξ) and y_l(ξ) may be readily derived from these relations and Eqs. (185).
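$$j_l(\xi) = \left(\frac{\pi}{2\xi}\right)^{1/2}J_{l+1/2}(\xi), \qquad y_l(\xi) = \left(\frac{\pi}{2\xi}\right)^{1/2}Y_{l+1/2}(\xi), \tag{3.185}$$

but are actually much simpler than even the “usual” Bessel functions, such as J_n(ξ) and Y_n(ξ) of an integer order n, because the former functions may be directly expressed via elementary functions:

$$\begin{aligned} j_0(\xi) &= \frac{\sin\xi}{\xi}, & j_1(\xi) &= \frac{\sin\xi}{\xi^2} - \frac{\cos\xi}{\xi}, & j_2(\xi) &= \left(\frac{3}{\xi^3} - \frac{1}{\xi}\right)\sin\xi - \frac{3}{\xi^2}\cos\xi, \ \ldots\\ y_0(\xi) &= -\frac{\cos\xi}{\xi}, & y_1(\xi) &= -\frac{\cos\xi}{\xi^2} - \frac{\sin\xi}{\xi}, & y_2(\xi) &= \left(-\frac{3}{\xi^3} + \frac{1}{\xi}\right)\cos\xi - \frac{3}{\xi^2}\sin\xi, \ \ldots\end{aligned} \tag{3.186}$$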


A few lowest-order spherical Bessel functions are plotted in Fig. 21. 


Fig. 3.21. Several lowest-order spherical Bessel functions, plotted as functions of ξ.


As these formulas and plots show, the functions y_l(ξ) diverge at ξ → 0, and thus cannot be used in the solution of our current problem (182), so that we have to take

$$\mathcal{R}(r) = \text{const}\times j_l(kr). \tag{3.187}$$

Still, even for these functions, with the sole exception of the simplest function j_0(ξ), the characteristic equation j_l(kR) = 0, resulting from the boundary condition ℛ(R) = 0, can be solved only numerically. However, the roots ξ_{l,n} of the equation j_l(ξ) = 0, where the integer n (= 1, 2, 3,...) is the root’s number, are tabulated in virtually any math handbook, and we may express the eigenvalues we are interested in,


$$k_{l,n} = \frac{\xi_{l,n}}{R}, \qquad E_{l,n} = \frac{\hbar^2k_{l,n}^2}{2\mathscr{m}} = \frac{\hbar^2\xi_{l,n}^2}{2\mathscr{m}R^2}, \tag{3.188}$$


via these tabulated numbers. The table below lists several smallest roots, and the corresponding eigenenergies (normalized to their natural unit E₀ ≡ ħ²/2𝓂R²), in the order of their growth. It shows a very interesting effect: going up the energy spectrum, first the eigenenergies grow because of increases of the orbital quantum number l, at the same (lowest) radial quantum number n = 1, due to the growth of the first roots of functions j_l(ξ), but then suddenly the second root of j_0(ξ) cuts into this orderly sequence, just to be followed by the first root of j_3(ξ). With the further growth of energy, the sequences of l and n become even more entangled.


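$$\begin{array}{c|c|c|c} l & n & \xi_{l,n} & E_{l,n}/E_0 = \xi_{l,n}^2\\ \hline 0 & 1 & \pi \approx 3.1416 & \pi^2 \approx 9.87\\ 1 & 1 & 4.4934 & 20.19\\ 2 & 1 & 5.7635 & 33.22\\ 0 & 2 & 2\pi \approx 6.2832 & 4\pi^2 \approx 39.48\\ 3 & 1 & 6.9879 & 48.83 \end{array}$$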
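The table can be regenerated with a few lines of code. The minimal sketch below assumes only the elementary expressions (186) for j_0 and j_1, plus the standard recurrence j_{l+1}(ξ) = [(2l + 1)/ξ]j_l(ξ) − j_{l−1}(ξ) (a known property of spherical Bessel functions, not derived in this section), and locates the roots by a sign scan followed by bisection; the function names are hypothetical.

```python
import math


def spherical_j(l, x):
    """j_l(x) from j_0, j_1 of Eq. (186) and the recurrence j_{k+1} = (2k+1)/x * j_k - j_{k-1}.

    The upward recurrence loses accuracy for x much smaller than l, which does not matter here.
    """
    j_prev, j_curr = math.sin(x) / x, math.sin(x) / x ** 2 - math.cos(x) / x
    if l == 0:
        return j_prev
    for k in range(1, l):
        j_prev, j_curr = j_curr, (2 * k + 1) / x * j_curr - j_prev
    return j_curr


def roots(l, x_max, x_min=0.5, step=1e-3):
    """All roots of j_l in (x_min, x_max), located by sign changes and refined by bisection."""
    found = []
    x, f = x_min, spherical_j(l, x_min)
    while x < x_max:
        x_new = x + step
        f_new = spherical_j(l, x_new)
        if f * f_new < 0:
            a, b = x, x_new
            for _ in range(50):
                mid = 0.5 * (a + b)
                if spherical_j(l, a) * spherical_j(l, mid) <= 0:
                    b = mid
                else:
                    a = mid
            found.append(0.5 * (a + b))
        x, f = x_new, f_new
    return found


# Energies E_{l,n}/E_0 = xi_{l,n}^2, sorted as in the table; each level is (2l+1)-degenerate.
table = sorted(
    (xi ** 2, l, n, xi)
    for l in range(4)
    for n, xi in enumerate(roots(l, x_max=7.5), start=1)
)
for energy, l, n, xi in table:
    print(f"l = {l}, n = {n}: xi = {xi:.4f}, E/E0 = {energy:.2f}, degeneracy = {2 * l + 1}")
```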


To complete the discussion of our current problem (182), note again that the energy levels listed in the table above are (2l + 1)-degenerate, because each of them corresponds to (2l + 1) different eigenfunctions, each with a specific value of the magnetic quantum number m:

$$\psi_{l,m,n} = C_{l,n}\,j_l\!\left(\xi_{l,n}\frac{r}{R}\right)Y_l^m(\theta,\varphi), \quad\text{with } -l \le m \le +l. \tag{3.189}$$


3.7. Atoms


Now we are ready to discuss atoms, starting from the simplest, exactly solvable Bohr atom problem, i.e. that of a single particle’s motion in the so-called attractive Coulomb potential⁷⁷

$$U(r) = -\frac{C}{r}, \quad\text{with } C > 0. \tag{3.190}$$

The natural scales of E and r in this problem are commonly defined by the requirement of equality of the kinetic and potential energy magnitude scales (dropping all numerical coefficients):

$$E_0 = \frac{\hbar^2}{\mathscr{m}r_0^2} = \frac{C}{r_0}, \tag{3.191}$$

similar to its particular case (1.13b). Solving this system of two equations, we get⁷⁸

$$E_0 = \frac{\mathscr{m}C^2}{\hbar^2} \equiv \mathscr{m}\left(\frac{C}{\hbar}\right)^2, \quad\text{and}\quad r_0 = \frac{\hbar^2}{\mathscr{m}C}. \tag{3.192}$$
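For the hydrogen atom, Eq. (192) may be evaluated directly. The minimal sketch below assumes C = e²/4πε₀, CODATA 2018 constants, and the electron mass with an infinitely heavy nucleus (i.e. it neglects the small reduced-mass correction); the function names are hypothetical.

```python
import math

# CODATA 2018 values, SI units
HBAR = 1.054571817e-34       # J*s
M_E = 9.1093837015e-31       # kg
E_CHARGE = 1.602176634e-19   # C
EPS_0 = 8.8541878128e-12     # F/m


def coulomb_scales(mass, c_const):
    """Return (E_0, r_0) of Eq. (192) for U(r) = -C/r."""
    return mass * c_const ** 2 / HBAR ** 2, HBAR ** 2 / (mass * c_const)


def hydrogen_like(z):
    """Scales for a nucleus of charge Z*e, electron mass, no reduced-mass correction."""
    return coulomb_scales(M_E, z * E_CHARGE ** 2 / (4 * math.pi * EPS_0))


E0, r0 = hydrogen_like(1)
print(f"r_0 = {r0:.4e} m")                    # the Bohr radius, about 0.529e-10 m
print(f"E_0 = {E0 / E_CHARGE:.3f} eV")        # the Hartree energy, about 27.2 eV
E0_he, r0_he = hydrogen_like(2)               # He+ ion: E_0 -> Z^2 E_H, r_0 -> r_B/Z
print(E0_he / E0, r0_he / r0)
```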


77 Historically, the solution of this problem in 1926, which reproduced the main result (1.12)-(1.13) of the “old” quantum theory developed by N. Bohr in 1912, without its phenomenological assumptions, was the decisive step toward the general acceptance of Schrödinger’s wave mechanics.

78 For the most important case of the hydrogen atom, with C = e²/4πε₀, these scales are reduced, respectively, to the Hartree energy E_H (1.13a) and the Bohr radius r_B (1.10). Note also that according to Eq. (192), for the so-called hydrogen-like atom (actually, a positive ion) with C = Z(e²/4πε₀), these two key parameters are rescaled as E₀ → Z²E_H and r₀ → r_B/Z.
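In the normalized units ε ≡ E/E₀ and ξ ≡ r/r₀, equation (181) for our current case (190) looks relatively simple,

$$\frac{d^2\mathcal{R}}{d\xi^2} + \frac{2}{\xi}\frac{d\mathcal{R}}{d\xi} + 2\left[\varepsilon + \frac{1}{\xi} - \frac{l(l+1)}{2\xi^2}\right]\mathcal{R} = 0, \tag{3.193}$$

but unfortunately, its eigenfunctions may be called elementary only in the most generous meaning of the word. With the adequate normalization,

$$\int_0^\infty\mathcal{R}_{n,l}(r)\,\mathcal{R}_{n',l}(r)\,r^2dr = \delta_{nn'}, \tag{3.194}$$

these (mutually orthogonal) functions may be represented as

$$\mathcal{R}_{n,l}(r) = \left(\frac{2}{nr_0}\right)^{3/2}\left\{\frac{(n-l-1)!}{2n\left[(n+l)!\right]^3}\right\}^{1/2}\left(\frac{2r}{nr_0}\right)^l\exp\left\{-\frac{r}{nr_0}\right\}L_{n-l-1}^{2l+1}\left(\frac{2r}{nr_0}\right). \tag{3.195}$$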
Here L_p^q(ξ) are the so-called associated Laguerre polynomials, which may be calculated as

$$L_p^q(\xi) = (-1)^q\frac{d^q}{d\xi^q}L_{p+q}(\xi), \tag{3.196}$$

from the simple Laguerre polynomials L_p(ξ) ≡ L_p^0(ξ).⁷⁹ In turn, the easiest way to obtain L_p(ξ) is to use the following Rodrigues formula:⁸⁰

$$L_p(\xi) = e^{\xi}\frac{d^p}{d\xi^p}\left(\xi^pe^{-\xi}\right). \tag{3.197}$$

Note that in contrast with the associated Legendre functions P_l^m, participating in the spherical harmonics, all L_p^q are just polynomials, and those with small indices p and q are indeed quite simple:

$$\begin{array}{lll} L_0^0(\xi) = 1, & L_1^0(\xi) = -\xi + 1, & L_2^0(\xi) = \xi^2 - 4\xi + 2,\\ L_0^1(\xi) = 1, & L_1^1(\xi) = -2\xi + 4, & L_2^1(\xi) = 3\xi^2 - 18\xi + 18,\\ L_0^2(\xi) = 2, & L_1^2(\xi) = -6\xi + 18, & L_2^2(\xi) = 12\xi^2 - 96\xi + 144, \ \ldots \end{array} \tag{3.198}$$
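Eqs. (195)-(198) can be checked with exact rational arithmetic. In the variable ρ ≡ 2r/nr₀, the normalization integral (194) with n' = n becomes N²∫₀^∞ρ^{2l+2}[L_{n−l−1}^{2l+1}(ρ)]²e^{−ρ}dρ, where N² is the square of the curly-bracket factor in Eq. (195); since ∫₀^∞ρ^k e^{−ρ}dρ = k!, everything reduces to polynomial coefficients. A minimal sketch with hypothetical helper names; note that it follows the convention of Eq. (197), without the 1/p! factor that many modern references include in L_p.

```python
from fractions import Fraction
from math import factorial


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (index = power)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def laguerre(p):
    """L_p from the Rodrigues formula (197).

    Uses d/dx [Q(x) e^-x] = (Q' - Q) e^-x, applied p times to Q = x^p.
    """
    q = [Fraction(0)] * p + [Fraction(1)]
    for _ in range(p):
        dq = [k * q[k] for k in range(1, len(q))] + [Fraction(0)]
        q = [a - b for a, b in zip(dq, q)]
    return q


def assoc_laguerre(p, q):
    """L_p^q = (-1)^q d^q/dx^q L_{p+q}, Eq. (196)."""
    poly = laguerre(p + q)
    for _ in range(q):
        poly = [k * poly[k] for k in range(1, len(poly))] or [Fraction(0)]
    return [(-1) ** q * c for c in poly]


def radial_norm(n, l):
    """Exact value of the integral (194) with n' = n, for R_{n,l} of Eq. (195)."""
    lag = assoc_laguerre(n - l - 1, 2 * l + 1)
    integrand = poly_mul([Fraction(0)] * (2 * l + 2) + [Fraction(1)], poly_mul(lag, lag))
    n_squared = Fraction(factorial(n - l - 1), 2 * n * factorial(n + l) ** 3)
    return n_squared * sum(c * factorial(k) for k, c in enumerate(integrand))


print([int(c) for c in assoc_laguerre(2, 2)])   # [144, -96, 12], i.e. 12 xi^2 - 96 xi + 144
print(all(radial_norm(n, l) == 1 for n in range(1, 6) for l in range(n)))   # True
```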

Returning to Eq. (195), we see that the natural quantization of the radial equation (193) has brought us a new integer quantum number n. To understand its range, we should notice that according to Eq. (197), the highest power of terms in the polynomial L_{p+q} is (p + q), and hence, according to Eq. (196), that of L_p^q is p, so that the highest power in the polynomial participating in Eq. (195) is (n − l − 1). Since this power cannot be negative (to avoid an unphysical divergence of the wavefunctions at r → 0), the radial quantum number n has to obey the restriction n ≥ l + 1. Since l, as we already know, may take values l = 0, 1, 2,..., we may conclude that n may only take the following values:


79 In Eqs. (196)-(197), p and q are non-negative integers, with no relation whatsoever to the particle’s momentum 
or electric charge. Sorry for this notation, but it is absolutely common, and can hardly result in any confusion. 

80 Named after the same B. O. Rodrigues, and belonging to the same class as his other famous result, Eq. (165) for 
the Legendre polynomials. 
$$n = 1, 2, 3, \ldots \tag{3.199}$$


What makes this relation very important is the following, most surprising result: the eigenenergies corresponding to the wavefunctions (179), which are indexed with three quantum numbers,

$$\psi_{n,l,m} = \mathcal{R}_{n,l}(r)\,Y_l^m(\theta,\varphi), \tag{3.200}$$

depend only on one of them, n:

$$\epsilon_n = -\frac{1}{2n^2}, \quad\text{i.e.}\quad E_n = -\frac{E_0}{2n^2} \equiv -\frac{mC^2}{2\hbar^2n^2}, \tag{3.201}$$

in agreement with Bohr's formula (1.12). For this reason, n is usually called the principal quantum number, and the above relation between it and the “more subordinate” orbital quantum number l is rewritten as

$$l \le n - 1. \tag{3.202}$$


Together with the inequality (162), this gives us the following, very important hierarchy of the three quantum numbers involved in the Bohr atom problem:

$$1 \le n < \infty \;\Rightarrow\; 0 \le l \le n-1 \;\Rightarrow\; -l \le m \le +l. \tag{3.203}$$


Taking into account the (2l + 1)-degeneracy related to the magnetic number m, and using the well-known formula for the arithmetic progression,81 we see that the n-th energy level (201) has the following orbital degeneracy:

$$g = \sum_{l=0}^{n-1}(2l+1) = 2\sum_{l=0}^{n-1}l + \sum_{l=0}^{n-1}1 = 2\,\frac{n(n-1)}{2} + n = n^2. \tag{3.204}$$
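The counting behind Eqs. (203)-(205) can be reproduced by brute-force enumeration. The sketch below (the names `states` and `SPD` are purely illustrative) lists all (l, m) pairs for each n, checks that their number equals $n^2$, and prints the subshell sizes in the traditional letter notation. Multiplying by 2 for the two spin orientations gives the $2n^2$ shell capacities that appear later in this section.

```python
SPD = "spdf"  # traditional letter codes for l = 0, 1, 2, 3 (enough for n <= 4)

def states(n):
    """All (l, m) pairs allowed by the hierarchy (203) for a given n."""
    return [(l, m) for l in range(n) for m in range(-l, l + 1)]

for n in range(1, 5):
    g = len(states(n))
    assert g == n ** 2                       # orbital degeneracy, Eq. (204)
    subshells = {f"{n}{SPD[l]}": 2 * l + 1 for l in range(n)}
    print(f"n = {n}: g = {g}, with spin 2g = {2 * g}, subshells {subshells}")
```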
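Due to its importance for atoms, let us spell out the hierarchy (203) for a few lowest-energy states, using the traditional state notation, in which the value of n is followed by the letter that denotes the value of l (s for l = 0, p for l = 1, d for l = 2):

$$n = 1:\quad l = 0\ \ (\text{one 1s state}),\quad m = 0. \tag{3.205}$$

$$n = 2:\quad \begin{cases} l = 0\ \ (\text{one 2s state}), & m = 0,\\ l = 1\ \ (\text{three 2p states}), & m = 0, \pm 1. \end{cases} \tag{3.206}$$

$$n = 3:\quad \begin{cases} l = 0\ \ (\text{one 3s state}), & m = 0,\\ l = 1\ \ (\text{three 3p states}), & m = 0, \pm 1,\\ l = 2\ \ (\text{five 3d states}), & m = 0, \pm 1, \pm 2. \end{cases} \tag{3.207}$$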


Figure 22 shows plots of the radial functions (195) of the listed states. The most important of them is of course the ground (1s) state with n = 1 and hence E = −E_0/2. According to Eqs. (195) and (198), its radial function is just a simple decaying exponential,

$$\mathcal{R}_{1,0}(r) = \frac{2}{r_0^{3/2}}\,e^{-r/r_0}, \tag{3.208}$$

81 See, e.g., MA Eq. (2.5a).
while its angular distribution is uniform; see Eq. (174). The gap between the ground energy and the energy E = −E_0/8 of the lowest excited states (with n = 2) in a hydrogen atom (in which E_0 = E_H ≈ 27.2 eV) is as large as ~10 eV, so that their thermal excitation requires temperatures as high as $\sim 10^5$ K, and the overwhelming majority of all hydrogen atoms in the visible Universe are in their ground state. Since atomic hydrogen makes up about 75% of the “normal” matter,82 we are very fortunate that such simple formulas as Eqs. (174) and (208) describe the atomic states prevalent in Mother Nature!
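The numbers quoted above follow directly from Eq. (201) with $E_0 = E_H$. The sketch below assumes the standard values $E_H \approx 27.211$ eV and $k_B \approx 8.617\times10^{-5}$ eV/K (neither is given in this section) and also shows the hydrogen-like rescaling $E_0 \to Z^2E_H$ mentioned at the beginning of this passage.

```python
E_H = 27.211        # Hartree energy in eV (assumed standard value; text: ~27.2 eV)
K_B = 8.617333e-5   # Boltzmann constant in eV/K (assumed standard value)

def level(n, Z=1):
    """E_n = -E_0 / (2 n^2), Eq. (201), with E_0 = Z^2 E_H for a hydrogen-like ion."""
    return -(Z ** 2) * E_H / (2 * n ** 2)

gap = level(2) - level(1)   # excitation energy from the 1s state to the n = 2 states
print(f"E_1 = {level(1):.2f} eV, E_2 = {level(2):.2f} eV, gap = {gap:.2f} eV")
print(f"gap / k_B = {gap / K_B:.2e} K")
print(f"He+ (Z = 2) ground level: {level(1, Z=2):.2f} eV")
```

The gap comes out near 10.2 eV, i.e. about $1.2\times10^5$ K in temperature units, consistent with the estimates in the text.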


[Figure: three panels plotting $r_0^{3/2}\mathcal{R}_{n,l}$ versus $r/r_0$ (from 0 to 10) for the states with n = 1, 2, and 3.]

Fig. 3.22. The lowest radial functions of the Bohr atom.


According to Eqs. (195) and (198), the radial functions of the lowest excited states, 2s (with n = 2 and l = 0) and 2p (with n = 2 and l = 1), are also not too complicated:

$$\mathcal{R}_{2,0}(r) = \frac{2}{(2r_0)^{3/2}}\left(1 - \frac{r}{2r_0}\right)e^{-r/2r_0}, \qquad \mathcal{R}_{2,1}(r) = \frac{1}{(2r_0)^{3/2}}\,\frac{r}{\sqrt{3}\,r_0}\,e^{-r/2r_0}, \tag{3.209}$$

with the former of these states (2s) having a uniform angular distribution, and the three latter (2p) states, with different m = 0, ±1, having simple angular distributions, which differ only by their spatial orientation (see Eq. (175) and the second row of Fig. 20). The most important trend here, clearly visible from the comparison of the two top panels of Fig. 22 as well, is a larger radius of the decay exponential in


82 Excluding the so-far hypothetical dark matter and dark energy. 
the radial functions ($2r_0$ for n = 2 instead of $r_0$ for n = 1), and hence a larger radial extension of the states. This trend is confirmed by the following general formula:

$$\langle r\rangle_{n,l} = \frac{r_0}{2}\left[3n^2 - l(l+1)\right]. \tag{3.210}$$


The second important trend is that at a fixed n, the orbital quantum number l determines how fast the wavefunction changes with r near the origin, and how much it oscillates in the radial direction at larger values of r. For example, the 2s eigenfunction $\mathcal{R}_{2,0}(r)$ is different from zero at r = 0, and “makes one wiggle” (has one root) in the radial direction, while the 2p eigenfunctions equal zero at r = 0 but do not cross the horizontal axis after that. Instead, those wavefunctions oscillate as functions of an angle; see the second row of Fig. 20. The same trend is clearly visible for n = 3 (see the bottom panel of Fig. 22), and continues for the higher values of n.


The states with $l = l_{\max} = n - 1$ may be viewed as crude analogs of the circular motion of a particle in a plane whose orientation defines the quantum number m. On the other hand, the best classical image of the s-state (l = 0) is a purely radial, spherically-symmetric motion of the particle to and from the attracting center. (The latter image is especially imperfect because the motion needs to happen simultaneously in all radial directions.) The classical language becomes reasonable only for the highly degenerate Rydberg states, with n ≫ 1, whose linear superpositions may be used to compose wave packets closely following the classical (circular or elliptic) trajectories of the particle, just as was discussed in Sec. 2.2 for the free 1D motion.


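Besides Eq. (210), mathematics gives us several other simple relations for the radial functions $\mathcal{R}_{n,l}$ (and, since the spherical harmonics are normalized to 1, for the eigenfunctions as a whole), including those that we will use later in the course:84

$$\left\langle\frac{1}{r}\right\rangle = \frac{1}{n^2r_0}, \qquad \left\langle\frac{1}{r^2}\right\rangle = \frac{1}{n^3\left(l+\frac12\right)r_0^2}, \qquad \left\langle\frac{1}{r^3}\right\rangle = \frac{1}{n^3\,l\left(l+\frac12\right)(l+1)\,r_0^3}. \tag{3.211}$$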


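In particular, the first of these formulas means that for any eigenfunction $\psi_{n,l,m}$, with all its complicated radial and angular dependencies, there is a simple relation between the potential and total energies (with $C/r_0 = E_0$, which follows from Eq. (192)):

$$\langle U\rangle_{n,l} = -C\left\langle\frac{1}{r}\right\rangle_{n,l} = -\frac{C}{n^2r_0} = -\frac{E_0}{n^2} = 2E_n, \tag{3.212}$$

so that the average kinetic energy of the particle, $\langle T\rangle_{n,l} = E_n - \langle U\rangle_{n,l}$, is equal to $E_n - 2E_n = |E_n| > 0$.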
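Eqs. (194), (208)-(210), and the first of Eqs. (211) can be spot-checked numerically. The following pure-Python sketch (function names illustrative; units with $r_0 = 1$) builds $\mathcal{R}_{n,l}$ directly from Eq. (195), using the Laguerre convention of Eqs. (196)-(198), and integrates with a composite Simpson rule; the cutoff $r_{\max} = 40n$ is an ad hoc choice. The printed norms, $\langle r\rangle$, and $\langle 1/r\rangle = 1/n^2$ should reproduce Eqs. (194), (210), and (211), and the last of these is exactly what gives $\langle U\rangle = 2E_n$ in Eq. (212).

```python
from functools import lru_cache
from math import comb, exp, factorial, sqrt

@lru_cache(maxsize=None)
def assoc_laguerre(p, q):
    """Integer coefficients (index = power of xi) of L^p_q, Eqs. (196)-(197)."""
    n = p + q
    c = [0] * (n + 1)
    for k in range(n + 1):          # Leibniz rule for e^xi d^n/dxi^n (xi^n e^-xi)
        c[n - k] += comb(n, k) * factorial(n) // factorial(n - k) * (-1) ** (n - k)
    for _ in range(p):              # p derivatives, Eq. (196)
        c = [j * c[j] for j in range(1, len(c))]
    return tuple((-1) ** p * a for a in c)

def radial(n, l, r):
    """R_{n,l}(r) of Eq. (195), in units where r_0 = 1."""
    x = 2 * r / n
    pref = sqrt((2 / n) ** 3 * factorial(n - l - 1) / (2 * n * factorial(n + l) ** 3))
    poly = sum(a * x ** j for j, a in enumerate(assoc_laguerre(2 * l + 1, n - l - 1)))
    return pref * exp(-r / n) * x ** l * poly

def simpson(f, a, b, N=4000):
    """Composite Simpson rule on [a, b] with an even number N of intervals."""
    h = (b - a) / N
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, N // 2 + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, N // 2))
    return s * h / 3

def moments(n, l, r_max_per_n=40):
    """(norm, <r>, <1/r>) of the state (n, l) in units of r_0."""
    r_max = r_max_per_n * n
    w = lambda r: radial(n, l, r) ** 2
    norm = simpson(lambda r: w(r) * r ** 2, 0.0, r_max)
    mean_r = simpson(lambda r: w(r) * r ** 3, 0.0, r_max)
    mean_inv_r = simpson(lambda r: w(r) * r, 0.0, r_max)
    return norm, mean_r, mean_inv_r

for n in range(1, 4):
    for l in range(n):
        norm, mean_r, mean_inv_r = moments(n, l)
        print(f"n={n}, l={l}: norm={norm:.8f}, <r>={mean_r:.6f} "
              f"[Eq. (210): {(3 * n * n - l * (l + 1)) / 2}], "
              f"<1/r>={mean_inv_r:.6f} [1/n^2: {1 / n**2:.6f}]")
```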


As in the several previous cases we have met, the simple results (201) and (210)-(212) are in sharp contrast with the rather complicated expressions for the corresponding eigenfunctions. Historically, this contrast provided an additional motivation for the development of more general approaches to quantum mechanics that would replace, or at least complement, our brute-force (wave-mechanics) analysis. A discussion of such an approach will be the main topic of the next chapter.


83 Note that even at the largest value of l, equal to (n − 1), the second term l(l + 1) in Eq. (210) is equal to (n² − n), and hence cannot over-compensate the first term 3n².

84 The first of these relations may be readily proved using the Hellmann-Feynman theorem (see Chapter 1); this proof will be offered for the reader's exercise after a more general form of this theorem has been proved in Chapter 6.
Rather strikingly, the above classification of the quantum numbers, with minor borrowings from the later chapters of this course, allows a semi-quantitative explanation of the whole system of chemical elements. The “only” two additions we need are the following facts:


(i) due to their unavoidable interaction with relatively low-temperature environments, atoms tend to relax into their lowest-energy state, and

(ii) due to the Pauli principle (valid for electrons as Fermi particles), each orbital eigenstate discussed above may be occupied by two electrons with opposite spins.


Of course, atomic electrons do interact, so that their quantitative description requires quantum mechanics of multiparticle systems, which is rather complex. (Its main concepts will be discussed in Chapter 8.) However, the lion's share of this interaction is reduced to simple electrostatic screening, i.e. a partial compensation of the electric charge of the atomic nucleus, as felt by a particular electron, by other electrons of the atom. This screening changes quantitative results (such as the energy scale $E_0$) dramatically; however, the quantum number hierarchy, and hence the classification of states, is not affected.


The hierarchy of atoms is most often represented as the famous periodic table of chemical elements,85 whose simple version is shown in Fig. 23. (The table in Fig. 24 presents a sequential list of the elements and their electron configurations, following the convention already used in Eqs. (205)-(207), with the additional upper index showing the number of electrons with the indicated values of quantum numbers n and l.) The number in each table's cell, and in the first column of the list, is the so-called atomic number Z, which physically is the number of protons in the particular atomic nucleus, and hence the number of electrons in an electrically-neutral atom.


[Figure: periodic table with color-coded element cells, a property legend (e.g., transition metals), and separate rows for the lanthanides and actinides.]

Fig. 3.23. The periodic table of elements, showing their atomic numbers and chemical symbols, as well as their color-coded basic physical/chemical properties at the so-called ambient (meaning usual laboratory) conditions.


85 Also called the Mendeleev table, after Dmitri Ivanovich Mendeleev who put forward the concept of the quasi- 
periodicity of chemical element properties as functions of Z phenomenologically in 1869. (The explanation of this 
periodicity had to wait for 60 more years until the advent of quantum mechanics in the late 1920s.) 
Atomic number | Atomic symbol | Electron states

Period 1
1 | H | 1s¹
2 | He | 1s²
Period 2 ([He] shell, plus:)
3 | Li | 2s¹
4 | Be | 2s²
5 | B | 2s²2p¹
6 | C | 2s²2p²
7 | N | 2s²2p³
8 | O | 2s²2p⁴
9 | F | 2s²2p⁵
10 | Ne | 2s²2p⁶
Period 3 ([Ne] shell, plus:)
11 | Na | 3s¹
12 | Mg | 3s²
13 | Al | 3s²3p¹
14 | Si | 3s²3p²
15 | P | 3s²3p³
16 | S | 3s²3p⁴
17 | Cl | 3s²3p⁵
18 | Ar | 3s²3p⁶
Period 4 ([Ar] shell, plus:)
19 | K | 4s¹
20 | Ca | 4s²
21 | Sc | 3d¹4s²
22 | Ti | 3d²4s²
23 | V | 3d³4s²
24 | Cr | 3d⁵4s¹
25 | Mn | 3d⁵4s²
26 | Fe | 3d⁶4s²
27 | Co | 3d⁷4s²
28 | Ni | 3d⁸4s²
29 | Cu | 3d¹⁰4s¹
30 | Zn | 3d¹⁰4s²
31 | Ga | 3d¹⁰4s²4p¹
32 | Ge | 3d¹⁰4s²4p²
33 | As | 3d¹⁰4s²4p³
34 | Se | 3d¹⁰4s²4p⁴
35 | Br | 3d¹⁰4s²4p⁵
36 | Kr | 3d¹⁰4s²4p⁶
Period 5 ([Kr] shell, plus:)
37 | Rb | 5s¹
38 | Sr | 5s²
39 | Y | 4d¹5s²
40 | Zr | 4d²5s²
41 | Nb | 4d⁴5s¹
42 | Mo | 4d⁵5s¹
43 | Tc | 4d⁵5s²
44 | Ru | 4d⁷5s¹
45 | Rh | 4d⁸5s¹
46 | Pd | 4d¹⁰
47 | Ag | 4d¹⁰5s¹
48 | Cd | 4d¹⁰5s²
49 | In | 4d¹⁰5s²5p¹
50 | Sn | 4d¹⁰5s²5p²
51 | Sb | 4d¹⁰5s²5p³
52 | Te | 4d¹⁰5s²5p⁴
53 | I | 4d¹⁰5s²5p⁵
54 | Xe | 4d¹⁰5s²5p⁶
Period 6 ([Xe] shell, plus:)
55 | Cs | 6s¹
56 | Ba | 6s²
57 | La | 5d¹6s²
58 | Ce | 4f¹5d¹6s²
59 | Pr | 4f³6s²
60 | Nd | 4f⁴6s²
61 | Pm | 4f⁵6s²
62 | Sm | 4f⁶6s²
63 | Eu | 4f⁷6s²
64 | Gd | 4f⁷5d¹6s²
65 | Tb | 4f⁹6s²
66 | Dy | 4f¹⁰6s²
67 | Ho | 4f¹¹6s²
68 | Er | 4f¹²6s²
69 | Tm | 4f¹³6s²
70 | Yb | 4f¹⁴6s²
71 | Lu | 4f¹⁴5d¹6s²
72 | Hf | 4f¹⁴5d²6s²
73 | Ta | 4f¹⁴5d³6s²
74 | W | 4f¹⁴5d⁴6s²
75 | Re | 4f¹⁴5d⁵6s²
76 | Os | 4f¹⁴5d⁶6s²
77 | Ir | 4f¹⁴5d⁷6s²
78 | Pt | 4f¹⁴5d⁹6s¹
79 | Au | 4f¹⁴5d¹⁰6s¹
80 | Hg | 4f¹⁴5d¹⁰6s²
81 | Tl | 4f¹⁴5d¹⁰6s²6p¹
82 | Pb | 4f¹⁴5d¹⁰6s²6p²
83 | Bi | 4f¹⁴5d¹⁰6s²6p³
84 | Po | 4f¹⁴5d¹⁰6s²6p⁴
85 | At | 4f¹⁴5d¹⁰6s²6p⁵
86 | Rn | 4f¹⁴5d¹⁰6s²6p⁶
Period 7 ([Rn] shell, plus:)
87 | Fr | 7s¹
88 | Ra | 7s²
89 | Ac | 6d¹7s²
90 | Th | 6d²7s²
91 | Pa | 5f²6d¹7s²
92 | U | 5f³6d¹7s²
93 | Np | 5f⁴6d¹7s²
94 | Pu | 5f⁶7s²
95 | Am | 5f⁷7s²
96 | Cm | 5f⁷6d¹7s²
97 | Bk | 5f⁹7s²
98 | Cf | 5f¹⁰7s²
99 | Es | 5f¹¹7s²
100 | Fm | 5f¹²7s²
101 | Md | 5f¹³7s²
102 | No | 5f¹⁴7s²
103 | Lr | 5f¹⁴6d¹7s²
104 | Rf | 5f¹⁴6d²7s²
105 | Db | 5f¹⁴6d³7s²
106 | Sg | 5f¹⁴6d⁴7s²
107 | Bh | 5f¹⁴6d⁵7s²
108 | Hs | 5f¹⁴6d⁶7s²
109 | Mt | 5f¹⁴6d⁷7s²
110 | Ds | 5f¹⁴6d⁸7s²
111 | Rg | 5f¹⁴6d⁹7s²
112 | Cn | 5f¹⁴6d¹⁰7s²
113 | Uut | 5f¹⁴6d¹⁰7s²7p¹
114 | Fl | 5f¹⁴6d¹⁰7s²7p²
115 | Uup | 5f¹⁴6d¹⁰7s²7p³
116 | Lv | 5f¹⁴6d¹⁰7s²7p⁴
117 | Uus | 5f¹⁴6d¹⁰7s²7p⁵
118 | Uuo | 5f¹⁴6d¹⁰7s²7p⁶

Fig. 3.24. Atomic electron configurations. The upper index shows the number of electrons in the states with the indicated quantum numbers n (the first digit) and l (letter-coded as was discussed above).
The simplest atom, with Z = 1, is hydrogen (chemical symbol H), the only atom for which the theory discussed above is quantitatively correct.86 According to Eq. (191), the 1s ground state of its only electron corresponds to the quantum number values n = 1, l = 0, and m = 0; see Eq. (205). In most versions of the periodic table, the cell of H is placed in the top left corner.


In the next atom, helium (symbol He, Z = 2), the same orbital quantum state (1s) is filled with two electrons. As will be discussed in detail in Chapter 8, electrons of the same atom are actually indistinguishable, so that their quantum states are not independent and may be entangled. These factors are important for several properties of helium atoms (and heavier elements as well); however, a bit counter-intuitively, for atom classification purposes they are not crucial, and we may pretend that the two electrons of a helium atom just have “opposite spins”. Due to the twice higher electric charge of the nucleus of the helium atom, i.e. the twice higher value of the constant C in Eq. (190), resulting in a 4-fold increase of the constant $E_0$ given by Eq. (192), the binding energy of each electron is crudely 4 times higher than that of the hydrogen atom, though the electron interaction decreases it by about 25% (see Sec. 8.2). This is why taking one electron away (i.e. the positive ionization of a helium atom) requires a relatively high energy, ~24.6 eV, which is not available in the usual chemical reactions. On the other hand, a neutral helium atom cannot bind one more electron (i.e. form a negative ion) either. As a result, helium, like all other elements with fully completed electron shells (the term meaning the sets of states with eigenenergies well separated from higher energy levels), is a chemically inert noble gas, thus starting the whole right-most column of the periodic table, allocated for such elements.


The situation changes rather dramatically as we move to the next element, lithium (Li), with Z = 3. Two of its electrons are still accommodated by the inner shell with n = 1 (listed in Fig. 24 as the helium shell [He]), but the third one has to reside in the next shell with n = 2, l = 0, and m = 0, i.e. in the 2s state. According to Eq. (201), the binding energy of this electron is much lower, especially if we take into account that according to Eqs. (210)-(211), the 1s electrons of the [He] shell are much closer to the nucleus and almost completely compensate two-thirds of its electric charge +3e. As a result, the 2s-state electron is approximately, but reasonably well, described by Eq. (201) with Z = 1 and n = 2, giving a binding energy close to just 3.4 eV (experimentally, ~5.39 eV), so that a lithium atom can give away that electron rather easily: either to an atom/ion of another element to form a chemical compound, or to the common conduction band of solid-state lithium. As a result, at ambient conditions lithium is a typical alkali metal. The similarity of chemical properties of lithium and hydrogen, with the chemical valence of one,87 places Li as the starting element of the second period (row), with the first period limited to only H and He (see Fig. 23).


In the next element, beryllium (symbol Be, Z = 4), the 2s state (n = 2, l = 0, m = 0) houses one more electron, with the “opposite spin”. Due to the higher electric charge of the nucleus, Q = +4e, with only half of it compensated by the 1s electrons of the [He] shell, the binding energy of the 2s electrons is somewhat higher than that in lithium, so that the ionization energy increases to ~9.32 eV. As a result, beryllium is also chemically active with the valence of two, but not as active as lithium, and is also metallic in its solid-state phase, but with a lower electric conductivity than lithium.


86 Besides very small fine-structure and hyperfine-splitting corrections — to be discussed, respectively, in Chapters 
6 and 8. 

87 The chemical valence (or “valency”) is a not very precise term describing the number of an atom's electrons involved in chemical reactions. For the same atom, especially with a large number of electrons in its outer, unfilled shell, this number may depend on the chemical compound formed. (For example, the valence of iron is two in the ferrous oxide, FeO, and three in the ferric oxide, Fe₂O₃.)
Moving in this way along the second row of the periodic table (from Z = 3 to Z = 10), we see a gradual filling of the rest of the total 2n² = 2×2² = 8 different electron states of the n = 2 shell (see Eq. (204), with the additional spin degeneracy factor of 2), including two 2s states with m = 0, and six 2p states with m = 0, ±1,88 with a gradually growing ionization potential (up to ~21.6 eV in Ne with Z = 10), i.e. a growing reluctance to conduct electricity or form positive ions. However, the last elements of the row, such as oxygen (O, with Z = 8) and especially fluorine (F, with Z = 9), can readily pick up extra electrons to fill up their 2p states, i.e. form negative ions. As a result, these elements are chemically active, with a double valence for oxygen and a single valence for fluorine. In contrast, the final element of this row, neon, has its n = 2 shell completely full, and cannot form a stable negative ion. This is why it is a noble gas, like helium. Traditionally, in the periodic table, such elements are placed right under helium (Fig. 23), to emphasize the similarity of their chemical properties. But this necessitates making an at least 6-cell gap in the 1st row. (Actually, the gap is often made larger, to accommodate the next rows; keep reading.)


Period 3, i.e. the 3rd row of the table, starts exactly like period 2, with sodium (Na, with Z = 11), also a chemically active alkali metal whose atom features 10 electrons filling the shells with n = 1 and n = 2 (in Fig. 24, collectively called the neon shell [Ne]), plus one electron in the 3s state (n = 3, l = 0, m = 0), which may again be reasonably well described by the hydrogen atom theory; see, e.g., the red curve on the last panel of Fig. 22. Continuing along this row, we could naively expect that, according to Eq. (204), and with account of the double spin degeneracy, this period of the table should have 2n² = 2×3² = 18 elements, with a gradual, sequential filling of two 3s states, then six 3p states, and then ten 3d states. However, here we run into a big surprise: after argon (Ar, with Z = 18), a relatively inert element with an ionization energy of ~15.7 eV due to the fully filled 3s and 3p shells, the next element, potassium (K, with Z = 19), is an alkali metal again!


The reason for that is the difference of the actual electron energies from those of the hydrogen atom, which is due mostly to inter-electron interactions, and gradually accumulates with the growth of Z. It may be semi-quantitatively understood from the results of Sec. 6. In hydrogen-like atoms/ions, the electron state energies do not depend on the quantum number l (as well as m); see Eq. (201). However, the orbital quantum number does affect the wavefunction of an electron. As Fig. 22 shows, the larger l, the lower the probability for an electron to be close to the nucleus, where its positive charge is less compensated by other electrons. As a result of this effect (and also of the relativistic corrections to be discussed in Sec. 6.3), the electron's energy grows with l. Actually, this effect is visible already in period 2 of the table: it manifests itself in the filling order, with the p states filled after the s states. However, for potassium (K, with Z = 19) and calcium (Ca, with Z = 20), the energies of the 3d states become so high that the energies of the two 4s states are lower, and the latter states are filled first. As described by Eq. (210), and also by the first of Eqs. (211), the effect of the principal number n on the distance from the nucleus is much stronger than that of l, so that the 4s wavefunctions of K and Ca are relatively far from the nucleus, and determine the chemical valence (equal to 1 and 2, respectively) of these elements. The next atoms, from Sc (Z = 21) to Zn (Z = 30), with the gradually filled “internal” 3d states, are the so-called transition metals, whose (comparable) ionization energies and chemical properties are determined by the 4s electrons.
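The filling order described above (4s before 3d, and so on) is often summarized by an empirical mnemonic that is not derived in these notes: the Madelung, or “n + l”, rule, which fills subshells in order of increasing n + l and, for equal n + l, increasing n. The sketch below applies that heuristic (helper names hypothetical), with the capacity 2(2l + 1) per subshell following from Eq. (204) plus spin. It reproduces, e.g., the potassium and calcium configurations of Fig. 24, but it is only a mnemonic and misses known exceptions.

```python
LETTERS = "spdf"  # letter codes for l = 0..3

def madelung_order(n_max=7):
    """Subshells (n, l), l <= 3, sorted by the empirical Madelung rule:
    increasing n + l, and increasing n for equal n + l."""
    shells = [(n, l) for n in range(1, n_max + 1) for l in range(min(n, 4))]
    return sorted(shells, key=lambda nl: (nl[0] + nl[1], nl[0]))

def configuration(Z):
    """Ground-state configuration predicted by the Madelung rule (exceptions ignored).
    Each subshell holds 2(2l + 1) electrons: (2l + 1) values of m, times 2 spins."""
    parts, left = [], Z
    for n, l in madelung_order():
        if left == 0:
            break
        k = min(left, 2 * (2 * l + 1))
        parts.append(f"{n}{LETTERS[l]}{k}")  # e.g. "4s1" means one electron in 4s
        left -= k
    return " ".join(parts)

for Z, name in [(18, "Ar"), (19, "K"), (20, "Ca"), (21, "Sc"), (24, "Cr"), (29, "Cu")]:
    print(name, configuration(Z))
```

For Cr and Cu, the printed configurations (ending in 4s² 3d⁴ and 4s² 3d⁹) differ from the 3d⁵4s¹ and 3d¹⁰4s¹ listed in Fig. 24, a reminder that the rule is a rough guide rather than a consequence of the theory above.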


This fact is the origin of the difference between various forms of the “periodic” table. In its most popular option, shown in Fig. 23, K is used to start the next period 4, and then a new period is started each time, and only when, the first electron with the next principal quantum number (n) appears.89 This topology of the table provides a very clear match of the chemical properties of the first element of each period (an alkali metal), as well as its last element (a noble gas). It also automatically means making gaps in all previous rows. Usually, this gap is made between the atoms with completely filled s states and those with the first electron in a p state, because here the properties of the elements make a somewhat larger step. (For example, the step from Be to B makes the material an insulator, but the step from Mg to Al makes a smaller difference.) As a result, the elements of the same column have only approximately similar chemical valences and physical properties.

88 The specific order of filling of the states within each shell follows the so-called Hund rules; see Sec. 8.3.


Such a representation is inconvenient for the lower, longer rows, because the whole table would be too broad. This is why the lanthanides (with Z from 57 to 70, of the 6th row, with a gradual filling of the 4f and 5d states), which make up most of the so-called rare earth elements, and the actinides (Z from 89 to 103, of the 7th row, with a gradual filling of the 5f and 6d states) are usually represented as outlet rows; see Fig. 23. This is quite acceptable for basic chemistry, because the chemical properties of the elements within each such group are rather close.


To summarize my very short review of this extremely important topic, the “periodic table of elements” is not periodic in the strict sense of the word. Nevertheless, it has had an enormous historic significance for chemistry, as well as atomic and solid-state physics, and is still very convenient for many purposes. For our course, the most important aspect of its discussion is the surprising possibility of describing, at least for classification purposes, such a complex multi-electron system as an atom as a set of quasi-independent electrons in certain quantum states indexed with the same quantum numbers n, l, and m as those of the hydrogen atom. This fact enables the use of various perturbation theories, which give a more quantitative description of atomic properties. Some of these techniques will be reviewed in Chapters 6 and 8.


3.8. Spherically-symmetric scatterers 


The machinery of the Legendre polynomials and the spherical Bessel functions, discussed in Sec. 6, may also be used for the analysis of particle scattering by spherically-symmetric potentials (155) beyond the Born approximation (Sec. 3), provided that such a potential U(r) is also localized, i.e. decreases sufficiently fast at r → ∞. (The quantification of this condition is left for the reader's exercise.)


Indeed, directing the z-axis along the propagation of the incident plane de Broglie wave $\psi_i$, and taking its origin at the center of the scatterer, we may expect the scattered wave $\psi_s$ to be axially symmetric, so that its expansion in the series over the spherical harmonics includes only the terms with m = 0. Hence, the solution (64) of the stationary Schrödinger equation (63) in this case may be represented as91

$$\psi = \psi_i + \psi_s = a_0\left[e^{ikz} + \sum_{l=0}^{\infty}\mathcal{R}_l(r)\,P_l(\cos\theta)\right], \tag{3.213}$$


89 Another popular option is to return to the first column as soon as an atom has one electron in the s state (as is the case in Cu, Ag, and Au, in addition to the alkali metals).

90 For a bit more detailed (but still succinct) discussion of the valence and other chemical aspects of atomic structure, I can recommend Chapter 5 of the very clear text by L. Pauling, General Chemistry, Dover, 1988.

91 The particular terms in this sum are frequently called partial waves.
where $k = (2mE)^{1/2}/\hbar$ is defined by the energy E of the incident particle, while the radial functions $\mathcal{R}_l(r)$ have to satisfy Eq. (181), and be finite at r → 0. At large distances r ≫ R, where R is the effective radius of the scatterer, the potential U(r) is negligible, and Eq. (181) is reduced to Eq. (183). In contrast to its analysis in Sec. 6, we should look for its solution using a linear superposition of the spherical Bessel functions of both kinds:

$$\mathcal{R}_l(r) = A_l\,j_l(kr) + B_l\,y_l(kr), \quad \text{at } r \gg R, \tag{3.214}$$

because Eq. (183) is now invalid at r → 0, and our former argument for dropping the functions $y_l(kr)$ is no longer valid. In Eq. (214), $A_l$ and $B_l$ are some complex coefficients, determined by the scattering potential U(r), i.e. by the solution of Eq. (181) at r ~ R.


As the explicit expressions (186) show, the spherical Bessel functions $j_l(\xi)$ and $y_l(\xi)$ represent standing de Broglie waves, with equal real amplitudes, so that their simple linear combinations (called the spherical Hankel functions of the first and second kind),

$$h^{(1)}_l(\xi) = j_l(\xi) + i\,y_l(\xi), \qquad h^{(2)}_l(\xi) = j_l(\xi) - i\,y_l(\xi), \tag{3.215}$$

represent traveling spherical waves propagating, respectively, from the origin (i.e. from the center of the scatterer) and toward the origin. In particular, at ξ ≫ 1, l, i.e. at large distances r ≫ 1/k, l/k,92

$$h^{(1)}_l(kr) \to (-i)^{l+1}\frac{e^{ikr}}{kr}, \qquad h^{(2)}_l(kr) \to i^{l+1}\frac{e^{-ikr}}{kr}. \tag{3.216}$$
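Eq. (216) is easy to check numerically. The sketch below (helper names hypothetical) computes $j_l$ and $y_l$ from their closed forms for l = 0, 1 and the standard upward recurrence $f_{l+1}(x) = (2l+1)f_l(x)/x - f_{l-1}(x)$, which is numerically safe here because l ≪ x; it then forms $h^{(1)}_l$ by Eq. (215) and compares it with the asymptotic form. For l = 0 the two coincide exactly; for l ≥ 1 the relative discrepancy is of the order of $l(l+1)/2x$ and vanishes as x → ∞.

```python
import cmath
import math

def spherical_jy(l_max, x):
    """[j_0..j_l_max], [y_0..y_l_max] at x via the upward recurrence
    f_{l+1} = (2l+1) f_l / x - f_{l-1}; fine for l not much larger than x."""
    j = [math.sin(x) / x, math.sin(x) / x**2 - math.cos(x) / x]
    y = [-math.cos(x) / x, -math.cos(x) / x**2 - math.sin(x) / x]
    for l in range(1, l_max):
        j.append((2 * l + 1) / x * j[l] - j[l - 1])
        y.append((2 * l + 1) / x * y[l] - y[l - 1])
    return j[: l_max + 1], y[: l_max + 1]

def hankel1(l, x):
    """Spherical Hankel function of the first kind, Eq. (215)."""
    j, y = spherical_jy(l, x)
    return j[l] + 1j * y[l]

def hankel1_asymptotic(l, x):
    """Leading large-x form (216): (-i)^(l+1) e^{ix} / x."""
    return (-1j) ** (l + 1) * cmath.exp(1j * x) / x

x = 200.0
for l in range(4):
    exact, approx = hankel1(l, x), hankel1_asymptotic(l, x)
    print(l, abs(exact - approx) / abs(exact))
```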
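But using the same physical argument as at the beginning of Sec. 1, we may argue that in the case of a localized scatterer, there should be no waves of the latter (incoming) type at r ≫ R; hence, we have to require the amplitude of the term proportional to $h^{(2)}_l$ to be zero. With the relations reciprocal to Eqs. (215),

$$j_l(\xi) = \frac{1}{2}\left[h^{(1)}_l(\xi) + h^{(2)}_l(\xi)\right], \qquad y_l(\xi) = \frac{1}{2i}\left[h^{(1)}_l(\xi) - h^{(2)}_l(\xi)\right], \tag{3.217}$$

which enable us to rewrite Eq. (214) as

$$\begin{aligned}
\mathcal{R}_l(r) &= \frac{A_l}{2}\left[h^{(1)}_l(kr) + h^{(2)}_l(kr)\right] + \frac{B_l}{2i}\left[h^{(1)}_l(kr) - h^{(2)}_l(kr)\right]\\
&= \frac{1}{2}\left(A_l - iB_l\right)h^{(1)}_l(kr) + \frac{1}{2}\left(A_l + iB_l\right)h^{(2)}_l(kr),
\end{aligned} \tag{3.218}$$

this means that the combination $(A_l + iB_l)$ has to equal zero, so that $B_l = iA_l$. Hence we have just one unknown coefficient (say, $A_l$) for each l,93 and may rewrite Eq. (218) in an even simpler form:

$$\mathcal{R}_l(r) = A_l\left[j_l(kr) + i\,y_l(kr)\right] \equiv A_l\,h^{(1)}_l(kr), \quad \text{at } r \gg R, \tag{3.219}$$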


92 For arbitrary l, this result may be confirmed using Eqs. (185) and the asymptotic formulas for the “usual” Bessel functions (see, e.g., EM Eqs. (2.135) and (2.152)), valid for an arbitrary (not necessarily integer) index n.
93 Moreover, using the conservation of the orbital momentum, to be discussed in Sec. 5.6, it is possible to show that this complex coefficient may be further reduced to just one real parameter, usually recast as the partial phase shift $\delta_l$ between the l-th spherical harmonics of the incident and scattered waves. However, I will not use this notion, because practical calculations are more physically transparent (and not more complex) without it.
and use Eqs. (213) and (216) to write the following expression for the scattered wave at large distances:

$$\psi_s \to a_0\,\frac{e^{ikr}}{kr}\sum_{l=0}^{\infty}(-i)^{l+1}A_l\,P_l(\cos\theta), \quad \text{at } r \gg R. \tag{3.220}$$

Comparing this expression with the general Eq. (81), we see that for a spherically-symmetric, localized scatterer,

$$f(\theta) = \frac{1}{k}\sum_{l=0}^{\infty}(-i)^{l+1}A_l\,P_l(\cos\theta), \tag{3.221}$$

so that the differential cross-section (84) is


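$$\frac{d\sigma}{d\Omega} = |f(\theta)|^2 = \frac{1}{k^2}\left|\sum_{l=0}^{\infty}(-i)^{l+1}A_l\,P_l(\cos\theta)\right|^2 = \frac{1}{k^2}\sum_{l,l'=0}^{\infty}i^{l'-l}A_lA_{l'}^*\,P_l(\cos\theta)\,P_{l'}(\cos\theta). \tag{3.222}$$

The last expression is more convenient for the calculation of the total cross-section (59):

$$\sigma = \oint\frac{d\sigma}{d\Omega}\,d\Omega = 2\pi\int_{-1}^{+1}\frac{d\sigma}{d\Omega}\,d(\cos\theta) = \frac{2\pi}{k^2}\sum_{l,l'=0}^{\infty}i^{l'-l}A_lA_{l'}^*\int_{-1}^{+1}P_l(\xi)\,P_{l'}(\xi)\,d\xi, \tag{3.223}$$

where ξ ≡ cos θ, because this result may be much simplified by using Eq. (167):94

$$\sigma = \frac{4\pi}{k^2}\sum_{l=0}^{\infty}\frac{|A_l|^2}{2l+1}. \tag{3.224}$$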
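The step from the double sum (223) to the single sum (224) relies only on the Legendre orthogonality relation $\int_{-1}^{+1}P_l(\xi)P_{l'}(\xi)\,d\xi = 2\delta_{ll'}/(2l+1)$, which is what Eq. (167) supplies here. A minimal exact-arithmetic check (pure Python; helper names illustrative) builds $P_l$ from Bonnet's recursion, a standard identity not stated in this section, and integrates the products term by term.

```python
from fractions import Fraction

def legendre(l):
    """Exact coefficients of P_l(xi) (index = power of xi), built with Bonnet's
    recursion (k+1) P_{k+1} = (2k+1) xi P_k - k P_{k-1}."""
    p_prev, p = [Fraction(1)], [Fraction(0), Fraction(1)]
    if l == 0:
        return p_prev
    for k in range(1, l):
        xi_p = [Fraction(0)] + p                               # xi * P_k
        padded_prev = p_prev + [Fraction(0)] * (len(xi_p) - len(p_prev))
        p_next = [((2 * k + 1) * a - k * b) / (k + 1) for a, b in zip(xi_p, padded_prev)]
        p_prev, p = p, p_next
    return p

def overlap(l1, l2):
    """Exact value of the integral of P_l1(xi) P_l2(xi) over -1 <= xi <= 1."""
    a, b = legendre(l1), legendre(l2)
    prod = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    # the integral of xi^m over [-1, 1] is 2/(m+1) for even m and 0 for odd m
    return sum(c * Fraction(2, m + 1) for m, c in enumerate(prod) if m % 2 == 0)

for l1 in range(4):
    print([str(overlap(l1, l2)) for l2 in range(4)])
```

The printed matrix is diagonal, with entries 2, 2/3, 2/5, 2/7, so only the l = l' terms of Eq. (223) survive, each contributing $(2\pi/k^2)\,|A_l|^2\cdot 2/(2l+1)$.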


Hence the solution of the scattering problem is reduced to the calculation of the partial wave amplitudes $A_l$ defined by Eq. (219), and, for the total cross-section, merely of their magnitudes. This task is much facilitated by using the following Rayleigh formula for the expansion of the incident plane wave's exponent into a series over the Legendre polynomials,95

$$e^{ikz} \equiv e^{ikr\cos\theta} = \sum_{l=0}^{\infty}i^l(2l+1)\,j_l(kr)\,P_l(\cos\theta). \tag{3.225}$$
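A quick numerical check of the Rayleigh formula (225) is shown below. This sketch (illustrative helper names, not a library API) evaluates $j_l$ from its power series, which is accurate for the moderate arguments used here but not meant for large ones, builds $P_l(\cos\theta)$ from Bonnet's recursion, and compares the truncated sum with $e^{ikr\cos\theta}$. The cutoff $l_{\max} = 40$ is an ad hoc choice; terms with $l \gg kr$ are negligible because $j_l(kr)$ decays very fast there.

```python
import cmath
import math

def sph_jn(l, x):
    """Spherical Bessel function j_l(x) from its power series
    j_l(x) = sum_k (-1)^k x^(2k+l) / (2^k k! (2l+2k+1)!!).
    Accurate for moderate x (here x < 10); not meant for large arguments."""
    term = 1.0
    for m in range(1, l + 1):
        term *= x / (2 * m + 1)              # builds x^l / (2l+1)!!
    total, k = 0.0, 0
    while True:
        total += term
        k += 1
        term *= -x * x / (2 * k * (2 * l + 2 * k + 1))
        if abs(term) < 1e-17 * max(abs(total), 1e-300) and k > x:
            return total

def legendre_values(l_max, t):
    """[P_0(t), ..., P_l_max(t)] via Bonnet's recursion."""
    p = [1.0, t]
    for k in range(1, l_max):
        p.append(((2 * k + 1) * t * p[k] - k * p[k - 1]) / (k + 1))
    return p[: l_max + 1]

def rayleigh_sum(kr, theta, l_max=40):
    """Right-hand side of Eq. (225), truncated at l = l_max."""
    P = legendre_values(l_max, math.cos(theta))
    return sum((1j) ** l * (2 * l + 1) * sph_jn(l, kr) * P[l] for l in range(l_max + 1))

for kr in (0.5, 3.0, 7.5):
    for theta in (0.0, 1.0, 2.5, math.pi):
        exact = cmath.exp(1j * kr * math.cos(theta))
        print(kr, theta, abs(rayleigh_sum(kr, theta) - exact))
```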


As the simplest example, let us calculate scattering by a completely opaque and “hard” (meaning sharp-boundary) sphere, which may be described by the following potential:

$$U(r) = \begin{cases} +\infty, & \text{for } r < R,\\ 0, & \text{for } R < r. \end{cases} \tag{3.226}$$

In this case, the total wavefunction has to vanish at r < R, and hence for the external problem (r ≥ R) the sphere enforces the boundary condition $\psi = \psi_i + \psi_s = 0$ for all values of θ at r = R. With Eqs. (213), (220) and (225), this condition becomes

$$a_0\sum_{l=0}^{\infty}\left[\mathcal{R}_l(R) + i^l(2l+1)\,j_l(kR)\right]P_l(\cos\theta) = 0. \tag{3.227}$$


94 Physically, this reduction of the double sum to a single one means that, due to the orthogonality of the spherical harmonics, the total scattering probability flows due to each partial wave just add up.

95 It may be proved using the Rodrigues formula (165) and integration by parts — the task left for the reader’s 
exercise. 
Due to the orthogonality of the Legendre polynomials, this condition may be satisfied for all angles θ only if all the coefficients before all $P_l(\cos\theta)$ vanish, i.e. if

$$\mathcal{R}_l(R) = -i^l(2l+1)\,j_l(kR). \tag{3.228}$$

On the other hand, for r > R, U(r) = 0, so that Eq. (183) is valid, and its outward-wave solution (219) has to be valid even at r → R, giving

$$\mathcal{R}_l(R) = A_l\left[j_l(kR) + i\,y_l(kR)\right]. \tag{3.229}$$

Requiring the two last expressions to give the same result, we get

$$A_l = -i^l(2l+1)\,\frac{j_l(kR)}{j_l(kR) + i\,y_l(kR)}, \tag{3.230}$$

so that Eqs. (222) and (224) yield:

$$\frac{d\sigma}{d\Omega} = \frac{1}{k^2}\left|\sum_{l=0}^{\infty}(2l+1)\,\frac{j_l(kR)}{j_l(kR) + i\,y_l(kR)}\,P_l(\cos\theta)\right|^2, \qquad \sigma = \frac{4\pi}{k^2}\sum_{l=0}^{\infty}(2l+1)\,\frac{j_l^2(kR)}{j_l^2(kR) + y_l^2(kR)}. \tag{3.231}$$
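The passage from Eq. (230) to Eqs. (231) uses only two simple identities, spelled out here as a short derivation. Since $(-i)^{l+1}\,i^l = -i$, and since $j_l$ and $y_l$ are real for real arguments, Eq. (230) gives

$$(-i)^{l+1}A_l = i\,(2l+1)\,\frac{j_l(kR)}{j_l(kR) + i\,y_l(kR)}, \qquad \frac{|A_l|^2}{2l+1} = (2l+1)\,\frac{j_l^2(kR)}{j_l^2(kR) + y_l^2(kR)}.$$

Substituting the first expression into the first form of Eq. (222) (the common factor $i$ drops out of the squared modulus), and the second one into Eq. (224), yields the two results (231).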


As Fig. 25a shows, the first of these results describes an angular structure of the scattered de 
Broglie wave, which is qualitatively similar to that given by the Born approximation — cf. Eq. (98) and 
Fig. 10. 


[Figure: (a) the normalized differential cross-section, on a logarithmic scale from 0.01 to 100, versus θ/π from 0 to 1 (curve label: kR = 30); (b) the normalized total cross-section versus kR.]

Fig. 3.25. Particle scattering by an opaque, hard sphere: (a) the differential cross-section normalized to the geometric cross-section $\sigma_g = \pi R^2$ of the sphere, as a function of the scattering angle θ, and (b) the (similarly normalized) total cross-section and its lowest spherical components, as functions of the dimensionless product $kR \propto E^{1/2}$.


Namely, at low particle energies (kR ≪ 1), the scattering is essentially isotropic, while in the opposite, high-energy limit kR ≫ 1, it is mostly confined to small angles θ ~ π/kR ≪ 1, and exhibits numerous local destructive-interference minima at angles $\theta_n \sim \pi n/kR$. However, in our current (exact!) theory, the cross-section at these minima never drops to zero, because the theory describes the effective bending of the de Broglie waves along the back side of the sphere, which smears the interference pattern.
The same bending is also responsible for a rather counter-intuitive fact, described by the second of Eqs. (231) and clearly visible in Fig. 25b: even at kR → ∞, the total cross-section σ of scattering tends to $2\sigma_g = 2\pi R^2$, rather than to $\sigma_g$ as in the purely classical scattering theory. First discovered for optical wave scattering, this fact (common for all large non-absorbing scatterers) is sometimes interpreted as a manifestation of the so-called wave extinction paradox.


The fact that at kR ≪ 1, the cross-section is also larger than σ_g, approaching 4σ_g at kR → 0, is much less surprising, because in this limit the de Broglie wavelength λ = 2π/k is much longer than the sphere's radius R, so that the wave's propagation is affected by the whole sphere.
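Both limits quoted above, σ → 4σ_g at kR → 0 and σ → 2σ_g at kR → ∞, can be checked numerically from the second of Eqs. (231). Below is a minimal pure-Python sketch, not taken from these notes: the function names are hypothetical, the spherical Bessel functions are generated by standard recurrences (upward for y_l, the downward "Miller" recurrence for j_l, normalized with the identity Σ_l (2l+1) j_l²(x) = 1), and the sum is truncated at an assumed l_max somewhat above kR, beyond which the terms are negligible.

```python
import math


def _bessel_squares(lmax, x):
    """Return lists of j_l(x)**2 and y_l(x)**2 for l = 0..lmax (x > 0).

    Hypothetical helper: y_l from the (stable) upward recurrence, j_l from
    the downward Miller recurrence normalized by sum_l (2l+1) j_l^2 = 1.
    Only squares are returned, so the overall sign of j_l does not matter.
    """
    y = [-math.cos(x) / x, -math.cos(x) / x**2 - math.sin(x) / x]
    for l in range(1, lmax):
        y.append((2 * l + 1) / x * y[l] - y[l - 1])

    start = lmax + 20                      # seed well above l_max
    f = [0.0] * (start + 2)
    f[start] = 1e-30                       # arbitrary small seed value
    for l in range(start, 0, -1):
        f[l - 1] = (2 * l + 1) / x * f[l] - f[l + 1]
    norm2 = sum((2 * l + 1) * f[l] ** 2 for l in range(start + 1))

    j2 = [f[l] ** 2 / norm2 for l in range(lmax + 1)]
    y2 = [v * v for v in y[: lmax + 1]]
    return j2, y2


def sigma_over_geometric(kR):
    """Hard-sphere total cross-section, second of Eqs. (3.231), divided by pi R^2."""
    # Assumed truncation: partial waves with l well above kR contribute negligibly.
    lmax = int(kR + 4 * kR ** (1 / 3) + 10)
    j2, y2 = _bessel_squares(lmax, kR)
    s = sum((2 * l + 1) * j2[l] / (j2[l] + y2[l]) for l in range(lmax + 1))
    return 4.0 * s / kR**2                 # (4 pi / k^2) * s, divided by pi R^2


for kR in (0.01, 0.3, 1.0, 3.0, 10.0, 30.0):
    print(f"kR = {kR:6.2f}:  sigma/sigma_g = {sigma_over_geometric(kR):.3f}")
```

For kR = 0.01 the ratio comes out very close to 4 (only the l = 0 term, equal to sin²(kR), matters), while for kR = 30 it is already a little above 2, approaching that value slowly as kR grows.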


The above analysis may be readily generalized to the case of a step-like (sharp but finite) potential (97), a problem left for the reader's exercise. On the other hand, for a finite and smooth scattering potential U(r), plugging Eq. (225) into Eq. (213) and the result into Eq. (66), requiring the coefficients before each angular function P_l(cos θ) to be balanced, and canceling the common coefficient a_0, we get the following inhomogeneous generalization of Eq. (181) for the radial functions defined by Eq. (213):


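$$[E - U(r)]\,\mathcal{R}_l + \frac{\hbar^2}{2m}\left[\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d\mathcal{R}_l}{dr}\right) - \frac{l(l+1)}{r^2}\,\mathcal{R}_l\right] = U(r)\,i^l(2l+1)\,j_l(kr). \qquad (3.232)$$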


This differential equation has to be solved in the whole scatterer volume (i.e. for all r ≲ R) with the boundary conditions for the functions ℛ_l(r) to be finite at r → 0, and to tend to the asymptotic form (219) at r ≫ R. The last requirement enables the evaluation of the coefficients A_l that are needed for spelling out Eqs. (222) and (224). Unfortunately, due to the lack of time, I have to refer the reader interested in such cases to special literature.96


3.9. Exercise Problems 


3.1. A particle of energy E is incident (in the figure on the right, within the plane of the drawing) on a sharp potential step:

$$U(\mathbf{r}) = \begin{cases} 0, & \text{for } x < 0,\\ U_0, & \text{for } 0 < x.\end{cases}$$

Calculate the particle reflection probability ℛ as a function of the incidence angle θ; sketch and discuss this function for various magnitudes and signs of U_0.


3.2.* Analyze how the Landau levels (50) are modified by an additional uniform electric field ℰ directed along the plane of the particle's motion. Contemplate the physical meaning of your result and its implications for the quantum Hall effect in a gate-defined Hall bar. (The area l × w of such a bar [see Fig. 6] is defined by metallic gate electrodes parallel to the 2D electron gas plane; see the figure on the right. The negative voltage V_g, applied to the gates, squeezes the 2D gas from the area under them into the complementary, Hall-bar part of the plane.)

[Figure: metallic gates (V_g < 0) above the 2D electron gas plane.]

96 See, e.g., J. Taylor, Scattering Theory, Dover, 2006.


3.3. Analyze how the Landau levels (50) are modified if a 2D particle is confined in an additional 1D potential well U(x) = mω_0²x²/2.


3.4. Find the stationary states of a spinless, charged 3D particle moving in "crossed" (mutually perpendicular), uniform electric and magnetic fields, with ℰ ≪ cℬ. For such states, calculate the expectation values of the particle's velocity in the direction perpendicular to both fields, and compare the result with the solution of the corresponding classical problem.


Hint: You may like to generalize Landau’s solution for 2D particles, discussed in Sec. 2. 


3.5. Use the Born approximation to calculate the angular dependence and the total cross-section of scattering of an incident plane wave propagating along the x-axis, by the following pair of similar point inhomogeneities:

$$U(\mathbf{r}) = \mathcal{W}\left[\delta\left(\mathbf{r} - \mathbf{n}_z\frac{a}{2}\right) + \delta\left(\mathbf{r} + \mathbf{n}_z\frac{a}{2}\right)\right].$$

Analyze the results in detail. Derive the condition of the Born approximation's validity for such delta-functional scatterers.


3.6. Complete the analysis of the Born scattering by a uniform spherical potential (97), started in Sec. 3, by calculating its total cross-section. Analyze the result in the limits kR ≪ 1 and kR ≫ 1.


3.7. Use the Born approximation to calculate the differential cross-section of particle scattering by a very thin spherical shell, whose potential may be approximated as

$$U(r) = \mathcal{W}\,\delta(r - R).$$

Analyze the results in the limits kR ≪ 1 and kR ≫ 1, and compare them with those for a uniform sphere considered in Sec. 3.


3.8. Use the Born approximation to calculate the differential and total cross-sections of electron scattering by a screened Coulomb field of a point charge Ze, with the electrostatic potential

$$\phi(r) = \frac{Ze}{4\pi\varepsilon_0 r}\,e^{-\lambda r},$$

neglecting spin interaction effects, and analyze the result's dependence on the screening parameter λ. Compare the results with those given by the classical ("Rutherford") formula97 for the unscreened Coulomb potential (λ → 0), and formulate the condition of the Born approximation's validity in this limit.


3.9. A quantum particle with electric charge Q is scattered by a localized distributed charge with a spherically-symmetric density ρ(r), and zero total charge. Use the Born approximation to calculate the differential cross-section of the forward scattering (with the scattering angle θ = 0), and evaluate it for the scattering of electrons by a hydrogen atom in its ground state.

97 See, e.g., CM Sec. 3.5, in particular Eq. (3.73).


3.10. Reformulate the Born approximation for the 1D case. Use the result to find the scattering and transfer matrices of a "rectangular" (flat-top) scatterer

$$U(x) = \begin{cases} U_0, & \text{for } |x| < d/2,\\ 0, & \text{otherwise}.\end{cases}$$

Compare the results with those of the exact calculations carried out earlier in Chapter 2, and analyze how their relationship changes in the eikonal approximation.


3.11. In the tight-binding approximation, find the lowest stationary states of a particle placed into a system of three similar, weakly coupled potential wells located at the vertices of an equilateral triangle.


3.12. The figure on the right shows a fragment of a periodic 2D lattice, with the red and blue points showing the positions of different local potentials.

[Figure: fragment of the 2D lattice with red and blue points.]

(i) Find the reciprocal lattice and the 1st Brillouin zone of the system.
(ii) Calculate the wave number k of the monochromatic de Broglie wave incident along the x-axis, at which the lattice creates the lowest-order diffraction peak within the [x, y] plane, and the direction toward this peak.
(iii) Semi-quantitatively, describe the evolution of the intensity of the peak when the local potentials, represented by the different points, become similar.

Hint: The order of diffraction on a multidimensional Bravais lattice is a somewhat ambiguous notion, usually associated with the sum of magnitudes of all integers l_j in Eq. (109), for the vector Q that is equal to q ≡ k − k_i.


3.13. For the 2D hexagonal lattice (Fig. 12b):

(i) find the reciprocal lattice Q and the 1st Brillouin zone;

(ii) use the tight-binding approximation to calculate the dispersion relation E(q) for a 2D particle moving through a potential profile with such periodicity, with an energy close to the energy of the axially-symmetric states quasi-localized at the potential minima;

(iii) analyze and sketch (or plot) the resulting dispersion relation E(q) inside the 1st Brillouin zone.


3.14. Complete the tight-binding-approximation calculation of the band structure of the honeycomb lattice, started at the end of Sec. 4. Analyze the results; in particular, prove that the Dirac points q_D are located in the corners of the 1st Brillouin zone, and express the velocity v_n, participating in Eq. (122), in terms of the coupling energy δ_n. Show that the final results do not change if the quasi-localized wavefunctions are not axially-symmetric, but are proportional to exp{imφ}, as they are, with m = 1, for the 2p_z electrons of carbon atoms in graphene, which are responsible for its transport properties.


3.15. Examine basic properties of the so-called Wannier functions defined as

$$\phi_{\mathbf{R}}(\mathbf{r}) = \text{const}\times\int_{\text{BZ}}\psi_{\mathbf{q}}(\mathbf{r})\,e^{-i\mathbf{q}\cdot\mathbf{R}}\,d^3q,$$

where ψ_q(r) is the Bloch wavefunction (108), R is any vector of the Bravais lattice, and the integration over the quasimomentum q is extended over any (e.g., the first) Brillouin zone.


3.16. Evaluate the long-range interaction (the so-called London dispersion force) between two 
similar, electrically-neutral atoms or molecules, modeling each of them as an isotropic 3D harmonic 
oscillator with the electric dipole moment d = qs, where s is the oscillator’s displacement from its 
equilibrium position. 

Hint: Represent the total Hamiltonian of the system as a sum of Hamiltonians of independent 1D harmonic oscillators, and calculate their total ground-state energy as a function of the distance between the dipoles.98


3.17. Derive expressions for the stationary wavefunctions and the corresponding energies of a 2D particle of mass m, free to move inside a round disk of radius R. What is the degeneracy of each energy level? Calculate the five lowest energy levels with an accuracy better than 1%.


3.18. Calculate the ground-state energy of a 2D particle of mass m, localized in a very shallow flat-bottom potential well

$$U(\rho) = \begin{cases} -U_0, & \text{for } \rho < R,\\ 0, & \text{for } \rho > R,\end{cases} \qquad \text{with } 0 < U_0 \ll \frac{\hbar^2}{mR^2}.$$


3.19. Estimate the energy E of the localized ground state of a particle of mass m, in an axially-symmetric 2D potential well of a finite radius R, with an arbitrary but very small potential U(ρ). (Quantify this condition.)


3.20. Spell out the explicit form of the spherical harmonics Y,’(0,@) and Y,'(0,9). 


3.21. Calculate ⟨x⟩ and ⟨x²⟩ in the ground states of the planar and spherical rotators of radius R. What can you say about ⟨p_x⟩ and ⟨p_x²⟩?


3.22. A spherical rotator, with r ≡ (x² + y² + z²)^{1/2} = R = const, of mass m is in a state with the following wavefunction: ψ = const × (Y_0^0 + sin²θ). Calculate its energy.


3.23. According to the discussion at the beginning of Sec. 5, stationary wavefunctions of a 3D harmonic oscillator may be calculated as products of three 1D "Cartesian oscillators"; see, in particular, Eq. (125), with d = 3. However, according to the discussion in Sec. 6, the wavefunctions of the type (200), proportional to the spherical harmonics Y_l^m, also describe stationary states of this spherically-symmetric system. Represent the wavefunctions (200) of:

(i) the ground state of the oscillator, and
(ii) each of its lowest excited states,

as linear combinations of products of 1D oscillator's stationary wavefunctions. Also, calculate the degeneracy of the nth energy level of the oscillator.

98 This explanation of the interaction between electrically-neutral atoms was put forward in 1930 by F. London, on the background of a prior (1928) work by C. Wang. Note that in some texts this interaction is (rather inappropriately) referred to as the "van der Waals force", though it is only one, long-range component of the van der Waals model; see, e.g., SM Sec. 4.1.


3.24. Calculate the smallest depth U_0 of a spherical, flat-bottom potential well

$$U(r) = \begin{cases} -U_0, & \text{for } r < R,\\ 0, & \text{for } R < r,\end{cases}$$

at which it has a bound (localized) stationary state. Does such a state exist for a very narrow and deep well U(r) = −𝒲δ(r), with a positive and finite 𝒲?


3.25. A 3D particle of mass m is placed into a spherically-symmetric potential well with −∞ < U(r) < U(∞) = 0. Relate its ground-state energy to that of a 1D particle of the same mass, moving in the following potential well:

$$u(x) = \begin{cases} U(x), & \text{for } x \geq 0,\\ +\infty, & \text{for } x < 0.\end{cases}$$

In the light of the found relation, discuss the origin of the difference between the solutions of the previous problem and Problem 2.17.


3.26. Calculate the smallest value of the parameter U_0, for which the following spherically-symmetric potential well,

$$U(r) = -U_0\,e^{-r/R}, \qquad \text{with } U_0, R > 0,$$

has a bound (localized) eigenstate.

Hint: You may like to introduce the following new variables: f ≡ rℛ and ξ ≡ Ce^{−r/2R}, with an appropriate choice of the constant C.


3.27. A particle moving in a certain central potential U(r) has a stationary state with the following wavefunction:

$$\psi = C\,r^{\alpha}e^{-\beta r^2}\cos\varphi,$$

where C, α, and β > 0 are constants. Calculate:

(i) the probabilities of all possible values of the quantum numbers m and l, and

(ii) the confining potential and the state's energy.

3.28. Use the variational method to estimate the ground-state energy of a particle of mass m, 
moving in the following spherically-symmetric potential: 


U(r)= ar’. 
3.29. Use the variational method, with the trial wavefunction ψ_trial = A/(r + a)^b, where both a > 0 and b > 1 are fitting parameters, to estimate the ground-state energy of the hydrogen-like atom/ion with the nuclear charge +Ze. Compare the solution with the exact result.


3.30. Calculate the energy spectrum of a particle moving in a monotonic, but otherwise arbitrary, attractive spherically-symmetric potential U(r) < 0, in the approximation of very large orbital quantum numbers l. Formulate the quantitative condition(s) of validity of your theory. Check that for the Coulomb potential U(r) = −C/r, your result agrees with Eq. (201).

Hint: Try to solve Eq. (181) approximately, introducing the same new function, f(r) ≡ rℛ(r), that was already used in Sec. 1 and in the solutions of a few earlier problems.


3.31. An electron had been in the ground state of a hydrogen-like atom/ion with nuclear charge Ze, when the charge suddenly changed to (Z + 1)e.99 Calculate the probabilities for the electron of the changed system to be:

(i) in the ground state, and
(ii) in the lowest excited state.


3.32. Due to a very short pulse of an external force, the nucleus of a hydrogen-like atom/ion, initially at rest in its ground state, starts moving with velocity v. Calculate the probability W_g that the atom remains in its ground state. Evaluate the energy to be given, by the pulse, to a hydrogen atom in order to reduce W_g to 50%.


3.33. Calculate ⟨x²⟩ and ⟨p_x²⟩ in the ground state of a hydrogen-like atom/ion. Compare the results with Heisenberg's uncertainty relation. What do these results tell about the electron's velocity in the system?


3.34. Use the Hellmann-Feynman theorem (see Problem 1.5) to prove:

(i) the first of Eqs. (211), and
(ii) the fact that for a spinless particle in an arbitrary spherically-symmetric attractive potential U(r), the ground state is always an s-state (with the orbital quantum number l = 0).


3.35. For the ground state of a hydrogen atom, calculate the expectation values of ℰ and ℰ², where ℰ is the electric field created by the atom, at distances r ≫ r_0 from its nucleus. Interpret the resulting relation between ⟨ℰ⟩ and ⟨ℰ²⟩, at the same observation point.


3.36. Calculate the condition under which a particle of mass m, moving in the field of a very thin spherically-symmetric shell, with

$$U(r) = \mathcal{W}\,\delta(r - R),$$

and 𝒲 < 0, has at least one localized ("bound") stationary state.


99 Such a fast change happens, for example, at beta-decay, when one of the nucleus' neutrons spontaneously turns into a proton, emitting a high-energy electron and an antineutrino, which leave the system very fast (instantly on the atomic time scale), and do not directly affect the atom's transition dynamics.
3.37. Calculate the lifetime of the lowest metastable state of a particle in the same spherical-shell potential as in the previous problem, but now with 𝒲 > 0, for sufficiently large 𝒲. (Quantify this condition.)


3.38. A particle of mass m and energy E is incident on a very thin spherical shell of radius R, whose localized states were the subject of the two previous problems, with an arbitrary "weight" 𝒲.

(i) Derive general expressions for the differential and total cross-sections of scattering for this geometry.

(ii) Spell out the contribution σ_0 to the total cross-section σ, due to the spherically-symmetric component of the scattered de Broglie wave.

(iii) Analyze the result for σ_0 in the limits of very small and very large magnitudes of 𝒲, for both signs of this parameter. In particular, in the limit 𝒲 → +∞, relate the result to the metastable state's lifetime τ calculated in the previous problem.


3.39. Calculate the spherically-symmetric contribution σ_0 to the total cross-section of particle scattering by a uniform sphere of radius R, described by the following potential:

$$U(r) = \begin{cases} U_0, & \text{for } r < R,\\ 0, & \text{otherwise},\end{cases}$$

with an arbitrary U_0. Analyze the result in detail, and give an interpretation of its most remarkable features.


3.40. Use the finite difference method with the step h = a/2 to calculate as many energy levels as possible, for a particle confined to the interior of:

(i) a square with side a, and
(ii) a cube with side a,

with hard walls. For the square, repeat the calculations, using a finer step: h = a/3. Compare the results for different values of h with each other and with the exact formulas.

Hint: It is advisable to either first solve (or review the solution of) the similar 1D Problem 1.15, or start from reading about the finite difference method.100 Also: try to exploit the symmetry of the systems.


100 See, e.g., CM Sec. 8.5 or EM Sec. 2.11. 
Chapter 4. Bra-ket Formalism


The objective of this chapter is to describe Dirac’s “bra-ket” formalism of quantum mechanics, which 
not only overcomes some inconveniences of wave mechanics but also allows a natural description of 
such intrinsic properties of particles as their spin. In the course of the formalism’s discussion, I will give 
only a few simple examples of its application, leaving more involved cases for the following chapters. 


4.1. Motivation 


As the reader could see from the previous chapters of these notes, wave mechanics gives many 
results of primary importance. Moreover, it is mostly sufficient for many applications, for example, 
solid-state electronics and device physics. However, in the course of our survey, we have filed several 
grievances about this approach. Let me briefly summarize these complaints: 


(i) Attempts to analyze the temporal evolution of quantum systems, beyond the trivial time
behavior of the stationary states, described by Eq. (1.62), run into technical difficulties. For example, we 
could derive Eq. (2.151) describing the metastable state’s decay and Eq. (2.181) describing the quantum 
oscillations in coupled wells, only for the simplest potential profiles, though it is intuitively clear that 
such simple results should be common for all problems of this kind. Solving such problems for more 
complex potential profiles would entangle the time evolution analysis with the calculation of the spatial 
distribution of the evolving wavefunctions — which (as we could see in Secs. 2.9 and 3.6) may be rather 
complex even for simple time-independent potentials. Some separation of the spatial and temporal 
dependencies is possible using perturbation approaches (to be discussed in Chapter 6), but even those 
would lead, in the wavefunction language, to very cumbersome formulas. 


(ii) The last statement can also be made concerning other issues that are conceptually 
addressable within the wave mechanics, e.g., the Feynman path integral approach, coupling to the 
environment, etc. Pursuing them in the wave mechanics language would lead to formulas so bulky that I 
had postponed their discussion until we would have a more compact formalism on hand. 


(iii) In the discussion of several key problems (for example the harmonic oscillator and spherically-symmetric potentials), we have run into rather complicated eigenfunctions coexisting with very simple energy spectra, which imply some simple background physics. It is very important to get this physics revealed.


(iv) In the wave-mechanics postulates formulated in Sec. 1.2, the quantum-mechanical operators of the coordinate and momentum are treated rather unequally; see Eqs. (1.26b). However, some key expressions, e.g., for the fundamental eigenfunction of a free particle,

$$\exp\left\{\frac{i\,\mathbf{p}\cdot\mathbf{r}}{\hbar}\right\}, \qquad (4.1)$$

or the harmonic oscillator's Hamiltonian,

$$\hat H = \frac{1}{2m}\,\hat{\mathbf{p}}^2 + \frac{m\omega_0^2}{2}\,\hat{\mathbf{r}}^2, \qquad (4.2)$$

just beg for a similar treatment of coordinates and momenta.


© K. Likharev 


Essential Graduate Physics QM: Quantum Mechanics 


However, the strongest motivation for a more general formalism comes from wave mechanics’ 
conceptual inability to describe elementary particles' spins1 and other internal quantum degrees of
freedom, such as quark flavors or lepton numbers. In this context, let us review the basic facts on spin 
(which is very representative and experimentally the most accessible of all internal quantum numbers), 
to understand what a more general formalism has to explain — as a minimum. 


Figure 1 shows the conceptual scheme of the simplest spin-revealing experiment, first conceived by Otto Stern in 1921 and implemented by Walther Gerlach in 1922. A collimated beam of electrons from a natural source, such as a heated cathode, is passed through a gap between the poles of a strong magnet, whose magnetic field ℬ (in Fig. 1, directed along the z-axis) is nonuniform, so that both ℬ_z and dℬ_z/dz are not equal to zero. The experiment shows that the beam splits into two beams of equal intensity.


[Figure: collimator, magnet, and particle detectors; each of the two output beams carries W = 50% of the particles.]

Fig. 4.1. The simplest Stern-Gerlach experiment.


This result may be semi-quantitatively explained on classical, if somewhat phenomenological, grounds, by assuming that each electron has an intrinsic, permanent magnetic dipole moment m. Indeed, classical electrodynamics2 tells us that the potential energy U of a magnetic dipole in an external magnetic field ℬ is equal to (−m·ℬ), so that the force acting on the electron,

$$\mathbf{F} = -\nabla U = -\nabla\left(-\mathbf{m}\cdot\boldsymbol{\mathcal{B}}\right), \qquad (4.3)$$

has a non-zero vertical component

$$F_z = -\frac{\partial}{\partial z}\left(-m_z\mathcal{B}_z\right) = m_z\,\frac{\partial\mathcal{B}_z}{\partial z}. \qquad (4.4)$$


Hence if we further assume that the electron's magnetic moment may take only two equally probable discrete values, m_z = ±μ (though such discreteness does not follow from any classical model of the particle), this may explain the original Stern-Gerlach effect qualitatively. The quantitative explanation of the beam splitting angle requires the magnitude of μ to be equal (or very close) to the so-called Bohr magneton

$$\mu_B \equiv \frac{e\hbar}{2m_e} \approx 0.9274\times10^{-23}\ \frac{\text{J}}{\text{T}}. \qquad (4.5)$$

! To the best of my knowledge, the concept of spin as a measure of the internal rotation of a particle was first 
suggested by Ralph Kronig, then a 20-year-old student, in January 1925, a few months before two other students, 
G. Uhlenbeck and S. Goudsmit — to whom the idea is usually attributed. The concept was then accepted (rather 
reluctantly) and developed quantitatively by Wolfgang Pauli. 

2 See, e.g., EM Sec. 5.4, in particular Eq. (5.100). 

3 A good mnemonic rule is that it is close to 1 K/T. In the Gaussian units, μ_B = eħ/2m_ec ≈ 0.9274×10^{−20} erg/G.
However, as we will see below, this value cannot be explained by any internal motion of the
electron, say its rotation about the z-axis. More importantly, this semi-classical phenomenology cannot 
explain, even qualitatively, other experimental results, for example those of the set of multi-stage Stern- 
Gerlach experiments shown in Fig. 2. In the first of the experiments, the electron beam is first passed 
through a magnetic field (and its gradient) oriented along the z-axis, just as in Fig. 1. Then one of the 
two resulting beams is absorbed (or removed from the setup in some other way), while the other one is 
passed through a similar but x-oriented field. The experiment shows that this beam is split again into two 
components of equal intensity. A classical explanation of this experiment would require an even more 
unnatural additional assumption that the initial electrons had random but discrete components of the 
magnetic moment simultaneously in two directions, z and x. 


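[Figure: three multi-stage Stern-Gerlach setups, with absorbers removing one of the beams between the stages.]

Fig. 4.2. Three multi-stage Stern-Gerlach experiments. The boxes SG (...) denote magnets similar to the one shown in Fig. 1, with the field oriented in the indicated direction.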


However, even this assumption cannot explain the results of the three-stage Stern-Gerlach 
experiment shown on the middle panel of Fig. 2. Here, the previous two-stage setup is complemented
with one more absorber and one more magnet, now with the z-orientation again. Completely counter- 
intuitively, it again gives two beams of equal intensity, as if we have not yet filtered out the electrons 
with m, corresponding to the lower beam, at the first z-stage. The only way to save the classical 
explanation here is to say that maybe, electrons somehow interact with the magnetic field so that the x- 
polarized (non-absorbed) beam becomes spontaneously depolarized again somewhere between the two 
last stages. But any hope for such an explanation is ruined by the control experiment shown on the 
bottom panel of Fig. 2, whose results indicate that no such depolarization happens. 
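The statistics of the middle panel of Fig. 2 can be mimicked with two-component state vectors, anticipating the description that the bra-ket formalism will eventually supply. The sketch below is an assumption at this point of the text: it takes the spin states along z as the basis vectors (1, 0) and (0, 1), takes the x-states as their normalized sum and difference, and lets each Stern-Gerlach stage send a fraction |⟨b|ψ⟩|² of the incoming beam ψ into the output beam associated with the state b. All names are hypothetical.

```python
import numpy as np

# Assumed two-component representation of the electron's spin states
# (anticipating material developed later; not derived at this point).
up_z = np.array([1.0, 0.0], dtype=complex)
down_z = np.array([0.0, 1.0], dtype=complex)
up_x = (up_z + down_z) / np.sqrt(2)
down_x = (up_z - down_z) / np.sqrt(2)


def stage(state, plus, minus):
    """Fractions of a beam in `state` sent to the two outputs of an SG magnet
    whose output beams correspond to the states `plus` and `minus`."""
    # np.vdot conjugates its first argument, so this is |<b|state>|^2
    return [float(abs(np.vdot(b, state)) ** 2) for b in (plus, minus)]


# Middle panel: the lower z-beam is absorbed, so the surviving beam is up_z
p_x = stage(up_z, up_x, down_x)     # second (x) stage
# The lower x-beam is absorbed, so the surviving beam is up_x
p_z = stage(up_x, up_z, down_z)     # third (z) stage: the z-filtering is "forgotten"
# A second x-stage placed right after the first one does not split the beam
p_xx = stage(up_x, up_x, down_x)

for name, p in (("x after z", p_x), ("z after x", p_z), ("x after x", p_xx)):
    print(f"{name}: {p[0]:.3f} / {p[1]:.3f}")
```

The first two stages give equal splitting, as observed, while the repeated x-stage sends the whole beam into one output, one simple control showing that an x-filtered beam is not spontaneously depolarized.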


We will see below that all these (and many more) results find a natural explanation in the so-called matrix mechanics pioneered by Werner Heisenberg, Max Born, and Pascual Jordan in 1925. However, the matrix formalism is rather inconvenient for the solution of most problems discussed in Chapters 1-3, and for a short time, it was eclipsed by E. Schrödinger's wave mechanics, which had been put forward just a few months later. Very soon, however, Paul Adrien Maurice Dirac introduced a more general bra-ket formalism of quantum mechanics, which provides a generalization of both approaches and proves their equivalence. Let me describe it, begging for the reader's patience, because (in contrast with my usual style) I will not be able to give particular examples of its application for a while, until all the basic notions of the formalism have been introduced.
4.2. States, state vectors, and linear operators


The basic notion of the general formulation of quantum mechanics is the quantum state of a system.4 To get some gut feeling for this notion: if a quantum state α of a particle may be adequately described by wave mechanics, this description is given by the corresponding wavefunction Ψ_α(r, t). Note, however, that a quantum state as such is not a mathematical object,5 and can participate in mathematical formulas only as a "label", e.g., the index of the wavefunction Ψ_α. On the other hand, such a wavefunction is not a state, but a mathematical object (a complex function of space and time) giving a quantitative description of the state, just as the classical radius vector r_α and velocity v_α, as real functions of time, are mathematical objects describing the motion of the particle in its classical description; see Fig. 3. Similarly, in the Dirac formalism, a certain quantum state α is described by either of two mathematical objects, called the state vectors: the ket-vector |α⟩ and the bra-vector ⟨α|,6 whose relationship is close to that between the wavefunction Ψ_α and its complex conjugate Ψ_α*.


[Figure: a system in state α and its mathematical descriptions:
classical mechanics: r_α(t), v_α(t), etc.;
wave mechanics: either Ψ_α(r, t) or Ψ_α*(r, t);
bra-ket formalism: either |α⟩ or ⟨α|.]

Fig. 4.3. Physical state of a system and its descriptions.


One should be cautious with the term "vector" here. The usual geometric vectors, such as r and v, are defined in the usual geometric (say, Euclidean) space. In contrast, the bra- and ket-vectors are defined in a more abstract Hilbert space: the full set of possible bra- and ket-vectors of a given system.7 So, despite certain similarities with the geometric vectors, the bra- and ket-vectors are different mathematical objects, and we need to define the rules of their handling. The primary rules are essentially postulates and are justified only by the correct description of all experimental observations of the rule corollaries. While there is a general consensus among physicists on what the corollaries are, there are many possible ways to carve the basic postulate sets out of them. Just as in Sec. 1.2, I will not try too hard to reduce the number of postulates to the smallest possible minimum, trying instead to keep their physical meaning transparent.


(i) Ket-vectors. Let us start with ket-vectors, sometimes called just kets for short. Their most important property is the linear superposition. Namely, if several ket-vectors |α_j⟩ describe possible states of a quantum system, numbered by the index j, then any linear combination (superposition)

$$|\alpha\rangle = \sum_j c_j\,|\alpha_j\rangle, \qquad (4.6)$$


4 An attentive reader could notice my smuggling in the term "system" instead of "particle", which was used in the previous chapters. Indeed, the bra-ket formalism allows the description of quantum systems much more complex than a single spinless particle that is a typical (though not the only possible) subject of wave mechanics.

5 As was expressed nicely by Asher Peres, one of the pioneers of the quantum information theory, "quantum phenomena do not occur in the Hilbert space, they occur in a laboratory".

6 The terms bra and ket were suggested to reflect the fact that the pair ⟨β| and |α⟩ may be considered as the parts of combinations like ⟨β|α⟩ (see below), which resemble expressions in the usual angle brackets.

7 I have to confess that this is a somewhat loose definition; it will be refined soon.
where c_j are any (possibly complex) c-numbers, also describes a possible state of the same system.8 Actually, since ket-vectors are new mathematical objects, the exact meaning of the right-hand side of Eq. (6) becomes clear only after we have postulated the following rules of summation of these vectors,

$$|\alpha_j\rangle + |\alpha_{j'}\rangle = |\alpha_{j'}\rangle + |\alpha_j\rangle, \qquad (4.7)$$

and their multiplication by an arbitrary c-number:

$$c\,|\alpha_j\rangle = |\alpha_j\rangle\,c. \qquad (4.8)$$


Note that in the set of wave mechanics postulates, the statements parallel to Eqs. (7) and (8) were 
unnecessary, because the wavefunctions are the usual (albeit complex) functions of space and time, and 
we know from the usual algebra that such relations are indeed valid. 


As evident from Eq. (6), the complex coefficient c_j may be interpreted as the "weight" of the state α_j in the linear superposition α. One important particular case is c_j = 0, showing that the state α_j does not participate in the superposition α. The corresponding term of the sum (6), i.e. the product

$$0\,|\alpha_j\rangle, \qquad (4.9)$$

has a special name: the null-state vector. (It is important to avoid confusion between the null-state corresponding to vector (9), and the ground state of the system, which is frequently denoted by the ket-vector |0⟩. In some sense, the null-state does not exist at all, while the ground state not only does exist but frequently is the most important quantum state of the system.)


(ii) Bra-vectors and inner products. Bra-vectors ⟨α|, which obey rules similar to Eqs. (7) and (8), are not new, independent objects: a ket-vector |α⟩ and the corresponding bra-vector ⟨α| describe the same state. In other words, there is a unique dual correspondence between |α⟩ and ⟨α|,9 very similar (though not identical) to that between a wavefunction Ψ and its complex conjugate Ψ*. The correspondence between these vectors is described by the following rule: if a ket-vector of a linear superposition is described by Eq. (6), then the corresponding bra-vector is

$$\langle\alpha| = \sum_j c_j^*\,\langle\alpha_j| = \sum_j \langle\alpha_j|\,c_j^*. \qquad (4.10)$$


The mathematical convenience of using two types of vectors rather than just one becomes clear from the notion of their inner product (due to its second, shorthand form, also called the short bracket):

$$\left(\langle\beta|\right)\left(|\alpha\rangle\right) \equiv \langle\beta|\alpha\rangle, \qquad (4.11)$$

which is a scalar c-number, in a certain but limited analogy with the scalar product of the usual geometric vectors. (For one difference, the product (11) may be a complex number.)


The main property of the inner product is its linearity with respect to any of its component vectors. For example, if a linear superposition α is described by the ket-vector (6), then


8 One may express the same statement by saying that the vector |a) belongs to the same Hilbert space as all |). 
9 Mathematicians like to say that the ket- and bra-vectors of the same quantum system are defined in two 
isomorphic Hilbert spaces. 
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(B|a)= Sie, {Bla,), (4.12) 
while if Eq. (10) is true, then 
(|B) = Src; (a, |B). (4.13) 


In plain English, c-number factors may be moved either into or out of the inner products. 


The second key property of the inner product is 

$$\langle\beta|\alpha\rangle = \langle\alpha|\beta\rangle^*. \tag{4.14}$$

It is compatible with Eq. (10); indeed, the complex conjugation of both parts of Eq. (12) gives: 

$$\langle\beta|\alpha\rangle^* = \sum_j c_j^*\,\langle\beta|\alpha_j\rangle^* = \sum_j c_j^*\,\langle\alpha_j|\beta\rangle = \langle\alpha|\beta\rangle. \tag{4.15}$$

Finally, one more rule: the inner product of the bra- and ket-vectors describing the same state 
(called the norm squared) is real and non-negative, 

$$\|\alpha\|^2 \equiv \langle\alpha|\alpha\rangle \geq 0. \tag{4.16}$$


To give the reader some feeling for the meaning of this rule: we will see below that if some 
state $\alpha$ may be described by the wavefunction $\Psi_\alpha(\mathbf{r}, t)$, then 

$$\langle\alpha|\alpha\rangle = \int \big|\Psi_\alpha\big|^2\, d^3r \geq 0. \tag{4.17}$$

Hence the role of the bra- and ket-vectors of the same state is very similar to that of complex-conjugate 
pairs of its wavefunctions.
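As an informal numerical check of rules (12)-(16), one may represent kets by lists of complex expansion coefficients in some orthonormal basis (anticipating the matrix language of Sec. 4.3), with the bra given by the complex-conjugated coefficients. This is a minimal sketch under that assumption: the helper name `inner` and all numbers are illustrative choices, not part of the notes.

```python
# Kets are represented by lists of complex expansion coefficients in an assumed
# orthonormal basis; the matching bra uses the complex-conjugated coefficients.
def inner(beta, alpha):
    """Short bracket <beta|alpha> = sum_j beta_j^* alpha_j."""
    return sum(b.conjugate() * a for b, a in zip(beta, alpha))


alpha_1 = [1 + 0j, 2j, 0j]
alpha_2 = [0.5j, 1 + 0j, -1 + 1j]
beta = [1 - 1j, 0.5 + 0j, 3j]
c = [2 - 1j, 0.5j]  # arbitrary superposition weights c_1, c_2

# |alpha> = c_1|alpha_1> + c_2|alpha_2>, a two-term sum of the type (6)
alpha = [c[0] * x + c[1] * y for x, y in zip(alpha_1, alpha_2)]

# Eq. (12): c-numbers come out of the ket side unchanged
err_12 = abs(inner(beta, alpha) - (c[0] * inner(beta, alpha_1) + c[1] * inner(beta, alpha_2)))

# Eq. (13): on the bra side they come out complex-conjugated
err_13 = abs(inner(alpha, beta)
             - (c[0].conjugate() * inner(alpha_1, beta) + c[1].conjugate() * inner(alpha_2, beta)))

# Eq. (14): swapping the bra and the ket gives the complex conjugate
err_14 = abs(inner(beta, alpha) - inner(alpha, beta).conjugate())

# Eq. (16): the norm squared is real and non-negative
norm2 = inner(alpha, alpha)

print(err_12 < 1e-12, err_13 < 1e-12, err_14 < 1e-12)
print(abs(norm2.imag) < 1e-12 and norm2.real >= 0)
```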


(iii) Operators. One more key notion of the Dirac formalism is quantum-mechanical linear 
operators. Just as for the operators discussed in wave mechanics, the function of an operator is to 
“generate” one state from another: if $|\alpha\rangle$ is a possible ket of the system, and $\hat A$ is a legitimate¹⁰ 
operator, then the following combination, 

$$\hat A|\alpha\rangle, \tag{4.18}$$

is also a ket-vector describing a possible state of the system, i.e. a ket-vector in the same Hilbert space 
as the initial vector $|\alpha\rangle$. An alternative formulation of the same rule is the following clarification of the 
notion of the Hilbert space: for the given set of linear operators of a system, its Hilbert space includes all 
vectors that may be obtained from each other using the operations of the type (18). In this context, let 
me note that the operator set, and hence the Hilbert space of a system, usually (if not always) implies a 
certain approximate model of it. For example, if the coupling of orbital degrees of freedom of a particle to its 
spin may be ignored (as it may be for a non-relativistic particle in the absence of an external magnetic 
field), we may describe the dynamics of the particle using spin operators only. In this case, the set of all 
possible spin vectors of the particle forms a Hilbert space separate from that of the orbital-state vectors 
of that particle. 

¹⁰ Here the term “legitimate” means “having a clear sense in the bra-ket formalism”. Some examples of 
“illegitimate” expressions are: $|\alpha\rangle\hat A$, $\hat A\langle\alpha|$, $|\alpha\rangle|\beta\rangle$, and $\langle\alpha|\langle\beta|$. Note, however, that the last two expressions may be 
legitimate if $\alpha$ and $\beta$ are states of different systems, i.e. if their state vectors belong to different Hilbert spaces. 
We will run into such direct products of the bra- and ket-vectors (sometimes denoted, respectively, as $|\alpha\rangle\otimes|\beta\rangle$ and 
$\langle\alpha|\otimes\langle\beta|$) in Chapters 6-10.


As the adjective “linear” in the operator definition implies, the main rule governing the 
operators is their linearity with respect to any superposition of vectors: 

$$\hat A\Big(\sum_j c_j|\alpha_j\rangle\Big) = \sum_j c_j\,\hat A|\alpha_j\rangle, \tag{4.19}$$

and to any superposition of operators: 

$$\Big(\sum_j c_j\hat A_j\Big)|\alpha\rangle = \sum_j c_j\,\hat A_j|\alpha\rangle. \tag{4.20}$$

These rules are evidently similar to Eqs. (1.53)-(1.54) of wave mechanics.


The above rules imply that an operator “acts” on the ket-vector on its right; however, a 
combination of the type $\langle\alpha|\hat A$ is also legitimate and represents a new bra-vector. It is important that, 
generally, this vector does not represent the same state as the ket-vector (18); instead, the bra-vector 
isomorphic to the ket-vector (18) is 

$$\langle\alpha|\hat A^\dagger. \tag{4.21}$$

This statement serves as the definition of the Hermitian conjugate (also called “Hermitian 
adjoint”) $\hat A^\dagger$ of the initial operator $\hat A$. For an important class of operators, called the Hermitian 
operators, the conjugation is inconsequential, i.e. for them 

$$\hat A^\dagger = \hat A. \tag{4.22}$$

(This equality, as well as any other operator equation below, means that these operators act similarly on 
any bra- or ket-vector of the given Hilbert space.)¹¹


To proceed further, we need one more postulate, sometimes called the associative 
axiom of multiplication: just as an ordinary product of scalars, any legitimate bra-ket expression, not 
including explicit summations, does not change from an insertion or removal of a pair of parentheses, 
meaning as usual that the operation inside them has to be performed first. The first two examples of this 
postulate are given by Eqs. (19) and (20), but the associative axiom is more general and means, for 
example, that 

$$\langle\beta|\big(\hat A|\alpha\rangle\big) = \big(\langle\beta|\hat A\big)|\alpha\rangle \equiv \langle\beta|\hat A|\alpha\rangle. \tag{4.23}$$

This last equality serves as the definition of the last form, called the long bracket (evidently, also a 
scalar), with an operator sandwiched between a bra-vector and a ket-vector. This definition, when 
combined with the definition of the Hermitian conjugate and Eq. (14), yields an important corollary: 

$$\langle\beta|\hat A|\alpha\rangle = \langle\beta|\big(\hat A|\alpha\rangle\big) = \Big[\big(\langle\alpha|\hat A^\dagger\big)|\beta\rangle\Big]^* = \langle\alpha|\hat A^\dagger|\beta\rangle^*, \tag{4.24}$$

which is most frequently rewritten as 

$$\langle\alpha|\hat A^\dagger|\beta\rangle = \langle\beta|\hat A|\alpha\rangle^*. \tag{4.25}$$

¹¹ If we consider c-numbers as a particular type of operators (which is legitimate for any Hilbert space), then 
according to Eqs. (11) and (21), for them the Hermitian conjugation is equivalent to the simple complex 
conjugation, so that only real c-numbers may be considered as a particular type of the Hermitian operators (22).
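Rule (25) can be spot-checked numerically if one assumes a matrix representation in an orthonormal basis, in which the Hermitian conjugate of an operator is represented by its complex-conjugated, transposed matrix (the relation the notes establish as Eq. (65)). The sketch below makes that assumption; the helpers `dagger`, `apply`, `inner` and the sample numbers are hypothetical illustrations.

```python
def dagger(m):
    """Matrix of the Hermitian conjugate: transpose, then complex-conjugate."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]


def apply(m, ket):
    """Operator matrix acting on a column of expansion coefficients."""
    return [sum(m[i][k] * ket[k] for k in range(len(ket))) for i in range(len(m))]


def inner(beta, alpha):
    """Short bracket <beta|alpha> in the same basis."""
    return sum(b.conjugate() * a for b, a in zip(beta, alpha))


A = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]  # arbitrary, deliberately non-Hermitian
alpha = [1 + 1j, 2 - 0.5j]
beta = [-1j, 0.3 + 0j]

long_bracket = inner(beta, apply(A, alpha))      # <beta|A|alpha>
swapped = inner(alpha, apply(dagger(A), beta))   # <alpha|A^dagger|beta>

# Eq. (25): the two long brackets are complex conjugates of each other
print(abs(swapped - long_bracket.conjugate()) < 1e-12)
```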
The associative axiom also enables us to comprehend the following definition of one more, outer 
product of bra- and ket-vectors: 

$$\big(|\beta\rangle\big)\big(\langle\alpha|\big) \equiv |\beta\rangle\langle\alpha|. \tag{4.26}$$

In contrast to the inner product (11), which is a scalar, this mathematical construct is an operator. 
Indeed, the associative axiom allows us to remove parentheses in the following expression: 

$$\big(|\beta\rangle\langle\alpha|\big)|\gamma\rangle = |\beta\rangle\langle\alpha|\gamma\rangle. \tag{4.27}$$

But the last short bracket is just a scalar; hence the mathematical object (26), acting on a ket-vector (in 
this case, $|\gamma\rangle$), gives a new ket-vector, which is the essence of the operator’s action. Very similarly, 

$$\langle\delta|\big(|\beta\rangle\langle\alpha|\big) = \langle\delta|\beta\rangle\langle\alpha|, \tag{4.28}$$

which is again a typical operator’s action on a bra-vector. So, Eq. (26) defines an operator.
Now let us perform the following calculation. We may use the parentheses’ insertion into the 
bra-ket equality following from Eq. (14), 

$$\langle\gamma|\alpha\rangle\langle\beta|\delta\rangle = \big(\langle\delta|\beta\rangle\langle\alpha|\gamma\rangle\big)^*, \tag{4.29}$$

to transform it to the following form: 

$$\langle\gamma|\big(|\alpha\rangle\langle\beta|\big)|\delta\rangle = \Big[\langle\delta|\big(|\beta\rangle\langle\alpha|\big)|\gamma\rangle\Big]^*. \tag{4.30}$$

Since this equality should be valid for any state vectors $\gamma$ and $\delta$, its comparison with Eq. (25) gives 
the following operator equality: 

$$\big(|\alpha\rangle\langle\beta|\big)^\dagger = |\beta\rangle\langle\alpha|. \tag{4.31}$$

This is the conjugation rule for outer products; it resembles the rule (14) for inner products but involves the 
Hermitian (rather than the usual complex) conjugation.
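Assuming again a representation by complex coefficients in an orthonormal basis, the outer product $|\alpha\rangle\langle\beta|$ becomes the square matrix with elements $\alpha_j\beta_{j'}^*$, and both Eq. (27) and Eq. (31) can be verified directly. The following is an illustrative sketch only; `outer`, `dagger`, `apply`, `inner` and the vectors are hypothetical.

```python
def outer(ket, bra_source):
    """Matrix of |ket><bra_source|: element (j, j') = ket_j * conj(bra_source_j')."""
    return [[k * b.conjugate() for b in bra_source] for k in ket]


def dagger(m):
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]


def apply(m, ket):
    return [sum(m[i][k] * ket[k] for k in range(len(ket))) for i in range(len(m))]


def inner(beta, alpha):
    return sum(b.conjugate() * a for b, a in zip(beta, alpha))


alpha = [1 + 1j, -2j]
beta = [0.5 + 0j, 1 - 1j]
gamma = [3 + 0j, 1j]

# Eq. (31): (|alpha><beta|)^dagger = |beta><alpha|
lhs, rhs = dagger(outer(alpha, beta)), outer(beta, alpha)
conj_ok = all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))

# Eq. (27): (|beta><alpha|)|gamma> = |beta> times the c-number <alpha|gamma>
acted = apply(outer(beta, alpha), gamma)
scaled = [b * inner(alpha, gamma) for b in beta]
action_ok = all(abs(x - y) < 1e-12 for x, y in zip(acted, scaled))

print(conj_ok, action_ok)
```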


The associative axiom is also valid for the operator “multiplication”: 

$$\big(\hat A\hat B\big)|\alpha\rangle = \hat A\big(\hat B|\alpha\rangle\big), \qquad \langle\alpha|\big(\hat A\hat B\big) = \big(\langle\alpha|\hat A\big)\hat B, \tag{4.32}$$

showing that the action of an operator product on a state vector is nothing more than the sequential 
action of its operands. However, we have to be careful with the operator products; generally, they do not 
commute: $\hat A\hat B \neq \hat B\hat A$. This is why the commutator, the operator defined as 

$$\big[\hat A, \hat B\big] \equiv \hat A\hat B - \hat B\hat A, \tag{4.33}$$

is a non-trivial and very useful notion. Another similar notion is the anticommutator:¹² 

$$\big\{\hat A, \hat B\big\} \equiv \hat A\hat B + \hat B\hat A. \tag{4.34}$$
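To make definitions (33) and (34) concrete, here is a small sketch computing both for two sample 2×2 matrices (they happen to be the matrices $\sigma_x$ and $\sigma_z$ of Eq. (105), chosen here only for illustration). The helper names are hypothetical, and the matrices stand for operators in some fixed orthonormal basis.

```python
def matmul(a, b):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def commutator(a, b):
    """[A, B] = AB - BA, Eq. (33)."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(len(ab[0]))] for i in range(len(ab))]


def anticommutator(a, b):
    """{A, B} = AB + BA, Eq. (34)."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(len(ab[0]))] for i in range(len(ab))]


A = [[0, 1], [1, 0]]
B = [[1, 0], [0, -1]]
print(commutator(A, B))      # prints [[0, -2], [2, 0]]: A and B do not commute
print(anticommutator(A, B))  # prints [[0, 0], [0, 0]]
```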


Finally, the bra-ket formalism broadly uses two special operators. The null-operator $\hat 0$ is 
defined by the following relations: 

$$\hat 0|\alpha\rangle \equiv 0|\alpha\rangle, \qquad \langle\alpha|\hat 0 \equiv \langle\alpha|0, \tag{4.35}$$

where $\alpha$ is an arbitrary state; we may say that the null-operator “kills” any state, turning it into the null-state. 
Another useful notion is the identity operator, which is defined by the following action (or rather 
“inaction” :-) on an arbitrary state vector: 

$$\hat I|\alpha\rangle \equiv |\alpha\rangle, \qquad \langle\alpha|\hat I \equiv \langle\alpha|. \tag{4.36}$$

These definitions show that the null-operator and the identity operator are Hermitian.


4.3. State basis and matrix representation 


While some operations in quantum mechanics may be carried out in the general bra-ket 
formalism outlined above, many calculations are performed for quantum systems that feature a full and 
orthonormal set $\{u\} \equiv \{u_1, u_2, \ldots, u_j, \ldots\}$ of their states $u_j$, frequently called a basis. The first of these 
terms means that any possible state vector of the system (i.e. of its Hilbert space) may be represented as 
a unique sum of the type (6) or (10) over its basis vectors: 

$$|\alpha\rangle = \sum_j \alpha_j|u_j\rangle, \qquad \langle\alpha| = \sum_j \alpha_j^*\langle u_j|, \tag{4.37}$$

while the second term means that 

$$\langle u_j|u_{j'}\rangle = \delta_{jj'}. \tag{4.38}$$

For the systems that may be described by wave mechanics, examples of the full orthonormal bases are 
represented by any full and orthonormal set of eigenfunctions calculated in the previous three chapters 
of this course; for the simplest example, see Eq. (1.87).


Due to the uniqueness of the expansion (37), the full set of the coefficients $\alpha_j$ involved in the 
expansion of a state $\alpha$ in a certain basis $\{u\}$ gives its complete description, just as the Cartesian 
components $A_x$, $A_y$, and $A_z$ of a usual geometric 3D vector $\mathbf{A}$ in a certain reference frame give its complete 
description. Still, let me emphasize some differences between such representations of the quantum-mechanical 
state vectors and 3D geometric vectors: 

(i) a quantum state basis may have a large or even infinite number of states $u_j$, and 

(ii) the expansion coefficients $\alpha_j$ may be complex.


¹² Another popular notation for the anticommutator (34) is $[\hat A, \hat B]_+$; it will not be used in these notes.
With these reservations in mind, the analogy with geometric vectors may be pushed further. 
Let us inner-multiply both parts of the first of Eqs. (37) by a bra-vector $\langle u_{j'}|$ and then transform the 
resulting relation using the linearity rules discussed in the previous section, and Eq. (38): 

$$\langle u_{j'}|\alpha\rangle = \langle u_{j'}|\sum_j \alpha_j|u_j\rangle = \sum_j \alpha_j\langle u_{j'}|u_j\rangle = \alpha_{j'}. \tag{4.39}$$


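Together with Eq. (14), this means that any of the expansion coefficients in Eq. (37) may be represented 
as an inner product: 

$$\alpha_j = \langle u_j|\alpha\rangle, \qquad \alpha_j^* = \langle\alpha|u_j\rangle; \tag{4.40}$$

these important relations are analogs of the equalities $A_j = \mathbf{n}_j\cdot\mathbf{A}$ of the usual vector algebra, and 
will be used on numerous occasions in this course. With them, the expansions (37) may be rewritten as 

$$|\alpha\rangle = \sum_j |u_j\rangle\langle u_j|\alpha\rangle \equiv \sum_j \hat\Lambda_j|\alpha\rangle, \qquad \langle\alpha| = \sum_j \langle\alpha|u_j\rangle\langle u_j| \equiv \sum_j \langle\alpha|\hat\Lambda_j, \tag{4.41}$$

where 

$$\hat\Lambda_j \equiv |u_j\rangle\langle u_j|. \tag{4.42}$$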


Eqs. (41) show that $\hat\Lambda_j$ so defined is a legitimate linear operator. This operator, acting on any state 
vector of the type (37), singles out just one of its components, for example, 

$$\hat\Lambda_j|\alpha\rangle = |u_j\rangle\langle u_j|\alpha\rangle = \alpha_j|u_j\rangle, \tag{4.43}$$

i.e. “kills” all components of the linear superposition but one. In the geometric analogy, such an operator 
“projects” the state vector on the $j$th “direction”, hence its name: the projection operator. Probably, the 
most important property of the projection operators, called the closure (or “completeness”) relation, 
immediately follows from Eq. (41): their sum over the full basis is equivalent to the identity operator: 

$$\sum_j \hat\Lambda_j \equiv \sum_j |u_j\rangle\langle u_j| = \hat I. \tag{4.44}$$

This means in particular that we may insert the left-hand side of Eq. (44), for any basis, into any bra-ket 
relation, at any place, a trick that we will use again and again.
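A small numerical sketch of Eqs. (40) and (42)-(44), assuming a hypothetical orthonormal basis $\{u\}$ of a 3D Hilbert space, written in terms of some fixed reference basis (the particular vectors, with complex components, are arbitrary choices for illustration). The helper names are hypothetical.

```python
import math


def outer(ket, bra_source):
    """Matrix of |ket><bra_source| in the reference basis."""
    return [[k * b.conjugate() for b in bra_source] for k in ket]


def apply(m, ket):
    return [sum(m[i][k] * ket[k] for k in range(len(ket))) for i in range(len(m))]


def inner(beta, alpha):
    return sum(b.conjugate() * a for b, a in zip(beta, alpha))


s = 1 / math.sqrt(2)
# Illustrative orthonormal basis {u}: <u_j|u_j'> = delta_jj', Eq. (38)
u = [[s, 1j * s, 0j], [s, -1j * s, 0j], [0j, 0j, 1 + 0j]]

alpha = [1 + 2j, -0.5 + 0j, 3j]
coeffs = [inner(uj, alpha) for uj in u]     # alpha_j = <u_j|alpha>, Eq. (40)
projectors = [outer(uj, uj) for uj in u]    # Lambda_j = |u_j><u_j|, Eq. (42)

# Eq. (43): each projector keeps only the component alpha_j |u_j>
proj_ok = all(
    abs(x - coeffs[j] * y) < 1e-12
    for j in range(3)
    for x, y in zip(apply(projectors[j], alpha), u[j])
)

# Eq. (44): the sum of all projectors over the full basis is the identity matrix
closure = [[sum(p[i][k] for p in projectors) for k in range(3)] for i in range(3)]
closure_ok = all(abs(closure[i][k] - (1 if i == k else 0)) < 1e-12
                 for i in range(3) for k in range(3))

print(proj_ok, closure_ok)
```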


Now let us see how the expansions (37) transform the key notions introduced in the last section, 
starting from the short bracket (11), i.e. the inner product of two state vectors: 

$$\langle\beta|\alpha\rangle = \sum_{j,j'}\beta_j^*\alpha_{j'}\langle u_j|u_{j'}\rangle = \sum_{j,j'}\beta_j^*\alpha_{j'}\delta_{jj'} = \sum_j \beta_j^*\alpha_j. \tag{4.45}$$

Apart from the complex conjugation, this expression is similar to the scalar product of the usual, geometric 
vectors. Now, let us explore the long bracket (23):


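$$\langle\beta|\hat A|\alpha\rangle = \sum_{j,j'}\beta_j^*\langle u_j|\hat A|u_{j'}\rangle\alpha_{j'} = \sum_{j,j'}\beta_j^*A_{jj'}\alpha_{j'}. \tag{4.46}$$

Here, the last step uses the very important notion of matrix elements of the operator, defined as 

$$A_{jj'} \equiv \langle u_j|\hat A|u_{j'}\rangle. \tag{4.47}$$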


As evident from Eq. (46), the full set of the matrix elements completely characterizes the operator, just 
as the full set of the expansion coefficients (40) fully characterizes a quantum state. The term “matrix” 
means, first of all, that it is convenient to represent the full set of $A_{jj'}$ as a square table (matrix), with the 
linear dimension equal to the number of basis states $u_j$ of the system under consideration. By the 
way, this number (which may be infinite) is called the dimensionality of its Hilbert space. 

As two simplest examples, all matrix elements of the null-operator, defined by Eqs. (35), are 
evidently equal to zero (in any basis), and hence it may be represented as a matrix of zeros (called the 
null-matrix): 

$$0 \equiv \begin{pmatrix} 0 & 0 & \cdots \\ 0 & 0 & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}, \tag{4.48}$$

while for the identity operator $\hat I$, defined by Eqs. (36), we readily get 

$$I_{jj'} = \langle u_j|\hat I|u_{j'}\rangle = \langle u_j|u_{j'}\rangle = \delta_{jj'}, \tag{4.49}$$

i.e. its matrix (naturally called the identity matrix) is diagonal, also in any basis: 

$$I \equiv \begin{pmatrix} 1 & 0 & \cdots \\ 0 & 1 & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}. \tag{4.50}$$


The convenience of the matrix language extends well beyond the representation of particular 
operators. For example, let us use the definition (47) to calculate matrix elements of a product of two 
operators: 

$$(AB)_{jj''} = \langle u_j|\hat A\hat B|u_{j''}\rangle. \tag{4.51}$$

Here we may use Eq. (44) for the first (but not the last!) time, inserting the identity operator between the 
two operators, and then expressing it via a sum of projection operators: 

$$(AB)_{jj''} = \langle u_j|\hat A\hat B|u_{j''}\rangle = \langle u_j|\hat A\hat I\hat B|u_{j''}\rangle = \sum_{j'}\langle u_j|\hat A|u_{j'}\rangle\langle u_{j'}|\hat B|u_{j''}\rangle = \sum_{j'}A_{jj'}B_{j'j''}. \tag{4.52}$$

This result corresponds to the standard “row by column” rule of calculation of an arbitrary element of 
the matrix product 

$$AB = \begin{pmatrix} A_{11} & A_{12} & \cdots \\ A_{21} & A_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}\begin{pmatrix} B_{11} & B_{12} & \cdots \\ B_{21} & B_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}. \tag{4.53}$$

Hence a product of operators may be represented (in a fixed basis!) by that of their matrices (in the same 
basis).
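The product rule (52) may be illustrated by computing matrix elements (47) directly in a non-trivial orthonormal basis and comparing them with the row-by-column product. This is a sketch under the assumption that operators and basis vectors are first specified in some orthonormal reference basis. The names `elements`, `A_ref`, `B_ref` and the chosen basis are hypothetical.

```python
import math


def apply(m, ket):
    return [sum(m[i][k] * ket[k] for k in range(len(ket))) for i in range(len(m))]


def inner(beta, alpha):
    return sum(b.conjugate() * a for b, a in zip(beta, alpha))


def matmul(a, b):
    """Standard row-by-column matrix product, Eq. (53)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def elements(op, basis):
    """Matrix elements A_jj' = <u_j|A|u_j'> of Eq. (47); op and basis are given in a reference basis."""
    return [[inner(uj, apply(op, ujp)) for ujp in basis] for uj in basis]


s = 1 / math.sqrt(2)
u = [[s, 1j * s], [s, -1j * s]]  # an illustrative orthonormal basis {u}

A_ref = [[1 + 0j, 2 - 1j], [0.5j, -1 + 0j]]
B_ref = [[0j, 1 + 1j], [2 + 0j, 3j]]

A_u, B_u = elements(A_ref, u), elements(B_ref, u)
AB_u = elements(matmul(A_ref, B_ref), u)  # (AB)_jj'' computed directly, Eq. (51)
product = matmul(A_u, B_u)                # sum over j' of A_jj' B_j'j'', Eq. (52)

print(all(abs(AB_u[i][j] - product[i][j]) < 1e-12 for i in range(2) for j in range(2)))
```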


This is so convenient that the same language is often used to represent not only long brackets, 

$$\langle\beta|\hat A|\alpha\rangle = \sum_{j,j'}\beta_j^*A_{jj'}\alpha_{j'} = \big(\beta_1^*, \beta_2^*, \ldots\big)\begin{pmatrix} A_{11} & A_{12} & \cdots \\ A_{21} & A_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \cdots\end{pmatrix}, \tag{4.54}$$

but even short brackets: 

$$\langle\beta|\alpha\rangle = \sum_j \beta_j^*\alpha_j = \big(\beta_1^*, \beta_2^*, \ldots\big)\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \cdots\end{pmatrix}, \tag{4.55}$$

although these equalities require the use of non-square matrices: rows of (complex-conjugate!) 
expansion coefficients for the representation of bra-vectors, and columns of these coefficients for the 
representation of ket-vectors. With that, the mapping of quantum states and operators on matrices 
becomes completely general.
Now let us have a look at the outer product operator (26). Its matrix elements are just 

$$\big(|\alpha\rangle\langle\beta|\big)_{jj'} = \langle u_j|\alpha\rangle\langle\beta|u_{j'}\rangle = \alpha_j\beta_{j'}^*. \tag{4.56}$$

These are the elements of a very special square matrix, whose filling requires the knowledge of just $2N$ 
scalars (where $N$ is the basis size), rather than $N^2$ scalars as for an arbitrary operator. However, a simple 
generalization of such an outer product may represent an arbitrary operator. Indeed, let us insert two 
identity operators (44), with different summation indices, on both sides of an arbitrary operator: 

$$\hat A = \hat I\hat A\hat I = \Big(\sum_j |u_j\rangle\langle u_j|\Big)\hat A\Big(\sum_{j'}|u_{j'}\rangle\langle u_{j'}|\Big), \tag{4.57}$$

and then use the associative axiom to rewrite this expression as 

$$\hat A = \sum_{j,j'}|u_j\rangle\big(\langle u_j|\hat A|u_{j'}\rangle\big)\langle u_{j'}|. \tag{4.58}$$

But the expression in the middle long bracket is just the matrix element (47), so that we may write 

$$\hat A = \sum_{j,j'}|u_j\rangle A_{jj'}\langle u_{j'}|. \tag{4.59}$$

The reader has to agree that this formula, which is a natural generalization of Eq. (44), is extremely 
elegant.
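Eqs. (56) and (59) can be checked in the same spirit: the sketch below rebuilds an operator from its matrix elements in a non-trivial basis, and compares the matrix elements of an outer product with $\alpha_j\beta_{j'}^*$. As before, it assumes everything is first written in an orthonormal reference basis; all helper names and numbers are hypothetical.

```python
import math


def outer(ket, bra_source):
    return [[k * b.conjugate() for b in bra_source] for k in ket]


def apply(m, ket):
    return [sum(m[i][k] * ket[k] for k in range(len(ket))) for i in range(len(m))]


def inner(beta, alpha):
    return sum(b.conjugate() * a for b, a in zip(beta, alpha))


def elements(op, basis):
    """A_jj' = <u_j|A|u_j'>, Eq. (47)."""
    return [[inner(uj, apply(op, ujp)) for ujp in basis] for uj in basis]


s = 1 / math.sqrt(2)
u = [[s, 1j * s], [s, -1j * s]]  # illustrative orthonormal basis
A_ref = [[1 + 0j, 2 - 1j], [0.5j, -1 + 0j]]
A_u = elements(A_ref, u)

# Eq. (59): A = sum over j, j' of |u_j> A_jj' <u_j'|
rebuilt = [[sum(A_u[j][jp] * outer(u[j], u[jp])[r][c] for j in range(2) for jp in range(2))
            for c in range(2)] for r in range(2)]
rebuild_ok = all(abs(rebuilt[r][c] - A_ref[r][c]) < 1e-12 for r in range(2) for c in range(2))

# Eq. (56): the matrix elements of |alpha><beta| are alpha_j * conj(beta_j')
alpha, beta = [1 + 1j, -2j], [0.5 + 0j, 1 - 1j]
a = [inner(uj, alpha) for uj in u]  # alpha_j = <u_j|alpha>, Eq. (40)
b = [inner(uj, beta) for uj in u]   # beta_j = <u_j|beta>
O_u = elements(outer(alpha, beta), u)
outer_ok = all(abs(O_u[j][jp] - a[j] * b[jp].conjugate()) < 1e-12
               for j in range(2) for jp in range(2))

print(rebuild_ok, outer_ok)
```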


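The matrix representation is so convenient that it makes sense to extend it to one level lower, 
from state vector products to the “bare” state vectors resulting from the operator’s action upon a given 
state. For example, let us use Eq. (59) to represent the ket-vector (18) as 

$$|\alpha'\rangle \equiv \hat A|\alpha\rangle = \Big(\sum_{j,j'}|u_j\rangle A_{jj'}\langle u_{j'}|\Big)|\alpha\rangle = \sum_{j,j'}|u_j\rangle A_{jj'}\langle u_{j'}|\alpha\rangle. \tag{4.60}$$

According to Eq. (40), the last short bracket is just $\alpha_{j'}$, so that 

$$|\alpha'\rangle = \sum_{j,j'}|u_j\rangle A_{jj'}\alpha_{j'} = \sum_j\Big(\sum_{j'}A_{jj'}\alpha_{j'}\Big)|u_j\rangle. \tag{4.61}$$

But the expression in the parentheses is just the coefficient $\alpha'_j$ of the expansion (37) of the resulting ket-vector 
(60) in the same basis, so that 

$$\alpha'_j = \sum_{j'}A_{jj'}\alpha_{j'}. \tag{4.62}$$

This result corresponds to the usual rule of multiplication of a matrix by a column, so that we may 
represent any ket-vector by its column matrix, with the operator’s action looking like 

$$\begin{pmatrix}\alpha'_1 \\ \alpha'_2 \\ \cdots\end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} & \cdots \\ A_{21} & A_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \cdots\end{pmatrix}. \tag{4.63}$$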


Absolutely similarly, the operator action on the bra-vector (21), represented by its row-matrix, is 

$$\big(\alpha'^*_1, \alpha'^*_2, \ldots\big) = \big(\alpha_1^*, \alpha_2^*, \ldots\big)\begin{pmatrix} (A^\dagger)_{11} & (A^\dagger)_{12} & \cdots \\ (A^\dagger)_{21} & (A^\dagger)_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}. \tag{4.64}$$

By the way, Eq. (64) naturally raises the following question: what are the elements of the matrix 
on its right-hand side, or more exactly, what is the relation between the matrix elements of an operator 
and its Hermitian conjugate? The simplest way to answer it is to use Eq. (25) with two arbitrary states 
(say, $u_j$ and $u_{j'}$) of the same basis in the role of $\alpha$ and $\beta$. Together with the orthonormality relation (38), 
this immediately gives¹³ 

$$\big(A^\dagger\big)_{jj'} = A_{j'j}^*. \tag{4.65}$$


Thus, the matrix of the Hermitian-conjugate operator is the complex-conjugated and transposed matrix 
of the initial operator. This result exposes very clearly the difference between the Hermitian and the 
complex conjugation. It also shows that for the Hermitian operators, defined by Eq. (22), 

$$A_{jj'} = A_{j'j}^*, \tag{4.66}$$

i.e. any pair of their matrix elements, symmetric with respect to the main diagonal, should be the 
complex conjugate of each other. As a corollary, their main-diagonal elements have to be real: 

$$A_{jj} = A_{jj}^*, \quad \text{i.e.} \quad \operatorname{Im}A_{jj} = 0. \tag{4.67}$$
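Eqs. (65)-(67) can be illustrated numerically: below, the matrix of $\hat A^\dagger$ in a non-trivial basis is computed directly from definition (47) and compared with the conjugate transpose of the matrix of $\hat A$. For a Hermitian operator, the symmetry (66) and the real diagonal (67) are also checked. This is a sketch assuming that the operators are first given in an orthonormal reference basis, where $\hat A^\dagger$ is the conjugate-transposed matrix. The helper names and matrices are hypothetical.

```python
import math


def dagger(m):
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]


def apply(m, ket):
    return [sum(m[i][k] * ket[k] for k in range(len(ket))) for i in range(len(m))]


def inner(beta, alpha):
    return sum(b.conjugate() * a for b, a in zip(beta, alpha))


def elements(op, basis):
    """A_jj' = <u_j|A|u_j'>, Eq. (47)."""
    return [[inner(uj, apply(op, ujp)) for ujp in basis] for uj in basis]


s = 1 / math.sqrt(2)
u = [[s, 1j * s], [s, -1j * s]]                 # illustrative orthonormal basis
A_ref = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]     # an arbitrary operator
H_ref = [[2 + 0j, 1 - 1j], [1 + 1j, -1 + 0j]]   # a Hermitian operator, H = H^dagger

A_u = elements(A_ref, u)
Ad_u = elements(dagger(A_ref), u)  # matrix of A^dagger, computed directly from Eq. (47)

# Eq. (65): (A^dagger)_jj' = A_j'j^*
conj_transpose_ok = all(abs(Ad_u[j][jp] - A_u[jp][j].conjugate()) < 1e-12
                        for j in range(2) for jp in range(2))

H_u = elements(H_ref, u)
hermitian_ok = all(abs(H_u[j][jp] - H_u[jp][j].conjugate()) < 1e-12
                   for j in range(2) for jp in range(2))   # Eq. (66)
diag_real = all(abs(H_u[j][j].imag) < 1e-12 for j in range(2))  # Eq. (67)

print(conj_transpose_ok, hermitian_ok, diag_real)
```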


¹³ For the sake of formula compactness, below I will use the shorthand notation in which the operands of this 
equality are just $A^\dagger_{jj'}$ and $A^*_{j'j}$. I believe that it leaves little chance for confusion, because the Hermitian 
conjugation sign † may pertain only to an operator (or its matrix), while the complex conjugation sign *, to a 
scalar, say a matrix element.
In order to fully appreciate the special role played by Hermitian operators in quantum theory, let 
us introduce the key notions of eigenstates $a_j$ (described by their eigenvectors $\langle a_j|$ and $|a_j\rangle$) and 
eigenvalues (c-numbers) $A_j$ of an operator $\hat A$, both defined by the equation they have to satisfy:¹⁴ 

$$\hat A|a_j\rangle = A_j|a_j\rangle. \tag{4.68}$$

Let us prove that eigenvalues of any Hermitian operator are real,¹⁵ 

$$A_j = A_j^*, \quad \text{for } j = 1, 2, \ldots, N, \tag{4.69}$$

while the eigenstates corresponding to different eigenvalues are orthogonal: 

$$\langle a_j|a_{j'}\rangle = 0, \quad \text{if } A_j \neq A_{j'}. \tag{4.70}$$


The proof of both statements is surprisingly simple. Let us inner-multiply both sides of Eq. (68), 
written for an index $j'$, by the bra-vector $\langle a_j|$. On the right-hand side of the result, the eigenvalue $A_{j'}$, as a 
c-number, may be taken out of the bracket, giving 

$$\langle a_j|\hat A|a_{j'}\rangle = A_{j'}\langle a_j|a_{j'}\rangle. \tag{4.71}$$

This equality has to hold for any pair of eigenstates, so that we may swap the indices in Eq. (71), and 
write the complex conjugate of the result: 

$$\langle a_{j'}|\hat A|a_j\rangle^* = A_j^*\langle a_{j'}|a_j\rangle^*. \tag{4.72}$$

Now using Eqs. (14) and (25), together with the Hermitian operator’s definition (22), we may transform 
Eq. (72) into the following form: 

$$\langle a_j|\hat A|a_{j'}\rangle = A_j^*\langle a_j|a_{j'}\rangle. \tag{4.73}$$

Subtracting this equation from Eq. (71), we get 

$$0 = \big(A_{j'} - A_j^*\big)\langle a_j|a_{j'}\rangle. \tag{4.74}$$

There are two possibilities to satisfy this relation. If the indices $j$ and $j'$ are equal (denote the 
same eigenstate), then the bracket is the state’s norm squared, and cannot be equal to zero. In this case, 
the left parentheses (with $j = j'$) have to be zero, proving Eq. (69). On the other hand, if $j$ and $j'$ 
correspond to different eigenvalues of $\hat A$, the parentheses cannot equal zero (we have just proved that all 
$A_j$ are real!), and hence the state vectors indexed by $j$ and $j'$ should be orthogonal, i.e. Eq. (70) is valid.


As will be discussed below, these properties make Hermitian operators suitable, in particular, for 
the description of physical observables. 
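For a 2×2 Hermitian matrix, both statements (69) and (70) can be observed numerically. The sketch below finds the eigenvalues as roots of the quadratic characteristic equation using a complex square root, so nothing forces them to be real in advance. It then checks that their imaginary parts vanish and that the eigenvectors are orthogonal. The helper `eigen_2x2` is hypothetical and assumes $A_{12} \neq 0$, so that $(A_{12}, \lambda - A_{11})$ is a non-zero eigenvector.

```python
import cmath


def eigen_2x2(m):
    """Roots of the characteristic equation of a 2x2 matrix, plus (unnormalized) eigenvectors.

    Assumes m[0][1] != 0, so that (m[0][1], lam - m[0][0]) is a non-zero eigenvector.
    """
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt((tr / 2) ** 2 - det)  # complex root: realness is not assumed
    values = [tr / 2 + disc, tr / 2 - disc]
    vectors = [[m[0][1], lam - m[0][0]] for lam in values]
    return values, vectors


def inner(beta, alpha):
    return sum(b.conjugate() * a for b, a in zip(beta, alpha))


H = [[2 + 0j, 1 - 1j], [1 + 1j, -1 + 0j]]  # a sample Hermitian matrix
values, vectors = eigen_2x2(H)

print([abs(v.imag) < 1e-12 for v in values])        # Eq. (69): real eigenvalues
print(abs(inner(vectors[0], vectors[1])) < 1e-12)   # Eq. (70): orthogonal eigenvectors
```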


¹⁴ This equation should look familiar to the reader; see the stationary Schrödinger equation (1.60), which was the 
focus of our studies in the first three chapters. We will see soon that that equation is just a particular (coordinate) 
representation of Eq. (68) for the Hamiltonian as the operator of energy. 

¹⁵ The reciprocal statement is also true for an operator whose eigenstates form a full orthonormal set: if all its 
eigenvalues are real, it is Hermitian (in any basis). This statement may be readily proved by applying Eq. (93) below 
to the case when $A_{kk'} = A_k\delta_{kk'}$, with $A_k^* = A_k$.
4.4. Change of basis, and matrix diagonalization


From the discussion of the last section, it may look as if the matrix language is fully similar to, 
and in many instances more convenient than, the general bra-ket formalism. In particular, Eqs. (54)-(55) 
and (63)-(64) show that any part of any bra-ket expression may be directly mapped on the similar matrix 
expression, with the only slight inconvenience of using not only columns but also rows (with their 
elements complex-conjugated) for state vector representation. This invites the question: why do we 
need the bra-ket language at all? The answer is that the elements of the matrices depend on the particular 
choice of the basis set, very much like the Cartesian components of a usual geometric vector depend on 
the particular choice of reference frame orientation (Fig. 4), and very frequently, when solving problems, it 
is convenient to use two or more different basis sets for the same system. (Just a bit more patience: 
numerous examples will follow soon.)


Fig. 4.4. The transformation 
of components of a 2D vector 
at a reference frame’s rotation. 


With this motivation, let us explore what happens at the transform from one basis, $\{u\}$, to 
another one, $\{v\}$, both full and orthonormal. First of all, let us prove that for each such pair of bases, 
and an arbitrary numbering of the states of each basis, there exists an operator $\hat U$ such that, first, 

$$|v_j\rangle = \hat U|u_j\rangle, \tag{4.75}$$

and, second, 

$$\hat U\hat U^\dagger = \hat U^\dagger\hat U = \hat I. \tag{4.76}$$

(Due to the last property,¹⁶ $\hat U$ is called a unitary operator, and Eq. (75), a unitary transformation.) 

A very simple proof of both statements may be achieved by construction. Indeed, let us take 

$$\hat U \equiv \sum_j |v_j\rangle\langle u_j|, \tag{4.77}$$

an evident generalization of Eq. (44). Then, using Eq. (38), we obtain 

$$\hat U|u_{j'}\rangle = \sum_j |v_j\rangle\langle u_j|u_{j'}\rangle = \sum_j |v_j\rangle\delta_{jj'} = |v_{j'}\rangle, \tag{4.78}$$

so that Eq. (75) has been proved. Now, applying Eq. (31) to each term of the sum (77), we get 

$$\hat U^\dagger = \sum_j |u_j\rangle\langle v_j|, \tag{4.79}$$


¹⁶ An alternative way to express Eq. (76) is to write $\hat U^\dagger = \hat U^{-1}$, but I will try to avoid this language.
so that 

$$\hat U\hat U^\dagger = \sum_{j,j'}|v_j\rangle\langle u_j|u_{j'}\rangle\langle v_{j'}| = \sum_{j,j'}|v_j\rangle\delta_{jj'}\langle v_{j'}| = \sum_j |v_j\rangle\langle v_j|. \tag{4.80}$$

But according to the closure relation (44), the last expression is just the identity operator, so that one of 
Eqs. (76) has been proved. (The proof of the second equality is absolutely similar.) As a by-product of 
our proof, we have also got another important expression, Eq. (79). It implies, in particular, that while, 
according to Eq. (75), the operator $\hat U$ performs the transform from the “old” basis $u_j$ to the “new” basis 
$v_j$, its Hermitian adjoint $\hat U^\dagger$ performs the reciprocal transform: 

$$\hat U^\dagger|v_{j'}\rangle = \sum_j |u_j\rangle\langle v_j|v_{j'}\rangle = \sum_j |u_j\rangle\delta_{jj'} = |u_{j'}\rangle. \tag{4.81}$$
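The construction (77) is easy to reproduce numerically. The sketch below takes two illustrative orthonormal bases of a 2D space, written in a common reference basis, and builds $\hat U = \sum_j |v_j\rangle\langle u_j|$. It then checks Eqs. (75), (76), and (81). The bases, the angle `phi`, and the helper names are all hypothetical choices.

```python
import math


def outer(ket, bra_source):
    return [[k * b.conjugate() for b in bra_source] for k in ket]


def dagger(m):
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(m, ket):
    return [sum(m[i][k] * ket[k] for k in range(len(ket))) for i in range(len(m))]


def close(x, y):
    return all(abs(p - q) < 1e-12 for p, q in zip(x, y))


s = 1 / math.sqrt(2)
phi = 0.3
u = [[s, 1j * s], [s, -1j * s]]                                        # "old" basis
v = [[math.cos(phi), math.sin(phi)], [-math.sin(phi), math.cos(phi)]]  # "new" basis

# Eq. (77): U = sum over j of |v_j><u_j|
U = [[sum(outer(v[j], u[j])[r][c] for j in range(2)) for c in range(2)] for r in range(2)]

forward_ok = all(close(apply(U, u[j]), v[j]) for j in range(2))           # Eq. (75)
backward_ok = all(close(apply(dagger(U), v[j]), u[j]) for j in range(2))  # Eq. (81)
identity = [[1, 0], [0, 1]]
unitary_ok = all(close(row, irow)
                 for M in (matmul(U, dagger(U)), matmul(dagger(U), U))
                 for row, irow in zip(M, identity))                      # Eq. (76)

print(forward_ok, backward_ok, unitary_ok)
```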
Now let us see what the matrix elements of the unitary transform operators look like. 
Generally, as was discussed above, the operator’s elements may depend on the basis we calculate them 
in, so let us be specific, at least initially. For example, let us calculate the desired matrix elements $U_{jj'}$ 
in the “old” basis $\{u\}$, using Eq. (77): 

$$U_{jj'}\big|_{\text{in }u} \equiv \langle u_j|\hat U|u_{j'}\rangle = \langle u_j|\Big(\sum_{j''}|v_{j''}\rangle\langle u_{j''}|\Big)|u_{j'}\rangle = \langle u_j|v_{j'}\rangle. \tag{4.82}$$

Now performing a similar calculation in the “new” basis $\{v\}$, we get 

$$U_{jj'}\big|_{\text{in }v} \equiv \langle v_j|\hat U|v_{j'}\rangle = \langle v_j|\Big(\sum_{j''}|v_{j''}\rangle\langle u_{j''}|\Big)|v_{j'}\rangle = \sum_{j''}\delta_{jj''}\langle u_{j''}|v_{j'}\rangle = \langle u_j|v_{j'}\rangle. \tag{4.83}$$

Surprisingly, the result is the same! This is of course true for the Hermitian conjugate (79) as well: 

$$U^\dagger_{jj'}\big|_{\text{in }u} = U^\dagger_{jj'}\big|_{\text{in }v} = \langle v_j|u_{j'}\rangle. \tag{4.84}$$

These expressions may be used, first of all, to rewrite Eq. (75) in a more direct form. Applying 
the first of Eqs. (41) to any state $v_j$ of the “new” basis, and then Eq. (82), we get 

$$|v_j\rangle = \sum_{j'}|u_{j'}\rangle\langle u_{j'}|v_j\rangle = \sum_{j'}U_{j'j}|u_{j'}\rangle. \tag{4.85}$$

Similarly, the reciprocal transform is 

$$|u_j\rangle = \sum_{j'}U^\dagger_{j'j}|v_{j'}\rangle. \tag{4.86}$$

These formulas are very convenient for applications; we will use them already in this section.


Next, we may use Eqs. (83)-(84) to express the effect of the unitary transform on the expansion 
coefficients $\alpha_j$ of the vectors of an arbitrary state $\alpha$, defined by Eq. (37). As a reminder, in the “old” 
basis $\{u\}$ they are given by Eqs. (40). Similarly, in the “new” basis $\{v\}$, 

$$\alpha_j\big|_{\text{in }v} = \langle v_j|\alpha\rangle. \tag{4.87}$$

Again inserting the identity operator in its closure form (44) with the internal index $j'$, and then using 
Eqs. (84) and (40), we get




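$$\alpha_j\big|_{\text{in }v} = \langle v_j|\Big(\sum_{j'}|u_{j'}\rangle\langle u_{j'}|\Big)|\alpha\rangle = \sum_{j'}\langle v_j|u_{j'}\rangle\langle u_{j'}|\alpha\rangle = \sum_{j'}U^\dagger_{jj'}\,\alpha_{j'}\big|_{\text{in }u}. \tag{4.88}$$

The reciprocal transform is performed by matrix elements of the operator $\hat U$: 

$$\alpha_j\big|_{\text{in }u} = \sum_{j'}U_{jj'}\,\alpha_{j'}\big|_{\text{in }v}. \tag{4.89}$$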


So, if the transform (75) from the “old” basis $\{u\}$ to the “new” basis $\{v\}$ is performed by a 
unitary operator, the change (88) of state vector components at this transformation requires its 
Hermitian conjugate. This fact is similar to the transformation of components of a usual vector at 
coordinate frame rotation. For example, for a 2D vector whose actual position in space is fixed (Fig. 4): 

$$\begin{pmatrix}a_{x'} \\ a_{y'}\end{pmatrix} = \begin{pmatrix}\cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi\end{pmatrix}\begin{pmatrix}a_x \\ a_y\end{pmatrix}, \tag{4.90}$$

but the reciprocal transform is performed by a different matrix, which may be obtained from that 
participating in Eq. (90) by the replacement $\varphi \to -\varphi$. This replacement has a clear geometric sense: if 
the “new” reference frame $\{x', y'\}$ is obtained from the “old” frame $\{x, y\}$ by a counterclockwise 
rotation by angle $\varphi$, the reciprocal transformation requires angle $-\varphi$. (In this analogy, the unitary 
property (76) of the unitary transform operators corresponds to the equality of the determinants of both 
rotation matrices to 1.)
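The 2D rotation example can be cross-checked against Eqs. (82), (87), (88), and (89). In this sketch the “old” basis $\{u\}$ is the frame $\{x, y\}$ itself, the “new” basis $\{v\}$ is the frame rotated counterclockwise by an arbitrarily chosen angle `phi`, and the vector components are arbitrary numbers. With these assumptions, $U^\dagger$ should coincide with the matrix of Eq. (90).

```python
import math


def inner(beta, alpha):
    return sum(b.conjugate() * a for b, a in zip(beta, alpha))


phi = 0.4  # rotation angle of the "new" frame {x', y'}
u = [[1.0, 0.0], [0.0, 1.0]]                                           # old basis: x, y
v = [[math.cos(phi), math.sin(phi)], [-math.sin(phi), math.cos(phi)]]  # new basis: x', y'

U = [[inner(u[j], v[jp]) for jp in range(2)] for j in range(2)]        # U_jj' = <u_j|v_j'>, Eq. (82)
U_dag = [[U[jp][j].conjugate() for jp in range(2)] for j in range(2)]  # conjugate transpose

a_u = [0.7, -1.2]  # components a_x, a_y of a fixed vector
a_v = [sum(U_dag[j][jp] * a_u[jp] for jp in range(2)) for j in range(2)]  # Eq. (88)
a_v_direct = [inner(vj, a_u) for vj in v]                                # Eq. (87)

rot = [[math.cos(phi), math.sin(phi)], [-math.sin(phi), math.cos(phi)]]  # matrix of Eq. (90)
a_v_rot = [sum(rot[j][k] * a_u[k] for k in range(2)) for j in range(2)]

a_u_back = [sum(U[j][jp] * a_v[jp] for jp in range(2)) for j in range(2)]  # Eq. (89)

print(all(abs(p - q) < 1e-12 for p, q in zip(a_v, a_v_direct)),
      all(abs(p - q) < 1e-12 for p, q in zip(a_v, a_v_rot)),
      all(abs(p - q) < 1e-12 for p, q in zip(a_u_back, a_u)))
```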


Due to the analogy between expressions (88) and (89) on one hand, and our old friend Eq. (62) 
on the other hand, it is tempting to skip indices in these new results by writing 

$$\alpha\big|_{\text{in }v} = U^\dagger\alpha\big|_{\text{in }u}, \qquad \alpha\big|_{\text{in }u} = U\alpha\big|_{\text{in }v}. \qquad \text{(SYMBOLIC ONLY!)} \tag{4.91}$$

Since the matrix elements of $\hat U$ and $\hat U^\dagger$ do not depend on the basis, such language is not too bad and is 
mnemonically useful. However, since in the bra-ket formalism (or at least its version presented in this 
course) the state vectors are basis-independent, Eq. (91) has to be treated as a symbolic one, and should 
not be confused with the strict Eqs. (88)-(89), and with the rigorous basis-independent vector and 
operator equalities discussed in Sec. 2.


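Now let us use the same trick of identity operator insertion, repeated twice, to find the 
transformation rule for matrix elements of an arbitrary operator: 

$$A_{jj'}\big|_{\text{in }v} \equiv \langle v_j|\hat A|v_{j'}\rangle = \langle v_j|\Big(\sum_k|u_k\rangle\langle u_k|\Big)\hat A\Big(\sum_{k'}|u_{k'}\rangle\langle u_{k'}|\Big)|v_{j'}\rangle = \sum_{k,k'}U^\dagger_{jk}\,A_{kk'}\big|_{\text{in }u}\,U_{k'j'}; \tag{4.92}$$

absolutely similarly, we may also get 

$$A_{jj'}\big|_{\text{in }u} = \sum_{k,k'}U_{jk}\,A_{kk'}\big|_{\text{in }v}\,U^\dagger_{k'j'}. \tag{4.93}$$

In the spirit of Eq. (91), we may represent these results symbolically as well, in a compact form: 

$$A\big|_{\text{in }v} = U^\dagger A\big|_{\text{in }u}\,U, \qquad A\big|_{\text{in }u} = U A\big|_{\text{in }v}\,U^\dagger. \qquad \text{(SYMBOLIC ONLY!)} \tag{4.94}$$

As a sanity check, let us apply Eq. (93) to the identity operator: 

$$I_{jj'}\big|_{\text{in }u} = \sum_{k,k'}U_{jk}\,\delta_{kk'}\,U^\dagger_{k'j'} = \big(UU^\dagger\big)_{jj'} = \delta_{jj'}, \tag{4.95}$$

as it should be. One more (strict rather than symbolic) invariant of the basis change is the trace of any 
operator, defined as the sum of the diagonal terms of its matrix: 

$$\operatorname{Tr}\hat A \equiv \operatorname{Tr}A \equiv \sum_j A_{jj}. \tag{4.96}$$

The (easy) proof of this fact, using previous relations, is left for the reader’s exercise.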
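A numerical spot-check of Eqs. (92), (93), and (96) may help in approaching that exercise (it is an illustration, not the proof). In this sketch the “old” basis $\{u\}$ is taken to be the reference basis itself, so the operator’s matrix in $\{u\}$ is simply `A_u`. The “new” basis $\{v\}$ is an arbitrary illustrative orthonormal set, and all helper names are hypothetical.

```python
import math


def dagger(m):
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(m, ket):
    return [sum(m[i][k] * ket[k] for k in range(len(ket))) for i in range(len(m))]


def inner(beta, alpha):
    return sum(b.conjugate() * a for b, a in zip(beta, alpha))


s = 1 / math.sqrt(2)
u = [[1 + 0j, 0j], [0j, 1 + 0j]]   # "old" basis: the reference basis itself
v = [[s, 1j * s], [1j * s, s]]     # "new" basis: an illustrative orthonormal set
A_u = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]  # arbitrary operator, given in {u}

U = [[inner(u[j], v[jp]) for jp in range(2)] for j in range(2)]  # Eq. (82)
A_v_direct = [[inner(v[j], apply(A_u, v[jp])) for jp in range(2)] for j in range(2)]
A_v_formula = matmul(matmul(dagger(U), A_u), U)          # Eq. (92)
A_u_back = matmul(matmul(U, A_v_formula), dagger(U))     # Eq. (93)

trace_u = sum(A_u[j][j] for j in range(2))
trace_v = sum(A_v_formula[j][j] for j in range(2))       # Eq. (96): should not change

print(all(abs(A_v_direct[i][j] - A_v_formula[i][j]) < 1e-12 for i in range(2) for j in range(2)),
      all(abs(A_u_back[i][j] - A_u[i][j]) < 1e-12 for i in range(2) for j in range(2)),
      abs(trace_u - trace_v) < 1e-12)
```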


So far, I have implied that both state bases $\{u\}$ and $\{v\}$ are known, and the natural question is 
where this information comes from in the quantum mechanics of actual physical systems. To get a 
partial answer to this question, let us return to Eq. (68), which defines the eigenstates and the 
eigenvalues of an operator. Let us assume that the eigenstates $a_j$ of a certain operator $\hat A$ form a full and 
orthonormal set, and calculate the matrix elements of the operator in the basis $\{a\}$ of these states, at 
their arbitrary numbering. For that, it is sufficient to inner-multiply both sides of Eq. (68), written for 
some index $j'$, by the bra-vector of an arbitrary state $a_j$ of the same set: 

$$\langle a_j|\hat A|a_{j'}\rangle = \langle a_j|A_{j'}|a_{j'}\rangle. \tag{4.97}$$

The left-hand side of this equality is the matrix element $A_{jj'}$ we are looking for, while its right-hand side 
is just $A_{j'}\delta_{jj'}$. As a result, we see that the matrix is diagonal, with the diagonal consisting of the 
operator’s eigenvalues: 

$$A_{jj'}\big|_{\text{in }a} = A_j\delta_{jj'}, \quad \text{i.e.} \quad A\big|_{\text{in }a} = \begin{pmatrix}A_1 & 0 & \cdots \\ 0 & A_2 & \cdots \\ \cdots & \cdots & \cdots\end{pmatrix}. \tag{4.98}$$

In particular, in the eigenstate basis (but not necessarily in an arbitrary basis!), $A_{jj}$ means the same as $A_j$. 
Thus the important problem of finding the eigenvalues and eigenstates of an operator is equivalent to the 
diagonalization of its matrix,¹⁷ i.e. finding the basis in which the operator’s matrix acquires the diagonal 
form (98); then the diagonal elements are the eigenvalues, and the basis itself is the desired set of 
eigenstates.


To see how this is done in practice, let us inner-multiply Eq. (68) by a bra-vector of the basis 
(say, $\{u\}$) in which we happen to know the matrix elements $A_{jj'}$: 

$$\langle u_k|\hat A|a_j\rangle = \langle u_k|A_j|a_j\rangle. \tag{4.99}$$

On the left-hand side, we can (as usual :-) insert the identity operator between the operator $\hat A$ and the 
ket-vector, and then use the closure relation (44) in the same basis $\{u\}$, while on the right-hand side, we 
can move the eigenvalue $A_j$ (a c-number) out of the bracket, and then insert a summation over the same 
index as in the closure, compensating it with the proper Kronecker delta symbol: 

$$\sum_{k'}\langle u_k|\hat A|u_{k'}\rangle\langle u_{k'}|a_j\rangle = A_j\sum_{k'}\langle u_{k'}|a_j\rangle\delta_{kk'}. \tag{4.100}$$

Moving out the signs of summation over $k'$, and using the definition (47) of the matrix elements, we get


¹⁷ Note that the expression “matrix diagonalization” is a very common but dangerous jargon. (Formally, a matrix 
is just a matrix, an ordered set of c-numbers, and cannot be “diagonalized”.) It is OK to use this jargon if you 
remember clearly what it actually means; see the definition above.
$$\sum_{k'}\big(A_{kk'} - A_j\delta_{kk'}\big)\langle u_{k'}|a_j\rangle = 0. \tag{4.101}$$

But the set of such equalities, for all $N$ possible values of the index $k$, is just a system of linear, 
homogeneous equations for the unknown c-numbers $\langle u_{k'}|a_j\rangle$. According to Eqs. (82)-(84), these numbers are 
nothing else than the matrix elements $U_{k'j}$ of a unitary matrix providing the required transformation from 
the initial basis $\{u\}$ to the basis $\{a\}$ that diagonalizes the matrix $A$. This system may be represented in 
the matrix form: 

$$\begin{pmatrix}A_{11} - A_j & A_{12} & \cdots \\ A_{21} & A_{22} - A_j & \cdots \\ \cdots & \cdots & \cdots\end{pmatrix}\begin{pmatrix}U_{1j} \\ U_{2j} \\ \cdots\end{pmatrix} = 0, \tag{4.102}$$

and the condition of its consistency, 

$$\begin{vmatrix}A_{11} - A_j & A_{12} & \cdots \\ A_{21} & A_{22} - A_j & \cdots \\ \cdots & \cdots & \cdots\end{vmatrix} = 0, \tag{4.103}$$

plays the role of the characteristic equation of the system. This equation has $N$ roots $A_j$, the eigenvalues 
of the operator $\hat A$; after they have been calculated, plugging any of them back into the system (102), we 
can use it to find $N$ matrix elements $U_{kj}$ ($k = 1, 2, \ldots, N$) corresponding to this particular eigenvalue. 
However, since the equations (102) are homogeneous, they allow finding $U_{kj}$ only up to a constant 
multiplier. To ensure their normalization, i.e. enforce the unitary character of the matrix $U$, we may use 
the condition that all eigenvectors are normalized (just as the basis vectors are): 

$$\langle a_j|a_j\rangle = \sum_k\langle a_j|u_k\rangle\langle u_k|a_j\rangle = \sum_k\big|U_{kj}\big|^2 = 1, \tag{4.104}$$

for each $j$. This normalization completes the diagonalization.¹⁸
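The procedure (101)-(104) can be sketched for the simplest, 2×2 case. The characteristic equation (103) is then a quadratic, and for each root a non-zero solution of the homogeneous system (102) is $(A_{12}, A_j - A_{11})$, assuming $A_{12} \neq 0$. Normalizing it as in Eq. (104) gives one column of $U$, and then $U^\dagger A U$ from Eq. (92) should come out diagonal. The function name and the sample Hermitian matrix are hypothetical, and degenerate eigenvalues are not handled.

```python
import cmath
import math


def dagger(m):
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def diagonalize_2x2(m):
    """Eigenvalues from Eq. (103) and the matrix U_kj = <u_k|a_j>, for a 2x2 matrix with m[0][1] != 0."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt((tr / 2) ** 2 - det)
    values = [tr / 2 + disc, tr / 2 - disc]
    columns = []
    for lam in values:
        col = [m[0][1], lam - m[0][0]]               # solves the first row of Eq. (102)
        norm = math.sqrt(sum(abs(x) ** 2 for x in col))
        columns.append([x / norm for x in col])      # normalization, Eq. (104)
    U = [[columns[j][k] for j in range(2)] for k in range(2)]  # column j is eigenvector a_j
    return values, U


H = [[1 + 0j, 2j], [-2j, 1 + 0j]]  # a sample Hermitian matrix
values, U = diagonalize_2x2(H)
D = matmul(matmul(dagger(U), H), U)  # matrix of H in its eigenbasis, Eq. (92)

print([round(x.real, 12) for x in values])
print([[round(abs(D[i][j]), 12) for j in range(2)] for i in range(2)])
```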


Now (at last!) I can give the reader some examples. As a simple but very important case, let us 
diagonalize each of the operators described (in a certain two-function basis $\{u\}$, i.e. in a two-dimensional 
Hilbert space) by the so-called Pauli matrices 

$$\sigma_x \equiv \begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}, \qquad \sigma_y \equiv \begin{pmatrix}0 & -i \\ i & 0\end{pmatrix}, \qquad \sigma_z \equiv \begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix}. \tag{4.105}$$

Though introduced by a physicist, with the specific purpose of describing the electron’s spin, these matrices 
have a general mathematical significance, because together with the 2×2 identity matrix, they provide a 
full, linearly-independent system, meaning that an arbitrary 2×2 matrix may be represented as 

$$\begin{pmatrix}A_{11} & A_{12} \\ A_{21} & A_{22}\end{pmatrix} = b\,I + c_x\sigma_x + c_y\sigma_y + c_z\sigma_z, \tag{4.106}$$


18 A possible slight complication here is that the characteristic equation may give equal eigenvalues for certain 
groups of different eigenvectors. In such cases, the requirement of the mutual orthogonality of these degenerate 
states should be additionally enforced. 
with a unique set of four c-number coefficients b, c_x, c_y, and c_z.
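Because the Pauli matrices are traceless and satisfy Tr(σ_iσ_j) = 2δ_ij, the coefficients of Eq. (106) can be extracted as b = Tr A/2 and c_i = Tr(σ_iA)/2. The plain-Python sketch below (all helper names hypothetical) uses this property to decompose an arbitrary example matrix and then rebuilds it from the four coefficients; for a Hermitian A the coefficients come out real.

```python
# Pauli matrices of Eq. (105), stored as nested lists.
PAULI = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}
IDENTITY = [[1, 0], [0, 1]]

def trace_of_product(X, Y):
    """Tr(XY) for 2x2 matrices."""
    return sum(X[i][k] * Y[k][i] for i in range(2) for k in range(2))

def pauli_coefficients(A):
    """Coefficients b, c_x, c_y, c_z of Eq. (106), via Tr(sigma_i sigma_j) = 2 delta_ij."""
    b = trace_of_product(IDENTITY, A) / 2
    c = {axis: trace_of_product(s, A) / 2 for axis, s in PAULI.items()}
    return b, c

def rebuild(b, c):
    """Right-hand side of Eq. (106) as an explicit 2x2 matrix."""
    return [[b * IDENTITY[i][j] + sum(c[ax] * PAULI[ax][i][j] for ax in PAULI)
             for j in range(2)] for i in range(2)]

A = [[1, 2], [3, 4]]  # arbitrary (non-Hermitian) example matrix
b, c = pauli_coefficients(A)
A_rebuilt = rebuild(b, c)
```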


Since the matrix σ_z is already diagonal, with the evident eigenvalues ±1, let us start with diagonalizing the matrix σ_x. For it, the characteristic equation (103) is evidently

$$\begin{vmatrix}-A_j & 1\\ 1 & -A_j\end{vmatrix} = 0, \quad \text{i.e. } A_j^2 - 1 = 0, \qquad (4.107)$$


and has two roots, $A_{1,2} = \pm 1$. (Again, the state numbering is arbitrary!) So the eigenvalues of the matrix σ_x are the same as those of the matrix σ_z. (The reader may readily check that the eigenvalues of the matrix σ_y are also the same.) However, the eigenvectors of the operators corresponding to these three matrices are different. To find them for σ_x, let us plug its first eigenvalue, $A_1 = +1$, back into Eqs. (101) spelled out for this particular case (j = 1; k, k’ = 1, 2):


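$$-\langle u_1|a_1\rangle + \langle u_2|a_1\rangle = 0, \qquad \langle u_1|a_1\rangle - \langle u_2|a_1\rangle = 0. \qquad (4.108)$$

These two equations are compatible (of course, because the used eigenvalue $A_1 = +1$ satisfies the characteristic equation), and either of them gives

$$\langle u_1|a_1\rangle = \langle u_2|a_1\rangle, \quad \text{i.e. } U_{11} = U_{21}. \qquad (4.109)$$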


With that, the normalization condition (104) yields

$$\left|U_{11}\right|^2 = \left|U_{21}\right|^2 = \frac{1}{2}. \qquad (4.110)$$

Although the normalization is insensitive to the simultaneous multiplication of $U_{11}$ and $U_{21}$ by the same phase factor $\exp\{i\varphi\}$ with any real φ, it is convenient to keep the coefficients real, for example taking φ = 0, to get

$$U_{11} = U_{21} = \frac{1}{\sqrt 2}. \qquad (4.111)$$


Performing an absolutely similar calculation for the second characteristic value, $A_2 = -1$, we get $U_{12} = -U_{22}$, and we may choose the common phase to have

$$U_{12} = -U_{22} = \frac{1}{\sqrt 2}, \qquad (4.112)$$

so that the whole unitary matrix for diagonalization of the operator corresponding to σ_x is19

$$U_x = \frac{1}{\sqrt 2}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}. \qquad (4.113)$$


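For what follows, it will be convenient to have this result expressed in the ket-relation form (see Eqs. (85)-(86)):

$$|a_1\rangle = U_{11}|u_1\rangle + U_{21}|u_2\rangle = \frac{1}{\sqrt 2}\left(|u_1\rangle + |u_2\rangle\right), \quad |a_2\rangle = U_{12}|u_1\rangle + U_{22}|u_2\rangle = \frac{1}{\sqrt 2}\left(|u_1\rangle - |u_2\rangle\right), \qquad (4.114a)$$

19 Note that though this particular unitary matrix is Hermitian, this is not true for an arbitrary choice of phases φ.

$$|u_1\rangle = U_{11}^*|a_1\rangle + U_{12}^*|a_2\rangle = \frac{1}{\sqrt 2}\left(|a_1\rangle + |a_2\rangle\right), \quad |u_2\rangle = U_{21}^*|a_1\rangle + U_{22}^*|a_2\rangle = \frac{1}{\sqrt 2}\left(|a_1\rangle - |a_2\rangle\right). \qquad (4.114b)$$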


Now let me show that these results are already sufficient to understand the Stern-Gerlach experiments described in Sec. 1, but with two additional postulates. The first of them is that the interaction of a particle with the external magnetic field, besides that due to its orbital motion, may be described by the following vector operator of its spin dipole magnetic moment:20

$$\hat{\mathbf m} = \gamma\hat{\mathbf S}, \qquad (4.115a)$$

where the constant coefficient γ, specific for every particle type, is called the gyromagnetic ratio,21 and $\hat{\mathbf S}$ is the vector operator of spin, with three Cartesian components:

$$\hat{\mathbf S} = \mathbf n_x\hat S_x + \mathbf n_y\hat S_y + \mathbf n_z\hat S_z. \qquad (4.115b)$$

Here $\mathbf n_{x,y,z}$ are the usual Cartesian unit vectors in the 3D geometric space (in the quantum-mechanics sense, just c-numbers, or rather “c-vectors”), while $\hat S_{x,y,z}$ are the “usual” (scalar) operators.


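For the so-called spin-½ particles (including the electron), these components may be simply expressed, as

$$\hat S_{x,y,z} = \frac{\hbar}{2}\hat\sigma_{x,y,z}, \qquad (4.116a)$$

via those of the Pauli vector operator $\hat{\boldsymbol\sigma} \equiv \mathbf n_x\hat\sigma_x + \mathbf n_y\hat\sigma_y + \mathbf n_z\hat\sigma_z$, so that we may also write

$$\hat{\mathbf S} = \frac{\hbar}{2}\hat{\boldsymbol\sigma}. \qquad (4.116b)$$

In turn, in the so-called z-basis, each Cartesian component of the latter operator is just the corresponding Pauli matrix (105), so that it may also be convenient to use the following 3D vector of these matrices:

$$\boldsymbol\sigma \equiv \mathbf n_x\sigma_x + \mathbf n_y\sigma_y + \mathbf n_z\sigma_z = \begin{pmatrix}\mathbf n_z & \mathbf n_x - i\mathbf n_y\\ \mathbf n_x + i\mathbf n_y & -\mathbf n_z\end{pmatrix}. \qquad (4.117)$$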


The z-basis, in which such a matrix representation of $\hat{\boldsymbol\sigma}$ is valid, is defined as an orthonormal basis of certain two states, commonly denoted ↑ and ↓, in which the matrix of the operator $\hat\sigma_z$ is diagonal, with eigenvalues, respectively, +1 and −1, and hence the matrix $S_z = (\hbar/2)\sigma_z$ of $\hat S_z$ is also diagonal, with the eigenvalues +ħ/2 and −ħ/2. Note that we do not “understand” what exactly the states ↑ and ↓ are,22 but


20 This was the key point in the electron spin’s description, developed by W. Pauli in 1925-1927. 

21 For the electron, with its negative charge q = −e, the gyromagnetic ratio is negative: $\gamma_e = -g_e e/2m_e$, where $g_e \approx 2$ is the dimensionless g-factor. Due to quantum-electrodynamic (relativistic) effects, this g-factor is slightly higher than 2: $g_e = 2(1 + \alpha/2\pi + \ldots) \approx 2.002319304\ldots$, where $\alpha \equiv e^2/4\pi\varepsilon_0\hbar c = (E_H/m_e c^2)^{1/2} \approx 1/137$ is the so-called fine-structure constant. (The origin of its name will be clear from the discussion in Sec. 6.3.)

22 If you think about it, the word “understand” typically means that we can express a new, more complex notion in terms of those discussed earlier and considered “known”. In our current case, we cannot describe the spin states by some wavefunction ψ(r), or any other mathematical notion discussed in the previous three chapters. The bra-ket formalism has been invented exactly to enable mathematical analyses of such “new” quantum states we do not
loosely associate them with some internal rotation of a spin-½ particle about the z-axis, with either positive or negative angular momentum component S_z. However, attempts to use such a classical interpretation for quantitative predictions run into fundamental difficulties (see Sec. 6 below).


The second necessary postulate describes the general relation between the bra-ket formalism and experiment. Namely, in quantum mechanics, each real observable A is represented by a Hermitian operator $\hat A = \hat A^\dagger$, and the result of its measurement,23 in a quantum state α described by a linear superposition of the eigenstates $a_j$ of the operator,

$$|\alpha\rangle = \sum_j \alpha_j|a_j\rangle, \quad \text{with } \alpha_j = \langle a_j|\alpha\rangle, \qquad (4.118)$$

may be only one of the corresponding eigenvalues $A_j$.24 Specifically, if the ket (118) and all eigenkets $|a_j\rangle$ are normalized to 1,

$$\langle\alpha|\alpha\rangle = 1, \quad \langle a_j|a_j\rangle = 1, \qquad (4.119)$$

then the probability of a certain measurement outcome $A_j$ is25

$$W_j = \alpha_j^*\alpha_j = \langle\alpha|a_j\rangle\langle a_j|\alpha\rangle. \qquad (4.120)$$

This relation is evidently a generalization of Eq. (1.22) in wave mechanics. As a sanity check, let us assume that the set of the eigenstates $a_j$ is full, and calculate the sum of the probabilities to find the system in one of these states:

$$\sum_j W_j = \sum_j \langle\alpha|a_j\rangle\langle a_j|\alpha\rangle = \langle\alpha|\hat I|\alpha\rangle = 1. \qquad (4.121)$$


Now returning to the Stern-Gerlach experiment, conceptually the description of the first (z-oriented) experiment shown in Fig. 1 is the hardest for us, because the statistical ensemble describing the unpolarized electron beam at its input is mixed (“incoherent”), and cannot be described by a pure (“coherent”) superposition of the type (6) that has been the subject of our studies so far. (We will discuss such mixed ensembles in Chapter 7.) However, it is intuitively clear that its results are compatible with the description of the two output beams as sets of electrons in the pure states ↑ and ↓, respectively. The absorber following that first stage (Fig. 2) just takes all spin-down electrons out of the picture, producing an output beam of polarized electrons in the definite ↑ state. For such a beam, the probabilities (120) are W↑ = 1 and W↓ = 0. This is certainly compatible with the result of the “control” experiment shown on the bottom panel of Fig. 2: the repeated SG (z) stage does not split such a beam, keeping the probabilities the same.




initially “understand”. Gradually we get accustomed to these notions, and eventually, as we know more and more 
about their properties, start treating them as “known” ones. 

23 Here again, just like in Sec. 1.2, the statement implies the abstract notion of “ideal experiments”, deferring the 
discussion of real (physical) measurements until Chapter 10. 

24 As a reminder, at the end of Sec. 3 we have already proved that such eigenstates corresponding to different values $A_j$ are orthogonal. If any of these values is degenerate, i.e. corresponds to several different eigenstates, they should also be chosen orthogonal, in order for Eq. (118) to be valid.

25 This key relation, in particular, explains the most common term for the (generally, complex) coefficients $\alpha_j$: the probability amplitudes.
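Now let us discuss the double Stern-Gerlach experiment shown on the top panel of Fig. 2. For that, let us represent the z-polarized beam in another basis, that of the two states (I will denote them as → and ←) in which, by definition, the matrix $S_x$ is diagonal. But this is exactly the set we called $a_{1,2}$ in the σ_x matrix diagonalization problem solved above. On the other hand, the states ↑ and ↓ are exactly what we called $u_{1,2}$ in that problem, because in this basis, we know the matrix σ explicitly (see Eq. (117)). Hence, in the application to the electron spin problem, we may rewrite Eqs. (114) as

$$|\rightarrow\rangle = \frac{1}{\sqrt 2}\left(|\uparrow\rangle + |\downarrow\rangle\right), \quad |\leftarrow\rangle = \frac{1}{\sqrt 2}\left(|\uparrow\rangle - |\downarrow\rangle\right), \qquad (4.122)$$

$$|\uparrow\rangle = \frac{1}{\sqrt 2}\left(|\rightarrow\rangle + |\leftarrow\rangle\right), \quad |\downarrow\rangle = \frac{1}{\sqrt 2}\left(|\rightarrow\rangle - |\leftarrow\rangle\right). \qquad (4.123)$$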


Currently, the first of Eqs. (123) is the most important for us, because it shows that the quantum state of electrons entering the SG (x) stage may be represented as a coherent superposition of electrons with $S_x = +\hbar/2$ and $S_x = -\hbar/2$. Notice that the beams have equal probability amplitude moduli, so that according to Eq. (120), the split beams → and ← have equal intensities, in accordance with experimental results. (The minus sign before the second ket-vector is of no consequence here, but it may have an impact on outcomes of other experiments, for example, if coherently split beams are brought together again.)


Now, let us discuss the most mysterious (from the classical point of view) multi-stage SG experiment shown on the middle panel of Fig. 2. After the second absorber has taken out all electrons in, say, the ← state, the remaining electrons, all in the state →, are passed to the final, SG (z), stage. But according to the first of Eqs. (122), this state may be represented as a (coherent) linear superposition of the ↑ and ↓ states, with equal probability amplitudes. The final stage separates electrons in these two states into separate beams, with equal probabilities W↑ = W↓ = ½ to find an electron in each of them, thus explaining the experimental results.
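Eq. (120) turns the three-stage experiment into a short calculation. In the sketch below (an illustration, not part of the notes), each ket is stored as its pair of z-basis amplitudes, the x-basis kets are taken from Eq. (122), and the names `UP`, `RIGHT`, `probability`, etc. are hypothetical. It reproduces the 50/50 splitting at the SG (x) stage and at the final SG (z) stage, and hence the fraction ¼ of the ↑-beam electrons that end up in the final spin-down output.

```python
import math

# Kets as (alpha_up, alpha_down), i.e. by their amplitudes in the z-basis.
r = 1 / math.sqrt(2)
UP, DOWN = (1.0, 0.0), (0.0, 1.0)
RIGHT, LEFT = (r, r), (r, -r)          # Eq. (122)

def inner(bra_state, ket_state):
    """<bra_state|ket_state> computed from z-basis amplitudes."""
    return sum(x.conjugate() * y for x, y in zip(bra_state, ket_state))

def probability(outcome, state):
    """Eq. (120): W_j = |<a_j|alpha>|^2 for normalized kets."""
    return abs(inner(outcome, state)) ** 2

# SG(x) stage fed with the up-polarized beam left by the first absorber:
w_right, w_left = probability(RIGHT, UP), probability(LEFT, UP)
# Final SG(z) stage fed with the "->" beam left by the second absorber:
w_up_final, w_down_final = probability(UP, RIGHT), probability(DOWN, RIGHT)
# Fraction of the up-polarized electrons ending in the final spin-down beam:
fraction_down = w_right * w_down_final
```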


To conclude our discussion of the multistage Stern-Gerlach experiment, let me note that though it cannot be explained in terms of wave mechanics (which operates with scalar de Broglie waves), it has an analogy in classical theories of vector fields, such as classical electrodynamics. Indeed, let a plane electromagnetic wave propagate normally to the plane of the drawing in Fig. 5, and pass through the linear polarizer 1.


Fig. 4.5. A light polarization sequence similar to the three-stage 
Stern-Gerlach experiment shown on the middle panel of Fig. 2. 


Similarly to the output of the initial SG (z) stages (including the absorbers) shown in Fig. 2, the output wave is linearly polarized in one direction: the vertical direction in Fig. 5. Now its electric field vector has no horizontal component, as may be revealed by the wave’s full absorption in a perpendicular polarizer 3. However, let us pass the wave through polarizer 2 first. In this case, the output wave does acquire a horizontal component, as can be, again, revealed by passing it through polarizer 3. If the angles between the polarization directions 1 and 2, and between 2 and 3, are both equal to π/4, each polarizer reduces the wave amplitude by a factor of √2, and hence the intensity by a factor of 2, exactly as in the multistage SG experiment, with polarizer 2 playing the role of the SG (x) stage. The “only” difference is that the necessary angle is π/4, rather than π/2 as in the Stern-Gerlach experiment. In quantum electrodynamics (see Chapter 9 below), which confirms the classical predictions for this experiment, this difference may be attributed to that between the integer spin of electromagnetic field quanta (photons) and the half-integer spin of electrons.
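The factor-of-two difference between the angles can be made concrete. For light, the intensity passed by a polarizer rotated by θ follows Malus’s law, cos²θ; for a spin-½ with a definite component +ħ/2 along one axis, the probability of finding +ħ/2 along an axis rotated by θ is cos²(θ/2). The latter is a standard result that is assumed here rather than derived in this section, and the function names below are hypothetical. With these two rules, both chains give the same ¼ transmission.

```python
import math

def photon_pass_fraction(angle):
    """Malus's law: intensity fraction passed by a linear polarizer whose
    axis is rotated by `angle` from the incoming wave's polarization."""
    return math.cos(angle) ** 2

def spin_half_pass_fraction(angle):
    """Probability that a spin-1/2 with a definite component +hbar/2 along one
    axis is found with +hbar/2 along an axis rotated by `angle` (the standard
    half-angle result, assumed here rather than derived in this section)."""
    return math.cos(angle / 2) ** 2

def chain(pass_fraction, angles):
    """Fraction surviving a sequence of filters with successive relative angles."""
    total = 1.0
    for angle in angles:
        total *= pass_fraction(angle)
    return total

# Polarizers 1 -> 2 -> 3 (pi/4 each) versus 1 -> 3 directly (pi/2):
photons_via_2 = chain(photon_pass_fraction, [math.pi / 4, math.pi / 4])
photons_direct = chain(photon_pass_fraction, [math.pi / 2])
# SG(z) up -> SG(x) -> SG(z) down: axes rotated by pi/2 twice, versus pi directly:
spins_via_x = chain(spin_half_pass_fraction, [math.pi / 2, math.pi / 2])
spins_direct = chain(spin_half_pass_fraction, [math.pi])
```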


4.5. Observables: Expectation values and uncertainties 


After this particular (and hopefully inspiring) example, let us discuss the general relation between the Dirac formalism and experiment in more detail. The expectation value of an observable over any statistical ensemble (not necessarily coherent) may always be calculated using the general statistical rule (1.37). For the particular case of a coherent superposition (118), we can combine that rule with Eq. (120) and the second of Eqs. (118):

$$\langle A\rangle = \sum_j A_j W_j = \sum_j \alpha_j^* A_j\alpha_j = \sum_j \langle\alpha|a_j\rangle A_j\langle a_j|\alpha\rangle = \langle\alpha|\left(\sum_j |a_j\rangle A_j\langle a_j|\right)|\alpha\rangle. \qquad (4.124)$$


Now using Eq. (59) for the particular case of the eigenstate basis {a}, for which Eq. (98) is valid, we arrive at a very simple and important formula26

$$\langle A\rangle = \langle\alpha|\hat A|\alpha\rangle. \qquad (4.125)$$


This is a clear analog of the wave-mechanics formula (1.23), and, as we will see soon, may be used to derive it. A big advantage of Eq. (125) is that it does not explicitly involve the eigenvector set of the corresponding operator, and allows the calculation to be performed in any convenient basis.27
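Eqs. (124) and (125) give two routes to the same number: a weighted sum over the eigenbasis of the operator, or a single “sandwich” in any convenient basis. The sketch below evaluates ⟨S_x⟩ both ways for a spin-½ state stored by its z-basis amplitudes, using the eigenstates (122) for the first route. The choice ħ = 1, the sample state, and the helper names are assumptions made only for this illustration.

```python
import cmath
import math

HBAR = 1.0  # assumption: units in which hbar = 1

def sandwich(alpha, M):
    """<alpha|M|alpha> for a ket given by its z-basis amplitudes, Eq. (125)."""
    return sum(alpha[i].conjugate() * M[i][k] * alpha[k]
               for i in range(2) for k in range(2))

S_x = [[0, HBAR / 2], [HBAR / 2, 0]]   # Eqs. (105), (116a)
r = 1 / math.sqrt(2)
EIGEN_X = [((r, r), +HBAR / 2), ((r, -r), -HBAR / 2)]   # Eq. (122)

def expectation_via_eigenbasis(alpha):
    """Eq. (124): sum_j A_j W_j, with W_j = |<a_j|alpha>|^2 from Eq. (120)."""
    total = 0.0
    for ket, value in EIGEN_X:
        amplitude = sum(k.conjugate() * a for k, a in zip(ket, alpha))
        total += value * abs(amplitude) ** 2
    return total

alpha = (0.6, 0.8 * cmath.exp(0.3j))   # an arbitrary normalized state
via_eigenbasis = expectation_via_eigenbasis(alpha)
via_z_basis = sandwich(alpha, S_x)      # imaginary part vanishes up to rounding
```

For this state both routes give 0.48 cos 0.3, in units of ħ.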


For example, let us consider an arbitrary coherent state α of spin-½,28 and calculate the expectation values of its components. The calculations are easiest in the z-basis because we know the matrix elements of the spin operator components in that basis. Representing the ket- and bra-vectors of the given state as linear superpositions of the corresponding vectors of the basis states ↑ and ↓,

$$|\alpha\rangle = \alpha_\uparrow|\uparrow\rangle + \alpha_\downarrow|\downarrow\rangle, \quad \langle\alpha| = \alpha_\uparrow^*\langle\uparrow| + \alpha_\downarrow^*\langle\downarrow|, \qquad (4.126)$$

and plugging these expressions into Eq. (125) written for the observable $S_z$, we get


26 This equality reveals the full beauty of Dirac’s notation. Indeed, initially in this chapter the quantum-mechanical brackets just resembled the angular brackets used for statistical averaging. Now we see that in this particular (but most important) case, the angular brackets of these two types may indeed be equal to each other!


27 Note also that Eq. (120) may be rewritten in a form similar to Eq. (125): $W_j = \langle\alpha|\hat\Lambda_j|\alpha\rangle$, where $\hat\Lambda_j$ is the operator (42) of the state’s projection upon the j-th eigenstate $a_j$.


28 For clarity, the noun “spin-½” is used, here and below, to denote the spin degree of freedom of a spin-½ particle, independent of its orbital motion.
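$$\langle S_z\rangle = \left(\alpha_\uparrow^*\langle\uparrow| + \alpha_\downarrow^*\langle\downarrow|\right)\hat S_z\left(\alpha_\uparrow|\uparrow\rangle + \alpha_\downarrow|\downarrow\rangle\right)$$
$$= \alpha_\uparrow\alpha_\uparrow^*\langle\uparrow|\hat S_z|\uparrow\rangle + \alpha_\downarrow\alpha_\downarrow^*\langle\downarrow|\hat S_z|\downarrow\rangle + \alpha_\uparrow\alpha_\downarrow^*\langle\downarrow|\hat S_z|\uparrow\rangle + \alpha_\downarrow\alpha_\uparrow^*\langle\uparrow|\hat S_z|\downarrow\rangle. \qquad (4.127)$$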

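Now there are two equivalent ways (both very simple) to calculate the long brackets in this expression. The first one is to represent each of them in the matrix form in the z-basis, in which the bra- and ket-vectors of states ↑ and ↓ are the matrix-rows (1, 0) and (0, 1), or similar matrix-columns; this exercise is highly recommended to the reader. Another (perhaps more elegant) way is to use the general Eq. (59), in the z-basis, together with the spin-½-specific Eqs. (116a) and (105) to write

$$\hat S_x = \frac{\hbar}{2}\left(|\uparrow\rangle\langle\downarrow| + |\downarrow\rangle\langle\uparrow|\right), \quad \hat S_y = \frac{i\hbar}{2}\left(|\downarrow\rangle\langle\uparrow| - |\uparrow\rangle\langle\downarrow|\right), \quad \hat S_z = \frac{\hbar}{2}\left(|\uparrow\rangle\langle\uparrow| - |\downarrow\rangle\langle\downarrow|\right). \qquad (4.128)$$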


For our particular calculation, we may plug the last of these expressions into Eq. (127), and use the 
orthonormality conditions (38): 




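$$\langle\uparrow|\uparrow\rangle = \langle\downarrow|\downarrow\rangle = 1, \quad \langle\uparrow|\downarrow\rangle = \langle\downarrow|\uparrow\rangle = 0. \qquad (4.129)$$

Both approaches give (of course) the same result:

$$\langle S_z\rangle = \frac{\hbar}{2}\left(\alpha_\uparrow\alpha_\uparrow^* - \alpha_\downarrow\alpha_\downarrow^*\right). \qquad (4.130)$$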


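This particular result might also be obtained using Eq. (120) for the probabilities $W_\uparrow = \alpha_\uparrow\alpha_\uparrow^*$ and $W_\downarrow = \alpha_\downarrow\alpha_\downarrow^*$, namely:

$$\langle S_z\rangle = W_\uparrow\left(+\frac{\hbar}{2}\right) + W_\downarrow\left(-\frac{\hbar}{2}\right) = \alpha_\uparrow\alpha_\uparrow^*\left(+\frac{\hbar}{2}\right) + \alpha_\downarrow\alpha_\downarrow^*\left(-\frac{\hbar}{2}\right). \qquad (4.131)$$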


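The formal way (127), based on Eq. (125), has, however, the advantage of being applicable, without any change, to observables whose operators are not diagonal in the z-basis as well. In particular, absolutely similar calculations give

$$\langle S_x\rangle = \alpha_\uparrow\alpha_\uparrow^*\langle\uparrow|\hat S_x|\uparrow\rangle + \alpha_\downarrow\alpha_\downarrow^*\langle\downarrow|\hat S_x|\downarrow\rangle + \alpha_\uparrow\alpha_\downarrow^*\langle\downarrow|\hat S_x|\uparrow\rangle + \alpha_\downarrow\alpha_\uparrow^*\langle\uparrow|\hat S_x|\downarrow\rangle = \frac{\hbar}{2}\left(\alpha_\uparrow\alpha_\downarrow^* + \alpha_\downarrow\alpha_\uparrow^*\right), \qquad (4.132)$$

$$\langle S_y\rangle = \alpha_\uparrow\alpha_\uparrow^*\langle\uparrow|\hat S_y|\uparrow\rangle + \alpha_\downarrow\alpha_\downarrow^*\langle\downarrow|\hat S_y|\downarrow\rangle + \alpha_\uparrow\alpha_\downarrow^*\langle\downarrow|\hat S_y|\uparrow\rangle + \alpha_\downarrow\alpha_\uparrow^*\langle\uparrow|\hat S_y|\downarrow\rangle = \frac{i\hbar}{2}\left(\alpha_\uparrow\alpha_\downarrow^* - \alpha_\downarrow\alpha_\uparrow^*\right). \qquad (4.133)$$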
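The closed forms (130), (132) and (133) are easy to cross-check against the general route (127). The sketch below (plain Python, ħ = 1, hypothetical helper names) compares them for a sample state written as α↑ = cos(θ/2), α↓ = e^{iφ} sin(θ/2). This angle parameterization is a convenient assumption not used in the text; for it, the three averages should equal (ħ/2)(sin θ cos φ, sin θ sin φ, cos θ). The spin-up state reproduces Eq. (134).

```python
import cmath
import math

HBAR = 1.0  # assumption: units in which hbar = 1

SIGMA = {"x": [[0, 1], [1, 0]], "y": [[0, -1j], [1j, 0]], "z": [[1, 0], [0, -1]]}
S = {ax: [[HBAR / 2 * m for m in row] for row in s] for ax, s in SIGMA.items()}  # Eq. (116a)

def sandwich(alpha, M):
    """<alpha|M|alpha> via Eq. (125), i.e. the general route (127)."""
    return sum(alpha[i].conjugate() * M[i][k] * alpha[k]
               for i in range(2) for k in range(2))

def closed_forms(up, down):
    """Eqs. (132), (133), (130) in terms of alpha_up and alpha_down."""
    return {
        "x": HBAR / 2 * (up * down.conjugate() + down * up.conjugate()),
        "y": 1j * HBAR / 2 * (up * down.conjugate() - down * up.conjugate()),
        "z": HBAR / 2 * (up * up.conjugate() - down * down.conjugate()),
    }

# An arbitrary normalized state, parameterized by two (hypothetical) angles:
theta, phi = 1.1, 0.4
up, down = math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)
means = closed_forms(up, down)
means_general = {ax: sandwich((up, down), S[ax]) for ax in "xyz"}
means_spin_up = closed_forms(1.0, 0.0)   # Eq. (134)
```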


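Let us have a good look at a particular spin state, for example the spin-up state ↑. According to Eq. (126), in this state $\alpha_\uparrow = 1$ and $\alpha_\downarrow = 0$, so that Eqs. (130)-(133) yield:

$$\langle S_z\rangle = \frac{\hbar}{2}, \quad \langle S_x\rangle = \langle S_y\rangle = 0. \qquad (4.134)$$

Now let us use the same Eq. (125) to calculate the spin component uncertainties. According to Eqs. (105) and (116)-(117), the operator of each spin component squared is equal to $(\hbar/2)^2\hat I$, so that the general Eq. (1.33) yields

$$(\delta S_z)^2 = \langle S_z^2\rangle - \langle S_z\rangle^2 = \langle\uparrow|\hat S_z^2|\uparrow\rangle - \left(\frac{\hbar}{2}\right)^2 = \left(\frac{\hbar}{2}\right)^2\langle\uparrow|\hat I|\uparrow\rangle - \left(\frac{\hbar}{2}\right)^2 = 0, \qquad (4.135a)$$

$$(\delta S_x)^2 = \langle S_x^2\rangle - \langle S_x\rangle^2 = \langle\uparrow|\hat S_x^2|\uparrow\rangle - 0 = \left(\frac{\hbar}{2}\right)^2\langle\uparrow|\hat I|\uparrow\rangle = \left(\frac{\hbar}{2}\right)^2, \qquad (4.135b)$$

$$(\delta S_y)^2 = \langle S_y^2\rangle - \langle S_y\rangle^2 = \langle\uparrow|\hat S_y^2|\uparrow\rangle - 0 = \left(\frac{\hbar}{2}\right)^2\langle\uparrow|\hat I|\uparrow\rangle = \left(\frac{\hbar}{2}\right)^2. \qquad (4.135c)$$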


While Eqs. (134) and (135a) are compatible with the classical notion of the angular momentum of magnitude ħ/2 being directed exactly along the z-axis, this correspondence should not be overstretched, because such a classical picture cannot explain Eqs. (135b) and (135c). The best (but still imprecise!) classical image I can offer is the spin vector S oriented, on average, in the z-direction, but still having its x- and y-components strongly “wobbling” (fluctuating) about their zero average values.
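The uncertainties (135) are easy to reproduce numerically, and the same few lines confirm the statement made just below, that the x- and y-polarized states behave similarly with the axes relabeled. In the sketch (plain Python, ħ = 1, hypothetical helper names), each of the three states has exactly one spin component with zero spread, while the other two components have spread ħ/2.

```python
import math

HBAR = 1.0  # assumption: units in which hbar = 1

SIGMA = {"x": [[0, 1], [1, 0]], "y": [[0, -1j], [1j, 0]], "z": [[1, 0], [0, -1]]}
S = {ax: [[HBAR / 2 * m for m in row] for row in s] for ax, s in SIGMA.items()}

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sandwich(alpha, M):
    return sum(alpha[i].conjugate() * M[i][k] * alpha[k]
               for i in range(2) for k in range(2))

def spread(alpha, M):
    """delta A = sqrt(<A^2> - <A>^2), Eq. (1.33), with averages from Eq. (125)."""
    mean = sandwich(alpha, M).real
    mean_square = sandwich(alpha, matmul(M, M)).real
    return math.sqrt(max(mean_square - mean ** 2, 0.0))  # clip rounding noise

r = 1 / math.sqrt(2)
STATES = {
    "up": (1.0, 0.0),              # z-polarized
    "right": (r, r),               # x-polarized, Eq. (122)
    "y-polarized": (r, 1j * r),    # eigenstate of S_y with +hbar/2
}
spreads = {name: {ax: spread(a, S[ax]) for ax in "xyz"} for name, a in STATES.items()}
```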


It is straightforward to verify that in the x-polarized and y-polarized states the situation is similar, with the corresponding change of axis indices. Thus, in none of these states do all three spin components have definite values. Let me show that this is not an accidental fact, but reflects perhaps the most profound property of quantum mechanics, the uncertainty relations. For that, let us consider two measurable observables, A and B, of the same quantum system. There are two possibilities here. If the operators corresponding to these observables commute,

$$[\hat A, \hat B] = 0, \qquad (4.136)$$


then all matrix elements of the commutator in any orthogonal basis (in particular, in the basis of eigenstates $a_j$ of the operator $\hat A$) have to equal zero:

$$\langle a_j|[\hat A, \hat B]|a_{j'}\rangle \equiv \langle a_j|\hat A\hat B|a_{j'}\rangle - \langle a_j|\hat B\hat A|a_{j'}\rangle = 0. \qquad (4.137)$$

In the first bracket of the middle expression, let us act with the (Hermitian!) operator $\hat A$ on the bra-vector, while in the second one, on the ket-vector. According to Eq. (68), such action turns the operators into the corresponding eigenvalues, which may be taken out of the long brackets, so that we get

$$A_j\langle a_j|\hat B|a_{j'}\rangle - A_{j'}\langle a_j|\hat B|a_{j'}\rangle \equiv \left(A_j - A_{j'}\right)\langle a_j|\hat B|a_{j'}\rangle = 0. \qquad (4.138)$$


This means that if all eigenstates of operator $\hat A$ are non-degenerate (i.e. $A_j \ne A_{j'}$ if $j \ne j'$), the matrix of operator $\hat B$ has to be diagonal in the basis {a}, i.e., the eigenstate sets of the operators $\hat A$ and $\hat B$ coincide. Such pairs of observables (and their operators) that share their eigenstates are called compatible. For example, in the wave mechanics of a particle, its momentum (1.26) and kinetic energy (1.27) are compatible, sharing their eigenfunctions (1.29). Now we see that this is not accidental, because each Cartesian component of the kinetic energy is proportional to the square of the corresponding component of the momentum, and any operator commutes with an arbitrary integer power of itself:

$$\left[\hat A, \hat A^n\right] \equiv \hat A\,\underbrace{\hat A\cdots\hat A}_{n} - \underbrace{\hat A\cdots\hat A}_{n}\,\hat A = 0. \qquad (4.139)$$
Now, what if operators $\hat A$ and $\hat B$ do not commute? Then the following general uncertainty relation is valid:

$$\delta A\,\delta B \ge \frac{1}{2}\left|\left\langle[\hat A, \hat B]\right\rangle\right|, \qquad (4.140)$$

where all expectation values are for the same but arbitrary state of the system. The proof of Eq. (140) may be divided into two steps, the first one proving the so-called Schwarz inequality for any two possible states, say α and β:29

$$\langle\alpha|\alpha\rangle\langle\beta|\beta\rangle \ge \left|\langle\alpha|\beta\rangle\right|^2. \qquad (4.141)$$


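Its proof may be readily achieved by applying the postulate (16), that the norm of any legitimate state of the system cannot be negative, to the state with the following ket-vector:

$$|\delta\rangle = |\alpha\rangle - \frac{\langle\beta|\alpha\rangle}{\langle\beta|\beta\rangle}|\beta\rangle, \qquad (4.142)$$

where α and β are possible, non-null states of the system, so that the denominator in Eq. (142) is not equal to zero. For this case, Eq. (16) gives

$$\left(\langle\alpha| - \frac{\langle\alpha|\beta\rangle}{\langle\beta|\beta\rangle}\langle\beta|\right)\left(|\alpha\rangle - \frac{\langle\beta|\alpha\rangle}{\langle\beta|\beta\rangle}|\beta\rangle\right) \ge 0. \qquad (4.143)$$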


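Opening the parentheses, we get

$$\langle\alpha|\alpha\rangle - \frac{\langle\beta|\alpha\rangle}{\langle\beta|\beta\rangle}\langle\alpha|\beta\rangle - \frac{\langle\alpha|\beta\rangle}{\langle\beta|\beta\rangle}\langle\beta|\alpha\rangle + \frac{\langle\alpha|\beta\rangle\langle\beta|\alpha\rangle}{\langle\beta|\beta\rangle^2}\langle\beta|\beta\rangle \ge 0. \qquad (4.144)$$

After the cancellation of one inner product ⟨β|β⟩ in the numerator and the denominator of the last term, it cancels with the 2nd (or the 3rd) term. What remains, $\langle\alpha|\alpha\rangle - |\langle\alpha|\beta\rangle|^2/\langle\beta|\beta\rangle \ge 0$, multiplied by the positive number ⟨β|β⟩, is the Schwarz inequality (141).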
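Both steps of this proof can be spot-checked with random vectors. The sketch below (plain Python; the dimensions, seed, sample count and helper names are arbitrary choices made for illustration) builds the trial ket (142) for random pairs α, β, and confirms both the Schwarz inequality (141) and the closed form ⟨δ|δ⟩ = ⟨α|α⟩ − |⟨α|β⟩|²/⟨β|β⟩ that remains after the cancellations in Eq. (144). A numerical check is, of course, no substitute for the proof.

```python
import random

def inner(a, b):
    """<a|b> for kets given as lists of complex amplitudes."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def random_ket(n, rng):
    """A random (unnormalized) ket in an n-dimensional Hilbert space."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

rng = random.Random(1)  # fixed seed, so that the check is reproducible
records = []
for _ in range(1000):
    n = rng.randint(1, 6)
    alpha, beta = random_ket(n, rng), random_ket(n, rng)
    aa, bb, ab = inner(alpha, alpha).real, inner(beta, beta).real, inner(alpha, beta)
    # Trial ket |delta> of Eq. (142); its norm is the left-hand side of Eq. (143).
    coef = inner(beta, alpha) / bb
    delta = [a - coef * b for a, b in zip(alpha, beta)]
    norm_delta = inner(delta, delta).real
    # After the cancellations in Eq. (144): <delta|delta> = <a|a> - |<a|b>|^2 / <b|b>
    predicted = aa - abs(ab) ** 2 / bb
    records.append((aa * bb, abs(ab) ** 2, norm_delta, predicted))
```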


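Now let us apply this inequality to the states

$$|\alpha\rangle = \tilde{\hat A}|\gamma\rangle \quad \text{and} \quad |\beta\rangle = \tilde{\hat B}|\gamma\rangle, \qquad (4.145)$$

where, in both relations, γ is the same (but otherwise arbitrary) possible state of the system, and the deviation operators are defined similarly to the deviations of the observables (see Sec. 1.2):

$$\tilde{\hat A} \equiv \hat A - \langle A\rangle, \quad \tilde{\hat B} \equiv \hat B - \langle B\rangle. \qquad (4.146)$$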


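With this substitution, and taking into account again that the observable operators $\hat A$ and $\hat B$ are Hermitian, Eq. (141) yields

$$\langle\gamma|\tilde{\hat A}^2|\gamma\rangle\langle\gamma|\tilde{\hat B}^2|\gamma\rangle \ge \left|\langle\gamma|\tilde{\hat A}\tilde{\hat B}|\gamma\rangle\right|^2. \qquad (4.147)$$

Since the state γ is arbitrary, we may use Eq. (125) to rewrite this relation as an operator inequality:

29 This inequality is the quantum-mechanical analog of the usual vector algebra’s result $\alpha^2\beta^2 \ge |\boldsymbol\alpha\cdot\boldsymbol\beta|^2$.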
$$\delta A\,\delta B \ge \left|\langle\tilde{\hat A}\tilde{\hat B}\rangle\right|. \qquad (4.148)$$

Actually, this is already an uncertainty relation, even “better” (stronger) than its standard form (140); moreover, it is more convenient in some cases. To prove Eq. (140), we need a couple more steps. First, let us notice that the operator product participating in Eq. (148) may be recast as

$$\tilde{\hat A}\tilde{\hat B} = \frac{1}{2}\left\{\tilde{\hat A}, \tilde{\hat B}\right\} - \frac{i}{2}\hat C, \quad \text{where } \hat C \equiv i\left[\tilde{\hat A}, \tilde{\hat B}\right]. \qquad (4.149)$$


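Any anticommutator of Hermitian operators, including that in Eq. (149), is a Hermitian operator, and its eigenvalues are purely real, so that its expectation value (in any state) is also purely real. On the other hand, the commutator part of Eq. (149) is just

$$\hat C \equiv i\left[\tilde{\hat A}, \tilde{\hat B}\right] \equiv i\left(\hat A - \langle A\rangle\right)\left(\hat B - \langle B\rangle\right) - i\left(\hat B - \langle B\rangle\right)\left(\hat A - \langle A\rangle\right) \equiv i\left(\hat A\hat B - \hat B\hat A\right) \equiv i[\hat A, \hat B]. \qquad (4.150)$$

Second, according to Eqs. (52) and (65), the Hermitian conjugate of any product of the Hermitian operators $\hat A$ and $\hat B$ is just the product of these operators swapped. Using this fact, we may write

$$\hat C^\dagger = \left(i[\hat A, \hat B]\right)^\dagger = -i\left(\hat A\hat B\right)^\dagger + i\left(\hat B\hat A\right)^\dagger = -i\hat B\hat A + i\hat A\hat B \equiv i[\hat A, \hat B] = \hat C, \qquad (4.151)$$

so that the operator $\hat C$ is also Hermitian, i.e. its eigenvalues are also real, and thus its expectation value is purely real as well. As a result, the squared modulus of the expectation value of the operator product (149) may be represented as

$$\left|\langle\tilde{\hat A}\tilde{\hat B}\rangle\right|^2 = \left\langle\frac{1}{2}\left\{\tilde{\hat A}, \tilde{\hat B}\right\}\right\rangle^2 + \left\langle\frac{1}{2}\hat C\right\rangle^2. \qquad (4.152)$$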


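Since the first term on the right-hand side of this equality cannot be negative, we may write

$$\left|\langle\tilde{\hat A}\tilde{\hat B}\rangle\right|^2 \ge \left\langle\frac{1}{2}\hat C\right\rangle^2 \equiv \frac{1}{4}\left|\left\langle[\hat A, \hat B]\right\rangle\right|^2, \qquad (4.153)$$

and hence continue Eq. (148) as

$$\delta A\,\delta B \ge \left|\langle\tilde{\hat A}\tilde{\hat B}\rangle\right| \ge \frac{1}{2}\left|\left\langle[\hat A, \hat B]\right\rangle\right|, \qquad (4.154)$$

thus proving Eq. (140).

For the particular case of operators $\hat x$ and $\hat p_x$ (or a similar pair of operators for another Cartesian coordinate), we may readily combine Eq. (140) with Eq. (2.14b) to prove the original Heisenberg uncertainty relation (2.13). For the spin-½ operators defined by Eqs. (116)-(117), it is very simple (and highly recommended to the reader) to show that

$$[\hat\sigma_j, \hat\sigma_{j'}] = 2i\varepsilon_{jj'j''}\hat\sigma_{j''}, \quad \text{i.e.} \quad [\hat S_j, \hat S_{j'}] = i\varepsilon_{jj'j''}\hbar\hat S_{j''}, \qquad (4.155)$$

where $\varepsilon_{jj'j''}$ is the Levi-Civita permutation symbol. As a result, the uncertainty relations (140) for all Cartesian components of spin-½ systems are similar, for example:

$$\delta S_x\,\delta S_y \ge \frac{\hbar}{2}\left|\langle S_z\rangle\right|, \quad \text{etc.} \qquad (4.156)$$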
In particular, as we already know, in the ↑ state the right-hand side of this relation equals $(\hbar/2)^2 > 0$, so that neither of the uncertainties $\delta S_x$, $\delta S_y$ can equal zero. As a reminder, our direct calculation earlier in this section has shown that each of these uncertainties is equal to ħ/2, i.e. their product is equal to the lowest value allowed by the uncertainty relation (156), just as the Gaussian wave packets (2.16) provide the lowest possible value of the product $\delta x\,\delta p_x$ allowed by the Heisenberg relation (2.13).
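The commutator (155) and the bound (156) can be checked on random pure spin-½ states. The sketch below (plain Python, ħ = 1, a fixed random seed, hypothetical helper names) computes δS_x δS_y, the general bound of Eq. (140), and its spin-specific form (156) for each sample, and confirms that the ↑ state reaches the minimal product (ħ/2)².

```python
import cmath
import math
import random

HBAR = 1.0  # assumption: units in which hbar = 1

SIGMA = {"x": [[0, 1], [1, 0]], "y": [[0, -1j], [1j, 0]], "z": [[1, 0], [0, -1]]}
S = {ax: [[HBAR / 2 * m for m in row] for row in s] for ax, s in SIGMA.items()}

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(X, Y):
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(2)] for i in range(2)]

def sandwich(alpha, M):
    return sum(alpha[i].conjugate() * M[i][k] * alpha[k]
               for i in range(2) for k in range(2))

def spread(alpha, M):
    """delta A for a Hermitian M, via Eqs. (1.33) and (125)."""
    mean = sandwich(alpha, M).real
    return math.sqrt(max(sandwich(alpha, matmul(M, M)).real - mean ** 2, 0.0))

C_xy = commutator(S["x"], S["y"])          # should equal i*hbar*S_z, Eq. (155)

rng = random.Random(7)                      # fixed seed, reproducible samples
samples = []
for _ in range(500):
    theta, phi = rng.uniform(0, math.pi), rng.uniform(0, 2 * math.pi)
    alpha = (complex(math.cos(theta / 2)), cmath.exp(1j * phi) * math.sin(theta / 2))
    product = spread(alpha, S["x"]) * spread(alpha, S["y"])
    bound_140 = 0.5 * abs(sandwich(alpha, C_xy))           # Eq. (140)
    bound_156 = HBAR / 2 * abs(sandwich(alpha, S["z"]))    # Eq. (156)
    samples.append((product, bound_140, bound_156))
```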


4.6. Quantum dynamics: Three pictures 


So far in this chapter, I have shied away from the discussion of the system’s dynamics, implying that the bra- and ket-vectors were just their “snapshots” at a certain instant t. Now we are sufficiently prepared to examine their evolution in time. One of the most beautiful features of quantum mechanics is that this evolution may be described using any of three alternative “pictures”, giving exactly the same final results for the expectation values of all observables.


From the standpoint of our wave-mechanics experience, the Schrödinger picture is the most natural one. In this picture, the operators corresponding to time-independent observables (e.g., to the Hamiltonian function H of an isolated system) are also constant in time, while the bra- and ket-vectors evolve in time as

$$\langle\alpha(t)| = \langle\alpha(t_0)|\hat u^\dagger(t, t_0), \quad |\alpha(t)\rangle = \hat u(t, t_0)|\alpha(t_0)\rangle. \qquad (4.157a)$$

Here $\hat u(t, t_0)$ is the time-evolution operator, which obeys the following differential equation:

$$i\hbar\dot{\hat u} = \hat H\hat u, \qquad (4.157b)$$

where $\hat H$ is the Hamiltonian operator of the system, which is always Hermitian: $\hat H^\dagger = \hat H$, and $t_0$ is the initial moment of time. (Note that Eqs. (157) remain valid even if the Hamiltonian depends on time explicitly.) Differentiating the second of Eqs. (157a) over time t, then using Eq. (157b), and then Eq. (157a) once again, we can merge these two relations into a single equation, without explicit use of the time-evolution operator:

$$i\hbar\frac{\partial}{\partial t}|\alpha(t)\rangle = \hat H|\alpha(t)\rangle, \qquad (4.158)$$


which is frequently more convenient. (However, for some purposes the notion of the time-evolution operator, together with Eq. (157b), is useful, as we will see in a minute.) While Eq. (158) is a very natural generalization of the wave-mechanical equation (1.25), and is also frequently called the Schrödinger equation,30 it still should be considered as a new, more general postulate, which finds its final justification (as is usual in physics) in the agreement of its corollaries with experiment; more exactly, in the absence of a single credible contradiction to an experiment.


Starting the discussion of Eq. (158), let us first consider the case of a time-independent Hamiltonian, whose eigenstates $a_n$ and eigenvalues $E_n$ obey Eq. (68) for this operator:31

$$\hat H|a_n\rangle = E_n|a_n\rangle, \qquad (4.159)$$


30 Moreover, we will be able to derive Eq. (1.25) from Eq. (158) — see below. 
31 I have switched the state index notation from j to n, which was used for numbering stationary states in Chapter 1, to emphasize the special role played by the stationary states $a_n$ in quantum dynamics.
and hence are also time-independent. (Similarly to the wavefunctions $\psi_n$ defined by Eq. (1.60), $a_n$ are called the stationary states of the system.) Let us use Eqs. (158)-(159) to calculate the law of time evolution of the expansion coefficients $\alpha_n$ (i.e. the probability amplitudes) defined by Eq. (118), in the stationary state basis:

$$\dot\alpha_n(t) = \frac{d}{dt}\langle a_n|\alpha(t)\rangle = \langle a_n|\frac{d}{dt}|\alpha(t)\rangle = \frac{1}{i\hbar}\langle a_n|\hat H|\alpha(t)\rangle = \frac{E_n}{i\hbar}\langle a_n|\alpha(t)\rangle = -\frac{i}{\hbar}E_n\alpha_n. \qquad (4.160)$$

This is the same simple equation as Eq. (1.61), and its integration yields a similar result (cf. Eq. (1.62)), just with the initial time $t_0$ rather than 0:

$$\alpha_n(t) = \alpha_n(t_0)\exp\left\{-\frac{i}{\hbar}E_n(t - t_0)\right\}. \qquad (4.161)$$


In order to illustrate how this result works, let us consider the dynamics of a spin-½ in a time-independent, uniform external magnetic field ℬ. To construct the system’s Hamiltonian, we may apply the correspondence principle to the classical expression for the energy of a magnetic moment m in the external magnetic field ℬ,32

$$U = -\mathbf m\cdot\mathscr B. \qquad (4.162)$$


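In quantum mechanics, the operator corresponding to the moment m is given by Eq. (115) (suggested by W. Pauli), so that the spin-field interaction is described by the so-called Pauli Hamiltonian, which may be, due to Eqs. (116)-(117), represented in several equivalent forms:

$$\hat H = -\hat{\mathbf m}\cdot\mathscr B = -\gamma\hat{\mathbf S}\cdot\mathscr B = -\gamma\frac{\hbar}{2}\hat{\boldsymbol\sigma}\cdot\mathscr B. \qquad (4.163a)$$

If the z-axis is aligned with the field’s direction, this expression is reduced to

$$\hat H = -\gamma\mathscr B\hat S_z = -\gamma\mathscr B\frac{\hbar}{2}\hat\sigma_z. \qquad (4.163b)$$

According to Eq. (117), in the z-basis of the spin states ↑ and ↓, the matrix of the operator (163b) is

$$H = -\frac{\gamma\hbar\mathscr B}{2}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix} \equiv \frac{\hbar\Omega}{2}\sigma_z, \quad \text{where } \Omega \equiv -\gamma\mathscr B. \qquad (4.164)$$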


The constant Ω so defined coincides with the classical frequency of the precession, about the z-axis, of an axially-symmetric rigid body (the so-called symmetric top), with an angular momentum S and the magnetic moment m = γS, induced by the external torque τ = m × ℬ.33 (For an electron, with its negative gyromagnetic ratio $\gamma_e = -g_e e/2m_e$, neglecting the tiny difference of the $g_e$-factor from 2, we get

$$\Omega = \frac{e}{m_e}\mathscr B, \qquad (4.165)$$

so that according to Eq. (3.48), the frequency Ω coincides with the electron’s cyclotron frequency $\omega_c$.)
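To get a feel for the magnitudes involved, here is a back-of-the-envelope evaluation of Eq. (165) for ℬ = 1 T. The sketch assumes g_e = 2 exactly, as in the text, and uses CODATA 2018 values of e, m_e and ħ; the function name is hypothetical. With these inputs, Ω ≈ 1.76×10¹¹ s⁻¹, i.e. Ω/2π ≈ 28 GHz, and the level splitting ΔE = ħ|Ω| of Eq. (168) is about 1.16×10⁻⁴ eV.

```python
import math

# SI constants (CODATA 2018 values; e is exact in the 2019 SI)
E_CHARGE = 1.602176634e-19   # C
M_E = 9.1093837015e-31       # kg
HBAR_SI = 1.054571817e-34    # J s

def electron_spin_precession(b_tesla):
    """Omega from Eq. (165) (g_e taken as exactly 2), the corresponding
    ordinary frequency Omega / 2 pi, and the level splitting of Eq. (168) in eV."""
    omega = E_CHARGE / M_E * b_tesla             # rad/s
    frequency = omega / (2 * math.pi)            # Hz
    splitting_ev = HBAR_SI * omega / E_CHARGE    # Delta E = hbar |Omega|, in eV
    return omega, frequency, splitting_ev

omega_1T, f_1T, gap_1T = electron_spin_precession(1.0)
```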


32 See, e.g., EM Eq. (5.100). As a reminder, we have already used this expression for the derivation of Eq. (3). 
33 See, e.g., CM Sec. 4.5, in particular Eq. (4.72), and EM Sec. 5.5, in particular Eq. (5.114) and its discussion. 
In order to apply the general Eq. (161) to this case, we need to find the eigenstates $a_n$ and eigenenergies $E_n$ of our Hamiltonian. However, with our (smart :-) choice of the z-axis, the Hamiltonian matrix is already diagonal:

$$H = \frac{\hbar\Omega}{2}\sigma_z \equiv \frac{\hbar\Omega}{2}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}, \qquad (4.166)$$

meaning that the states ↑ and ↓ are the eigenstates of this system, with the eigenenergies, respectively,

$$E_\uparrow = +\frac{\hbar\Omega}{2} \quad \text{and} \quad E_\downarrow = -\frac{\hbar\Omega}{2}. \qquad (4.167)$$

Note that their difference,

$$\Delta E \equiv \left|E_\uparrow - E_\downarrow\right| = \hbar|\Omega| = \hbar\left|\gamma\mathscr B\right|, \qquad (4.168)$$

corresponds to the classical energy 2|mℬ| of flipping a magnetic dipole with the moment’s magnitude m = γħ/2, oriented along the direction of the field ℬ. Note also that if the product γℬ is positive, then Ω is negative, so that E↑ is negative, while E↓ is positive. This is in agreement with the classical picture of a magnetic dipole m having negative potential energy when it is aligned with the external magnetic field ℬ (see Eq. (162) again).


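So, for the time evolution of the probability amplitudes of these states, Eq. (161) immediately yields the following expressions:

$$\alpha_\uparrow(t) = \alpha_\uparrow(0)\exp\left\{-\frac{i\Omega}{2}t\right\}, \quad \alpha_\downarrow(t) = \alpha_\downarrow(0)\exp\left\{+\frac{i\Omega}{2}t\right\}, \qquad (4.169)$$

allowing a ready calculation of the time evolution of the expectation values of any observable. In particular, we can calculate the expectation value of $S_z$ as a function of time by applying Eq. (130) to an arbitrary time moment t:

$$\langle S_z\rangle(t) = \frac{\hbar}{2}\left[\alpha_\uparrow(t)\alpha_\uparrow^*(t) - \alpha_\downarrow(t)\alpha_\downarrow^*(t)\right] = \frac{\hbar}{2}\left[\alpha_\uparrow(0)\alpha_\uparrow^*(0) - \alpha_\downarrow(0)\alpha_\downarrow^*(0)\right] = \langle S_z\rangle(0). \qquad (4.170)$$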
Thus the expectation value of the spin component parallel to the applied magnetic field remains constant in time, regardless of the initial state of the system. However, this is not true for the components perpendicular to the field. For example, Eq. (132), applied to the moment t, gives

$$\langle S_x\rangle(t) = \frac{\hbar}{2}\left[\alpha_\uparrow(t)\alpha_\downarrow^*(t) + \alpha_\downarrow(t)\alpha_\uparrow^*(t)\right] = \frac{\hbar}{2}\left[\alpha_\uparrow(0)\alpha_\downarrow^*(0)e^{-i\Omega t} + \alpha_\downarrow(0)\alpha_\uparrow^*(0)e^{i\Omega t}\right]. \qquad (4.171)$$

Clearly, this expression describes sinusoidal oscillations with frequency (164). The amplitude and the phase of these oscillations depend on initial conditions. Indeed, solving Eqs. (132)-(133) for the probability amplitude products, we get the following relations:

$$\hbar\alpha_\downarrow(t)\alpha_\uparrow^*(t) = \langle S_x\rangle(t) + i\langle S_y\rangle(t), \quad \hbar\alpha_\uparrow(t)\alpha_\downarrow^*(t) = \langle S_x\rangle(t) - i\langle S_y\rangle(t), \qquad (4.172)$$

valid for any time t. Plugging their values for t = 0 into Eq. (171), we get
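$$\langle S_x\rangle(t) = \frac{1}{2}\left[\langle S_x\rangle(0) + i\langle S_y\rangle(0)\right]e^{i\Omega t} + \frac{1}{2}\left[\langle S_x\rangle(0) - i\langle S_y\rangle(0)\right]e^{-i\Omega t} \equiv \langle S_x\rangle(0)\cos\Omega t - \langle S_y\rangle(0)\sin\Omega t. \qquad (4.173)$$

An absolutely similar calculation using Eq. (133) gives

$$\langle S_y\rangle(t) = \langle S_y\rangle(0)\cos\Omega t + \langle S_x\rangle(0)\sin\Omega t. \qquad (4.174)$$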
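Eqs. (169)-(174) can be compared directly: evolve the amplitudes with Eq. (169), recompute the averages with Eqs. (130), (132) and (133), and compare the result with the rotation formulas (173)-(174). In the sketch below (plain Python), ħ = 1, Ω = 2 and the initial state are arbitrary assumptions, and the helper names are hypothetical. It also checks that a spin starting in the → state has turned to ⟨S_y⟩ = ħ/2 after a quarter of the precession period.

```python
import cmath
import math

HBAR = 1.0    # assumption: units in which hbar = 1
OMEGA = 2.0   # hypothetical value of Omega = -gamma*B, Eq. (164)

def amplitudes(t, up0, down0):
    """Eq. (169): amplitudes of the stationary states at time t (t_0 = 0)."""
    return up0 * cmath.exp(-0.5j * OMEGA * t), down0 * cmath.exp(0.5j * OMEGA * t)

def spin_means(up, down):
    """<S_x>, <S_y>, <S_z> from Eqs. (132), (133), (130)."""
    sx = (HBAR / 2 * (up * down.conjugate() + down * up.conjugate())).real
    sy = (1j * HBAR / 2 * (up * down.conjugate() - down * up.conjugate())).real
    sz = HBAR / 2 * (abs(up) ** 2 - abs(down) ** 2)
    return sx, sy, sz

def precession(t, sx0, sy0):
    """Closed-form result of Eqs. (173)-(174)."""
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    return sx0 * c - sy0 * s, sy0 * c + sx0 * s

up0, down0 = 0.6, 0.8 * cmath.exp(0.9j)   # arbitrary normalized initial state
sx0, sy0, sz0 = spin_means(up0, down0)
trajectory = []
for k in range(50):
    t = 0.1 * k
    trajectory.append((t, spin_means(*amplitudes(t, up0, down0)), precession(t, sx0, sy0)))
```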


These formulas show, for example, that if at moment t = 0 the spin’s state was ↑, i.e. $\langle S_x\rangle(0) = \langle S_y\rangle(0) = 0$, then the oscillation amplitudes of both “lateral” components of the spin vanish. On the other hand, if the spin was initially in the state →, i.e. had the definite, largest possible value of $S_x$, equal to ħ/2 (in classics, we would say “the spin-½ was oriented in the x-direction”), then both expectation values $\langle S_x\rangle$ and $\langle S_y\rangle$ oscillate in time34 with this amplitude, and with the phase shift π/2 between them.


So, the quantum-mechanical results for the expectation values of the Cartesian components of spin-½ are indistinguishable from the classical results for the precession, with the frequency Ω = −γℬ,35 of a symmetric top with the angular momentum of magnitude S = ħ/2, about the field’s direction (our axis z), under the effect of an external torque τ = m × ℬ exerted by the field ℬ on the magnetic moment m = γS. Note, however, that the classical language does not describe the large quantum-mechanical uncertainties of the components, obeying Eqs. (156), which are absent in the classical picture, at least when it starts from a definite orientation of the angular momentum vector. Also, as we have seen in Sec. 3.5, the component $L_z$ of the angular momentum of the orbital motion of particles is always a multiple of ħ (see, e.g., Eq. (3.139)). As a result, the angular momentum of a spin-½ particle, with $S_z = \pm\hbar/2$, cannot be explained by any summation of orbital angular momenta of its hypothetical components, i.e. by any internal rotation of the particle about its axis.


After this illustration, let us return to the discussion of the general Schrödinger equation (157b) and prove the following fascinating fact: it is possible to write the general solution of this operator equation. In the easiest case when the Hamiltonian is time-independent, this solution is an exact analog of Eq. (161),

$$\hat u(t, t_0) = \exp\left\{-\frac{i}{\hbar}\hat H(t - t_0)\right\}. \tag{4.175}$$


To start its proof we should, first of all, understand what a function (in this particular case, the exponential) of an operator means. In operator (and matrix) algebra, such nonlinear functions are defined by their Taylor expansions; in particular, Eq. (175) means that


34 This is one more (hopefully, redundant :-) illustration of the difference between the averaging over the 
statistical ensemble and that over time: in Eqs. (170), (173)-(174), and also in quite a few relations below, only 
the former averaging has been performed, so the results are still functions of time. 

35 Note that according to this relation, the gyromagnetic ratio $\gamma$ may be interpreted just as the angular frequency of the spin precession per unit magnetic field, hence the name. In particular, for electrons, $|\gamma_e| \approx 1.761\times10^{11}\ \mathrm{s^{-1}T^{-1}}$; for protons, the ratio is much smaller, $\gamma_p = g_p e/2m_p \approx 2.675\times10^{8}\ \mathrm{s^{-1}T^{-1}}$, mostly because of their larger mass $m_p$, at a g-factor of the same order as for the electron: $g_p \approx 5.586$. For heavier spin-½ particles, e.g., atomic nuclei with such spin, the values of $\gamma$ are correspondingly smaller, e.g., $\gamma \approx 8.681\times10^{6}\ \mathrm{s^{-1}T^{-1}}$ for the $^{57}$Fe nucleus.
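$$\hat u(t, t_0) = \hat I + \frac{1}{1!}\left(-\frac{i}{\hbar}\right)\hat H(t - t_0) + \frac{1}{2!}\left(-\frac{i}{\hbar}\right)^2\hat H^2(t - t_0)^2 + \frac{1}{3!}\left(-\frac{i}{\hbar}\right)^3\hat H^3(t - t_0)^3 + \ldots \equiv \sum_{k=0}^{\infty}\frac{1}{k!}\left(-\frac{i}{\hbar}\right)^k\hat H^k(t - t_0)^k, \tag{4.176}$$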


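where $\hat H^2 \equiv \hat H\hat H$, $\hat H^3 \equiv \hat H\hat H\hat H$, etc. Working with such series of operator products is not as hard as one could imagine, due to their regular structure. For example, let us differentiate both sides of Eq. (176) over $t$, at constant $t_0$, at the last stage using this equality again, backward:

$$\begin{aligned}\frac{\partial}{\partial t}\hat u(t, t_0) &= \hat 0 + \frac{1}{1!}\left(-\frac{i}{\hbar}\right)\hat H + \frac{2}{2!}\left(-\frac{i}{\hbar}\right)^2\hat H^2(t - t_0) + \frac{3}{3!}\left(-\frac{i}{\hbar}\right)^3\hat H^3(t - t_0)^2 + \ldots\\ &= \left(-\frac{i}{\hbar}\right)\hat H\left[\hat I + \frac{1}{1!}\left(-\frac{i}{\hbar}\right)\hat H(t - t_0) + \frac{1}{2!}\left(-\frac{i}{\hbar}\right)^2\hat H^2(t - t_0)^2 + \ldots\right] = -\frac{i}{\hbar}\hat H\hat u(t, t_0),\end{aligned} \tag{4.177}$$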


so that the differential equation (158) is indeed satisfied. On the other hand, Eq. (175) also satisfies the initial condition

$$\hat u(t_0, t_0) = \hat u^\dagger(t_0, t_0) = \hat I \tag{4.178}$$

that immediately follows from the definition (157a) of the evolution operator. Thus, Eq. (175) indeed gives the (unique) solution for the time evolution operator in the Schrödinger picture.
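A minimal numerical sketch of this argument, assuming $\hbar = 1$ and a hypothetical Hamiltonian $\hat H = (\hbar\Omega/2)\sigma_x$ chosen so that its matrix is not diagonal: it sums the Taylor series (176) term by term, compares the result with the closed form $\cos(\Omega t/2)\,\mathrm{I} - i\sin(\Omega t/2)\,\sigma_x$ (which follows from $\sigma_x^2 = \mathrm{I}$), and checks Eq. (177) with a central finite difference, together with the unitarity of the result. The 2×2 matrices are plain nested lists, to keep the example dependency-free.

```python
import math

HBAR = 1.0  # illustrative units with hbar = 1
IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def u_taylor(h, dt, n_terms=40):
    """Partial sum of Eq. (176): sum over k of (1/k!) (-i dt/hbar)^k H^k."""
    u = [row[:] for row in IDENTITY]     # the k = 0 term
    term = [row[:] for row in IDENTITY]
    for k in range(1, n_terms):
        # term_k = term_(k-1) * H * (-i dt/hbar) / k
        term = [[x * (-1j * dt / HBAR) / k for x in row] for row in matmul(term, h)]
        u = [[u[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return u


def schroedinger_residual(h, dt, eps=1e-5):
    """Max |i hbar du/dt - H u|, with du/dt from central differences (Eq. (177))."""
    up, um = u_taylor(h, dt + eps), u_taylor(h, dt - eps)
    lhs = [[1j * HBAR * (up[i][j] - um[i][j]) / (2 * eps) for j in range(2)]
           for i in range(2)]
    return max_abs_diff(lhs, matmul(h, u_taylor(h, dt)))


# Hypothetical non-diagonal example: H = (hbar*Omega/2) sigma_x.
omega, dt = 1.3, 2.0
h_x = [[0j, HBAR * omega / 2 + 0j], [HBAR * omega / 2 + 0j, 0j]]
u = u_taylor(h_x, dt)
c, s = math.cos(omega * dt / 2), math.sin(omega * dt / 2)
u_closed = [[c + 0j, -1j * s], [-1j * s, c + 0j]]  # cos(.) I - i sin(.) sigma_x

print(max_abs_diff(u, u_closed))                     # series vs closed form
print(max_abs_diff(matmul(u, dagger(u)), IDENTITY))  # unitarity
print(schroedinger_residual(h_x, dt))                # Eq. (177)
```

Forty terms are far more than needed here, since the series converges factorially; the choice simply keeps truncation error below round-off for moderate values of $\Omega(t - t_0)$.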


Now let us allow the operator $\hat H$ to be a function of time, but with the condition that its “values” (in fact, operators) at different instants commute with each other:

$$\left[\hat H(t'), \hat H(t'')\right] = 0, \quad \text{for any } t', t''. \tag{4.179}$$


(An important non-trivial example of such a Hamiltonian is the time-dependent part of the Hamiltonian of a particle, due to the effect of a classical, time-dependent, but position-independent force $\mathbf{F}(t)$,

$$\hat H_F = -\mathbf{F}(t)\cdot\hat{\mathbf{r}}. \tag{4.180}$$

Indeed, the radius vector’s operator $\hat{\mathbf{r}}$ does not depend explicitly on time and hence commutes with itself, as well as with the c-numbers $\mathbf{F}(t')$ and $\mathbf{F}(t'')$.) In this case, it is sufficient to replace, in all the above formulas, the product $\hat H(t - t_0)$ with the corresponding integral over time; in particular, Eq. (175) is generalized as


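$$\hat u(t, t_0) = \exp\left\{-\frac{i}{\hbar}\int_{t_0}^{t}\hat H(t')\,dt'\right\}. \tag{4.181}$$

This replacement means that the first form of Eq. (176) should be replaced with

$$\hat u(t, t_0) = \hat I + \sum_{k=1}^{\infty}\frac{1}{k!}\left(-\frac{i}{\hbar}\right)^k\left[\int_{t_0}^{t}\hat H(t')\,dt'\right]^k \equiv \hat I + \sum_{k=1}^{\infty}\frac{1}{k!}\left(-\frac{i}{\hbar}\right)^k\int_{t_0}^{t}dt_1\int_{t_0}^{t}dt_2\ldots\int_{t_0}^{t}dt_k\,\hat H(t_1)\hat H(t_2)\ldots\hat H(t_k). \tag{4.182}$$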


The proof that Eq. (182) satisfies Eq. (158) is absolutely similar to the one carried out above. 


We may now use Eq. (181) to show that the time-evolution operator remains unitary at any 
moment, even for a time-dependent Hamiltonian, if it satisfies Eq. (179). Indeed, Eq. (181) yields 


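$$\hat u(t, t_0)\hat u^\dagger(t, t_0) = \exp\left\{-\frac{i}{\hbar}\int_{t_0}^{t}\hat H(t')\,dt'\right\}\exp\left\{+\frac{i}{\hbar}\int_{t_0}^{t}\hat H(t'')\,dt''\right\}. \tag{4.183}$$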


Since each of these exponents may be represented with the Taylor series (182), and, thanks to Eq. (179), 
different components of these sums may be swapped at will, the expression (183) may be manipulated 
exactly as the product of c-number exponents, for example rewritten as 


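$$\hat u(t, t_0)\hat u^\dagger(t, t_0) = \exp\left\{-\frac{i}{\hbar}\int_{t_0}^{t}\hat H(t')\,dt' + \frac{i}{\hbar}\int_{t_0}^{t}\hat H(t'')\,dt''\right\} = \exp\{0\} = \hat I. \tag{4.184}$$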


This property ensures, in particular, that the system state’s normalization does not depend on time: 


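$$\langle\alpha(t)|\alpha(t)\rangle = \langle\alpha(t_0)|\hat u^\dagger(t, t_0)\hat u(t, t_0)|\alpha(t_0)\rangle = \langle\alpha(t_0)|\alpha(t_0)\rangle. \tag{4.185}$$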


The most difficult cases for the explicit solution of Eq. (158) are those where Eq. (179) is violated.$^{36}$ It may be proven that in these cases the integral limits in the last form of Eq. (182) should be truncated (with the factor $1/k!$ dropped), giving the so-called Dyson series

$$\hat u(t, t_0) = \hat I + \sum_{k=1}^{\infty}\left(-\frac{i}{\hbar}\right)^k\int_{t_0}^{t}dt_1\int_{t_0}^{t_1}dt_2\ldots\int_{t_0}^{t_{k-1}}dt_k\,\hat H(t_1)\hat H(t_2)\ldots\hat H(t_k). \tag{4.186}$$

Since we will not have time/space to use this relation in our course, I will skip its proof.$^{37}$
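To see numerically why condition (179) matters, the sketch below (with $\hbar = 1$ and hypothetical Hamiltonians of the form $\hat H(t) = (\hbar/2)[a_x(t)\sigma_x + a_z(t)\sigma_z]$) compares the naive exponential (181) with a time-ordered product of short-time exponentials. That product is a standard numerical stand-in for the Dyson series (186), not the series itself. For $\hat H(t) = (\hbar/2)t\,\sigma_z$, whose values at different instants commute, the two coincide; for $\hat H(t) = (\hbar/2)(\sigma_x + t\,\sigma_z)$, which violates Eq. (179), they visibly differ.

```python
import math

IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def exp_pauli(ax, az, tau):
    """exp{-i tau (ax sigma_x + az sigma_z)/2}, using (n . sigma)^2 = I."""
    r = math.hypot(ax, az)
    if r == 0.0:
        return [row[:] for row in IDENTITY]
    c, s = math.cos(r * tau / 2), math.sin(r * tau / 2)
    nx, nz = ax / r, az / r
    return [[c - 1j * s * nz, -1j * s * nx],
            [-1j * s * nx, c + 1j * s * nz]]


def time_ordered(ax_of_t, az_of_t, t_final, n_steps=4000):
    """Product of short-time exponentials, with later times on the left.

    For H(t) = (hbar/2)[ax(t) sigma_x + az(t) sigma_z], this midpoint product
    converges to the time-ordered evolution operator as n_steps grows.
    """
    dt = t_final / n_steps
    u = [row[:] for row in IDENTITY]
    for n in range(n_steps):
        t_mid = (n + 0.5) * dt
        u = matmul(exp_pauli(ax_of_t(t_mid), az_of_t(t_mid), dt), u)
    return u


T = 3.0
# Commuting case, H(t) = (hbar/2) t sigma_z: naive exponent of the integral, Eq. (181).
commuting_gap = max_abs_diff(time_ordered(lambda t: 0.0, lambda t: t, T),
                             exp_pauli(0.0, T ** 2 / 2, 1.0))
# Non-commuting case, H(t) = (hbar/2)(sigma_x + t sigma_z): Eq. (181) now fails.
noncommuting_gap = max_abs_diff(time_ordered(lambda t: 1.0, lambda t: t, T),
                                exp_pauli(T, T ** 2 / 2, 1.0))
print(commuting_gap, noncommuting_gap)
```

The first gap should be at round-off level, while the second should be of order unity, signaling that the naive exponential misses the commutator terms that the Dyson series keeps.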


Let me now return to the general discussion of quantum dynamics to outline its alternative, the Heisenberg picture. For its introduction, let us recall that according to Eq. (125), in quantum mechanics the expectation value of any observable $A$ is a long bracket. Let us explore an even more general form of such a bracket:

$$\langle\alpha|\hat A|\beta\rangle. \tag{4.187}$$

(In some applications, the states $\alpha$ and $\beta$ may be different.) As was discussed above, in the Schrödinger picture the bra- and ket-vectors of the states evolve in time, while the operators of observables remain time-independent (if they do not explicitly depend on time), so that Eq. (187), applied to a moment $t$, may be represented as

$$\langle\alpha(t)|\hat A_S|\beta(t)\rangle, \tag{4.188}$$

where the index “S” is added to emphasize the Schrödinger picture. Let us apply the evolution law (157a) to the bra- and ket-vectors in this expression:


36 We will run into such situations in Chapter 7, but will not need to apply Eq. (186) there. 
37 It may be found, for example, in Chapter 5 of J. Sakurai’s textbook — see References. 
$$\langle\alpha(t)|\hat A_S|\beta(t)\rangle = \langle\alpha(t_0)|\hat u^\dagger(t, t_0)\hat A_S\hat u(t, t_0)|\beta(t_0)\rangle. \tag{4.189}$$

This equality means that if we form a long bracket with bra- and ket-vectors of the initial-time states, together with the following time-dependent Heisenberg operator$^{38}$

$$\hat A_H(t) \equiv \hat u^\dagger(t, t_0)\hat A_S\hat u(t, t_0) \equiv \hat u^\dagger(t, t_0)\hat A_H(t_0)\hat u(t, t_0), \tag{4.190}$$

all experimentally measurable results will remain the same as in the Schrödinger picture:

$$\langle\alpha(t)|\hat A_S|\beta(t)\rangle = \langle\alpha(t_0)|\hat A_H(t)|\beta(t_0)\rangle. \tag{4.191}$$


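For full clarity, let us see how the Heisenberg picture works for the same simple (but very important!) problem of the spin-½ precession in a z-oriented magnetic field, described (in the z-basis) by the Hamiltonian matrix (164). In that basis, Eq. (157b) for the time-evolution operator becomes

$$i\hbar\frac{\partial}{\partial t}\begin{pmatrix}u_{11} & u_{12}\\ u_{21} & u_{22}\end{pmatrix} = \frac{\hbar\Omega}{2}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix}u_{11} & u_{12}\\ u_{21} & u_{22}\end{pmatrix} \equiv \frac{\hbar\Omega}{2}\begin{pmatrix}u_{11} & u_{12}\\ -u_{21} & -u_{22}\end{pmatrix}. \tag{4.192}$$

We see that in this simple case the differential equations for different matrix elements of the evolution operator matrix are decoupled, and readily solvable, using the universal initial conditions (178):$^{39}$

$$\mathrm{u}(t, 0) = \begin{pmatrix}e^{-i\Omega t/2} & 0\\ 0 & e^{i\Omega t/2}\end{pmatrix}, \qquad \mathrm{u}^\dagger(t, 0) = \begin{pmatrix}e^{i\Omega t/2} & 0\\ 0 & e^{-i\Omega t/2}\end{pmatrix}. \tag{4.193}$$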


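Now let us use them in Eq. (190) to calculate the Heisenberg-picture operators of spin components, still in the z-basis. Dropping the index “H” for notation brevity (the Heisenberg-picture operators are clearly marked by their dependence on time anyway), we get

$$\begin{aligned}\mathrm{S}_x(t) &= \mathrm{u}^\dagger(t, 0)\,\mathrm{S}_x(0)\,\mathrm{u}(t, 0) = \frac{\hbar}{2}\mathrm{u}^\dagger(t, 0)\,\sigma_x\,\mathrm{u}(t, 0) = \frac{\hbar}{2}\begin{pmatrix}e^{i\Omega t/2} & 0\\ 0 & e^{-i\Omega t/2}\end{pmatrix}\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}e^{-i\Omega t/2} & 0\\ 0 & e^{i\Omega t/2}\end{pmatrix}\\ &= \frac{\hbar}{2}\begin{pmatrix}0 & e^{i\Omega t}\\ e^{-i\Omega t} & 0\end{pmatrix} = \frac{\hbar}{2}\left(\sigma_x\cos\Omega t - \sigma_y\sin\Omega t\right) = \mathrm{S}_x(0)\cos\Omega t - \mathrm{S}_y(0)\sin\Omega t.\end{aligned} \tag{4.194}$$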


Absolutely similar calculations of the other spin components yield 


38 Note that this strict relation is similar in structure to the first of the symbolic Eqs. (94), with the state bases {v} and {u} loosely associated with the time moments, respectively, $t$ and $t_0$.

39 We could of course use this solution, together with Eq. (157), to obtain all the above results for this system within the Schrödinger picture. In our simple case, the use of Eqs. (161) for this purpose was more straightforward, but in some cases, e.g., for some time-dependent Hamiltonians, an explicit calculation of the time-evolution matrix may be the best (or even the only practicable) way to proceed.
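$$\mathrm{S}_y(t) = \frac{\hbar}{2}\begin{pmatrix}0 & -ie^{i\Omega t}\\ ie^{-i\Omega t} & 0\end{pmatrix} = \frac{\hbar}{2}\left(\sigma_y\cos\Omega t + \sigma_x\sin\Omega t\right) = \mathrm{S}_y(0)\cos\Omega t + \mathrm{S}_x(0)\sin\Omega t, \tag{4.195}$$

$$\mathrm{S}_z(t) = \frac{\hbar}{2}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix} = \frac{\hbar}{2}\sigma_z = \mathrm{S}_z(0). \tag{4.196}$$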


One practical advantage of these formulas is that they describe the system’s evolution for arbitrary initial conditions, thus making the analysis of initial-state effects very simple. Indeed, since in the Heisenberg picture the expectation values of observables are calculated using Eq. (191) (with $\beta = \alpha$), with time-independent bra- and ket-vectors, such averaging of Eqs. (194)-(196) immediately returns us to Eqs. (170), (173), and (174), which were obtained above in the Schrödinger picture. Moreover, these equations for the Heisenberg operators formally coincide with the classical equations of the torque-induced precession for c-number variables. (Below we will see that the same exact correspondence is valid for the Heisenberg picture of the orbital motion.)
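The conjugations (194)-(196) are easy to confirm numerically. The sketch below, with $\hbar = 1$ and an arbitrary $\Omega$ (illustrative assumptions), forms $\mathrm{u}^\dagger(t,0)\,\mathrm{S}_j(0)\,\mathrm{u}(t,0)$ with the matrix (193) and compares each result with the right-hand side of Eqs. (194)-(196).

```python
import cmath
import math

HBAR = 1.0  # illustrative units with hbar = 1
SIGMA_X = [[0j, 1 + 0j], [1 + 0j, 0j]]
SIGMA_Y = [[0j, -1j], [1j, 0j]]
SIGMA_Z = [[1 + 0j, 0j], [0j, -1 + 0j]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def lin(c1, m1, c2, m2):
    """Linear combination c1*m1 + c2*m2 of two 2x2 matrices."""
    return [[c1 * m1[i][j] + c2 * m2[i][j] for j in range(2)] for i in range(2)]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def heisenberg(op, omega, t):
    """Eq. (190) with t0 = 0 and the evolution matrix (193)."""
    u = [[cmath.exp(-1j * omega * t / 2), 0j],
         [0j, cmath.exp(1j * omega * t / 2)]]
    return matmul(dagger(u), matmul(op, u))


# Schroedinger-picture spin matrices S_j(0) = (hbar/2) sigma_j.
S_X0, S_Y0, S_Z0 = ([[HBAR / 2 * x for x in row] for row in m]
                    for m in (SIGMA_X, SIGMA_Y, SIGMA_Z))


def max_error(omega, times):
    """Largest deviation of the conjugated matrices from Eqs. (194)-(196)."""
    worst = 0.0
    for t in times:
        c, s = math.cos(omega * t), math.sin(omega * t)
        worst = max(worst,
                    max_abs_diff(heisenberg(S_X0, omega, t), lin(c, S_X0, -s, S_Y0)),
                    max_abs_diff(heisenberg(S_Y0, omega, t), lin(c, S_Y0, s, S_X0)),
                    max_abs_diff(heisenberg(S_Z0, omega, t), S_Z0))
    return worst


print(max_error(omega=2.0, times=[0.0, 0.5, 1.9, 3.3]))
```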


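In order to see that the last fact is by no means a coincidence, let us combine Eqs. (157b) and (190) to form an explicit differential equation of the Heisenberg operator’s evolution. For that, let us differentiate Eq. (190) over time:

$$\frac{d}{dt}\hat A_H = \frac{\partial\hat u^\dagger}{\partial t}\hat A_S\hat u + \hat u^\dagger\frac{\partial\hat A_S}{\partial t}\hat u + \hat u^\dagger\hat A_S\frac{\partial\hat u}{\partial t}. \tag{4.197}$$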


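Plugging in the derivatives of the time evolution operator from Eq. (157b) and its Hermitian conjugate, and multiplying both sides of the equation by $i\hbar$, we get

$$i\hbar\frac{d}{dt}\hat A_H = -\hat u^\dagger\hat H\hat A_S\hat u + i\hbar\,\hat u^\dagger\frac{\partial\hat A_S}{\partial t}\hat u + \hat u^\dagger\hat A_S\hat H\hat u. \tag{4.198a}$$

If the Schrödinger-picture Hamiltonian satisfies a condition similar to Eq. (179), then, according to Eqs. (177) or (182), the Hamiltonian commutes with the time evolution operator and its Hermitian conjugate, and may be swapped with any of them.$^{40}$ Hence, we may rewrite Eq. (198a) as

$$i\hbar\frac{d}{dt}\hat A_H = -\hat H\hat u^\dagger\hat A_S\hat u + i\hbar\,\hat u^\dagger\frac{\partial\hat A_S}{\partial t}\hat u + \hat u^\dagger\hat A_S\hat u\hat H \equiv i\hbar\,\hat u^\dagger\frac{\partial\hat A_S}{\partial t}\hat u + \left[\hat u^\dagger\hat A_S\hat u, \hat H\right]. \tag{4.198b}$$

Now using the definition (190) again, for both terms on the right-hand side, we may write

$$i\hbar\frac{d}{dt}\hat A_H = i\hbar\frac{\partial\hat A_H}{\partial t} + \left[\hat A_H, \hat H\right]. \tag{4.199}$$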


This is the so-called Heisenberg equation of motion. 


Let us see how this equation looks for the same problem of the spin-½ precession in a z-oriented, time-independent magnetic field, described in the z-basis by the Hamiltonian matrix (164), which does not depend on time. In this basis, Eq. (199) for the vector operator of spin reads$^{41}$


40 Due to the same reason, $\hat H_H \equiv \hat u^\dagger\hat H_S\hat u = \hat u^\dagger\hat u\hat H_S = \hat H_S$; this is why the Hamiltonian operator’s index may be dropped in Eqs. (198)-(199).
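$$i\hbar\frac{d}{dt}\begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix} = \frac{\hbar\Omega}{2}\left[\begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix} - \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\right] = \hbar\Omega\begin{pmatrix}0 & -S_{12}\\ S_{21} & 0\end{pmatrix}. \tag{4.200}$$

Once again, the equations for different matrix elements are decoupled, and their solution is elementary:

$$S_{11}(t) = S_{11}(0) = \text{const}, \quad S_{22}(t) = S_{22}(0) = \text{const}, \quad S_{12}(t) = S_{12}(0)e^{+i\Omega t}, \quad S_{21}(t) = S_{21}(0)e^{-i\Omega t}. \tag{4.201}$$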


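According to Eq. (190), the initial values of the Heisenberg-picture matrix elements are just the Schrödinger-picture ones, so that using Eq. (117) we may rewrite this solution in either of two forms:

$$\begin{aligned}\mathrm{S}(t) &= \frac{\hbar}{2}\left[\mathbf{n}_x\begin{pmatrix}0 & e^{i\Omega t}\\ e^{-i\Omega t} & 0\end{pmatrix} + \mathbf{n}_y\begin{pmatrix}0 & -ie^{i\Omega t}\\ ie^{-i\Omega t} & 0\end{pmatrix} + \mathbf{n}_z\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\right]\\ &\equiv \frac{\hbar}{2}\begin{pmatrix}\mathbf{n}_z & \mathbf{n}_-e^{i\Omega t}\\ \mathbf{n}_+e^{-i\Omega t} & -\mathbf{n}_z\end{pmatrix}, \qquad \text{where } \mathbf{n}_\pm \equiv \mathbf{n}_x \pm i\mathbf{n}_y.\end{aligned} \tag{4.202}$$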


The simplicity of the last expression is spectacular. (Remember, it covers any initial conditions and all three spatial components of spin!) On the other hand, for some purposes the previous form may be more convenient; in particular, its Cartesian components give our earlier results (194)-(196).$^{42}$
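As a direct check of the Heisenberg equation (199) on the explicit solution (202), the sketch below ($\hbar = 1$, with an arbitrarily chosen $\Omega$ as an assumption) approximates the time derivative of each of the three matrices multiplying $\mathbf{n}_x$, $\mathbf{n}_y$, and $\mathbf{n}_z$ by a central difference and compares $i\hbar\,d\mathrm{S}/dt$ with the commutator $[\mathrm{S}, \mathrm{H}]$ for the Hamiltonian matrix (164). Since the spin operator has no explicit time dependence, the $\partial\hat A_H/\partial t$ term of Eq. (199) drops out.

```python
import cmath

HBAR = 1.0  # illustrative units with hbar = 1


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def spin_components(omega, t):
    """The matrices multiplying n_x, n_y, n_z in the first form of Eq. (202)."""
    ep, em = cmath.exp(1j * omega * t), cmath.exp(-1j * omega * t)
    s_x = [[0j, HBAR / 2 * ep], [HBAR / 2 * em, 0j]]
    s_y = [[0j, -1j * HBAR / 2 * ep], [1j * HBAR / 2 * em, 0j]]
    s_z = [[HBAR / 2 + 0j, 0j], [0j, -HBAR / 2 + 0j]]
    return s_x, s_y, s_z


def heisenberg_residual(omega, t, eps=1e-5):
    """Max |i hbar dS/dt - [S, H]| over the three components, Eq. (199)."""
    h = [[HBAR * omega / 2 + 0j, 0j], [0j, -HBAR * omega / 2 + 0j]]  # Eq. (164)
    later, earlier, now = (spin_components(omega, t + d) for d in (eps, -eps, 0.0))
    worst = 0.0
    for s_p, s_m, s_0 in zip(later, earlier, now):
        lhs = [[1j * HBAR * (s_p[i][j] - s_m[i][j]) / (2 * eps) for j in range(2)]
               for i in range(2)]
        worst = max(worst, max_abs_diff(lhs, commutator(s_0, h)))
    return worst


print(heisenberg_residual(omega=2.0, t=0.7))
```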


One of the advantages of the Heisenberg picture is that it provides a clearer link between classical and quantum mechanics, found by P. Dirac. Indeed, analytical classical mechanics may be used to derive the following equation of time evolution of an arbitrary function $A(q_j, p_j, t)$ of the generalized coordinates $q_j$ and momenta $p_j$ of the system, and time $t$:$^{43}$

$$\frac{dA}{dt} = \frac{\partial A}{\partial t} + \{A, H\}_P, \tag{4.203}$$

where $H$ is the classical Hamiltonian function of the system, and $\{\ldots,\ldots\}_P$ is the so-called Poisson bracket defined, for two arbitrary functions $A(q_j, p_j, t)$ and $B(q_j, p_j, t)$, as

$$\{A, B\}_P \equiv \sum_j\left(\frac{\partial A}{\partial q_j}\frac{\partial B}{\partial p_j} - \frac{\partial A}{\partial p_j}\frac{\partial B}{\partial q_j}\right). \tag{4.204}$$


Comparing Eq. (203) with Eq. (199), we see that the correspondence between the classical and quantum 
mechanics (in the Heisenberg picture) is provided by the following symbolic relation 


41 Using the commutation relations (155), this equation may be readily generalized to the case of an arbitrary magnetic field $\boldsymbol{\mathscr{B}}(t)$ and an arbitrary state basis, an exercise highly recommended to the reader.
42 Note that the “values” of the same Heisenberg operator at different moments of time may or may not commute. For example, consider a free 1D particle, with the time-independent Hamiltonian $\hat H = \hat p^2/2m$. In this case, Eq. (199) yields the following equations: $i\hbar\dot{\hat x} = [\hat x, \hat H] = i\hbar\hat p/m$ and $i\hbar\dot{\hat p} = [\hat p, \hat H] = 0$, with simple solutions (similar to those for the classical motion): $\hat p(t) = \text{const} = \hat p(0)$ and $\hat x(t) = \hat x(0) + \hat p(0)t/m$, so that $[\hat x(0), \hat x(t)] = [\hat x(0), \hat p(0)]t/m = [\hat x_S, \hat p_S]t/m = i\hbar t/m \neq 0$, for $t \neq 0$.


43 See, e.g., CM Eq. (10.17). The notation there does not use the subscript “P” that is employed in Eqs. (203)- 
(205) to distinguish the classical Poisson bracket (204) from the quantum anticommutator (34). 
$$\{A, B\}_P \leftrightarrow \frac{1}{i\hbar}\left[\hat A, \hat B\right]. \tag{4.205}$$
This relation may be used, in particular, for finding appropriate operators for the system’s observables, 
if their form is not immediately evident from the correspondence principle. 
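The correspondence (205) can be illustrated with a classical 1D harmonic oscillator, a hypothetical example not discussed in this section. The Python sketch below evaluates the Poisson bracket (204) for one degree of freedom with central finite differences. It should give $\{q, p\}_P = 1$, matching $(1/i\hbar)[\hat x, \hat p_x] = \hat I$, and $\{q, H\}_P = p/m$, mirroring $[\hat x, \hat H] = i\hbar\hat p/m$ from footnote 42 (the potential-energy term commutes with $\hat x$ and does not change that commutator).

```python
def poisson_bracket(a, b, q, p, h=1e-5):
    """Eq. (204) for one degree of freedom; partials by central differences."""
    da_dq = (a(q + h, p) - a(q - h, p)) / (2 * h)
    da_dp = (a(q, p + h) - a(q, p - h)) / (2 * h)
    db_dq = (b(q + h, p) - b(q - h, p)) / (2 * h)
    db_dp = (b(q, p + h) - b(q, p - h)) / (2 * h)
    return da_dq * db_dp - da_dp * db_dq


MASS, OMEGA = 2.0, 1.5  # hypothetical oscillator parameters


def coordinate(q, p):
    return q


def momentum(q, p):
    return p


def hamiltonian(q, p):
    """Classical oscillator Hamiltonian H = p^2/2m + m omega^2 q^2/2."""
    return p ** 2 / (2 * MASS) + MASS * OMEGA ** 2 * q ** 2 / 2


q0, p0 = 0.7, -1.1
print(poisson_bracket(coordinate, momentum, q0, p0))    # {q, p}_P
print(poisson_bracket(coordinate, hamiltonian, q0, p0))  # {q, H}_P = p/m
print(poisson_bracket(momentum, hamiltonian, q0, p0))    # {p, H}_P = -m omega^2 q
```

By Eq. (203), the last two brackets are just Hamilton’s equations $\dot q = p/m$ and $\dot p = -m\omega^2 q$.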


Finally, let us discuss one more alternative picture of quantum dynamics. It is also attributed to 
Dirac, and is called either the “Dirac picture”, or (more frequently) the interaction picture. The last 
name stems from the fact that this picture is very useful for the perturbative (approximate) approaches 
to systems whose Hamiltonians may be partitioned into two parts, 


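$$\hat H = \hat H_0 + \hat H_{\text{int}}, \tag{4.206}$$

where $\hat H_0$ is the sum of relatively simple Hamiltonians of the component subsystems, while the second term in Eq. (206) represents their weak interaction. (Note, however, that all relations in the balance of this section are exact and not directly based on the interaction weakness.) In this case, it is natural to consider, together with the full operator $\hat u(t, t_0)$ of the system’s evolution, which obeys Eq. (157b), a similarly defined unitary operator $\hat u_0(t, t_0)$ of evolution of the “unperturbed system” described by the Hamiltonian $\hat H_0$ alone:

$$i\hbar\frac{\partial}{\partial t}\hat u_0 = \hat H_0\hat u_0, \tag{4.207}$$

and also the following interaction evolution operator,

$$\hat u_I \equiv \hat u_0^\dagger\hat u. \tag{4.208}$$

Let us insert the reciprocal relation,

$$\hat u \equiv \hat u_0\hat u_0^\dagger\hat u = \hat u_0\hat u_I, \tag{4.209}$$

and its Hermitian conjugate,

$$\hat u^\dagger = (\hat u_0\hat u_I)^\dagger = \hat u_I^\dagger\hat u_0^\dagger, \tag{4.210}$$

into the basic Eq. (189):

$$\begin{aligned}\langle\alpha|\hat A|\beta\rangle &= \langle\alpha(t_0)|\hat u^\dagger(t, t_0)\hat A_S\hat u(t, t_0)|\beta(t_0)\rangle\\ &= \langle\alpha(t_0)|\hat u_I^\dagger(t, t_0)\hat u_0^\dagger(t, t_0)\hat A_S\hat u_0(t, t_0)\hat u_I(t, t_0)|\beta(t_0)\rangle.\end{aligned} \tag{4.211}$$

This relation shows that any long bracket (187), i.e. any experimentally verifiable result of quantum mechanics, may be expressed as

$$\langle\alpha|\hat A|\beta\rangle = \langle\alpha_I(t)|\hat A_I(t)|\beta_I(t)\rangle, \tag{4.212}$$

if we assume that both the state vectors and the operators depend on time, with the vectors evolving only due to the interaction operator $\hat u_I$,

$$\langle\alpha_I(t)| \equiv \langle\alpha(t_0)|\hat u_I^\dagger(t, t_0), \qquad |\beta_I(t)\rangle \equiv \hat u_I(t, t_0)|\beta(t_0)\rangle, \tag{4.213}$$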




while the operators’ evolution is governed by the unperturbed operator $\hat u_0$:

$$\hat A_I(t) \equiv \hat u_0^\dagger(t, t_0)\hat A_S\hat u_0(t, t_0). \tag{4.214}$$


These relations describe the interaction picture of quantum dynamics. Let me defer an example 
of its use until the perturbative analysis of open quantum systems in Sec. 7.6, and end this section with a 
proof that the interaction evolution operator (208) satisfies the following natural equation, 


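$$i\hbar\frac{\partial}{\partial t}\hat u_I = \hat H_I\hat u_I, \tag{4.215}$$

where $\hat H_I$ is the interaction Hamiltonian formed from $\hat H_{\text{int}}$ in accordance with the same rule (214):

$$\hat H_I(t) \equiv \hat u_0^\dagger(t, t_0)\hat H_{\text{int}}\hat u_0(t, t_0). \tag{4.216}$$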


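The proof is very straightforward: first using the definition (208), and then Eq. (157b) and the Hermitian conjugate of Eq. (207), we may write

$$\begin{aligned}i\hbar\frac{\partial\hat u_I}{\partial t} &= i\hbar\frac{\partial}{\partial t}\left(\hat u_0^\dagger\hat u\right) = i\hbar\frac{\partial\hat u_0^\dagger}{\partial t}\hat u + \hat u_0^\dagger\,i\hbar\frac{\partial\hat u}{\partial t} = -\hat H_0\hat u_0^\dagger\hat u + \hat u_0^\dagger\hat H\hat u = -\hat H_0\hat u_0^\dagger\hat u + \hat u_0^\dagger\left(\hat H_0 + \hat H_{\text{int}}\right)\hat u\\ &\equiv \left(-\hat H_0\hat u_0^\dagger + \hat u_0^\dagger\hat H_0\right)\hat u + \hat u_0^\dagger\hat H_{\text{int}}\hat u.\end{aligned} \tag{4.217}$$

Since $\hat u_0$ may be represented as an exponential of a time integral of $\hat H_0$ (similar to Eq. (181) relating $\hat u$ and $\hat H$), these operators commute, so that the parentheses in the last form of Eq. (217) vanish. Now plugging $\hat u$ from the last form of Eq. (209), we get the equation

$$i\hbar\frac{\partial\hat u_I}{\partial t} = \hat u_0^\dagger\hat H_{\text{int}}\hat u_0\hat u_I \equiv \left(\hat u_0^\dagger\hat H_{\text{int}}\hat u_0\right)\hat u_I, \tag{4.218}$$

which is clearly equivalent to the combination of Eqs. (215) and (216).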
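The result (215)-(216) can be checked numerically for a hypothetical two-level system with $\hat H_0 = (\hbar\Omega/2)\sigma_z$ and $\hat H_{\text{int}} = (\hbar g/2)\sigma_x$, both time-independent, in units $\hbar = 1$ and with $t_0 = 0$ (all of these are assumptions made for illustration). The sketch builds $\hat u$, $\hat u_0$, and $\hat u_I = \hat u_0^\dagger\hat u$ from closed-form 2×2 exponentials and compares a finite-difference $i\hbar\,\partial\hat u_I/\partial t$ with $\hat H_I\hat u_I$.

```python
import math

HBAR = 1.0  # illustrative units with hbar = 1
IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]
OMEGA, G = 2.0, 0.3  # hypothetical parameters of H0 and H_int


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def exp_pauli(ax, az, tau):
    """exp{-i tau (ax sigma_x + az sigma_z)/2}, using (n . sigma)^2 = I."""
    r = math.hypot(ax, az)
    if r == 0.0:
        return [row[:] for row in IDENTITY]
    c, s = math.cos(r * tau / 2), math.sin(r * tau / 2)
    nx, nz = ax / r, az / r
    return [[c - 1j * s * nz, -1j * s * nx],
            [-1j * s * nx, c + 1j * s * nz]]


H_INT = [[0j, HBAR * G / 2 + 0j], [HBAR * G / 2 + 0j, 0j]]  # (hbar G/2) sigma_x


def u_full(t):
    """exp{-i (H0 + H_int) t / hbar}, i.e. Eq. (175) with t0 = 0."""
    return exp_pauli(G, OMEGA, t)


def u_0(t):
    """Unperturbed evolution, Eq. (207), for H0 = (hbar OMEGA/2) sigma_z."""
    return exp_pauli(0.0, OMEGA, t)


def u_i(t):
    """Interaction evolution operator, Eq. (208)."""
    return matmul(dagger(u_0(t)), u_full(t))


def h_i(t):
    """Interaction-picture Hamiltonian, Eq. (216)."""
    return matmul(dagger(u_0(t)), matmul(H_INT, u_0(t)))


def interaction_residual(t, eps=1e-5):
    """Max |i hbar du_I/dt - H_I u_I|, a finite-difference test of Eq. (215)."""
    up, um = u_i(t + eps), u_i(t - eps)
    lhs = [[1j * HBAR * (up[i][j] - um[i][j]) / (2 * eps) for j in range(2)]
           for i in range(2)]
    return max_abs_diff(lhs, matmul(h_i(t), u_i(t)))


print(interaction_residual(1.7))
```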

As Eq. (215) shows, if the energy scale of the interaction $\hat H_{\text{int}}$ is much smaller than that of the background Hamiltonian $\hat H_0$, the interaction evolution operators $\hat u_I$ and $\hat u_I^\dagger$, and hence the state vectors (213), evolve relatively slowly, without fast background oscillations. This is very convenient for the
perturbative approaches to complex interacting systems, in particular to the “open” quantum systems 
that weakly interact with their environment — see Sec. 7.6. 


4.7. Coordinate and momentum representations 


Now let me show that in application to the orbital motion of a particle, the bra-ket formalism 
naturally reduces to the notions and postulates of wave mechanics, which were discussed in Chapter 1. 
For that, we first have to modify some of the above formulas for the case of a basis with a continuous 
spectrum of eigenvalues. In that case, it is more appropriate to replace discrete indices, such as $j$, $j'$, etc. broadly used above, with the corresponding eigenvalue, just as it was done earlier for functions of the wave vector (see, e.g., Eqs. (1.88), (2.20), etc.). For example, the key Eq. (68), defining the eigenkets and eigenvalues of an operator, may be conveniently rewritten in the form

$$\hat A|a_A\rangle = A|a_A\rangle. \tag{4.219}$$


More substantially, all sums over such continuous eigenstate sets should be replaced with integrals. For example, for a full and orthonormal set of the continuous eigenstates $|a_A\rangle$, the closure relation (44) should be replaced with

$$\int dA\,|a_A\rangle\langle a_A| = \hat I, \tag{4.220}$$


where the integral is over the whole interval of possible eigenvalues of the observable $A$.$^{44}$ Applying this relation to the ket-vector of an arbitrary state $\alpha$, we get the following replacement of Eq. (37):

$$|\alpha\rangle \equiv \hat I|\alpha\rangle = \int dA\,|a_A\rangle\langle a_A|\alpha\rangle = \int dA\,\langle a_A|\alpha\rangle\,|a_A\rangle. \tag{4.221}$$

For the particular case when $|\alpha\rangle = |a_{A'}\rangle$, this relation requires that

$$\langle a_A|a_{A'}\rangle = \delta(A - A'); \tag{4.222}$$

this formula replaces the orthonormality condition (38).


According to Eq. (221), in the continuous case the bracket $\langle a_A|\alpha\rangle$ still plays the role of a probability amplitude, i.e. a complex c-number whose modulus squared determines the state $a_A$’s probability (see the last form of Eq. (120)). However, for a continuous observable, the probability to find the system exactly in a particular state is infinitesimal; instead, we should speak about the probability $dW = w(A)dA$ of finding the observable within a small interval $dA \ll A$ near the value $A$, with probability density $w(A) \propto |\langle a_A|\alpha\rangle|^2$. The coefficient in this relation may be found by making a similar change from the summation to integration in the normalization condition (121):

$$\int dA\,\langle\alpha|a_A\rangle\langle a_A|\alpha\rangle = 1. \tag{4.223}$$

Since the total probability of the system to be in some state should be equal to $\int w(A)dA$, this means that

$$w(A) = \langle\alpha|a_A\rangle\langle a_A|\alpha\rangle = |\langle a_A|\alpha\rangle|^2. \tag{4.224}$$


Now let us see how we can calculate the expectation values of continuous observables, i.e. their ensemble averages. If we speak about the same observable $A$ whose eigenstates are used as the continuous basis (or any compatible observable), everything is simple. Indeed, inserting Eq. (224) into the general statistical relation

$$\langle A\rangle = \int w(A)\,A\,dA, \tag{4.225}$$

which is just the obvious continuous version of Eq. (1.37), we get

$$\langle A\rangle = \int\langle\alpha|a_A\rangle A\langle a_A|\alpha\rangle\,dA. \tag{4.226}$$

Inserting a delta-function to represent this expression as a formally double integral,

$$\langle A\rangle = \int dA\int dA'\,\langle\alpha|a_A\rangle A\,\delta(A - A')\langle a_{A'}|\alpha\rangle, \tag{4.227}$$


and using the continuous-spectrum version of Eq. (98), 


44 The generalization to cases when the eigenvalue spectrum consists of both a continuum interval plus some set of discrete values is straightforward, though it leads to somewhat cumbersome formulas.
$$\langle a_A|\hat A|a_{A'}\rangle = A\,\delta(A - A'), \tag{4.228}$$

we may write

$$\langle A\rangle = \int dA\int dA'\,\langle\alpha|a_A\rangle\langle a_A|\hat A|a_{A'}\rangle\langle a_{A'}|\alpha\rangle = \langle\alpha|\hat A|\alpha\rangle, \tag{4.229}$$

so that Eq. (125) remains valid in the continuous-spectrum case without any changes.
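A discretized illustration of Eqs. (223)-(226): the sketch below takes as the continuous eigenvalue a coordinate-like variable $x$, uses a hypothetical Gaussian amplitude $\langle x|\alpha\rangle$, and replaces the integrals with Riemann sums on a uniform grid (both are assumptions of the sketch). It computes the normalization, the mean, and the variance. The phase factor $\exp\{ik_0x\}$ of the amplitude drops out of $w(x)$, as Eq. (224) implies.

```python
import cmath
import math


def amplitude(x, x0=0.5, sigma=0.8, k0=2.0):
    """A hypothetical normalized Gaussian amplitude <x|alpha> (illustration only)."""
    norm = (2 * math.pi * sigma ** 2) ** -0.25
    return norm * cmath.exp(-(x - x0) ** 2 / (4 * sigma ** 2) + 1j * k0 * x)


def moments(x_min=-12.0, x_max=12.0, n=2401):
    """Normalization (223), mean (226) with A = x, and variance, on a uniform grid."""
    dx = (x_max - x_min) / (n - 1)
    xs = [x_min + i * dx for i in range(n)]
    w = [abs(amplitude(x)) ** 2 for x in xs]  # probability density, Eq. (224)
    norm = sum(w) * dx
    mean = sum(wi * x for wi, x in zip(w, xs)) * dx
    var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs)) * dx
    return norm, mean, var


print(moments())
```

For this amplitude the exact values are $1$, $x_0 = 0.5$, and $\sigma^2 = 0.64$; the grid sums reproduce them to high accuracy because the density is smooth and negligible at the grid ends.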


The situation is a bit more complicated for the expectation values of an operator that does not 
commute with the basis-generating operator, because its matrix in that basis may not be diagonal. We 
will consider (and overcome :-) this technical difficulty very soon, but otherwise we are ready for a 
discussion of the relation between the bra-ket formalism and the wave mechanics. (For the notation 
simplicity I will discuss its 1D version; its generalization to 2D and 3D cases is straightforward.) 


Let us postulate the (intuitively almost evident) existence of a quantum state basis, whose ket-vectors will be called $|x\rangle$, corresponding to a certain definite value $x$ of the particle’s coordinate. Writing the following trivial identity:

$$x|x\rangle = x|x\rangle, \tag{4.230}$$


and comparing this relation with Eq. (219), we see that they do not contradict each other if we assume that $x$ on the left-hand side of this relation is the (Hermitian) operator $\hat x$ of the particle’s coordinate, whose action on a ket- (or bra-) vector is just its multiplication by the c-number $x$. (This looks like a proof, but is actually a separate, independent postulate, no matter how plausible.) Hence we may consider the vectors $|x\rangle$ as the eigenstates of the operator $\hat x$. Let me hope that the reader will excuse me if I do not pursue here a strict proof that this set is full and orthogonal,$^{45}$ so that we may apply to them Eq. (222):

$$\langle x|x'\rangle = \delta(x - x'). \tag{4.231}$$


Using this basis is called the coordinate representation — the term which was already used at the end of 
Sec. 1.1, but without explanation. 


In the basis of the x-states, the inner product $\langle a_A|\alpha(t)\rangle$ becomes $\langle x|\alpha(t)\rangle$, and Eq. (224) takes the following form:

$$w(x, t) = \langle\alpha(t)|x\rangle\langle x|\alpha(t)\rangle = \langle x|\alpha(t)\rangle^*\langle x|\alpha(t)\rangle. \tag{4.232}$$

Comparing this formula with the basic postulate (1.22) of wave mechanics, we see that they coincide if the wavefunction of a time-dependent state $\alpha$ is identified with that short bracket:$^{46}$

$$\Psi_\alpha(x, t) \equiv \langle x|\alpha(t)\rangle. \tag{4.233}$$


This key formula provides the desired connection between the bra-ket formalism and the wave 
mechanics, and should not be too surprising for the (thoughtful :-) reader. Indeed, Eq. (45) shows that 
any inner product of two state vectors describing two states is a measure of their coincidence — just as 
the scalar product of two geometric vectors is; the orthonormality condition (38) is a particular 
manifestation of this fact. In this language, the particular value (233) of a wavefunction $\Psi_\alpha$ at some


45 Such a proof is rather involved mathematically, but physically this fact should be evident.

46 I do not quite like expressions like $\langle x|\Psi\rangle$ used in some papers and even textbooks. Of course, one is free to replace $\alpha$ with any other letter ($\Psi$ included) to denote a quantum state, but then it is better not to use the same letter to denote the wavefunction, i.e. an inner product of two state vectors, to avoid confusion.
point $x$ and moment $t$ characterizes “how much of a particular coordinate $x$” the state $\alpha$ contains at time $t$. (Of course, this informal language is too crude to reflect the fact that $\Psi_\alpha(x, t)$ is a complex function, which has not only a modulus but also an argument, the quantum-mechanical phase.)


Now let us rewrite the most important formulas of the bra-ket formalism in the wave mechanics notation. Inner-multiplying both parts of Eq. (219) by the bra-vector $\langle x|$, and then inserting into the left-hand side of that relation the identity operator in the form (220) for coordinate $x'$, we get

$$\int dx'\,\langle x|\hat A|x'\rangle\langle x'|a_A\rangle = A\langle x|a_A\rangle, \tag{4.234}$$

i.e., using the wavefunction’s definition (233),

$$\int dx'\,\langle x|\hat A|x'\rangle\Psi_A(x') = A\Psi_A(x), \tag{4.235}$$

where, for notation brevity, the time dependence of the wavefunction is just implied (with the capital $\Psi$ serving as a reminder of this fact), and will be restored when needed.


For a general operator, we would have to stop here, because if it does not commute with the coordinate operator, its matrix in the x-basis is not diagonal, and the integral on the left-hand side of Eq. (235) cannot be worked out explicitly. However, virtually all quantum-mechanical operators discussed in this course$^{47}$ are (space-) local: they depend on only one spatial coordinate, say $x$. For such operators, the left-hand side of Eq. (235) may be further transformed as

$$\int\langle x|\hat A|x'\rangle\Psi(x')\,dx' = \int\langle x|x'\rangle\hat A\Psi(x')\,dx' = \hat A\int\delta(x - x')\Psi(x')\,dx' = \hat A\Psi(x). \tag{4.236}$$
The first step in this transformation may appear as elementary as the last two, with the ket-vector |x’) 
swapped with the operator depending only on x; however, due to the delta-functional character of the 
bracket (231), this step is, in fact, an additional postulate, so that the second equality in Eq. (236) 
essentially defines the coordinate representation of the local operator, whose explicit form still needs to 
be determined. 


Let us consider, for example, the 1D version of the Hamiltonian (1.41),

$$\hat H = \frac{\hat p_x^2}{2m} + U(\hat x), \tag{4.237}$$


which was the basis of all our discussions in Chapter 2. Its potential-energy part U (which may be time- 
dependent as well) commutes with the operator x, i.e. its matrix in the x-basis has to be diagonal. For 
such an operator, the transformation (236) is indeed trivial, and its coordinate representation is given 
merely by the c-number function U(x). 


The situation with the momentum operator $\hat p_x$ (and hence the kinetic energy $\hat p_x^2/2m$), not commuting with $\hat x$, is less evident. Let me show that its coordinate representation is given by the 1D version of Eq. (1.26), if we postulate that the commutation relation (2.14),

$$[\hat x, \hat p_x] = i\hbar\hat I, \quad \text{i.e.}\quad \hat x\hat p_x - \hat p_x\hat x = i\hbar\hat I, \tag{4.238}$$


47 The only substantial exception is the statistical operator w (x, x’), to be discussed in Chapter 7. 
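is valid in any representation.$^{48}$ For that, let us consider the following matrix element, $\langle x|\hat x\hat p_x - \hat p_x\hat x|x'\rangle$. On one hand, we may use Eq. (238), and then Eq. (231), to write

$$\langle x|\hat x\hat p_x - \hat p_x\hat x|x'\rangle = \langle x|i\hbar\hat I|x'\rangle = i\hbar\langle x|x'\rangle = i\hbar\,\delta(x - x'). \tag{4.239}$$

On the other hand, since $\hat x|x'\rangle = x'|x'\rangle$ and $\langle x|\hat x = \langle x|x$, we may represent the same matrix element as

$$\langle x|\hat x\hat p_x - \hat p_x\hat x|x'\rangle = \langle x|x\hat p_x - \hat p_x x'|x'\rangle = (x - x')\langle x|\hat p_x|x'\rangle. \tag{4.240}$$

Comparing Eqs. (239) and (240), we get

$$\langle x|\hat p_x|x'\rangle = i\hbar\frac{\delta(x - x')}{x - x'}. \tag{4.241}$$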


As it follows from the definition of the delta function,$^{49}$ all expressions involving it acquire a definite meaning only upon their integration, in our current case, the integration described by Eq. (236). Plugging Eq. (241) into the left-hand side of that relation, we get

$$\int\langle x|\hat p_x|x'\rangle\Psi(x')\,dx' = i\hbar\int\Psi(x')\frac{\delta(x - x')}{x - x'}\,dx'. \tag{4.242}$$

Since the right-hand-side integral is contributed only by an infinitesimal vicinity of the point $x' = x$, we may calculate it by expanding the continuous wavefunction $\Psi(x')$ into the Taylor series in small $(x' - x)$, and keeping only the two leading terms of the series, so that Eq. (242) is reduced to

$$\int\langle x|\hat p_x|x'\rangle\Psi(x')\,dx' = i\hbar\Psi(x)\int\frac{\delta(x - x')}{x - x'}\,dx' + i\hbar\frac{\partial\Psi(x)}{\partial x}\int\frac{x' - x}{x - x'}\,\delta(x - x')\,dx'. \tag{4.243}$$

Since the delta function may always be understood as an even function of its argument, in our case of $(x - x')$, the first term on the right-hand side is proportional to an integral of an odd function in symmetric limits and is equal to zero, while the integrand of the second term is just $-\delta(x - x')$, so that we get$^{50}$

$$\int\langle x|\hat p_x|x'\rangle\Psi(x')\,dx' = -i\hbar\frac{\partial\Psi}{\partial x}. \tag{4.244}$$

Comparing this expression with the right-hand side of Eq. (236), we see that in the coordinate representation we indeed get the 1D version of Eq. (1.26), which was used so much in Chapter 2,$^{51}$

$$\hat p_x = -i\hbar\frac{\partial}{\partial x}. \tag{4.245}$$

48 Another possible approach to the wave mechanics axiomatics is to derive Eq. (238) by postulating the form, $\hat T_X = \exp\{-i\hat p_x X/\hbar\}$, of the operator that shifts any wavefunction by distance $X$ along the axis $x$. In my approach, this expression will be derived when we need it (in Sec. 5.5), while Eq. (238) is postulated.

49 If necessary, please revisit MA Sec. 14.

50 One more useful expression of this type, which may be proved similarly, is $(\partial/\partial x)\delta(x - x') = \delta(x - x')\partial/\partial x'$.

51 This means, in particular, that in the sense of Eq. (236), the operator of differentiation is local, despite the fact that its action on a function $f$ may be interpreted as the limit of the fraction $\Delta f/\Delta x$, involving two points. (In some axiomatic systems, local operators are defined as arbitrary polynomials of functions and their derivatives.)
It is straightforward to show (and is virtually evident) that the coordinate representation of any operator function $f(\hat p_x)$ is

$$f(\hat p_x) = f\left(-i\hbar\frac{\partial}{\partial x}\right). \tag{4.246}$$

In particular, this pertains to the kinetic energy operator in Eq. (237), so the coordinate representation of this Hamiltonian also takes the very familiar form:

$$\hat H = \frac{1}{2m}\left(-i\hbar\frac{\partial}{\partial x}\right)^2 + U(x, t) = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + U(x, t). \tag{4.247}$$


Now returning to the discussion of the general Eq. (235), and comparing its last form with that of Eq. (236), we see that for a local operator in the coordinate representation, the eigenproblem (219) takes the form

$$\hat A\Psi_A(x) = A\Psi_A(x), \tag{4.248}$$

even if the operator $\hat A$ does not commute with the operator $\hat x$. The most important case of this coordinate-representation form of the eigenproblem (68) is the familiar Eq. (1.60) for the eigenvalues $E_n$ of the energy of a system with a time-independent Hamiltonian.


The operator locality also simplifies the expression for its expectation value. Indeed, plugging the closure relation in the form (220) into the general Eq. (125) twice (written in the first case for $x$ and in the second case for $x'$), we get

$$\langle A\rangle = \int dx\int dx'\,\langle\alpha(t)|x\rangle\langle x|\hat A|x'\rangle\langle x'|\alpha(t)\rangle = \int dx\int dx'\,\Psi_\alpha^*(x, t)\langle x|\hat A|x'\rangle\Psi_\alpha(x', t). \tag{4.249}$$

Now, Eq. (236) reduces this result to just

$$\langle A\rangle = \int dx\int dx'\,\Psi_\alpha^*(x, t)\hat A\Psi_\alpha(x', t)\,\delta(x - x') = \int\Psi_\alpha^*(x, t)\hat A\Psi_\alpha(x, t)\,dx, \tag{4.250}$$

i.e. to Eq. (1.23), which had to be postulated in Chapter 1.
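As an illustration of Eqs. (247) and (250), the sketch below evaluates $\langle H\rangle$ for the two lowest eigenstates of a harmonic oscillator, $U(x) = m\omega^2x^2/2$, a potential chosen here only because the exact answers $\hbar\omega/2$ and $3\hbar\omega/2$ are known. It assumes units $\hbar = m = \omega = 1$, real wavefunctions (so that $\Psi^* = \Psi$), and a central-difference second derivative on a uniform grid.

```python
import math

HBAR = MASS = OMEGA = 1.0  # illustrative units


def energy_expectation(psi, x_min=-10.0, x_max=10.0, n=2001):
    """Eq. (250) with H from Eq. (247) and U(x) = m omega^2 x^2 / 2.

    psi is assumed real here, so psi* = psi; d2psi/dx2 uses central differences.
    """
    dx = (x_max - x_min) / (n - 1)
    xs = [x_min + i * dx for i in range(n)]
    vals = [psi(x) for x in xs]
    total = 0.0
    for i in range(1, n - 1):
        d2 = (vals[i + 1] - 2 * vals[i] + vals[i - 1]) / dx ** 2
        h_psi = (-HBAR ** 2 / (2 * MASS) * d2
                 + MASS * OMEGA ** 2 * xs[i] ** 2 / 2 * vals[i])
        total += vals[i] * h_psi * dx
    return total


def ground(x):
    """Oscillator ground state in these units."""
    return math.pi ** -0.25 * math.exp(-x ** 2 / 2)


def first_excited(x):
    """Oscillator first excited state in these units."""
    return math.sqrt(2) * math.pi ** -0.25 * x * math.exp(-x ** 2 / 2)


print(energy_expectation(ground), energy_expectation(first_excited))
```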


Finally, let us discuss the time evolution of the wavefunction, in the Schrödinger picture. For that, we may use Eq. (233) to calculate the (partial) time derivative of the wavefunction of some state $\alpha$:

$$i\hbar\frac{\partial\Psi_\alpha}{\partial t} = i\hbar\frac{\partial}{\partial t}\langle x|\alpha(t)\rangle. \tag{4.251}$$

Since the coordinate operator $\hat x$ does not depend on time explicitly, its eigenstates $x$ are stationary, and we can swap the time derivative and the time-independent bra-vector $\langle x|$. Now using the Schrödinger-picture equation (158), and then inserting the identity operator in the continuous form (220) of the closure relation, written for the coordinate eigenstates,

$$\int dx'\,|x'\rangle\langle x'| = \hat I, \tag{4.252}$$

we may continue to develop the right-hand side of Eq. (251) as

$$\langle x|i\hbar\frac{\partial}{\partial t}|\alpha(t)\rangle = \langle x|\hat H|\alpha(t)\rangle = \int dx'\,\langle x|\hat H|x'\rangle\langle x'|\alpha(t)\rangle = \int dx'\,\langle x|\hat H|x'\rangle\Psi_\alpha(x', t). \tag{4.253}$$
If the Hamiltonian operator is local, we may apply Eq. (236) to the last expression, to get the familiar form (1.28) of the Schrödinger equation:
$$i\hbar\frac{\partial\Psi_\alpha}{\partial t} = \hat H\Psi_\alpha, \qquad (4.254)$$
in which the coordinate representation of the operator Ĥ is implied.


So, for the local operators that obey Eq. (236), we have been able to derive all the basic notions 
and postulates of the wave mechanics from the bra-ket formalism. Moreover, the formalism has allowed 
us to get the very useful equation (248) for an arbitrary local operator, which will be repeatedly used 
below. (In the first three chapters of this course, we have only used its particular case (1.60) for the 
Hamiltonian operator.) 


Now let me deliver on my promise to develop a more balanced view of the monochromatic de Broglie waves (1), which would be more respectful to the evident r ↔ p symmetry of the coordinate and momentum. Let us do this for the 1D case when the wave may be represented as
$$\psi_p(x) = a_p\exp\left\{i\frac{px}{\hbar}\right\}, \qquad \text{for all } -\infty < x < +\infty. \qquad (4.255)$$


(For the sake of brevity, from this point to the end of the section, I am dropping the index x in the notation of the momentum, just as it was done in Chapter 2.) Let us have a good look at this function. Since it satisfies Eq. (248) for the 1D momentum operator (245),
$$\hat p\,\psi_p = p\,\psi_p, \qquad (4.256)$$


ψ_p is an eigenfunction of that operator. But this means that we can also write Eq. (219) for the corresponding ket-vector:
$$\hat p\,|p\rangle = p\,|p\rangle, \qquad (4.257)$$
and according to Eq. (233), the wavefunction (255) may be represented as
$$\psi_p(x) = \langle x|p\rangle. \qquad (4.258)$$


This expression is quite remarkable in its x ↔ p symmetry, which may be pursued further. Before doing that, however, we have to discuss the normalization of such wavefunctions. Indeed, in this case, the probability density w(x) (18) is constant, so that its integral
$$\int w(x)\,dx = \int_{-\infty}^{+\infty}\psi_p^*(x)\,\psi_p(x)\,dx \qquad (4.259)$$


diverges if a_p ≠ 0. Earlier in the course, we discussed two ways to avoid this divergence. One is to use a very large but finite integration volume; see Eq. (1.31). Another way is to work with wave packets of the type (2.20), possibly of a very large length and hence a very narrow spread of the momentum values. Then the integral (259) may be required to equal 1 without any conceptual problem.


However, both these methods, convenient for the solution of many particular problems, violate the x ↔ p symmetry and hence are inconvenient for our current conceptual discussion. Instead, let us continue to identify the eigenvectors ⟨p| and |p⟩ of the momentum with the bra- and ket-vectors ⟨a_A| and |a_A⟩ of the general theory described at the beginning of this section. Then the normalization condition (222) becomes
$$\langle p|p'\rangle = \delta(p-p'). \qquad (4.260)$$


Inserting the identity operator in the form (252), with the integration variable x' replaced by x, into the left-hand side of this equation, and using Eq. (258), we can translate this normalization rule to the wavefunction language:
$$\int dx\,\langle p|x\rangle\langle x|p'\rangle = \int dx\,\psi_p^*(x)\,\psi_{p'}(x) = \delta(p-p'). \qquad (4.261)$$


For the wavefunction (255), this requirement turns into the following condition:
$$a_p^*a_{p'}\int_{-\infty}^{+\infty}\exp\left\{i\frac{(p'-p)x}{\hbar}\right\}dx = |a_p|^2\,2\pi\hbar\,\delta(p-p') = \delta(p-p'), \qquad (4.262)$$


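so that, finally, a_p = e^{iφ}(2πħ)^{−1/2}, where φ is an arbitrary (real) phase, and Eq. (255) becomes⁵²
$$\psi_p(x) = \langle x|p\rangle = \frac{1}{(2\pi\hbar)^{1/2}}\exp\left\{i\left(\frac{px}{\hbar} + \phi\right)\right\}. \qquad (4.263)$$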


Now let us represent an arbitrary wavefunction ψ(x) as a wave packet of the type (2.20), based on the wavefunctions (263), taking φ = 0 for notational brevity, because the phase may be incorporated into the (generally complex) function φ(p):
$$\psi(x) = \frac{1}{(2\pi\hbar)^{1/2}}\int_{-\infty}^{+\infty}\varphi(p)\exp\left\{i\frac{px}{\hbar}\right\}dp. \qquad (4.264)$$


From the mathematical point of view, this is just a 1D spatial Fourier transform, and its reciprocal is
$$\varphi(p) = \frac{1}{(2\pi\hbar)^{1/2}}\int_{-\infty}^{+\infty}\psi(x)\exp\left\{-i\frac{px}{\hbar}\right\}dx. \qquad (4.265)$$
These expressions are symmetric (up to the sign of the exponent), and represent the same wave packet; this is why the functions ψ(x) and φ(p) are frequently called the reciprocal representations of a quantum state of the particle: respectively, its coordinate (x-) and momentum (p-) representations. Using Eq. (258), and Eq. (263) with φ = 0, they may be recast into simpler forms,
$$\psi(x) = \int\varphi(p)\,\langle x|p\rangle\,dp, \qquad \varphi(p) = \int\psi(x)\,\langle p|x\rangle\,dx, \qquad (4.266)$$
in which the inner products satisfy the basic postulate (14) of the bra-ket formalism:
$$\langle p|x\rangle = \frac{1}{(2\pi\hbar)^{1/2}}\exp\left\{-i\frac{px}{\hbar}\right\} = \langle x|p\rangle^*. \qquad (4.267)$$
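The Fourier pair (264)-(265) can be checked numerically for a normalized Gaussian packet, whose transform is known in closed form: if |ψ(x)|² has the r.m.s. width σ, then φ(p) is again a Gaussian, with |φ(p)|² of the r.m.s. width ħ/2σ. The sketch below (assumed units ħ = 1, width σ = 1, and a plain Riemann sum for the x-integral; all names are illustrative) evaluates Eq. (265) by quadrature and compares it with that closed form.

```python
import cmath
import math

HBAR = 1.0   # assumption: natural units
SIGMA = 1.0  # assumption: r.m.s. coordinate width of |psi|^2


def psi(x):
    """Normalized Gaussian packet; |psi|^2 has the r.m.s. width SIGMA."""
    return (2 * math.pi * SIGMA**2) ** -0.25 * math.exp(-x * x / (4 * SIGMA**2))


def phi_numeric(p, half_width=12.0, n=4001):
    """phi(p) from Eq. (4.265), with the x-integral done as a Riemann sum."""
    dx = 2 * half_width / (n - 1)
    total = 0j
    for k in range(n):
        x = -half_width + k * dx
        total += psi(x) * cmath.exp(-1j * p * x / HBAR) * dx
    return total / math.sqrt(2 * math.pi * HBAR)


def phi_exact(p):
    """Closed-form transform of the same Gaussian: another Gaussian, now in p."""
    return (2 * SIGMA**2 / (math.pi * HBAR**2)) ** 0.25 * math.exp(-((SIGMA * p / HBAR) ** 2))


for p in (0.0, 0.5, 1.0, 2.0):
    print(p, abs(phi_numeric(p) - phi_exact(p)))
```

The product of the two widths, σ · ħ/2σ = ħ/2, is the minimum allowed by the uncertainty relation, as it should be for a Gaussian packet.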


⁵² Repeating such a calculation for each Cartesian component of a plane monochromatic wave of arbitrary dimensionality d, we get ψ_p = (2πħ)^{−d/2} exp{i(p·r/ħ + φ)}.
Next, we already know that in the x-representation, i.e. in the usual wave mechanics, the coordinate operator x is reduced to the multiplication by x, and the momentum operator is proportional to the partial derivative over the coordinate:
$$\hat x = x, \qquad \hat p = -i\hbar\frac{\partial}{\partial x}. \qquad \text{(x-representation operators)} \qquad (4.268)$$
It is natural to guess that in the p-representation, the expressions for operators would be reciprocal:
$$\hat x = +i\hbar\frac{\partial}{\partial p}, \qquad \hat p = p, \qquad \text{(p-representation operators)} \qquad (4.269)$$
with the only difference being one sign, which is due to the opposite signs of the Fourier exponents in Eqs. (264) and (265). The proof of Eqs. (269) is straightforward; for example, acting by the momentum operator on the arbitrary wavefunction (264), we get


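$$\hat p\,\psi(x) = -i\hbar\frac{\partial}{\partial x}\psi(x) = \frac{1}{(2\pi\hbar)^{1/2}}\int\varphi(p)\left(-i\hbar\frac{\partial}{\partial x}\right)\exp\left\{i\frac{px}{\hbar}\right\}dp = \frac{1}{(2\pi\hbar)^{1/2}}\int p\,\varphi(p)\exp\left\{i\frac{px}{\hbar}\right\}dp, \qquad (4.270)$$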


and similarly for the operator x acting on the function φ(p). Comparing the final form of Eq. (270) with the initial Eq. (264), we see that the action of the operators (268) on the wavefunction ψ (i.e. the state's x-representation) gives the same results as the action of the operators (269) on the function φ (i.e. its p-representation).
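The sign in the first of Eqs. (269) may be tested numerically: a packet shifted to x = x_0 has its φ(p) multiplied by exp{−ipx_0/ħ}, so ⟨x⟩ = ∫φ*(p) x̂ φ(p) dp should return +x_0 with x̂ = +iħ∂/∂p, and −x_0 with the opposite sign. This is a minimal sketch under assumed parameters (ħ = 1, σ = 1, x_0 = 1.5), with a central-difference derivative in p; the function names are hypothetical.

```python
import cmath
import math

HBAR = 1.0   # assumption: natural units
SIGMA = 1.0  # assumption: packet width
X0 = 1.5     # assumption: packet center in coordinate space


def phi(p):
    """Momentum wavefunction of a normalized Gaussian packet centered at x = X0.

    Shifting psi(x) -> psi(x - X0) multiplies phi(p) by exp(-i p X0 / hbar),
    as follows from Eq. (4.265).
    """
    envelope = (2 * SIGMA**2 / (math.pi * HBAR**2)) ** 0.25 * math.exp(-((SIGMA * p / HBAR) ** 2))
    return envelope * cmath.exp(-1j * p * X0 / HBAR)


def expectation_x(sign=+1, half_width=10.0, n=20001):
    """<x> = integral of phi* (sign * i hbar d/dp) phi dp; sign=+1 corresponds to Eq. (4.269)."""
    dp = 2 * half_width / (n - 1)
    vals = [phi(-half_width + k * dp) for k in range(n)]
    total = 0j
    for k in range(1, n - 1):
        dphi = (vals[k + 1] - vals[k - 1]) / (2 * dp)
        total += vals[k].conjugate() * (sign * 1j * HBAR * dphi) * dp
    return total


print(expectation_x(+1).real)  # close to +X0
print(expectation_x(-1).real)  # close to -X0: the wrong sign
```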


It is illuminating to have one more, different look at this coordinate-momentum duality. For that, notice that according to Eqs. (82)-(84), we may consider the bracket ⟨x|p⟩ as an element of the (infinite-size) matrix U_xp of the unitary transform from the x-basis to the p-basis. Let us use this fact to derive the general operator transform rule that would be a continuous version of Eq. (92). Say, we want to calculate the general matrix element of some operator, known in the x-representation, in the p-representation:
$$\langle p|\hat A|p'\rangle. \qquad (4.271)$$


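Inserting two identity operators (252), written for x and x', into this bracket, and then using Eq. (258) and its complex conjugate, and also Eq. (236) (again, valid only for space-local operators!), we get
$$\langle p|\hat A|p'\rangle = \int dx\int dx'\,\langle p|x\rangle\langle x|\hat A|x'\rangle\langle x'|p'\rangle = \int dx\int dx'\,\psi_p^*(x)\,\langle x|\hat A|x'\rangle\,\psi_{p'}(x')$$
$$= \frac{1}{2\pi\hbar}\int dx\int dx'\exp\left\{-i\frac{px}{\hbar}\right\}\delta(x-x')\,\hat A\exp\left\{i\frac{p'x'}{\hbar}\right\} = \frac{1}{2\pi\hbar}\int dx\,\exp\left\{-i\frac{px}{\hbar}\right\}\hat A\exp\left\{i\frac{p'x}{\hbar}\right\}. \qquad (4.272)$$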
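As a sanity check, for the momentum operator itself, this relation yields:
$$\langle p|\hat p|p'\rangle = \frac{1}{2\pi\hbar}\int dx\,\exp\left\{-i\frac{px}{\hbar}\right\}\left(-i\hbar\frac{\partial}{\partial x}\right)\exp\left\{i\frac{p'x}{\hbar}\right\} = \frac{p'}{2\pi\hbar}\int\exp\left\{i\frac{(p'-p)x}{\hbar}\right\}dx = p\,\delta(p-p'). \qquad (4.273)$$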


Due to Eq. (257), this result is equivalent to the second of Eqs. (269). 


From a thoughtful reader, I anticipate the following natural question: why is the momentum representation used much less frequently than the coordinate representation (i.e. wave mechanics)? The answer is purely practical: besides the important special case of the 1D harmonic oscillator (to be revisited in Sec. 5.4), in most systems the orbital-motion Hamiltonian (237) is not x ↔ p symmetric, with the potential energy U(r) typically being a more complex function than the kinetic energy p²/2m. Because of that, it is easier to analyze such systems treating the potential-energy operator as just a c-number multiplier, as it is in the coordinate representation, which is what was done in Chapters 1-3.


The most significant exception to this practice is the motion in a periodic potential in the presence of a coordinate-independent external force F(t). As was discussed in Secs. 2.7 and 3.4, in such periodic systems the eigenenergies E_n(q), playing the role of the effective kinetic energy of the particle, may be rather involved functions of its quasimomentum ħq, while its effective potential energy U_ef = −F(t)·r, due to the additional force F(t), is a very simple function of coordinates. This is why detailed analyses of the quantum effects briefly discussed in Sec. 2.8 (the Bloch oscillations, etc.) and also such statistical phenomena as drift, diffusion, etc.,⁵³ in solid-state theory are typically based on the momentum (or rather quasimomentum) representation.


4.8. Exercise problems 


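4.1. Prove that if A and B are linear operators, and c is a c-number, then:
(i) $(\hat A^\dagger)^\dagger = \hat A$; (ii) $(c\hat A)^\dagger = c^*\hat A^\dagger$; (iii) $(\hat A\hat B)^\dagger = \hat B^\dagger\hat A^\dagger$;
(iv) the operators $\hat A\hat A^\dagger$ and $\hat A^\dagger\hat A$ are Hermitian.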


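4.2. Prove that for any linear operators A, B, C, and D,
$$[\hat A\hat B, \hat C\hat D] = \hat A\{\hat B,\hat C\}\hat D - \hat A\hat C\{\hat B,\hat D\} + \{\hat A,\hat C\}\hat D\hat B - \hat C\{\hat A,\hat D\}\hat B.$$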


4.3. Calculate all possible binary products σ_jσ_j' (for j, j' = x, y, z) of the Pauli matrices, defined by Eqs. (105), and their commutators and anticommutators (defined similarly to those of the corresponding operators). Summarize the results, using the Kronecker delta and Levi-Civita permutation symbols.⁵⁴


4.4. Calculate the following expressions,

(i) $(\mathbf c\cdot\boldsymbol\sigma)^n$, and then

(ii) $(bI + \mathbf c\cdot\boldsymbol\sigma)^n$,
for the scalar product c·σ of the Pauli matrix vector σ ≡ n_xσ_x + n_yσ_y + n_zσ_z by an arbitrary c-number geometric vector c, where n ≥ 0 is an integer, and b is an arbitrary scalar c-number.


Hint: For task (ii), you may like to use the binomial theorem,⁵⁵ and then transform the result in a way enabling you to use the same theorem backward.


⁵³ In this series, a brief discussion of these effects may be found in SM Chapter 6.
⁵⁴ See, e.g., MA Eqs. (13.1) and (13.2).
⁵⁵ See, e.g., MA Eq. (2.9).
4.5. Use the solution of the previous problem to derive Eqs. (2.191) for the transparency 𝒯 of a system of N similar, equidistant, delta-functional potential barriers.


4.6. Use the solution of Problem 4(i) to spell out the following matrix: exp{iφ n·σ}, where σ is the 3D vector (117) of the Pauli matrices, n is a c-number geometric vector of unit length, and φ is a c-number scalar.


4.7. Use the solution of Problem 4(ii) to calculate exp{A}, where A is an arbitrary 2×2 matrix.


4.8. Express all elements of the matrix B ≡ exp{A} explicitly via those of the 2×2 matrix A. Spell out your result for the following matrices:
$$A = \begin{pmatrix} a & a\\ a & a\end{pmatrix}, \qquad A' = \begin{pmatrix} i\varphi & i\varphi\\ i\varphi & i\varphi\end{pmatrix},$$
with real a and φ.


4.9. Prove that for arbitrary square matrices A and B,
$$\mathrm{Tr}(AB) = \mathrm{Tr}(BA).$$
Is each diagonal element (AB)_jj necessarily equal to (BA)_jj?


4.10. Calculate the trace of the following 2×2 matrix:
$$A \equiv (\mathbf a\cdot\boldsymbol\sigma)(\mathbf b\cdot\boldsymbol\sigma)(\mathbf c\cdot\boldsymbol\sigma),$$
where σ is the Pauli matrix vector, while a, b, and c are arbitrary c-number vectors.


4.11. Prove that the matrix trace of an arbitrary operator does not change under an arbitrary unitary transformation.


4.12. Prove that for any two full and orthonormal bases {u} and {v} of the same Hilbert space,


4.13. Is the 1D scattering matrix S, defined by Eq. (2.124), unitary? What about the 1D transfer 
matrix T defined by Eq. (2.125)? 


4.14. Calculate the trace of the following matrix:
$$\exp\{i\mathbf a\cdot\boldsymbol\sigma\}\exp\{i\mathbf b\cdot\boldsymbol\sigma\},$$
where σ is the Pauli matrix vector, while a and b are c-number geometric vectors.


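4.15. Prove the following vector-operator identity:
$$(\boldsymbol\sigma\cdot\hat{\mathbf r})(\boldsymbol\sigma\cdot\hat{\mathbf p}) = \hat I\,\hat{\mathbf r}\cdot\hat{\mathbf p} + i\boldsymbol\sigma\cdot(\hat{\mathbf r}\times\hat{\mathbf p}),$$
where σ is the Pauli matrix vector, and I is the 2×2 identity matrix.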
Hint: Take into account that the vector operators r̂ and p̂ are defined in the orbital-motion Hilbert space, different from that of the Pauli vector-matrix σ, and hence commute with it, even though they do not commute with each other.


4.16. Let A_j be eigenvalues of some operator A. Express the following two sums,
$$\Sigma_1 \equiv \sum_j A_j, \qquad \Sigma_2 \equiv \sum_j A_j^2,$$
via the matrix elements A_jj' of this operator in an arbitrary basis.


4.17. Calculate (o,) of a spin—2 in the quantum state with the following ket-vector: 
|α⟩ = const × (|↑⟩ + |↓⟩ + |→⟩ + |←⟩),
where (↑, ↓) and (→, ←) are the eigenstates of the Pauli matrices σ_z and σ_x, respectively.
Hint: Double-check whether your solution is general.
4.18. A spin-½ is fully polarized in the positive z-direction. Calculate the probabilities of the alternative outcomes of a perfect Stern-Gerlach experiment with the magnetic field oriented in an arbitrary different direction, performed on a particle in this spin state.


4.19. In a certain basis, the Hamiltonian of a two-level system is described by the matrix
$$H = \begin{pmatrix} E_1 & 0\\ 0 & E_2\end{pmatrix}, \qquad \text{with } E_1 \neq E_2,$$


while the operator of some observable A of this system, by the matrix 


“C) 


For the system's state with the energy definitely equal to E_1, find the possible results of measurements of the observable A and the probabilities of the corresponding measurement outcomes.


4.20. Certain states u_{1,2,3} form an orthonormal basis of a system with the following Hamiltonian
$$\hat H = \delta\left(|u_1\rangle\langle u_2| + |u_2\rangle\langle u_3| + |u_3\rangle\langle u_1|\right) + \text{h.c.},$$
where δ is a real constant, and h.c. means the Hermitian conjugate of the previous expression. Calculate its stationary states and energy levels. Can you relate this system to any other(s) discussed earlier in the course?


4.21. Guided by Eq. (2.203), and by the solutions of Problems 3.11 and 4.20, suggest a Hamiltonian describing a particle's dynamics in an infinite 1D chain of similar potential wells in the tight-binding approximation, in the bra-ket formalism. Verify that its eigenstates and eigenvalues correspond to those discussed in Sec. 2.7.
4.22. Calculate eigenvectors and eigenvalues of the following matrices:
$$A = \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 0\end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\end{pmatrix}.$$


4.23. A certain state γ is an eigenstate of each of two operators, A and B. What can be said about the corresponding eigenvalues a and b, if the operators anticommute?


4.24. Derive the differential equation for the time evolution of the expectation value of an observable, using both the Schrödinger picture and the Heisenberg picture of quantum dynamics.


4.25. At t = 0, a spin-½ whose interaction with an external field is described by the Hamiltonian
$$\hat H = \mathbf c\cdot\hat{\boldsymbol\sigma} \equiv c_x\hat\sigma_x + c_y\hat\sigma_y + c_z\hat\sigma_z$$
(where c_{x,y,z} are real c-number constants, and σ̂_{x,y,z} are the Pauli operators), was in the state ↑, one of the two eigenstates of σ̂_z. In the Schrödinger picture, calculate the time evolution of:


(i) the ket-vector |α⟩ of the spin (in any time-independent basis you like),
(ii) the probabilities to find the spin in the states ↑ and ↓, and
(iii) the expectation values of all three Cartesian components (S_x, etc.) of the spin vector.


Analyze and interpret the results for the particular case c, = c, = 0. 


Hint: Think about the best basis to use for the solution. 


4.26. For the same system as in the previous problem, use the Heisenberg picture to calculate the time evolution of:


(i) all three Cartesian components of the spin operator Ŝ(t), and
(ii) the expectation values of the spin components.


Compare the latter results with those of the previous problem. 


4.27. For the same system as in the last two problems, calculate matrix elements of the operator
6, in the basis of the stationary states of the system. 


4.28. In the Schrödinger picture of quantum dynamics, certain three operators satisfy the following commutation relation:
$$[\hat A, \hat B] = \hat C.$$
What is their relation in the Heisenberg picture, at a certain time instant t?


4.29. Prove the Bloch theorem given by either Eq. (3.107) or Eq. (3.108). 
Hint: Consider the translation operator T̂_R defined by the following result of its action on an arbitrary function f(r):
$$\hat{\mathcal T}_{\mathbf R}\,f(\mathbf r) = f(\mathbf r + \mathbf R),$$
for the case when R is an arbitrary vector of the Bravais lattice (3.106). In particular, analyze the commutation properties of this operator, and apply them to an eigenfunction ψ(r) of the stationary Schrödinger equation for a particle moving in the 3D periodic potential described by Eq. (3.105).


4.30. A constant force F is applied to an (otherwise free) 1D particle of mass m. Calculate the 
stationary wavefunctions of the particle in: 


(i) the coordinate representation, and
(ii) the momentum representation.


Discuss the relation between the results. 


4.31. Use the momentum representation to re-solve the problem discussed at the beginning of 
Sec. 2.6, i.e. calculate the eigenenergy of a 1D particle of mass m, localized in a very short potential 
well of “weight” w. 


4.32. The momentum representation of a certain operator of 1D orbital motion is p’. Find its 
coordinate representation. 


4.33.* For a particle moving in a 3D periodic potential, develop the bra-ket formalism for the q-representation, in which a complex amplitude similar to a_q in Eq. (2.234) (but generalized to 3D and all energy bands) plays the role of the wavefunction. In particular, calculate the operators r̂ and v̂ in this representation, and use the result to prove Eq. (2.237) for the 1D case in the low-field limit.


4.34. A uniform, time-independent magnetic field ℬ is induced in one semi-space, while the other semi-space is field-free, with a sharp, plane boundary between these two regions. A monochromatic beam of non-relativistic, electrically-neutral spin-½ particles with a gyromagnetic ratio γ ≠ 0,⁵⁶ in a certain spin state and with a kinetic energy E, is incident on this boundary, from the field-free side, under angle θ (see figure on the right). Calculate the coefficient of particle reflection from the boundary.


⁵⁶ The fact that γ may be different from zero even for electrically-neutral particles, such as neutrons, is explained by the Standard Model of the elementary particles, in which a neutron “consists” (in a broad sense of the word) of three electrically-charged quarks with zero net charge.
Chapter 5. Some Exactly Solvable Problems
The objective of this chapter is to describe several relatively simple but very important applications of the bra-ket formalism, including a few core problems of wave mechanics we have already started to discuss in Chapters 2 and 3.


5.1. Two-level systems 


The discussion of the bra-ket formalism in the previous chapter was peppered with numerous illustrations of its main concepts on the example of “spins-½”: systems with the smallest non-trivial (two-dimensional) Hilbert space, in which the bra- and ket-vectors of an arbitrary quantum state α may be represented as a linear superposition of just two basis vectors, for example
$$|\alpha\rangle = \alpha_\uparrow|\uparrow\rangle + \alpha_\downarrow|\downarrow\rangle, \qquad (5.1)$$


where the states ↑ and ↓ were defined as the eigenstates of the Pauli matrix σ_z; see Eq. (4.105). For genuine spin-½ particles, such as electrons, placed in a z-oriented time-independent magnetic field, these states are the “spin-up” and “spin-down” stationary states of the Pauli Hamiltonian (4.163), with the corresponding two energy levels (4.167). However, an approximate but reasonable quantum description of some other important systems may also be given in such a Hilbert space.


For example, as was discussed in Sec. 2.6, two weakly coupled space-localized orbital states of a spin-free particle are sufficient for an approximate description of its quantum oscillations between two potential wells. A similar coupling of two traveling waves explains the energy band splitting in the weak-potential approximation of the band theory (Sec. 2.7). As will be shown in the next chapter, in systems with time-independent Hamiltonians, such a situation almost unavoidably appears each time when two energy levels are much closer to each other than to other levels. Moreover, as will be shown in Sec. 6.5, a similar truncated description is adequate even in cases when two levels E_n and E_n' of an unperturbed system are not close to each other, but the corresponding states become coupled by an applied ac field of a frequency ω very close to the difference (E_n' − E_n)/ħ. Such two-level systems (alternatively called “spin-½-like” systems) are nowadays the focus of additional attention in view of the prospects of their use for quantum information processing and encryption.¹ This is why I will spend a bit more time reviewing the main properties of an arbitrary two-level system.


First, the most general form of the Hamiltonian of a two-level system is represented, in an arbitrary basis, by a 2×2 matrix
$$H = \begin{pmatrix} H_{11} & H_{12}\\ H_{21} & H_{22}\end{pmatrix}. \qquad (5.2)$$


According to the discussion in Secs. 4.3-4.5, since the Hamiltonian operator has to be Hermitian, the diagonal elements of the matrix H have to be real, and its off-diagonal elements have to be complex conjugates of each other: H_21 = H_12*. As a result, we may not only represent H as a linear combination (4.106) of the identity matrix and the Pauli matrices but also reduce it to a more specific form:
$$H = bI + \mathbf c\cdot\boldsymbol\sigma \equiv \begin{pmatrix} b + c_z & c_-\\ c_+ & b - c_z\end{pmatrix}, \qquad \text{with } c_\pm \equiv c_x \pm ic_y, \qquad (5.3)$$
where the scalar b and the Cartesian components of the vector c are real c-number coefficients:
$$b = \frac{H_{11}+H_{22}}{2}, \qquad c_x = \frac{H_{21}+H_{12}}{2} = \mathrm{Re}\,H_{21}, \qquad c_y = \frac{H_{21}-H_{12}}{2i} = \mathrm{Im}\,H_{21}, \qquad c_z = \frac{H_{11}-H_{22}}{2}. \qquad (5.4)$$

¹ In the last context, to be discussed in Sec. 8.5, the two-level systems are usually called qubits.
If such a Hamiltonian does not depend on time, the corresponding characteristic equation (4.103) for the system's energy levels E,
$$\begin{vmatrix} b + c_z - E & c_-\\ c_+ & b - c_z - E\end{vmatrix} = 0, \qquad (5.5)$$


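is a simple quadratic equation, with the following solutions:
$$E_\pm = b \pm c = b \pm \left(c_x^2 + c_y^2 + c_z^2\right)^{1/2} = b \pm \left(c_z^2 + c_-c_+\right)^{1/2} = \frac{H_{11}+H_{22}}{2} \pm \left[\left(\frac{H_{11}-H_{22}}{2}\right)^2 + H_{12}H_{21}\right]^{1/2}. \qquad (5.6)$$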


The parameter b = (H_11 + H_22)/2 evidently gives the average energy of the system, which does not contribute to the level splitting
$$\Delta E \equiv E_+ - E_- = 2c = 2\left(c_x^2 + c_y^2 + c_z^2\right)^{1/2} = \left[(H_{11}-H_{22})^2 + 4H_{12}H_{21}\right]^{1/2}. \qquad (5.7)$$
So, the splitting is a hyperbolic function of the coefficient c_z = (H_11 − H_22)/2. A plot of this function is the famous level-anticrossing diagram (Fig. 1), which has already been discussed in Sec. 2.7 in the particular context of the weak-potential limit of the 1D band theory.
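The recipe of Eqs. (4)-(7) is easy to code. The sketch below (function names hypothetical, test values arbitrary) extracts b and c from a Hermitian 2×2 matrix, computes the levels E_± = b ± c, and checks them against the characteristic equation (5) written directly via the matrix elements, det(H − EI) = (H_11 − E)(H_22 − E) − |H_12|².

```python
import math


def two_level_parameters(h11, h22, h12):
    """Coefficients of Eq. (5.4) for a Hermitian 2x2 Hamiltonian.

    h11 and h22 are real; h12 is complex, and H21 = conj(H12) by Hermiticity.
    Returns b and the real vector c = (cx, cy, cz).
    """
    h21 = complex(h12).conjugate()
    b = (h11 + h22) / 2
    c = (h21.real, h21.imag, (h11 - h22) / 2)
    return b, c


def energy_levels(h11, h22, h12):
    """E_minus and E_plus from Eq. (5.6): E = b -/+ |c|."""
    b, (cx, cy, cz) = two_level_parameters(h11, h22, h12)
    c = math.sqrt(cx**2 + cy**2 + cz**2)
    return b - c, b + c


def characteristic(h11, h22, h12, energy):
    """det(H - E I), which should vanish at each eigenvalue, cf. Eq. (5.5)."""
    return (h11 - energy) * (h22 - energy) - abs(h12) ** 2


h11, h22, h12 = 1.0, -0.5, 0.3 + 0.4j
e_minus, e_plus = energy_levels(h11, h22, h12)
print(e_minus, e_plus, e_plus - e_minus)

# At exact symmetry (h11 == h22) the splitting reduces to 2|H12|, cf. Eq. (5.7)
e_m0, e_p0 = energy_levels(0.2, 0.2, h12)
print(e_p0 - e_m0)  # 2 * |0.3 + 0.4j| = 1.0, up to rounding
```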


Fig. 5.1. The level-anticrossing diagram 
for an arbitrary two-level system. 


The physics of the diagram becomes especially clear if the two states of the basis used to spell out the matrix (2) may be interpreted as the stationary states of two potentially independent subsystems, with the energies, respectively, H_11 and H_22. (For example, in the case of two weakly coupled potential wells discussed in Sec. 2.6, these are the ground-state energies of two distant wells.) Then the off-diagonal elements c_− = H_12 and c_+ = H_21 = H_12* describe the subsystem coupling, and the level-anticrossing diagram shows how the eigenenergies of the coupled system depend (at fixed coupling) on the difference of the subsystem energies. As was already discussed in Sec. 2.7, the most striking feature of the diagram is that any non-zero coupling |c_±| = (c_x² + c_y²)^{1/2} changes the topology of the eigenstate energies, creating a gap of the width ΔE.


As follows from our discussions of particular two-level systems in Secs. 2.6 and 4.6, their dynamics also has a general feature: quantum oscillations. Namely, if we put any two-level system into any initial state different from one of its eigenstates ±, and then let it evolve on its own, the probability of finding the system in any of the “partial” states exhibits oscillations with the frequency
$$\Omega = \frac{E_+ - E_-}{\hbar} = \frac{2c}{\hbar}, \qquad (5.8)$$
which is lowest at the exact subsystem symmetry (c_z = 0, i.e. H_11 = H_22), when it is proportional to the coupling strength: Ω_min = 2|c_±|/ħ = 2|H_12|/ħ = 2|H_21|/ħ.


In the case discussed in Sec. 2.6, these are the oscillations of a particle between the two coupled potential wells (or rather of the probabilities to find it in either well); see, e.g., Eqs. (2.181). On the other hand, for a spin-½ particle in an external magnetic field, these oscillations take the form of spin precession in the plane normal to the field, with periodic oscillations of its Cartesian components (or rather their expectation values); see, e.g., Eqs. (4.173)-(4.174). Some other examples of the quantum oscillations in two-level systems may be rather unexpected; for example, the ammonia molecule NH₃ (Fig. 2) has two symmetric states that differ by the inversion of the nitrogen atom relative to the plane of the three hydrogen atoms, which are weakly coupled due to quantum-mechanical tunneling of the nitrogen atom through the plane of the hydrogen atoms.² Since for this particular molecule, in the absence of external fields, the level splitting ΔE corresponds to an experimentally convenient frequency Ω/2π ≈ 24 GHz, it played an important historic role in the initial development of the atomic frequency standards and microwave quantum generators (masers) in the early 1950s,³ which paved the way toward laser technology.
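The oscillation frequency (8) can be observed in a direct numerical solution of the Schrödinger equation for the Hamiltonian (3) with b = 0. The sketch below is an illustration only: it assumes ħ = 1, an arbitrary real vector c, the initial state ↑, and a fourth-order Runge-Kutta integrator (chosen for simplicity, not taken from the text); its function names are hypothetical. For this setup, the probability of the state ↓ should oscillate as [(c_x² + c_y²)/c²] sin²(ct/ħ), i.e. with the frequency Ω = 2c/ħ.

```python
import math

HBAR = 1.0  # assumption: natural units


def rhs(c, a):
    """Right-hand side of i hbar da/dt = H a for H = c.sigma, i.e. Eq. (5.3) with b = 0."""
    cx, cy, cz = c
    up, down = a
    h_up = cz * up + complex(cx, -cy) * down    # first row:  (c_z, c_-)
    h_down = complex(cx, cy) * up - cz * down   # second row: (c_+, -c_z)
    return (-1j * h_up / HBAR, -1j * h_down / HBAR)


def step(a, k, h):
    """Return a + h * k for two-component tuples."""
    return tuple(ai + h * ki for ai, ki in zip(a, k))


def evolve(c, a0, t_end, steps=4000):
    """Fourth-order Runge-Kutta integration of the Schrodinger equation."""
    dt = t_end / steps
    a = a0
    for _ in range(steps):
        k1 = rhs(c, a)
        k2 = rhs(c, step(a, k1, dt / 2))
        k3 = rhs(c, step(a, k2, dt / 2))
        k4 = rhs(c, step(a, k3, dt))
        a = tuple(ai + dt / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
                  for ai, q1, q2, q3, q4 in zip(a, k1, k2, k3, k4))
    return a


c = (0.3, 0.0, 0.4)                                   # assumed values; |c| = 0.5
omega = 2 * math.sqrt(sum(ci**2 for ci in c)) / HBAR  # Eq. (5.8): Omega = 2c/hbar = 1
period = 2 * math.pi / omega


def p_down(t):
    """Probability of the state 'down' at time t, starting from 'up' at t = 0."""
    return abs(evolve(c, (1 + 0j, 0j), t)[1]) ** 2


print(p_down(period / 2), p_down(period))  # ~ (cx^2 + cy^2)/c^2 = 0.36, and ~ 0
```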


Fig. 5.2. An ammonia molecule and its inversion. 


Now let us discuss a very convenient geometric representation of an arbitrary state α of (any!) two-level system. As Eq. (1) shows, such a state is completely described by two complex


2 Since the hydrogen atoms are much lighter, it would be fairer to speak about the tunneling of their triangle 
around the (nearly immobile) nitrogen atom. 
3 In particular, these molecules were used in the demonstration of the first maser by C. Townes’ group in 1954. 
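coefficients (c-numbers), say α_↑ and α_↓. If the vectors of the basis states ↑ and ↓ are normalized, then these coefficients must obey the following restriction:
$$W_\alpha \equiv \langle\alpha|\alpha\rangle = \left(\langle\uparrow|\alpha_\uparrow^* + \langle\downarrow|\alpha_\downarrow^*\right)\left(\alpha_\uparrow|\uparrow\rangle + \alpha_\downarrow|\downarrow\rangle\right) = \alpha_\uparrow^*\alpha_\uparrow + \alpha_\downarrow^*\alpha_\downarrow \equiv |\alpha_\uparrow|^2 + |\alpha_\downarrow|^2 = 1. \qquad (5.9)$$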
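This requirement is automatically satisfied if we take the moduli of α_↑ and α_↓ equal to the cosine and sine of the same (real) angle. Thus we may write, for example,
$$\alpha_\uparrow = \cos\frac{\theta}{2}\,e^{i\gamma}, \qquad \alpha_\downarrow = \sin\frac{\theta}{2}\,e^{i(\gamma+\varphi)}. \qquad (5.10)$$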


Moreover, according to the general Eq. (4.125), if we deal with just one system,⁴ the common phase factor exp{iγ} drops out of the calculation of any expectation value, so that we may take γ = 0, and Eq. (10) is reduced to
$$\alpha_\uparrow = \cos\frac{\theta}{2}, \qquad \alpha_\downarrow = \sin\frac{\theta}{2}\,e^{i\varphi}. \qquad (5.11)$$


The reason why the argument of these sine and cosine functions is usually taken in the form θ/2 becomes clear from Fig. 3a: Eq. (11) conveniently maps each state α of a two-level system on a certain representation point on a unit-radius Bloch sphere,⁵ with the polar angle θ and the azimuthal angle φ.


Fig. 5.3. The Bloch sphere: (a) the representation of an arbitrary state (solid red point) and the eigenstates of the Pauli matrices (dotted points), and (b, c) the two-level system's evolution: (b) in a constant “field” c directed along the z-axis, and (c) in a field of arbitrary orientation.


In particular, the basis state ↑, described by Eq. (1) with α_↑ = 1 and α_↓ = 0, corresponds to the North Pole of the sphere (θ = 0), while the opposite state ↓, with α_↑ = 0 and α_↓ = 1, corresponds to its South Pole (θ = π). Similarly, the eigenstates → and ← of the matrix σ_x, described by Eqs. (4.122), i.e. having α_↑ =


4 If you need a reminder of why this condition is crucial, please revisit the discussion at the end of Sec. 1.6. Note 
also that the mutual phase shifts between different qubits are important, in particular, for quantum information 
processing (see Sec. 8.5 below), so that most discussions of these applications have to start from Eq. (10) rather 
than Eq. (11). 

⁵ This representation was suggested in 1946 by the same Felix Bloch who pioneered the energy band theory discussed in Chapters 2-3.
1/√2 and α_↓ = ±1/√2, correspond to the equator (θ = π/2) points with, respectively, φ = 0 and φ = π. Two more special points (denoted in Fig. 3a as ⊙ and ⊗) are also located on the sphere's equator, at θ = π/2 and φ = ±π/2; it is easy to check that they correspond to the eigenstates of the matrix σ_y (in the same z-basis).

To understand why such a mutually perpendicular location of these three special point pairs on the Bloch sphere is not accidental, let us plug Eqs. (11) into Eqs. (4.131)-(4.133) for the expectation values of the spin-½ components. In terms of the Pauli vector operator (4.117), σ̂ = Ŝ/(ħ/2), the result is
$$\langle\sigma_x\rangle = \sin\theta\cos\varphi, \qquad \langle\sigma_y\rangle = \sin\theta\sin\varphi, \qquad \langle\sigma_z\rangle = \cos\theta, \qquad (5.12)$$


showing that the radius vector of any representation point is just the expectation value of o. 
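Eq. (12) may be verified by computing the expectation values ⟨α|σ_j|α⟩ directly from the Pauli matrices for the amplitudes (11). A short sketch follows; the dictionary and function names are illustrative, and the matrices are taken in the z-basis of Eq. (4.105).

```python
import cmath
import math

# Pauli matrices in the z-basis, cf. Eq. (4.105)
PAULI = {
    "x": ((0, 1), (1, 0)),
    "y": ((0, -1j), (1j, 0)),
    "z": ((1, 0), (0, -1)),
}


def amplitudes(theta, phi):
    """Eq. (5.11): alpha_up = cos(theta/2), alpha_down = sin(theta/2) exp(i phi)."""
    return (complex(math.cos(theta / 2)), math.sin(theta / 2) * cmath.exp(1j * phi))


def expectation(matrix, state):
    """<state| matrix |state> for a normalized two-component state."""
    return sum(state[i].conjugate() * matrix[i][j] * state[j]
               for i in range(2) for j in range(2))


def bloch_vector(theta, phi):
    """(<sigma_x>, <sigma_y>, <sigma_z>); imaginary parts vanish for Hermitian matrices."""
    state = amplitudes(theta, phi)
    return tuple(expectation(PAULI[axis], state).real for axis in "xyz")


print(bloch_vector(math.pi / 2, math.pi / 2))  # an eigenstate of sigma_y: ~ (0, 1, 0)
```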


Now let us use Eq. (3) to see how the representation point moves in various cases, ignoring the term bI, which, again, describes the offset of the total energy of the system relative to some reference level, and does not affect its dynamics. First of all, according to Eq. (4.158), in the case c = 0 (when the Hamiltonian operator turns to zero, and hence the state vectors do not depend on time) the point does not move at all, and its position is determined by initial conditions, i.e. by the system's preparation. If c ≠ 0, we may re-use some results of Sec. 4.6, obtained for the Pauli Hamiltonian (4.163a), which coincides with Eq. (3) if⁶
$$\mathbf c = -\gamma\frac{\hbar}{2}\boldsymbol{\mathscr B}. \qquad (5.13)$$
In particular, if the field ℬ, and hence the vector c, is directed along the z-axis and is time-independent, Eqs. (4.170) and (4.173)-(4.174) show that the representation point ⟨σ⟩ on the Bloch sphere rotates within a plane normal to this axis (see Fig. 3b) with the angular velocity
$$\frac{d\varphi}{dt} = \Omega \equiv \frac{2c_z}{\hbar}. \qquad (5.14)$$
Almost evidently, since the selection of the coordinate axes is arbitrary, this picture should remain valid for any orientation of the vector c, with the representation point rotating, on the Bloch sphere, around its direction, with the angular speed |Ω| = 2c/ħ (see Fig. 3c). This fact may be proved using any picture of the quantum dynamics discussed in Sec. 4.6. Actually, the reader may already have done that by solving Problems 4.25 and 4.26, just to see that even for the particular, simple initial state ↑ of the system, the final results for the Cartesian components of the vector ⟨σ⟩ are somewhat bulky. However, this description may be readily simplified, even for arbitrary time dependence of the “field” vector c(t) in Eq. (3), using the (geometric) vector language.


Indeed, let us rewrite Eq. (3) (again, with b = 0) in the operator form,
$$\hat H = \mathbf c(t)\cdot\hat{\boldsymbol\sigma}, \qquad (5.15)$$
valid in an arbitrary basis. According to Eq. (4.199), the corresponding Heisenberg equation of motion for the j-th Cartesian component of the vector-operator σ̂ (which does not depend on time explicitly, so that ∂σ̂_j/∂t = 0) is


⁶ This correspondence justifies the use of the term “field” for the vector c.
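$$i\hbar\dot{\hat\sigma}_j = \left[\hat\sigma_j, \hat H\right] = \left[\hat\sigma_j, \sum_{j'=1}^{3}c_{j'}(t)\,\hat\sigma_{j'}\right] = \sum_{j'=1}^{3}c_{j'}(t)\left[\hat\sigma_j, \hat\sigma_{j'}\right]. \qquad (5.16)$$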


Now using the commutation relations (4.155), which remain valid in any basis and in any picture of time evolution,⁷ we get
$$i\hbar\dot{\hat\sigma}_j = 2i\sum_{j'=1}^{3}c_{j'}(t)\,\hat\sigma_{j''}\,\varepsilon_{jj'j''}, \qquad (5.17)$$


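where j'' is the index, of the same set {1, 2, 3}, complementary to j and j' (j'' ≠ j, j'), and ε_{jj'j''} is the Levi-Civita symbol.⁸ But it is straightforward to verify that the usual vector product of two 3D vectors may be represented in a similar Cartesian-component form:
$$(\mathbf a\times\mathbf b)_j = \begin{vmatrix}\mathbf n_1 & \mathbf n_2 & \mathbf n_3\\ a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\end{vmatrix}_j = \sum_{j'=1}^{3}a_{j'}\,b_{j''}\,\varepsilon_{jj'j''}. \qquad (5.18)$$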
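The step from Eq. (16) to Eqs. (17)-(18) may be checked numerically: for any real vector c, the matrix Σ_j' c_j'[σ_j, σ_j'] should coincide with 2i(c × σ)_j, where the j-th component of the vector product is assembled with the Levi-Civita symbol as in Eq. (18). A minimal sketch with plain nested tuples (no linear-algebra library assumed; the helper names are hypothetical):

```python
# Pauli matrices in the z-basis, cf. Eq. (4.105); indices 0, 1, 2 stand for x, y, z
PAULI = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)


def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def lin_comb(coeffs, mats):
    """sum_k coeffs[k] * mats[k] for 2x2 matrices."""
    return tuple(tuple(sum(c * m[i][j] for c, m in zip(coeffs, mats)) for j in range(2))
                 for i in range(2))


def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def commutator_form(c, j):
    """sum_j' c_j' [sigma_j, sigma_j'], the right-hand side of Eq. (5.16)."""
    comms = [lin_comb((1, -1), (matmul(PAULI[j], PAULI[jp]), matmul(PAULI[jp], PAULI[j])))
             for jp in range(3)]
    return lin_comb(c, comms)


def cross_product_form(c, j):
    """2i (c x sigma)_j, with the cross product written via Levi-Civita as in Eq. (5.18)."""
    coeffs = [2j * sum(levi_civita(j, jp, jpp) * c[jp] for jp in range(3)) for jpp in range(3)]
    return lin_comb(coeffs, PAULI)


c = (0.7, -1.3, 2.1)  # arbitrary real test vector (assumption)
for j in range(3):
    print(j, commutator_form(c, j), cross_product_form(c, j))
```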


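As a result, Eq. (17) may be rewritten in a vector form, or rather several equivalent forms:
$$i\hbar\dot{\hat\sigma}_j = 2i\left[\mathbf c(t)\times\hat{\boldsymbol\sigma}\right]_j, \quad \text{i.e.} \quad i\hbar\dot{\hat{\boldsymbol\sigma}} = 2i\,\mathbf c(t)\times\hat{\boldsymbol\sigma}, \quad \text{or} \quad \dot{\hat{\boldsymbol\sigma}} = \frac{2}{\hbar}\mathbf c(t)\times\hat{\boldsymbol\sigma}, \quad \text{or} \quad \dot{\hat{\boldsymbol\sigma}} = \boldsymbol\Omega(t)\times\hat{\boldsymbol\sigma}, \qquad (5.19)$$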


where the vector Ω is defined as
$$\hbar\boldsymbol\Omega(t) \equiv 2\mathbf c(t), \qquad (5.20)$$
which is an evident generalization of Eq. (14). As we have seen in Sec. 4.6, any linear relation between two Heisenberg operators is also valid for the expectation values of the corresponding observables, so that the last form of Eq. (19) yields:


$$\frac{d}{dt}\langle\boldsymbol\sigma\rangle = \boldsymbol\Omega(t)\times\langle\boldsymbol\sigma\rangle. \tag{5.21}$$


But this is the well-known kinematic formula¹⁰ for the rotation of a constant-length classical 3D vector $\langle\boldsymbol\sigma\rangle$ around the instantaneous direction of the vector $\boldsymbol\Omega(t)$, with the instantaneous angular velocity $\boldsymbol\Omega(t)$. So, the time evolution of the representation point on the Bloch sphere is quite simple, especially in the case of time-independent $\mathbf c$, and hence $\boldsymbol\Omega$ (see Fig. 3c).¹¹ Note that it is sufficient to turn off the field to stop the precession instantly. (Since Eq. (21) is a first-order differential equation, the


7 Indeed, if some three operators in the Schrödinger picture are related as $[\hat A_S, \hat B_S] = \hat C_S$, then according to Eq. (4.190), in the Heisenberg picture:

$$[\hat A_H, \hat B_H] = [\hat u^\dagger\hat A_S\hat u, \hat u^\dagger\hat B_S\hat u] = \hat u^\dagger\hat A_S\hat u\hat u^\dagger\hat B_S\hat u - \hat u^\dagger\hat B_S\hat u\hat u^\dagger\hat A_S\hat u = \hat u^\dagger[\hat A_S, \hat B_S]\hat u = \hat u^\dagger\hat C_S\hat u = \hat C_H.$$

8 See, e.g., MA Eq. (9.2). Note that in Eqs. (17)-(18) and similar expressions below, the condition $j'' \neq j, j'$ may be (and frequently is) replaced by the summation over not only $j'$, but also $j''$, in their right-hand sides.
9 It is also easy to verify that in the particular case $\boldsymbol\Omega = \Omega\mathbf n_z$, Eqs. (19) reduce, in the z-basis, to Eqs. (4.200) for the spin-½ vector matrix S = (ħ/2)σ.
10 See, e.g., CM Sec. 4.1, in particular Eq. (4.8).
11 The bulkiness of the solutions of Problems 4.25 and 4.26 (which were offered just as useful exercises in quantum dynamic formalisms) reflects the awkwardness of expressing the resulting circular motion of the vector ⟨σ⟩ (see Fig. 3c) via its Cartesian components.
representation point has no effective inertia.¹²) Hence, by changing the direction and the magnitude of the effective external field, it is possible to drive the representation point of a two-level system from any initial position to any final position on the Bloch sphere, i.e. to make the system take any of its possible quantum states.
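Eq. (21) is easy to integrate numerically for any assumed time dependence $\boldsymbol\Omega(t)$. The plain-Python sketch below uses the classic fourth-order Runge-Kutta method, with a hypothetical field profile passed in as a function. It illustrates two statements of the text. First, the length of $\langle\boldsymbol\sigma\rangle$ is conserved. Second, because the equation is of the first order, switching the field off stops the precession at once.

```python
import math


def cross(a, b):
    """Ordinary 3D vector product of two 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def precess(s0, omega_of_t, t_end, steps=20000):
    """Integrate d<sigma>/dt = Omega(t) x <sigma> (Eq. 5.21) with classic 4th-order Runge-Kutta.

    omega_of_t: callable t -> (Omega_x, Omega_y, Omega_z), an assumed field profile.
    """
    dt = t_end / steps

    def rhs(t, s):
        return cross(omega_of_t(t), s)

    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    s = tuple(s0)
    for i in range(steps):
        t = i * dt
        k1 = rhs(t, s)
        k2 = rhs(t + dt / 2, shifted(s, k1, dt / 2))
        k3 = rhs(t + dt / 2, shifted(s, k2, dt / 2))
        k4 = rhs(t + dt, shifted(s, k3, dt))
        s = tuple(si + dt / 6 * (a + 2 * b + 2 * c + d)
                  for si, a, b, c, d in zip(s, k1, k2, k3, k4))
    return s
```

Take a constant $\boldsymbol\Omega$ along the z-axis with unit magnitude. After a quarter period ($t = \pi/2$), the vector $\langle\boldsymbol\sigma\rangle = \mathbf n_x$ is turned into $\mathbf n_y$, which is the counterclockwise sense of rotation set by $\boldsymbol\Omega\times\langle\boldsymbol\sigma\rangle$. If the field is switched off at that moment, the vector stays at $\mathbf n_y$ afterwards.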


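In the particular case of a spin-½ in a magnetic field $\boldsymbol{\mathcal B}(t)$, it is more customary to use Eqs. (13) and (20) to rewrite Eq. (21) as the following equation for the expectation value of the spin vector S = (ħ/2)σ:

$$\frac{d}{dt}\langle\hat{\mathbf S}\rangle = \gamma\langle\hat{\mathbf S}\rangle\times\boldsymbol{\mathcal B}(t). \tag{5.22}$$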


As we know from the discussion in Chapter 4, such a classical description of the spin's evolution does not give a full picture of the quantum reality. In particular, it does not describe the possibly large uncertainties of its components (see, e.g., Eqs. (4.135)). The situation, however, is different for a collection of N >> 1 similar, non-interacting spins, initially prepared in the same state, for example by polarizing all spins with a strong external field $\mathcal B_0$ at relatively low temperatures T, with $k_B T \ll \gamma\mathcal B_0\hbar$. (A practically important example of such a collection is a set of nuclear spins in macroscopic condensed-matter samples, where the spins' interaction with each other and with the environment is typically very small.) For such a collection, Eq. (22) is still valid, while the relative uncertainty of the resulting sample's magnetization $\mathbf M = n\langle\mathbf m\rangle = n\gamma\langle\mathbf S\rangle$ (where n = N/V is the spin density) is proportional to $1/N^{1/2} \ll 1$. Thus, the evolution of magnetization may be described, with good precision, by the essentially classical equation (valid for any spin, not necessarily spin-½):

$$\dot{\mathbf M} = \gamma\mathbf M\times\boldsymbol{\mathcal B}(t). \tag{5.23}$$


This equation, or the equivalent set of three Bloch equations¹³ for its Cartesian components, with the right-hand side augmented with small terms describing the effects of dephasing and relaxation (to be discussed in Chapter 7), is used, in particular, to describe magnetic resonance, which takes place when the frequency (4.164) of the spin's precession in a strong dc magnetic field approaches the frequency of an additionally applied (and usually weak) ac field.¹⁴


5.2. The Ehrenfest theorem 
In Sec. 4.7, we have derived all the basic relations of wave mechanics from the bra-ket 
formalism, which will also enable us to get some important additional results in that area. One of them is 
a pair of very interesting relations, together called the Ehrenfest theorem. To derive them for the simplest case of 1D orbital motion, let us calculate the following commutator:

$$\left[\hat x, \hat p_x^2\right] = \hat x\hat p_x\hat p_x - \hat p_x\hat p_x\hat x. \tag{5.24}$$

Let us apply the commutation relation (4.238) in the following form:

$$\hat x\hat p_x = \hat p_x\hat x + i\hbar\hat I, \tag{5.25}$$


12 This is also true for the classical angular momentum L at its torque-induced precession; see, e.g., CM Sec. 4.5.
13 They were introduced by F. Bloch in the same 1946 paper as the Bloch-sphere representation.
14 The quantum theory of this effect will be discussed in the next chapter.

to the first term on the right-hand side of Eq. (24) twice, with the goal of moving the coordinate operator to the rightmost position:


$$\hat x\hat p_x\hat p_x = \left(\hat p_x\hat x + i\hbar\hat I\right)\hat p_x = \hat p_x\hat x\hat p_x + i\hbar\hat p_x = \hat p_x\left(\hat p_x\hat x + i\hbar\hat I\right) + i\hbar\hat p_x = \hat p_x\hat p_x\hat x + 2i\hbar\hat p_x. \tag{5.26}$$

The first term of this result cancels with the last term of Eq. (24), so that the commutator becomes quite simple:

$$\left[\hat x, \hat p_x^2\right] = 2i\hbar\hat p_x. \tag{5.27}$$

Let us use this equality to calculate the Heisenberg-picture equation of motion of the operator $\hat x$, by applying the general Heisenberg equation (4.199) to the 1D orbital motion described by the Hamiltonian (4.237), but possibly with a more general, time-dependent potential energy U:


$$\frac{d\hat x}{dt} = \frac{1}{i\hbar}\left[\hat x, \hat H\right] = \frac{1}{i\hbar}\left[\hat x, \frac{\hat p_x^2}{2m} + U(\hat x, t)\right]. \tag{5.28}$$

The potential energy operator is a function of the coordinate operator and hence, as we know, commutes with it. Thus, the right-hand side of Eq. (28) is proportional to the commutator (27), and we get

$$\frac{d\hat x}{dt} = \frac{\hat p_x}{m}. \tag{5.29}$$

In this operator equality, we readily recognize the full analog of the classical relation between the particle's momentum and its velocity.
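Eq. (27) can also be checked in the coordinate representation, where $\hat p_x$ acts as $-i\hbar\, d/dx$ (Sec. 4.7). The minimal Python sketch below restricts the test functions to polynomials stored as coefficient lists, so that every operation is exact up to rounding. The helper names are hypothetical.

```python
def pad(c, n):
    """Extend a coefficient list with zeros to length n."""
    return list(c) + [0j] * (n - len(c))


def sub(a, b):
    """Difference of two polynomials given as coefficient lists."""
    n = max(len(a), len(b))
    return [u - v for u, v in zip(pad(a, n), pad(b, n))]


def same_poly(a, b, tol=1e-12):
    n = max(len(a), len(b))
    return all(abs(u - v) < tol for u, v in zip(pad(a, n), pad(b, n)))


def x_op(c):
    """Multiply the polynomial sum_k c[k] x**k by x."""
    return [0j] + list(c)


def p_op(c, hbar=1.0):
    """Apply the momentum operator -i*hbar*d/dx to a polynomial."""
    return [-1j * hbar * k * c[k] for k in range(1, len(c))] or [0j]


def commutator_x_p2(c, hbar=1.0):
    """[x, p**2] f = x p p f - p p x f, cf. Eq. (5.24)."""
    return sub(x_op(p_op(p_op(c, hbar), hbar)), p_op(p_op(x_op(c), hbar), hbar))
```

For example, with f(x) = 3 + 2x − x³ + x⁵/2, the list returned by `commutator_x_p2` coincides with $2i\hbar\hat p_x f = 2\hbar^2 f'$, as Eq. (27) requires.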


Now let us see what a similar procedure gives for the momentum's derivative:

$$\frac{d\hat p_x}{dt} = \frac{1}{i\hbar}\left[\hat p_x, \hat H\right] = \frac{1}{i\hbar}\left[\hat p_x, \frac{\hat p_x^2}{2m} + U(\hat x, t)\right]. \tag{5.30}$$

The kinetic energy operator commutes with the momentum operator and hence drops out of the right-hand side of this equation. To calculate the remaining commutator of the momentum and the potential energy, let us use the fact that any sufficiently smooth (analytic) function may be represented by its Taylor expansion:

$$U(\hat x, t) = \sum_{k=0}^{\infty}\frac{1}{k!}\frac{\partial^k U}{\partial x^k}\hat x^k, \tag{5.31}$$


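where the derivatives of U may be understood as c-numbers (evaluated at x = 0 and the given time t), so that we may write

$$\left[\hat p_x, U(\hat x, t)\right] = \sum_{k=0}^{\infty}\frac{1}{k!}\frac{\partial^k U}{\partial x^k}\left[\hat p_x, \hat x^k\right] = \sum_{k=0}^{\infty}\frac{1}{k!}\frac{\partial^k U}{\partial x^k}\Big(\hat p_x\underbrace{\hat x\hat x\ldots\hat x}_{k\ \text{times}} - \underbrace{\hat x\hat x\ldots\hat x}_{k\ \text{times}}\hat p_x\Big). \tag{5.32a}$$

Applying Eq. (25) k times to the last term in the parentheses, exactly as we did in Eq. (26), we get

$$\left[\hat p_x, U(\hat x, t)\right] = \sum_{k=1}^{\infty}\frac{1}{k!}\frac{\partial^k U}{\partial x^k}\left(-ik\hbar\,\hat x^{k-1}\right) = -i\hbar\sum_{k=1}^{\infty}\frac{1}{(k-1)!}\frac{\partial^k U}{\partial x^k}\hat x^{k-1}. \tag{5.32b}$$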


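But the last sum is just the Taylor expansion of the derivative ∂U/∂x. Indeed,

$$\frac{\partial U}{\partial x} = \sum_{k'=0}^{\infty}\frac{1}{k'!}\frac{\partial^{k'}}{\partial x^{k'}}\left(\frac{\partial U}{\partial x}\right)x^{k'} = \sum_{k'=0}^{\infty}\frac{1}{k'!}\frac{\partial^{k'+1}U}{\partial x^{k'+1}}x^{k'} = \sum_{k=1}^{\infty}\frac{1}{(k-1)!}\frac{\partial^{k}U}{\partial x^{k}}x^{k-1}, \tag{5.33}$$

where at the last step the summation index was changed from k' to k − 1. As a result, we may rewrite Eq. (5.32b) as

$$\left[\hat p_x, U(\hat x, t)\right] = -i\hbar\frac{\partial U(\hat x, t)}{\partial x}, \tag{5.34}$$

so that Eq. (30) yields:

$$\frac{d\hat p_x}{dt} = -\frac{\partial U(\hat x, t)}{\partial x}. \tag{5.35}$$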
This equation also coincides with the classical equation of motion! Moreover, averaging Eqs. (29) and (35) over the initial state (as Eq. (4.191) prescribes), we get similar results for the expectation values:¹⁵


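$$\frac{d\langle x\rangle}{dt} = \frac{\langle p_x\rangle}{m}, \qquad \frac{d\langle p_x\rangle}{dt} = -\left\langle\frac{\partial U}{\partial x}\right\rangle. \tag{5.36}$$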
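The polynomial bookkeeping used for Eq. (27) can also illustrate Eq. (34). A polynomial U(x) is a Taylor series (31) that terminates, so for it the sum (32b) is finite, and the identity $[\hat p_x, U]f = -i\hbar(\partial U/\partial x)f$ can be tested directly in the coordinate representation. The minimal Python sketch below uses an arbitrary sample potential, x²/2 + x⁴/10, which is an assumption made only for this check.

```python
def pad(c, n):
    return list(c) + [0j] * (n - len(c))


def sub(a, b):
    n = max(len(a), len(b))
    return [u - v for u, v in zip(pad(a, n), pad(b, n))]


def same_poly(a, b, tol=1e-12):
    n = max(len(a), len(b))
    return all(abs(u - v) < tol for u, v in zip(pad(a, n), pad(b, n)))


def deriv(c):
    """d/dx of the polynomial sum_k c[k] x**k."""
    return [k * c[k] for k in range(1, len(c))] or [0]


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            out[i + k] += ai * bk
    return out


def p_op(c, hbar=1.0):
    """Momentum operator -i*hbar*d/dx in the coordinate representation."""
    return [-1j * hbar * d for d in deriv(c)]


def commutator_p_U(U, f, hbar=1.0):
    """[p, U(x)] f = p(U f) - U (p f) for a polynomial U (a truncated Taylor series)."""
    return sub(p_op(poly_mul(U, f), hbar), poly_mul(U, p_op(f, hbar)))
```

Comparing `commutator_p_U(U, f)` with `-1j * hbar` times the product of `deriv(U)` and `f` reproduces Eq. (34) term by term.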


However, it is important to remember that the equivalence between these quantum-mechanical 
equations and similar equations of classical mechanics is superficial, and the degree of the similarity 
between the two mechanics very much depends on the problem. As one extreme, let us consider the case 
when a particle's state, at any moment between t₀ and t, may be accurately represented by one relatively narrow wave packet. Then we may interpret Eqs. (36) as the equations of the essentially classical motion of the wave packet's center, in accordance with the correspondence principle. However, even in this case, it is important to remember the purely quantum-mechanical effects of the non-zero wave packet width and its spread in time, which were discussed in Sec. 2.2.
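For a quadratic potential, ∂U/∂x is linear in x, so $\langle\partial U/\partial x\rangle$ equals ∂U/∂x taken at $x = \langle x\rangle$. Eqs. (36) then close into the classical equations of motion, so a packet launched from rest at x = a follows $\langle x\rangle = a\cos\omega_0 t$ and $\langle p_x\rangle = -m\omega_0 a\sin\omega_0 t$. The Python sketch below checks this by evolving a displaced Gaussian packet numerically. It assumes NumPy and units with ħ = m = ω₀ = 1. It integrates the Schrödinger equation with the split-step Fourier method, a standard integrator that is not discussed in this text, and the grid parameters are arbitrary choices.

```python
import numpy as np


def harmonic_packet_means(shift=3.0, t_end=np.pi / 2, dt=1e-3, n=1024, length=40.0):
    """Evolve a displaced Gaussian in U = x**2/2 (units hbar = m = omega0 = 1)
    with the split-step Fourier method; return (t, <x>, <p>)."""
    x = np.linspace(-length / 2, length / 2, n, endpoint=False)
    dx = x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)          # wave numbers; p = hbar * k
    psi = np.exp(-(x - shift) ** 2 / 2).astype(complex)
    psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)   # normalize on the grid
    half_potential = np.exp(-0.5j * (x ** 2 / 2) * dt)
    kinetic = np.exp(-0.5j * k ** 2 * dt)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        psi = half_potential * psi                   # half step with U(x)
        psi = np.fft.ifft(kinetic * np.fft.fft(psi)) # full step with p**2/2m
        psi = half_potential * psi                   # second half step with U(x)
    mean_x = np.sum(x * np.abs(psi) ** 2) * dx
    weights = np.abs(np.fft.fft(psi)) ** 2
    mean_p = np.sum(k * weights) / np.sum(weights)
    return steps * dt, mean_x, mean_p
```

For an anharmonic U, ⟨∂U/∂x⟩ generally differs from ∂U/∂x at ⟨x⟩, and the same kind of simulation would show the packet's center drifting away from the classical trajectory.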


As an opposite extreme, let us revisit the “leaky” potential well discussed in Sec. 2.5 — see Fig. 
2.15. Since both the potential U(x) and the initial wavefunction of that system are symmetric relative to 
point x = 0 at all times, the right-hand sides of both Eqs. (36) identically equal zero. Of course, the result 
they predict (that the average values of the coordinate and the momentum stay equal to zero at all times) 
is correct, but this fact does not tell us much about the rich dynamics of the system: the finite lifetime of 
the metastable state, the formation of two wave packets, their waveform and propagation speed (see Fig. 
2.17), and about the insights the full solution gives for the quantum measurement theory and the 
system’s irreversibility. Another similar example is the energy band theory (Sec. 2.7), with its purely 
quantum effect of the allowed energy bands and forbidden energy gaps, of which Eqs. (36) give no clue. 


To summarize, the Ehrenfest theorem is important as an illustration of the correspondence 
principle, but its predictive power should not be exaggerated. 


5.3. The Feynman path integral 


As has been already mentioned, even within the realm of wave mechanics, the bra-ket language 
may simplify some calculations that would be very bulky using the notation used in Chapters 1-3. 
Probably the best example is the famous alternative, path-integral formulation of quantum mechanics.¹⁶


15 The equation set (36) constitutes the Ehrenfest theorem, named after its author, P. Ehrenfest.
16 This formulation was developed in 1948 by Richard Phillips Feynman. (According to his recollections, this work was motivated by a “mysterious” remark by P. Dirac in his pioneering 1930 textbook on quantum mechanics.)

I will review this important concept, cutting one math corner for the sake of brevity.¹⁷ (This shortcut will be clearly marked below.)


Let us inner-multiply both parts of Eq. (4.157a), which is essentially the definition of the time-evolution operator, by the bra-vector of state x,

$$\langle x|\alpha(t)\rangle = \langle x|\hat u(t, t_0)|\alpha(t_0)\rangle, \tag{5.37}$$

insert the identity operator before the ket-vector on the right-hand side, and then use the closure condition in the form of Eq. (4.252), with x' replaced with x₀:

$$\langle x|\alpha(t)\rangle = \int dx_0\,\langle x|\hat u(t, t_0)|x_0\rangle\langle x_0|\alpha(t_0)\rangle. \tag{5.38}$$

According to Eq. (4.233), this equality may be represented as

$$\Psi_\alpha(x, t) = \int dx_0\,\langle x|\hat u(t, t_0)|x_0\rangle\,\Psi_\alpha(x_0, t_0). \tag{5.39}$$


Comparing this expression with Eq. (2.44), we see that the long bracket in this relation is nothing other than the 1D propagator, which was discussed in Sec. 2.2, i.e.

$$G(x, t; x_0, t_0) = \langle x|\hat u(t, t_0)|x_0\rangle. \tag{5.40}$$
Let me hope that the reader sees that this equality corresponds to the physical sense of the propagator. 


Now let us break the time segment [t₀, t] into N (for the time being, not necessarily equal) parts, by inserting (N − 1) intermediate points (Fig. 4) with

$$t_0 < t_1 < \ldots < t_{N-1} < t, \tag{5.41}$$

and use the definition (4.157) of the time evolution operator to write

$$\hat u(t, t_0) = \hat u(t, t_{N-1})\,\hat u(t_{N-1}, t_{N-2})\ldots\hat u(t_2, t_1)\,\hat u(t_1, t_0). \tag{5.42}$$

After plugging Eq. (42) into Eq. (40), let us insert the identity operator, again in the closure form (4.252) but written for $x_k$ rather than x', between each two partial evolution operators including the time argument $t_k$. The result is

$$G(x, t; x_0, t_0) = \int dx_{N-1}\int dx_{N-2}\ldots\int dx_1\,\langle x|\hat u(t, t_{N-1})|x_{N-1}\rangle\langle x_{N-1}|\hat u(t_{N-1}, t_{N-2})|x_{N-2}\rangle\ldots\langle x_1|\hat u(t_1, t_0)|x_0\rangle. \tag{5.43}$$


The physical sense of each integration variable $x_k$ is the wavefunction's argument at time $t_k$ (see Fig. 4).
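The chain (43) rests on inserting complete sets of position states between partial evolution operators. For a free particle, this means that integrating the product of short-time propagators over the intermediate coordinates must reproduce the propagator for the total time. Testing this directly with the oscillating kernel (2.49) is numerically awkward. The Python sketch below therefore uses its imaginary-time analog ($t \to -i\tau$), a normalized Gaussian, for which the same composition rule holds. This substitution, the NumPy dependency, and the integration grid are assumptions of the sketch and are not part of the derivation.

```python
import numpy as np


def euclidean_free_kernel(x, x0, tau, m=1.0, hbar=1.0):
    """Imaginary-time free-particle propagator (t -> -i*tau in Eq. 2.49): a normalized Gaussian."""
    return np.sqrt(m / (2 * np.pi * hbar * tau)) * np.exp(-m * (x - x0) ** 2 / (2 * hbar * tau))


def composed_kernel(x, x0, taus, grid):
    """Chain len(taus) short-time kernels, integrating over each intermediate x_k on `grid`.

    Each integral over dx_k in (the analog of) Eq. (5.43) is replaced by a Riemann sum.
    """
    if len(taus) < 2:
        raise ValueError("need at least two time slices")
    dx = grid[1] - grid[0]
    amp = euclidean_free_kernel(grid, x0, taus[0])           # kernel from x_0 to x_1
    for tau in taus[1:-1]:                                    # integrate over x_1, x_2, ...
        kernel = euclidean_free_kernel(grid[:, None], grid[None, :], tau)
        amp = kernel @ amp * dx
    return float(euclidean_free_kernel(x, grid, taus[-1]) @ amp * dx)
```

Whether the slices are equal (four slices of 0.25) or unequal (0.1, 0.5, 0.4), the chained result agrees with the single kernel for the total interval τ = 1 to high accuracy.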


Fig. 5.4. Time partition and coordinate 
notation at the initial stage of the 
Feynman path integral’s derivation. 


17 A more thorough discussion of the path-integral approach may be found in the famous text by R. Feynman and A. Hibbs, Quantum Mechanics and Path Integrals, first published in 1965. (For its latest edition by Dover in 2010, the book was emended by D. Styer.) For a more recent monograph, which reviews more applications, see L. Schulman, Techniques and Applications of Path Integration, Wiley, 1981.
Feynman's key breakthrough was the realization that if all intervals are taken similar and sufficiently small, $t_k - t_{k-1} = d\tau \to 0$, all the partial brackets participating in Eq. (43) may be expressed via the free particle's propagator, given by Eq. (2.49), even if the particle is not free but moves in a stationary potential profile U(x). To show that, let us use either Eq. (4.175) or Eq. (4.181), which, for a small time interval dτ, give the same result:

$$\hat u(\tau + d\tau, \tau) = \exp\left\{-\frac{i}{\hbar}\hat H d\tau\right\} = \exp\left\{-\frac{i}{\hbar}\left[\frac{\hat p^2}{2m}d\tau + U(\hat x)d\tau\right]\right\}. \tag{5.44}$$


Generally, an exponent of a sum of two operators may be treated as that of c-number arguments, and in particular factored into a product of two exponents, only if the operators commute. (In this case, we can use all the standard algebra for the exponents of c-number arguments.) In our case, this is not so, because the operator $\hat p^2/2m$ does not commute with $\hat x$, and hence with $U(\hat x)$. However, it may be shown¹⁸ that for an infinitesimal time interval dτ, the non-zero commutator

$$\left[\frac{\hat p^2}{2m}d\tau,\; U(\hat x)d\tau\right] \neq 0, \tag{5.45}$$

which is proportional to (dτ)², may be ignored in the first, linear approximation in dτ. As a result, we may factorize the right-hand side of Eq. (44) by writing

$$\hat u(\tau + d\tau, \tau)\Big|_{d\tau\to 0} \to \exp\left\{-\frac{i}{\hbar}\frac{\hat p^2}{2m}d\tau\right\}\exp\left\{-\frac{i}{\hbar}U(\hat x)d\tau\right\}. \tag{5.46}$$


(This approximation is very similar in spirit to the trapezoidal-rule approximation in the usual 1D integration,¹⁹ which is also asymptotically unimpeachable.)
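The claim behind Eq. (46) is that splitting the exponent costs an error of the second order in dτ. This can be illustrated with finite matrices. In the Python sketch below, two random Hermitian 6×6 matrices A and B stand in (hypothetically) for $\hat p^2/2m$ and $U(\hat x)$, with ħ = 1 and NumPy assumed. The operator exponents are computed exactly via eigendecomposition. Halving dτ should cut the error by about a factor of four, and the error should approach $(d\tau^2/2)\|[A, B]\|$.

```python
import numpy as np


def expm_hermitian(H, dt):
    """exp(-i H dt) for a Hermitian matrix H, via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * dt)) @ V.conj().T


def random_hermitian(n, rng):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2


def trotter_error(A, B, dt):
    """Spectral-norm distance between exp(-i(A+B)dt) and exp(-iA dt) exp(-iB dt)."""
    exact = expm_hermitian(A + B, dt)
    split = expm_hermitian(A, dt) @ expm_hermitian(B, dt)
    return np.linalg.norm(exact - split, 2)
```

A fixed time interval contains a number of steps proportional to 1/dτ. The accumulated splitting error is therefore proportional to dτ, and it vanishes in the limit dτ → 0 taken in Eq. (49).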


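Since the second exponential function on the right-hand side of Eq. (46) commutes with the coordinate operator, we may move it out of each partial bracket participating in Eq. (43), with $U(\hat x)$ turning into a c-number function:

$$\langle x_k|\hat u(\tau + d\tau, \tau)|x_{k-1}\rangle = \langle x_k|\exp\left\{-\frac{i}{\hbar}\frac{\hat p^2}{2m}d\tau\right\}|x_{k-1}\rangle\exp\left\{-\frac{i}{\hbar}U(x_{k-1})d\tau\right\}. \tag{5.47}$$

But the remaining bracket is just the propagator of a free particle, so that for it we may use Eq. (2.49):

$$\langle x_k|\exp\left\{-\frac{i}{\hbar}\frac{\hat p^2}{2m}d\tau\right\}|x_{k-1}\rangle = \left(\frac{m}{2\pi i\hbar\,d\tau}\right)^{1/2}\exp\left\{\frac{im(dx_k)^2}{2\hbar\,d\tau}\right\}, \qquad dx_k \equiv x_k - x_{k-1}. \tag{5.48}$$

As a result, the full propagator (43) takes the form

$$G(x, t; x_0, t_0) = \lim_{\substack{d\tau\to 0\\ N\to\infty}}\left(\frac{m}{2\pi i\hbar\,d\tau}\right)^{N/2}\int dx_1\int dx_2\ldots\int dx_{N-1}\exp\left\{\frac{i}{\hbar}\sum_{k=1}^{N}\left[\frac{m(dx_k)^2}{2d\tau} - U(x_{k-1})d\tau\right]\right\}, \tag{5.49}$$

with $x_N \equiv x$.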


18 This is exactly the corner I am going to cut because a strict mathematical proof of this (intuitively evident) 
statement would take more time/space than I can afford. 
19 See, e.g., MA Eq. (5.2). 
At N → ∞, and hence dτ = (t − t₀)/N → 0, the sum under the exponent in this expression may be approximated with the corresponding integral:

$$\sum_{k=1}^{N}\left[\frac{m(dx_k)^2}{2d\tau} - U(x_{k-1})d\tau\right] = \sum_{k=1}^{N}\left[\frac{m}{2}\left(\frac{dx_k}{d\tau}\right)^2 - U(x_{k-1})\right]d\tau \to \int_{t_0}^{t}\left[\frac{m}{2}\left(\frac{dx}{d\tau}\right)^2 - U(x)\right]d\tau, \tag{5.50}$$

and the expression in the square brackets is just the particle's Lagrangian function ℒ.²⁰ The integral of this function over time is the classical action S calculated along a particular “path” x(τ).²¹ As a result, defining the (1D) path integral as

$$\int(\ldots)\,D[x(\tau)] \equiv \lim_{N\to\infty}\left(\frac{m}{2\pi i\hbar\,d\tau}\right)^{N/2}\int dx_1\int dx_2\ldots\int dx_{N-1}(\ldots), \tag{5.51a}$$

we can bring our result to the following (superficially simple) form:

$$G(x, t; x_0, t_0) = \int\exp\left\{\frac{i}{\hbar}S[x(\tau)]\right\}D[x(\tau)]. \tag{5.51b}$$


The name “path integral” for the mathematical construct (51a) may be readily explained if we keep the number N of time intervals large but finite, and also approximate each of the enclosed integrals with a sum over M >> 1 discrete points along the coordinate axis (see Fig. 5a).

Fig. 5.5. Several 1D classical paths: (a) in the discrete approximation and (b) in the continuous limit.

Then the path integral (51a) is the product of (N − 1) sums corresponding to different values of time τ, each of them with M terms, each of those representing the function under the integral at a particular spatial point. Multiplying those (N − 1) sums, we get a sum of $M^{N-1}$ terms, each evaluating the function at a specific set of spatial-temporal points [x, τ]. These terms may now be grouped to represent all possible different continuous classical paths x[τ] from the initial point [x₀, t₀] to the final point [x, t]. It is evident that the last interpretation remains true even in the continuous limit N, M → ∞ (see Fig. 5b).


Why does such a path representation of the sum make sense? This is because in the classical limit the particle follows just a certain path, corresponding to the minimum of the action S. As a result, for all close trajectories, the difference ($S - S_{\rm cl}$) is proportional to the square of the deviation from the


20 See, e.g., CM Sec. 2.1. 
21 See, e.g., CM Sec. 10.3. 
classical trajectory. Hence, for a quasiclassical motion, with $S_{\rm cl} \gg \hbar$, there is a bunch of close trajectories, with $(S - S_{\rm cl}) \ll \hbar$, that give substantial contributions to the path integral. On the other hand, strongly non-classical trajectories, with $(S - S_{\rm cl}) \gg \hbar$, give phases S/ħ rapidly oscillating from one trajectory to the next one, and their contributions to the path integral are averaged out.²² As a result, for a quasiclassical motion, the propagator's exponent may be evaluated on the classical path only:

$$G_{\rm cl} \propto \exp\left\{\frac{i}{\hbar}S_{\rm cl}\right\} = \exp\left\{\frac{i}{\hbar}\int_{t_0}^{t}\left[\frac{m}{2}\left(\frac{dx}{d\tau}\right)^2 - U(x)\right]d\tau\right\}. \tag{5.52}$$

The sum of the kinetic and potential energies is the full energy E of the particle, which remains constant for motion in a stationary potential U(x), so that we may rewrite the expression under this integral as²³

$$\left[\frac{m}{2}\left(\frac{dx}{d\tau}\right)^2 - U(x)\right]d\tau = \left[m\left(\frac{dx}{d\tau}\right)^2 - E\right]d\tau = m\frac{dx}{d\tau}dx - E\,d\tau. \tag{5.53}$$

With this replacement, Eq. (52) yields

$$G_{\rm cl} \propto \exp\left\{\frac{i}{\hbar}\int_{x_0}^{x}m\frac{dx}{d\tau}dx\right\}\exp\left\{-\frac{i}{\hbar}E(t - t_0)\right\} = \exp\left\{\frac{i}{\hbar}\int_{x_0}^{x}p(x')dx'\right\}\exp\left\{-\frac{i}{\hbar}E(t - t_0)\right\}, \tag{5.54}$$


where p is the classical momentum of the particle. But (at least, leaving the pre-exponential factor alone) 
this is the WKB approximation result that was derived and studied in detail in Chapter 2! 


One may question the value of such a complicated calculation, which yields results that could be readily obtained from Schrödinger's wave mechanics. Feynman's approach is indeed not used too often, but it has its merits. First, it has an important philosophical (and hence heuristic) value. Indeed, Eq. (51) may be interpreted as saying that the essence of quantum mechanics is the exploration, by the system, of all possible paths x(τ), each of them classical-like in the sense that the particle's coordinate x and velocity dx/dτ are exactly defined simultaneously at each point. The resulting contributions to the path integral are added up coherently to form the actual propagator G, and via it, the final probability $W \propto |G|^2$ of the particle's propagation from [x₀, t₀] to [x, t]. As the scale of the action S of the motion decreases and becomes comparable to ħ, more and more paths produce substantial contributions to this sum, and hence to W, providing a larger and larger difference between the quantum and classical properties of the system.


Second, the path integral provides a justification for some simple explanations of quantum phenomena. A typical example is the quantum interference effects discussed in Sec. 3.1 (see, e.g., Fig. 3.1 and the corresponding text). In that discussion, we used the Huygens principle to argue that for two-slit interference, the WKB approximation might be restricted to contributions from two paths that pass through different slits but otherwise consist of straight-line segments. To have another look at


22 This fact may be proved by expanding the difference ($S - S_{\rm cl}$) in a Taylor series in the path variation (leaving only the leading quadratic terms) and working out the resulting Gaussian integrals. This integration, together with the pre-exponential coefficient in Eq. (51a), gives exactly the pre-exponential factor that we have already found by refining the WKB approximation in Sec. 2.4.

23 The same trick is often used in analytical classical mechanics, for example for proving the Hamilton principle and for deriving the Hamilton-Jacobi equations (see, e.g., CM Secs. 10.3-4).




that assumption, let us generalize the path integral to multi-dimensional geometries. Fortunately, the simple structure of Eq. (51b) makes such a generalization virtually evident:

$$G(\mathbf r, t; \mathbf r_0, t_0) = \int\exp\left\{\frac{i}{\hbar}S[\mathbf r(\tau)]\right\}D[\mathbf r(\tau)], \qquad S = \int_{t_0}^{t}\mathcal L(\mathbf r, \dot{\mathbf r}, \tau)\,d\tau = \int_{t_0}^{t}\left[\frac{m}{2}\left(\frac{d\mathbf r}{d\tau}\right)^2 - U(\mathbf r, \tau)\right]d\tau, \tag{5.55}$$

where the definition (51a) of the path integral should also be modified correspondingly. (I will not go into these technical details.) For the Young-type experiment (Fig. 3.1), where a classical particle could reach the detector only after passing through one of the slits, the classical paths are the straight-line segments shown in Fig. 3.1. If they are much longer than the de Broglie wavelength, the propagator may be well approximated by the sum of two terms $\exp\{i\varphi\}$, each with the phase $\varphi = \int\mathbf p(\mathbf r)\cdot d\mathbf r/\hbar$ accumulated along the corresponding path, as was done in Sec. 3.1.


Last but not least, the path integral allows simple solutions to some problems that would be hard to obtain by other methods. As the simplest example, let us consider the problem of tunneling in multi-dimensional space, sketched in Fig. 6 for the 2D case just for the graphics' simplicity. Here, the potential profile U(x, y) has a saddle-like shape. (Another helpful image is a mountain pass between two summits, which in Fig. 6 are located at the top and at the bottom of the shown region.) A particle of energy E may move classically in the left and right regions with U(x, y) < E, but if E is not sufficiently high, it can pass from one of these regions to the other only via quantum-mechanical tunneling under the pass. Let us calculate the transparency of this potential barrier in the WKB approximation, ignoring the possible pre-exponential factor.²⁴

Fig. 5.6. A saddle-type 2D potential profile and the instanton trajectory of a particle of energy E (schematically).


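According to the evident multi-dimensional generalization of Eq. (54), in the classically forbidden region, where E < U(x, y), and hence $\mathbf p(\mathbf r)/\hbar = i\boldsymbol\kappa(\mathbf r)$, the contributions to the propagator (55) are proportional to

$$\exp\left\{\frac{i}{\hbar}\int_{\mathbf r_0}^{\mathbf r}\mathbf p(\mathbf r')\cdot d\mathbf r'\right\} = \exp\{-J\}, \qquad J \equiv \int_{\mathbf r_0}^{\mathbf r}\boldsymbol\kappa(\mathbf r')\cdot d\mathbf r', \tag{5.56}$$

where $\kappa = |\boldsymbol\kappa|$ may be calculated just as in the 1D case (cf. Eq. (2.97)):

$$\frac{\hbar^2\kappa^2(\mathbf r)}{2m} = U(\mathbf r) - E. \tag{5.57}$$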


24 Actually, one can argue that the pre-exponential factor should be close to 1, just like in Eq. (2.117), especially 
if the potential is smooth, in the sense of Eq. (2.107), in all spatial directions. (Let me remind the reader that for 
most practical applications of quantum tunneling, the pre-exponential factor is of minor importance.) 
Hence the path integral in this region is much simpler than in the classically allowed region, because the spatial exponents are purely real and there is no complex interference between them. Due to the minus sign before J in the exponent (56), the largest contribution to G evidently comes from the trajectory (or a narrow bundle of close trajectories) for which the integral J has the smallest value, so that the barrier transparency may be calculated as

$$\mathcal T = |G|^2 \approx e^{-2J} = \exp\left\{-2\int_{\mathbf r_0}^{\mathbf r}\boldsymbol\kappa(\mathbf r')\cdot d\mathbf r'\right\}, \tag{5.58}$$

where r and r₀ are certain points on the opposite classical turning-point surfaces, U(r) = U(r₀) = E (see Fig. 6).
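In 1D, the tunneling path is fixed, and Eqs. (56)-(58) reduce to the familiar WKB exponent. The Python sketch below evaluates J = ∫κ dx between the turning points by the midpoint rule. It assumes an inverted-parabola barrier U(x) = U₀ − mω²x²/2, for which the integral is known in closed form: J = π(U₀ − E)/ħω. The units ħ = m = ω = 1 and the values U₀ = 2, E = 1 are arbitrary choices.

```python
import math


def kappa(x, E, U, m=1.0, hbar=1.0):
    """Local decay constant from hbar**2 kappa**2 / 2m = U(x) - E (1D form of Eq. 5.57)."""
    return math.sqrt(max(2 * m * (U(x) - E), 0.0)) / hbar


def tunneling_exponent(E, U, x_left, x_right, n=100000, m=1.0, hbar=1.0):
    """J = integral of kappa dx between the turning points (midpoint rule), cf. Eq. (5.56)."""
    h = (x_right - x_left) / n
    return h * sum(kappa(x_left + (i + 0.5) * h, E, U, m, hbar) for i in range(n))
```

With these values the turning points are x = ±√2, the numerical J agrees with π, and the estimate (58) gives $\mathcal T \approx e^{-2\pi}$.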


Thus the barrier transparency problem is reduced to finding the trajectory (including the points r and r₀) that connects the two surfaces and minimizes the functional J. This is of course a well-known problem of the calculus of variations,²⁵ but it is interesting that the path integral provides a simple alternative way of solving it. Let us consider an auxiliary problem of a particle's motion in the potential profile $U_{\rm inv}(\mathbf r)$ that is inverted relative to the particle's energy E, i.e. is defined by the following equality:

$$U_{\rm inv}(\mathbf r) - E = E - U(\mathbf r). \tag{5.59}$$


As was discussed above, at fixed energy E, the path integral for the WKB motion in the classically allowed region of potential $U_{\rm inv}(x, y)$ (which coincides with the classically forbidden region of the original problem) is dominated by the classical trajectory corresponding to the minimum of

$$S_{\rm inv} = \int\mathbf p_{\rm inv}(\mathbf r')\cdot d\mathbf r' = \hbar\int\mathbf k_{\rm inv}(\mathbf r')\cdot d\mathbf r', \tag{5.60}$$

where $k_{\rm inv}$ should be determined from the WKB relation

$$\frac{\hbar^2 k_{\rm inv}^2(\mathbf r)}{2m} = E - U_{\rm inv}(\mathbf r). \tag{5.61}$$

But comparing Eqs. (57), (59), and (61), we see that $k_{\rm inv} = \kappa$ at each point! This means that the tunneling path (in the WKB limit) corresponds to the classical (so-called instanton²⁶) trajectory of the same particle moving in the inverted potential $U_{\rm inv}(\mathbf r)$. If the initial point r₀ is fixed, this trajectory may be readily found by means of classical mechanics. (Note that the initial kinetic energy, and hence the initial velocity, of the instanton launched from point r₀ should be zero, because by the definition of the classical turning point, $U_{\rm inv}(\mathbf r_0) = U(\mathbf r_0) = E$.) Thus the problem is further reduced to the simpler task of maximizing the transparency (58) by choosing the optimal position of r₀ on the equipotential surface U(r₀) = E (see Fig. 6). Moreover, for many symmetric potentials, the position of this point may be readily guessed even without calculations, as it is in Problems 6 and 7, left for the reader's exercise.


Note that besides the calculation of the potential barrier's transparency, the instanton trajectory has one more important implication: the so-called traversal time $\tau_t$ of the classical motion along it, from

25 For a concise introduction to the field, see, e.g., I. Gelfand and S. Fomin, Calculus of Variations, Dover, 2000, or L. Elsgolc, Calculus of Variations, Dover, 2007.

26 In quantum field theory, the instanton concept may be formulated somewhat differently, and has more complex applications; see, e.g., R. Rajaraman, Solitons and Instantons, North-Holland, 1987.

the point r₀ to the point r in the inverted potential defined by Eq. (59), plays the role of the most important (though not the only) time scale of the particle's tunneling under the barrier.²⁷


5.4. Revisiting harmonic oscillator 


Let us return to the 1D harmonic oscillator, now understood as any system, regardless of its physical nature, described by the Hamiltonian (4.237) with the potential energy (2.111):

$$\hat H = \frac{\hat p^2}{2m} + \frac{m\omega_0^2\hat x^2}{2}. \tag{5.62}$$

In Sec. 2.9 we used a “brute-force” (wave-mechanics) approach to analyze the eigenfunctions $\psi_n(x)$ and eigenvalues $E_n$ of this Hamiltonian, and found that, unfortunately, this approach required relatively complex mathematics, which does not enable an easy calculation of its key characteristics. Fortunately, the bra-ket formalism makes such calculations much easier.

First, by introducing normalized (dimensionless) operators of coordinate and momentum,²⁸

$$\hat\xi \equiv \frac{\hat x}{x_0}, \qquad \hat\zeta \equiv \frac{\hat p}{m\omega_0 x_0}, \tag{5.63}$$

where $x_0 \equiv (\hbar/m\omega_0)^{1/2}$ is the natural coordinate scale discussed in detail in Sec. 2.9, we can represent the Hamiltonian (62) in a very simple and ξ ↔ ζ symmetric form:

$$\hat H = \frac{\hbar\omega_0}{2}\left(\hat\xi^2 + \hat\zeta^2\right). \tag{5.64}$$


This symmetry, as well as our discussion of the very similar coordinate and momentum representations in Sec. 4.7, hints that much may be gained by treating the operators $\hat\xi$ and $\hat\zeta$ on an equal footing. Inspired by this clue, let us introduce a new operator

$$\hat a \equiv \frac{\hat\xi + i\hat\zeta}{\sqrt 2}. \tag{5.65a}$$

Since both operators $\hat\xi$ and $\hat\zeta$ correspond to real observables, i.e. have real eigenvalues and hence are Hermitian (self-adjoint), the Hermitian conjugate of the operator $\hat a$ is simply its complex conjugate:

$$\hat a^\dagger \equiv \frac{\hat\xi - i\hat\zeta}{\sqrt 2}. \tag{5.65b}$$

For a reason that will become clear very soon, $\hat a^\dagger$ and $\hat a$ (in this order!) are called the creation and annihilation operators.


27 For more on this interesting issue see, e.g., M. Büttiker and R. Landauer, Phys. Rev. Lett. 49, 1739 (1982), and references therein.

28 This normalization is not really necessary; it just makes the following calculations less bulky, and thus more aesthetically appealing.
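Now, solving the simple system of two linear equations (65) for $\hat\xi$ and $\hat\zeta$, we get the following reciprocal relations:

$$\hat\xi = \frac{\hat a + \hat a^\dagger}{\sqrt 2}, \quad \hat\zeta = \frac{\hat a - \hat a^\dagger}{\sqrt 2\, i}, \quad\text{i.e.}\quad \hat x = \left(\frac{\hbar}{2m\omega_0}\right)^{1/2}\left(\hat a + \hat a^\dagger\right), \quad \hat p = i\left(\frac{\hbar m\omega_0}{2}\right)^{1/2}\left(\hat a^\dagger - \hat a\right). \tag{5.66}$$

Our Hamiltonian (64) includes squares of these operators. When calculating them, we have to be careful to avoid swapping the new operators, because they do not commute. Indeed, for the normalized operators (63), Eq. (2.14) gives

$$\left[\hat\xi, \hat\zeta\right] = \frac{1}{m\omega_0 x_0^2}\left[\hat x, \hat p\right] = i\hat I, \tag{5.67}$$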


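so that Eqs. (65) yield

$$\left[\hat a, \hat a^\dagger\right] = \left[\frac{\hat\xi + i\hat\zeta}{\sqrt 2}, \frac{\hat\xi - i\hat\zeta}{\sqrt 2}\right] = \frac{1}{2}\left(\left[\hat\xi, -i\hat\zeta\right] + \left[i\hat\zeta, \hat\xi\right]\right) = \hat I. \tag{5.68}$$

With such due caution, Eq. (66) gives

$$\hat\xi^2 = \frac{1}{2}\left(\hat a^2 + \hat a^{\dagger 2} + \hat a\hat a^\dagger + \hat a^\dagger\hat a\right), \qquad \hat\zeta^2 = -\frac{1}{2}\left(\hat a^2 + \hat a^{\dagger 2} - \hat a\hat a^\dagger - \hat a^\dagger\hat a\right). \tag{5.69}$$

Plugging these expressions back into Eq. (64), we get

$$\hat H = \frac{\hbar\omega_0}{2}\left(\hat a\hat a^\dagger + \hat a^\dagger\hat a\right). \tag{5.70}$$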


This expression is elegant enough, but may be recast into an even more convenient form. For that, let us rewrite the commutation relation (68) as

$$\hat a\hat a^\dagger = \hat a^\dagger\hat a + \hat I, \tag{5.71}$$

and plug it into Eq. (70). The result is

$$\hat H = \frac{\hbar\omega_0}{2}\left(2\hat a^\dagger\hat a + \hat I\right) = \hbar\omega_0\left(\hat N + \frac{1}{2}\hat I\right), \tag{5.72}$$

where, in the last form, one more (evidently, Hermitian) operator,

$$\hat N \equiv \hat a^\dagger\hat a, \tag{5.73}$$

has been introduced. Since, according to Eq. (72), the operators $\hat H$ and $\hat N$ differ only by the addition of the identity operator and multiplication by a c-number, these operators commute. Hence, according to the general arguments of Sec. 4.5, they share a set of stationary eigenstates n (they are frequently called the Fock states), and we can write the standard eigenproblem (4.68) for the new operator as

$$\hat N|n\rangle = N_n|n\rangle, \tag{5.74}$$

where $N_n$ are some eigenvalues that, according to Eq. (72), also determine the energy spectrum of the oscillator:

$$E_n = \hbar\omega_0\left(N_n + \frac{1}{2}\right). \tag{5.75}$$
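Before the ladder argument pins down the eigenvalues $N_n$, a brute-force numerical check shows where it is heading. The Python sketch below assumes NumPy; its grid size and width are arbitrary choices. It diagonalizes the dimensionless Hamiltonian (64) on a uniform grid, replacing $\hat\zeta^2 = -d^2/d\xi^2$ with a second-order finite difference. The lowest eigenvalues of $\hat H/\hbar\omega_0$ come out close to 1/2, 3/2, 5/2, …, i.e. $N_n \approx 0, 1, 2, \ldots$, with a small discretization error that shrinks as the grid step decreases.

```python
import numpy as np


def oscillator_levels(n_levels=5, half_width=8.0, n=1200):
    """Lowest eigenvalues of H/(hbar*omega0) = (xi**2 + zeta**2)/2 from Eq. (5.64),
    with zeta**2 = -d^2/dxi^2 replaced by a second-order finite-difference Laplacian."""
    xi, h = np.linspace(-half_width, half_width, n, retstep=True)
    diagonal = 1.0 / h ** 2 + 0.5 * xi ** 2           # kinetic -(1/2)(-2/h^2) plus xi^2/2
    off_diagonal = np.full(n - 1, -0.5 / h ** 2)      # kinetic coupling of neighboring points
    H = np.diag(diagonal) + np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1)
    return np.linalg.eigvalsh(H)[:n_levels]
```

The equal spacing of the computed levels, ħω₀ in physical units, is exactly what the ladder structure discussed next implies.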
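So far, we know only that all eigenvalues $N_n$ are real. To calculate them, let us carry out the following calculation, splendid in its simplicity and efficiency. Consider the result of the action of the operator $\hat N$ on the ket-vector $\hat a^\dagger|n\rangle$. Using the definition (73) and then the associative rule of the bra-ket formalism, we may write

$$\hat N\left(\hat a^\dagger|n\rangle\right) = \left(\hat a^\dagger\hat a\right)\left(\hat a^\dagger|n\rangle\right) = \hat a^\dagger\left(\hat a\hat a^\dagger\right)|n\rangle. \tag{5.76}$$

Now using the commutation relation (71), and then Eq. (74), we may continue as

$$\hat a^\dagger\left(\hat a\hat a^\dagger\right)|n\rangle = \hat a^\dagger\left(\hat a^\dagger\hat a + \hat I\right)|n\rangle = \hat a^\dagger\left(\hat N + \hat I\right)|n\rangle = \hat a^\dagger\left(N_n + 1\right)|n\rangle = \left(N_n + 1\right)\left(\hat a^\dagger|n\rangle\right). \tag{5.77}$$

For clarity, let us summarize the result of this calculation:

$$\hat N\left(\hat a^\dagger|n\rangle\right) = \left(N_n + 1\right)\left(\hat a^\dagger|n\rangle\right). \tag{5.78}$$

Performing a similar calculation for the operator $\hat a$, we get a similar formula:

$$\hat N\left(\hat a|n\rangle\right) = \left(N_n - 1\right)\left(\hat a|n\rangle\right). \tag{5.79}$$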
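The ladder relations (78) and (79) are easy to check with truncated matrices. The Python sketch below assumes NumPy and builds $\hat a$ and $\hat a^\dagger$ in the basis |0⟩, …, |dim − 1⟩ from the standard matrix elements $\langle n-1|\hat a|n\rangle = \sqrt n$, which are taken here as given rather than derived. Truncation spoils the commutator (71) in the last diagonal element only, so that element is excluded from the check.

```python
import numpy as np


def ladder_ops(dim):
    """Truncated matrices of a and a^dagger in the Fock basis |0>, ..., |dim-1>,
    built from the standard elements <n-1|a|n> = sqrt(n) (taken as given here)."""
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1).astype(complex)
    return a, a.conj().T


def basis(n, dim):
    ket = np.zeros(dim, dtype=complex)
    ket[n] = 1.0
    return ket


def ladder_relations_hold(dim=8):
    a, ad = ladder_ops(dim)
    N = ad @ a
    comm = a @ ad - ad @ a
    ok = np.allclose(comm[:-1, :-1], np.eye(dim - 1))        # Eq. (71), away from the cutoff
    for n in range(dim):
        ket = basis(n, dim)
        ok &= np.allclose(N @ ket, n * ket)                   # Eq. (74) with N_n = n
        if n < dim - 1:
            up = ad @ ket
            ok &= np.allclose(N @ up, (n + 1) * up)           # Eq. (78)
        down = a @ ket
        ok &= np.allclose(N @ down, (n - 1) * down)           # Eq. (79); a|0> = 0 satisfies it trivially
        ok &= np.isclose(np.vdot(down, down).real, n)         # norm of a|n> equals N_n, cf. Eq. (81)
    return bool(ok)
```

The diagonal of `ad @ a` lists the eigenvalues of the number operator, 0, 1, …, dim − 1, one for each step of the ladder in Fig. 7.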


It is time to stop calculations for a minute, and translate these results into plain English: if |n⟩ is an eigenket of the operator $\hat N$ with the eigenvalue $N_n$, then $\hat a^\dagger|n\rangle$ and $\hat a|n\rangle$ are also eigenkets of that operator, with the eigenvalues $(N_n + 1)$ and $(N_n - 1)$, respectively. This statement may be vividly represented on the so-called ladder diagram shown in Fig. 7.

Fig. 5.7. The “ladder diagram” of eigenstates of a 1D harmonic oscillator. Arrows show the actions of the creation and annihilation operators on the eigenstates.

The operator $\hat a^\dagger$ moves the system one step up this ladder, while the operator $\hat a$ brings it one step down. In other words, the former operator creates a new excitation of the system,²⁹ while the latter operator kills (“annihilates”) such an excitation.³⁰ On the other hand, according to Eq. (74) inner-multiplied by the bra-vector ⟨n|, the operator $\hat N$ does not change the state of the system, but “counts” its position on the ladder:


2° For electromagnetic field oscillators, such excitations are called photons; for mechanical wave oscillators, 
phonons, etc. 
30 This is exactly why @‘ is called the creation operator, and @, the annihilation operator. 
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(n|N|n) = (n|N,|n) = N,. (5.80) 


This is why $\hat N$ is called the number operator, in our current context meaning the number of the elementary excitations of the oscillator.


This calculation still needs completion. Indeed, we still do not know whether the ladder shown in Fig. 7 shows all eigenstates of the oscillator, and what exactly the numbers $N_n$ are. Fascinatingly enough, both questions may be answered by exploring just one paradox. Let us start with some state $n$ (read: a step of the ladder), and keep going down the ladder, applying the operator $\hat a$ again and again. According to Eq. (79), at each step the eigenvalue $N_n$ is decreased by one, so that eventually it should become negative. However, this cannot happen, because any actual eigenstate, including the states represented by the kets $\hat a|n\rangle$ and $|n\rangle$, should have a positive norm — see Eq. (4.16). Comparing the norms,

$$\left\|\,|n\rangle\right\|^2 = \langle n|n\rangle, \qquad \left\|\hat a|n\rangle\right\|^2 = \left(\langle n|\hat a^\dagger\right)\left(\hat a|n\rangle\right) = \langle n|\hat a^\dagger\hat a|n\rangle = \langle n|\hat N|n\rangle = N_n\langle n|n\rangle, \tag{5.81}$$

we see that both of them cannot be positive simultaneously if $N_n$ is negative.


To resolve this paradox, let us notice that the action of the creation and annihilation operators on the stationary states $n$ may consist not only of their promotion to an adjacent step of the ladder diagram, but also of their multiplication by some c-numbers:

$$\hat a|n\rangle = A_n|n-1\rangle, \qquad \hat a^\dagger|n\rangle = A'_n|n+1\rangle. \tag{5.82}$$


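(The linear relations (78)-(79) clearly allow that.) Let us calculate the coefficients $A_n$, assuming, for convenience, that all eigenstates, including the states $n$ and $(n-1)$, are normalized:

$$\langle n|n\rangle = 1, \qquad \langle n-1|n-1\rangle = \left(\langle n|\frac{\hat a^\dagger}{A_n^*}\right)\left(\frac{\hat a}{A_n}|n\rangle\right) = \frac{1}{A_n^*A_n}\langle n|\hat N|n\rangle = \frac{N_n}{|A_n|^2}\langle n|n\rangle = 1. \tag{5.83}$$

From here, we get $|A_n| = N_n^{1/2}$, i.e.

$$\hat a|n\rangle = N_n^{1/2}e^{i\varphi_n}|n-1\rangle, \tag{5.84}$$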


where $\varphi_n$ is an arbitrary real phase. Now let us consider what happens if all numbers $N_n$ are integers. (Because of the definition of $N_n$, given by Eq. (74), it is convenient to call these integers $n$, i.e. to use the same letter as for the corresponding eigenstate.) Then when we have come down to the state with $n = 0$, an attempt to make one more step down gives

$$\hat a|0\rangle = 0\,|-1\rangle. \tag{5.85}$$

But according to Eq. (4.9), the state on the right-hand side of this equation is the “null-state”, i.e. does not exist.31 This gives the (only known :-) resolution of the state ladder paradox: the ladder has the lowest step with $N_n = n = 0$.


As a by-product of our discussion, we have obtained a very important relation $N_n = n$, which means, in particular, that the state ladder shown in Fig. 7 includes all eigenstates of the oscillator.


31 Please note again the radical difference between the null-state on the right-hand side of Eq. (85) and the state described by the ket-vector $|0\rangle$ on the left-hand side of that relation. The latter state does exist and, moreover, represents the most important, ground state of the system, with $n = 0$ — see Eqs. (2.274)-(2.275).
Plugging this relation into Eq. (75), we see that the full spectrum of eigenenergies of the harmonic oscillator is described by the simple formula

$$E_n = \hbar\omega_0\left(n + \frac{1}{2}\right), \qquad n = 0, 1, 2, \ldots, \tag{5.86}$$


which was already discussed in Sec. 2.9. It is rather remarkable that the bra-ket formalism has allowed us to derive it without calculating the corresponding (rather cumbersome) wavefunctions $\psi_n(x)$ — see Eqs. (2.284).

Moreover, this formalism may also be used to calculate virtually any matrix element of the oscillator, without using $\psi_n(x)$. However, to do that, we should first calculate the coefficient $A'_n$ participating in the second of Eqs. (82). This may be done similarly to the above calculation of $A_n$; alternatively, since we already know that $|A_n| = N_n^{1/2} = n^{1/2}$, we may notice that according to Eqs. (73) and (82), the eigenproblem (74), which in our new notation for $N_n$ becomes

$$\hat N|n\rangle = n|n\rangle, \tag{5.87}$$

may be rewritten as

$$n|n\rangle = \hat a^\dagger\hat a|n\rangle = \hat a^\dagger A_n|n-1\rangle = A_nA'_{n-1}|n\rangle. \tag{5.88}$$


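Comparing the first and the last form of this equality, we see that $|A'_{n-1}| = n/|A_n| = n^{1/2}$, so that $A'_n = (n+1)^{1/2}\exp\{i\varphi'_n\}$. Taking all phases $\varphi_n$ and $\varphi'_n$ equal to zero for simplicity, we may spell out Eqs. (82) as32

$$\hat a|n\rangle = n^{1/2}|n-1\rangle, \qquad \hat a^\dagger|n\rangle = (n+1)^{1/2}|n+1\rangle. \tag{5.89}$$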
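The ladder relations (89) are easy to check numerically. Below is a minimal Python sketch (standard library only) that builds the matrices of $\hat a$ and $\hat a^\dagger$ in a Fock basis cut off at a hypothetical number `dim` of states, and checks that $\hat N = \hat a^\dagger\hat a$ is diagonal with the eigenvalues $n$ of Eq. (87). The cutoff is an assumption of the sketch, not of the theory.

```python
import math


def ladder_matrices(dim):
    """Matrices of a and a^dagger in the truncated Fock basis |0>, ..., |dim-1>,
    built from Eq. (89): a|n> = sqrt(n)|n-1>, a^dagger|n> = sqrt(n+1)|n+1>."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)  # <n-1|a|n> = sqrt(n)
    a_dag = [[a[j][i] for j in range(dim)] for i in range(dim)]  # real transpose
    return a, a_dag


def matmul(x, y):
    cols = list(zip(*y))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in x]


dim = 6  # hypothetical cutoff: only the lowest six Fock states are kept
a, a_dag = ladder_matrices(dim)
number_op = matmul(a_dag, a)  # N = a^dagger a, Eq. (73)
aa_dag = matmul(a, a_dag)
commutator = [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(aa_dag, number_op)]

print([round(number_op[n][n], 12) for n in range(dim)])   # eigenvalues n, Eq. (87)
print([round(commutator[n][n], 12) for n in range(dim)])  # 1, except the last entry
```

The last diagonal element of the commutator comes out as $-(\text{dim}-1)$ rather than 1, because the commutator of two finite matrices must have zero trace; in practice, the cutoff is simply kept well above the highest populated level.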


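Now we can use these formulas to calculate, for example, the matrix elements of the operator $\hat x$ in the Fock state basis:

$$\langle n'|\hat x|n\rangle = \frac{x_0}{\sqrt2}\langle n'|\left(\hat a + \hat a^\dagger\right)|n\rangle = \frac{x_0}{\sqrt2}\left[\langle n'|\left(\hat a|n\rangle\right) + \langle n'|\left(\hat a^\dagger|n\rangle\right)\right] = \frac{x_0}{\sqrt2}\left[n^{1/2}\langle n'|n-1\rangle + (n+1)^{1/2}\langle n'|n+1\rangle\right]. \tag{5.90}$$

Taking into account the Fock state orthonormality:

$$\langle n'|n\rangle = \delta_{n',n}, \tag{5.91}$$

this result becomes

$$\langle n'|\hat x|n\rangle = \frac{x_0}{\sqrt2}\left[n^{1/2}\delta_{n',n-1} + (n+1)^{1/2}\delta_{n',n+1}\right]. \tag{5.92}$$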


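Acting absolutely similarly, for the momentum's matrix elements we get a similar expression:

$$\langle n'|\hat p|n\rangle = i\left(\frac{m\hbar\omega_0}{2}\right)^{1/2}\left[-n^{1/2}\delta_{n',n-1} + (n+1)^{1/2}\delta_{n',n+1}\right]. \tag{5.93}$$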


32 A useful mnemonic rule for these key relations is that the c-number coefficient in any of them is equal to the square root of the largest number of the two states it relates.
Hence the matrices of both operators in the Fock-state basis have only two diagonals, adjacent to the main diagonal; all other elements (including the main-diagonal ones) are zeros.


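The matrix elements of higher powers of these operators, as well as their products, may be handled similarly, though the higher the power, the bulkier the result. For example,

$$\langle n'|\hat x^2|n\rangle = \langle n'|\hat x\hat x|n\rangle = \sum_{n''=0}^{\infty}\langle n'|\hat x|n''\rangle\langle n''|\hat x|n\rangle = \frac{x_0^2}{2}\sum_{n''=0}^{\infty}\left[n''^{1/2}\delta_{n',n''-1} + (n''+1)^{1/2}\delta_{n',n''+1}\right]\left[n^{1/2}\delta_{n'',n-1} + (n+1)^{1/2}\delta_{n'',n+1}\right]$$

$$= \frac{x_0^2}{2}\left\{\left[n(n-1)\right]^{1/2}\delta_{n',n-2} + (2n+1)\,\delta_{n',n} + \left[(n+1)(n+2)\right]^{1/2}\delta_{n',n+2}\right\}. \tag{5.94}$$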
For applications, the most important of these matrix elements are those on the main diagonal:

$$\langle x^2\rangle \equiv \langle n|\hat x^2|n\rangle = \frac{x_0^2}{2}(2n+1). \tag{5.95}$$


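This expression shows, in particular, that the expectation value of the oscillator's potential energy in the $n$th Fock state is

$$\langle U\rangle = \left\langle\frac{m\omega_0^2\hat x^2}{2}\right\rangle = \frac{m\omega_0^2}{2}\langle x^2\rangle = \frac{m\omega_0^2x_0^2}{4}(2n+1) = \frac{\hbar\omega_0}{2}\left(n + \frac{1}{2}\right). \tag{5.96}$$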


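This is exactly one-half of the total energy (86) of the oscillator. As a sanity check, an absolutely similar calculation for the momentum squared, and hence for the kinetic energy $p^2/2m$, yields

$$\langle p^2\rangle = \langle n|\hat p^2|n\rangle = \frac{(m\omega_0x_0)^2}{2}(2n+1) = m\hbar\omega_0\left(n + \frac{1}{2}\right), \qquad\text{so that}\qquad \left\langle\frac{p^2}{2m}\right\rangle = \frac{\hbar\omega_0}{2}\left(n + \frac{1}{2}\right), \tag{5.97}$$

i.e. both partial energies are equal to $E_n/2$, just as in a classical oscillator.33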
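The matrix elements (92)-(98) can be reproduced by straightforward matrix multiplication. The Python sketch below assumes units with $x_0 = 1$ and $m\omega_0x_0 = 1$ (hence $\hbar = 1$) and a hypothetical Fock-basis cutoff `DIM`; it builds $\hat x$ and $\hat p$ from Eq. (89) via Eqs. (66), and checks the diagonal values of $\hat x^2$ and $\hat p^2$ together with the band structure of Eq. (94).

```python
import math

DIM = 8  # hypothetical Fock-basis cutoff


def matmul(x, y):
    cols = list(zip(*y))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in x]


# Annihilation and creation operators of Eq. (89), as complex matrices
a = [[0j] * DIM for _ in range(DIM)]
for n in range(1, DIM):
    a[n - 1][n] = complex(math.sqrt(n))
a_dag = [[a[j][i].conjugate() for j in range(DIM)] for i in range(DIM)]

# Units assumed here: x0 = 1 and m*omega0*x0 = 1, so that hbar = m*omega0*x0^2 = 1, and
# x = (a + a^dagger)/sqrt(2), p = i(a^dagger - a)/sqrt(2)
s = 1 / math.sqrt(2)
x = [[s * (a[i][j] + a_dag[i][j]) for j in range(DIM)] for i in range(DIM)]
p = [[1j * s * (a_dag[i][j] - a[i][j]) for j in range(DIM)] for i in range(DIM)]
x2, p2 = matmul(x, x), matmul(p, p)

for n in range(DIM - 1):  # the top level is distorted by the truncation
    assert x[n][n] == 0 and p[n][n] == 0            # Eq. (98)
    assert abs(x2[n][n] - (2 * n + 1) / 2) < 1e-12  # Eq. (95)
    assert abs(p2[n][n] - (2 * n + 1) / 2) < 1e-12  # Eq. (97)

# Eq. (94): x^2 couples only the states with n' = n and n' = n +- 2
assert all(abs(x2[i][j]) < 1e-12
           for i in range(DIM) for j in range(DIM) if abs(i - j) not in (0, 2))
print("Eqs. (94)-(98) reproduced for n <", DIM - 1)
```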


Note that according to Eqs. (92) and (93), the expectation values of both $x$ and $p$ in any Fock state are equal to zero:

$$\langle x\rangle = \langle n|\hat x|n\rangle = 0, \qquad \langle p\rangle = \langle n|\hat p|n\rangle = 0. \tag{5.98}$$


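This is why, according to the general Eqs. (1.33)-(1.34), the results (95) and (97) also give the variances of the coordinate and the momentum, i.e. the squares of their uncertainties, $(\delta x)^2$ and $(\delta p)^2$. In particular, for the ground state ($n = 0$), these uncertainties are

$$\delta x = \left(\frac{\hbar}{2m\omega_0}\right)^{1/2} \equiv \frac{x_0}{\sqrt2}, \qquad \delta p = \left(\frac{m\hbar\omega_0}{2}\right)^{1/2} \equiv \frac{m\omega_0x_0}{\sqrt2}. \tag{5.99}$$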


In the theory of precise measurements (to be reviewed in brief in Chapter 10), these expressions are 
often called the standard quantum limit. 


33 Still, note that the operators of the partial (potential and kinetic) energies commute neither with each other nor with the full-energy (Hamiltonian) operator, so that the Fock states $n$ are not their eigenstates. This fact maps on the well-known oscillations of these partial energies (with the frequency $2\omega_0$) in a classical oscillator, at the full energy staying constant.
5.5. Glauber states and squeezed states


There is a huge difference between a quantum stationary (Fock) state of the oscillator and its classical state. Indeed, let us write the well-known classical equations of motion of the oscillator (using capital letters to distinguish classical variables from the arguments of quantum wavefunctions):34

$$\dot X = \frac{P}{m}, \qquad \dot P = -\frac{\partial U}{\partial X} = -m\omega_0^2X. \tag{5.100}$$
On the so-called phase plane, with the Cartesian coordinates $x$ and $p$, these equations describe a clockwise rotation of the representation point $\{X(t), P(t)\}$ along an elliptic trajectory starting from the initial point $\{X(0), P(0)\}$. (The normalization of the momentum by $m\omega_0$, similar to the one performed by the second of Eqs. (63), makes this trajectory pleasingly circular, with a constant radius equal to the oscillation amplitude $A$, corresponding to the constant full energy

$$E = \frac{m\omega_0^2A^2}{2}, \qquad\text{with}\qquad A^2 = X^2(t) + \left[\frac{P(t)}{m\omega_0}\right]^2 = \text{const} = X^2(0) + \left[\frac{P(0)}{m\omega_0}\right]^2, \tag{5.101}$$

determined by the initial conditions — see Fig. 8.)


[Figure: the phase plane, with the horizontal axis $x$ and the vertical axis $p/m\omega_0$; the blue rings are labeled $n = 0$, 1, and 2.]

Fig. 5.8. Representations of various states of a harmonic oscillator on the phase plane. The bold black point represents a classical state with the complex amplitude $\alpha$, with the dashed line showing its trajectory. The (very imperfect) classical images of the Fock states with $n = 0$, 1, and 2 are shown in blue. The blurred red spot is the (equally schematic) image of the Glauber state $\alpha$. Finally, the magenta elliptical spot is a classical image of a squeezed ground state — see below. Arrows show the direction of the states' evolution in time.


For the forthcoming comparison with quantum states, it is convenient to describe this classical motion by the following dimensionless complex variable

$$\alpha(t) \equiv \frac{1}{\sqrt2\,x_0}\left[X(t) + i\frac{P(t)}{m\omega_0}\right], \tag{5.102}$$

which is essentially the standard complex-number representation of the representing point's position on the 2D phase plane, with $|\alpha| = A/\sqrt2x_0$. With this definition, Eqs. (100) are conveniently merged into one equation,

$$\dot\alpha = -i\omega_0\alpha, \tag{5.103}$$


34 If Eqs. (100) are not evident, please consult a classical mechanics course — e.g., CM Sec. 3.2 and/or Sec. 10.1. 
with an evident, very simple solution

$$\alpha(t) = \alpha(0)\exp\{-i\omega_0t\}, \tag{5.104}$$

where the constant $\alpha(0)$ may be complex, and is just the (normalized) classical complex amplitude of oscillations.35 This equation describes sinusoidal oscillations of both $X(t) \propto \text{Re}[\alpha(t)]$ and $P(t) \propto \text{Im}[\alpha(t)]$, with a phase shift of $\pi/2$ between them.
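The reduction of Eqs. (100) to the single equation (103), with the solution (104), may be checked by integrating Eqs. (100) numerically and converting the result with Eq. (102). A minimal Python sketch, assuming units $m = \omega_0 = x_0 = 1$, arbitrary initial conditions, and a standard fourth-order Runge-Kutta integrator (a choice of this sketch, not of the text):

```python
import cmath
import math

# Hypothetical parameters, in units with m = omega0 = x0 = 1
M, OMEGA0, X0 = 1.0, 1.0, 1.0
X_INIT, P_INIT = 0.8, -0.3


def alpha_of(X, P):
    """Dimensionless complex amplitude of Eq. (102)."""
    return (X + 1j * P / (M * OMEGA0)) / (math.sqrt(2) * X0)


def rhs(X, P):
    """Right-hand sides of the classical equations of motion (100)."""
    return P / M, -M * OMEGA0 ** 2 * X


def rk4_step(X, P, dt):
    k1 = rhs(X, P)
    k2 = rhs(X + dt / 2 * k1[0], P + dt / 2 * k1[1])
    k3 = rhs(X + dt / 2 * k2[0], P + dt / 2 * k2[1])
    k4 = rhs(X + dt * k3[0], P + dt * k3[1])
    return (X + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            P + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))


dt, steps = 1e-3, 5000
X, P = X_INIT, P_INIT
for _ in range(steps):
    X, P = rk4_step(X, P, dt)

t = dt * steps
alpha_numeric = alpha_of(X, P)
alpha_exact = alpha_of(X_INIT, P_INIT) * cmath.exp(-1j * OMEGA0 * t)  # Eq. (104)
print(abs(alpha_numeric - alpha_exact))
```

The minus sign in the exponent of Eq. (104) is what makes the representing point rotate clockwise on the phase plane of Fig. 8.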


On the other hand, according to the basic Eq. (4.161), the time dependence of a Fock state, as of a stationary state of the oscillator, is limited to the phase factor $\exp\{-iE_nt/\hbar\}$. This factor drops out at the averaging (4.125) for any observable. As a result, in this state the expectation values of $x$, $p$, or of any function thereof are time-independent. (Moreover, as Eqs. (98) show, $\langle x\rangle = \langle p\rangle = 0$.) Taking into account Eqs. (96)-(97), the closest (though very imperfect) geometric image36 of such a state on the phase plane is a static circle of the radius $A_n = x_0(2n+1)^{1/2}$, along which the wavefunction is uniformly spread — see the blue rings in Fig. 8. For the ground state ($n = 0$), with the wavefunction (2.275), a better image may be a blurred round spot, of a radius $\sim x_0$, at the origin. (It is easy to criticize such blurring, intended to represent the non-vanishing spreads (99), because it fails to reflect the fact that the total energy of the oscillator in this state, $E_0 = \hbar\omega_0/2$, is defined exactly, without any uncertainty.)


So, the difference between a classical state of the oscillator and its Fock state $n$ is very profound. However, the Fock states are not the only possible quantum states of the oscillator: according to the basic Eq. (4.6), any state described by the ket-vector

$$|\alpha\rangle = \sum_{n=0}^{\infty}\alpha_n|n\rangle, \tag{5.105}$$

with an arbitrary set of (complex) c-numbers $\alpha_n$, is also its legitimate state, subject only to the normalization condition $\langle\alpha|\alpha\rangle = 1$, giving

$$\sum_{n=0}^{\infty}|\alpha_n|^2 = 1. \tag{5.106}$$


It is natural to ask: could we select the coefficients $\alpha_n$ in such a special way that the state properties would be closer to the classical ones; in particular, the expectation values $\langle x\rangle$ and $\langle p\rangle$ of the coordinate and momentum would evolve in time as the classical values $X(t)$ and $P(t)$, while the uncertainties of these observables would be, just as in the ground state, given by Eqs. (99), and hence have the smallest possible uncertainty product, $\delta x\,\delta p = \hbar/2$? Let me show that such a Glauber state,37 which is

35 See, e.g., CM Chapter 5, especially Eqs. (5.4). 

36 I have to confess that such a geometric mapping of a quantum state on the phase plane $[x, p]$ is not exactly defined; you may think about colored areas in Fig. 8 as the regions of the observable pairs $\{x, p\}$ most probably obtained in measurements. A quantitative definition of such a mapping will be given in Sec. 7.3 using the Wigner function, though, as we will see, even such imaging has certain internal contradictions. Still, such cartoons as Fig. 8 have a substantial heuristic value, provided that their limitations are kept in mind.

37 Named after Roy Jay Glauber who studied these states in detail in the mid-1960s, though they had been discussed in brief by Erwin Schrödinger as early as in 1926. Another popular adjective, “coherent”, for the Glauber states is very misleading, because all quantum states of all systems we have studied so far (including the Fock states of the harmonic oscillator) may be represented as coherent (pure) superpositions of the basis states. This is why I will not use this term for the Glauber states.
schematically represented in Fig. 8 by a blurred red spot around the classical point $\{X(t), P(t)\}$, is indeed possible.


Conceptually, the simplest way to find the corresponding coefficients $\alpha_n$ would be to calculate $\langle x\rangle$, $\langle p\rangle$, $\delta x$, and $\delta p$ for an arbitrary set of $\alpha_n$, and then try to optimize these coefficients to reach our goal. However, this problem may be solved much more easily using wave mechanics. Indeed, let us consider the following wavefunction:


(5.107) 


Its comparison with Eqs. (2.275) shows that this is just the ground-state wavefunction, but with the center shifted from the origin into the classical point $\{X(t), P(t)\}$. A straightforward (though a bit bulky) differentiation over $x$ and $t$ shows that it satisfies the oscillator's Schrödinger equation, provided that the c-number functions $X(t)$ and $P(t)$ obey the classical equations (100). Moreover, a similar calculation shows that the wavefunction (107) also satisfies the Schrödinger equation of an oscillator under the effect of a pulse of a classical force $F(t)$, provided that the oscillator initially was in its ground state, and that the classical evolution law $\{X(t), P(t)\}$ in Eq. (107) takes this force into account.38 Since for many experimental implementations of the harmonic oscillator, the ground state may be readily formed (for example, by providing a weak coupling of the oscillator to a low-temperature environment), the Glauber state is usually easier to form than any Fock state with $n > 0$. This is why the Glauber states are so important and deserve much discussion.


In such a discussion, there is a substantial place for the bra-ket formalism. For example, to calculate the corresponding coefficients in the expansion (105) by wave-mechanical means,

$$\alpha_n(t) = \langle n|\alpha(t)\rangle = \int dx\,\langle n|x\rangle\langle x|\alpha(t)\rangle = \int\psi_n^*(x)\,\Psi_\alpha(x,t)\,dx, \tag{5.108}$$

we would need to use not only the simple Eq. (107), but also the Fock state wavefunctions $\psi_n(x)$, which are not very appealing — see Eq. (2.284) again. Instead, this calculation may be readily done in the bra-ket formalism, giving us one important byproduct result as well.


Let us start by expressing the double shift of the ground state (by $X$ and $P$), which has led us to Eq. (107), in the operator language. Forgetting about $P$ for a minute, let us find the translation operator $\hat T_X$ that would produce the desired shift of an arbitrary wavefunction $\psi(x)$ by a c-number distance $X$ along the coordinate argument $x$. This means

$$\hat T_X\psi(x) = \psi(x - X). \tag{5.109}$$


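Representing the wavefunction $\psi$ as the standard wave packet (4.264), we see that

$$\hat T_X\psi(x) = \frac{1}{(2\pi\hbar)^{1/2}}\int\varphi(p)\exp\left\{i\frac{p(x - X)}{\hbar}\right\}dp = \frac{1}{(2\pi\hbar)^{1/2}}\int\left[\varphi(p)\exp\left\{-i\frac{pX}{\hbar}\right\}\right]\exp\left\{i\frac{px}{\hbar}\right\}dp. \tag{5.110}$$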


38 For its description, it is sufficient to solve Eqs. (100), with F(t) added to the right-hand side of the second of 
these equations. 
Hence, the shift may be achieved by the multiplication of each Fourier component of the packet, with the momentum $p$, by $\exp\{-ipX/\hbar\}$. This gives us a hint that the general form of the translation operator, valid in any representation, should be

$$\hat T_X = \exp\left\{-i\frac{\hat pX}{\hbar}\right\}. \tag{5.111}$$


The proof of this formula is provided merely by the fact that, as we know from Chapter 4, any operator is uniquely determined by the set of its matrix elements in any full and orthogonal basis, in particular the basis of momentum states $p$. According to Eq. (110), the analog of Eq. (4.235) for the $p$-representation, applied to the translation operator (which is evidently local), is

$$\int dp'\,\langle p|\hat T_X|p'\rangle\,\varphi(p') = \exp\left\{-i\frac{pX}{\hbar}\right\}\varphi(p), \tag{5.112}$$

so that the operator (111) does exactly the job we need it to.
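Eq. (110) also suggests a practical recipe: to shift a wavefunction by $X$, multiply each of its Fourier components by $\exp\{-ipX/\hbar\}$ and transform back. The Python sketch below checks this on a periodic grid, using a plain discrete Fourier transform in place of the continuous one; the grid, the Gaussian test packet, the value of $X$, and $\hbar = 1$ are all assumptions of the sketch.

```python
import cmath
import math

# Hypothetical grid and packet parameters (units with hbar = 1)
N, L = 64, 20.0
DX = L / N
XS = [-L / 2 + j * DX for j in range(N)]
SHIFT = 1.3  # the c-number displacement X of Eq. (109); deliberately not a multiple of DX


def packet(x):
    """Gaussian test packet, well resolved on this grid."""
    return math.exp(-x * x / 2)


def dft(values, sign):
    """Plain O(N^2) discrete Fourier transform: sign=-1 forward, sign=+1 inverse (unnormalized)."""
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * j / N) for j, v in enumerate(values))
            for k in range(N)]


# Momentum p = hbar*k of each discrete Fourier component
PS = [2 * math.pi * (k if k < N // 2 else k - N) / L for k in range(N)]

phi = dft([packet(x) for x in XS], -1)
phi_shifted = [c * cmath.exp(-1j * p * SHIFT) for c, p in zip(phi, PS)]  # as in Eq. (110)
psi_shifted = [v / N for v in dft(phi_shifted, +1)]

max_err = max(abs(v - packet(x - SHIFT)) for v, x in zip(psi_shifted, XS))
print(f"max |T_X psi(x) - psi(x - X)| = {max_err:.1e}")
```

Since the shift is not a multiple of the grid step, the agreement relies on the packet being well resolved and negligibly small near the grid edges; a packet violating either condition would show aliasing or wrap-around errors.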


The operator that provides the shift of momentum by a c-number $P$ is absolutely similar — with the opposite sign under the exponent, due to the opposite sign of the exponent in the reciprocal Fourier transform, so that the simultaneous shift by both $X$ and $P$ may be achieved by the following translation operator:

$$\hat T_\alpha = \exp\left\{i\frac{P\hat x - \hat pX}{\hbar}\right\}. \tag{5.113}$$


As we already know, for a harmonic oscillator the creation-annihilation operators are more natural, so that we may use Eqs. (66) to recast Eq. (113) as

$$\hat T_\alpha = \exp\left\{\alpha\hat a^\dagger - \alpha^*\hat a\right\}, \qquad\text{so that}\qquad \hat T_\alpha^\dagger = \exp\left\{\alpha^*\hat a - \alpha\hat a^\dagger\right\}, \tag{5.114}$$

where $\alpha$ (which, generally, may be a function of time) is the c-number defined by Eq. (102). Now, according to Eq. (107), we may form the Glauber state's ket-vector just as

$$|\alpha\rangle = \hat T_\alpha|0\rangle. \tag{5.115}$$


This formula, valid in any representation, is very elegant, but using it for practical calculations (say, of the expectation values of observables) is not too easy because of the exponent-of-operators form of the translation operator. Fortunately, it turns out that a much simpler representation for the Glauber state is possible. To show this, let us start with the following general (and very useful) property of exponential functions of an operator argument: if

$$\left[\hat A, \hat B\right] = \mu\hat I, \tag{5.116}$$

(where $\hat A$ and $\hat B$ are arbitrary linear operators, and $\mu$ is a c-number), then39

$$\exp\left\{+\hat A\right\}\hat B\exp\left\{-\hat A\right\} = \hat B + \mu\hat I. \tag{5.117}$$
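The exercise mentioned in footnote 39 may be done a bit more compactly than by writing out the full Taylor series. Differentiating $\hat f(\lambda) \equiv \exp\{\lambda\hat A\}\hat B\exp\{-\lambda\hat A\}$ over the c-number $\lambda$, and using the fact that $\hat A$ commutes with $\exp\{\lambda\hat A\}$, we get (a sketch):

```latex
\begin{aligned}
\frac{d\hat f}{d\lambda}
  &= e^{\lambda\hat A}\left(\hat A\hat B - \hat B\hat A\right)e^{-\lambda\hat A}
   = e^{\lambda\hat A}\,\mu\hat I\,e^{-\lambda\hat A} = \mu\hat I,
  \qquad \frac{d^k\hat f}{d\lambda^k} = 0 \quad (k \ge 2),\\
\hat f(\lambda) &= \hat f(0) + \lambda\mu\hat I = \hat B + \lambda\mu\hat I
  \quad\Longrightarrow\quad
  e^{+\hat A}\hat B\,e^{-\hat A} = \hat f(1) = \hat B + \mu\hat I .
\end{aligned}
```

Thus the Taylor series of $\hat f(\lambda)$ terminates after its linear term, and its value at $\lambda = 1$ is exactly Eq. (117).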
39 A proof of Eq. (117) may be readily achieved by expanding the operator $\hat f(\lambda) \equiv \exp\{+\lambda\hat A\}\hat B\exp\{-\lambda\hat A\}$ in the Taylor series with respect to the c-number parameter $\lambda$, and then evaluating the result for $\lambda = 1$. This simple exercise is left for the reader.
Let us apply Eqs. (116)-(117) to two cases, both with

$$\hat A = \alpha^*\hat a - \alpha\hat a^\dagger, \qquad\text{so that}\qquad \exp\left\{+\hat A\right\} = \hat T_\alpha^\dagger, \quad \exp\left\{-\hat A\right\} = \hat T_\alpha. \tag{5.118}$$
First, let us take $\hat B = \hat I$; then Eq. (116) is valid with $\mu = 0$, and Eq. (117) yields

$$\hat T_\alpha^\dagger\hat T_\alpha = \hat I. \tag{5.119}$$

This equality means that the translation operator is unitary — not a big surprise, because if we shift a classical point on the phase plane by a complex number $(+\alpha)$ and then by $(-\alpha)$, we certainly must come back to the initial position. Eq. (119) means merely that this fact is true for any quantum state as well.


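Second, let us take $\hat B = \hat a$; in order to find the corresponding parameter $\mu$, we must calculate the commutator on the left-hand side of Eq. (116) for this case. Using, at the due stage of the calculation, Eq. (68), we get

$$\left[\hat A, \hat B\right] = \left[\alpha^*\hat a - \alpha\hat a^\dagger, \hat a\right] = -\alpha\left[\hat a^\dagger, \hat a\right] = \alpha\left[\hat a, \hat a^\dagger\right] = \alpha\hat I, \tag{5.120}$$

so that in this case $\mu = \alpha$, and Eq. (117) yields

$$\hat T_\alpha^\dagger\hat a\hat T_\alpha = \hat a + \alpha\hat I. \tag{5.121}$$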
We have approached the summit of this beautiful calculation. Let us consider the following operator:

$$\hat T_\alpha\hat T_\alpha^\dagger\hat a\hat T_\alpha. \tag{5.122}$$

Using Eq. (119), we may reduce this product to $\hat a\hat T_\alpha$, while the application of Eq. (121) to the same expression (122) yields $\hat T_\alpha(\hat a + \alpha\hat I)$. Hence, we get the following operator equality:

$$\hat a\hat T_\alpha = \hat T_\alpha\hat a + \alpha\hat T_\alpha, \tag{5.123}$$

which may be applied to any state. Now acting by both sides of this equality on the ground state's ket $|0\rangle$, and using the fact that $\hat a|0\rangle$ is the null-state, while according to Eq. (115), $\hat T_\alpha|0\rangle = |\alpha\rangle$, we finally get a very simple and elegant result:40

$$\hat a|\alpha\rangle = \alpha|\alpha\rangle. \tag{5.124}$$
Thus any Glauber state $\alpha$ is one of the eigenstates of the annihilation operator, namely the one with the eigenvalue equal to the c-number parameter $\alpha$ of the state, i.e. to the complex representation (102) of the classical point which is the center of the Glauber state's wavefunction.41 This fact makes the


40 This result is also rather counter-intuitive. Indeed, according to Eq. (89), the annihilation operator $\hat a$, acting upon a Fock state $n$, “beats it down” to the lower-energy state $(n-1)$. However, according to Eq. (124), the action of the same operator on a Glauber state $\alpha$ does not lead to the state change and hence to any energy change! The resolution of this paradox is given by the representation of the Glauber state as a series of Fock states — see Eq. (134) below. The operator $\hat a$ indeed transfers each Fock component of this series to a lower-energy state, but it also re-weighs each term, so that the complete energy of the Glauber state remains constant.

41 This fact means that the spectrum of eigenvalues $\alpha$ in Eq. (124), viewed as an eigenproblem, is continuous — it may be any complex number.
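calculations of all Glauber state properties much simpler. As an example, let us calculate $\langle x\rangle$ in the Glauber state with some c-number $\alpha$:

$$\langle x\rangle = \langle\alpha|\hat x|\alpha\rangle = \frac{x_0}{\sqrt2}\langle\alpha|\left(\hat a + \hat a^\dagger\right)|\alpha\rangle = \frac{x_0}{\sqrt2}\left(\langle\alpha|\hat a|\alpha\rangle + \langle\alpha|\hat a^\dagger|\alpha\rangle\right). \tag{5.125}$$

In the first term in the parentheses, we can apply Eq. (124) directly, while in the second term, we can use the bra-counterpart of that relation, $\langle\alpha|\hat a^\dagger = \langle\alpha|\alpha^*$. Now assuming that the Glauber state is normalized, $\langle\alpha|\alpha\rangle = 1$, and using Eq. (102), we get

$$\langle x\rangle = \frac{x_0}{\sqrt2}\left(\langle\alpha|\alpha|\alpha\rangle + \langle\alpha|\alpha^*|\alpha\rangle\right) = \frac{x_0}{\sqrt2}\left(\alpha + \alpha^*\right) = X. \tag{5.126}$$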


Acting absolutely similarly, we may verify that $\langle p\rangle = P$, and that $\delta x$ and $\delta p$ do indeed obey Eqs. (99).
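For example, the coordinate variance follows from Eq. (124) after reordering $(\hat a + \hat a^\dagger)^2$ so that all annihilation operators stand to the right of the creation ones, using the commutation relation (68) in the form $\hat a\hat a^\dagger = \hat a^\dagger\hat a + \hat I$, and then subtracting the square of the result (126):

```latex
\begin{aligned}
\langle x^2\rangle
  &= \frac{x_0^2}{2}\langle\alpha|\left(\hat a^2 + \hat a^{\dagger 2}
     + 2\hat a^\dagger\hat a + \hat I\right)|\alpha\rangle
   = \frac{x_0^2}{2}\left[\alpha^2 + \alpha^{*2} + 2\alpha^*\alpha + 1\right]
   = \frac{x_0^2}{2}\left[(\alpha + \alpha^*)^2 + 1\right],\\
(\delta x)^2
  &= \langle x^2\rangle - \langle x\rangle^2
   = \frac{x_0^2}{2}\left[(\alpha + \alpha^*)^2 + 1\right] - \frac{x_0^2}{2}(\alpha + \alpha^*)^2
   = \frac{x_0^2}{2}.
\end{aligned}
```

The momentum variance is obtained in the same way, with $-(\hat a^\dagger - \hat a)^2$ in place of $(\hat a + \hat a^\dagger)^2$, giving $(\delta p)^2 = (m\omega_0x_0)^2/2$; both results coincide with Eqs. (99) for any $\alpha$.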


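As the last sanity check, let us use Eq. (124) to re-calculate the Glauber state's wavefunction (107). Inner-multiplying both sides of that relation by the bra-vector $\langle x|$, and using the definition (65a) of the annihilation operator, we get

$$\frac{1}{\sqrt2\,x_0}\langle x|\left(\hat x + i\frac{\hat p}{m\omega_0}\right)|\alpha\rangle = \alpha\langle x|\alpha\rangle. \tag{5.127}$$

Since $\langle x|$ is the bra-vector of the eigenstate of the Hermitian operator $\hat x$, they may be swapped, with the operator giving its eigenvalue $x$; acting on that bra-vector by the (local!) operator of momentum, we have to use it in the coordinate representation — see Eq. (4.245). As a result, we get

$$\frac{1}{\sqrt2\,x_0}\left(x\langle x|\alpha\rangle + \frac{\hbar}{m\omega_0}\frac{\partial}{\partial x}\langle x|\alpha\rangle\right) = \alpha\langle x|\alpha\rangle. \tag{5.128}$$

But $\langle x|\alpha\rangle$ is nothing else than the Glauber state's wavefunction $\Psi_\alpha$, so that Eq. (128) gives for it a first-order differential equation

$$\frac{1}{\sqrt2\,x_0}\left(x\Psi_\alpha + \frac{\hbar}{m\omega_0}\frac{\partial\Psi_\alpha}{\partial x}\right) = \alpha\Psi_\alpha. \tag{5.129}$$

Chasing $\Psi_\alpha$ and $x$ to the opposite sides of the equation, and using the definition (102) of the parameter $\alpha$, we can bring this equation to the form (valid at fixed $t$, and hence fixed $X$ and $P$):

$$\frac{d\Psi_\alpha}{\Psi_\alpha} = -\frac{m\omega_0}{\hbar}\left(x - X - i\frac{P}{m\omega_0}\right)dx. \tag{5.130}$$

Integrating both parts, we return to Eq. (107).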
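For completeness, here is the integration step, written as a sketch. With $m\omega_0/\hbar = 1/x_0^2$, the separated equation $d\Psi_\alpha/\Psi_\alpha = -(m\omega_0/\hbar)(x - X - iP/m\omega_0)\,dx$ integrates to

```latex
\begin{aligned}
\ln\Psi_\alpha &= -\frac{m\omega_0}{\hbar}\int\left(x - X - i\frac{P}{m\omega_0}\right)dx
  = -\frac{(x - X)^2}{2x_0^2} + i\frac{Px}{\hbar} + \text{const},\\
\Psi_\alpha(x) &= C\exp\left\{-\frac{(x - X)^2}{2x_0^2} + i\frac{Px}{\hbar}\right\},
  \qquad |C| = \frac{1}{\pi^{1/4}x_0^{1/2}},
\end{aligned}
```

i.e. the ground-state Gaussian of width $x_0$, centered at $X$ and modulated by the plane wave $\exp\{iPx/\hbar\}$. The modulus of the integration constant $C$ follows from normalization; its phase, which may depend on time, is not fixed by this equation taken at a fixed $t$.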


Now we can use Eq. (124) for finding the coefficients $\alpha_n$ in the expansion (105) of the Glauber state $\alpha$ in the series over the Fock states $n$. Plugging Eq. (105) into both sides of Eq. (124), using the relation (89) for $\hat a|n\rangle$ on the left-hand side, and requiring the coefficients at each ket-vector $|n\rangle$ in both parts of the resulting relation to be equal, we get the following recurrence relation:

$$\alpha_{n+1} = \frac{\alpha}{(n+1)^{1/2}}\alpha_n. \tag{5.131}$$




Applying this relation sequentially for $n = 0, 1, 2$, etc., we get

$$\alpha_n = \frac{\alpha^n}{(n!)^{1/2}}\alpha_0. \tag{5.132}$$

Now we can find $\alpha_0$ from the normalization requirement (106), getting

$$|\alpha_0|^2\sum_{n=0}^{\infty}\frac{|\alpha|^{2n}}{n!} = 1. \tag{5.133}$$

In this sum, we may readily recognize the Taylor expansion of the function $\exp\{|\alpha|^2\}$, so that the final result (besides an arbitrary common phase multiplier) is

$$|\alpha\rangle = \exp\left\{-\frac{|\alpha|^2}{2}\right\}\sum_{n=0}^{\infty}\frac{\alpha^n}{(n!)^{1/2}}|n\rangle. \tag{5.134}$$
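The expansion (134) can be checked numerically against the normalization condition (106), the eigenvalue relation (124), and the result (126). A minimal Python sketch, assuming a hypothetical value of $\alpha$, units with $x_0 = 1$, and a Fock-basis cutoff large enough for the neglected coefficients to be negligible:

```python
import cmath
import math

ALPHA = 1.1 - 0.6j  # hypothetical Glauber-state parameter
DIM = 40            # hypothetical cutoff; the coefficients beyond it are negligible here

# Coefficients alpha_n of Eq. (134)
coeffs = [cmath.exp(-abs(ALPHA) ** 2 / 2) * ALPHA ** n / math.sqrt(math.factorial(n))
          for n in range(DIM)]

norm = sum(abs(c) ** 2 for c in coeffs)  # Eq. (106)

# Components of a|alpha>, from Eq. (89): (a|alpha>)_n = sqrt(n+1) alpha_{n+1}
a_coeffs = [math.sqrt(n + 1) * coeffs[n + 1] for n in range(DIM - 1)]
residual = max(abs(ac - ALPHA * c) for ac, c in zip(a_coeffs, coeffs))  # Eq. (124)

# <x> in units x0 = 1: <x> = sqrt(2) Re<a>, with <a> = sum_n alpha_n^* (a|alpha>)_n
mean_a = sum(c.conjugate() * ac for c, ac in zip(coeffs, a_coeffs))
mean_x = math.sqrt(2) * mean_a.real
print(f"norm = {norm:.15f}, eigen-residual = {residual:.1e}")
print(f"<x> = {mean_x:.12f}, X = sqrt(2) Re(alpha) = {math.sqrt(2) * ALPHA.real:.12f}")
```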


Hence, if the oscillator is in the Glauber state $\alpha$, the probabilities $W_n = \alpha_n\alpha_n^*$ of finding the system on the $n$th energy level (86) obey the well-known Poisson distribution (Fig. 9):

$$W_n = \frac{\langle n\rangle^n}{n!}e^{-\langle n\rangle}, \tag{5.135}$$

where $\langle n\rangle$ is the statistical average of $n$ — see Eq. (1.37):

$$\langle n\rangle = \sum_{n=0}^{\infty}nW_n. \tag{5.136}$$

The result of such summation is not necessarily an integer! In our particular case, Eqs. (134)-(136) yield

$$\langle n\rangle = |\alpha|^2. \tag{5.137}$$


Fig. 5.9. The Poisson distribution (135) for several values of $\langle n\rangle$. Note that $W_n$ are defined only for integer values of $n$; the lines are only guides for the eye.


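The r.m.s. uncertainty of $n$ in the Glauber state is

$$\delta n \equiv \left(\langle n^2\rangle - \langle n\rangle^2\right)^{1/2} = \langle n\rangle^{1/2}. \tag{5.138}$$

Another important property is that at $\langle n\rangle \gg 1$, the Poisson distribution approaches the Gaussian (“normal”) one, with a small relative r.m.s. uncertainty: $\delta n/\langle n\rangle \ll 1$ — the trend clearly visible in Fig. 9.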
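The statistics of the distribution (135) are easy to check numerically. The Python sketch below uses a few hypothetical values of $\langle n\rangle = |\alpha|^2$, with the sums cut off at $n = 200$, far beyond any noticeable weight; it shows that the r.m.s. uncertainty equals $\langle n\rangle^{1/2}$, so that the relative uncertainty $\delta n/\langle n\rangle = \langle n\rangle^{-1/2}$ indeed drops as $\langle n\rangle$ grows.

```python
import math


def poisson_weights(mean_n, n_max):
    """W_n of Eq. (135) for n = 0, ..., n_max - 1, built by the recursion
    W_n = W_{n-1} <n> / n, which avoids evaluating large powers and factorials."""
    weights = [math.exp(-mean_n)]
    for n in range(1, n_max):
        weights.append(weights[-1] * mean_n / n)
    return weights


results = {}
for mean_n in (1.0, 4.0, 25.0):  # hypothetical values of |alpha|^2, cf. Eq. (137)
    w = poisson_weights(mean_n, 200)
    avg = sum(n * wn for n, wn in enumerate(w))                           # Eq. (136)
    rms = math.sqrt(sum((n - avg) ** 2 * wn for n, wn in enumerate(w)))  # r.m.s. uncertainty
    results[mean_n] = (avg, rms)
    print(f"<n> = {avg:.6f}, dn = {rms:.6f}, dn/<n> = {rms / avg:.3f}")
```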
Now let us discuss the Glauber state's evolution in time. In the wave-mechanics language, it is completely described by the dynamics (100) of the c-number shifts $X(t)$ and $P(t)$ participating in the wavefunction (107). Note again that, in contrast to the spread of the wave packet of a free particle, discussed in Sec. 2.2, in the harmonic oscillator the Gaussian packet of the special width (99) does not spread at all!


An alternative and equivalent way of dynamics description is to use the Heisenberg equation of motion. As Eqs. (29) and (35) tell us, such equations for the Heisenberg operators of coordinate and momentum have to be similar to the classical equations (100):

$$\dot{\hat x}_H = \frac{\hat p_H}{m}, \qquad \dot{\hat p}_H = -m\omega_0^2\hat x_H. \tag{5.139}$$

Now using Eqs. (66), for the Heisenberg-picture creation and annihilation operators we get the equations

$$\dot{\hat a}_H = -i\omega_0\hat a_H, \qquad \dot{\hat a}_H^\dagger = +i\omega_0\hat a_H^\dagger, \tag{5.140}$$

which are completely similar to the classical equation (103) for the c-number parameter $\alpha$ and its complex conjugate, and hence have the solutions identical to Eq. (104):

$$\hat a_H(t) = \hat a_H(0)e^{-i\omega_0t}, \qquad \hat a_H^\dagger(t) = \hat a_H^\dagger(0)e^{+i\omega_0t}. \tag{5.141}$$
As was discussed in Sec. 4.6, such equations are very convenient, because they enable simple calculation of time evolution of observables for any initial state of the oscillator (Fock, Glauber, or any other) using Eq. (4.191). In particular, Eq. (141) shows that regardless of the initial state, the oscillator always returns to it exactly with the period $2\pi/\omega_0$.42 Applied to the Glauber state with $\alpha = 0$, i.e. the ground state of the oscillator, such calculation confirms that the Gaussian wave packet of the special width (99) does not spread in time at all — even temporarily.
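The Schrödinger-picture argument of footnote 42 can be made concrete for a Glauber state. In the Python sketch below ($\hbar = 1$, a hypothetical $\alpha(0)$, and a truncated Fock basis are assumed), each coefficient of the expansion (134) is multiplied by its own factor $\exp\{-iE_nt/\hbar\}$ with $E_n$ given by Eq. (86). The result coincides with the Glauber state whose parameter follows the classical law (104), up to the common, physically irrelevant factor $\exp\{-i\omega_0t/2\}$, so the state remains a Glauber state at all times.

```python
import cmath
import math

ALPHA0, OMEGA0, DIM = 0.9 + 0.4j, 1.0, 40  # hypothetical parameters (hbar = 1)


def glauber(alpha):
    """Fock-basis coefficients of the Glauber state, Eq. (134), truncated at DIM states."""
    return [cmath.exp(-abs(alpha) ** 2 / 2) * alpha ** n / math.sqrt(math.factorial(n))
            for n in range(DIM)]


def evolve(coeffs, t):
    """Schrodinger picture: the n-th Fock component acquires exp(-i E_n t / hbar), Eq. (86)."""
    return [c * cmath.exp(-1j * OMEGA0 * (n + 0.5) * t) for n, c in enumerate(coeffs)]


def mismatch(t):
    """Distance between the evolved state and the Glauber state with alpha(t) of Eq. (104),
    the latter multiplied by the common phase factor exp(-i omega0 t / 2)."""
    evolved = evolve(glauber(ALPHA0), t)
    expected = [cmath.exp(-0.5j * OMEGA0 * t) * c
                for c in glauber(ALPHA0 * cmath.exp(-1j * OMEGA0 * t))]
    return max(abs(u - v) for u, v in zip(evolved, expected))


for t in (0.7, 2.0, 2 * math.pi / OMEGA0):
    print(f"t = {t:.3f}: max mismatch = {mismatch(t):.1e}")
```

After one period $2\pi/\omega_0$, the common factor equals $-1$, so the state returns to itself up to an overall sign, which does not affect any observable.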


Now let me briefly mention the states whose initial wave packets are still Gaussian, but have different widths, say $\delta x < x_0/\sqrt2$. As we already know from Sec. 2.2, the momentum spread $\delta p$ will be correspondingly larger, still with the smallest possible uncertainty product: $\delta x\,\delta p = \hbar/2$. Such a squeezed ground state $\zeta$, with zero expectation values of $x$ and $p$, may be generated from the Fock/Glauber ground state:

$$|\zeta\rangle = \hat S_\zeta|0\rangle, \tag{5.142a}$$

using the so-called squeezing operator,

$$\hat S_\zeta \equiv \exp\left\{\frac{1}{2}\left(\zeta^*\hat a\hat a - \zeta\hat a^\dagger\hat a^\dagger\right)\right\}, \tag{5.142b}$$

which depends on a complex c-number parameter $\zeta = re^{i\theta}$, where $r$ and $\theta$ are real. The parameter's modulus $r$ determines the squeezing degree; if $\zeta$ is real (i.e. $\theta = 0$), then


42 Actually, this fact is also evident from the Schrödinger picture of the oscillator's time evolution: due to the exactly equal distances $\hbar\omega_0$ between the eigenenergies (86), the time functions $a_n(t)$ in the fundamental expansion (1.69) of its wavefunction oscillate with frequencies $(n + 1/2)\omega_0$, so that, besides a common phase factor, they all share the same time period $2\pi/\omega_0$.
$$\delta x = \frac{x_0}{\sqrt2}e^{-r}, \qquad \delta p = \frac{m\omega_0x_0}{\sqrt2}e^{r}, \qquad\text{so that}\qquad \delta x\,\delta p = \frac{m\omega_0x_0^2}{2} = \frac{\hbar}{2}. \tag{5.143}$$

On the phase plane (Fig. 8), this state, with $r > 0$, may be represented by an oval spot squeezed along one of two mutually perpendicular axes (hence the state's name), and stretched by the same factor $e^r$ along the counterpart axis; the same formulas but with $r < 0$ describe squeezing along the other axis. On the other hand, the phase $\theta$ of the squeezing parameter $\zeta$ determines the angle $\theta/2$ of the squeezing/stretching axes about the phase plane origin — see the magenta ellipse in Fig. 8. If $\theta \neq 0$, Eqs. (143) are valid for the variables $\{x', p'\}$ obtained from $\{x, p\}$ via clockwise rotation by that angle. For any of such origin-centered squeezed ground states, the time evolution is reduced to an increase of the angle with the rate $\omega_0$, i.e. to the clockwise rotation of the ellipse, without its deformation, with the angular velocity $\omega_0$ — see the magenta arrows in Fig. 8. As a result, the uncertainties $\delta x$ and $\delta p$ oscillate in time with the double frequency $2\omega_0$. Such squeezed ground states may be formed, for example, by a parametric excitation of the oscillator,43 with a parameter modulation depth close to, but still below, the threshold of the excitation of degenerate parametric oscillations.


By action of an additional external force, the center of a squeezed state may be displaced from the origin to an arbitrary point $\{X, P\}$. Such a displaced squeezed state may be described by the action of the translation operator (113) upon the squeezed ground state, i.e. by the action of the operator product $\hat T_\alpha\hat S_\zeta$ on the usual (Fock/Glauber, i.e. non-squeezed) ground state. Calculations similar to those that led us from Eq. (114) to Eq. (124) show that such a displaced squeezed state is an eigenstate of the following mixed operator:

$$\hat b \equiv \hat a\cosh r + \hat a^\dagger e^{i\theta}\sinh r, \tag{5.144}$$

with the same parameters $r$ and $\theta$, with the eigenvalue

$$\beta = \alpha\cosh r + \alpha^*e^{i\theta}\sinh r, \tag{5.145}$$

thus generalizing Eq. (124), which corresponds to $r = 0$. For the particular case $\alpha = 0$, Eq. (145) yields $\beta = 0$, i.e. the action of the operator (144) on the squeezed ground state $\zeta$ yields the null-state. Just as Eq. (124) in the case of the Glauber states, Eqs. (144)-(145) make the calculation of the basic properties of the squeezed states (for example, the proof of Eqs. (143) for the case $\alpha = \theta = 0$) very straightforward.
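Eqs. (142)-(145) can also be checked numerically. The Python sketch below builds $\hat S_\zeta|0\rangle$ in a truncated Fock basis by summing the Taylor series of the exponent (142b) acting on $|0\rangle$ (a brute-force approach chosen for simplicity); the cutoff, the number of series terms, the values of $r$ and $\theta$, and the units $x_0 = m\omega_0x_0 = 1$ are all assumptions of the sketch. It compares $\delta x$ and $\delta p$ with Eq. (143), and checks that the operator (144) turns a squeezed ground state with $\theta \neq 0$ into the null-state, in agreement with $\beta = 0$ for $\alpha = 0$ in Eq. (145).

```python
import cmath
import math

DIM = 60  # hypothetical Fock-basis cutoff


def apply_a(v):
    """Annihilation operator of Eq. (89): (a v)_n = sqrt(n+1) v_{n+1}."""
    return [math.sqrt(n + 1) * v[n + 1] for n in range(DIM - 1)] + [0j]


def apply_adag(v):
    """Creation operator of Eq. (89), truncated: (a^dagger v)_n = sqrt(n) v_{n-1}."""
    return [0j] + [math.sqrt(n) * v[n - 1] for n in range(1, DIM)]


def squeezed_ground_state(r, theta, terms=80):
    """S_zeta|0> of Eqs. (142), with zeta = r exp(i theta), summed as a Taylor series
    of exp(G)|0>, where G = (zeta^* a a - zeta a^dagger a^dagger)/2."""
    zeta = r * cmath.exp(1j * theta)

    def apply_g(v):
        aa, adad = apply_a(apply_a(v)), apply_adag(apply_adag(v))
        return [(zeta.conjugate() * p - zeta * q) / 2 for p, q in zip(aa, adad)]

    term = [1 + 0j] + [0j] * (DIM - 1)  # the ground state |0>
    state = list(term)
    for k in range(1, terms + 1):
        term = [c / k for c in apply_g(term)]  # G^k|0>/k!
        state = [s + c for s, c in zip(state, term)]
    return state


def inner(u, v):
    return sum(p.conjugate() * q for p, q in zip(u, v))


def spread(v, op):
    """r.m.s. uncertainty of a Hermitian operator `op` (a function acting on vectors)."""
    w = op(v)
    mean = inner(v, w).real
    return math.sqrt(inner(w, w).real - mean ** 2)


# Units assumed here: x0 = 1 and m*omega0*x0 = 1, as in Eqs. (66)
def x_op(v):
    return [(p + q) / math.sqrt(2) for p, q in zip(apply_a(v), apply_adag(v))]


def p_op(v):
    return [1j * (q - p) / math.sqrt(2) for p, q in zip(apply_a(v), apply_adag(v))]


r = 0.5
state = squeezed_ground_state(r, 0.0)
dx, dp = spread(state, x_op), spread(state, p_op)
print(f"dx = {dx:.6f} vs {math.exp(-r) / math.sqrt(2):.6f}, "
      f"dp = {dp:.6f} vs {math.exp(r) / math.sqrt(2):.6f}")  # Eq. (143)

theta = 0.8
tilted = squeezed_ground_state(r, theta)
b_tilted = [math.cosh(r) * p + cmath.exp(1j * theta) * math.sinh(r) * q
            for p, q in zip(apply_a(tilted), apply_adag(tilted))]  # operator (144) acting on it
print(f"max |b|zeta>| = {max(abs(c) for c in b_tilted):.1e}")   # beta = 0, Eq. (145)
```

For larger $r$, both the cutoff and the number of series terms have to grow, because the squeezed ground state spreads over higher Fock levels.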


Unfortunately, I do not have more time/space for a further discussion of the squeezed states in this section, but their importance for precise quantum measurements will be discussed in Sec. 10.2 below.44


43 For a discussion and classical theory of this effect, see, e.g., CM Sec. 5.5. 

44 For more on the squeezed states see, e.g., Chapter 7 in the monograph by C. Gerry and P. Knight, Introductory Quantum Optics, Cambridge U. Press, 2005. Also, note the spectacular measurements of the Glauber and squeezed states of electromagnetic (optical) oscillators by G. Breitenbach et al., Nature 387, 471 (1997), a large (ten-fold) squeezing achieved in such oscillators by H. Vahlbruch et al., Phys. Rev. Lett. 100, 033602 (2008), and the first results on the ground state squeezing in micromechanical oscillators, with resonance frequencies $\omega_0/2\pi$ as low as a few MHz, using their parametric coupling to microwave electromagnetic oscillators — see, e.g., E. Wollman et al., Science 349, 952 (2015) and/or J.-M. Pirkkalainen et al., Phys. Rev. Lett. 115, 243601 (2015).
5.6. Revisiting spherically-symmetric systems


One more blank spot to fill has been left by our study, in Sec. 3.6, of wave mechanics of particle motion in spherically-symmetric 3D potentials. Indeed, while the azimuthal components of the eigenfunctions (the spherical harmonics) of such systems are very simple,

$$\phi_m(\varphi) = (2\pi)^{-1/2}e^{im\varphi}, \qquad\text{with}\quad m = 0, \pm1, \pm2, \ldots, \tag{5.146}$$

their polar components include the associated Legendre functions $P_l^m(\cos\theta)$, which may be expressed via elementary functions only indirectly — see Eqs. (3.165) and (3.168). This makes all the calculations less than transparent and, in particular, does not allow a clear insight into the origin of the very simple energy spectrum of such systems — see, e.g., Eq. (3.163). The bra-ket formalism, applied to the angular momentum operator, not only enables such insight and produces a very convenient tool for many calculations involving spherically-symmetric potentials, but also opens a clear way toward the unification of the orbital momentum with the particle's spin — the latter task to be addressed in the next section.


Let us start by using the correspondence principle to spell out the quantum-mechanical vector
operator of the orbital angular momentum L = r × p of a point particle:

$$\hat{\mathbf L} \equiv \hat{\mathbf r}\times\hat{\mathbf p}, \quad\text{i.e.}\quad \hat L_j = \sum_{j',j''=1}^{3}\varepsilon_{jj'j''}\,\hat r_{j'}\,\hat p_{j''}, \qquad (5.147)$$

where ε_{jj'j''} is the Levi-Civita permutation symbol, which we have already used in Sec. 4.5, and also in Sec. 1 of this chapter, in
similar expressions (17)-(18). From this definition, we can readily calculate the commutation relations
for all Cartesian components of the operators L̂, r̂, and p̂; for example,


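$$[\hat L_j, \hat r_{j'}] = \Big[\sum_{k,k'=1}^{3}\varepsilon_{jkk'}\hat r_k\hat p_{k'},\ \hat r_{j'}\Big] = \sum_{k,k'=1}^{3}\varepsilon_{jkk'}\hat r_k[\hat p_{k'}, \hat r_{j'}] = -i\hbar\sum_{k,k'=1}^{3}\varepsilon_{jkk'}\hat r_k\delta_{k'j'} = -i\hbar\sum_{k=1}^{3}\varepsilon_{jkj'}\hat r_k = i\hbar\,\varepsilon_{jj'j''}\hat r_{j''}. \qquad (5.148)$$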


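The summary of all these calculations may be represented in similar compact forms:

$$[\hat L_j, \hat r_{j'}] = i\hbar\,\varepsilon_{jj'j''}\hat r_{j''}, \qquad [\hat L_j, \hat p_{j'}] = i\hbar\,\varepsilon_{jj'j''}\hat p_{j''}, \qquad [\hat L_j, \hat L_{j'}] = i\hbar\,\varepsilon_{jj'j''}\hat L_{j''}; \qquad (5.149)$$

the last of them shows that the commutator of two different Cartesian components of the vector-operator
L̂ is proportional to its complementary component.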
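As a quick numerical illustration of the last of Eqs. (149), the sketch below checks all nine commutators [L̂_j, L̂_j'] for the simplest nontrivial case l = 1. It assumes units with ħ = 1, NumPy, and the standard 3×3 matrices in the basis |1, 1⟩, |1, 0⟩, |1, −1⟩ (they follow from Eq. (5.164) discussed later in this section); the helper names are illustrative only.

```python
import numpy as np

# l = 1 orbital angular momentum in the basis |1,1>, |1,0>, |1,-1>, hbar = 1.
# The matrices are assumed to be the standard ones following from Eq. (5.164).
s2 = np.sqrt(2.0)
L_plus = np.array([[0, s2, 0], [0, 0, s2], [0, 0, 0]], dtype=complex)
L_minus = L_plus.conj().T
L = [
    (L_plus + L_minus) / 2,                      # L_x
    (L_plus - L_minus) / (2 * 1j),               # L_y
    np.diag([1.0, 0.0, -1.0]).astype(complex),   # L_z
]

def levi_civita(a, b, c):
    """The permutation symbol epsilon_{abc} for indices a, b, c in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) / 2

def check_eq_5_149(ops):
    """Largest deviation from [L_j, L_j'] = i * sum_j'' eps_{jj'j''} L_j''."""
    worst = 0.0
    for a in range(3):
        for b in range(3):
            lhs = ops[a] @ ops[b] - ops[b] @ ops[a]
            rhs = sum(1j * levi_civita(a, b, c) * ops[c] for c in range(3))
            worst = max(worst, np.abs(lhs - rhs).max())
    return worst

print(check_eq_5_149(L))  # largest deviation, at the round-off level
```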


Also introducing, in a natural way, the (scalar!) operator of the observable L² ≡ |L|²,

$$\hat L^2 \equiv \hat L_x^2 + \hat L_y^2 + \hat L_z^2, \qquad (5.150)$$

it is straightforward to check that this operator commutes with each of the Cartesian components:

$$[\hat L^2, \hat L_j] = 0. \qquad (5.151)$$
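For example, for the z-component the check takes just a few lines, using the identity [ÂB̂, Ĉ] = Â[B̂, Ĉ] + [Â, Ĉ]B̂, the last of Eqs. (149), and the trivial [L̂_z², L̂_z] = 0; the x- and y-components follow by the cyclic permutation of indices:

```latex
\begin{aligned}
[\hat L^2, \hat L_z]
  &= [\hat L_x^2, \hat L_z] + [\hat L_y^2, \hat L_z] \\
  &= \hat L_x[\hat L_x, \hat L_z] + [\hat L_x, \hat L_z]\hat L_x
   + \hat L_y[\hat L_y, \hat L_z] + [\hat L_y, \hat L_z]\hat L_y \\
  &= -i\hbar\left(\hat L_x\hat L_y + \hat L_y\hat L_x\right)
   + i\hbar\left(\hat L_y\hat L_x + \hat L_x\hat L_y\right) = 0 .
\end{aligned}
```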
This result, at first sight, may seem to contradict the last of Eqs. (149). Indeed, haven't we learned in
Sec. 4.5 that commuting operators (e.g., L̂² and any of L̂_j) share their eigenstate sets? If yes, shouldn't
this set be common for all four angular momentum operators? The resolution of this paradox may
be found in the condition that was mentioned just after Eq. (4.138), but (sorry!) was not sufficiently
emphasized there. According to that condition, if an operator has degenerate eigenstates (i.e. if some A_j =
A_j' even for j ≠ j'), they need not all be shared by another compatible operator.

This is exactly the situation with the orbital angular momentum operators, which may be
schematically shown on a Venn diagram (Fig. 10):45 the eigenstates of the operator L̂² are highly
degenerate,46 and their set is broader than those of any component operator L̂_j (which, as will be shown
below, are non-degenerate, until we take the particle's spin into account).


Fig. 5.10. The Venn diagram showing the partitioning of the set of eigenstates of the operator L̂². Each inner sector
corresponds to the states shared with one of the Cartesian component operators L̂_j, while the outer (shaded) ring
represents the eigenstates of L̂² that are not shared with any of the L̂_j, for example, all linear combinations of the
eigenstates of different component operators.


Let us focus on just one of these three joint sets of eigenstates, by tradition, the one shared by the operators L̂²
and L̂_z. (This tradition stems from the canonical form of the spherical coordinates, in which the polar
angle is measured from the z-axis. Indeed, in the coordinate representation we may write

$$\hat L_z = \hat x\hat p_y - \hat y\hat p_x = -i\hbar\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right) = -i\hbar\frac{\partial}{\partial\varphi}. \qquad (5.152)$$

Writing the standard eigenproblem for the operator L̂_z in this representation, L̂_zψ_m = L_zψ_m, we see that it
is satisfied by the eigenfunctions (146), with eigenvalues L_z = ħm, which was already conjectured in
Sec. 3.5.) More specifically, let us consider a set of eigenstates {l, m} corresponding to a certain
degenerate eigenvalue of the operator L̂², and all possible eigenvalues of the operator L̂_z, i.e. all
possible quantum numbers m. (At this point, l is just some label of the eigenvalue of the operator L̂²; it
will be defined more explicitly in a minute.) To analyze this set, it is instrumental to introduce the so-called
ladder (also called, respectively, "raising" and "lowering") operators47


45 This is just a particular example of the Venn diagrams (introduced in the 1880s by John Venn) that show
possible relations (such as intersections, unions, complements, etc.) between various sets of objects, and are a very
useful tool in general set theory.

46 Note that this particular result is consistent with the classical picture of the angular momentum vector: even 
when its length is fixed, the vector may be oriented in various directions, corresponding to different values of its 
Cartesian components. However, in the classical picture, all these components may be fixed simultaneously, while 
in the quantum picture this is not true. 

47 Note a substantial similarity between this definition and Eqs. (65) for the creation/annihilation operators. 
$$\hat L_\pm \equiv \hat L_x \pm i\hat L_y. \qquad (5.153)$$


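It is simple (and hence left for the reader's exercise) to use this definition and the last of Eqs. (149) to
calculate the following commutators:

$$[\hat L_+, \hat L_-] = 2\hbar\hat L_z, \qquad [\hat L_z, \hat L_\pm] = \pm\hbar\hat L_\pm, \qquad (5.154)$$

and also to use Eqs. (149)-(150) to prove two other important operator relations:

$$\hat L^2 = \hat L_+\hat L_- + \hat L_z^2 - \hbar\hat L_z, \qquad \hat L^2 = \hat L_-\hat L_+ + \hat L_z^2 + \hbar\hat L_z. \qquad (5.155)$$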
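The relations (154)-(155) follow from expanding the products of the ladder operators (153) and using the last of Eqs. (149), [L̂_x, L̂_y] = iħL̂_z, together with the definition (150):

```latex
\begin{aligned}
\hat L_+\hat L_- &= (\hat L_x + i\hat L_y)(\hat L_x - i\hat L_y)
  = \hat L_x^2 + \hat L_y^2 - i[\hat L_x, \hat L_y] = \hat L^2 - \hat L_z^2 + \hbar\hat L_z, \\
\hat L_-\hat L_+ &= (\hat L_x - i\hat L_y)(\hat L_x + i\hat L_y)
  = \hat L_x^2 + \hat L_y^2 + i[\hat L_x, \hat L_y] = \hat L^2 - \hat L_z^2 - \hbar\hat L_z, \\
[\hat L_+, \hat L_-] &= \hat L_+\hat L_- - \hat L_-\hat L_+ = 2\hbar\hat L_z .
\end{aligned}
```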
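Now let us rewrite the last of Eqs. (154) as

$$\hat L_z\hat L_\pm = \hat L_\pm\hat L_z \pm \hbar\hat L_\pm, \qquad (5.156)$$

and act with both its sides upon the ket-vector |l, m⟩ of an arbitrary common eigenstate:

$$\hat L_z\hat L_\pm|l,m\rangle = \hat L_\pm\hat L_z|l,m\rangle \pm \hbar\hat L_\pm|l,m\rangle. \qquad (5.157)$$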


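Since the eigenvalues of the operator L̂_z are equal to ħm, in the first term of the right-hand side of Eq.
(157) we may write

$$\hat L_z|l,m\rangle = \hbar m|l,m\rangle. \qquad (5.158)$$

With that, Eq. (157) may be recast as

$$\hat L_z\left(\hat L_\pm|l,m\rangle\right) = \hbar(m \pm 1)\left(\hat L_\pm|l,m\rangle\right). \qquad (5.159)$$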


In a spectacular similarity with Eqs. (78)-(79) for the harmonic oscillator, Eq. (159) means that
the states L̂_±|l, m⟩ are also eigenstates of the operator L̂_z, corresponding to eigenvalues ħ(m ± 1). Thus
the ladder operators act exactly as the creation and annihilation operators of a harmonic oscillator,
moving the system up or down a ladder of eigenstates (see Fig. 11).


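[Figure: the ladder of eigenkets |l, l⟩, ..., |l, m + 1⟩, |l, m⟩, |l, m − 1⟩, ..., |l, −l⟩, with the corresponding eigenvalues of L̂_z equal to ħl, ..., ħ(m + 1), ħm, ħ(m − 1), ..., −ħl; the operator L̂_+ moves a state one step up the ladder, and L̂_− one step down.]

Fig. 5.11. The ladder diagram of the common eigenstates of the operators L̂² and L̂_z.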


The most significant difference is that now the state ladder must end in both directions, because
an infinite increase of |m|, with whichever sign of m, would cause the expectation values of the operator

$$\hat L_x^2 + \hat L_y^2 = \hat L^2 - \hat L_z^2, \qquad (5.160)$$

which corresponds to a non-negative observable, to become negative. Hence there have to be two states
at both ends of the ladder, with such ket-vectors |l, m_max⟩ and |l, m_min⟩ that

$$\hat L_+|l, m_{\max}\rangle = 0, \qquad \hat L_-|l, m_{\min}\rangle = 0. \qquad (5.161)$$


Due to the symmetry of the whole problem with respect to the replacement m → −m, we should have
m_min = −m_max. This m_max is exactly the quantum number traditionally called l, so that

$$-l \le m \le +l. \qquad (5.162)$$


Evidently, this relation of quantum numbers m and l is semi-quantitatively compatible with the
classical image of the angular momentum vector L, of the same length L, pointing in various directions,
thus affecting the value of its component L_z. In this classical picture, L² would be equal to the
square of (L_z)_max, i.e. to (ħl)²; in quantum mechanics, however, this is not so. Indeed, applying both parts
of the second of the operator equalities (155) to the top state's vector |l, m_max⟩ = |l, l⟩, we get


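$$\hat L^2|l,l\rangle = \hbar\hat L_z|l,l\rangle + \hat L_z^2|l,l\rangle + \hat L_-\hat L_+|l,l\rangle = \hbar^2 l|l,l\rangle + \hbar^2 l^2|l,l\rangle + 0 = \hbar^2 l(l+1)|l,l\rangle. \qquad (5.163)$$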


Since by our initial assumption, all eigenvectors |l, m⟩ correspond to the same eigenvalue of L̂², this
result means that all these eigenvalues are equal to ħ²l(l + 1). Just as in the case of the spin-½ vector
operators discussed in Sec. 4.5, the deviation of this result from ħ²l² may be interpreted as the result of
unavoidable uncertainties ("fluctuations") of the x- and y-components of the angular momentum, which
give non-zero positive contributions to ⟨L_x²⟩ and ⟨L_y²⟩, and hence to ⟨L²⟩, even if the angular momentum
vector is aligned with the z-axis in the best possible way.
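The size of these fluctuations in the top state |l, l⟩ follows directly from Eq. (163). Since L̂_± change m by ±1, in this state ⟨L_x⟩ = ⟨L_y⟩ = 0; moreover, writing L̂_x and L̂_y via L̂_± shows that ⟨L_x²⟩ = ⟨L_y²⟩, because the cross terms L̂_±² have vanishing expectation values. Hence:

```latex
\langle L_x^2\rangle + \langle L_y^2\rangle
  = \langle L^2\rangle - \langle L_z^2\rangle
  = \hbar^2 l(l+1) - \hbar^2 l^2 = \hbar^2 l,
\qquad
\langle L_x^2\rangle = \langle L_y^2\rangle = \frac{\hbar^2 l}{2},
\qquad
\frac{\langle L_x^2\rangle + \langle L_y^2\rangle}{\langle L^2\rangle} = \frac{1}{l+1}.
```

So the relative weight of the transverse fluctuations decreases as 1/(l + 1), and vanishes in the classical limit l → ∞.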


(For various applications of the ladder operators (153), one more relation is convenient:

$$\hat L_\pm|l,m\rangle = \hbar\left[l(l+1) - m(m\pm1)\right]^{1/2}|l,m\pm1\rangle. \qquad (5.164)$$

This equality, valid up to a multiplier exp{iφ} with an arbitrary real phase φ, may be readily proved from the
above relations in the same way as the parallel Eqs. (89) for the harmonic-oscillator operators (65) were
proved in Sec. 4; due to this similarity, the proof is also left for the reader's exercise.48)
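The relations (153), (154), (161), (163), and (164) are easy to check numerically. The following sketch, which uses Python with NumPy, ħ = 1, and the common choice of the arbitrary phase factor in Eq. (164) equal to 1, builds the matrices of L̂_z and L̂_± in the basis |l, l⟩, |l, l − 1⟩, ..., |l, −l⟩. It then verifies the commutation relations and the eigenvalue of L̂². The function name ang_mom_matrices is hypothetical. Nothing in it relies on the integer nature of the quantum number, so it also accepts half-integer values, anticipating the next section.

```python
import numpy as np

def ang_mom_matrices(j):
    """Return (J_z, J_+, J_-, J_x, J_y) for the quantum number j, with hbar = 1.

    Basis order: |j, j>, |j, j-1>, ..., |j, -j>.  The ladder elements follow
    Eq. (5.164) with the arbitrary phase factor set to 1 (assumed convention).
    """
    dim = int(round(2 * j)) + 1
    m = j - np.arange(dim)                  # magnetic quantum numbers, descending
    jz = np.diag(m).astype(complex)
    jp = np.zeros((dim, dim), dtype=complex)
    for k in range(1, dim):                 # J_+ |j, m_k> is proportional to |j, m_k + 1>
        jp[k - 1, k] = np.sqrt(j * (j + 1) - m[k] * (m[k] + 1))
    jm = jp.conj().T                        # J_- is the Hermitian conjugate of J_+
    jx = (jp + jm) / 2                      # inverting Eq. (5.153)
    jy = (jp - jm) / (2 * 1j)
    return jz, jp, jm, jx, jy

def comm(a, b):
    return a @ b - b @ a

for j in (1, 2, 3):
    jz, jp, jm, jx, jy = ang_mom_matrices(j)
    eye = np.eye(len(jz))
    checks = {
        "[L_x, L_y] = i L_z": np.allclose(comm(jx, jy), 1j * jz),       # Eq. (5.149)
        "[L_z, L_+] = +L_+": np.allclose(comm(jz, jp), jp),             # Eq. (5.154)
        "L_+|l, l> = 0": np.allclose(jp[:, 0], 0),                      # Eq. (5.161)
        "L^2 = l(l+1)": np.allclose(jx @ jx + jy @ jy + jz @ jz, j * (j + 1) * eye),  # Eq. (5.163)
    }
    print(j, all(checks.values()))
```

For instance, ang_mom_matrices(0.5) returns the matrices (ħ/2)σ̂ of Chapter 4, in units ħ = 1.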


Now let us compare our results with those of Sec. 3.6. Using the expression of Cartesian 
coordinates via the spherical ones exactly as this was done in Eq. (152), we get the following 
expressions for the ladder operators (153) in the coordinate representation: 


48 The reader is also challenged to use the commutation relations discussed above to prove one more important
property of the common eigenstates of L̂_z and L̂²:

$$\langle l,m|\hat{\mathbf r}|l',m'\rangle = 0, \quad\text{unless } l' = l\pm1 \text{ and } m' = \text{either } m\pm1 \text{ or } m.$$

This property gives the selection rule for the orbital electric-dipole quantum transitions, to be discussed later in
the course, especially in Sec. 9.3. (The final selection rules for these transitions may be affected by the particle's
spin; see the next section.)
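$$\hat L_\pm = \hbar e^{\pm i\varphi}\left(\pm\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\varphi}\right). \qquad (5.165)$$

Now plugging this relation, together with Eq. (152), into any of Eqs. (155), we get

$$\hat L^2 = -\hbar^2\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}\right]. \qquad (5.166)$$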
But this is exactly the operator (apart from its division by the constant parameter 2mR²) that stands on the
left-hand side of Eq. (3.156). Hence that equation, which was explored by the "brute-force" (wave-mechanical)
approach in Sec. 3.6, may be understood as the eigenproblem for the operator L̂² in the
coordinate representation, with the eigenfunctions Y_l^m(θ, φ) corresponding to the eigenkets |l, m⟩, and the
eigenvalues L² = 2mR²E. As a reminder, the main result of that rather involved analysis was expressed
by Eq. (3.163), which now may be rewritten as


$$L^2 \equiv 2mR^2E_l = \hbar^2 l(l+1), \qquad (5.167)$$


in full agreement with Eq. (163), which was obtained by much more efficient means based on the bra-ket
formalism. In particular, it is fascinating to see how easy it is to operate with the eigenvectors |l, m⟩,
while the coordinate representations of these ket-vectors, the spherical harmonics Y_l^m(θ, φ), may be
expressed only via rather complicated functions (please have one more look at Eq. (3.171) and Fig. 3.20).
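The agreement may also be seen by applying the differential operator (166) directly to explicit spherical harmonics. The sketch below does this numerically with central finite differences, using an arbitrarily chosen step H = 1e-4, Python with NumPy, and, as assumed test functions, cos θ ∝ Y_1^0 and sin θ cos θ e^{iφ} ∝ Y_2^1. The normalization factors are dropped because they cancel. The ratio L̂²ψ/(ħ²ψ) should reproduce l(l + 1).

```python
import numpy as np

H = 1e-4  # finite-difference step; an arbitrary small value

def l2_over_hbar2(f, theta, phi):
    """Apply the operator (5.166), divided by hbar^2, to f(theta, phi)
    using central finite differences."""
    def df_dtheta(t):
        return (f(t + H, phi) - f(t - H, phi)) / (2 * H)

    # (1/sin(theta)) d/dtheta [sin(theta) df/dtheta]
    polar = (np.sin(theta + H) * df_dtheta(theta + H)
             - np.sin(theta - H) * df_dtheta(theta - H)) / (2 * H * np.sin(theta))
    # (1/sin^2(theta)) d^2 f / dphi^2
    azimuthal = (f(theta, phi + H) - 2 * f(theta, phi) + f(theta, phi - H)) / H**2
    return -(polar + azimuthal / np.sin(theta) ** 2)

def y10(theta, phi):
    """Proportional to Y_1^0 (normalization dropped)."""
    return np.cos(theta) * np.exp(0j * phi)

def y21(theta, phi):
    """Proportional to Y_2^1 (normalization dropped)."""
    return np.sin(theta) * np.cos(theta) * np.exp(1j * phi)

theta, phi = 0.7, 0.3   # an arbitrary test point away from the poles
for name, f, l in (("Y_1^0", y10, 1), ("Y_2^1", y21, 2)):
    ratio = l2_over_hbar2(f, theta, phi) / f(theta, phi)
    print(f"{name}: L^2/hbar^2 ratio = {ratio.real:.6f}, l(l+1) = {l * (l + 1)}")
```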


Note that none of the relations discussed in this section is conditioned by any particular Hamiltonian
of the system under analysis, though they (as well as those discussed in the next section) are especially
important for particles moving in spherically-symmetric potentials.


5.7. Spin and its addition to orbital angular momentum 


The theory described in the last section is useful for much more than orbital motion analysis. In
particular, it helps to generalize the spin-½ results discussed in Chapter 4 to other values of spin s, the
parameter still to be quantitatively defined. For that, let us notice that the commutation relations (4.155)
for spin-½, which were derived from the Pauli matrix properties, may be rewritten in exactly the same
form as Eqs. (149) and (151) for the orbital momentum:


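$$[\hat S_j, \hat S_{j'}] = i\hbar\,\varepsilon_{jj'j''}\hat S_{j''}, \qquad [\hat S^2, \hat S_j] = 0. \qquad (5.168)$$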


It has been postulated (and then confirmed by numerous experiments) that these relations hold
for quantum particles with any spin. Now notice that all the calculations of the last section have been
based almost exclusively on such relations; the only exception will be discussed shortly. Hence,
we may repeat them for the spin operators, and get relations similar to Eqs. (158) and (163):

$$\hat S_z|s,m_s\rangle = \hbar m_s|s,m_s\rangle, \qquad \hat S^2|s,m_s\rangle = \hbar^2 s(s+1)|s,m_s\rangle, \qquad 0 \le s, \quad -s \le m_s \le +s, \qquad (5.169)$$

where m_s is a quantum number parallel to the orbital magnetic number m, and the non-negative constant
s is defined as the maximum value of |m_s|. The c-number s is exactly what is called the particle's spin.
Now let us return to the only part of our orbital momentum calculations that has not been derived
from the commutation relations. This was the fact, based on the solution (146) of the orbital motion
problems, that the quantum number m (the analog of m_s) may only be an integer. For spin, we do not
have such a solution, so that the spectrum of numbers m_s (and hence its limits ±s) should be found from
the looser requirement that the eigenstate ladder, extending from −s to +s, has an integer number of
steps. Hence, 2s has to be an integer, i.e. the spin s of a quantum particle may be either integer (as it is,
for example, for photons, gluons, and massive bosons W± and Z⁰), or half-integer (e.g., for all quarks
and leptons, notably including electrons).49 For s = ½, this picture yields all the properties of spin-½
that were derived in Chapter 4 from Eqs. (4.115)-(4.117). In particular, the operators Ŝ² and Ŝ_z have
two common eigenstates (↑ and ↓), with S_z = ħm_s = ±ħ/2, both with S² = s(s + 1)ħ² = (3/4)ħ².
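For s = ½, the operators Ŝ = (ħ/2)σ̂ built from the Pauli matrices of Chapter 4 indeed satisfy Eqs. (168)-(169). Here is a minimal NumPy check, assuming ħ = 1:

```python
import numpy as np

# Spin-1/2 operators S = (hbar/2) sigma, with hbar = 1 and the Pauli matrices of Chapter 4.
sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2

def comm(a, b):
    return a @ b - b @ a

s_sq = sx @ sx + sy @ sy + sz @ sz
print(np.allclose(comm(sx, sy), 1j * sz))    # first of Eqs. (5.168), for j = x, j' = y
print(np.allclose(comm(s_sq, sz), 0))        # second of Eqs. (5.168)
print(np.allclose(s_sq, 0.75 * np.eye(2)))   # S^2 = s(s + 1) = 3/4, Eq. (5.169)
print(np.diag(sz).real)                      # the two values m_s = +1/2 and -1/2
```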


Note that this analogy with the angular momentum sheds new light on the symmetry properties
of spin-½. Indeed, the fact that m in Eq. (146) is an integer was derived in Sec. 3.5 from the requirement
that, after a full circle around the z-axis, we should find the same value of the wavefunction ψ_m, which differs
from the initial one only by an inconsequential factor exp{2πim} = +1. With the replacement m → m_s = ±½,
such an operation would multiply the wavefunction by exp{±iπ} = −1, i.e. reverse its sign. Of course, spin
properties cannot be described by a usual wavefunction, but this odd parity of electrons, shared by all
other spin-½ particles, is clearly revealed in properties of multiparticle systems (see Chapter 8 below),
and as a result, in their statistics (see, e.g., SM Chapter 2).
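The sign change may be made explicit by evaluating the factor exp{2πim} quoted above for several integer and half-integer values of m. The short NumPy sketch below (the function name is illustrative) gives +1 for integer m and −1 for half-integer m:

```python
import numpy as np

def full_circle_factor(m):
    """The factor exp{2*pi*i*m} acquired by exp(i*m*phi) after phi -> phi + 2*pi."""
    return np.exp(2j * np.pi * m)

for m in (0, 1, -2, 0.5, -0.5, 1.5):
    print(f"m = {m:>4}: factor = {full_circle_factor(m).real:+.3f}")
```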


Now we are sufficiently equipped to analyze the situations in which a particle has both the
orbital momentum and the spin, as an electron in an atom does. In classical mechanics, such an object, with
the spin S interpreted as the angular momentum of its internal rotation, would be characterized by the total
angular momentum vector J = L + S. Following the correspondence principle, we may assume that
quantum-mechanical properties of this observable may be described by the similarly defined vector
operator:

$$\hat{\mathbf J} = \hat{\mathbf L} + \hat{\mathbf S}, \qquad (5.170)$$

with Cartesian components

$$\hat J_j = \hat L_j + \hat S_j, \qquad (5.171)$$

and the magnitude squared equal to

$$\hat J^2 = \hat J_x^2 + \hat J_y^2 + \hat J_z^2. \qquad (5.172)$$


Let us examine the properties of this vector operator. Since its two components (170) describe
different degrees of freedom of the particle, i.e. belong to different Hilbert spaces, they have to be
completely commuting:

$$[\hat L_j, \hat S_{j'}] = 0, \qquad [\hat L_j, \hat S^2] = 0, \qquad [\hat L^2, \hat S_{j'}] = 0, \qquad [\hat L^2, \hat S^2] = 0. \qquad (5.173)$$

The above equalities are sufficient to derive the commutation relations for the operator Ĵ, and
unsurprisingly, they turn out to be exactly similar to those of its components:


49 As a reminder, in the Standard Model of particle physics, such hadrons as mesons and baryons (notably 
including protons and neutrons) are essentially composite particles. However, at non-relativistic energies, protons 
and neutrons may be considered fundamental particles with s = ½.
$$[\hat J_j, \hat J_{j'}] = i\hbar\,\varepsilon_{jj'j''}\hat J_{j''}, \qquad [\hat J^2, \hat J_j] = 0. \qquad (5.174)$$


Now repeating all the arguments of the last section, we may derive the following expressions for the
common eigenstates of the operators Ĵ² and Ĵ_z:

$$\hat J_z|j,m_j\rangle = \hbar m_j|j,m_j\rangle, \qquad \hat J^2|j,m_j\rangle = \hbar^2 j(j+1)|j,m_j\rangle, \qquad 0 \le j, \quad -j \le m_j \le +j, \qquad (5.175)$$

where j and m_j are new quantum numbers.50 Repeating the arguments just made for s and m_s, we may
conclude that j and m_j may be either integer or half-integer.


Before we proceed, one remark on notation: it is very convenient to use the same letter m for
numbering eigenstates of all momentum components participating in Eq. (171), with corresponding
indices (j, l, and s), in particular, to replace what we called m with m_l. With this replacement, the main
results of the last section may be summarized in a form similar to Eqs. (168), (169), (174), and (175):

$$[\hat L_j, \hat L_{j'}] = i\hbar\,\varepsilon_{jj'j''}\hat L_{j''}, \qquad [\hat L^2, \hat L_j] = 0, \qquad (5.176)$$

$$\hat L_z|l,m_l\rangle = \hbar m_l|l,m_l\rangle, \qquad \hat L^2|l,m_l\rangle = \hbar^2 l(l+1)|l,m_l\rangle, \qquad 0 \le l, \quad -l \le m_l \le +l. \qquad (5.177)$$


In order to understand which eigenstates participating in Eqs. (169), (175), and (177) are 
compatible with each other, it is straightforward to use Eq. (172), together with Eqs. (168), (173), (174), 
and (176) to get the following relations: 


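$$[\hat J^2, \hat L^2] = 0, \qquad [\hat J^2, \hat S^2] = 0, \qquad (5.178)$$

$$[\hat J^2, \hat L_z] \ne 0, \qquad [\hat J^2, \hat S_z] \ne 0. \qquad (5.179)$$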


This result is represented schematically on the Venn diagram shown in Fig. 12, in which the crossed 
arrows indicate the only non-commuting pairs of operators. 
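The relations (173)-(179) can be illustrated numerically on the smallest nontrivial example, l = 1 and s = ½. The sketch below uses NumPy and ħ = 1, with the l = 1 matrices taken as the standard ones following from Eq. (164). It represents the orbital and spin operators as Kronecker products acting on the 6-dimensional product space, which automatically makes them commute, as Eq. (173) requires. The last printed line is expected to show False twice, reflecting the non-commuting pairs (179).

```python
import numpy as np

def comm(a, b):
    return a @ b - b @ a

def sq(ops):
    """The squared magnitude of a vector operator, e.g. Eq. (5.172)."""
    return sum(M @ M for M in ops)

# l = 1 (basis m_l = 1, 0, -1) and s = 1/2 (basis m_s = +1/2, -1/2); hbar = 1.
r2 = np.sqrt(2.0)
lp = np.array([[0, r2, 0], [0, 0, r2], [0, 0, 0]], dtype=complex)
L1 = [(lp + lp.conj().T) / 2, (lp - lp.conj().T) / (2 * 1j),
      np.diag([1.0, 0.0, -1.0]).astype(complex)]
S1 = [np.array([[0, 1], [1, 0]], dtype=complex) / 2,
      np.array([[0, -1j], [1j, 0]], dtype=complex) / 2,
      np.array([[1, 0], [0, -1]], dtype=complex) / 2]

# Different degrees of freedom act on the tensor-product space, so that
# L_k -> L_k (x) 1 and S_k -> 1 (x) S_k commute automatically, Eq. (5.173).
I3, I2 = np.eye(3), np.eye(2)
L = [np.kron(M, I2) for M in L1]
S = [np.kron(I3, M) for M in S1]
J = [Lk + Sk for Lk, Sk in zip(L, S)]                 # Eq. (5.171)
J2, L2, S2 = sq(J), sq(L), sq(S)

print("Eq. (5.174):", np.allclose(comm(J[0], J[1]), 1j * J[2]))
print("Eq. (5.175), j(j+1):", np.round(np.linalg.eigvalsh(J2), 10))
print("Eq. (5.178):", np.allclose(comm(J2, L2), 0), np.allclose(comm(J2, S2), 0))
print("Eq. (5.179):", np.allclose(comm(J2, L[2]), 0), np.allclose(comm(J2, S[2]), 0))
```

The eigenvalues of Ĵ² printed in the second line correspond to j = ½ (twice) and j = 3/2 (four times).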


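[Figure: the operators L̂², Ŝ², Ĵ², L̂_z, Ŝ_z, and Ĵ_z, with two encircled groups labeled "operators diagonal in the uncoupled representation" and "operators diagonal in the coupled representation"; crossed arrows mark the non-commuting pairs.]

Fig. 5.12. The Venn diagram of angular momentum operators, and their mutually-commuting groups.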


This means that there are two groups of operators, encircled with colored lines in Fig. 12, whose members
share common eigenstates. The first group (encircled red) consists of all these operators except Ĵ². Hence there are
eigenstates shared by the five remaining operators, and these states correspond to definite values of the
corresponding quantum numbers: l, m_l, s, m_s, and m_j. Actually, only four of these numbers are


50 Let me hope that the difference between the quantum number j and the indices j, j', j'' numbering the Cartesian
components in relations like Eqs. (168) or (174) is absolutely clear from the context.
independent, because due to Eq. (171) for these compatible operators, for each eigenstate of this group,
their "magnetic" quantum numbers m have to satisfy the following relation:

$$m_j = m_l + m_s. \qquad (5.180)$$


Hence the common eigenstates of the operators of this group are fully defined by just four quantum
numbers, for example, l, m_l, s, and m_s. For some calculations, especially those for the systems whose
Hamiltonians include only the operators of this group, it is convenient51 to use this set of eigenstates as
the basis; this approach is frequently called the uncoupled representation.


However, in some situations we cannot ignore interactions between the orbital and spin degrees
of freedom (in the common jargon, the spin-orbit coupling), which lead in particular to a splitting (called
the fine structure) of the atomic energy levels even in the absence of an external magnetic field. I will
discuss these effects in detail in the next chapter, and now will only note that they may be described by a
term proportional to the product L̂·Ŝ in the system's Hamiltonian. If this term is substantial, the
uncoupled representation becomes inconvenient. Indeed, writing

$$\hat J^2 = (\hat{\mathbf L} + \hat{\mathbf S})^2 = \hat L^2 + \hat S^2 + 2\hat{\mathbf L}\cdot\hat{\mathbf S}, \quad\text{so that}\quad 2\hat{\mathbf L}\cdot\hat{\mathbf S} = \hat J^2 - \hat L^2 - \hat S^2, \qquad (5.181)$$


and looking at Fig. 12 again, we see that the operator L̂·Ŝ, describing the spin-orbit coupling, does not
commute with the operators L̂_z and Ŝ_z. This means that stationary states of a system with such a term in
the Hamiltonian do not belong to the uncoupled representation's basis. On the other hand, Eq. (181)
shows that the operator L̂·Ŝ does commute with all four operators of another group, encircled blue in
Fig. 12. According to Eqs. (178), (179), and (181), all operators of that group also commute with each
other, so that they have common eigenstates, described by the quantum numbers l, s, j, and m_j. This
group is the basis for the so-called coupled representation of particle states.
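Eq. (181) also gives the eigenvalues of L̂·Ŝ in the coupled basis directly. For example, for a spin-½ particle, for which (as shown below in this section) j = l ± ½:

```latex
\hat{\mathbf L}\cdot\hat{\mathbf S}\,|j,m_j\rangle
  = \frac{\hbar^2}{2}\left[j(j+1) - l(l+1) - \frac{3}{4}\right]|j,m_j\rangle
  = \begin{cases}
      +\dfrac{\hbar^2 l}{2}\,|j,m_j\rangle, & j = l + \frac12,\\[6pt]
      -\dfrac{\hbar^2 (l+1)}{2}\,|j,m_j\rangle, & j = l - \frac12 .
    \end{cases}
```

So a term proportional to L̂·Ŝ in the Hamiltonian splits each level with l > 0 into two sublevels, with j = l ± ½.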


Excluding, for notational brevity, the quantum numbers l and s, common for both groups, it
is convenient to denote the common ket-vectors of each group as, respectively,

$$|m_l, m_s\rangle \ \text{for the uncoupled representation's basis}, \qquad |j, m_j\rangle \ \text{for the coupled representation's basis}. \qquad (5.182)$$


As we will see in the next chapter, for the solution of some important problems (e.g., the fine structure
of atomic spectra and the Zeeman effect), we will need the relation between the kets |j, m_j⟩ and the kets
|m_l, m_s⟩. This relation may be represented as the usual linear superposition,

$$|j, m_j\rangle = \sum_{m_l, m_s}|m_l, m_s\rangle\langle m_l, m_s|j, m_j\rangle. \qquad (5.183)$$


The short brackets in this relation, essentially the elements of the unitary matrix of the transformation 
between two eigenstate bases (182), are called the Clebsch-Gordan coefficients. 


The best (though imperfect) classical interpretation of Eq. (183) I can offer is as follows. If the
lengths of the vectors L and S (in quantum mechanics associated with the numbers l and s, respectively),


51 This is especially true for motion in spherically-symmetric potentials, whose stationary states correspond to
definite l and m_l; however, the relations discussed in this section are important for some other problems as well.
and also their scalar product L·S, are all fixed, then so is the length of the vector J = L + S, whose
length in quantum mechanics is described by the number j. Hence, the classical image of a specific
eigenket |j, m_j⟩, in which l, s, j, and m_j are all fixed, is a state in which L², S², J², and J_z are fixed.
However, this fixation still allows for arbitrary rotation of the pair of vectors L and S (with a fixed angle
between them, and hence fixed L·S and J²) about the direction of the vector J (see Fig. 13).


Hence the components L_z and S_z in these conditions are not fixed, and in classical mechanics
may take a continuum of values, two of which (with the largest and the smallest possible values of S_z)
are shown in Fig. 13. In quantum mechanics, these components are quantized, with their states
represented by eigenkets |m_l, m_s⟩, so that a linear combination of such kets is necessary to represent a ket
|j, m_j⟩. This is exactly what Eq. (183) does.


Fig. 5.13. A classical image of two different quantum states with the same quantum numbers l, s, j, and
m_j, but different m_l and m_s.


Some properties of the Clebsch-Gordan coefficients ⟨m_l, m_s|j, m_j⟩ may be readily established.
For example, the coefficients may differ from zero only if the involved magnetic quantum numbers satisfy Eq.
(180). In our current case, this relation is not an elementary corollary of Eq. (171), because the
Clebsch-Gordan coefficients, with the quantum numbers m_l, m_s in one state vector and m_j in the other
state vector, characterize the relation between different groups of basis states, so we need to prove
this fact. All matrix elements of the null-operator

$$\hat J_z - (\hat L_z + \hat S_z) = 0 \qquad (5.184)$$

should equal zero in any basis; in particular

$$\langle j, m_j|\hat J_z - (\hat L_z + \hat S_z)|m_l, m_s\rangle = 0. \qquad (5.185)$$

Acting by the operator Ĵ_z upon the bra-vector, and by the sum (L̂_z + Ŝ_z) upon the ket-vector, we get

$$\hbar\left[m_j - (m_l + m_s)\right]\langle j, m_j|m_l, m_s\rangle = 0, \qquad (5.186)$$

thus proving that

$$\langle m_l, m_s|j, m_j\rangle = \langle j, m_j|m_l, m_s\rangle^* = 0, \quad\text{if } m_j \ne m_l + m_s. \qquad (5.187)$$


For the most important case of spin-½ particles (with s = ½, and hence m_s = ±½), whose
uncoupled representation basis includes 2×(2l + 1) states, the restriction (187) enables the representation
of all non-zero Clebsch-Gordan coefficients on the simple "rectangular" diagram shown in Fig. 14.
Indeed, each coupled-representation eigenket |j, m_j⟩, with m_j = m_l + m_s = m_l ± ½, may be related by non-zero
Clebsch-Gordan coefficients to at most two uncoupled-representation eigenstates |m_l, m_s⟩. Since m_l
may only take integer values from −l to +l, m_j may only take half-integer values on the interval [−l − ½,
l + ½]. Hence, by the definition of j as (m_j)_max, its maximum value has to be l + ½, and for m_j = l + ½,
this is the only possible value of j. This means that the uncoupled state with m_l = l and m_s = +½
should be identical to the coupled-representation state with j = l + ½ and m_j = l + ½:


$$|j = l + \tfrac12,\ m_j = l + \tfrac12\rangle = |m_l = l,\ m_s = +\tfrac12\rangle. \qquad (5.188)$$


In Fig. 14, these two identical states are represented with the top-rightmost point (the uncoupled 
representation) and the sloped line passing through it (the coupled representation). 


Fig. 5.14. A graphical representation of possible basis states of a spin-½ particle with a fixed l. Each dot
corresponds to an uncoupled-representation ket-vector |m_l, m_s⟩, while each sloped line corresponds to one
coupled-representation ket-vector |j, m_j⟩, related by Eq. (183) to the kets |m_l, m_s⟩ whose dots it connects.


However, already the next value of this quantum number, m_j = l − ½, is compatible with two
values of j, so that each |m_l, m_s⟩ ket has to be related to two |j, m_j⟩ kets by two Clebsch-Gordan
coefficients. Since j changes in unit steps, these values of j have to be l ± ½. This choice,

$$j = l \pm \tfrac12, \qquad (5.189)$$

evidently satisfies all lower values of m_j as well (see Fig. 14).52 (Again, only one value, j = l + ½, is
necessary to represent the state with the lowest m_j = −l − ½; see the bottom-leftmost point of that
diagram.) Note that the total number of the coupled-representation states is 1 + 2×2l + 1 = 2(2l + 1), i.e.
the same as that in the uncoupled representation. So, for spin-½ systems, each sum (183), for fixed j
and m_j (plus the fixed common parameter l, plus the common s = ½), has at most two terms, i.e. involves
at most two Clebsch-Gordan coefficients.
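The same state count may be obtained by summing the multiplicities 2j + 1 of the two allowed values (189) of j (for l ≥ 1); for example, for l = 1 this gives 4 + 2 = 6 = 2(2l + 1):

```latex
\underbrace{\left[2\left(l+\tfrac12\right)+1\right]}_{j\,=\,l+1/2}
+ \underbrace{\left[2\left(l-\tfrac12\right)+1\right]}_{j\,=\,l-1/2}
= (2l+2) + 2l = 2(2l+1).
```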


These coefficients may be calculated in a few steps, all but the last one rather simple even for an
arbitrary spin s. First, the similarity of the vector operators Ĵ and Ŝ to the operator L̂, expressed by
Eqs. (169), (175), and (177), may be used to argue that the operators Ŝ_± and Ĵ_±, defined similarly to
L̂_±, have matrix elements similar to those given by Eq. (164). Next, acting by
the operator Ĵ_± = L̂_± + Ŝ_± upon both parts of Eq. (183), and then inner-multiplying the result by the bra


52 Eq. (5.189) allows a semi-qualitative classical interpretation in terms of the vector diagrams shown in Fig. 13:
since, according to Eq. (169), ħs gives the scale of the length of the vector S, if it is small (s = ½), the length of
the vector J (similarly scaled by ħj) cannot deviate much from the length of the vector L (scaled by ħl) for any spatial
orientation of these vectors, so that j cannot differ from l too much. Note also that for a fixed m,, the alternating
sign in Eq. (189) is independent of the sign of m,— see also Eqs. (190). 
vector ⟨m_l, m_s| and using the above matrix elements, we may get recurrence relations for the Clebsch-Gordan
coefficients with adjacent values of m_l, m_s, and m_j. Finally, these relations may be sequentially
applied to the adjacent states in both representations, starting from either of the two states common for
them, for example, from the state with the ket-vector (188), corresponding to the top right point in Fig.
14.
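Here is a minimal sketch of one way to carry out this last step for s = ½ and j = l + ½. Instead of writing the recurrence relations explicitly, it repeatedly applies Ĵ_− = L̂_− + Ŝ_− to the top state (188) in the uncoupled basis and renormalizes the result at each step. This relies on Eq. (164), according to which Ĵ_−|j, m_j⟩ is a positive multiple of |j, m_j − 1⟩ when the arbitrary phase factor is set to 1 (an assumed convention). The sketch uses NumPy and ħ = 1, and the function names are hypothetical. The j = l − ½ states could then be found from their orthogonality to these ones.

```python
import numpy as np

def lowering(j):
    """J_- for quantum number j (hbar = 1), basis m = j, j-1, ..., -j; Eq. (5.164)."""
    dim = int(round(2 * j)) + 1
    m = j - np.arange(dim)
    jm = np.zeros((dim, dim))
    for k in range(dim - 1):              # J_-|j, m_k> = c_k |j, m_k - 1>, c_k > 0
        jm[k + 1, k] = np.sqrt(j * (j + 1) - m[k] * (m[k] - 1))
    return jm

def top_multiplet(l):
    """Return {m_j: array} with <m_l, m_s | j = l + 1/2, m_j> arranged in rows
    m_l = l, ..., -l and columns m_s = +1/2, -1/2."""
    n_l = 2 * l + 1
    j_minus = np.kron(lowering(l), np.eye(2)) + np.kron(np.eye(n_l), lowering(0.5))
    state = np.zeros(2 * n_l)
    state[0] = 1.0                        # |m_l = l, m_s = +1/2>, Eq. (5.188)
    mj, result = l + 0.5, {}
    while True:
        result[mj] = state.reshape(n_l, 2)
        if mj == -(l + 0.5):
            return result
        state = j_minus @ state           # one step down the ladder of m_j
        state = state / np.linalg.norm(state)
        mj -= 1

l = 2
for mj, c in top_multiplet(l).items():
    print(mj, np.round(c[c != 0], 4))    # the non-zero coefficients only
```

For l = 2, the printed non-zero coefficients coincide, step by step, with the upper-sign expressions of Eq. (5.190) cited below.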


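Let me leave these straightforward but a bit tedious calculations for the reader's exercise, and
just cite the final result of this procedure for s = ½:53

$$\begin{aligned}
\langle m_l = m_j - \tfrac12,\ m_s = +\tfrac12\,|\,j = l \pm \tfrac12,\ m_j\rangle &= \pm\left(\frac{l \pm m_j + \frac12}{2l+1}\right)^{1/2},\\
\langle m_l = m_j + \tfrac12,\ m_s = -\tfrac12\,|\,j = l \pm \tfrac12,\ m_j\rangle &= \left(\frac{l \mp m_j + \frac12}{2l+1}\right)^{1/2}.
\end{aligned} \qquad (5.190)$$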


In this course, these relations will be used mostly in Sec. 6.4 for an analysis of the anomalous Zeeman 
effect. Moreover, the angular momentum addition theory described above is also valid for the addition 
of angular momenta of multiparticle system components, so we will revisit it in Chapter 8. 
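Eq. (190) can be checked without redoing the recurrence. The vectors it defines in the uncoupled basis must be eigenvectors of Ĵ² with eigenvalues ħ²j(j + 1), j = l ± ½, and must form an orthonormal set. The NumPy sketch below performs this check for several values of l. It assumes ħ = 1 and ladder matrices from Eq. (164) with the phase factor set to 1, which is the convention assumed here for the signs in Eq. (190). The function names are hypothetical.

```python
import numpy as np

def raising(j):
    """J_+ for quantum number j (hbar = 1), basis m = j, ..., -j; Eq. (5.164)."""
    dim = int(round(2 * j)) + 1
    m = j - np.arange(dim)
    jp = np.zeros((dim, dim))
    for k in range(1, dim):
        jp[k - 1, k] = np.sqrt(j * (j + 1) - m[k] * (m[k] + 1))
    return jp

def total_j_squared(l):
    """J^2 = J_z^2 + (J_+ J_- + J_- J_+)/2 on the space of |m_l, m_s>, s = 1/2."""
    n_l = 2 * l + 1
    i_l, i_s = np.eye(n_l), np.eye(2)
    lz = np.diag(l - np.arange(n_l, dtype=float))
    sz = np.diag([0.5, -0.5])
    jp = np.kron(raising(l), i_s) + np.kron(i_l, raising(0.5))
    jz = np.kron(lz, i_s) + np.kron(i_l, sz)
    return jz @ jz + (jp @ jp.T + jp.T @ jp) / 2     # J_- = J_+^T (real matrices)

def coupled_ket(l, sign, mj):
    """|j = l + sign/2, m_j> expanded over |m_l, m_s> using Eq. (5.190)."""
    v = np.zeros((2 * l + 1, 2))          # rows m_l = l..-l, columns m_s = +1/2, -1/2
    if abs(mj - 0.5) <= l:
        v[int(round(l - (mj - 0.5))), 0] = sign * np.sqrt((l + sign * mj + 0.5) / (2 * l + 1))
    if abs(mj + 0.5) <= l:
        v[int(round(l - (mj + 0.5))), 1] = np.sqrt((l - sign * mj + 0.5) / (2 * l + 1))
    return v.ravel()

def check_eq_5_190(l):
    """True if all kets (5.190) are J^2 eigenvectors and form an orthonormal basis."""
    j2 = total_j_squared(l)
    kets = []
    for sign in (+1, -1):
        j = l + sign / 2
        for mj in np.arange(-j, j + 1):
            v = coupled_ket(l, sign, mj)
            if not np.allclose(j2 @ v, j * (j + 1) * v):
                return False
            kets.append(v)
    kets = np.array(kets)
    n = 2 * (2 * l + 1)
    return kets.shape == (n, n) and np.allclose(kets @ kets.T, np.eye(n))

print([check_eq_5_190(l) for l in (1, 2, 3)])
```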


To conclude this section, I have to note that the Clebsch-Gordan coefficients (for arbitrary s) also
participate in the so-called Wigner-Eckart theorem that expresses the matrix elements of spherical
tensor operators, in the coupled-representation basis |j, m_j⟩, via a reduced set of matrix elements. This
theorem may be useful, for example, for the calculation of the rate of quantum transitions to/from high-n
states in spherically-symmetric potentials. Unfortunately, a discussion of this theorem and its
applications would require a higher mathematical background than I can expect from my readers, and
more time/space than I can afford.54


5.8. Exercise problems 


5.1. Use the discussion in Sec. 1 to find an alternative solution of Problem 4.18.


5.2. A spin-½ is placed into an external magnetic field, with a time-independent orientation, its
magnitude ℬ(t) being an arbitrary function of time. Find explicit expressions for the Heisenberg
operators and the expectation values of all three Cartesian components of the spin, as functions of time,
in a coordinate system of your choice.


5.3. A two-level system is in the quantum state α described by the ket-vector |α⟩ = α_↑|↑⟩ +
α_↓|↓⟩, with given (generally, complex) c-number coefficients α_↑↓. Prove that we can always select such
a geometric c-number vector c = {c_x, c_y, c_z} that α is an eigenstate of c·σ̂, where σ̂ is the Pauli vector
operator. Find all possible values of c satisfying this condition, and the second eigenstate (orthogonal to
α) of the operator c·σ̂. Give a Bloch-sphere interpretation of your result.


53 For arbitrary spin s, the calculations and even the final expressions for the Clebsch-Gordan coefficients are 
rather bulky. They may be found, typically in a table form, mostly in special monographs — see, e.g., A. 
Edmonds, Angular Momentum in Quantum Mechanics, Princeton U. Press, 1957. 

54 For the interested reader, I can recommend either Sec. 17.7 in E. Merzbacher, Quantum Mechanics, 3rd ed.,
Wiley, 1998, or Sec. 3.10 in J. Sakurai, Modern Quantum Mechanics, Addison-Wesley, 1994.


Chapter 5 Page 41 of 48 


Essential Graduate Physics QM: Quantum Mechanics 


5.4. Analyze statistics of the spacing S ≡ E_+ − E_− between energy levels of a two-level system,
assuming that all elements H_jj' of its Hamiltonian matrix (2) are independent random numbers, with
equal and constant probability densities within the energy interval of interest. Compare the result with
that for a purely diagonal Hamiltonian matrix, with the similar probability distribution of its random
diagonal elements.


5.5. For a periodic motion of a single particle in a confining potential U(r), the virial theorem of
non-relativistic classical mechanics55 is reduced to the following equality:

$$\overline{T} = \frac12\,\overline{\mathbf r\cdot\nabla U},$$

where T is the particle's kinetic energy, and the top bar means averaging over the time period of motion.
Prove the following quantum-mechanical version of the theorem for an arbitrary stationary quantum
state, in the absence of spin effects:

$$\langle T\rangle = \frac12\langle \mathbf r\cdot\nabla U\rangle,$$

where the angular brackets mean the expectation values of the observables.

Hint. Mimicking the proof of the classical virial theorem, consider the time evolution of the
following operator:

$$\hat{\mathbf r}\cdot\hat{\mathbf p}.$$


5.6. Calculate, in the WKB approximation, the transparency 𝒯 of the following saddle-shaped
potential barrier:

$$U(x, y) = U_0\left(1 + \frac{xy}{a^2}\right),$$

where U_0 > 0 and a are real constants, for tunneling of a 2D particle with energy E < U_0.


5.7. Calculate the so-called Gamow factor56 for the alpha decay of atomic nuclei, i.e. the
exponential factor in the transparency of the potential barrier resulting from the following simple model
for the alpha-particle's potential energy as a function of its distance from the nuclear center:

$$U(r) = \begin{cases} U_0 < 0, & \text{for } r < R,\\[4pt] \dfrac{ZZ'e^2}{4\pi\varepsilon_0 r}, & \text{for } R < r,\end{cases}$$

(where Ze = 2e > 0 is the charge of the particle, Z'e > 0 is that of the nucleus after the decay, and R is
the nucleus' radius), in the WKB approximation.


5.8. Use the WKB approximation to calculate the average time of ionization of a hydrogen atom,
initially in its ground state, made metastable by application of an additional weak, uniform,
time-independent electric field ℰ. Formulate the conditions of validity of your result.


55 See, e.g., CM Problem 1.12. 
56 Named after G. Gamow, who made this calculation as early as in 1928. 
5.9. For a 1D harmonic oscillator with mass m and frequency ω_0, calculate:
(i) all matrix elements ⟨n|x̂³|n'⟩, and
(ii) the diagonal matrix elements ⟨n|x̂⁴|n⟩,
where n and n' are arbitrary Fock states.


5.10. Calculate the sum (over all n > 0) of the so-called oscillator strengths,

$$f_n \equiv \frac{2m}{\hbar^2}\left|\langle n|\hat x|0\rangle\right|^2\left(E_n - E_0\right),$$

(i) for a 1D harmonic oscillator, and
(ii) for a 1D particle confined in an arbitrary stationary potential.


5.11. Prove the so-called Bethe sum rule,

$$\sum_{n'}\left(E_{n'} - E_n\right)\left|\langle n|e^{ik\hat x}|n'\rangle\right|^2 = \frac{\hbar^2k^2}{2m},$$

valid for a 1D particle moving in an arbitrary time-independent potential U(x), and discuss its relation
with the Thomas-Reiche-Kuhn sum rule whose derivation was the subject of the previous problem.

Hint: Calculate the expectation value, in a stationary state n, of the following double
commutator,

$$\hat D \equiv \left[\left[\hat H, e^{ik\hat x}\right], e^{-ik\hat x}\right],$$

in two ways: first, just spelling out both commutators, and, second, using the commutation relations
between the operators p̂_x and e^{ikx̂}, and compare the results.


5.12. Given Eq. (116), prove Eq. (117), using the hint given in the accompanying footnote. 


5.13. Use Eqs. (116)-(117) to simplify the following operators:
(i) $\exp\{+ia\hat x\}\,\hat p_x\exp\{-ia\hat x\}$, and
(ii) $\exp\{+ia\hat p_x\}\,\hat x\exp\{-ia\hat p_x\}$,
where a is a c-number.
5.14. For a 1D harmonic oscillator, calculate:
(i) the expectation value of energy, and
(ii) the time evolution of the expectation values of the coordinate and momentum,
provided that in the initial moment (t = 0) it was in the state described by the following ket-vector:

$$|\alpha\rangle = \frac{1}{\sqrt2}\left(|31\rangle + |32\rangle\right),$$

where $|n\rangle$ are the ket-vectors of the stationary (Fock) states of the oscillator.
5.15.* Re-derive the London dispersion force's potential of the interaction of two isotropic 3D harmonic oscillators (already calculated in Problem 3.16), using the language of mutually-induced polarization.


5.16. An external force pulse F(t), of a finite time duration τ, has been exerted on a 1D harmonic oscillator, initially in its ground state. Use the Heisenberg-picture equations of motion to calculate the expectation value of the oscillator's energy at the end of the pulse.


5.17. Use Eqs. (144)-(145) to calculate the uncertainties δx and δp for a harmonic oscillator in its squeezed ground state, and in particular, to prove Eqs. (143) for the case θ = 0.


5.18. Calculate the energy of a harmonic oscillator in the squeezed ground state ζ.


5.19.* Prove that the squeezed ground state, described by Eqs. (142) and (144)-(145), may be sustained by a sinusoidal modulation of a harmonic oscillator's parameter, and calculate the squeezing factor r as a function of the parameter modulation depth, assuming that the depth is small, and the oscillator's damping is negligible.


5.20. Use Eqs. (148) to prove that the operators $\hat L_j$ and $\hat L^2$ commute with the Hamiltonian of a spinless particle placed in any central potential field.


5.21. Use Eqs. (149)-(150) and (153) to prove Eqs. (155). 


5.22. Derive Eq. (164), using any of the prior formulas.


5.23. In the basis of common eigenstates of the operators $\hat L_z$ and $\hat L^2$, described by kets $|l, m\rangle$:
(i) calculate the matrix elements $\langle l, m_1|\hat L_x|l, m_2\rangle$ and $\langle l, m_1|\hat L_x^2|l, m_2\rangle$;
(ii) spell out your results for diagonal matrix elements (with $m_1 = m_2$) and their y-axis counterparts; and
(iii) calculate the diagonal matrix elements $\langle l, m|\hat L_x\hat L_y|l, m\rangle$ and $\langle l, m|\hat L_y\hat L_x|l, m\rangle$.


5.24. For the state described by the common eigenket $|l, m\rangle$ of the operators $\hat L_z$ and $\hat L^2$ in a reference frame {x, y, z}, calculate the expectation values $\langle L_{z'}\rangle$ and $\langle L_{z'}^2\rangle$ in the reference frame whose z'-axis forms angle θ with the z-axis.


5.25. Write down the matrices of the following angular momentum operators: $\hat L_x$, $\hat L_y$, $\hat L_z$, and $\hat L^2$, in the z-basis of the {l, m} states with l = 1.


5.26. Calculate the angular factor of the orbital wavefunction of a particle with a definite value of $L^2$, equal to $6\hbar^2$, and the largest possible value of $L_x$. What is this value?


5.27. For the state with the wavefunction $\psi = Cxy\,e^{-\lambda r^2}$, with a real, positive λ, calculate:
(i) the expectation values of the observables $L_x$, $L_y$, $L_z$, and $L^2$, and
(ii) the normalization constant C.


5.28. An angular state of a spinless particle is described by the following ket-vector:

$$|\alpha\rangle = \frac{1}{\sqrt2}\left(|l = 3, m = 0\rangle + |l = 3, m = 1\rangle\right).$$

Calculate the expectation values of the x- and y-components of its angular momentum. Is the result sensitive to a possible phase shift between the component eigenkets?


5.29. A particle is in a quantum state α with the orbital wavefunction proportional to the spherical harmonic $Y_1^1(\theta, \varphi)$. Find the angular dependence of the wavefunctions corresponding to the following ket-vectors:
(i) $\hat L_x|\alpha\rangle$, (ii) $\hat L_y|\alpha\rangle$, (iii) $\hat L_z|\alpha\rangle$, (iv) $\hat L_+\hat L_-|\alpha\rangle$, and (v) $\hat L^2|\alpha\rangle$.


5.30. A charged, spinless 2D particle of mass m is trapped in a soft potential well $U(x, y) = m\omega_0^2(x^2 + y^2)/2$. Calculate its energy spectrum in the presence of a uniform magnetic field $\mathscr B$, normal to the [x, y]-plane of the particle's motion.


5.31. Solve the previous problem for a spinless 3D particle, placed (in addition to a uniform magnetic field $\mathscr B$) into a spherically-symmetric potential well $U(r) = 𝓂\omega_0^2r^2/2$.


5.32. Calculate the spectrum of rotational energies of an axially-symmetric, rigid body. 


5.33. Simplify the following double commutator: 
Calas 
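5.34. Prove the following commutation relation:

$$\left[\hat L^2, \left[\hat L^2, \hat{\mathbf r}\right]\right] = 2\hbar^2\left(\hat{\mathbf r}\hat L^2 + \hat L^2\hat{\mathbf r}\right).$$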


5.35. Use the commutation relation proved in the previous problem, and Eq. (148), to prove the 
orbital electric-dipole selection rules mentioned in Sec. 5.6 of the lecture notes. 


5.36. Express the commutators listed in Eq. (179), $[\hat J^2, \hat L_z]$ and $[\hat J^2, \hat S_z]$, via $\hat L_j$ and $\hat S_j$.


5.37. Find the operator $\hat T_\varphi$, describing a quantum state's rotation by angle φ about a certain axis, using the similarity of this operation with the shift of a Cartesian coordinate, discussed in Sec. 5. Then use this operator to calculate the probabilities of measurements of spin-½ components of a beam of particles with z-polarized spin, by a Stern-Gerlach instrument turned by angle θ within the [z, x] plane, where y is the axis of particle propagation (see Fig. 4.1).⁵⁷


5.38. The rotation ("angular translation") operator $\hat T_\varphi$ analyzed in the previous problem, and the linear translation operator $\hat T_X$ discussed in Sec. 5, have a similar structure:

$$\hat T_\lambda = \exp\left\{-\frac{i\hat C\lambda}{\hbar}\right\},$$

where λ is a real c-number, characterizing the shift, and $\hat C$ is a Hermitian operator, which does not explicitly depend on time.
(i) Prove that such operators $\hat T_\lambda$ are unitary.
(ii) Prove that if the shift by λ, induced by the operator $\hat T_\lambda$, leaves the Hamiltonian of some system unchanged for any λ, then $\langle C\rangle$ is a constant of motion for any initial state of the system.
(iii) Discuss what the last conclusion means for the particular operators $\hat T_X$ and $\hat T_\varphi$.


5.39. A particle with spin s is in a state with definite quantum numbers l and j. Prove that the observable $\mathbf L\cdot\mathbf S$ also has a definite value, and calculate it.


5.40. For a spin-½ particle in a state with definite quantum numbers l, $m_l$, and $m_s$, calculate the expectation value of the observable $J^2$, and the probabilities of all its possible values. Interpret your results in terms of the Clebsch-Gordan coefficients (190).


5.41. Derive general recurrence relations for the Clebsch-Gordan coefficients.


Hint: Using the similarity of the commutation relations discussed in Sec. 7, write the relations 
similar to Eqs. (164) for other components of the angular momentum, and apply them to Eq. (170). 


5.42. Use the recurrence relations derived in the previous problem to prove Eqs. (190) for the spin-½ Clebsch-Gordan coefficients.


5.43. A spin-½ particle is in a state with definite values of $L^2$, $J^2$, and $J_z$. Find all possible values of the observables $S^2$, $S_z$, and $L_z$, the probability of each listed value, and the expectation value for each of these observables.


5.44. Re-solve the Landau-level problem discussed in Sec. 3.2, for a spin-½ particle. Discuss the result for the particular case of an electron, with the g-factor equal to 2.


5.45. In the Heisenberg picture of quantum dynamics, find an explicit relation between the operators of velocity $\hat{\mathbf v} \equiv d\hat{\mathbf r}/dt$ and acceleration $\hat{\mathbf a} \equiv d\hat{\mathbf v}/dt$ of a spin-½ particle with electric charge q, moving in an arbitrary external electromagnetic field. Compare the result with the corresponding classical expression.


Hint: For the orbital motion’s description, you may use Eq. (3.26). 


57 Note that the last task is just a particular case of Problem 4.18 (see also Problem 1). 
5.46. A byproduct of the solution of Problem 41 is the following relation for the spin operators (valid for any spins):

$$\langle m_s + 1|\hat S_+|m_s\rangle = \hbar\left[(s - m_s)(s + m_s + 1)\right]^{1/2}.$$

Use this result to spell out the matrices $S_x$, $S_y$, $S_z$, and $S^2$ of a particle with s = 1, in the z-basis, defined as the basis in which the matrix $S_z$ is diagonal.


5.47. For a particle with an arbitrary spin s, moving in a spherically-symmetric field, find the ranges of the quantum numbers $m_j$ and j that are necessary to describe, in the coupled-representation basis:
(i) all states with a definite quantum number l, and
(ii) a state with definite values of not only l, but also $m_l$ and $m_s$.
Give an interpretation of your results in terms of the classical geometric vector diagram (Fig. 13).

5.48. A particle of mass 𝓂, with electric charge q and spin s, free to move along a plane ring of radius R, is placed into a constant, uniform magnetic field $\mathscr B$, directed normally to the ring's plane. Calculate the energy spectrum of the system. Explore and interpret the particular form the result takes when the particle is an electron with the g-factor $g_e = 2$.
Chapter 6. Perturbative Approaches


This chapter discusses several perturbative approaches to problems of quantum mechanics, and their 
simplest but important applications including the fine structure of atomic energy levels, and the effects 
of external dc and ac electric and magnetic fields on these levels. It continues with a discussion of the 
perturbation theory of transitions to continuous spectrum and the Golden Rule of quantum mechanics, 
which naturally brings us to the issue of open quantum systems — to be discussed in the next chapter. 


6.1. Time-independent perturbations 


Unfortunately, only a few problems of quantum mechanics may be solved exactly in an analytical form. Actually, in the previous chapters we have solved a substantial part of such problems for a single particle, while for multiparticle systems, exactly solvable cases are even rarer. However, most practical problems of physics feature a certain small parameter, and this smallness may be exploited by various approximate analytical methods giving asymptotically correct results, i.e. results whose error tends to zero as the small parameter(s) are reduced. Earlier in the course, we have explored one of them, the WKB approximation, which is adequate for a particle moving through a soft potential profile. In this chapter, we will discuss other techniques that are more suitable for other cases. The historic name for these techniques is the perturbation theory, though it is fairer to speak about several perturbative approaches, because they are substantially different for different situations.


The simplest version of the perturbation theory addresses the problem of stationary states and 
energy levels of systems described by time-independent Hamiltonians of the type 


$$\hat H = \hat H^{(0)} + \hat H^{(1)}, \tag{6.1}$$

where the operator $\hat H^{(1)}$, describing the system's "perturbation", is relatively small, in the sense that its addition to the unperturbed operator $\hat H^{(0)}$ results in a relatively small change of the eigenenergies $E_n$ of the system, and the corresponding eigenstates. A typical problem of this type is the 1D weakly anharmonic oscillator (Fig. 1), described by the Hamiltonian (1) with


$$\hat H^{(0)} = \frac{\hat p^2}{2m} + \frac{m\omega_0^2\hat x^2}{2}, \qquad \hat H^{(1)} = \alpha\hat x^3 + \beta\hat x^4. \tag{6.2}$$

[Fig. 6.1. The simplest application of the perturbation theory: a weakly anharmonic 1D oscillator. (Dashed lines characterize the unperturbed, harmonic oscillator.)]
I will use this system as our first example, but let me start by describing the perturbative approach to the general time-independent Hamiltonian (1). In the bra-ket formalism, the eigenproblem (4.68) for the perturbed Hamiltonian, i.e. the stationary Schrödinger equation of the system, is


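$$\left(\hat H^{(0)} + \hat H^{(1)}\right)|n\rangle = E_n|n\rangle. \tag{6.3}$$

Let the eigenstates and eigenvalues of the unperturbed Hamiltonian, which satisfy the equation

$$\hat H^{(0)}|n^{(0)}\rangle = E_n^{(0)}|n^{(0)}\rangle, \tag{6.4}$$

be considered as known. In this case, the solution of problem (3) means finding, first, its perturbed eigenvalues $E_n$ and, second, the coefficients $\langle n'^{(0)}|n\rangle$ of the expansion of the perturbed state's vectors $|n\rangle$ in the following series over the unperturbed ones, $|n'^{(0)}\rangle$:

$$|n\rangle = \sum_{n'}|n'^{(0)}\rangle\langle n'^{(0)}|n\rangle. \tag{6.5}$$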


Let us plug Eq. (5), with the summation index n’ replaced with n” (just to have a more compact 
notation in our forthcoming result), into both sides of Eq. (3): 


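$$\sum_{n''}\langle n''^{(0)}|n\rangle\hat H^{(0)}|n''^{(0)}\rangle + \sum_{n''}\langle n''^{(0)}|n\rangle\hat H^{(1)}|n''^{(0)}\rangle = \sum_{n''}\langle n''^{(0)}|n\rangle E_n|n''^{(0)}\rangle, \tag{6.6}$$

and then inner-multiply all terms by an arbitrary unperturbed bra-vector $\langle n'^{(0)}|$ of the system. Assuming that the unperturbed eigenstates are orthonormal, $\langle n'^{(0)}|n''^{(0)}\rangle = \delta_{n'n''}$, and using Eq. (4) in the first term on the left-hand side, we get the following system of linear equations

$$\sum_{n''}H^{(1)}_{n'n''}\langle n''^{(0)}|n\rangle = \langle n'^{(0)}|n\rangle\left(E_n - E_{n'}^{(0)}\right), \tag{6.7}$$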


where the matrix elements of the perturbation are calculated, by definition, in the unperturbed brackets: 


$$H^{(1)}_{n'n''} \equiv \langle n'^{(0)}|\hat H^{(1)}|n''^{(0)}\rangle. \tag{6.8}$$


The linear equation system (7) is still exact,¹ and is frequently used for numerical calculations. (Since the matrix coefficients (8) typically decrease when n' and/or n'' become sufficiently large, the sum on the left-hand side of Eq. (7) may usually be truncated, still giving an acceptable accuracy of the solution.) To get analytical results, we need to make approximations. In the simple perturbation theory we are discussing now, this is achieved by the expansion of both the eigenenergies and the expansion coefficients into the Taylor series in a certain small parameter μ of the problem:

$$E_n = E_n^{(0)} + E_n^{(1)} + E_n^{(2)} + \ldots, \tag{6.9}$$

$$\langle n'^{(0)}|n\rangle = \langle n'^{(0)}|n\rangle^{(0)} + \langle n'^{(0)}|n\rangle^{(1)} + \langle n'^{(0)}|n\rangle^{(2)} + \ldots, \tag{6.10}$$


where 


¹ Please note the similarity of Eq. (7) with Eq. (2.215) of the 1D band theory. Indeed, the latter equation is not much more than a particular form of Eq. (7) for the 1D wave mechanics, and a specific (periodic) potential U(x) considered as the perturbation Hamiltonian. Moreover, the whole approximate treatment of the weak-potential limit in Sec. 2.7 is essentially a particular case of the perturbation theory we are discussing now (in its 1st order).
$$E_n^{(k)} \propto \langle n'^{(0)}|n\rangle^{(k)} \propto \mu^k. \tag{6.11}$$
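The remark about truncating the exact system (7) is easy to try on the weakly anharmonic oscillator (2). The sketch below is our own illustration, not a procedure taken from these notes. It assumes units with ħ = m = ω₀ = 1 (so that $x_0 = 1$) and keeps the N lowest Fock states. It then builds the corresponding block of $\hat H^{(0)} + \hat H^{(1)}$ and diagonalizes it with a hand-written cyclic Jacobi routine, chosen only to avoid library dependencies. The values α = 0.05, β = 0.01, and N = 30 are arbitrary illustrative assumptions. For comparison, the function `perturbative` evaluates the sum of Eqs. (16) and (23), which are derived below.

```python
import math

def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix (list of lists), by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]                          # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol ** 2:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle that zeroes the (p, q) element
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                     # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                     # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def matmul(a, b):
    """Plain product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

ALPHA, BETA, N = 0.05, 0.01, 30      # illustrative parameter values (our assumption)
M = N + 4                            # extra states, so that x^3 and x^4 are exact on the kept block

# <n+1|x|n> = sqrt((n+1)/2), in units x0 = (hbar/m omega0)^(1/2) = 1
x = [[0.0] * M for _ in range(M)]
for k in range(M - 1):
    x[k][k + 1] = x[k + 1][k] = math.sqrt((k + 1) / 2)
x2 = matmul(x, x)
x3 = matmul(x2, x)
x4 = matmul(x2, x2)

# truncated Eq. (7): H(0) = diag(n + 1/2), H(1) = alpha x^3 + beta x^4 (units hbar omega0 = 1)
H = [[ALPHA * x3[i][j] + BETA * x4[i][j] + (i + 0.5 if i == j else 0.0) for j in range(N)]
     for i in range(N)]
levels = jacobi_eigenvalues(H)

def perturbative(n):
    """E_n^(0) + E_n^(1) + E_n^(2) from Eqs. (16) and (23), in the same units."""
    return (n + 0.5 + 0.75 * BETA * (2 * n**2 + 2 * n + 1)
            - ALPHA**2 * (30 * n**2 + 30 * n + 11) / 8)

for n in range(3):
    print(n, round(levels[n], 5), round(perturbative(n), 5))
```

For these parameters, the two ground-state values should differ only by terms of higher order in α and β. Because the growing matrix elements of $\hat x^4$ make the highest kept levels inaccurate, only the lowest few eigenvalues of the truncated matrix are meaningful.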


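In order to explore the 1st-order approximation, which ignores all terms $O(\mu^2)$ and higher, let us plug only the first two terms of the expansions (9) and (10) into the basic equation (7):

$$\sum_{n''}H^{(1)}_{n'n''}\left(\delta_{n''n} + \langle n''^{(0)}|n\rangle^{(1)}\right) = \left(\delta_{n'n} + \langle n'^{(0)}|n\rangle^{(1)}\right)\left(E_n^{(0)} + E_n^{(1)} - E_{n'}^{(0)}\right). \tag{6.12}$$

Now let us open the parentheses, and disregard all the remaining terms $O(\mu^2)$. The result is

$$H^{(1)}_{n'n} = \delta_{n'n}E_n^{(1)} + \langle n'^{(0)}|n\rangle^{(1)}\left(E_n^{(0)} - E_{n'}^{(0)}\right). \tag{6.13}$$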


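This relation is valid for any set of indices n and n'; let us start from the case n = n', immediately getting a very simple (and practically, the most important!) result:

$$E_n^{(1)} = H^{(1)}_{nn} \equiv \langle n^{(0)}|\hat H^{(1)}|n^{(0)}\rangle. \tag{6.14}$$

For example, let us see what this result gives for the first two perturbation terms in the weakly anharmonic oscillator (2):

$$E_n^{(1)} = \alpha\langle n^{(0)}|\hat x^3|n^{(0)}\rangle + \beta\langle n^{(0)}|\hat x^4|n^{(0)}\rangle. \tag{6.15}$$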


As the reader knows (or should know :-) from the solution of Problem 5.9, the first bracket equals zero, while the second one yields²

$$E_n^{(1)} = \frac{3}{4}\beta x_0^4\left(2n^2 + 2n + 1\right), \tag{6.16}$$

where $x_0 \equiv (\hbar/m\omega_0)^{1/2}$ is the oscillator's spatial scale. Naturally, there should be some non-vanishing contribution to the energies from the (typically, larger) perturbation proportional to α, so that for its calculation we need to explore the 2nd order of the theory. However, before doing that, let us complete our discussion of its 1st order.


For n' ≠ n, Eq. (13) may be used to calculate the eigenstates rather than the eigenvalues:

$$\langle n'^{(0)}|n\rangle^{(1)} = \frac{H^{(1)}_{n'n}}{E_n^{(0)} - E_{n'}^{(0)}}, \quad \text{for } n' \neq n. \tag{6.17}$$

This means that the eigenket's expansion (5), in the 1st order, may be represented as

$$|n^{(1)}\rangle = C|n^{(0)}\rangle + \sum_{n'\neq n}\frac{H^{(1)}_{n'n}}{E_n^{(0)} - E_{n'}^{(0)}}|n'^{(0)}\rangle. \tag{6.18}$$

The coefficient $C \equiv \langle n^{(0)}|n\rangle$ cannot be found from Eq. (17); however, requiring the final state n to be normalized, we see that other terms may provide only corrections $O(\mu^2)$, so that in the 1st order we should take C = 1. The most important feature of Eq. (18) is its denominators: the closer the unperturbed eigenenergies of two states, the larger their mutual "interaction" due to the perturbation.


² A useful exercise for the reader: analyze the relation between Eq. (16) and the result of the classical theory of such weakly anharmonic ("nonlinear") oscillators; see, e.g., CM Sec. 5.2, in particular, Eq. (5.49).
This feature also affects the validity condition of the 1st-order approximation, which may be quantified using Eq. (17): the magnitudes of the brackets it describes have to be much less than the unperturbed bracket $\langle n^{(0)}|n\rangle = 1$, so that all elements of the perturbation matrix have to be much less than the difference between the corresponding unperturbed energies. For the anharmonic oscillator's energy corrections (16), this requirement is reduced to $|E_n^{(1)}| \ll \hbar\omega_0$.


Now we are ready to go after the 2nd-order approximation to Eq. (7). Let us focus on the case n' = n, because, as we already know, only this term gives a correction to the eigenenergies. Moreover, since the left-hand side of Eq. (7) already has a small factor $H^{(1)}_{n'n''} \propto \mu$, the bracket coefficients in that part may be taken from the 1st-order result (17). As a result, we get


$$E_n^{(2)} = \sum_{n''}H^{(1)}_{nn''}\langle n''^{(0)}|n\rangle^{(1)} = \sum_{n''\neq n}\frac{H^{(1)}_{nn''}H^{(1)}_{n''n}}{E_n^{(0)} - E_{n''}^{(0)}}. \tag{6.19}$$

Since $\hat H^{(1)}$ has to be Hermitian, we may rewrite this expression as

$$E_n^{(2)} = \sum_{n'\neq n}\frac{\left|H^{(1)}_{n'n}\right|^2}{E_n^{(0)} - E_{n'}^{(0)}}. \tag{6.20}$$
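Eqs. (14), (17), and (20) translate into a few lines of code. The sketch below (the function names are ours) takes the unperturbed energies $E_n^{(0)}$ and the perturbation matrix $H^{(1)}_{n'n}$ as plain Python lists, and assumes that the level n is non-degenerate. It checks the result on a two-level example whose exact eigenvalue is known in closed form. The coupling value g is an arbitrary illustrative choice.

```python
import math

def energy_corrections(e0, h1, n):
    """1st- and 2nd-order corrections to a non-degenerate level n, Eqs. (14) and (20)."""
    e1 = h1[n][n].real                                   # Eq. (14): diagonal element
    e2 = sum(abs(h1[k][n]) ** 2 / (e0[n] - e0[k])         # Eq. (20): sum over k != n
             for k in range(len(e0)) if k != n)
    return e1, e2

def first_order_coefficients(e0, h1, n):
    """<k(0)|n> to 1st order, Eq. (17); the k = n coefficient is taken as C = 1."""
    return [1.0 if k == n else h1[k][n] / (e0[n] - e0[k]) for k in range(len(e0))]

# two-level test: E(0) = {0, 1} with a weak off-diagonal coupling g (illustrative values)
g = 0.01
e0 = [0.0, 1.0]
h1 = [[0.0, g], [g, 0.0]]
e1, e2 = energy_corrections(e0, h1, 0)
coeffs = first_order_coefficients(e0, h1, 0)
exact = (1.0 - math.sqrt(1.0 + 4 * g * g)) / 2          # lower eigenvalue of [[0, g], [g, 1]]
print(e0[0] + e1 + e2, exact)
```

In this example the 1st-order correction vanishes. The 2nd-order correction, $-g^2$, differs from the exact shift only by a term of order $g^4$.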


This is the much-celebrated 2nd-order perturbation result, which frequently (in sufficiently symmetric problems) is the first non-vanishing correction to the state energy, for example, from the cubic term (proportional to α) in our weakly anharmonic oscillator problem (2). To calculate the corresponding correction, we may use another result of the solution of Problem 5.9:

$$\langle n'^{(0)}|\hat x^3|n^{(0)}\rangle = \left(\frac{x_0}{\sqrt2}\right)^3\left\{\left[n(n-1)(n-2)\right]^{1/2}\delta_{n',n-3} + 3n^{3/2}\delta_{n',n-1} + 3(n+1)^{3/2}\delta_{n',n+1} + \left[(n+1)(n+2)(n+3)\right]^{1/2}\delta_{n',n+3}\right\}. \tag{6.21}$$

So, according to Eq. (20), we need to calculate

$$E_n^{(2)} = \alpha^2\frac{x_0^6}{8}\sum_{n'\neq n}\frac{\left\{\left[n(n-1)(n-2)\right]^{1/2}\delta_{n',n-3} + 3n^{3/2}\delta_{n',n-1} + 3(n+1)^{3/2}\delta_{n',n+1} + \left[(n+1)(n+2)(n+3)\right]^{1/2}\delta_{n',n+3}\right\}^2}{\hbar\omega_0(n - n')}. \tag{6.22}$$

The summation is not as cumbersome as it may look, because at the squaring of the curly brackets, all mixed products are proportional to the products of different Kronecker deltas and hence vanish, so that we need to sum up only the squares of each term, finally getting

$$E_n^{(2)} = -\frac{\alpha^2x_0^6}{8\hbar\omega_0}\left(30n^2 + 30n + 11\right) \equiv -\frac{15}{4}\frac{\alpha^2x_0^6}{\hbar\omega_0}\left(n^2 + n + \frac{11}{30}\right). \tag{6.23}$$


This formula shows that all energy level corrections are negative, regardless of the sign of α.³ In contrast, the 1st-order correction $E_n^{(1)}$, given by Eq. (16), does depend on the sign of β, so that the net correction, $E_n^{(1)} + E_n^{(2)}$, may be of any sign.
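The bookkeeping that leads from Eq. (22) to Eq. (23) can be verified with exact rational arithmetic. The sketch below is a check we add here. It uses the squared brackets of Eq. (21) in units with $x_0 = 1$, so that $(x_0/\sqrt2)^6 = 1/8$. It also takes the denominators as $E_n^{(0)} - E_{n'}^{(0)} = \hbar\omega_0(n - n')$ with $\hbar\omega_0 = 1$. Its output is therefore $E_n^{(2)}$ in units of $\alpha^2x_0^6/\hbar\omega_0$.

```python
from fractions import Fraction

def e2_cubic(n):
    """E_n^(2) of Eq. (22), exactly, in units alpha^2 x0^6 / (hbar omega0)."""
    squares = {                                  # |<n'|x^3|n>|^2 from Eq. (21), keyed by n' - n
        -3: Fraction(n * (n - 1) * (n - 2), 8),
        -1: Fraction(9 * n**3, 8),
        +1: Fraction(9 * (n + 1) ** 3, 8),
        +3: Fraction((n + 1) * (n + 2) * (n + 3), 8),
    }
    # Eq. (20): the denominator E_n^(0) - E_n'^(0) equals n - n' = -shift
    return sum(value / (-shift) for shift, value in squares.items())

def e2_formula(n):
    """Closed form, Eq. (23): -(30 n^2 + 30 n + 11) / 8."""
    return Fraction(-(30 * n**2 + 30 * n + 11), 8)

for n in range(5):
    print(n, e2_cubic(n), e2_formula(n))
```

For n < 3, the products that would correspond to non-existent states $n' < 0$ vanish automatically, so no special cases are needed.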


³ Note that this is true for the 2nd-order correction to the ground-state energy of any system, because for this state, the denominators of all terms of the sum (20) are negative, while their numerators are always non-negative.
The results (18) and (20) are clearly inapplicable to the degenerate case where, in the absence of perturbation, several states correspond to the same energy level, because of the divergence of their denominators.⁴ This divergence hints that in this case, the largest effect of the perturbation is the degeneracy lifting, e.g., some splitting of the initially degenerate energy level $E^{(0)}$ (Fig. 2), and that for the analysis of this case we can, in the first approximation, ignore the effect of all other energy levels. (A careful analysis shows that this is indeed the case until the level splitting becomes comparable with the distance to other energy levels.)


[Fig. 6.2. Lifting the energy level degeneracy by a perturbation (schematically).]


Limiting the summation in Eq. (7) to the group of N degenerate states with equal $E_{n''}^{(0)} = E^{(0)}$, we reduce it to

$$\sum_{n''=1}^{N}H^{(1)}_{n'n''}\langle n''^{(0)}|n\rangle = \langle n'^{(0)}|n\rangle\left(E_n - E^{(0)}\right), \tag{6.24}$$

where now n' and n'' number the N states of the degenerate group.⁵ Eq. (24) may be rewritten as

$$\sum_{n''=1}^{N}\left(H^{(1)}_{n'n''} - E_n^{(1)}\delta_{n'n''}\right)\langle n''^{(0)}|n\rangle = 0, \quad \text{where } E_n^{(1)} \equiv E_n - E^{(0)}. \tag{6.25}$$

For each n = 1, 2, ..., N, this is a system of N linear, homogeneous equations (with N terms each) for N unknown coefficients $\langle n''^{(0)}|n\rangle$. In this problem, we may readily recognize the problem of diagonalization of the perturbation matrix $H^{(1)}$ (cf. Sec. 4.4 and in particular Eq. (4.101)). As in the general case, the condition of self-consistency of the system is:

$$\det\left(H^{(1)}_{n'n''} - E_n^{(1)}\delta_{n'n''}\right) = 0, \tag{6.26}$$

where now the index n numbers the N roots of this equation, in an arbitrary order. According to the definition (25) of $E_n^{(1)}$, the resulting N energy levels $E_n$ may be found as $E^{(0)} + E_n^{(1)}$. If the perturbation matrix is diagonal in the chosen basis $n^{(0)}$, the result is extremely simple,

$$E_n^{(1)} \equiv E_n - E^{(0)} = H^{(1)}_{nn}, \tag{6.27}$$


4 This is exactly the reason why such simple perturbation approach runs into serious problems for systems with a 
continuous spectrum, and other techniques (such as the WKB approximation) are often necessary. 

5 Note that here the choice of the basis is to some extent arbitrary, because due to the linearity of equations of 
quantum mechanics, any linear combination of the states n’ is also an eigenstate of the unperturbed 
Hamiltonian. However, for using Eq. (25), these combinations have to be orthonormal, as was supposed at the 
derivation of Eq. (7). 
and formally coincides with Eq. (14) for the non-degenerate case, but now it may give a different result for each of the N previously degenerate states n.


Now let us see what this general theory gives for several important examples. First of all, let us 
consider a system with just two degenerate states with energy sufficiently far from all other levels. Then, 
in the basis of these two degenerate states, the most general perturbation matrix is 


$$H^{(1)} = \begin{pmatrix}H_{11} & H_{12}\\ H_{21} & H_{22}\end{pmatrix}. \tag{6.28}$$


This matrix coincides with the general matrix (5.2) of a two-level system. Hence, we come to the very important conclusion: for a weak perturbation, all properties of any doubly degenerate system are identical to those of the genuine two-level systems, which were the subject of numerous discussions in Chapter 4 and again in Sec. 5.1. In particular, its eigenenergies are given by Eq. (5.6), and may be described by the level-anticrossing diagram shown in Fig. 5.1.
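The claim that a weakly perturbed, doubly degenerate level behaves like a two-level system can be made concrete. The sketch below does not quote Eq. (5.6) itself. Instead, it uses the textbook closed-form eigenvalues of the 2×2 Hermitian matrix (28), $E^{(1)}_\pm = (H_{11} + H_{22})/2 \pm [(H_{11} - H_{22})^2/4 + |H_{12}|^2]^{1/2}$. It then sweeps the diagonal detuning to show that the two levels never come closer than $2|H_{12}|$, which is the anticrossing. The numerical values are arbitrary illustrations.

```python
import math

def two_level_corrections(h11, h22, h12):
    """Eigenvalues E^(1)_- <= E^(1)_+ of the Hermitian matrix (28); h11, h22 real, h12 complex."""
    mean = (h11 + h22) / 2
    half_gap = math.sqrt(((h11 - h22) / 2) ** 2 + abs(h12) ** 2)
    return mean - half_gap, mean + half_gap

# sweep the detuning H11 - H22 at a fixed coupling H12 (illustrative values)
h12 = 0.1 + 0.05j
splittings = []
for k in range(-10, 11):
    d = 0.05 * k
    lo, hi = two_level_corrections(d / 2, -d / 2, h12)
    splittings.append(hi - lo)
print(min(splittings), 2 * abs(h12))
```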


6.2. The linear Stark effect 


As a more involved example of the level degeneracy lifting by a perturbation, let us discuss the Stark effect⁶: the atomic level splitting by an external electric field. Let us study this effect, in the linear approximation, for a hydrogen-like atom/ion. Taking the direction of the external electric field $\mathscr E$ (which is practically always uniform on the atomic scale) for the z-axis, the perturbation may be represented by the following Hamiltonian:

$$\hat H^{(1)} = -F\hat z = -q\mathscr E\hat z = -q\mathscr E r\cos\theta. \tag{6.29}$$


(In the last form, the operator sign is dropped, because we will work in the coordinate representation.) 


As you (should :-) remember, energy levels of a hydrogen-like atom/ion depend only on the principal quantum number n (see Eq. (3.201)); hence all the states, besides the ground 1s state in which n = 1 and l = m = 0, have some orbital degeneracy, which grows rapidly with n. Let us consider the lowest degenerate level with n = 2. Since, according to Eq. (3.203), $0 \le l \le n - 1$, at this level the orbital quantum number l may equal either 0 (one 2s state, with m = 0) or 1 (three 2p states, with m = 0, ±1). Due to this 4-fold degeneracy, $H^{(1)}$ is a 4×4 matrix with 16 elements:

$$H^{(1)} = \begin{pmatrix}H_{11} & H_{12} & H_{13} & H_{14}\\ H_{21} & H_{22} & H_{23} & H_{24}\\ H_{31} & H_{32} & H_{33} & H_{34}\\ H_{41} & H_{42} & H_{43} & H_{44}\end{pmatrix}, \tag{6.30}$$

where the rows and columns correspond, in this order, to the states {l = 0, m = 0}, {l = 1, m = 0}, {l = 1, m = +1}, and {l = 1, m = −1}.


6 This effect was discovered experimentally in 1913 by Johannes Stark and independently by Antonio Lo Surdo, 
so it is sometimes (and more fairly) called the “Stark — Lo Surdo effect”. Sometimes this name is used with the 
qualifier “dc” to distinguish it from the ac Stark effect — the energy level shift under the effect of an ac field — see 
Sec. 5 below. 
However, there is no need to be scared. First, due to the Hermitian nature of the operator, only ten of these matrix elements (four diagonal and six off-diagonal ones) may be substantially different from each other. Moreover, due to a high symmetry of the problem, there are a lot of zeros even among these elements. Indeed, let us have a look at the angular components $Y_l^m$ of the corresponding wavefunctions, with l = 0 and l = 1, described by Eqs. (3.174)-(3.175). For the states with m = ±1, the azimuthal parts of wavefunctions are proportional to $\exp\{\pm i\varphi\}$; hence the off-diagonal elements $H_{34}$ and $H_{43}$ of the matrix (30), relating these functions, are proportional to

$$\int d\Omega\, Y_1^{\pm1\,*}\hat H^{(1)}Y_1^{\mp1} \propto \int_0^{2\pi}\left(e^{\mp i\varphi}\right)^2 d\varphi = 0. \tag{6.31}$$


The azimuthal-angle symmetry also kills the off-diagonal elements $H_{13}$, $H_{14}$, $H_{23}$, $H_{24}$ (and hence their complex conjugates $H_{31}$, $H_{41}$, $H_{32}$, and $H_{42}$), because they relate states with m = 0 and m = ±1, and hence are proportional to

$$\int d\Omega\, Y_l^{0\,*}\hat H^{(1)}Y_1^{\pm1} \propto \int_0^{2\pi}e^{\pm i\varphi}d\varphi = 0. \tag{6.32}$$


For the diagonal matrix elements $H_{33}$ and $H_{44}$, corresponding to l = 1 and m = ±1, the azimuthal-angle integrals do not vanish, but since the corresponding spherical harmonics depend on the polar angle as sin θ, these elements are proportional to

$$\int d\Omega\, Y_1^{\pm1\,*}\hat H^{(1)}Y_1^{\pm1} \propto \int_0^\pi\sin\theta\, d\theta\,\sin\theta\cos\theta\sin\theta = \int_{-1}^{+1}\cos\theta\left(1 - \cos^2\theta\right)d(\cos\theta), \tag{6.33}$$


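and hence are equal to zero, as any limit-symmetric integral of an odd function. Finally, for the states 2s and 2p with m = 0, the diagonal elements $H_{11}$ and $H_{22}$ are also killed by the polar-angle integration:

$$\int d\Omega\, Y_0^{0\,*}\hat H^{(1)}Y_0^0 \propto \int_0^\pi\sin\theta\, d\theta\cos\theta = \int_{-1}^{+1}\cos\theta\, d(\cos\theta) = 0, \tag{6.34}$$

$$\int d\Omega\, Y_1^{0\,*}\hat H^{(1)}Y_1^0 \propto \int_0^\pi\sin\theta\, d\theta\cos^3\theta = \int_{-1}^{+1}\cos^3\theta\, d(\cos\theta) = 0. \tag{6.35}$$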


Hence, the only non-zero elements of the matrix (30) are two off-diagonal elements $H_{12}$ and $H_{21}$, which relate two states with the same m = 0, but different l = {0, 1}, because they are proportional to

$$\int d\Omega\, Y_0^{0\,*}\cos\theta\, Y_1^0 = \frac{\sqrt3}{4\pi}\int_0^{2\pi}d\varphi\int_0^\pi\sin\theta\cos^2\theta\, d\theta = \frac{1}{\sqrt3} \neq 0. \tag{6.36}$$


What remains is to use Eqs. (3.209) for the radial parts of these functions to complete the calculation of those two matrix elements:

$$H_{12} = H_{21} = -\frac{q\mathscr E}{\sqrt3}\int_0^\infty\mathcal R_{2,0}(r)\,\mathcal R_{2,1}(r)\,r^3dr. \tag{6.37}$$

Due to the additive structure of the function $\mathcal R_{2,0}(r)$, the integral falls into a sum of two table integrals, both of the type MA Eq. (6.7d), finally giving

$$H_{12} = H_{21} = 3q\mathscr E r_0, \tag{6.38}$$
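The two numbers entering Eqs. (36)-(38) can be checked in a few lines. The sketch below assumes that Eqs. (3.209) are the standard hydrogen-like radial functions, which in units with $r_0 = 1$ read $\mathcal R_{2,0} = 2^{-1/2}(1 - r/2)e^{-r/2}$ and $\mathcal R_{2,1} = 24^{-1/2}\,r\,e^{-r/2}$. It evaluates the angular integral (36) numerically. The radial integral is evaluated exactly, through the table integral $\int_0^\infty x^ne^{-x}dx = n!$.

```python
import math

# Angular factor, Eq. (36): Y_0^0 = (4 pi)^(-1/2), Y_1^0 = (3 / 4 pi)^(1/2) cos(theta).
# The phi-integral gives 2 pi; the rest is integrated over u = cos(theta) by the midpoint rule.
M = 2000
du = 2.0 / M
angular = 2 * math.pi * sum(
    math.sqrt(1 / (4 * math.pi)) * u * math.sqrt(3 / (4 * math.pi)) * u * du
    for u in (-1.0 + (k + 0.5) * du for k in range(M))
)

# Radial integral of R_20(r) R_21(r) r^3 (units r0 = 1):
# R_20 R_21 r^3 = 48^(-1/2) (r^4 - r^5 / 2) exp(-r), and the integral of r^n exp(-r) is n!
radial = (math.factorial(4) - math.factorial(5) / 2) / math.sqrt(48)

# Eq. (37): H12 = -q E (angular factor)(radial integral); in units of q E r0:
h12 = -angular * radial
print(angular, 1 / math.sqrt(3), h12)
```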
where $r_0$ is the spatial scale (3.192); for the hydrogen atom, it is just the Bohr radius $r_B$ (see Eq. (1.10)). Thus, the perturbation matrix (30) is reduced to

$$H^{(1)} = \begin{pmatrix}0 & 3q\mathscr E r_0 & 0 & 0\\ 3q\mathscr E r_0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{pmatrix}, \tag{6.39}$$

so that the condition (26) of self-consistency of the system (25),

$$\begin{vmatrix}-E^{(1)} & 3q\mathscr E r_0 & 0 & 0\\ 3q\mathscr E r_0 & -E^{(1)} & 0 & 0\\ 0 & 0 & -E^{(1)} & 0\\ 0 & 0 & 0 & -E^{(1)}\end{vmatrix} = 0, \tag{6.40}$$

gives a very simple characteristic equation

$$\left(E^{(1)}\right)^2\left[\left(E^{(1)}\right)^2 - \left(3q\mathscr E r_0\right)^2\right] = 0, \tag{6.41}$$

with four roots:

$$E^{(1)}_{1,2} = \pm3q\mathscr E r_0, \qquad E^{(1)}_{3,4} = 0. \tag{6.42}$$

[Fig. 6.3. The linear Stark effect for the level n = 2 of a hydrogen-like atom.]

Generally, in order to understand the nature of the states corresponding to these levels, we should return to Eq. (25) with each calculated value of $E_n^{(1)}$, and find the corresponding expansion coefficients $\langle n''^{(0)}|n\rangle$ that describe the perturbed states. However, in our simple case, the outcome of this procedure is clear in advance. Indeed, since the states with {l = 1, m = ±1} are not affected by the perturbation at all (in the linear approximation in the electric field), their degeneracy is not lifted, and their energy is not affected (see the middle line in Fig. 3). On the other hand, the partial perturbation matrix connecting the states 2s and 2p, i.e. the top left 2×2 part of the full matrix (39), is proportional to the Pauli matrix $\sigma_x$, and we already know the result of its diagonalization (see Eqs. (4.113)-(4.114)). This means that the


⁷ The proportionality of this splitting to the small field is responsible for the qualifier "linear" in the name of this effect. If observable effects grow only as $\mathscr E^2$ (see, e.g., Problem 9), the term quadratic Stark effect is used instead.
upper and lower split levels correspond to very simple linear combinations of the previously degenerate states with m = 0,

$$|\pm\rangle = \frac{1}{\sqrt2}\left(|2s\rangle \pm |2p\rangle\right). \tag{6.43}$$
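It is easy to confirm numerically that the states (43), together with the untouched m = ±1 states, are eigenvectors of the matrix (39) with the eigenvalues (42). In the sketch below, energies are in units of $3q\mathscr E r_0$. The basis order is the one used in Eq. (30): 2s, then 2p with m = 0, then 2p with m = +1, then 2p with m = −1.

```python
import math

H = [[0, 1, 0, 0],          # matrix (39), in units of 3 q E r0
     [1, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

def apply(matrix, vec):
    """Matrix-vector product for plain Python lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vec)) for row in matrix]

s = 1 / math.sqrt(2)
candidates = {                                  # state: (components, expected eigenvalue)
    "(|2s> + |2p>)/sqrt2": ([s, s, 0, 0], +1),
    "(|2s> - |2p>)/sqrt2": ([s, -s, 0, 0], -1),
    "2p, m = +1": ([0, 0, 1, 0], 0),
    "2p, m = -1": ([0, 0, 0, 1], 0),
}
residuals = {}
for name, (vec, eigenvalue) in candidates.items():
    hv = apply(H, vec)
    residuals[name] = max(abs(a - eigenvalue * b) for a, b in zip(hv, vec))
    print(name, residuals[name])
```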


Finally, let us estimate the magnitude of the linear Stark effect for a hydrogen atom. For a very high electric field of $\mathscr E = 3\times10^6$ V/m,⁸ $|q| = e \approx 1.6\times10^{-19}$ C, and $r_0 = r_B \approx 0.5\times10^{-10}$ m, we get a level splitting of $3q\mathscr E r_0 \approx 0.8\times10^{-22}$ J ≈ 0.5 meV. This number is much lower than the unperturbed energy of the level, $E_2 = -E_H/(2\times2^2) \approx -3.4$ eV, so that the perturbative result is quite applicable. On the other
hand, the calculated splitting is much larger than the resolution limit imposed by the line's natural width ($\sim10^{-7}E_H$, see Chapter 9), so that the effect is quite observable even in substantially lower electric fields. Note, however, that our simple results are quantitatively correct only when the Stark splitting (42) is much larger than the fine-structure splitting of the same level in the absence of the field (see the next section).
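The estimate above is simple arithmetic; here it is spelled out so that the reader may vary the field. The constants below are CODATA-rounded values of e, the Bohr radius, and the Hartree energy $E_H \approx 27.2$ eV. They are our inputs rather than numbers quoted in the text.

```python
E_CHARGE = 1.602176634e-19     # elementary charge, C
BOHR_RADIUS = 0.529177e-10     # r_B, m
HARTREE_EV = 27.211386         # E_H, eV

field = 3e6                                          # V/m, the value used in the text
shift_J = 3 * E_CHARGE * field * BOHR_RADIUS         # 3 q E r0, with |q| = e and r0 = r_B
shift_meV = shift_J / E_CHARGE * 1e3
E2_eV = -HARTREE_EV / (2 * 2**2)                     # unperturbed n = 2 level
print(f"3qEr0 = {shift_J:.2e} J = {shift_meV:.2f} meV;  E_2 = {E2_eV:.2f} eV")
```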


6.3. Fine structure of atomic levels 


Now let us use the same perturbation theory to analyze, also for the simplest case of a hydrogen-like atom/ion, the so-called fine structure of atomic levels, i.e. their degeneracy lifting even in the absence of external fields. Since the effective speed v of the electron motion in atoms is much smaller than the speed of light c, the fine structure may be analyzed as a sum of two independent relativistic effects. To analyze the first of them, let us expand the well-known classical relativistic expression⁹ for the kinetic energy $T \equiv E - 𝓂c^2$ of a free particle with the rest mass 𝓂,¹⁰

$$T = \left(𝓂^2c^4 + p^2c^2\right)^{1/2} - 𝓂c^2 \equiv 𝓂c^2\left[\left(1 + \frac{p^2}{𝓂^2c^2}\right)^{1/2} - 1\right], \tag{6.44}$$


into the Taylor series with respect to the small ratio $(p/𝓂c)^2 \approx (v/c)^2$:

$$T = 𝓂c^2\left[1 + \frac12\left(\frac{p}{𝓂c}\right)^2 - \frac18\left(\frac{p}{𝓂c}\right)^4 + \ldots - 1\right] \equiv \frac{p^2}{2𝓂} - \frac{p^4}{8𝓂^3c^2} + \ldots, \tag{6.45}$$

and drop all the terms besides the two spelled-out terms. Of them, the first term is non-relativistic, while the second one represents the first relativistic correction to T.
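Here is a quick numerical check of the truncated expansion (45). The sketch below uses illustrative units with 𝓂 = c = 1. It compares the exact kinetic energy (44), rewritten in a form that avoids round-off cancellation, with the two retained terms. The residual should scale as the first dropped term, $+p^6/16𝓂^5c^4$.

```python
import math

def kinetic_exact(p, m=1.0, c=1.0):
    """Eq. (44), rewritten as p^2 c^2 / (sqrt(m^2 c^4 + p^2 c^2) + m c^2) to avoid cancellation."""
    return (p * c) ** 2 / (math.sqrt((m * c**2) ** 2 + (p * c) ** 2) + m * c**2)

def kinetic_two_terms(p, m=1.0, c=1.0):
    """The two spelled-out terms of Eq. (45)."""
    return p**2 / (2 * m) - p**4 / (8 * m**3 * c**2)

for p in (0.2, 0.1, 0.05):
    residual = kinetic_exact(p) - kinetic_two_terms(p)
    print(p, residual, p**6 / 16)
```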


Following the correspondence principle, the quantum-mechanical problem in this approximation may be described by the perturbative Hamiltonian (1), with the unperturbed part (whose eigenstates and eigenenergies were discussed in Sec. 3.5)

$$\hat H^{(0)} = \frac{\hat p^2}{2𝓂} + U(r), \qquad U(r) = -\frac{Ze^2}{4\pi\varepsilon_0r} \equiv -\frac{C}{r}, \tag{6.46}$$

and the kinetic-relativistic perturbation


8 This value approximately corresponds to the threshold of electric breakdown in air at ambient conditions, due to 
the impact ionization. As a result, experiments with higher dc fields are rather difficult. 

9 See, e.g., EM Eq. (9.78) — or any undergraduate text on special relativity. 

¹⁰ This fancy font is used, as in Secs. 3.5-3.8, to distinguish the mass 𝓂 from the magnetic quantum number m.
$$\hat H^{(1)}_{\rm kin} = -\frac{\hat p^4}{8𝓂^3c^2} \equiv -\frac{1}{2𝓂c^2}\left(\frac{\hat p^2}{2𝓂}\right)^2. \tag{6.47}$$

Using Eq. (46), we may rewrite the last formula as

$$\hat H^{(1)}_{\rm kin} = -\frac{1}{2𝓂c^2}\left(\hat H^{(0)} - U(r)\right)^2, \tag{6.48}$$
so that its matrix elements participating in the characteristic equation (25) for a given degenerate energy level (3.201), i.e. a given principal quantum number n, are

$$\langle nlm|\hat H^{(1)}_{\rm kin}|nl'm'\rangle = -\frac{1}{2𝓂c^2}\langle nlm|\left(\hat H^{(0)} - U(r)\right)\left(\hat H^{(0)} - U(r)\right)|nl'm'\rangle, \tag{6.49}$$

where the bra- and ket-vectors describe the unperturbed eigenstates, whose eigenfunctions (in the coordinate representation) are given by Eq. (3.200): $\psi_{nlm} = \mathcal R_{n,l}(r)Y_l^m(\theta, \varphi)$.
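It is straightforward (and hence left for the reader :-) to prove that all off-diagonal elements of the set (49) are equal to zero. Thus we may use Eq. (27) for each set of the quantum numbers $\{n, l, m\}$:

$$\begin{aligned}
E^{(1)}_{\text{kin}} &\equiv E_{n,l,m} - E_n = \langle nlm|\hat H^{(1)}|nlm\rangle = -\frac{1}{2\mathsf mc^2}\left\langle\left(\hat H^{(0)} - U\right)^2\right\rangle_{n,l} \\
&= -\frac{1}{2\mathsf mc^2}\left(E_n^2 - 2E_n\langle U\rangle_{n,l} + \langle U^2\rangle_{n,l}\right) = -\frac{1}{2\mathsf mc^2}\left(E_n^2 + 2E_nC\left\langle\frac{1}{r}\right\rangle_{n,l} + C^2\left\langle\frac{1}{r^2}\right\rangle_{n,l}\right),
\end{aligned} \tag{6.50}$$

where the index $m$ has been dropped, because the radial wavefunctions $\mathcal R_{n,l}(r)$, which affect these expectation values, do not depend on that quantum number. Now using Eqs. (3.191), (3.201), and the first two of Eqs. (3.211), we finally get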
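$$E^{(1)}_{\text{kin}} = -\frac{\mathsf mC^4}{2\hbar^4c^2n^4}\left(\frac{n}{l + 1/2} - \frac{3}{4}\right) \equiv -\frac{2E_n^2}{\mathsf mc^2}\left(\frac{n}{l + 1/2} - \frac{3}{4}\right). \tag{6.51}$$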


Let us discuss this result. First of all, its last form confirms that the correction (51) is indeed much smaller than the unperturbed energy $E_n$ (and hence the perturbation theory is valid) if the latter is much smaller than the relativistic rest energy $\mathsf mc^2$ of the particle — as it is for the hydrogen atom. Next, since in the Bohr problem's solution $n \geq l + 1$, the first fraction in the parentheses of Eq. (51) is always larger than 1, and hence than 3/4, so that the kinetic-relativistic correction to energy is negative for all $n$ and $l$. (Actually, this fact could be predicted already from Eq. (47), which shows that the perturbation's Hamiltonian is a negative-definite form.) Finally, for a fixed principal number $n$, the negative correction's magnitude decreases with the growth of $l$. This fact may be interpreted using the second of Eqs. (3.211): the larger is $l$ (at fixed $n$), the larger is the particle's effective distance from the center, and hence the smaller is its effective velocity, i.e. the smaller is the magnitude of the quantum-mechanical average of the negative relativistic correction (47) to the kinetic energy.
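The two qualitative conclusions drawn from Eq. (51), the negative sign of the correction and the decrease of its magnitude with $l$ at fixed $n$, can be checked in exact rational arithmetic. The sketch below (Python, standard-library `fractions`; the function name is chosen here for illustration) evaluates the parenthesized factor $n/(l + 1/2) - 3/4$ for all allowed $l$.

```python
from fractions import Fraction

def kinetic_bracket(n, l):
    """Factor (n/(l + 1/2) - 3/4) of Eq. (6.51); the correction itself is
    E_kin = -(2 E_n^2 / m c^2) * kinetic_bracket(n, l)."""
    return Fraction(n) / (l + Fraction(1, 2)) - Fraction(3, 4)

for n in range(1, 7):
    brackets = [kinetic_bracket(n, l) for l in range(n)]   # l = 0, ..., n - 1
    # every factor is positive, so every correction is negative
    assert all(b > 0 for b in brackets)
    # the factor decreases with l, so the correction's magnitude does too
    assert all(b1 > b2 for b1, b2 in zip(brackets, brackets[1:]))
    print(n, [str(b) for b in brackets])
```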


The result (51) is valid for the Coulomb interaction $U(r) = -C/r$ of any physical nature. However, if we speak specifically about hydrogen-like atoms/ions, there is also another relativistic correction to energy, due to the so-called spin-orbit interaction (alternatively called the “spin-orbit coupling”). Its physics may be understood from the following semi-quantitative classical reasoning: from the “point of view” of an electron rotating about the nucleus at distance $r$ with velocity $v$, it is the nucleus, of the electric charge $Ze$, that rotates about the electron with the velocity $(-v)$ and hence the time period $\mathcal T = 2\pi r/v$. From the point of view of magnetostatics, such circular motion of the electric charge $Q = Ze$ is equivalent to a circular dc electric current $I = Q/\mathcal T = (Ze)(v/2\pi r)$, which creates, at the electron's location, i.e. in the center of the current loop, the magnetic field with the following magnitude:¹¹


$$\mathcal B = \frac{\mu_0 I}{2r} = \frac{\mu_0}{2r}\,\frac{Zev}{2\pi r} \equiv \frac{\mu_0 Zev}{4\pi r^2}. \tag{6.52}$$


The field's direction $\mathbf n$ is perpendicular to the apparent plane of the nucleus' rotation (i.e. that of the real rotation of the electron), and hence its vector may be readily expressed via the similarly directed vector $\mathbf L = m_evr\mathbf n$ of the electron's angular (orbital) momentum:

$$\mathbf B = \frac{\mu_0 Zev}{4\pi r^2}\mathbf n = \frac{\mu_0 Ze}{4\pi r^3m_e}\mathbf L \equiv \frac{Ze}{4\pi\varepsilon_0 r^3m_ec^2}\mathbf L, \tag{6.53}$$

where the last step used the basic relation between the SI-unit constants: $\mu_0 = 1/c^2\varepsilon_0$.


A more careful (but still classical) analysis of the problem¹² brings both good and bad news. The bad news is that the result (53) is wrong by the so-called Thomas factor of two even for the circular motion, because the electron moves with acceleration, and the reference frame bound to it is not inertial (as was implied in the above reasoning), so that the effective magnetic field felt by the electron is actually

$$\mathbf B = \frac{Ze}{8\pi\varepsilon_0 r^3m_ec^2}\mathbf L. \tag{6.54}$$


The good news is that this result is valid not only for circular but also for an arbitrary orbital motion in the Coulomb field $U(r)$. Hence from the discussion in Secs. 4.1 and 4.4 we may expect that the quantum-mechanical description of the interaction between this effective magnetic field and the electron's spin moment (4.115) is given by the following perturbation Hamiltonian:¹³

$$\hat H^{(1)}_{\text{so}} = -\hat{\mathbf m}\cdot\mathbf B = -\gamma_e\hat{\mathbf S}\cdot\mathbf B = \frac{e}{m_e}\,\frac{Ze}{8\pi\varepsilon_0 r^3m_ec^2}\,\hat{\mathbf S}\cdot\hat{\mathbf L} \equiv \frac{1}{2m_e^2c^2}\,\frac{Ze^2}{4\pi\varepsilon_0}\,\frac{1}{r^3}\,\hat{\mathbf S}\cdot\hat{\mathbf L}, \tag{6.55}$$

where, in spelling out the electron's gyromagnetic ratio $\gamma_e = -g_ee/2m_e$, the small correction to the value $g_e = 2$ of the electron's g-factor (see Sec. 4.4) is ignored, because Eq. (55) is already a small correction.
This expectation is confirmed by the fully relativistic Dirac theory, to be discussed in Sec. 9.7 below: it yields, for an arbitrary central potential $U(r)$, the following spin-orbit coupling Hamiltonian:

$$\hat H^{(1)}_{\text{so}} = \frac{1}{2m_e^2c^2}\,\frac{1}{r}\frac{dU(r)}{dr}\,\hat{\mathbf S}\cdot\hat{\mathbf L}. \tag{6.56}$$

For the Coulomb potential $U(r) = -Ze^2/4\pi\varepsilon_0 r$, this formula reduces to Eq. (55).
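To get a feel for the magnitude of the effective field (54), one may plug in, purely for orientation, $Z = 1$, $r$ equal to the Bohr radius, and $L = \hbar$. These choices are a crude order-of-magnitude assumption, not an expectation value for any particular state. The constants are CODATA 2018 values, hard-coded to keep the sketch self-contained.

```python
import math

# CODATA 2018 recommended values, SI units (hard-coded to stay self-contained)
E_CHARGE = 1.602176634e-19     # elementary charge, C
HBAR = 1.054571817e-34         # reduced Planck constant, J s
C_LIGHT = 299792458.0          # speed of light, m/s
EPS0 = 8.8541878128e-12        # vacuum permittivity, F/m
M_E = 9.1093837015e-31         # electron mass, kg
A_BOHR = 5.29177210903e-11     # Bohr radius, m

def effective_field(Z, r, L):
    """Magnitude of the effective field of Eq. (6.54), Thomas factor included:
    B = Z e L / (8 pi eps0 r^3 m_e c^2), in tesla."""
    return Z * E_CHARGE * L / (8 * math.pi * EPS0 * r**3 * M_E * C_LIGHT**2)

# Order-of-magnitude choice (assumption): Z = 1, r = Bohr radius, L = hbar
B = effective_field(Z=1, r=A_BOHR, L=HBAR)
print(f"B ~ {B:.1f} T")
```

With these inputs the field comes out at a few tesla, i.e. comparable to strong laboratory magnets.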


¹¹ See, e.g., EM Sec. 5.1, in particular, Eq. (5.24). Note that such an effective magnetic field is induced by any motion of electrons, in particular that in solids, leading to a variety of spin-orbit effects there — see, e.g., a concise review by R. Winkler et al., in B. Kramer (ed.), Advances in Solid State Physics 41, 211 (2001).

¹² It was carried out first by Llewellyn Thomas in 1926; for a simple review see, e.g., R. Haar and L. Curtis, Am. J. Phys. 55, 1044 (1987).

¹³ In Gaussian units, Eq. (55) is valid without the factor $4\pi\varepsilon_0$ in the denominator, while Eq. (56) is valid as is.




As we already know from the discussion in Sec. 5.7, the angular factor of this Hamiltonian commutes with all the operators of the coupled-representation group (inside the blue line in Fig. 5.12): $\hat L^2$, $\hat S^2$, $\hat J^2$, and $\hat J_z$, and hence is diagonal in the coupled-representation basis with definite quantum numbers $l$, $j$, and $m_j$ (and of course $s = 1/2$). Hence, using Eq. (5.181) to rewrite Eq. (56) as

$$\hat H^{(1)}_{\text{so}} = \frac{1}{2m_e^2c^2}\,\frac{Ze^2}{4\pi\varepsilon_0}\,\frac{1}{r^3}\,\frac{1}{2}\left(\hat J^2 - \hat L^2 - \hat S^2\right), \tag{6.57}$$


we may again use Eq. (27) for each set $\{s, l, j, m_j\}$, with common $n$:

$$E^{(1)}_{\text{so}} = \frac{1}{4m_e^2c^2}\,\frac{Ze^2}{4\pi\varepsilon_0}\left\langle\frac{1}{r^3}\right\rangle_{n,l}\left\langle\hat J^2 - \hat L^2 - \hat S^2\right\rangle_{j,l,s}, \tag{6.58}$$

where the indices irrelevant for each particular factor have been dropped. Now using the last of Eqs. (3.211), and similar expressions (5.169), (5.175), and (5.177) for eigenvalues of the involved operators, we get an explicit expression for the spin-orbit correction:¹⁴


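$$E^{(1)}_{\text{so}} = \frac{m_eC^4}{4\hbar^4c^2}\,\frac{j(j+1) - l(l+1) - 3/4}{n^3\,l\,(l + 1/2)(l + 1)} \equiv \frac{E_n^2}{m_ec^2}\,\frac{n\left[j(j+1) - l(l+1) - 3/4\right]}{l\,(l + 1/2)(l + 1)}, \tag{6.59}$$

with $l$ and $j$ related by Eq. (5.189): $j = l \pm 1/2$. The last form of this result shows clearly that this correction has the same scale as the kinetic correction (51).¹⁵ In the 1st order of the perturbation theory, they may simply be added (with $\mathsf m = m_e$), giving a surprisingly simple formula for the net fine structure of the $n$th energy level: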
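$$E^{(1)}_{\text{fine}} = E^{(1)}_{\text{kin}} + E^{(1)}_{\text{so}} = -\frac{2E_n^2}{m_ec^2}\left(\frac{n}{j + 1/2} - \frac{3}{4}\right). \tag{6.60}$$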


This simplicity, as well as the independence of the result of the orbital quantum number $l$, will become less surprising when (in Sec. 9.7) we see that this formula follows in one shot from the Dirac theory, in which the Bohr atom's energy spectrum is numbered only with $n$ and $j$, but not $l$. Let us recall that for an electron ($s = 1/2$), according to Eq. (5.189) with $0 \leq l \leq n - 1$, the quantum number $j$ may take $n$ positive half-integer values, from $1/2$ to $n - 1/2$. Hence, Eq. (60) shows that the fine structure of the $n$th Bohr energy level has $n$ sub-levels — see Fig. 4.
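The statement that the sum of the kinetic correction (51) and the spin-orbit correction depends only on $n$ and $j$, namely equals $-(2E_n^2/m_ec^2)[n/(j + 1/2) - 3/4]$, can be verified in exact rational arithmetic. In the sketch below, energies are in units of $E_n^2/m_ec^2$, and the spin-orbit term is taken as $n[j(j+1) - l(l+1) - 3/4]/[l(l + 1/2)(l + 1)]$, which is what Eq. (58) gives after substituting the eigenvalues and $\langle r^{-3}\rangle_{n,l} = 1/[n^3l(l + 1/2)(l + 1)r_0^3]$. The loop is restricted to $l \geq 1$, because at $l = 0$ that expression is the indeterminate form discussed in footnote 14.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def e_kin(n, l):
    """Kinetic correction, Eq. (6.51), in units of E_n^2 / m_e c^2."""
    return -2 * (Fraction(n) / (l + HALF) - Fraction(3, 4))

def e_so(n, l, j):
    """Spin-orbit correction in units of E_n^2 / m_e c^2 (valid for l >= 1):
    n [j(j+1) - l(l+1) - 3/4] / [l (l + 1/2) (l + 1)]."""
    return n * (j * (j + 1) - l * (l + 1) - Fraction(3, 4)) / (l * (l + HALF) * (l + 1))

def e_fine(n, j):
    """Net fine-structure correction -2 [n/(j + 1/2) - 3/4], same units."""
    return -2 * (Fraction(n) / (j + HALF) - Fraction(3, 4))

for n in range(2, 8):
    for l in range(1, n):                 # l = 0 is the special case of footnote 14
        for j in (l - HALF, l + HALF):    # Eq. (5.189): j = l -/+ 1/2
            assert e_kin(n, l) + e_so(n, l, j) == e_fine(n, j), (n, l, j)
print("kinetic + spin-orbit corrections depend only on n and j")
```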


Please note that according to Eq. (5.175), each of these sub-levels is still $(2j + 1)$-times degenerate in the quantum number $m_j$. This degeneracy is very natural, because in the absence of an external field the system is still isotropic. Moreover, on each fine-structure level (besides the highest one, with $j = n - 1/2$), each of the $m_j$-states is doubly degenerate in the orbital quantum number $l = j \pm 1/2$ — see the labels of $l$ in Fig. 4. (According to Eq. (5.190), each of these states, with fixed $j$ and $m_j$, may be represented as a linear combination of two states with adjacent values of $m_l$, and hence different electron spin orientations, $m_s = \pm 1/2$, weighted with the Clebsch-Gordan coefficients.)

¹⁴ The factor $l$ in the denominator does not give a divergence at $l = 0$, because in this case $j = s = 1/2$, so that $j(j + 1) = 3/4$, and the numerator turns into 0 as well. A careful analysis of this case (which may be found, e.g., in G. Woodgate, Elementary Atomic Structure, 2nd ed., Oxford, 1983), as well as the exact analysis of the hydrogen atom using the Dirac theory (see Sec. 9.7), shows that Eq. (60), which does not include $l$, is valid even in this case.

¹⁵ This is natural, because the magnetic interaction of charged particles is essentially a relativistic effect, of the same order ($\sim v^2/c^2$) as the kinetic correction (47) — see, e.g., EM Sec. 5.1, in particular Eq. (5.3).


[Figure: the unperturbed level $E_n$ split into sub-levels $j = 1/2, 3/2, \ldots, n - 1/2$, each labeled with its allowed values $l = j \pm 1/2$.]

Fig. 6.4. The fine structure of a hydrogen-like atom's level.


These details aside, one may crudely say that the relativistic corrections combined make the total eigenenergy grow with $l$, contributing to the effect already mentioned in our analysis of the periodic table of elements in Sec. 3.7. The relative scale of this increase may be characterized by the largest deviation from the unperturbed energy $E_n$, reached for s-states (with $l = 0$, $j = 1/2$):


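$$\frac{\left|E^{(1)}_{\text{fine}}\right|_{\max}}{|E_n|} = \frac{2|E_n|}{m_ec^2}\left(n - \frac{3}{4}\right) = \left(\frac{Ze^2}{4\pi\varepsilon_0\hbar c}\right)^2\left(\frac{1}{n} - \frac{3}{4n^2}\right) \equiv Z^2\alpha^2\left(\frac{1}{n} - \frac{3}{4n^2}\right), \tag{6.61}$$

where $\alpha$ is the fine-structure (“Sommerfeld's”) constant,

$$\alpha \equiv \frac{e^2}{4\pi\varepsilon_0\hbar c} \approx \frac{1}{137}, \tag{6.62}$$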


(already mentioned in Sec. 4.4), which characterizes the relative strength (or rather weakness :-) of the electromagnetic effects in quantum mechanics — which in particular makes the perturbative quantum electrodynamics possible.¹⁶ These expressions show that the fine-structure splitting is a very small effect ($\sim\alpha^2 \approx 5\times10^{-5}$) for the hydrogen atom, but it rapidly grows (as $Z^2$) with the nuclear charge (i.e. the atomic number) $Z$, and becomes rather substantial for the heaviest stable atoms with $Z \sim 10^2$.
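Eqs. (61) and (62) are easy to evaluate numerically. The sketch below computes $\alpha$ from hard-coded CODATA 2018 constants and the largest relative fine-structure shift of the ground level ($n = 1$). The $Z = 82$ line only illustrates the $Z^2$ trend: at $Z\alpha \approx 0.6$ the first-order perturbative result is no longer quantitatively reliable.

```python
import math

# CODATA 2018 recommended values, SI units
E_CHARGE = 1.602176634e-19     # C
HBAR = 1.054571817e-34         # J s
C_LIGHT = 299792458.0          # m/s
EPS0 = 8.8541878128e-12        # F/m

alpha = E_CHARGE**2 / (4 * math.pi * EPS0 * HBAR * C_LIGHT)   # Eq. (6.62)

def max_relative_shift(Z, n):
    """Eq. (6.61): |E_fine|_max / |E_n| = (Z alpha)^2 (1/n - 3/(4 n^2)), at l = 0, j = 1/2."""
    return (Z * alpha) ** 2 * (1 / n - 3 / (4 * n**2))

print(f"1/alpha = {1 / alpha:.3f}")
for Z in (1, 82):              # hydrogen; Z = 82 only to show the Z^2 trend
    print(f"Z = {Z}: {max_relative_shift(Z, n=1):.1e}")
```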


6.4. The Zeeman effect 


Now, we are ready to review the Zeeman effect — the atomic level splitting by an external magnetic field.¹⁷ Using Eq. (3.26), with $q = -e$, for the description of the electron's orbital motion in the field, and the Pauli Hamiltonian (4.163), with $\gamma = -e/m_e$, for the electron spin's interaction with the field, we see that even for a hydrogen-like (i.e. single-electron) atom/ion, neglecting the relativistic effects, the full Hamiltonian is rather involved:

$$\hat H = \frac{1}{2m_e}\left(\hat{\mathbf p} + e\hat{\mathbf A}\right)^2 - \frac{Ze^2}{4\pi\varepsilon_0 r} + \frac{e}{m_e}\hat{\mathbf S}\cdot\mathbf B. \tag{6.63}$$


¹⁶ The expression $\alpha^2 = E_H/m_ec^2$, where $E_H$ is the Hartree energy (1.13), i.e. the scale of energies $E_n$, is also very revealing.

¹⁷ It was discovered experimentally in 1896 by Pieter Zeeman who, amazingly, was fired from the University of Leiden for unauthorized use of lab equipment for this work — just to receive a Nobel Prize for it in a few years!




There are several simplifications we may make. First, let us assume that the external field is spatially uniform on the atomic scale (which is a very good approximation for most cases), so that we can take its vector potential in an axially-symmetric gauge — cf. Eq. (3.132):

$$\mathbf A = \frac{1}{2}\mathbf B\times\mathbf r. \tag{6.64}$$


Second, let us neglect the terms proportional to $A^2$, which are small in practical magnetic fields of the order of a few teslas.¹⁸ The remaining term in the effective kinetic energy, describing the interaction with the magnetic field, is linear in the momentum operator, so that we may repeat the standard classical calculation¹⁹ to reduce it to the product of $-\mathcal B$ by the orbital magnetic moment's component $m_z = -eL_z/2m_e$ (with the $z$-axis directed along the field), except that both $m_z$ and $L_z$ should be understood as operators now. As a result, the Hamiltonian (63) reduces to Eq. (1), $\hat H = \hat H^{(0)} + \hat H^{(1)}$, where $\hat H^{(0)}$ is that of the atom at $\mathcal B = 0$, and

$$\hat H^{(1)} = \frac{e\mathcal B}{2m_e}\left(\hat L_z + 2\hat S_z\right). \tag{6.65}$$


This expression immediately reveals the major complication with the Zeeman effect's analysis. Namely, in comparison with the equal orbital and spin contributions to the total angular momentum (5.170) of the electron, its spin produces twice as large a contribution to the magnetic moment, so that the right-hand side of Eq. (65) is not proportional to the total angular momentum's component $\hat J_z$. As a result, the effect's description is simple only in two limits.


If the magnetic field is so high that its effects are much stronger than the relativistic (fine-structure) effects discussed in the previous section, we may treat the two terms in Eq. (65) as independent perturbations of different (orbital and spin) degrees of freedom. Since each of the perturbation matrices is diagonal in its own z-basis, we can again use Eq. (27) to write

$$E^{(1)} = \frac{e\mathcal B}{2m_e}\left(\langle n,l,m_l|\hat L_z|n,l,m_l\rangle + 2\langle m_s|\hat S_z|m_s\rangle\right) = \frac{e\mathcal B}{2m_e}\left(\hbar m_l + 2\hbar m_s\right) \equiv \mu_B\mathcal B\,(m_l \pm 1). \tag{6.66}$$


This result describes the splitting of each $2\times(2l + 1)$-degenerate energy level, with certain $n$ and $l$, into $(2l + 3)$ levels for $l \geq 1$ (an s-level simply splits into two), see Fig. 5, with the adjacent level distance of $\mu_B\mathcal B$, of the order of $10^{-4}$ eV per tesla.


[Figure: the level $E^{(0)}_{n,l}$ split into equidistant sub-levels separated by $\mu_B\mathcal B$, each labeled by its pair(s) of quantum numbers $m_l$, $m_s$.]

Fig. 6.5. The Paschen-Back effect.


¹⁸ Despite its smallness, the quadratic term is necessary for a description of the negative contribution of the orbital motion to the magnetic susceptibility $\chi_m$ (the so-called orbital diamagnetism, see EM Sec. 5.5), whose analysis, using Eq. (63), is left for the reader's exercise.

¹⁹ See, e.g., EM Sec. 5.4, in particular Eqs. (5.95) and (5.100).
Note that all the levels, except the two top and the two bottom ones, remain doubly degenerate. This limit of the Zeeman effect is sometimes called the Paschen-Back effect — whose simplicity was recognized only in the 1920s, due to the need for very high magnetic fields for its observation.
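The level counting of the Paschen-Back limit (66) can be made explicit by enumerating all $2(2l + 1)$ states $(m_l, m_s)$ of a given $(n, l)$ level and grouping them by the shift $m_l + 2m_s$, in units of $\mu_B\mathcal B$. A minimal Python sketch (the helper name is chosen here for illustration):

```python
from collections import Counter

def paschen_back_levels(l):
    """Return {m_l + 2 m_s: degeneracy} for one (n, l) level, cf. Eq. (6.66).
    The energy shift of each entry is mu_B * B * (m_l + 2 m_s)."""
    shifts = Counter()
    for m_l in range(-l, l + 1):
        for two_m_s in (+1, -1):          # 2 m_s = +1 or -1 for s = 1/2
            shifts[m_l + two_m_s] += 1
    return dict(sorted(shifts.items()))

for l in range(1, 4):
    levels = paschen_back_levels(l)
    assert len(levels) == 2 * l + 3                 # number of sub-levels
    assert sum(levels.values()) == 2 * (2 * l + 1)  # all states accounted for
    print(l, levels)
```

For $l = 1$, for example, this gives five levels with degeneracies 1, 1, 2, 1, 1.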


In the opposite limit of relatively low magnetic fields, the Zeeman effect takes place on the background of the much larger fine-structure splitting. As was discussed in Sec. 3, at $\mathcal B = 0$ each split sub-level has a $2\times(2j + 1)$-fold degeneracy corresponding to $(2j + 1)$ different values of the half-integer quantum number $m_j$, ranging from $-j$ to $+j$, and two values of the integer $l = j \pm 1/2$ — see Fig. 4.²⁰ The magnetic field lifts this degeneracy. Indeed, in the coupled representation discussed in Sec. 5.7, the perturbation (65) is described by the matrix with elements

$$\begin{aligned}
H^{(1)}_{jm_j,j'm_j'} &= \frac{e\mathcal B}{2m_e}\langle j,m_j|\hat L_z + 2\hat S_z|j',m_j'\rangle = \frac{e\mathcal B}{2m_e}\langle j,m_j|\hat J_z + \hat S_z|j',m_j'\rangle \\
&= \frac{e\mathcal B}{2m_e}\left(\hbar m_j\delta_{m_jm_j'} + \langle j,m_j|\hat S_z|j',m_j'\rangle\right).
\end{aligned} \tag{6.67}$$


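To spell out the second term, let us use the general expansion (5.183) for the particular case $s = 1/2$, when (as was discussed at the end of Sec. 5.7) it has at most two non-vanishing terms, with the Clebsch-Gordan coefficients (5.190):

$$\left|j = l \pm \tfrac{1}{2}, m_j\right\rangle = \pm\left(\frac{l \pm m_j + 1/2}{2l + 1}\right)^{1/2}\left|m_l = m_j - \tfrac{1}{2}, m_s = +\tfrac{1}{2}\right\rangle + \left(\frac{l \mp m_j + 1/2}{2l + 1}\right)^{1/2}\left|m_l = m_j + \tfrac{1}{2}, m_s = -\tfrac{1}{2}\right\rangle. \tag{6.68}$$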


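Taking into account that the operator $\hat S_z$ gives non-zero brackets only for $m_s = m_s'$, its $2\times2$ matrix in the basis of the two states $|m_l = m_j \mp 1/2, m_s = \pm 1/2\rangle$ is diagonal, so we may use Eq. (27) to get

$$E^{(1)} = \frac{e\mathcal B}{2m_e}\left[\hbar m_j + \frac{\hbar}{2}\left(\frac{l \pm m_j + 1/2}{2l + 1} - \frac{l \mp m_j + 1/2}{2l + 1}\right)\right] \equiv \mu_B\mathcal Bm_j\left(1 \pm \frac{1}{2l + 1}\right), \quad \text{for } -j \leq m_j \leq +j, \tag{6.69}$$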


where the two signs correspond to the two possible values of $l = j \mp 1/2$ — see Fig. 6.
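The result (69) can be reproduced by brute force from the Clebsch-Gordan expansion (68): the probabilities of the two uncoupled states give $\langle\hat S_z\rangle$, and adding $\hbar m_j$ gives the shift in units of $\mu_B\mathcal B$. The sketch below does this in exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def cg_weights(l, j, m_j):
    """Squared Clebsch-Gordan coefficients of Eq. (6.68), s = 1/2: probabilities of
    |m_l = m_j - 1/2, m_s = +1/2> and |m_l = m_j + 1/2, m_s = -1/2> in |j, m_j>."""
    sign = 1 if j == l + HALF else -1          # upper sign for j = l + 1/2
    p_up = (l + sign * m_j + HALF) / (2 * l + 1)
    p_down = (l - sign * m_j + HALF) / (2 * l + 1)
    return p_up, p_down

def zeeman_shift(l, j, m_j):
    """Low-field shift E^(1) / (mu_B B) = m_j + <S_z>/hbar, following Eq. (6.67)."""
    p_up, p_down = cg_weights(l, j, m_j)
    return m_j + HALF * (p_up - p_down)

for l in range(0, 4):
    for j in (l + HALF, l - HALF):
        if j < 0:
            continue                           # j = -1/2 does not exist for l = 0
        sign = 1 if j == l + HALF else -1
        for k in range(int(2 * j) + 1):
            m_j = -j + k
            p_up, p_down = cg_weights(l, j, m_j)
            assert p_up + p_down == 1 and p_up >= 0 and p_down >= 0
            # Eq. (6.69): shift = m_j (1 +/- 1/(2l + 1))
            assert zeeman_shift(l, j, m_j) == m_j * (1 + Fraction(sign, 2 * l + 1))
print("Eq. (6.69) reproduced for l = 0..3")
```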


[Figure: the fine-structure level $E_{n,j}$ splits into two fans of $2j + 1$ equidistant levels labeled by $m_j = \pm 1/2, \pm 3/2, \ldots$, one for $l = j - 1/2$ and one for $l = j + 1/2$, with different spacings.]

Fig. 6.6. The anomalous Zeeman effect in a hydrogen-like atom/ion.


²⁰ In almost-hydrogen-like but more complex atoms (such as those of alkali metals), the degeneracy in $l$ may be lifted by the electron-electron Coulomb interaction even in the absence of an external magnetic field.
We see that the magnetic field splits each sub-level of the fine structure, with a given $l$, into $2j + 1$ equidistant levels, with the distance between the levels depending on $l$. In the late 1890s, when the Zeeman effect was first observed, there was no notion of spin at all, so that this puzzling result was called the anomalous Zeeman effect. (In this terminology, the normal Zeeman effect is the one with no spin splitting, i.e. without the second terms in the parentheses of Eqs. (66), (67), and (69); it was first observed in 1898 by Thomas Preston in atoms with zero net spin.)

The strict quantum-mechanical analysis of the anomalous Zeeman effect for arbitrary s (which is 
important for applications to multi-electron atoms) is conceptually not complex, but requires explicit 
expressions for the corresponding Clebsch-Gordan coefficients, which are rather bulky. Let me just cite 
the unexpectedly simple result of this analysis: 


$$\Delta E = \mu_B\mathcal Bm_jg, \tag{6.70a}$$

where $g$ is the so-called Landé factor:²¹

$$g = 1 + \frac{j(j+1) + s(s+1) - l(l+1)}{2j(j+1)}. \tag{6.70b}$$

For $s = 1/2$ (and hence $j = l \pm 1/2$), this factor is reduced to the parentheses in the last form of Eq. (69).
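The reduction of the Landé factor (70b) to the parentheses of Eq. (69) for $s = 1/2$ is easy to confirm in exact arithmetic. The function below simply transcribes Eq. (70b) (it is undefined for $j = 0$); it accepts integers, `Fraction` objects, or strings such as `"3/2"`.

```python
from fractions import Fraction

def lande_g(j, l, s):
    """Lande factor of Eq. (6.70b), with the electron g-factor taken as exactly 2.
    Undefined for j = 0 (division by zero)."""
    j, l, s = Fraction(j), Fraction(l), Fraction(s)
    return 1 + (j * (j + 1) + s * (s + 1) - l * (l + 1)) / (2 * j * (j + 1))

# For s = 1/2 and j = l +/- 1/2, g should equal 1 +/- 1/(2l + 1), cf. Eq. (6.69).
for l in range(1, 6):
    assert lande_g(Fraction(2 * l + 1, 2), l, "1/2") == 1 + Fraction(1, 2 * l + 1)
    assert lande_g(Fraction(2 * l - 1, 2), l, "1/2") == 1 - Fraction(1, 2 * l + 1)
print(lande_g("1/2", 0, "1/2"), lande_g("3/2", 1, "1/2"), lande_g("1/2", 1, "1/2"))
```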


It is remarkable that Eqs. (70) may be readily derived using very plausible classical arguments, similar to those used in Sec. 5.7 — see Fig. 5.13 and its discussion. As was discussed in Sec. 5.6, in the absence of spin, the quantization of the observable $L_z$ is an extension of the classical picture of the torque-induced precession of the vector $\mathbf L$ about the magnetic field's direction, so that the interaction energy, proportional to $\mathcal BL_z = \mathbf B\cdot\mathbf L$, remains constant — see Fig. 7a. On the other hand, in the presence of the spin-orbit interaction without an external magnetic field, the Hamiltonian function of the system includes the product $\mathbf S\cdot\mathbf L$, so that in the stationary state it has to be constant, together with $J^2$, $L^2$, and $S^2$. Hence, this system's classical image is a joint precession of the vectors $\mathbf S$ and $\mathbf L$ about the direction of the vector $\mathbf J = \mathbf L + \mathbf S$, in such a manner that the spin-orbit interaction energy, proportional to the product $\mathbf L\cdot\mathbf S$, remains constant (Fig. 7b). On this backdrop, the anomalous Zeeman effect in a relatively weak magnetic field $\mathbf B = \mathcal B\mathbf n_z$ corresponds to a much slower precession of the vector $\mathbf J$ about the $z$-axis, “dragging” with it the vectors $\mathbf L$ and $\mathbf S$, rapidly rotating around it.


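[Figure: (a) the vector $\mathbf L$ precessing about $\mathbf B$; (b) the vectors $\mathbf L$ and $\mathbf S$ precessing about $\mathbf J$, with $\mathbf L\cdot\mathbf S = \text{const}$.]

Fig. 6.7. Classical images of (a) the orbital angular momentum's quantization in a magnetic field, and (b) the fine-structure level splitting.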


²¹ This formula is frequently used with capital letters $J$, $S$, and $L$, which denote the quantum numbers of the atom as a whole.

This physical picture allows us to conjecture that what is important for the slow precession rate are only the vectors $\mathbf L$ and $\mathbf S$ averaged over the period of their much faster precession around vector $\mathbf J$ — in other words, only their components $\mathbf L_J$ and $\mathbf S_J$ along the vector $\mathbf J$. Classically, these components may be calculated as


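$$\mathbf L_J = \frac{\mathbf L\cdot\mathbf J}{J^2}\mathbf J, \qquad \mathbf S_J = \frac{\mathbf S\cdot\mathbf J}{J^2}\mathbf J. \tag{6.71}$$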


The scalar products participating in these expressions may be readily expressed via the squared lengths 
of the vectors, using the following geometric formulas: 


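$$S^2 = (\mathbf J - \mathbf L)^2 = J^2 + L^2 - 2\mathbf L\cdot\mathbf J, \qquad L^2 = (\mathbf J - \mathbf S)^2 = J^2 + S^2 - 2\mathbf J\cdot\mathbf S. \tag{6.72}$$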


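As a result, we get the following time average:

$$\begin{aligned}
\overline{L_z + 2S_z} &= (\mathbf L_J + 2\mathbf S_J)_z = \frac{J_z}{J^2}\left(\mathbf L\cdot\mathbf J + 2\mathbf S\cdot\mathbf J\right) \\
&= \frac{J_z}{J^2}\left(\frac{J^2 + L^2 - S^2}{2} + 2\,\frac{J^2 + S^2 - L^2}{2}\right) \equiv J_z\left(1 + \frac{J^2 + S^2 - L^2}{2J^2}\right).
\end{aligned} \tag{6.73}$$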


The last move is to smuggle in some quantum mechanics by using, instead of the vector lengths squared and the $z$-component $J_z$, their eigenvalues given by Eqs. (5.169), (5.175), and (5.177). As a result, we immediately arrive at the exact Eqs. (70). This coincidence encourages thinking about the quantum mechanics of angular momenta in the classical terms of torque-induced precession, which turns out to be very fruitful in some more complex problems of atomic and molecular physics.
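The purely classical step from Eq. (71) to the last form of Eq. (73) is an algebraic identity valid for any vectors $\mathbf L$ and $\mathbf S$. A quick randomized check in Python (an illustration, not a proof):

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def projection_factor(L, S):
    """(L.J + 2 S.J) / J^2 with J = L + S: the factor multiplying J_z in Eq. (6.73)."""
    J = [x + y for x, y in zip(L, S)]
    return (dot(L, J) + 2 * dot(S, J)) / dot(J, J)

def length_factor(L, S):
    """The same factor via vector lengths only, last form of Eq. (6.73)."""
    J = [x + y for x, y in zip(L, S)]
    J2, L2, S2 = dot(J, J), dot(L, L), dot(S, S)
    return 1 + (J2 + S2 - L2) / (2 * J2)

rng = random.Random(1)
checked = 0
for _ in range(1000):
    L = [rng.uniform(-1, 1) for _ in range(3)]
    S = [rng.uniform(-1, 1) for _ in range(3)]
    J = [x + y for x, y in zip(L, S)]
    if dot(J, J) < 1e-3:
        continue                      # skip the nearly singular case J ~ 0
    assert math.isclose(projection_factor(L, S), length_factor(L, S),
                        rel_tol=1e-9, abs_tol=1e-9)
    checked += 1
print(f"identity confirmed for {checked} random vector pairs")
```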


The high-field and low-field limits of the Zeeman effect, described respectively by Eqs. (66) and (69), are separated by a medium field range, in which the Zeeman splitting is of the order of the fine-structure splitting analyzed in Sec. 3. There is no time in this course for a quantitative analysis of this crossover.²²


6.5. Time-dependent perturbations 


Now let us proceed to the case when the perturbation $\hat H^{(1)}$ in Eq. (1) is a function of time, while $\hat H^{(0)}$ is time-independent. The adequate perturbative approach to this problem, and its results, depend critically on the relation between the characteristic frequency $\omega$ of the perturbation and the distance between the initial system's energy levels:

$$\hbar\omega \quad\leftrightarrow\quad |E_n - E_{n'}|. \tag{6.74}$$


In the case when all essential frequencies of a perturbation are very small in the sense of Eq. (74), we are dealing with the so-called adiabatic change of parameters, which may be treated essentially as a time-independent perturbation (see the previous sections of this chapter). The most interesting observation here is that the adiabatic perturbation does not allow any significant transfer of the system's probability from one eigenstate to another. For example, in the WKB limit of the orbital motion, the Bohr quantization rule and its Wilson-Sommerfeld modification (2.110) guarantee that the integral

$$\oint\mathbf p\cdot d\mathbf r, \tag{6.75}$$

taken along the particle's classical trajectory, is an adiabatic invariant, i.e. does not change at a slow change of the system's parameters. (It is curious that classical mechanics also guarantees the invariance of the integral (75), but its proof there²³ is much harder than the quantum-mechanical derivation of this fact, carried out in Sec. 2.4.) This is why even if the perturbation becomes large with time (while changing sufficiently slowly), we can expect the classification of eigenstates and eigenvalues to persist.

²² For a more complete discussion of the Stark, Zeeman, and fine-structure effects in atoms, I can recommend, for example, either the monograph by G. Woodgate cited above, or the one by I. Sobelman, Theory of Atomic Spectra, Alpha Science, 2006.


Let us proceed to the harder case when both sides of Eq. (74) are comparable, using for this discussion the Schrödinger picture of quantum dynamics, given by Eq. (4.158). Combining it with Eq. (1), we get the Schrödinger equation in the form

$$i\hbar\frac{d}{dt}|\alpha(t)\rangle = \left(\hat H^{(0)} + \hat H^{(1)}(t)\right)|\alpha(t)\rangle. \tag{6.76}$$


Very much in the spirit of our treatment of the time-independent case in Sec. 1, let us represent the time-dependent ket-vector of the system with its expansion,

$$|\alpha(t)\rangle = \sum_n|n\rangle\langle n|\alpha(t)\rangle, \tag{6.77}$$

over the full and orthonormal set of the unperturbed, stationary ket-vectors defined by the equation

$$\hat H^{(0)}|n\rangle = E_n|n\rangle. \tag{6.78}$$


(Note that these kets $|n\rangle$ are exactly what was called $|n^{(0)}\rangle$ in Sec. 1; we may afford a less bulky notation in this section, because only the lowest orders of the perturbation theory will be discussed.) Plugging the expansion (77), with $n$ replaced with $n'$, into both sides of Eq. (76), and then inner-multiplying both its sides by the bra-vector $\langle n|$ of another unperturbed (and hence time-independent) state of the system, we get the following set of linear, ordinary differential equations for the expansion coefficients:

$$i\hbar\frac{d}{dt}\langle n|\alpha(t)\rangle = E_n\langle n|\alpha(t)\rangle + \sum_{n'}H^{(1)}_{nn'}(t)\langle n'|\alpha(t)\rangle, \tag{6.79}$$


where the matrix elements of the perturbation, in the unperturbed state basis, defined similarly to Eq. (8), are now functions of time:

$$H^{(1)}_{nn'}(t) \equiv \langle n|\hat H^{(1)}(t)|n'\rangle. \tag{6.80}$$


The set of differential equations (79), which are still exact, may be useful for numerical calculations.²⁴ However, it has a certain technical inconvenience, which becomes clear if we consider its (evident) solution in the absence of perturbation:²⁵

$$\langle n|\alpha(t)\rangle = \langle n|\alpha(0)\rangle\exp\left\{-i\frac{E_n}{\hbar}t\right\}. \tag{6.81}$$

²³ See, e.g., CM Sec. 10.2.

²⁴ Even if the problem under analysis may be described by the wave-mechanics Schrödinger equation (1.25), a direct numerical integration of that partial differential equation is typically less convenient than that of the ordinary differential equations (79).

²⁵ This is of course just a more general form of Eq. (1.62) of the wave mechanics of time-independent systems.


We see that these solutions oscillate very fast, and their numerical modeling may represent a challenge even for the fastest computers. These spurious oscillations (whose frequency, in particular, depends on the energy reference level) may be partly tamed by looking for the general solution of Eqs. (79) in a form inspired by Eq. (81):

$$\langle n|\alpha(t)\rangle = a_n(t)\exp\left\{-i\frac{E_n}{\hbar}t\right\}. \tag{6.82}$$


Here $a_n(t)$ are new functions of time (essentially, the stationary states' probability amplitudes), which may be used, in particular, to calculate the time-dependent level occupancies, i.e. the probabilities $W_n$ to find the perturbed system on the corresponding energy levels of the unperturbed system:

$$W_n(t) = |\langle n|\alpha(t)\rangle|^2 = |a_n(t)|^2. \tag{6.83}$$


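Plugging Eq. (82) into Eq. (79), for these functions we readily get a slightly modified system of equations:

$$i\hbar\dot a_n = \sum_{n'}a_{n'}H^{(1)}_{nn'}(t)\,e^{i\omega_{nn'}t}, \tag{6.84}$$

where the factors $\omega_{nn'}$, defined by the relation

$$\hbar\omega_{nn'} \equiv E_n - E_{n'}, \tag{6.85}$$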


have the physical sense of frequencies of potential quantum transitions between the $n$th and $n'$th energy levels of the unperturbed system. (The conditions when such transitions indeed take place will be clear soon.) The advantages of Eq. (84) over Eq. (79), for both analytical and numerical calculations, are its independence of the energy reference, and the lower frequencies of oscillations of the right-hand-side terms, especially when the energy levels of interest are close to each other.²⁶
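As an illustration of Eq. (84) in numerical work, the following sketch integrates it with the classical fourth-order Runge-Kutta method for a few levels, in units with $\hbar = 1$. The function names and the test problem (two degenerate levels coupled by a constant, real matrix element $V$, for which $W_1(t) = \sin^2(Vt/\hbar)$ exactly) are assumptions chosen for the example, not part of the text's method.

```python
import cmath
import math

def amplitudes_rhs(t, a, energies, h1, hbar=1.0):
    """Eq. (6.84): da_n/dt = (1/(i hbar)) sum_n' a_n' H_nn'(t) exp(i w_nn' t),
    with hbar w_nn' = E_n - E_n' (Eq. 6.85); h1(t) returns the matrix H^(1)(t)."""
    H = h1(t)
    size = len(a)
    derivs = []
    for n in range(size):
        total = 0j
        for k in range(size):
            w_nk = (energies[n] - energies[k]) / hbar
            total += a[k] * H[n][k] * cmath.exp(1j * w_nk * t)
        derivs.append(total / (1j * hbar))
    return derivs

def integrate_amplitudes(a0, t_end, dt, energies, h1, hbar=1.0):
    """Integrate Eq. (6.84) from t = 0 to t_end with classical 4th-order Runge-Kutta."""
    a = list(a0)
    for i in range(int(round(t_end / dt))):
        t = i * dt
        k1 = amplitudes_rhs(t, a, energies, h1, hbar)
        k2 = amplitudes_rhs(t + dt / 2, [x + dt / 2 * k for x, k in zip(a, k1)], energies, h1, hbar)
        k3 = amplitudes_rhs(t + dt / 2, [x + dt / 2 * k for x, k in zip(a, k2)], energies, h1, hbar)
        k4 = amplitudes_rhs(t + dt, [x + dt * k for x, k in zip(a, k3)], energies, h1, hbar)
        a = [x + dt / 6 * (p + 2 * q + 2 * r + s)
             for x, p, q, r, s in zip(a, k1, k2, k3, k4)]
    return a

V = 0.05   # hypothetical constant coupling

def coupling(t):
    """Hypothetical time-independent perturbation matrix H^(1)_nn' (real, symmetric)."""
    return [[0.0, V], [V, 0.0]]

a_end = integrate_amplitudes([1 + 0j, 0j], t_end=20.0, dt=0.05,
                             energies=[0.0, 0.0], h1=coupling)
print(abs(a_end[1]) ** 2, math.sin(V * 20.0) ** 2)   # these two should agree closely
```

Because the right-hand side carries only the difference frequencies $\omega_{nn'}$, the step size is set by the level spacings of interest rather than by the absolute energies $E_n/\hbar$.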


In order to continue our analytical treatment, let us focus on a particular but very important problem of a sinusoidal perturbation turned on at some moment — which may be taken for $t = 0$:

$$\hat H^{(1)}(t) = \begin{cases} 0, & \text{for } t < 0, \\ \hat Ae^{-i\omega t} + \hat A^\dagger e^{+i\omega t}, & \text{for } t \geq 0, \end{cases} \tag{6.86}$$


where the perturbation amplitude operators $\hat A$ and $\hat A^\dagger$,²⁷ and hence their matrix elements,

$$\langle n|\hat A|n'\rangle \equiv A_{nn'}, \qquad \langle n|\hat A^\dagger|n'\rangle = A^*_{n'n}, \tag{6.87}$$

are time-independent after the turn-on moment. In this case, Eq. (84) yields

$$i\hbar\dot a_n = \sum_{n'}a_{n'}\left(A_{nn'}e^{i(\omega_{nn'} - \omega)t} + A^*_{n'n}e^{i(\omega_{nn'} + \omega)t}\right), \quad \text{for } t > 0. \tag{6.88}$$

²⁶ Note that the relation of Eq. (84) to the initial Eq. (79) is very close to the relation of the interaction picture of quantum dynamics, discussed at the end of Sec. 4.6, to its Schrödinger picture, with the perturbation Hamiltonian playing the role of the interaction one — compare Eqs. (1) and (4.206). Indeed, Eq. (84) could be readily obtained from the interaction picture, and I did not do this just to avoid using this heavy bra-ket artillery for our current (relatively) simple problem, and hence to keep its physics more transparent.

²⁷ The notation of the amplitude operators in Eq. (86) is justified by the fact that the perturbation Hamiltonian has to be self-adjoint (Hermitian), and hence each term on the right-hand side of that relation has to be a Hermitian conjugate of its counterpart, which is evidently true only if the amplitude operators are also the Hermitian conjugates of each other. Note, however, that each of these amplitude operators is generally not Hermitian.


This is, generally, still a nontrivial system of coupled differential equations; however, it allows simple and explicit solutions in two very important limits. First, let us assume that our system initially was definitely in one eigenstate $n'$ (usually, though not necessarily, in the ground state), and that the occupancies $W_n$ of all other levels stay very low all the time. (We will find the condition when the second assumption is valid a posteriori — from the solution.) With these assumptions,

$$a_{n'} = 1, \qquad |a_n| \ll 1, \quad \text{for } n \neq n', \tag{6.89}$$


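Eq. (88) may be readily integrated, giving

$$a_n = -\frac{A_{nn'}}{\hbar(\omega_{nn'} - \omega)}\left(e^{i(\omega_{nn'} - \omega)t} - 1\right) - \frac{A^*_{n'n}}{\hbar(\omega_{nn'} + \omega)}\left(e^{i(\omega_{nn'} + \omega)t} - 1\right), \quad \text{for } n \neq n'. \tag{6.90}$$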


This expression describes what is colloquially called the ac excitation of (other) energy levels. Qualitatively, it shows that the probability $W_n$ (83) of finding the system in each state (“on each energy level”) does not tend to any constant value but rather oscillates in time. It also shows that the ac-field-induced transfer of the system from one state to another has a clearly resonant character: the maximum occupancy $W_n$ of a level number $n \neq n'$ grows without bound when the corresponding detuning²⁸

$$\Delta_{nn'} \equiv \omega - \omega_{nn'}, \tag{6.91}$$


tends to zero. This conclusion is clearly unrealistic, and is an artifact of our initial assumption (89), which, according to Eq. (90), is satisfied only if²⁹

$$|A_{nn'}| \ll \hbar|\Delta_{nn'}|, \tag{6.92}$$

and hence does not allow a deeper analysis of the resonant excitation.


In order to overcome this limitation, we may perform the following trick — very similar to the one we used for the transfer to the degenerate case in Sec. 1. Let us assume that for a certain level $n$,

$$|\Delta_{nn'}| \ll |\omega + \omega_{nn'}|,\ |\omega - \omega_{n''n'}|, \quad \text{for all } n'' \neq n, n' \tag{6.93}$$

— the condition illustrated in Fig. 8. Then, according to Eq. (90), we may ignore the occupancy of all but two levels, $n$ and $n'$, and also the second, non-resonant term with frequency $\omega_{nn'} + \omega \approx 2\omega \gg |\Delta_{nn'}|$ in Eqs. (88),³⁰ now written for two probability amplitudes, $a_n$ and $a_{n'}$.


²⁸ The notion of detuning is also very useful in the classical theory of oscillations (see, e.g., CM Chapter 5), where the role of $\omega_{nn'}$ is played by the own frequency $\omega_0$ of the oscillator.

²⁹ Strictly speaking, one more condition is that the number of “resonance” levels is also not too high — see Sec. 6.

³⁰ The second assumption, i.e. the omission of non-resonant terms in the equations for amplitudes, is called the Rotating Wave Approximation (RWA); the same idea in the classical theory of oscillations is the basis of what is usually called the van der Pol method, and its result, the reduced equations — see, e.g., CM Secs. 5.3-5.5.
[Figure: the levels $E_{n'}$ and $E_n$, with the excitation quantum $\hbar\omega$ differing from $E_n - E_{n'}$ by the detuning $\hbar\Delta_{nn'}$.]

Fig. 6.8. The resonant excitation of an energy level.


The result is the following system of two linear equations:

$$i\hbar\dot a_n = a_{n'}Ae^{-i\Delta t}, \qquad i\hbar\dot a_{n'} = a_nA^*e^{i\Delta t}, \tag{6.94}$$

which uses the shorthand notation $A \equiv A_{nn'}$ and $\Delta \equiv \Delta_{nn'}$. (I will use this notation for a while — until other energy levels become involved, at the beginning of the next section.) This system may be readily reduced to a form without explicit time dependence of the right-hand parts — for example, by introducing the following new probability amplitudes, with the same moduli:


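$$a_n = b_ne^{-i\Delta t/2}, \qquad a_{n'} = b_{n'}e^{+i\Delta t/2}, \tag{6.95}$$

so that

$$\dot a_n = \left(\dot b_n - i\frac{\Delta}{2}b_n\right)e^{-i\Delta t/2}, \qquad \dot a_{n'} = \left(\dot b_{n'} + i\frac{\Delta}{2}b_{n'}\right)e^{+i\Delta t/2}. \tag{6.96}$$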


Plugging these relations into Eq. (94), we get two usual linear first-order differential equations:

$$i\hbar\dot b_n = -\frac{\hbar\Delta}{2}b_n + Ab_{n'}, \qquad i\hbar\dot b_{n'} = A^*b_n + \frac{\hbar\Delta}{2}b_{n'}. \tag{6.97}$$


As the reader knows very well by now, the general solution of such a system is a linear combination of two exponential functions, $\exp\{\lambda_\pm t\}$, with the exponents $\lambda_\pm$ that may be found by plugging any of these functions into Eq. (97), and requiring the consistency of the two resulting linear algebraic equations. In our case, the consistency condition (i.e. the characteristic equation of the system) is

$$\begin{vmatrix} -\hbar\Delta/2 - i\hbar\lambda & A \\ A^* & \hbar\Delta/2 - i\hbar\lambda \end{vmatrix} = 0, \tag{6.98}$$

and has two solutions $\lambda_\pm = \pm i\Omega$, where

$$\Omega = \left(\frac{|A|^2}{\hbar^2} + \frac{\Delta^2}{4}\right)^{1/2}, \quad \text{i.e.} \quad 2\Omega = \left(\Delta^2 + \frac{4|A|^2}{\hbar^2}\right)^{1/2}. \tag{6.99}$$


The coefficients at the exponents are determined by initial conditions. If, as was assumed before, the system was completely on the level $n'$ initially (at $t = 0$), i.e. if $a_{n'}(0) = 1$, $a_n(0) = 0$, so that $b_{n'}(0) = 1$, $b_n(0) = 0$ as well, then Eqs. (97) yield, in particular:

$$b_n(t) = -i\frac{A}{\hbar\Omega}\sin\Omega t, \tag{6.100}$$

so that the $n$th level occupancy is

$$W_n(t) = |b_n(t)|^2 = \frac{|A|^2}{|A|^2 + (\hbar\Delta/2)^2}\sin^2\Omega t. \tag{6.101}$$


This is the famous Rabi oscillation formula.³¹ If the detuning is large in comparison with $|A|/\hbar$, though still small in the sense of Eq. (93), the frequency $2\Omega$ of the Rabi oscillations is completely determined by the detuning, and their amplitude is small:

$$W_n(t) = \frac{4|A|^2}{\hbar^2\Delta^2}\sin^2\frac{\Delta t}{2} \ll 1, \quad \text{for } |A|^2 \ll (\hbar\Delta)^2, \tag{6.102}$$

a result that could also be obtained directly from Eq. (90), just by neglecting the second term on its right-hand side. However, now we may also analyze the results of an increase of the perturbation amplitude: it leads not only to an increase of the amplitude of the probability oscillations, but also of their frequency; see Fig. 9. Ultimately, at $|A| \gg \hbar|\Delta|$ (for example, at the exact resonance, $\Delta = 0$, i.e. $\omega_{nn'} = \omega$, so that $E_n = E_{n'} + \hbar\omega$), Eqs. (99) and (101) give $\Omega = |A|/\hbar$ and $(W_n)_{\max} = 1$, i.e. describe a periodic, full "repumping" of the system from one level to another and back, with a frequency proportional to the perturbation amplitude.$^{32}$
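As a quick consistency check of Eqs. (97)-(101), the following minimal Python sketch (an illustration, not part of the original derivation) integrates Eqs. (97) numerically with a hand-written fourth-order Runge-Kutta scheme and compares the result with Eqs. (100) and (101). It assumes units with $\hbar = 1$; the values of $A$ and $\Delta$, and all function names, are arbitrary illustrative choices.

```python
import math

HBAR = 1.0  # units with hbar = 1 (an assumption of this sketch)


def rhs(b, A, delta):
    """Right-hand sides of Eqs. (6.97), solved for the derivatives; b = (b_n, b_n')."""
    b_n, b_np = b
    db_n = (-0.5 * HBAR * delta * b_n + A * b_np) / (1j * HBAR)
    db_np = (A.conjugate() * b_n + 0.5 * HBAR * delta * b_np) / (1j * HBAR)
    return db_n, db_np


def integrate_rabi(A, delta, t_end, steps=4000):
    """RK4 integration of Eqs. (6.97) from b_n(0) = 0, b_n'(0) = 1."""
    A = complex(A)
    dt = t_end / steps
    b = (0j, 1 + 0j)
    for _ in range(steps):
        k1 = rhs(b, A, delta)
        k2 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(b, k1)), A, delta)
        k3 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(b, k2)), A, delta)
        k4 = rhs(tuple(x + dt * k for x, k in zip(b, k3)), A, delta)
        b = tuple(x + dt / 6 * (p + 2 * q + 2 * r + s)
                  for x, p, q, r, s in zip(b, k1, k2, k3, k4))
    return b


def rabi_frequency(A, delta):
    """Omega of Eq. (6.99)."""
    return math.sqrt(delta ** 2 / 4 + abs(A) ** 2 / HBAR ** 2)


def b_n_closed_form(A, delta, t):
    """b_n(t) of Eq. (6.100)."""
    omega = rabi_frequency(A, delta)
    return -1j * A / (HBAR * omega) * math.sin(omega * t)


def occupancy_closed_form(A, delta, t):
    """W_n(t) of Eq. (6.101)."""
    omega = rabi_frequency(A, delta)
    return (abs(A) ** 2 / (abs(A) ** 2 + (HBAR * delta / 2) ** 2)
            * math.sin(omega * t) ** 2)


A, delta, t = 0.3 + 0.4j, 0.7, 7.0  # arbitrary test values
b_n, b_np = integrate_rabi(A, delta, t)
print(abs(b_n) ** 2, occupancy_closed_form(A, delta, t))
```

With these hypothetical parameters, the two printed occupancies should agree to many decimal places. The RK4 step count is simply chosen large enough for that.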


Fig. 6.9. The Rabi oscillations 
for several values of the 
normalized amplitude of ac 
perturbation. 


t (27 /|A)) 
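The trend shown in Fig. 9, and the two limits discussed above, follow directly from Eqs. (99), (101), and (102). The sketch below assumes $\hbar = 1$ and uses hypothetical parameter values and invented function names. At a fixed detuning, it evaluates the oscillation amplitude and the frequency $2\Omega$ for a growing normalized amplitude $|A|/\hbar|\Delta|$. The same functions can be used to compare Eq. (101) with its weak-drive limit (102).

```python
import math


def rabi_occupancy(a_abs, delta, t, hbar=1.0):
    """Eq. (6.101), with Omega taken from Eq. (6.99)."""
    omega = math.sqrt(delta ** 2 / 4 + (a_abs / hbar) ** 2)
    amplitude = a_abs ** 2 / (a_abs ** 2 + (hbar * delta / 2) ** 2)
    return amplitude * math.sin(omega * t) ** 2


def weak_drive_occupancy(a_abs, delta, t, hbar=1.0):
    """Eq. (6.102): valid only for |A|^2 << (hbar*Delta)^2, and Delta != 0."""
    return 4 * a_abs ** 2 / (hbar * delta) ** 2 * math.sin(delta * t / 2) ** 2


def max_occupancy(a_abs, delta, hbar=1.0):
    """Amplitude of the Rabi oscillations, i.e. the prefactor in Eq. (6.101)."""
    return a_abs ** 2 / (a_abs ** 2 + (hbar * delta / 2) ** 2)


# Fixed detuning Delta = 1 (hbar = 1), growing normalized amplitude |A|/(hbar*Delta):
for ratio in (0.1, 0.3, 1.0, 3.0):
    two_omega = math.sqrt(1.0 + 4 * ratio ** 2)  # 2*Omega from Eq. (6.99)
    print(f"|A|/hbar*Delta = {ratio}: (W_n)max = {max_occupancy(ratio, 1.0):.3f}, "
          f"2*Omega = {two_omega:.3f}")
```

Both the amplitude and the frequency grow with the drive, as in Fig. 9. At $\Delta = 0$ the amplitude reaches 1 for any $|A|$.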


This effect is a close analog of the quantum oscillations in two-level systems with time-independent Hamiltonians, which were discussed in Secs. 2.6 and 5.1. Indeed, let us revisit, for a moment, their discussion started at the end of Sec. 1 of this chapter, now paying more attention to the time evolution of the system under the perturbation. As was argued in that section, the most general perturbation Hamiltonian lifting the two-fold degeneracy of an energy level, in an arbitrary basis, has the matrix (28). Let us describe the system's dynamics using, again, the Schrödinger picture, representing the ket-vector of an arbitrary state of the system in the form (5.1), where $\uparrow$ and $\downarrow$ are the


31 It was derived in 1937 by Isaac Rabi, in the context of his group's pioneering experiments with the ac (practically, microwave) excitation of quantum states, using molecular beams in vacuum.

32 As Eqs. (82), (96), and (99) show, the lowest frequency in the system is $\omega_- = \omega_{n'} - \Delta/2 + \Omega$, so that at $\Delta > 0$, $\hbar\omega_- \approx \hbar\omega_{n'} + |A|^2/\hbar\Delta$. This effective shift of the lowest energy level (which may be measured by another "probe" field of a different frequency) is a particular case of the ac Stark effect, which was already mentioned in Sec. 2.
time-independent states of the basis in which Eq. (28) is written (now without any obligation to associate these states with the z-basis of any spin-½). Then, the Schrödinger equation (4.158) yields


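$$i\hbar\frac{d}{dt}\begin{pmatrix}a_\uparrow\\ a_\downarrow\end{pmatrix} = \begin{pmatrix}H_{11} & H_{12}\\ H_{21} & H_{22}\end{pmatrix}\begin{pmatrix}a_\uparrow\\ a_\downarrow\end{pmatrix} \equiv \begin{pmatrix}H_{11}a_\uparrow + H_{12}a_\downarrow\\ H_{21}a_\uparrow + H_{22}a_\downarrow\end{pmatrix}. \qquad (6.103)$$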

As we know (for example, from the discussion in Sec. 5.1), the average of the diagonal elements of the matrix gives just a common shift of the system's energy; for the purpose of the dynamics analysis, it may be absorbed into the energy reference level. Also, the Hamiltonian operator has to be Hermitian, so that the off-diagonal elements of its matrix have to be complex conjugates of each other: $H_{21} = H_{12}^*$. With this, Eqs. (103) are reduced to the form


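$$i\hbar\dot a_\uparrow = \frac{\varepsilon}{2}a_\uparrow + H_{12}a_\downarrow, \qquad i\hbar\dot a_\downarrow = H_{12}^*a_\uparrow - \frac{\varepsilon}{2}a_\downarrow, \qquad \text{with } \varepsilon \equiv H_{11} - H_{22}, \qquad (6.104)$$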


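which is absolutely similar to Eqs. (97). In particular, these equations describe the quantum oscillations of the probabilities $W_\uparrow = |a_\uparrow|^2$ and $W_\downarrow = |a_\downarrow|^2$ with the frequency
$$2\Omega = \frac{1}{\hbar}\left(\varepsilon^2 + 4|H_{12}|^2\right)^{1/2}. \qquad (6.105)$$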
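Eq. (105) can be checked numerically. The frequency $2\Omega$ of the probability oscillations is the difference of the two eigenvalues of the Hamiltonian matrix in Eq. (103), divided by $\hbar$, and it does not depend on the average of the diagonal elements. This is a small sketch with $\hbar = 1$ and arbitrary matrix elements. It computes the eigenvalues from the characteristic polynomial, independently of Eq. (105).

```python
import math


def eigenvalues_2x2_hermitian(h11, h22, h12):
    """Eigenvalues of [[h11, h12], [conj(h12), h22]], with real h11 and h22.

    Roots of the characteristic polynomial E^2 - (h11 + h22) E + det = 0.
    """
    trace = h11 + h22
    det = h11 * h22 - abs(h12) ** 2
    disc = math.sqrt(trace ** 2 - 4 * det)  # = sqrt((h11 - h22)^2 + 4|h12|^2) >= 0
    return (trace - disc) / 2, (trace + disc) / 2


def oscillation_frequency(h11, h22, h12, hbar=1.0):
    """2*Omega = (E_+ - E_-)/hbar, the frequency of W_up(t) and W_down(t)."""
    e_minus, e_plus = eigenvalues_2x2_hermitian(h11, h22, h12)
    return (e_plus - e_minus) / hbar


def eq_6_105(h11, h22, h12, hbar=1.0):
    """Right-hand side of Eq. (6.105), with epsilon = H11 - H22."""
    eps = h11 - h22
    return math.sqrt(eps ** 2 + 4 * abs(h12) ** 2) / hbar


h12 = 0.2 + 0.1j  # arbitrary off-diagonal element
print(oscillation_frequency(1.3, 0.5, h12), eq_6_105(1.3, 0.5, h12))
print(oscillation_frequency(6.3, 5.5, h12))  # common diagonal shift: same frequency
```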


The similarity of Eqs. (97) and (104), and hence of Eqs. (99) and (105), shows that the "usual" quantum oscillations and the Rabi oscillations have essentially the same physical nature, except that in the latter case the external ac signal quantum $\hbar\omega$ bridges the separated energy levels, effectively reducing their difference $(E_n - E_{n'})$ to a much smaller difference $-\hbar\Delta = (E_n - E_{n'}) - \hbar\omega$. Also, since the
Hamiltonian (28) is similar to that given by Eq. (5.2), the dynamics of such a system with two ac- 
coupled energy levels, within the limits (93) of the perturbation theory, is completely similar to that of a 
time-independent two-level system. In particular, its state may be similarly represented by a point on the 
Bloch sphere shown in Fig. 5.3, with its dynamics described, in the Heisenberg picture, by Eq. (5.19). 
This fact is very convenient for the experimental implementation of quantum information systems (to be 
discussed in more detail in Sec. 8.5), because it enables qubit manipulations in a broad variety of 
physical systems with well-separated energy levels, using external ac (usually either microwave or 
optical) sources. 


Note, however, that according to Eq. (90), if the system has energy levels other than $n$ and $n'$, they also become occupied to some extent. Since the sum of all occupancies equals 1, this means that $(W_n)_{\max}$ may approach 1 only if the excitation amplitude $|A|$ is very small, and hence the state manipulation time scale $T = 2\pi/\Omega = 2\pi\hbar/|A|$ is very long. The ultimate limit in this sense is provided by the harmonic oscillator, where all energy levels are equidistant, and the probability repumping between all of them occurs at comparable rates. In particular, in this system the implementation of the full Rabi oscillations is impossible even at the exact resonance.$^{34}$


33 By the way, Eq. (105) gives a natural generalization of the relations obtained for the frequency of such oscillations in Sec. 2.6, where the coupled potential wells were assumed to be exactly similar, so that $\varepsilon = 0$. Moreover, Eqs. (104) give a long-promised proof of Eqs. (2.201), and hence a better justification of Eqs. (2.203).
34 From Sec. 5.5, we already know what happens to the ground state of an oscillator at its external sinusoidal (or 
any other) excitation: it turns into a Glauber state, i.e. a superposition of all Fock states — see Eq. (5.134). 
However, I would not like these quantitative details to obscure from the reader the most important qualitative (OK, maybe semi-quantitative :-) conclusion of this section's analysis: a resonant increase of the interlevel transition intensity at $\omega \approx \omega_{nn'}$. As will be shown later in the course, in a quantum system coupled to its environment at least slightly (hence in reality, in any quantum system), such an increase is accompanied by a sharp increase of the external field's absorption, which may be measured. This effect has numerous practical applications, including spectroscopies based on the electron paramagnetic resonance (EPR) and nuclear magnetic resonance (NMR), which are broadly used in material science, chemistry, and medicine. Unfortunately, I will not have time to discuss the related technical issues and methods (in particular, interesting ac pulsing techniques, including the so-called Ramsey interferometry) in detail, and have to refer the reader to special literature.$^{35}$


6.6. Quantum-mechanical Golden Rule 


One of the results of the previous section, Eq. (102), may be used to derive one of the most important and nontrivial results of quantum mechanics. For that, let us consider the case when the perturbation causes quantum transitions from a discrete energy level $E_{n'}$ into a group of eigenstates with a very dense (essentially continuous) spectrum $E_n$; see Fig. 10a.


Fig. 6.10. Deriving the Golden Rule: (a) the energy level scheme, and (b) the function under the integral (108).


If, for all states n of the group, the following conditions are satisfied 


$$|A_{nn'}|^2 \ll (\hbar\Delta_{nn'})^2 \ll (\hbar\omega_{nn'})^2, \qquad (6.106)$$


then Eq. (102) coincides with the result that would follow from Eq. (90). This means that we may apply 
Eq. (102), with the indices n and n’ duly restored, to any level n of our tight group. As a result, the total 
probability of having our system transferred from the initial level n’ to that group is 

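$$W_\Sigma(t) \equiv \sum_n W_n(t) = \frac{4}{\hbar^2}\sum_n\frac{|A_{nn'}|^2}{\Delta_{nn'}^2}\sin^2\frac{\Delta_{nn'}t}{2}. \qquad (6.107)$$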


Now comes the main, absolutely beautiful trick: let us assume that the summation over $n$ is limited to a tight group of very similar states whose matrix elements $A_{nn'}$ are virtually equal (we will check the validity of this assumption later on), so that we can take $|A_{nn'}|^2$ out of the sum in Eq. (107) and then replace the sum with the corresponding integral:


35 For introductions see, e.g., J. Wertz and J. Bolton, Electron Spin Resonance, 2nd ed., Wiley, 2007; J. Keeler, Understanding NMR Spectroscopy, 2nd ed., Wiley, 2010.
$$W_\Sigma(t) = \frac{4|A_{nn'}|^2}{\hbar^2}\int\frac{1}{\Delta_{nn'}^2}\sin^2\frac{\Delta_{nn'}t}{2}\,dn = \frac{4|A_{nn'}|^2\rho_nt}{\hbar}\int\frac{1}{(\Delta_{nn'}t)^2}\sin^2\frac{\Delta_{nn'}t}{2}\,d(-\Delta_{nn'}t), \qquad (6.108)$$
where $\rho_n$ is the density of the states $n$ on the energy axis:
$$\rho_n \equiv \frac{dn}{dE_n}. \qquad (6.109)$$


This density and the matrix element $A_{nn'}$ have to be evaluated at $\Delta_{nn'} = 0$, i.e. at energy $E_n = E_{n'} + \hbar\omega$, and are assumed to be constant within the final state group. At fixed $E_{n'}$, the function under integral (108) is even and decreases fast at $|\Delta_{nn'}t| \gg 1$; see Fig. 10b. Hence we may introduce a dimensionless integration variable $\xi \equiv \Delta_{nn'}t$, and extend the integration over it formally from $-\infty$ to $+\infty$. Then the integral in Eq. (108) is reduced to a table one,$^{36}$ and yields


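$$W_\Sigma(t) = \frac{4|A_{nn'}|^2\rho_nt}{\hbar}\int_{-\infty}^{+\infty}\frac{1}{\xi^2}\sin^2\frac{\xi}{2}\,d\xi = \frac{4|A_{nn'}|^2\rho_nt}{\hbar}\,\frac{\pi}{2} \equiv \Gamma t, \qquad (6.110)$$
where the constant
$$\Gamma \equiv \frac{2\pi}{\hbar}|A_{nn'}|^2\rho_n \qquad (6.111)$$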


is called the transition rate.$^{37}$ This is one of the most famous and useful results of quantum mechanics, its Golden Rule,$^{38}$ which deserves much discussion.
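The table integral used in Eq. (110), $\int_{-\infty}^{+\infty}\xi^{-2}\sin^2(\xi/2)\,d\xi = \pi/2$, is easy to confirm numerically. This is a minimal sketch. It applies a composite Simpson rule on $[0, \xi_{\max}]$ and approximates the tail beyond the (arbitrarily chosen) cutoff $\xi_{\max}$ by $1/(2\xi_{\max})$, because $\sin^2$ averages to 1/2 there. It then doubles the result using the evenness of the integrand.

```python
import math


def integrand(xi):
    """sin^2(xi/2) / xi^2, the even function under the integral in Eq. (6.110)."""
    if xi == 0.0:
        return 0.25  # limiting value at xi -> 0
    return math.sin(xi / 2) ** 2 / xi ** 2


def golden_rule_integral(cutoff=1000.0, n=200000):
    """Approximate the integral of sin^2(xi/2)/xi^2 over the whole real axis."""
    h = cutoff / n  # n must be even for Simpson's rule
    s = integrand(0.0) + integrand(cutoff)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * integrand(i * h)
    half_integral = s * h / 3              # Simpson estimate on [0, cutoff]
    half_integral += 1 / (2 * cutoff)      # tail: sin^2 averages to 1/2 beyond the cutoff
    return 2 * half_integral               # evenness: (-inf, 0] gives the same


print(golden_rule_integral(), math.pi / 2)
```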


First of all, let us reproduce the reasoning already used in Sec. 2.5 to show that the meaning of 
the rate is much deeper than Eq. (110) seems to imply. Indeed, due to the conservation of the total 
probability, $W_{n'} + W_\Sigma = 1$, we can rewrite that equation as

$$\frac{dW_{n'}}{dt} = -\Gamma. \qquad (6.112)$$


Evidently, this result cannot be true for all times, otherwise the probability $W_{n'}$ would become negative. The reason for this apparent contradiction is that Eq. (110) was obtained under the assumption that initially the system was completely on level $n'$: $W_{n'}(0) = 1$. Now, if at the initial moment the value of $W_{n'}$ is different, the result (110) has to be multiplied by that number, due to the linear relation (88) between $da_n/dt$ and $a_{n'}$. Hence, instead of Eq. (112) we get a differential equation similar to Eq. (2.159),

$$\frac{dW_{n'}}{dt} = -\Gamma W_{n'}, \qquad (6.113)$$


which, for a time-independent $\Gamma$, has the evident solution,


36 See, e.g., MA Eq. (6.12). 

37 In some texts, the density of states in Eq. (111) is replaced with a formal expression $\sum_n\delta(E_n - E_{n'} - \hbar\omega)$. Indeed, applied to a finite energy interval $\Delta E_n$ with $\Delta n \gg 1$ levels, it gives the same result: $\Delta n = (dn/dE_n)\Delta E_n \equiv \rho_n\Delta E_n$. Such a replacement may be technically useful in some cases, but is incorrect for $\Delta n \sim 1$, and hence should be used with the utmost care, so that for most applications the more explicit form (111) is preferable.

38 Sometimes Eq. (111) is called “Fermi’s Golden Rule”. This is rather unfair, because this result was developed 
mostly by the same P. A. M. Dirac in 1927, and Enrico Fermi’s role was not much more than advertising it, under 
the name of “Golden Rule No. 2”, in his influential lecture notes on nuclear physics, which were published much 
later, in 1950. (To be fair to Fermi, he never tried to pose as the Golden Rule's author.)
$$W_{n'}(t) = W_{n'}(0)e^{-\Gamma t}, \qquad (6.114)$$


describing the exponential decay of the initial state's occupancy, with the time constant $\tau = 1/\Gamma$.


I invite the reader to review this fascinating result again: by the summation of periodic oscillations (102) over many levels $n$, we have got an exponential decay (114) of the probability. This trick becomes possible because the effective range $\Delta E_n$ of the state energies $E_n$ giving substantial contributions to the integral (108) shrinks with time: $\Delta E_n \sim \hbar/t$.$^{39}$ However, since most of the decay (114) takes place within a time interval of the order of $\tau = 1/\Gamma$, the range of the participating final energies may be estimated as
$$\Delta E_n \sim \frac{\hbar}{\tau} \equiv \hbar\Gamma. \qquad (6.115)$$
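The summation trick can be tested on a toy version of Eq. (107). This sketch assumes, hypothetically, that the final states form an equidistant ladder with spacing $1/\rho_n$, symmetric around the resonance, and that all of them share the same $|A_{nn'}|$. It sums Eq. (102) over the ladder and divides by $\Gamma t$ from Eq. (111), in units with $\hbar = 1$. It checks only the passage from the sum (107) to the integral (108), not the physical validity conditions discussed below. For times much shorter than the recurrence time $2\pi\hbar\rho_n$, the printed ratio should stay close to 1. The small deficit comes from the finite width of the ladder.

```python
import math


def summed_probability(t, a_abs=1e-3, spacing=1e-3, n_levels=20001, hbar=1.0):
    """Sum (6.107) of the terms (6.102) over an equidistant ladder of final levels.

    Level k has detuning Delta_k = k * spacing / hbar, k = -n_levels//2 ... +n_levels//2,
    and all levels share the same |A_nn'| = a_abs (a simplifying assumption).
    """
    half = n_levels // 2
    total = 0.0
    for k in range(-half, half + 1):
        if k == 0:
            total += (a_abs * t / hbar) ** 2  # Delta -> 0 limit of Eq. (6.102)
        else:
            delta = k * spacing / hbar
            total += 4 * a_abs ** 2 / (hbar * delta) ** 2 * math.sin(delta * t / 2) ** 2
    return total


def golden_rule_rate(a_abs=1e-3, spacing=1e-3, hbar=1.0):
    """Gamma of Eq. (6.111), with rho_n = 1/spacing."""
    return 2 * math.pi / hbar * a_abs ** 2 / spacing


for t in (10.0, 20.0, 40.0):
    print(t, summed_probability(t) / (golden_rule_rate() * t))
```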
This estimate is very instrumental for the formulation of conditions of the Golden Rule’s validity. First, 
we have assumed that the matrix elements of the perturbation and the density of states are independent 
of the energy within the interval (115). This gives the following requirement 


$$\Delta E_n \sim \hbar\Gamma \ll E_n - E_{n'} \sim \hbar\omega. \qquad (6.116)$$


Second, for the transfer from the sum (107) to the integral (108), we need the number of states within that energy interval, $\Delta N_n = \rho_n\Delta E_n$, to be much larger than 1. Merging Eq. (116) with Eq. (92) for all the energy levels $n'' \neq n, n'$ not participating in the resonant transition, we may summarize all conditions of the Golden Rule validity as


$$\rho_n^{-1} \ll \hbar\Gamma \ll \hbar\omega,\ \hbar|\omega \pm \omega_{n'n''}|. \qquad (6.117)$$


(The reader may ask whether I have neglected the condition expressed by the first of Eqs. (106). However, for $\Delta_{nn'} \sim \Delta E_n/\hbar \sim \Gamma$, this condition is just $|A_{nn'}|^2 \ll (\hbar\Gamma)^2$, so that plugging it into Eq. (111), we get
$$\Gamma \ll \frac{2\pi}{\hbar}(\hbar\Gamma)^2\rho_n, \qquad (6.118)$$
and canceling one $\Gamma$ and one $\hbar$, we see that it coincides with the first relation in Eq. (117) above.)


Let us have a look at whether these conditions may be satisfied in practice, at least in some cases. For example, let us consider the optical ionization of an atom, with the released electron confined in a volume of the order of 1 cm$^3$ = $10^{-6}$ m$^3$. According to Eq. (1.90), with $E$ of the order of the atomic ionization energy $E_n - E_{n'} = \hbar\omega \sim 1$ eV, the density of electron states in that volume is of the order of $10^{21}$ 1/eV, while the right-hand side of Eq. (117) is of the order of $E_n \sim 1$ eV. Thus the conditions (117) provide an approximately 20-orders-of-magnitude range for acceptable values of $\hbar\Gamma$. This illustration should give the reader a taste of why the Golden Rule is applicable to so many situations.
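The order-of-magnitude estimate of the density of states can be reproduced as follows. This sketch assumes the standard free-particle density of states in 3D for one spin projection, $dN/dE = Vmk/2\pi^2\hbar^2$ with $k = (2mE)^{1/2}/\hbar$. That form may differ from Eq. (1.90) by a factor of order 1, which does not affect the conclusion. The constants are CODATA values in SI units.

```python
import math

# CODATA values in SI units
HBAR = 1.054571817e-34    # J*s
M_E = 9.1093837015e-31    # kg
EV = 1.602176634e-19      # J


def free_electron_dos_per_ev(energy_ev, volume_m3):
    """3D free-electron density of states dN/dE, one spin projection, in 1/eV."""
    k = math.sqrt(2 * M_E * energy_ev * EV) / HBAR          # wave number, 1/m
    dos_per_joule = volume_m3 * M_E * k / (2 * math.pi ** 2 * HBAR ** 2)
    return dos_per_joule * EV


rho = free_electron_dos_per_ev(energy_ev=1.0, volume_m3=1e-6)  # 1 eV, 1 cm^3
lower_ev = 1 / rho   # rho_n^{-1}: lower bound for hbar*Gamma in Eq. (6.117)
upper_ev = 1.0       # ~ hbar*omega: upper bound for hbar*Gamma
print(f"rho ~ {rho:.1e} 1/eV; allowed range spans "
      f"{math.log10(upper_ev / lower_ev):.1f} orders of magnitude")
```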


Finally, the physical picture of the initial state’s decay (which will also be the key to our 
discussion of quantum-mechanical “open” systems in the next chapter) is also very important. 
According to Eq. (114), the external excitation transfers the system into the continuous spectrum of 
levels n, and it never comes back to the initial level n’. However, it was derived from the quantum 
mechanics of Hamiltonian systems, whose equations are invariant with respect to time reversal. This 


39 This is one more appearance of the “energy-time uncertainty relation”, which was discussed in Sec. 2.5. 
paradox is a result of our generalization (113) of the exact result (112). This trick, breaking the time-reversal symmetry, is absolutely adequate for the physics under study. Indeed, some gut feeling of the physical sense of this irreversibility may be obtained from the following observation. As Eq. (1.86) illustrates, the distance between the adjacent orbital energy levels tends to zero only if the system's size goes to infinity. This means that our assumption of the continuous energy spectrum of the final states $n$ essentially requires these states to be broadly extended in space, being either free or essentially free de Broglie waves. Thus the Golden Rule corresponds to the (physically justified) assumption that in an infinitely large system, the traveling de Broglie waves excited by a local source and propagating outward from it would never come back, and even if they did, unpredictable phase shifts introduced by minor uncontrollable perturbations on their way would never allow them to sum up in the coherent way necessary to bring the system back into the initial state $n'$. (This is essentially the same situation which was discussed, for a particular 1D wave-mechanical system, in Sec. 2.5.)$^{40}$


To get a feeling of the Golden Rule at work, let us apply it to the following simple problem, which is a toy model of the photoelectric effect briefly discussed in Sec. 1.1(ii). A 1D particle is initially trapped in the ground state of a narrow potential well, described by Eq. (2.158):


$$U(x) = -\mathcal{W}\delta(x), \qquad \text{with } \mathcal{W} > 0. \qquad (6.119)$$


Let us calculate the rate $\Gamma$ of the particle's "ionization" (i.e. its excitation into a group of extended, delocalized states) by a weak classical sinusoidal force of amplitude $F_0$ and frequency $\omega$, suddenly turned on at some instant, say $t = 0$.


As a reminder, the initial localized state (in our current notation, n’) of such a particle was 
already found in Sec. 2.6: 


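$$\psi_{n'}(x) = \kappa^{1/2}\exp\{-\kappa|x|\}, \qquad \kappa = \frac{m\mathcal{W}}{\hbar^2}, \qquad E_{n'} = -\frac{\hbar^2\kappa^2}{2m} = -\frac{m\mathcal{W}^2}{2\hbar^2}. \qquad (6.120)$$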

The final, extended states $n$, with a continuous spectrum, exist in this problem only at energies $E_n > 0$, so that the excitation rate is different from zero only for frequencies
$$\omega > \omega_{\min} \equiv \frac{|E_{n'}|}{\hbar} = \frac{m\mathcal{W}^2}{2\hbar^3}. \qquad (6.121)$$


The weak sinusoidal force may be described by the following perturbation Hamiltonian, 


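$$\hat H^{(1)}(t) = -F(t)\hat x = -F_0\hat x\cos\omega t \equiv -\frac{F_0}{2}\hat x\left(e^{-i\omega t} + e^{+i\omega t}\right), \qquad \text{for } t > 0, \qquad (6.122)$$
so that, according to Eq. (86), which serves as the amplitude operator's definition, in this case
$$\hat A = \hat A^\dagger = -\frac{F_0}{2}\hat x. \qquad (6.123)$$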


The matrix elements A,,, that participate in Eq. (111) may be readily calculated in the coordinate 
representation: 


40 This situation is also similar to the irreversible increase of entropy of macroscopic systems, despite the fact that 
their microscopic components obey reversible laws of motion, which is postulated in thermodynamics and 
explained in statistical physics — see, e.g., SM Secs. 1.2 and 2.2. 
$$A_{nn'} = \int\psi_n^*(x)\hat A\psi_{n'}(x)\,dx = -\frac{F_0}{2}\int\psi_n^*(x)\,x\,\psi_{n'}(x)\,dx. \qquad (6.124)$$


Since, according to Eq. (120), the initial $\psi_{n'}$ is a symmetric function of $x$, non-vanishing contributions to this integral are given only by antisymmetric functions $\psi_n(x)$, proportional to $\sin k_nx$, with the wave number $k_n$ related to the final energy by the well-familiar equality (1.89):


$$\frac{\hbar^2k_n^2}{2m} = E_n. \qquad (6.125)$$


As we know from Sec. 2.6 (see in particular Eq. (2.167) and its discussion), such antisymmetric functions, with $\psi_n(0) = 0$, are not affected by the zero-centered delta-functional potential (119), so that their density $\rho_n$ is the same as that in completely free space, and we could use Eq. (1.100). However, since that relation was derived for traveling waves, it is more prudent to repeat its derivation for standing waves, confining them to an artificial segment $[-l/2, +l/2]$, long in the sense


$$k_nl,\ \kappa l \gg 1, \qquad (6.126)$$


so that it does not affect the initial localized state and the excitation process. Then the confinement requirement $\psi_n(\pm l/2) = 0$ immediately yields the condition $k_nl/2 = n\pi$, so that Eq. (1.100) is indeed valid, but only for positive values of $k_n$, because $\sin k_nx$ with $k_n \to -k_n$ does not describe an independent standing-wave eigenstate. Hence the final state density is


$$\rho_n \equiv \frac{dn}{dE_n} = \frac{dn/dk_n}{dE_n/dk_n} = \frac{l/2\pi}{\hbar^2k_n/m} = \frac{ml}{2\pi\hbar^2k_n}. \qquad (6.127)$$


It may look troubling that the density of states depends on the artificial segment's length $l$, but the same $l$ also participates in the final wavefunctions' normalization factor,$^{41}$


$$\psi_n = \left(\frac{2}{l}\right)^{1/2}\sin k_nx, \qquad (6.128)$$


and hence in the matrix element (124): 


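$$A_{nn'} = -\frac{F_0}{2}\left(\frac{2\kappa}{l}\right)^{1/2}\int_{-l/2}^{+l/2}\sin k_nx\;x\,e^{-\kappa|x|}\,dx = -F_0\left(\frac{2\kappa}{l}\right)^{1/2}\frac{1}{2i}\left[\int_0^{l/2}x\,e^{(ik_n-\kappa)x}\,dx - \int_0^{l/2}x\,e^{-(ik_n+\kappa)x}\,dx\right]. \qquad (6.129)$$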


These two integrals may be readily worked out by parts. Taking into account that, due to the condition (126), their upper limits may be extended to $\infty$, the result is


$$A_{nn'} = -F_0\left(\frac{2\kappa}{l}\right)^{1/2}\frac{2\kappa k_n}{(k_n^2+\kappa^2)^2}. \qquad (6.130)$$
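The integral behind Eq. (130), $\int_0^\infty x\sin(k_nx)\,e^{-\kappa x}\,dx = 2\kappa k_n/(k_n^2+\kappa^2)^2$, can be checked numerically. The same sketch also confirms that $|A_{nn'}|^2 l$ does not depend on the artificial length $l$. The values of $F_0$, $k_n$, $\kappa$, and $l$ are arbitrary, and the upper limit $40/\kappa$ stands in for infinity, as condition (126) permits.

```python
import math


def overlap_numeric(k, kappa, n=20000):
    """Composite Simpson rule for int_0^inf x sin(kx) exp(-kappa x) dx.

    The upper limit 40/kappa stands in for infinity (exp(-40) is negligible).
    """
    upper = 40.0 / kappa

    def f(x):
        return x * math.sin(k * x) * math.exp(-kappa * x)

    h = upper / n
    s = f(0.0) + f(upper)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / 3


def overlap_closed(k, kappa):
    """The same integral in closed form, 2 kappa k / (k^2 + kappa^2)^2."""
    return 2 * kappa * k / (k ** 2 + kappa ** 2) ** 2


def matrix_element(f0, k, kappa, length):
    """A_nn' of Eq. (6.130) for hypothetical values of F0, k_n, kappa, and l."""
    return -f0 * math.sqrt(2 * kappa / length) * overlap_closed(k, kappa)


print(overlap_numeric(1.7, 1.0), overlap_closed(1.7, 1.0))
```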


Note that the matrix element is a smooth function of $k_n$ (and hence of $E_n$), so that an important condition of the Golden Rule, the virtual constancy of $A_{nn'}$ on the interval $\Delta E_n \sim \hbar\Gamma \ll E_n$, is satisfied. So, the general Eq. (111) is reduced, for our problem, to the following expression:


41 The normalization to infinite volume, using Eq. (4.263), is also possible, but physically less transparent. 
$$\Gamma = \frac{2\pi}{\hbar}F_0^2\,\frac{2\kappa}{l}\,\frac{4\kappa^2k_n^2}{(k_n^2+\kappa^2)^4}\,\frac{ml}{2\pi\hbar^2k_n} = \frac{8F_0^2m\kappa^3k_n}{\hbar^3(k_n^2+\kappa^2)^4}, \qquad (6.131)$$

which is independent of the artificially introduced $l$, thus justifying its use.


Note that due to the above definitions of $k_n$ and $\kappa$, the expression in the parentheses in the denominator of the last expression does not depend on the potential well's "weight" $\mathcal{W}$, and is a function of only the excitation frequency $\omega$ (and the particle's mass):


$$\frac{\hbar^2(k_n^2+\kappa^2)}{2m} = E_n - E_{n'} = \hbar\omega. \qquad (6.132)$$
As a result, Eq. (131) may be recast simply as
$$\hbar\Gamma = \frac{F_0^2\mathcal{W}^3k_n}{2(\hbar\omega)^4}. \qquad (6.133)$$


What is hidden here is that $k_n$, defined by Eq. (125) with $E_n = E_{n'} + \hbar\omega$, is a function of the external force's frequency, changing as $\omega^{1/2}$ at $\omega \gg \omega_{\min}$ (so that $\Gamma$ drops as $\omega^{-7/2}$ at $\omega \to \infty$), and as $(\omega - \omega_{\min})^{1/2}$ when $\omega$ approaches the "red boundary" (121) of the ionization effect, so that $\Gamma \propto (\omega - \omega_{\min})^{1/2} \to 0$ in that limit as well.
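Eqs. (131) and (133), and the two frequency limits just described, can be cross-checked with a short sketch. It assumes dimensionless units with $\hbar = m = \mathcal{W} = 1$, so that $\kappa = 1$ and $\omega_{\min} = 1/2$, and a unit force amplitude $F_0$. These choices are for illustration only.

```python
import math

HBAR = M = W = 1.0  # dimensionless units hbar = m = W = 1 (an assumption of this sketch)


def ionization_rate(omega, f0=1.0, hbar=HBAR, m=M, w=W):
    """Gamma from Eq. (6.131) and from Eq. (6.133); returns both for comparison."""
    kappa = m * w / hbar ** 2                    # Eq. (6.120)
    e_bound = -hbar ** 2 * kappa ** 2 / (2 * m)  # E_n' < 0
    e_final = e_bound + hbar * omega             # E_n = E_n' + hbar*omega
    if e_final <= 0:                             # below the red boundary (6.121)
        return 0.0, 0.0
    k = math.sqrt(2 * m * e_final) / hbar        # Eq. (6.125)
    gamma_131 = 8 * f0 ** 2 * m * kappa ** 3 * k / (hbar ** 3 * (k ** 2 + kappa ** 2) ** 4)
    gamma_133 = f0 ** 2 * w ** 3 * k / (2 * hbar * (hbar * omega) ** 4)
    return gamma_131, gamma_133


for omega in (0.4, 0.5001, 1.0, 2.0, 10.0):
    print(omega, ionization_rate(omega))
```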


A conceptually very similar, but a bit more involved analysis of such effect in a more realistic 
3D case, namely the hydrogen atom’s ionization by an optical wave, is left for the reader’s exercise. 


6.7. Golden Rule for step-like perturbations 
Now let us reuse some of our results for a perturbation turned on at $t = 0$, but time-independent after that:
$$\hat H^{(1)}(t) = \begin{cases}0, & \text{for } t < 0,\\ \hat H^{(1)} = \text{const}, & \text{for } t \geq 0.\end{cases} \qquad (6.134)$$


A superficial comparison of this equality and the former Eq. (86) seems to indicate that we may use all our previous results, taking $\omega = 0$ and replacing $\hat A + \hat A^\dagger$ with $\hat H^{(1)}$. However, that conclusion (which would give us a wrong factor of 2 in the result) does not take into account the fact that, analyzing both the two-level approximation in Sec. 5 and the Golden Rule in Sec. 6, we have dropped the second (non-resonant) term in Eq. (90). In our current case (134), with $\omega = 0$, there is no such difference between these terms. This is why it is more prudent to use the general Eq. (84),


$$i\hbar\dot a_n = \sum_{n'}a_{n'}H^{(1)}_{nn'}e^{i\omega_{nn'}t}, \qquad (6.135)$$


in which the matrix element of the perturbation is now time-independent at $t > 0$. We see that it is formally equivalent to Eq. (88) with only the first (resonant) term kept, if we make the following replacements:

$$\hat A \to \hat H^{(1)}, \qquad \Delta_{nn'} \equiv \omega - \omega_{nn'} \to -\omega_{nn'}. \qquad (6.136)$$
Let us use this equivalency to consider the results of coupling between a discrete-energy state $n'$, into which the particle is initially placed, and a dense group of states with a quasi-continuum spectrum, in the same energy range. Figure 11a shows an example of such a system: a particle is initially (say, at $t = 0$) placed into a potential well separated by a penetrable potential barrier from a formally infinite region with a continuous energy spectrum. Let me hope that the physical discussion in the last section makes the outcome of such an experiment evident: the particle will gradually and irreversibly tunnel out of the well, so that the probability $W_{n'}(t)$ of its still residing in the well will decay in accordance with Eq. (114). The rate of this decay may be found by making the replacements (136) in Eq. (111):


$$\Gamma = \frac{2\pi}{\hbar}|H_{nn'}|^2\rho_n, \qquad (6.137)$$
where the states $n$ and $n'$ now have virtually the same energy.$^{42}$


Fig. 6.11. Tunneling from a discrete-energy state n’: (a) to a 
state continuum, and (b) to another discrete-energy state n. 
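The exponential decay with the rate (137) can also be obtained without any perturbative summation, by integrating the Schrödinger equation directly for a toy model. In this minimal sketch, which does not model any particular barrier, the level $n'$ at zero energy is coupled through a single hypothetical real matrix element $V = H_{nn'}$ to a wide band of equidistant levels with spacing $1/\rho_n$. The sketch uses $\hbar = 1$ and hand-written RK4 time stepping. For times short compared with the recurrence time $2\pi\hbar\rho_n$, the computed $W_{n'}(t)$ should follow $e^{-\Gamma t}$ closely. Small deviations come from the finite bandwidth.

```python
import math


def survival_probability(t_end, coupling=0.03, spacing=0.02, n_band=801, dt=0.02):
    """W_n'(t_end) for a level at E = 0 coupled to n_band equidistant levels.

    Schroedinger equation (hbar = 1, E_n' = 0):
        i dc0/dt = V * sum_k c_k
        i dck/dt = E_k c_k + V c0
    """
    half = n_band // 2
    energies = [(k - half) * spacing for k in range(n_band)]

    def deriv(c0, ck):
        d0 = -1j * coupling * sum(ck)
        dk = [-1j * (e * c + coupling * c0) for e, c in zip(energies, ck)]
        return d0, dk

    def shifted(c0, ck, d0, dk, h):
        return c0 + h * d0, [c + h * d for c, d in zip(ck, dk)]

    c0, ck = 1 + 0j, [0j] * n_band
    for _ in range(round(t_end / dt)):
        k1 = deriv(c0, ck)
        k2 = deriv(*shifted(c0, ck, *k1, dt / 2))
        k3 = deriv(*shifted(c0, ck, *k2, dt / 2))
        k4 = deriv(*shifted(c0, ck, *k3, dt))
        c0 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        ck = [c + dt / 6 * (p1 + 2 * p2 + 2 * p3 + p4)
              for c, p1, p2, p3, p4 in zip(ck, k1[1], k2[1], k3[1], k4[1])]
    return abs(c0) ** 2


def golden_rule_rate(coupling=0.03, spacing=0.02):
    """Eq. (6.137) with |H_nn'| = coupling and rho_n = 1/spacing (hbar = 1)."""
    return 2 * math.pi * coupling ** 2 / spacing


gamma = golden_rule_rate()
for t in (2.0, 5.0):
    print(t, survival_probability(t), math.exp(-gamma * t))
```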


It is very informative to compare this result, semi-quantitatively, with Eq. (105) for a symmetric ($E_n = E_{n'}$) system of two potential wells separated by a similar potential barrier; see Fig. 11b. For the symmetric case, i.e. $\varepsilon = 0$, Eq. (105) is reduced to simply


$$\Omega = \frac{|H^{\rm con}_{nn'}|}{\hbar}. \qquad (6.138)$$


Here I have used the index “con” (from “confinement”) to emphasize that this matrix element is 
somewhat different from the one participating in Eq. (137), even if the potential barriers are similar. 
Indeed, in the latter case, the matrix element, 


$$H_{nn'} \equiv \langle n|\hat H|n'\rangle = \int\psi_n^*\hat H\psi_{n'}\,dx, \qquad (6.139)$$


has to be calculated for two wavefunctions $\psi_n$ and $\psi_{n'}$ confined to spatial intervals of the same scale $l_{\rm con}$, while in Eq. (137), the wavefunctions $\psi_n$ are extended over a much larger distance $l \gg l_{\rm con}$; see Fig. 11. As Eq. (128) tells us, in the 1D model this means an additional small factor of the order of $(l_{\rm con}/l)^{1/2}$. Now using Eq. (128) as a crude but suitable model for the final-state wavefunctions, we arrive at the following estimate, independent of the artificially introduced length $l$:


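$$\Gamma = \frac{2\pi}{\hbar}|H_{nn'}|^2\rho_n \sim \frac{2\pi}{\hbar}|H^{\rm con}_{nn'}|^2\,\frac{l_{\rm con}}{l}\,\frac{ml}{2\pi\hbar^2k_n} = \frac{|H^{\rm con}_{nn'}|^2\,ml_{\rm con}}{\hbar^3k_n} \sim \frac{(\hbar\Omega)^2}{\hbar\,\Delta E_{n'}}, \qquad (6.140)$$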


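where $\Delta E_{n'} \sim \hbar^2/ml_{\rm con}^2$ is the scale of the differences between the eigenenergies of the particle in an unperturbed potential well, and the final-state wave number has been estimated as $k_n \sim 1/l_{\rm con}$. Since the condition of validity of Eq. (138) is $\hbar\Omega \ll \Delta E_{n'}$, we see that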


42 The condition of validity of Eq. (137) is again given by Eq. (117), just with $\omega = 0$ in the upper limit for $\Gamma$.
$$\Gamma \sim \Omega\,\frac{\hbar\Omega}{\Delta E_{n'}} \ll \Omega. \qquad (6.141)$$


This (sufficiently general$^{43}$) perturbative result confirms the conclusion of a more particular analysis carried out at the end of Sec. 2.6: the rate of the (irreversible) quantum tunneling into a state continuum is always much lower than the frequency of (reversible) quantum oscillations between discrete states separated by the same potential barrier, at least for the case when both are much lower than $\Delta E_{n'}/\hbar$, so that the perturbation theory is valid. A very handwaving interpretation of this result is that the particle oscillates between the confined state in the well and the space-extended states behind the barrier many times before finally "deciding to perform" an irreversible transition into the unconfined continuum. This qualitative picture is consistent with experimentally observable effects of dispersive electromagnetic environments on electron tunneling.$^{44}$


Let me conclude this section (and this chapter) with the application of Eq. (137) to a very 
important case, which will provide a smooth transition to the next chapter’s topic. Consider a composite 
system consisting of two component systems, a and b, with the energy spectra sketched in Fig. 12. 


Fig. 6.12. Energy relaxation in system a due to its weak coupling $\hat H^{\rm int} = \hat A(a)\hat B(b)$ to system b (which serves as the environment of a).


Let the systems be completely independent initially. The independence means that in the absence 
of their coupling, the total Hamiltonian of the system may be represented as a sum of two operators: 


$$\hat H^{(0)} = \hat H_a(a) + \hat H_b(b), \qquad (6.142)$$


where the arguments a and b symbolize the non-overlapping sets of the degrees of freedom of the two 
systems. Such operators, belonging to their individual, different Hilbert spaces, naturally commute. 
Similarly, the eigenkets of the system may be naturally factored as 


$$|n\rangle = |n_a\rangle\otimes|n_b\rangle. \qquad (6.143)$$
The direct product sign $\otimes$ is used here (and below) to denote the formation of a joint ket-vector from the kets of the independent systems, belonging to different Hilbert spaces. Evidently, the order of operands in such a product may be changed at will. As a result, the system's eigenenergies separate into a sum, just as the Hamiltonian (142) does:
$$\hat H^{(0)}|n\rangle = (\hat H_a + \hat H_b)|n_a\rangle\otimes|n_b\rangle = (\hat H_a|n_a\rangle)\otimes|n_b\rangle + |n_a\rangle\otimes(\hat H_b|n_b\rangle) = (E_{n_a} + E_{n_b})|n\rangle. \qquad (6.144)$$
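Eqs. (142)-(144) are easy to verify with explicit matrices. This sketch uses two small hypothetical Hamiltonians with known eigenvectors and builds the joint operators with Kronecker (direct) products. It then checks two things: the product ket is an eigenket with the sum of the two eigenenergies, and operators acting on different subsystems commute. Plain Python lists keep the example self-contained.

```python
import math


def kron_matrix(a, b):
    """Direct (Kronecker) product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def kron_vector(u, v):
    """Direct product of two kets given as lists of amplitudes."""
    return [x * y for x in u for y in v]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_vec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


# Hypothetical component Hamiltonians, in arbitrary energy units:
H_a = [[0.0, 1.0], [1.0, 0.0]]  # eigenvalue E_a = 1 for ket_a below
H_b = [[1.0, 1j], [-1j, 1.0]]   # eigenvalue E_b = 2 for ket_b below
ket_a = [1 / math.sqrt(2), 1 / math.sqrt(2)]
ket_b = [1 / math.sqrt(2), -1j / math.sqrt(2)]

H_a_full = kron_matrix(H_a, identity(2))  # H_a acting on the joint space
H_b_full = kron_matrix(identity(2), H_b)  # H_b acting on the joint space
H0 = mat_add(H_a_full, H_b_full)          # Eq. (6.142)
ket = kron_vector(ket_a, ket_b)           # Eq. (6.143)

print(mat_vec(H0, ket))                   # equals (E_a + E_b) * ket, Eq. (6.144)
print([3 * x for x in ket])
```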


43 It is straightforward to verify that the estimate (141) is valid for similar problems of any spatial dimensionality, 
not just for the 1D case we have analyzed. 
44 See, e.g., P. Delsing et al., Phys. Rev. Lett. 63, 1180 (1989). 
In such composite systems, the relatively weak interaction of the components may usually be represented as a bilinear product of two Hermitian operators, each depending only on the degrees of freedom of one component system:


$$\hat H^{\rm int} = \hat A(a)\hat B(b). \qquad (6.145)$$


A very common example of such an interaction is the electric-dipole interaction between an atomic-scale system (with a linear size of the order of the Bohr radius $r_B \sim 10^{-10}$ m) and the electromagnetic field at optical frequencies $\omega \sim 10^{15}$ s$^{-1}$, with the wavelength $\lambda = 2\pi c/\omega \sim 10^{-6}$ m $\gg r_B$:$^{45}$


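$$\hat H^{\rm int} = -\hat{\mathbf d}\cdot\hat{\boldsymbol{\mathcal E}}, \qquad \text{with } \hat{\mathbf d} = \sum_k q_k\hat{\mathbf r}_k, \qquad (6.146)$$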


where the dipole electric moment $\mathbf d$ depends only on the positions $\mathbf r_k$ of the charged particles (numbered with index $k$) of the atomic system, while the electric field $\boldsymbol{\mathcal E}$ is a function of only the electromagnetic field's degrees of freedom, to be discussed in Chapter 9 below.


Returning to the general situation shown in Fig. 12, if the component system a was initially in an excited state $n'_a$, the interaction (145), turned on at some moment of time, may bring it into another discrete state $n_a$ of a lower energy, for example, the ground state. In the process of this transition, the released energy, in the form of an energy quantum


$$\hbar\omega = E_{n'_a} - E_{n_a}, \qquad (6.147)$$


is picked up by the system b: 


$$E_{n_b} = E_{n'_b} + \hbar\omega = E_{n'_b} + (E_{n'_a} - E_{n_a}), \qquad (6.148)$$


so that the total energy $E = E_a + E_b$ of the system does not change. (If the states $n_a$ and $n'_b$ are the ground states of the two component systems, as they are in most applications of this analysis, and we take the ground state energy $E_g = E_{n_a} + E_{n'_b}$ of the composite system for the reference, then Eq. (148) gives merely $E_{n_b} = E_{n'_a}$.) If the final state $n_b$ of the system b is inside a state group with a quasi-continuous energy spectrum (Fig. 12), the process has the exponential character (114)$^{46}$ and may be interpreted as the effect of energy relaxation of the system a, with the released energy quantum $\hbar\omega$ absorbed by the system b. Note that since the quasi-continuous spectrum essentially requires a system of large spatial size, such a model is very convenient for the description of the environment b of the quantum system a. (In physics, the "environment" typically means all the Universe, less the system under consideration.)


If the relaxation rate $\Gamma$ is sufficiently low, it may be described by the Golden Rule (137). Since the perturbation (145) does not depend on time explicitly, and the total energy $E$ does not change, this relation, with the account of Eqs. (143) and (145), takes the form


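$$\Gamma = \frac{2\pi}{\hbar}|A_{nn'}|^2|B_{nn'}|^2\rho_n, \qquad \text{with } A_{nn'} \equiv \langle n_a|\hat A|n'_a\rangle,\quad B_{nn'} \equiv \langle n_b|\hat B|n'_b\rangle, \qquad (6.149)$$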


where $\rho_n$ is the density of the final states of the system b at the relevant energy (147). In particular, Eq. (149), with the dipole Hamiltonian (146), enables a very straightforward calculation of the natural linewidth of atomic electric-dipole transitions. However, such a calculation has to be postponed until Chapter 9, in which we will discuss the electromagnetic field quantization, i.e., the exact nature of the


45 See, e.g., EM Sec. 3.1, in particular Eq. (3.16), in which letter p is used for the electric dipole moment. 
46 Such a process is spontaneous: it does not require any external agent, and starts as soon as either the interaction (145) has been turned on, or (if it is always on) as soon as the system a is placed into the excited state $n'_a$.
states $n_b$ and $n'_b$ for this problem, and hence will be able to calculate $B_{nn'}$ and $\rho_n$. Instead, I will now proceed to a general discussion of the effects of the interaction of quantum systems with their environment, toward which the situation shown in Fig. 12 provides a clear conceptual path.
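The step from Eq. (137) to Eq. (149) rests on the factorization $\langle n_a, n_b|\hat A\hat B|n'_a, n'_b\rangle = \langle n_a|\hat A|n'_a\rangle\langle n_b|\hat B|n'_b\rangle$ for the product states (143). This sketch checks the factorization for small hypothetical operators and states, chosen only for illustration. It then evaluates Eq. (149) for a made-up density of states, with $\hbar = 1$.

```python
import math


def kron_matrix(a, b):
    """Direct (Kronecker) product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def kron_vector(u, v):
    """Direct product of two kets given as lists of amplitudes."""
    return [x * y for x in u for y in v]


def bra_op_ket(bra, op, ket):
    """<bra|op|ket> for a matrix op and kets given as amplitude lists."""
    return sum(bra[i].conjugate() * op[i][j] * ket[j]
               for i in range(len(bra)) for j in range(len(ket)))


def relaxation_rate(a_el, b_el, rho, hbar=1.0):
    """Eq. (6.149): Gamma = (2 pi / hbar) |A_nn'|^2 |B_nn'|^2 rho_n."""
    return 2 * math.pi / hbar * abs(a_el) ** 2 * abs(b_el) ** 2 * rho


# Hypothetical operators and normalized states, chosen only for illustration:
A_op = [[0, 1], [1, 0]]
B_op = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
n_a, n_a_prime = [1 + 0j, 0j], [0.6 + 0j, 0.8 + 0j]
s = 1 / math.sqrt(2)
n_b, n_b_prime = [0j, 1 + 0j, 0j], [s + 0j, 0j, s + 0j]

a_el = bra_op_ket(n_a, A_op, n_a_prime)  # = 0.8
b_el = bra_op_ket(n_b, B_op, n_b_prime)  # = sqrt(2)
full = bra_op_ket(kron_vector(n_a, n_b), kron_matrix(A_op, B_op),
                  kron_vector(n_a_prime, n_b_prime))
print(full, a_el * b_el, relaxation_rate(a_el, b_el, rho=10.0))
```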


6.8. Exercise problems 


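6.1. Use Eq. (14) to prove the following general form of the Hellmann-Feynman theorem (whose proof in the wave-mechanics domain was the task of Problem 1.5):
$$\frac{\partial E_n}{\partial\lambda} = \left\langle n\left|\frac{\partial\hat H}{\partial\lambda}\right|n\right\rangle,$$
where $\lambda$ is an arbitrary c-number parameter.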
6.2. Establish a relation between Eq. (16) and the result of the classical theory of weakly anharmonic ("nonlinear") oscillations at negligible damping.


Hint: You may like to use N. Bohr’s reasoning discussed in Problem 1.1. 


6.3. A weak, time-independent force $F$ is exerted on a 1D particle that was placed into a hard-wall potential well
$$U(x) = \begin{cases}0, & \text{for } 0 < x < a,\\ +\infty, & \text{otherwise.}\end{cases}$$
Calculate, sketch, and discuss the 1st-order perturbation of its ground-state wavefunction.


6.4. A time-independent force $\mathbf F = \mu(\mathbf n_x y + \mathbf n_y x)$, where $\mu$ is a small constant, is applied to a 3D harmonic oscillator of mass $m$ and frequency $\omega_0$, located at the origin. Calculate, in the first order of the perturbation theory, the effect of the force upon the ground state energy of the oscillator, and its lowest excited energy level. How small should the constant $\mu$ be for your results to be quantitatively correct?


6.5. A 1D particle of mass $m$ is localized at a narrow potential well that may be approximated with a delta function:
$$U(x) = -\mathcal{W}\delta(x), \qquad \text{with } \mathcal{W} > 0.$$
Calculate the change of its ground state energy by an additional weak, time-independent force $F$, in the first non-vanishing approximation of the perturbation theory. Discuss the limits of validity of this result, taking into account that at $F \neq 0$, the localized state of the particle is metastable.


6.6. Use the perturbation theory to calculate the eigenvalues of the operator $\hat L^2$ in the limit $|m| \approx l \gg 1$, by purely wave-mechanical means.


Hint: Try the following substitution: $\Theta(\theta) = f(\theta)/\sin^{1/2}\theta$.
6.7. In the lowest non-vanishing order of the perturbation theory, calculate the shift of the ground-state energy of an electrically charged spherical rotator (i.e. a particle of mass $m$, free to move over a spherical surface of radius $R$) due to a weak, uniform, time-independent electric field $\mathscr{E}$.
6.8. Use the perturbation theory to evaluate the effect of a time-independent, uniform electric field $\mathscr{E}$ on the ground-state energy $E_g$ of a hydrogen atom. In particular:


(i) calculate the 2nd-order shift of $E_g$, neglecting the extended unperturbed states with $E > 0$, and bring the result to the simplest analytical form you can,

(ii) find the lower and the upper bounds on the shift, and

(iii) discuss the simplest experimental manifestations of this quadratic Stark effect.


6.9. A particle of mass $m$, with electric charge $q$, is in its ground s-state with a given energy $E_g < 0$, being localized by a very short-range, spherically-symmetric potential well. Calculate its static electric polarizability $\alpha$.


6.10. In some atoms, the charge-screening effect of other electrons on the motion of each of them may be reasonably well approximated by the replacement of the Coulomb potential (3.190), $U = -C/r$, with the so-called Hulthén potential

$$U = -\frac{C/a}{\exp\{r/a\} - 1} \to -C \times \begin{cases} 1/r, & \text{for } r \ll a, \\ \exp\{-r/a\}/a, & \text{for } a \ll r. \end{cases}$$

Assuming that the effective screening radius $a$ is much larger than $r_0 \equiv \hbar^2/mC$, use the perturbation theory to calculate the energy spectrum of a single particle of mass $m$, moving in this potential, in the lowest order needed to lift the $l$-degeneracy of the levels.


6.11. In the lowest non-vanishing order of the perturbation theory, calculate the correction to energies of the ground state and all lowest excited states of a hydrogen-like atom/ion, due to the electron’s penetration into its nucleus, modeling it as a spinless, uniformly charged sphere of radius $R \ll r_B/Z$.


6.12. Prove that the kinetic-relativistic correction operator (48) indeed has only diagonal matrix 
elements in the basis of unperturbed Bohr atom states (3.200). 


6.13. Calculate the lowest-order relativistic correction to the ground-state energy of a 1D 
harmonic oscillator. 


6.14. Use the perturbation theory to calculate the contribution to the magnetic susceptibility $\chi_m$ of a dilute gas that is due to the orbital motion of a single electron inside each gas particle. Spell out your result for a spherically-symmetric ground state of the electron, and give an estimate of the magnitude of this orbital susceptibility.


6.15. How can one calculate the energy level degeneracy lifting by a time-independent perturbation, in the 2nd order in the perturbation, assuming that it is not lifted in the 1st order? Carry out such a calculation for a plane rotator of mass $m$ and radius $R$, carrying electric charge $q$, and placed into a weak, uniform, constant electric field $\mathscr{E}$.


6.16. The Hamiltonian of a quantum system is slowly changed in time.


(i) Develop a theory of quantum transitions in the system, and spell out its result in the 1st order in the speed of the change.

(ii) Use the 1st-order result to calculate the probability that a finite-time pulse of a slowly changing force $F(t)$ drives a 1D harmonic oscillator, initially in its ground state, into an excited state.

(iii) Compare the last result with the exact one.


6.17. Use the single-particle model to calculate the complex electric permittivity $\varepsilon(\omega)$ of a dilute gas of similar atoms, due to their induced electric polarization by a weak external ac field, for a field frequency $\omega$ very close to one of the quantum transition frequencies $\omega_{nn'}$. Based on the result, calculate and estimate the absorption cross-section of each atom.


Hint: In the single-particle model, an atom’s properties are determined by $Z$ similar, non-interacting electrons, each moving in a similar static attracting potential, generally different from the Coulomb one, because it is contributed not only by the nucleus, but also by other electrons.


6.18. Use the solution of the previous problem to generalize the expression for the London dispersion force between two atoms (whose calculation in the harmonic-oscillator model was the subject of Problems 3.16 and 5.15) to the single-particle model with an arbitrary energy spectrum.


6.19. Use the solution of the previous problem to calculate the potential energy of interaction of two hydrogen atoms, both in their ground state, separated by distance $r \gg r_B$.


6.20. In a certain quantum system, distances between the three lowest energy levels are slightly different (see the figure on the right; $|\xi| \ll \omega_{1,2}$). Assuming that the involved matrix elements of the perturbation Hamiltonian are known, and are all proportional to the external ac field’s amplitude, find the time necessary to populate the first excited level almost completely (with a given precision $\varepsilon \ll 1$), using the Rabi oscillation effect, if at $t = 0$ the system is completely in its ground state.


6.21.* Analyze the possibility of a slow transfer of a system from one of its energy levels to another one (in the figure on the right, from level 1 to level 3), using the scheme shown in that figure, in which the monochromatic external excitation amplitudes $A_+$ and $A_-$ may be slowly changed at will.


6.22. A weak external force pulse $F(t)$, of a finite time duration, is applied to a 1D harmonic oscillator that initially was in its ground state.


(i) Calculate, in the lowest non-vanishing order of the perturbation theory, the probability that the pulse drives the oscillator into its lowest excited state.

(ii) Compare the result with the exact solution of the problem.

(iii) Spell out the perturbative result for a Gaussian-shaped waveform,

$$F(t) = F_0\exp\{-t^2/\tau^2\},$$

and analyze its dependence on the scale $\tau$ of the pulse duration.


6.23. A spatially-uniform, but time-dependent external electric field $\mathscr{E}(t)$ is applied, starting from $t = 0$, to a charged plane rotator, initially in its ground state.

(i) Calculate, in the lowest non-vanishing order in the field’s strength, the probability that by time $t > 0$, the rotator is in its $n$-th excited state.

(ii) Spell out and analyze your results for a constant-magnitude field rotating, with a constant angular velocity $\omega$, within the rotator’s plane.

(iii) Do the same for a monochromatic field of frequency $\omega$, with a fixed polarization.


6.24. A spin-½ with a gyromagnetic ratio $\gamma$ is placed into a magnetic field including a time-independent component $\mathscr{B}_0$, and a perpendicular field of a constant magnitude $\mathscr{B}_r$, rotated with a constant angular velocity $\omega$. Can this magnetic resonance problem be reduced to one already discussed in Chapter 6?


6.25. Develop a general theory of quantum excitations of the higher levels of a discrete-spectrum system, initially in the ground state, by a weak time-dependent perturbation, up to the 2nd order. Spell out and discuss the result for the case of monochromatic excitation, with a nearly perfect tuning of its frequency $\omega$ to half of a certain quantum transition frequency $\omega_{n0} \equiv (E_n - E_0)/\hbar$.


6.26. A heavy, relativistic particle, with electric charge $q = Ze$, passes by a hydrogen atom, initially in its ground state, with an impact parameter $b$ within the range $r_B \ll b \ll r_B/\alpha$, where $\alpha \approx 1/137$ is the fine-structure constant. Calculate the probabilities of the atom’s transition to its lowest excited states.


6.27. A particle of mass $m$ is initially in the localized ground state, with energy $E_g < 0$, of a very small, spherically-symmetric potential well. Calculate the rate of its delocalization by an applied classical force $\mathbf{F}(t) = \mathbf{n}F_0\cos\omega t$ with a time-independent direction $\mathbf{n}$.


6.28.* Calculate the rate of ionization of a hydrogen atom, initially in its ground state, by a classical, linearly polarized electromagnetic wave with an electric field’s amplitude $\mathscr{E}_0$, and a frequency $\omega$ within the range

$$\frac{\hbar}{m_e r_B^2} \ll \omega \ll \frac{c}{r_B},$$

where $r_B$ is the Bohr radius. Recast your result in terms of the cross-section of electromagnetic wave absorption. Discuss briefly what changes of the theory would be necessary if either of the above conditions had been violated.


6.29.* Use the quantum-mechanical Golden Rule to derive the general expression for the electric current $I$ through a weak tunnel junction between two conductors, biased with dc voltage $V$, treating the conductors as degenerate Fermi gases of electrons with negligible direct interaction. Simplify the result in the low-voltage limit.


Hint: The electric current flowing through a weak tunnel junction is so low that it does not 
substantially perturb the electron states inside each conductor. 


6.30. Generalize the result of the previous problem to the case when a weak tunnel junction is biased with voltage $V(t) = V_0 + A\cos\omega t$, with $\hbar\omega$ generally comparable with $eV_0$ and $eA$.


6.31. Use the quantum-mechanical Golden Rule to derive the Landau-Zener formula (2.257). 
Chapter 7. Open Quantum Systems


This chapter discusses the effects of a weak interaction of a quantum system with its environment. Some part of this material is on the fine line between quantum mechanics and (quantum) statistical physics. Here I will cover only those aspects of the latter field¹ that are of key importance for the major goals of this course, including the discussion of quantum measurements in Chapter 10.


7.1. Open systems, and the density matrix 


All the way until the last part of the previous chapter, we have discussed quantum systems isolated from their environment. Indeed, from the very beginning, we have assumed that we are dealing with statistical ensembles of systems as similar to each other as only allowed by the laws of quantum mechanics. Each member of such an ensemble, called pure or coherent, may be described by the same state vector $|\alpha\rangle$, in the wave mechanics case by the same wavefunction $\Psi_\alpha$. Even the discussion at the end of the last chapter, in which one component system (in Fig. 6.13, system b) may be used as a model of the environment of its counterpart (system a), was still based on the assumption of a pure initial state (6.143) of the composite system. If the interaction of the two components of such a system is described by a certain Hamiltonian (the one given by Eq. (6.145), for example), and the energy spectrum of each component system is discrete, for the state $\alpha$ of the composite system at an arbitrary instant we may write

$$|\alpha\rangle = \sum_n \alpha_n|n\rangle = \sum_n \alpha_n|n_a\rangle\otimes|n_b\rangle, \qquad (7.1)$$

with a unique correspondence between the eigenstates $n_a$ and $n_b$.


However, in many important cases, our knowledge of a quantum system’s state is even less complete.² These cases fall into two categories. The first case is when a relatively simple quantum system s of our interest (say, an electron or an atom) is in a weak³ but substantial contact with its environment e, here understood in the most general sense, say, as the whole Universe less system s (see Fig. 1). Then there is virtually no chance of making two or more experiments with exactly the same composite system, because that would imply a repeated preparation of the whole environment (including the experimenter :-) in a certain quantum state, a rather challenging task, to put it mildly. Then it makes much more sense to consider a statistical ensemble of another kind, a mixed ensemble, with random states of the environment, though possibly with its macroscopic parameters (e.g., temperature, pressure, etc.) known with high precision. Such ensembles will be the focus of the analysis in this chapter.


Much of this analysis will also pertain to another category of cases: when the system of our interest is isolated from its environment, at present, with acceptable precision, but our knowledge of its state is still incomplete for some other reason. Most typically, the system could be in contact with its environment at earlier times, and its reduction to a pure state is impracticable. So, this second category of cases may be considered as a particular case of the first one, and may be described by the results of its analysis, with certain simplifications, which will be spelled out in appropriate places of my narrative.


1 A broader discussion of statistical mechanics and physical kinetics, including those of quantum systems, may be found in the SM part of this series.

2 Indeed, no system, possibly apart from our Universe as a whole (who knows? See below), is ever exactly coherent, though in many cases, such as the ones discussed in the previous chapters, deviations from the coherence may be ignored with acceptable accuracy.

3 If the interaction between a system and its environment is very strong, their very partition is impossible.


[Figure: the system of interest (s), in weak interaction with its environment (e); together they form the Universe.]

Fig. 7.1. A quantum system and its environment (VERY schematically :-).


In classical physics, the analysis of mixed statistical ensembles is based on the notion of the probability W (or the probability density w) of each detailed (“microscopic”) state of the system of interest.⁴ Let us see how such an ensemble may be described in quantum mechanics. In the case when the coupling between the system of our interest and its environment is so weak that they may be clearly separated, we can still use state vectors of their states, defined in completely different Hilbert spaces. Then the most general quantum state of the whole Universe, still assumed to be pure,⁵ may be described as the following linear superposition:


$$|\alpha\rangle = \sum_{j,k}\alpha_{jk}|s_j\rangle\otimes|e_k\rangle. \qquad (7.2)$$


The “only” difference of such a state from the superposition described by Eq. (1) is that there is no one-to-one correspondence between the states of our system and its environment. In other words, a certain quantum state $s_j$ of the system of interest may coexist with different states $e_k$ of its environment. This is exactly the quantum-mechanical description of a mixed state of the system s.


Of course, the huge size of the Hilbert space of the environment, i.e. of the number of the $|e_k\rangle$ factors in the superposition (2), strips us of any practical opportunity to make direct calculations using that sum. For example, according to the basic Eq. (4.125), to find the expectation value of an arbitrary observable A in the state (2), we would need to calculate the long bracket

$$\langle A\rangle = \langle\alpha|\hat A|\alpha\rangle = \sum_{j,j',k,k'}\alpha^*_{jk}\alpha_{j'k'}\left(\langle e_k|\otimes\langle s_j|\right)\hat A\left(|s_{j'}\rangle\otimes|e_{k'}\rangle\right). \qquad (7.3)$$


Even if we assume that each of the sets $\{s\}$ and $\{e\}$ is full and orthonormal, Eq. (3) still includes a double sum over the enormous basis state set of the environment!


4 See, e.g., SM Sec. 2.1.

5 Whether this assumption is true is an interesting issue, still being debated (more by philosophers than by physicists), but it is widely believed that its solution is not critical for the validity of the results of this approach.


However, let us consider a limited but very important subset of operators, those of intrinsic observables, which depend only on the degrees of freedom of the system of our interest (s). These operators do not act upon the environment’s degrees of freedom, and hence in Eq. (3), we may move the environment’s bra-vectors $\langle e_k|$ all the way over to the ket-vectors $|e_{k'}\rangle$. Assuming, again, that the set of environmental eigenstates is full and orthonormal, Eq. (3) is now reduced to

$$\langle A\rangle = \sum_{j,j',k}\alpha^*_{jk}\alpha_{j'k}\langle s_j|\hat A|s_{j'}\rangle \equiv \sum_{j,j'}A_{jj'}\sum_k\alpha_{j'k}\alpha^*_{jk}. \qquad (7.4)$$

This is already a big relief because we have “only” a single sum over k, but the main trick is still ahead. After the summation over k, the second sum in the last form of Eq. (4) is some function w of the indices j and j', so that, according to Eq. (4.96), this relation may be represented as

$$\langle A\rangle = \sum_{j,j'}A_{jj'}w_{j'j} \equiv \mathrm{Tr}(\mathrm{Aw}), \qquad (7.5)$$

where the matrix w, with the elements

$$w_{j'j} \equiv \sum_k\alpha_{j'k}\alpha^*_{jk}, \qquad (7.6)$$

is called the density matrix of the system.⁶ Most importantly, Eq. (5) shows that the knowledge of this matrix allows the calculation of the expectation value of any intrinsic observable A (and, according to the general Eqs. (1.33)-(1.34), its r.m.s. fluctuation as well, if needed), even for the very general state (2). This is why we should take a good look at the density matrix.
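The reduction from the four-index sum (7.3) to the trace formula (7.5) can be checked numerically on a toy model. The sketch below is only an illustration under stated assumptions: a hypothetical two-level system s coupled to a five-state "environment", randomly chosen coefficients $\alpha_{jk}$ of the pure state (7.2), and an arbitrarily chosen Hermitian observable A acting on system s alone. It builds w from Eq. (7.6) and compares Tr(Aw) with the brute-force long bracket.

```python
import random

# Hypothetical toy model: system s with n_s = 2 states, environment e with n_e = 5 states.
random.seed(1)
n_s, n_e = 2, 5

# Random coefficients alpha[j][k] of the pure state (7.2), normalized to 1.
alpha = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_e)]
         for _ in range(n_s)]
norm = sum(abs(a) ** 2 for row in alpha for a in row) ** 0.5
alpha = [[a / norm for a in row] for row in alpha]

# An arbitrary Hermitian observable acting on system s only (illustrative numbers).
A = [[1.0, 0.3 - 0.2j],
     [0.3 + 0.2j, -0.5]]


def density_matrix(alpha):
    """w[jp][j] = sum_k alpha[jp][k] * conj(alpha[j][k]), as in Eq. (7.6)."""
    n_s, n_e = len(alpha), len(alpha[0])
    return [[sum(alpha[jp][k] * alpha[j][k].conjugate() for k in range(n_e))
             for j in range(n_s)]
            for jp in range(n_s)]


def expectation_full(A, alpha):
    """Brute-force long bracket (7.3); the environment factor is <e_k|e_k'> = delta_kk'."""
    n_s, n_e = len(alpha), len(alpha[0])
    total = 0j
    for j in range(n_s):
        for k in range(n_e):
            for jp in range(n_s):
                for kp in range(n_e):
                    env = 1.0 if k == kp else 0.0
                    total += alpha[j][k].conjugate() * A[j][jp] * env * alpha[jp][kp]
    return total


def trace_Aw(A, w):
    """Tr(Aw) = sum_{j,j'} A[j][j'] * w[j'][j], as in Eq. (7.5)."""
    n = len(A)
    return sum(A[j][jp] * w[jp][j] for j in range(n) for jp in range(n))


w = density_matrix(alpha)
print(expectation_full(A, alpha), trace_Aw(A, w))  # agree up to rounding error
```

Once the environment has been summed out, only the small n_s × n_s matrix w needs to be kept, which is the practical point of Eq. (7.5).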


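First of all, we know from the general discussion in Chapter 4, fully applicable to the pure state (2), that the expansion coefficients in superpositions of this type may always be expressed as short brackets of the type (4.40); in our current case, we may write

$$\alpha_{jk} = \left(\langle e_k|\otimes\langle s_j|\right)|\alpha\rangle. \qquad (7.7)$$

Plugging this expression into Eq. (6), we get

$$w_{j'j} = \sum_k\alpha_{j'k}\alpha^*_{jk} = \langle s_{j'}|\left(\sum_k\langle e_k|\alpha\rangle\langle\alpha|e_k\rangle\right)|s_j\rangle \equiv \langle s_{j'}|\hat w|s_j\rangle. \qquad (7.8)$$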


We see that from the point of view of our system (i.e. in its Hilbert space, whose basis states may be numbered by the index j only), the density matrix is indeed just the matrix of some construct,⁷

$$\hat w \equiv \sum_k\langle e_k|\alpha\rangle\langle\alpha|e_k\rangle, \qquad (7.9)$$


which is called the density (or “statistical”) operator. As follows from the definition (9), in contrast to the density matrix, this operator does not depend on the choice of a particular basis $s_j$, just as all linear operators considered earlier in this course. However, in contrast to them, the density operator does depend on the composite system’s state $\alpha$, including the state of the system s as well. Still, in the s-space it is mathematically just an operator whose matrix elements obey all relations of the bra-ket formalism.


In particular, due to its definition (6), the density operator is Hermitian:

$$w^*_{jj'} = \sum_k\alpha^*_{jk}\alpha_{j'k} = \sum_k\alpha_{j'k}\alpha^*_{jk} = w_{j'j}, \qquad (7.10)$$

so that, according to the general analysis of Sec. 4.3, in the Hilbert space of the system s there should be a certain basis $\{w\}$ in which the matrix of this operator is diagonal:

$$w_{jj'}\big|_{\text{in } w} = w_j\delta_{jj'}. \qquad (7.11)$$

6 This notion was suggested in 1927 by John von Neumann.

7 Note that the “short brackets” in this expression are not c-numbers, because the state $\alpha$ is defined in a larger Hilbert space (of the environment plus the system of interest) than the basis states $e_k$ (of the environment only).


Since any operator, in any basis, may be represented in the form (4.59), in the basis $\{w\}$ we may write

$$\hat w = \sum_j|w_j\rangle w_j\langle w_j|. \qquad (7.12)$$

This expression resembles, but is not equivalent to, Eq. (4.44) for the identity operator, which has been used so many times in this course, and in the basis $w_j$ has the form

$$\hat I = \sum_j|w_j\rangle\langle w_j|. \qquad (7.13)$$
In order to comprehend the meaning of the coefficients $w_j$ participating in Eq. (12), let us use Eq. (5) to calculate the expectation value of any observable A whose eigenstates coincide with those of the special basis $\{w\}$, and whose matrix is, therefore, diagonal in this basis:

$$\langle A\rangle = \mathrm{Tr}(\mathrm{Aw}) = \sum_{j,j'}A_j\delta_{jj'}w_j\delta_{jj'} = \sum_j A_jw_j, \qquad (7.14)$$

where $A_j$ is just the expectation value of the observable A in the state $w_j$. Hence, to comply with the general Eq. (1.37), the real c-number $w_j$ must have the physical sense of the probability $W_j$ of finding the system in the state j. As a result, we may rewrite Eq. (12) in the form

$$\hat w = \sum_j|w_j\rangle W_j\langle w_j|. \qquad (7.15)$$
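For a two-level system, the probabilities $W_j$ of Eq. (7.15) can be found in closed form, because they are just the eigenvalues of a 2×2 Hermitian matrix. The minimal sketch below assumes an arbitrarily chosen example matrix w, written in some basis {s}, and checks that its eigenvalues are non-negative, add up to 1, and reproduce the basis-independent trace Tr(w²).

```python
import math


def eigenvalues_2x2(w):
    """Eigenvalues (W1 >= W2) of a 2x2 Hermitian matrix [[a, b], [conj(b), d]]."""
    a, b, d = w[0][0].real, w[0][1], w[1][1].real
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return mean + radius, mean - radius


def trace_of_square(w):
    """Tr(w^2) = sum_{j,j'} w[j][j'] * w[j'][j]; this quantity does not depend on the basis."""
    return sum(w[j][jp] * w[jp][j] for j in range(2) for jp in range(2)).real


# Example density matrix in some basis {s} (arbitrary illustrative numbers).
w = [[0.7, 0.2 - 0.1j],
     [0.2 + 0.1j, 0.3]]

W1, W2 = eigenvalues_2x2(w)
print(W1, W2)                                          # approximately 0.8 and 0.2
print(W1 + W2, W1 ** 2 + W2 ** 2, trace_of_square(w))  # 1, then two equal numbers
```

A pure state corresponds to W1 = 1, W2 = 0, i.e. to Tr(w²) = 1, while any mixed state has Tr(w²) < 1.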
In the ultimate case when only one of the probabilities (say, $W_{j''}$) is different from zero,

$$W_j = \delta_{jj''}, \qquad (7.16)$$

the system is in a coherent (pure) state $w_{j''}$. Indeed, it is fully described by one ket-vector $|w_{j''}\rangle$, and we can use the general rule (4.86) to represent it in another (arbitrary) basis $\{s\}$ as a coherent superposition

$$|w_{j''}\rangle = \sum_j U^\dagger_{jj''}|s_j\rangle = \sum_j U^*_{j''j}|s_j\rangle, \qquad (7.17)$$

where U is the unitary matrix of the transform from the basis $\{w\}$ to the basis $\{s\}$. According to Eqs. (11) and (16), in such a pure state the density matrix is diagonal in the $\{w\}$ basis,

$$w_{jj'}\big|_{\text{in } w} = \delta_{jj''}\delta_{j'j''}, \qquad (7.18\text{a})$$

but not in an arbitrary basis. Indeed, using the general rule (4.92), we get

$$w_{jj'}\big|_{\text{in } s} = \sum_{l,l'}U^\dagger_{jl}\,w_{ll'}\big|_{\text{in } w}\,U_{l'j'} = U^\dagger_{jj''}U_{j''j'} = U^*_{j''j}U_{j''j'}. \qquad (7.18\text{b})$$
To make this result more transparent, let us denote the matrix elements $U_{j''j} \equiv \langle w_{j''}|s_j\rangle$ (which, for a fixed $j''$, depend on just one index j) by $\alpha_j$; then

$$w_{jj'}\big|_{\text{in } s} = \alpha^*_j\alpha_{j'}, \qquad (7.19)$$
so that the $N^2$ elements of the whole $N\times N$ matrix are determined by just one string of N c-numbers $\alpha_j$. For example, for a two-level system (N = 2),

$$\mathrm{w}\big|_{\text{in } s} = \begin{pmatrix}\alpha^*_1\alpha_1 & \alpha^*_1\alpha_2\\ \alpha^*_2\alpha_1 & \alpha^*_2\alpha_2\end{pmatrix}. \qquad (7.20)$$

We see that the off-diagonal terms are, colloquially, “as large as the diagonal ones”, in the following sense:

$$w_{12}w_{21} = w_{11}w_{22}. \qquad (7.21)$$

Since the diagonal terms have the sense of the probabilities $W_{1,2}$ to find the system in the corresponding state, we may represent Eq. (20) in the form

$$\mathrm{w}\big|_{\text{pure state}} = \begin{pmatrix}W_1 & (W_1W_2)^{1/2}e^{i\varphi}\\ (W_1W_2)^{1/2}e^{-i\varphi} & W_2\end{pmatrix}. \qquad (7.22)$$

The physical sense of the (real) constant $\varphi$ is the phase shift between the coefficients in the linear superposition (17), which represents the pure state $w_{j''}$ in the basis $\{s_{1,2}\}$.
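Eqs. (7.20)-(7.22) can be illustrated in a few lines of code. The sketch below assumes arbitrary example values of the two superposition coefficients $c_1$ and $c_2$ (the coefficients of $|s_1\rangle$ and $|s_2\rangle$ in Eq. (7.17)), builds the matrix $w_{jj'} = c_jc^*_{j'}$, and checks the relation (7.21) together with the identification of $\varphi$ in Eq. (7.22) as the phase difference between the two coefficients.

```python
import cmath
import math


def pure_state_w(c1, c2):
    """w[j][j'] = c_j * conj(c_j') for the pure state c1|s1> + c2|s2>."""
    c = (c1, c2)
    return [[c[j] * c[jp].conjugate() for jp in range(2)] for j in range(2)]


# Example: W1 = 0.3, W2 = 0.7, and a phase difference theta1 - theta2 = 0.1 - (-0.3) = 0.4.
c1 = math.sqrt(0.3) * cmath.exp(0.1j)
c2 = math.sqrt(0.7) * cmath.exp(-0.3j)
w = pure_state_w(c1, c2)

print(abs(w[0][1] * w[1][0] - w[0][0] * w[1][1]))  # Eq. (7.21): ~0
print(cmath.phase(w[0][1]))                         # phi of Eq. (7.22): ~0.4
print(abs(w[0][1]), math.sqrt(0.3 * 0.7))           # modulus (W1 W2)^(1/2)
```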


Now let us consider a different statistical ensemble of two-level systems, which includes member states identical in all aspects (including similar probabilities $W_{1,2}$ in the same basis $s_{1,2}$), except that the phase shifts $\varphi$ are random, with the phase probability uniformly distributed over the trigonometric circle. Then the ensemble averaging is equivalent to the averaging over $\varphi$ from 0 to $2\pi$,⁸ which kills the off-diagonal terms of the density matrix (22), so that the matrix becomes diagonal:

$$\langle\mathrm{w}\rangle\big|_{\text{classical mixture}} = \begin{pmatrix}W_1 & 0\\ 0 & W_2\end{pmatrix}. \qquad (7.23)$$

The mixed statistical ensemble with the density matrix diagonal in the stationary state basis is called the classical mixture, and represents the limit opposite to the pure (coherent) state.
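The dephasing argument behind Eq. (7.23) is easy to reproduce numerically. In this minimal sketch, the ensemble average over random phases is replaced (as an assumption, for reproducibility) by an average over a uniform grid of $\varphi$ values on [0, 2π); the off-diagonal elements cancel, while the diagonal probabilities survive. The basis-independent quantity Tr(w²) drops from 1 for the pure state (7.22) to $W_1^2 + W_2^2$ for the classical mixture.

```python
import cmath
import math


def w_pure(W1, phi):
    """Two-level pure-state density matrix in the form (7.22)."""
    W2 = 1.0 - W1
    off = math.sqrt(W1 * W2)
    return [[W1, off * cmath.exp(1j * phi)],
            [off * cmath.exp(-1j * phi), W2]]


def phase_averaged(W1, n_phases=1000):
    """Average of (7.22) over n_phases values of phi spread uniformly over [0, 2*pi)."""
    acc = [[0j, 0j], [0j, 0j]]
    for m in range(n_phases):
        w = w_pure(W1, 2 * math.pi * m / n_phases)
        for j in range(2):
            for jp in range(2):
                acc[j][jp] += w[j][jp] / n_phases
    return acc


def trace_of_square(w):
    """Tr(w^2), which equals 1 only for a pure state."""
    return sum(w[j][jp] * w[jp][j] for j in range(2) for jp in range(2)).real


avg = phase_averaged(0.3)
print(avg)  # off-diagonal elements are ~0, as in Eq. (7.23)
print(trace_of_square(w_pure(0.3, 1.0)), trace_of_square(avg))  # ~1, then ~0.58 = 0.3**2 + 0.7**2
```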


After this example, the reader should not be much shocked by the main claim⁹ of statistical mechanics that any large ensemble of similar systems in thermodynamic (or “thermal”) equilibrium is exactly such a classical mixture. Moreover, for systems in thermal equilibrium with a much larger environment of a fixed temperature T (such an environment is usually called a heat bath), statistical physics gives a very simple expression, called the Gibbs distribution, for the probabilities $W_n$:¹⁰

$$W_n = \frac{1}{Z}\exp\left\{-\frac{E_n}{k_BT}\right\}, \qquad Z \equiv \sum_n\exp\left\{-\frac{E_n}{k_BT}\right\}, \qquad (7.24)$$

where $E_n$ is the eigenenergy of the corresponding stationary state, and the normalization coefficient Z is called the statistical sum.


8 For a system with a time-independent Hamiltonian, such averaging is especially plausible in the basis of the stationary states n of the system, in which the phase $\varphi$ is just the difference of integration constants in Eq. (4.158), and its randomness may be naturally produced by minor fluctuations of the energy difference $E_1 - E_2$. In Sec. 3 below, we will study the dynamics of this dephasing process.

9 This fact follows from the basic postulate of statistical physics, called the microcanonical distribution; see, e.g., SM Sec. 2.2.

10 See, e.g., SM Sec. 2.4. The Boltzmann constant $k_B$ is only needed if the temperature is measured in non-energy units, say in kelvins.


A detailed analysis of classical and quantum ensembles in thermodynamic equilibrium is a major focus of statistical physics courses (such as the SM part of this series) rather than this course of quantum mechanics. However, I would still like to draw the reader’s attention to the key fact that, in contrast with the similar-looking Boltzmann distribution for single particles,¹¹ the Gibbs distribution is general, not limited to classical statistics. In particular, for a quantum gas of indistinguishable particles, it is fully compatible with the quantum statistics (such as the Bose-Einstein or Fermi-Dirac distributions) of the component particles. For example, if we use Eq. (24) to calculate the average energy of a 1D harmonic oscillator of frequency $\omega_0$ in thermal equilibrium, we easily get¹²

$$W_n = \exp\left\{-\frac{n\hbar\omega_0}{k_BT}\right\}\left(1 - \exp\left\{-\frac{\hbar\omega_0}{k_BT}\right\}\right), \qquad Z = \exp\left\{-\frac{\hbar\omega_0}{2k_BT}\right\}\left(1 - \exp\left\{-\frac{\hbar\omega_0}{k_BT}\right\}\right)^{-1}, \qquad (7.25)$$

$$\langle E\rangle = \sum_{n=0}^{\infty}W_nE_n = \frac{\hbar\omega_0}{2}\coth\frac{\hbar\omega_0}{2k_BT} \equiv \frac{\hbar\omega_0}{2} + \frac{\hbar\omega_0}{\exp\{\hbar\omega_0/k_BT\} - 1}. \qquad (7.26\text{a})$$
The final form of the last result,

$$\langle E\rangle = \frac{\hbar\omega_0}{2} + \hbar\omega_0\langle n\rangle, \quad \text{with} \quad \langle n\rangle = \frac{1}{\exp\{\hbar\omega_0/k_BT\} - 1} \to \begin{cases}0, & \text{for } k_BT \ll \hbar\omega_0,\\ k_BT/\hbar\omega_0, & \text{for } \hbar\omega_0 \ll k_BT,\end{cases} \qquad (7.26\text{b})$$

may be interpreted as an addition, to the ground-state energy $\hbar\omega_0/2$, of the average number $\langle n\rangle$ of thermally-induced excitations, with the energy $\hbar\omega_0$ each. In the harmonic oscillator, whose energy levels are equidistant, such a language is completely appropriate, because the transfer of the system from any level to the one just above it adds the same amount of energy, $\hbar\omega_0$. Note that the above expression for $\langle n\rangle$ is actually the Bose-Einstein distribution (for the particular case of zero chemical potential); we see that it does not contradict the Gibbs distribution (24) of the total energy of the system, but rather immediately follows from it.

Because of the fundamental importance of Eq. (26) for virtually all fields of physics, let me draw the reader’s attention to its main properties. At low temperatures, $k_BT \ll \hbar\omega_0$, there are virtually no excitations, $\langle n\rangle \to 0$, and the average energy of the oscillator is dominated by that of its ground state. In the opposite limit of high temperatures, $\langle n\rangle \to k_BT/\hbar\omega_0 \gg 1$, and $\langle E\rangle$ approaches the classical value $k_BT$.
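Eqs. (7.25)-(7.26) can be checked by brute force: sum the Gibbs weights (7.24) over the equidistant levels $E_n = \hbar\omega_0(n + 1/2)$ and compare the result with the closed form. The sketch below uses assumed dimensionless units $\hbar\omega_0 = 1$, $t \equiv k_BT/\hbar\omega_0$, and truncates the infinite sum at a finite n_max (an assumption that is harmless for the temperatures shown). It also shows both limits discussed above.

```python
import math


def mean_energy_gibbs(t, n_max=5000):
    """<E> from the Gibbs distribution (7.24) with E_n = n + 1/2.

    Units: hbar*omega_0 = 1 and t = k_B T / (hbar*omega_0). The sum over n is
    truncated at n_max, which is ample for t <= 100.
    """
    weights = [math.exp(-(n + 0.5) / t) for n in range(n_max)]
    Z = sum(weights)
    return sum(wn * (n + 0.5) for n, wn in enumerate(weights)) / Z


def mean_energy_closed(t):
    """Eq. (7.26a): <E> = (1/2) coth(1/(2t)) in the same units."""
    return 0.5 / math.tanh(0.5 / t)


for t in (0.05, 0.5, 2.0, 50.0):
    E = mean_energy_gibbs(t)
    print(t, E, mean_energy_closed(t), E - 0.5)  # the last column is <n>
```

At t = 0.05 the energy stays at the ground-state value 1/2, while at t = 50 it is close to the classical value t, in line with Eq. (7.26b).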


11 See, e.g., SM Sec. 2.8.

12 See, e.g., SM Sec. 2.5, but mind a different energy reference level, $E_0 = \hbar\omega_0/2$, used for example in SM Eqs. (2.68)-(2.69), affecting the expression for Z. Actually, the calculation, using Eqs. (24) and (5.86), is so straightforward that it is highly recommended to the reader as a simple exercise.


7.2. Coordinate representation, and the Wigner function


For many applications of the density operator, its coordinate representation is convenient. (I will only discuss it for the 1D case; the generalization to multi-dimensional cases is straightforward.) Following Eq. (4.47), it is natural to define the following function of two arguments (traditionally, also called the density matrix):

$$w(x,x') \equiv \langle x|\hat w|x'\rangle. \qquad (7.27)$$


Inserting, into the right-hand side of this definition, two closure conditions (4.44) for an arbitrary (but full and orthonormal) basis $\{s\}$, and then using Eq. (4.233),¹³ we get

$$w(x,x') = \sum_{j,j'}\langle x|s_j\rangle\langle s_j|\hat w|s_{j'}\rangle\langle s_{j'}|x'\rangle = \sum_{j,j'}\psi_j(x)\,w_{jj'}\big|_{\text{in } s}\,\psi^*_{j'}(x'). \qquad (7.28)$$

In the special basis $\{w\}$, in which the density matrix is diagonal, this expression is reduced to

$$w(x,x') = \sum_j\psi_j(x)W_j\psi^*_j(x'). \qquad (7.29)$$


Let us discuss the properties of this function. At coinciding arguments, $x' = x$, this is just the probability density:¹⁴

$$w(x,x) = \sum_j\psi_j(x)W_j\psi^*_j(x) = \sum_jW_j|\psi_j(x)|^2 = w(x). \qquad (7.30)$$


However, the density matrix gives more information about the system than just the probability density. As the simplest example, let us consider a pure quantum state, with $W_j = \delta_{jj'}$, so that $\psi(x) = \psi_{j'}(x)$, and

$$w(x,x') = \psi_{j'}(x)\psi^*_{j'}(x') \equiv \psi(x)\psi^*(x'). \qquad (7.31)$$


We see that the density matrix carries the information not only about the modulus but also the phase of 
the wavefunction. (Of course one may argue rather convincingly that in this ultimate limit the density- 
matrix description is redundant because all this information is contained in the wavefunction itself.) 


How may the density matrix be interpreted? In the simple case (31), we can write

$$|w(x,x')|^2 \equiv w(x,x')w^*(x,x') = \psi(x)\psi^*(x)\psi(x')\psi^*(x') = w(x)w(x'), \qquad (7.32)$$

so that the modulus squared of the density matrix is just the joint probability density to find the system at the point x and the point x'. For example, for a simple wave packet with a spatial extent $\delta x$, $w(x,x')$ has an appreciable magnitude only if both points are not farther than $\sim\delta x$ from the packet center, and hence from each other. The interpretation becomes more complex if we deal with an incoherent mixture of several wavefunctions, for example, the classical mixture describing thermodynamic equilibrium. In this case, we can use Eq. (24) to rewrite Eq. (29) as follows:

$$w(x,x') = \sum_n\psi_n(x)W_n\psi^*_n(x') = \frac{1}{Z}\sum_n\psi_n(x)\exp\left\{-\frac{E_n}{k_BT}\right\}\psi^*_n(x'). \qquad (7.33)$$
As the simplest example, let us see what the density matrix of a free (1D) particle in thermal equilibrium is. As we know very well by now, in this case the set of energies $E_p = p^2/2m$ of stationary states (monochromatic waves) forms a continuum, so that we need to replace the sum (33) with an integral, using for example the “delta-normalized” traveling-wave eigenfunctions (4.264):

$$w(x,x') = \frac{1}{2\pi\hbar Z}\int_{-\infty}^{+\infty}\exp\left\{\frac{ipx}{\hbar}\right\}\exp\left\{-\frac{p^2}{2mk_BT}\right\}\exp\left\{-\frac{ipx'}{\hbar}\right\}dp. \qquad (7.34)$$


13 For now, I will focus on a fixed time instant (say, t = 0), and hence write $\psi(x)$ instead of $\Psi(x,t)$.

14 This fact is the origin of the density matrix’s name.


This is a usual Gaussian integral, and may be worked out, as we have done repeatedly in Chapter 2 and beyond, by completing the exponent to the full square of the momentum p plus a constant. The statistical sum Z may also be readily calculated,¹⁵

$$Z = (2\pi mk_BT)^{1/2}. \qquad (7.35)$$


However, for what follows it is more useful to write the result for the product wZ (the so-called un-normalized density matrix):

$$w(x,x')Z = \left(\frac{mk_BT}{2\pi\hbar^2}\right)^{1/2}\exp\left\{-\frac{mk_BT(x-x')^2}{2\hbar^2}\right\}. \qquad (7.36)$$


This is a very interesting result: the density matrix depends only on the difference of its arguments, dropping to zero fast as the distance between the points x and x' exceeds the following characteristic scale (called the correlation length):

$$x_c \equiv \langle(x-x')^2\rangle^{1/2} = \frac{\hbar}{(mk_BT)^{1/2}}. \qquad (7.37)$$


This length may be interpreted in the following way. It is straightforward to use Eq. (24) to verify that the average energy $\langle E\rangle = \langle p^2/2m\rangle$ of a free particle in thermal equilibrium, i.e. in the classical mixture (33), equals $k_BT/2$. Hence the average magnitude of the particle’s momentum may be estimated as

$$p_c \equiv \langle p^2\rangle^{1/2} = (2m\langle E\rangle)^{1/2} = (mk_BT)^{1/2}, \qquad (7.38)$$

so that $x_c$ is of the order of the minimal length allowed by the Heisenberg-like “uncertainty relation”:

$$x_c = \frac{\hbar}{p_c}. \qquad (7.39)$$
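The Gaussian integral leading from Eq. (7.34) to Eq. (7.36) can be double-checked by direct numerical quadrature. The sketch below assumes units $\hbar = m = k_B = 1$ (so that Eq. (7.37) gives $x_c = T^{-1/2}$) and a plain trapezoidal rule over a finite momentum window; both are illustrative choices rather than part of the original derivation. It also confirms that the Gaussian (7.36) falls to $e^{-1/2}$ of its peak value at $|x - x'| = x_c$.

```python
import math

# Assumed units: hbar = m = k_B = 1, so that x_c = T**(-1/2) by Eq. (7.37).


def wZ_numeric(u, T, n=4001):
    """(1/2pi) * integral over p of exp(i p u) exp(-p^2 / 2T), i.e. Eq. (7.34) times Z, with u = x - x'.

    Trapezoidal rule on [-L, L]; the Gaussian weight is negligible beyond |p| = L.
    """
    L = 12.0 * math.sqrt(T)
    h = 2 * L / (n - 1)
    total = 0.0
    for i in range(n):
        p = -L + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        # The sine part of exp(i p u) is odd in p and integrates to zero, so only cos is kept.
        total += weight * math.cos(p * u) * math.exp(-p * p / (2 * T))
    return total * h / (2 * math.pi)


def wZ_closed(u, T):
    """Eq. (7.36) in these units: (T / 2pi)^(1/2) * exp(-T u^2 / 2)."""
    return math.sqrt(T / (2 * math.pi)) * math.exp(-T * u * u / 2)


T = 2.0
x_c = 1 / math.sqrt(T)  # correlation length (7.37)
for u in (0.0, 0.5 * x_c, x_c, 3 * x_c):
    print(u, wZ_numeric(u, T), wZ_closed(u, T))
print(wZ_closed(x_c, T) / wZ_closed(0.0, T), math.exp(-0.5))  # equal: the Gaussian width is x_c
```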


Note that with the growth of temperature, the correlation length (37) goes to zero, and the density matrix (36) tends to a delta function:

$$w(x,x')Z\big|_{T\to\infty} \to \delta(x-x'). \qquad (7.40)$$

Since in this limit the average kinetic energy of the particle is not smaller than its potential energy in any fixed potential profile, Eq. (40) is a general property of the density matrix (33).


Let us discuss the following curious feature of Eq. (36): if we replace $k_BT$ with $\hbar/i(t-t_0)$, and $x'$ with $x_0$, the un-normalized density matrix wZ for a free particle turns into the particle’s propagator; cf. Eq. (2.49). This is not just an occasional coincidence. Indeed, in Chapter 2 we saw that the propagator of a system with an arbitrary stationary Hamiltonian may be expressed via the stationary eigenfunctions as

$$G(x,t;x_0,t_0) = \sum_n\psi_n(x)\exp\left\{-i\frac{E_n}{\hbar}(t-t_0)\right\}\psi^*_n(x_0). \qquad (7.41)$$

Comparing this expression with Eq. (33), we see that the replacements

$$\frac{i(t-t_0)}{\hbar} \to \frac{1}{k_BT}, \qquad x_0 \to x' \qquad (7.42)$$

turn the pure-state propagator G into the un-normalized density matrix wZ of the same system in thermodynamic equilibrium. This important fact, rooted in the formal similarity of the Gibbs distribution (24) with the Schrödinger equation’s solution (1.69), enables a theoretical technique of the so-called thermodynamic Green’s functions, which is especially productive in condensed matter physics.¹⁶


15 Due to the delta-normalization of the eigenfunctions, the density matrix (34) for the free particle (and any system with a continuous eigenvalue spectrum) is normalized as

$$\int_{-\infty}^{+\infty}w(x,x')Z\,dx' = \int_{-\infty}^{+\infty}w(x,x')Z\,dx = 1.$$


For our current purposes, we can employ Eq. (42) to re-use some of the wave mechanics results, 
in particular, the following formula for the harmonic oscillator’s propagator 


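$$G(x,t;x_0,t_0) = \left(\frac{m\omega_0}{2\pi i\hbar\sin[\omega_0(t - t_0)]}\right)^{1/2}\exp\left\{-\frac{m\omega_0\left[(x^2 + x_0^2)\cos[\omega_0(t - t_0)] - 2xx_0\right]}{2i\hbar\sin[\omega_0(t - t_0)]}\right\}, \tag{7.43}$$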


which may be readily proved to satisfy the Schrödinger equation for the Hamiltonian (5.62), with the appropriate initial condition: $G(x,t_0;x_0,t_0) = \delta(x - x_0)$. Making the substitution (42), we immediately get


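$$w(x,x')Z = \left(\frac{m\omega_0}{2\pi\hbar\sinh(\hbar\omega_0/k_BT)}\right)^{1/2}\exp\left\{-\frac{m\omega_0\left[(x^2 + x'^2)\cosh(\hbar\omega_0/k_BT) - 2xx'\right]}{2\hbar\sinh(\hbar\omega_0/k_BT)}\right\}. \tag{7.44}$$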
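The closed form (44) can be cross-checked against the eigenfunction sum (33) that it is supposed to equal. The sketch below is only an illustration. It assumes units with $\hbar = m = \omega_0 = k_B = 1$ (so that $E_n = n + 1/2$), builds the normalized oscillator eigenfunctions with the standard three-term recurrence, and truncates the sum at an arbitrarily chosen `n_max`.

```python
import math


def oscillator_psis(x, n_max):
    """Normalized eigenfunctions psi_0(x)..psi_{n_max}(x), units hbar = m = omega0 = 1."""
    psis = [math.pi ** -0.25 * math.exp(-x * x / 2)]
    if n_max >= 1:
        psis.append(math.sqrt(2.0) * x * psis[0])
    for n in range(1, n_max):
        # psi_{n+1} = sqrt(2/(n+1)) x psi_n - sqrt(n/(n+1)) psi_{n-1}
        psis.append(math.sqrt(2.0 / (n + 1)) * x * psis[n]
                    - math.sqrt(n / (n + 1)) * psis[n - 1])
    return psis


def wz_eigen_sum(x, xp, T, n_max=120):
    """Eq. (7.33) for real psi_n: sum_n psi_n(x) exp(-E_n/T) psi_n(x'), E_n = n + 1/2."""
    pairs = zip(oscillator_psis(x, n_max), oscillator_psis(xp, n_max))
    return sum(a * b * math.exp(-(n + 0.5) / T) for n, (a, b) in enumerate(pairs))


def wz_closed_form(x, xp, T):
    """Eq. (7.44) with hbar*omega0/(k_B T) = 1/T."""
    a = 1.0 / T
    prefactor = math.sqrt(1.0 / (2 * math.pi * math.sinh(a)))
    exponent = -((x * x + xp * xp) * math.cosh(a) - 2 * x * xp) / (2 * math.sinh(a))
    return prefactor * math.exp(exponent)


print(wz_eigen_sum(0.3, -0.7, 1.0), wz_closed_form(0.3, -0.7, 1.0))
```

The two functions should agree to rounding accuracy, which is a numerical restatement of the claim that substitution (42) turns the propagator (43) into the equilibrium density matrix.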


As a sanity check, at very low temperatures, $k_BT \ll \hbar\omega_0$, both hyperbolic functions participating in this expression are very large and nearly equal, and it yields


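$$w(x,x')Z\big|_{T\to0} \to \left[\left(\frac{m\omega_0}{\pi\hbar}\right)^{1/4}\exp\left\{-\frac{m\omega_0x^2}{2\hbar}\right\}\right]\times\exp\left\{-\frac{\hbar\omega_0}{2k_BT}\right\}\times\left[\left(\frac{m\omega_0}{\pi\hbar}\right)^{1/4}\exp\left\{-\frac{m\omega_0x'^2}{2\hbar}\right\}\right]. \tag{7.45}$$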


In each of the expressions in square brackets we can readily recognize the ground state’s wavefunction 
(2.275) of the oscillator, while the middle exponent is just the statistical sum (24) in the low-temperature 
limit when it is dominated by the ground-level contribution: 


$$Z\big|_{T\to0} \to \exp\left\{-\frac{\hbar\omega_0}{2k_BT}\right\}. \tag{7.46}$$


As a result, Z in both parts of Eq. (45) may be canceled, and the density matrix in this limit is described
by Eq. (31), with the ground state as the only state of the system. This is natural when the temperature is 
too low for the thermal excitation of any other state. 
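The low-temperature limit (45)-(46) can be checked numerically by dividing the closed form (44) by the ground-state product $\psi_0(x)\exp\{-\hbar\omega_0/2k_BT\}\psi_0(x')$. This is a sketch under the assumption of units $\hbar = m = \omega_0 = k_B = 1$; the sample points and temperatures are arbitrary.

```python
import math


def wz_closed_form(x, xp, T):
    """Eq. (7.44), units hbar = m = omega0 = k_B = 1."""
    a = 1.0 / T
    prefactor = math.sqrt(1.0 / (2 * math.pi * math.sinh(a)))
    exponent = -((x * x + xp * xp) * math.cosh(a) - 2 * x * xp) / (2 * math.sinh(a))
    return prefactor * math.exp(exponent)


def ground_state_term(x, xp, T):
    """Right-hand side of Eq. (7.45): psi_0(x) exp(-1/(2T)) psi_0(x')."""
    def psi0(y):
        return math.pi ** -0.25 * math.exp(-y * y / 2)
    return psi0(x) * math.exp(-0.5 / T) * psi0(xp)


for T in (1.0, 0.3, 0.1, 0.05):
    print(T, wz_closed_form(0.4, -0.2, T) / ground_state_term(0.4, -0.2, T))
```

The printed ratio should be visibly different from 1 at $k_BT \sim \hbar\omega_0$ and approach 1 as the temperature drops, when only the ground state contributes.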


16 I will have no time to discuss this technique and have to refer the interested reader to special literature.
Probably, the most famous text of that field is A. Abrikosov, L. Gor’kov, and I. Dzyaloshinski, Methods of 
Quantum Field Theory in Statistical Physics, Prentice-Hall, 1963. (Later reprintings are available from Dover.) 
Returning to arbitrary temperatures, Eq. (44) with coinciding arguments gives the following expression for the probability density:¹⁷


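$$w(x,x)Z \equiv w(x)Z = \left(\frac{m\omega_0}{2\pi\hbar\sinh(\hbar\omega_0/k_BT)}\right)^{1/2}\exp\left\{-\frac{m\omega_0x^2}{\hbar}\tanh\frac{\hbar\omega_0}{2k_BT}\right\}. \tag{7.47}$$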


This is just a Gaussian function of x, with the following variance: 


$$\langle x^2\rangle = \frac{\hbar}{2m\omega_0}\coth\frac{\hbar\omega_0}{2k_BT}. \tag{7.48}$$


To compare this result with our earlier ones, it is useful to recast it as 


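$$\langle U\rangle \equiv \left\langle\frac{m\omega_0^2x^2}{2}\right\rangle = \frac{\hbar\omega_0}{4}\coth\frac{\hbar\omega_0}{2k_BT}. \tag{7.49}$$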

Comparing this expression with Eq. (26), we see that the average value of the potential energy is exactly one-half of the total energy, the other half being the average kinetic energy. This is what we could expect, because according to Eqs. (5.96)-(5.97), such a relation holds for each Fock state and hence should also hold for their classical mixture.
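Eqs. (47)-(49) and the equal split between potential and kinetic energy can be verified by direct quadrature. The sketch assumes units $\hbar = m = \omega_0 = k_B = 1$, a trapezoid rule on an arbitrarily chosen finite grid, and a truncated Fock-state sum for $\langle E\rangle$; these numerical choices are not from the text. It integrates $w(x)Z$ to obtain Z, compares it with the Gibbs sum $\sum_n e^{-(n+1/2)\hbar\omega_0/k_BT} = 1/2\sinh(\hbar\omega_0/2k_BT)$, and checks $\langle x^2\rangle$ against Eq. (48).

```python
import math


def w_diag_z(x, T):
    """Un-normalized probability density w(x)Z of Eq. (7.47); hbar = m = omega0 = k_B = 1."""
    a = 1.0 / T
    return math.sqrt(1.0 / (2 * math.pi * math.sinh(a))) * math.exp(-x * x * math.tanh(a / 2))


def z_and_x2(T, half_width=40.0, n_points=20001):
    """Trapezoid-rule Z = int w(x)Z dx and <x^2> = int x^2 w(x)Z dx / Z."""
    h = 2 * half_width / (n_points - 1)
    z = m2 = 0.0
    for k in range(n_points):
        x = -half_width + k * h
        weight = 0.5 * h if k in (0, n_points - 1) else h
        f = w_diag_z(x, T)
        z += weight * f
        m2 += weight * x * x * f
    return z, m2 / z


def mean_energy(T, n_levels=2000):
    """<E> = sum_n W_n (n + 1/2) over Fock states with Gibbs weights, Eq. (7.24)."""
    boltz = [math.exp(-(n + 0.5) / T) for n in range(n_levels)]
    return sum((n + 0.5) * b for n, b in enumerate(boltz)) / sum(boltz)


for T in (0.2, 1.0, 5.0):
    z, x2 = z_and_x2(T)
    potential = x2 / 2  # <U> = m omega0^2 <x^2> / 2
    print(f"T={T}: Z={z:.6f}, <x^2>={x2:.6f}, <U>/<E>={potential / mean_energy(T):.6f}")
```

The last column should stay at 1/2 for every temperature, in line with the argument based on Eqs. (5.96)-(5.97).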


Unfortunately, besides the trivial case (30) of coinciding arguments, it is hard to give a straightforward interpretation of the density function in terms of the system’s measurements. This is a fundamental difficulty, which has been well explored in terms of the Wigner function (sometimes called the “Wigner-Ville distribution”)¹⁸ defined as

$$W(X,P) \equiv \frac{1}{2\pi\hbar}\int w\left(X + \frac{\tilde X}{2}, X - \frac{\tilde X}{2}\right)\exp\left\{-\frac{iP\tilde X}{\hbar}\right\}d\tilde X. \tag{7.50}$$

From the mathematical standpoint, this is just the Fourier transform of the density matrix in one of two 
new coordinates defined by the following relations (see Fig. 2): 
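$$X \equiv \frac{x + x'}{2}, \qquad \tilde X \equiv x - x', \qquad \text{so that}\quad x = X + \frac{\tilde X}{2}, \quad x' = X - \frac{\tilde X}{2}. \tag{7.51}$$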


Physically, the new argument X may be interpreted as the average position of the particle during the time interval (t − t’), while $\tilde X$ is the distance passed by it during that time interval, so that P characterizes the momentum of the particle during that motion. As a result, the Wigner function is a mathematical construct intended to characterize the system’s probability distribution simultaneously in the coordinate and the momentum space: for 1D systems, on the phase plane [X, P], which we had discussed earlier (see Fig. 5.8). Let us see how fruitful this intention is.


17 I have to confess that this notation is imperfect, because strictly speaking, w(x, x’) and w(x) are different functions, and so are the functions w(p, p’) and w(p) used below. In a perfect world, I would use different letters for them all, but I desperately want to stay with “w” for all the probability densities, and there are not so many good fonts for this letter. Let me hope that the difference between these functions is clear from their arguments and the context.

18 It was introduced in 1932 by Eugene Wigner on the basis of a general (Weyl-Wigner) transform suggested by
Hermann Weyl in 1927 and re-derived in 1948 by Jean Ville on a different mathematical basis. 
Fig. 7.2. The coordinates X and $\tilde X$ employed in the Weyl-Wigner transform (50). They differ from the coordinates obtained by the rotation of the reference frame by the angle π/4 only by factors √2 and 1/√2, describing scale stretch.


First of all, we may write the Fourier transform reciprocal to Eq. (50): 
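$$w\left(X + \frac{\tilde X}{2}, X - \frac{\tilde X}{2}\right) = \int W(X,P)\exp\left\{\frac{iP\tilde X}{\hbar}\right\}dP. \tag{7.52}$$

For the particular case $\tilde X = 0$, this relation yields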


$$w(X) \equiv w(X,X) = \int W(X,P)\,dP. \tag{7.53}$$


Hence the integral of the Wigner function over the momentum P gives the probability density to find the system at point X, just as it does for a classical distribution function $w_{cl}(X,P)$.¹⁹


Next, the Wigner function has a similar property for integration over X. To prove this fact, we
may first introduce the momentum representation of the density matrix, in full analogy with its 
coordinate representation (27): 


$$w(p,p') \equiv \langle p|\hat w|p'\rangle. \tag{7.54}$$


Inserting, as usual, two identity operators, in the form given by Eq. (4.252), into the right-hand side of 
this equality, we get the following relation between the momentum and coordinate representations: 


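$$w(p,p') = \int dx\int dx'\,\langle p|x\rangle\langle x|\hat w|x'\rangle\langle x'|p'\rangle = \frac{1}{2\pi\hbar}\int dx\int dx'\,\exp\left\{-\frac{ipx}{\hbar}\right\}w(x,x')\exp\left\{\frac{ip'x'}{\hbar}\right\}. \tag{7.55}$$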


This is of course nothing else than the unitary transform of an operator from the x-basis to the p-basis, 
similar to the first form of Eq. (4.272). For coinciding arguments, p = p’, Eq. (55) is reduced to 


$$w(p) \equiv w(p,p) = \frac{1}{2\pi\hbar}\int dx\int dx'\,w(x,x')\exp\left\{-\frac{ip(x - x')}{\hbar}\right\}. \tag{7.56}$$


Now using Eq. (29) and then Eq. (4.265), this function may be represented as 
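$$w(p) = \frac{1}{2\pi\hbar}\sum_j W_j\int dx\int dx'\,\psi_j(x)\psi_j^*(x')\exp\left\{-\frac{ip(x - x')}{\hbar}\right\} = \sum_j W_j\,\varphi_j(p)\varphi_j^*(p), \tag{7.57}$$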


and hence interpreted as the probability density of the particle’s momentum at value p. Now, in the 
variables (51), Eq. (56) has the form 


19 Such a function, used to express the probability dW to find the system in a small area of the phase plane as $dW = w_{cl}(X,P)\,dX\,dP$, is a major notion of the (1D) classical statistics; see, e.g., SM Sec. 2.1.


Chapter 7 Page 11 of 50 


Essential Graduate Physics QM: Quantum Mechanics 


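$$w(p) = \frac{1}{2\pi\hbar}\int dX\int d\tilde X\,w\left(X + \frac{\tilde X}{2}, X - \frac{\tilde X}{2}\right)\exp\left\{-\frac{ip\tilde X}{\hbar}\right\}. \tag{7.58}$$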


Comparing this equality with the definition (50) of the Wigner function, we see that 
$$w(P) = \int W(X,P)\,dX. \tag{7.59}$$


Thus, according to Eqs. (53) and (59), the integrals of the Wigner function over either the 
coordinate or momentum give the probability densities to find the system at a certain value of the 
counterpart variable. This is of course the main requirement for any quantum-mechanical candidate for the best analog of the classical probability density, $w_{cl}(X,P)$.
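The marginal properties (53) and (59) can be checked numerically for a pure state, whose density matrix factorizes as in Eq. (31). This is a sketch under stated assumptions: units $\hbar = m = \omega_0 = 1$, the first excited oscillator eigenfunction as the test state, and plain trapezoid quadrature on arbitrarily chosen grids. Because this ψ is real, the integrand of Eq. (50) is even in $\tilde X$, so only the cosine part of the exponent survives. The momentum check uses the known fact that in these units the oscillator’s momentum-space wavefunctions have the same modulus as its coordinate-space ones.

```python
import math


def psi1(x):
    """First excited oscillator eigenfunction (test state), units hbar = m = omega0 = 1."""
    return math.pi ** -0.25 * math.sqrt(2.0) * x * math.exp(-x * x / 2)


def trapezoid(f, lo, hi, n):
    """Composite trapezoid rule for f on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    return h * (0.5 * (f(lo) + f(hi)) + sum(f(lo + k * h) for k in range(1, n)))


def wigner(psi, X, P, span=12.0, n=480):
    """Eq. (7.50) for a real pure state; the odd (sine) part integrates to zero."""
    def integrand(y):
        return psi(X + y / 2) * psi(X - y / 2) * math.cos(P * y)
    return trapezoid(integrand, -span, span, n) / (2 * math.pi)


X0, P0 = 0.7, -1.1
coordinate_marginal = trapezoid(lambda p: wigner(psi1, X0, p), -8.0, 8.0, 320)  # Eq. (7.53)
momentum_marginal = trapezoid(lambda x: wigner(psi1, x, P0), -8.0, 8.0, 320)    # Eq. (7.59)
print(coordinate_marginal, psi1(X0) ** 2)
print(momentum_marginal, psi1(P0) ** 2)  # |phi_1(p)|^2 = |psi_1(p)|^2 in these units
```

Each printed pair should coincide, even though W(X, P) itself is negative in part of the phase plane for this state.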


Let us see how the Wigner function looks for the simplest systems in thermodynamic equilibrium. For a free 1D particle, we can use Eq. (34), ignoring for simplicity the normalization issues:


$$W(X,P) \propto \int\exp\left\{-\frac{mk_BT\tilde X^2}{2\hbar^2}\right\}\exp\left\{-\frac{iP\tilde X}{\hbar}\right\}d\tilde X. \tag{7.60}$$
The usual Gaussian integration yields: 
$$W(X,P) = \text{const}\times\exp\left\{-\frac{P^2}{2mk_BT}\right\}. \tag{7.61}$$


We see that the function is independent of X (as it should be for this translationally invariant system), and
coincides with the Gibbs distribution (24). We could get the same result directly from classical statistics. 
This is natural because as we know from Sec. 2.2, the free motion is essentially not quantized — at least 
in terms of its energy and momentum. 
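The Gaussian integration leading from Eq. (60) to Eq. (61) is easy to confirm by quadrature. This sketch assumes units $\hbar = m = k_B = 1$ and an arbitrary temperature. Since the integrand of Eq. (60) is even in $\tilde X$, only the cosine part of the exponent contributes, and the ratio W(P)/W(0) should reproduce $\exp\{-P^2/2mk_BT\}$ with the X-independent constant dropping out.

```python
import math


def free_wigner_unnormalized(P, T, span=30.0, n=6000):
    """Eq. (7.60) by the trapezoid rule: int exp(-T y^2 / 2) cos(P y) dy, hbar = m = k_B = 1."""
    h = 2 * span / n
    total = 0.0
    for k in range(n + 1):
        y = -span + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-T * y * y / 2) * math.cos(P * y)
    return total * h


T = 0.7
for P in (0.5, 1.0, 2.0):
    ratio = free_wigner_unnormalized(P, T) / free_wigner_unnormalized(0.0, T)
    print(P, ratio, math.exp(-P * P / (2 * T)))  # compare with Eq. (7.61)
```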


Now let us consider a substantially quantum system, the harmonic oscillator. Plugging Eq. (44) 
into Eq. (50), for that system in thermal equilibrium it is easy to show (and hence is left for the reader’s
exercise) that the Wigner function is also Gaussian, now in both its arguments: 


$$W(X,P) = \text{const}\times\exp\left\{-C\left[\frac{m\omega_0^2X^2}{2} + \frac{P^2}{2m}\right]\right\}, \tag{7.62}$$


though the coefficient C is now different from $1/k_BT$, and tends to that limit only at high temperatures, $k_BT \gg \hbar\omega_0$. Moreover, for a Glauber state, the Wigner function also gives a very plausible result: a Gaussian distribution similar to Eq. (62), but properly shifted from the origin to the central point of the state; see Sec. 5.5.²⁰


Unfortunately, for some other possible states of the harmonic oscillator, e.g., any pure Fock state with n > 0, the Wigner function takes negative values in some regions of the [X, P] plane; see Fig. 3.²¹ (Such plots were the basis of my, admittedly very imperfect, classical images of the Fock states in Fig. 5.8.)
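The negative values are easy to exhibit numerically at the origin of the phase plane. The sketch below evaluates Eq. (50) for the first few Fock states, assuming units $\hbar = m = \omega_0 = 1$ and using the standard eigenfunction recurrence and trapezoid quadrature as illustrative choices. A standard closed-form result, not derived in the text, is $W_n(0,0) = (-1)^n/\pi\hbar$, so every odd n should come out negative.

```python
import math


def oscillator_psi(n, x):
    """Normalized n-th eigenfunction via the three-term recurrence, hbar = m = omega0 = 1."""
    prev, cur = 0.0, math.pi ** -0.25 * math.exp(-x * x / 2)
    for k in range(n):
        prev, cur = cur, math.sqrt(2.0 / (k + 1)) * x * cur - math.sqrt(k / (k + 1)) * prev
    return cur


def wigner_at(n, X, P, span=16.0, steps=640):
    """Eq. (7.50) for the real pure state psi_n: (1/2pi) int psi(X+y/2) psi(X-y/2) cos(P y) dy."""
    h = 2 * span / steps
    total = 0.0
    for k in range(steps + 1):
        y = -span + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * oscillator_psi(n, X + y / 2) * oscillator_psi(n, X - y / 2) * math.cos(P * y)
    return total * h / (2 * math.pi)


for n in range(4):
    print(n, wigner_at(n, 0.0, 0.0), (-1) ** n / math.pi)
```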


20 Please note that in the notation of Sec. 5.5, the capital letters X and P mean not the arguments of the Wigner
function, but the Cartesian coordinates of the central point (5.102), i.e. the classical complex amplitude of the 
oscillations. 

21 Spectacular experimental measurements of this function (for n = 0 and n = 1) were carried out recently by E. 
Bimbard et al., Phys. Rev. Lett. 112, 033601 (2014). 
Fig. 7.3. The Wigner functions W(X, P) of a harmonic oscillator in a few of its stationary (Fock) states n: (a) n = 0, (b) n = 1, (c) n = 5. Graphics by J. S. Lundeen; adapted from http://en.wikipedia.org/wiki/Wigner_function as a public-domain material.


The same is true for most other quantum systems and their states. Indeed, this fact could be 
predicted just by looking at the definition (50) applied to a pure quantum state, in which the density 
function may be factored — see Eq. (31): 

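$$W(X,P) = \frac{1}{2\pi\hbar}\int\psi\left(X + \frac{\tilde X}{2}\right)\psi^*\left(X - \frac{\tilde X}{2}\right)\exp\left\{-\frac{iP\tilde X}{\hbar}\right\}d\tilde X. \tag{7.63}$$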
Changing the argument P (say, at fixed X), we are essentially changing the spatial “frequency” (wave 
number) of the wavefunction product’s Fourier component we are calculating, and we know that their 
Fourier images typically change sign as the frequency is changed. Hence the wavefunctions should have 
some high-symmetry properties to avoid this effect. Indeed, the Gaussian functions (describing, for 
example, the Glauber states and, as their particular case, the ground state of the harmonic oscillator)
have such symmetry, but many other functions do not. 


Hence if the Wigner function were taken seriously as the quantum-mechanical analog of the classical probability density $w_{cl}(X,P)$, we would need to interpret the negative probability of finding the particle in certain elementary intervals dX dP, which is hard to do. However, the function is still used for a semi-quantitative interpretation of mixed states of quantum systems.


7.3. Open system dynamics: Dephasing 


So far we have discussed the density operator as something given at a particular time instant. Now let us discuss how it is formed, i.e. its evolution in time, starting from the simplest case when the probabilities $W_j$ participating in Eq. (15) are time-independent, for this or that reason, to be discussed in a moment. In this case, in the Schrödinger picture, we may rewrite Eq. (15) as


$$\hat w(t) = \sum_j |w_j(t)\rangle W_j\langle w_j(t)|. \tag{7.64}$$


Taking a time derivative of both sides of this equation, multiplying them by $i\hbar$, and applying Eq. (4.158) to the basis states $w_j$, with the account of the fact that the Hamiltonian operator is Hermitian, we get
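$$\begin{aligned}
i\hbar\dot{\hat w} &= i\hbar\sum_j\left(|\dot w_j(t)\rangle W_j\langle w_j(t)| + |w_j(t)\rangle W_j\langle\dot w_j(t)|\right) \\
&= \sum_j\left(\hat H|w_j(t)\rangle W_j\langle w_j(t)| - |w_j(t)\rangle W_j\langle w_j(t)|\hat H\right) \\
&= \hat H\sum_j|w_j(t)\rangle W_j\langle w_j(t)| - \sum_j|w_j(t)\rangle W_j\langle w_j(t)|\hat H.
\end{aligned} \tag{7.65}$$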


Now using Eq. (64) again (twice), we get the so-called von Neumann equation²²

$$i\hbar\dot{\hat w} = [\hat H, \hat w]. \tag{7.66}$$


Note that this equation is similar in structure to Eq. (4.199) describing the time evolution of time-independent operators in the Heisenberg picture:

$$i\hbar\dot{\hat A} = [\hat A, \hat H], \tag{7.67}$$


besides the opposite order of the operators in the commutator — equivalent to the change of sign of the 
right-hand side. This should not be too surprising, because Eq. (66) belongs to the Schrödinger picture
of quantum dynamics, while Eq. (67), to its Heisenberg picture. 
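The sign difference between Eqs. (66) and (67) can be seen numerically on a 2×2 example. In this sketch, the Hamiltonian, the mixed state, and the observable are hypothetical values chosen only for illustration, and units with $\hbar = 1$ are assumed. The density matrix is evolved as $U\hat w(0)U^\dagger$ (Schrödinger picture) and the operator as $U^\dagger\hat AU$ (Heisenberg picture), with $U = \exp\{-i\hat Ht/\hbar\}$ built from the identity $\hat H^2 = e^2\hat I$ that holds for any traceless Hermitian 2×2 matrix. Central finite differences then test both equations.

```python
import math

HBAR = 1.0  # units with hbar = 1 (an assumption of this sketch)


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def commutator(a, b):
    ab, ba = mat_mul(a, b), mat_mul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def propagator(h, t):
    """exp(-i H t / hbar) for a traceless Hermitian 2x2 H, using H^2 = e^2 * I."""
    e = math.sqrt(abs(h[0][0]) ** 2 + abs(h[0][1]) ** 2)
    c, s = math.cos(e * t / HBAR), math.sin(e * t / HBAR)
    return [[(c if i == j else 0.0) - 1j * s * h[i][j] / e for j in range(2)] for i in range(2)]


H = [[0.8 + 0j, 0.3 - 0.4j], [0.3 + 0.4j, -0.8 + 0j]]  # hypothetical Hamiltonian
W0 = [[0.7 + 0j, 0.2 + 0.1j], [0.2 - 0.1j, 0.3 + 0j]]  # hypothetical mixed state
A0 = [[0j, 1 + 0j], [1 + 0j, 0j]]                        # sample observable (sigma_x)


def w_at(t):
    """Schrodinger picture: w(t) = U w(0) U^dagger."""
    u = propagator(H, t)
    return mat_mul(mat_mul(u, W0), dagger(u))


def a_at(t):
    """Heisenberg picture: A(t) = U^dagger A U."""
    u = propagator(H, t)
    return mat_mul(mat_mul(dagger(u), A0), u)


def residuals(t, dt=1e-5):
    """Max deviations from Eq. (7.66) for w(t) and from Eq. (7.67) for A(t)."""
    dw = [[1j * HBAR * (p - m) / (2 * dt) for p, m in zip(rp, rm)]
          for rp, rm in zip(w_at(t + dt), w_at(t - dt))]
    da = [[1j * HBAR * (p - m) / (2 * dt) for p, m in zip(rp, rm)]
          for rp, rm in zip(a_at(t + dt), a_at(t - dt))]
    return max_diff(dw, commutator(H, w_at(t))), max_diff(da, commutator(a_at(t), H))


print(residuals(1.3))
```

Both residuals should be at the level of the finite-difference error, while swapping the commutator order in either check would leave an O(1) mismatch.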


The most important case when the von Neumann equation is (approximately) valid is when the “own” Hamiltonian $\hat H_s$ of the system s of our interest is time-independent, and its interaction with the environment is so small that its effect on the system’s evolution during the considered time interval is negligible, but it had lasted so long that it gradually put the system into a non-pure state, for example (but not necessarily) into the classical mixture (24).²³ (This is an example of the second case discussed in Sec. 1, when we need the mixed-ensemble description of the system even if its current interaction with the environment is negligible.) If the interaction with the environment is stronger, and hence is not negligible at the considered time interval, Eq. (66) is generally not valid,²⁴ because the probabilities $W_j$ may change in time. However, this equation may still be used for a discussion of one major effect of the environment, namely dephasing (also called “decoherence”), within a simple model.


Let us start with the following general model of a system interacting with its environment, which will be used throughout this chapter:

$$\hat H = \hat H_s + \hat H_e\{\lambda\} + \hat H_{int}, \tag{7.68}$$


22 In some texts, it is called the “Liouville equation”, due to its philosophical proximity to the classical Liouville 
theorem for the classical distribution function $w_{cl}(X,P)$; see, e.g., SM Sec. 6.1 and in particular Eq. (6.5).

23 In the last case, the statistical operator is diagonal in the stationary state basis and hence commutes with the 
Hamiltonian. Hence the right-hand side of Eq. (66) vanishes, and it shows that in this basis, the density matrix is 
completely time-independent. 

24 Very unfortunately, this fact is not explained in some textbooks, which quote the von Neumann equation 
without proper qualifications. 
where {λ} denotes the (huge) set of degrees of freedom of the environment.²⁵ Evidently, this model is useful only if we may somehow tame the enormous size of the Hilbert space of these degrees of freedom, and so work out the calculations all the way to a practicably simple result. This turns out to be possible mostly if the elementary act of interaction of the system and its environment is in some sense small. Below, I will describe several cases when this is true; the classical example is the Brownian particle interacting with the molecules of the surrounding gas or fluid.²⁶ (In this example, a single hit by
a molecule changes the particle’s momentum by a minor fraction.) On the other hand, the model (68) is 
not very productive for a particle interacting with the environment consisting of similar particles, when 
a single collision may change its momentum dramatically. In such cases, the methods discussed in the 
next chapter are more relevant. 


Now let us analyze a very simple model of an open two-level quantum system, with its intrinsic 
Hamiltonian having the form 


$$\hat H_s = c_z\hat\sigma_z, \tag{7.69}$$


similar to the Pauli Hamiltonian (4.163),27 and a factorable, bilinear interaction — cf. Eq. (6.145) and its 
discussion: 


$$\hat H_{int} = \hat f\{\lambda\}\hat\sigma_z, \tag{7.70}$$


where $\hat f$ is a Hermitian operator depending only on the set {λ} of environmental degrees of freedom (“coordinates”), defined in their Hilbert space, different from that of the two-level system. As a result, the operators $\hat f\{\lambda\}$ and $\hat H_e\{\lambda\}$ commute with $\hat\sigma_z$ and with any other intrinsic operator of the two-level system. Of course, any realistic $\hat H_e\{\lambda\}$ is extremely complex, so that how much we will be able to achieve without specifying it may be a pleasant surprise for the reader.

Before we proceed to the analysis, let us recognize two examples of two-level systems that may be described by this model. The first example is a spin-½ in an external magnetic field of a fixed direction (taken for the axis z), which includes both an average component $\bar B_z$ and a random (fluctuating) component $\tilde B_z(t)$ induced by the environment. As follows from Eq. (4.163b), it may be described by the Hamiltonian (68)-(70) with

$$c_z = -\frac{\hbar\gamma}{2}\bar B_z, \qquad \hat f = -\frac{\hbar\gamma}{2}\tilde B_z(t). \tag{7.71}$$


25 Note that by writing Eq. (68), we are treating the whole system, including the environment, as a Hamiltonian 
one. This can always be done if the accounted part of the environment is large enough so that the processes in the 
system s of our interest do not depend on the type of boundary between this part and the “external” (even larger) 
environment; in particular, we may assume the total system to be closed, i.e. Hamiltonian. 

26 The theory of the Brownian motion, the effect first observed experimentally by biologist Robert Brown in the 
1820s, was pioneered by Albert Einstein in 1905 and developed in detail by Marian Smoluchowski in 1906-1907 
and Adriaan Fokker in 1913. Due to this historic background, in some older texts, the approach described in the 
balance of this chapter is called the “quantum theory of the Brownian motion”. Let me, however, emphasize that 
due to the later progress of experimental techniques, quantum-mechanical behaviors, including the environmental 
effects in them, have been observed in a rapidly growing number of various quasi-macroscopic systems, for which 
this approach is quite applicable. In particular, this is true for most systems being explored as possible qubits of 
prospective quantum computing and encryption systems — see Sec. 8.5 below. 

27 As we know from Secs. 4.6 and 5.1, such a Hamiltonian is sufficient to lift the energy level degeneracy.
Another example is a particle in a symmetric double-well potential $U_s$ (Fig. 4), with a barrier between the wells sufficiently high to be practically impenetrable, and an additional force F(t), exerted by the environment, so that the total potential energy is $U(x,t) = U_s(x) - F(t)x$. If the force, including its static part $\bar F$ and fluctuations $\tilde F(t)$, is sufficiently weak, we can neglect its effects on the shape of the potential wells and hence on the localized wavefunctions $\psi_{L,R}$, so that the force effect is reduced to the variation of the difference $E_L - E_R = F(t)\Delta x$ between the eigenenergies. As a result, the system may be described by Eqs. (68)-(70) with

$$c_z = -\frac{\bar F\Delta x}{2}, \qquad \hat f = -\frac{\tilde F(t)\Delta x}{2}. \tag{7.72}$$


Fig. 7.4. Dephasing in a double-well 
system. 


Let us start our general analysis of the model described by Eqs. (68)-(70) by writing the equation 
of motion for the Heisenberg operator $\hat\sigma_z(t)$:

$$i\hbar\dot{\hat\sigma}_z = [\hat\sigma_z, \hat H] = (c_z + \hat f)[\hat\sigma_z, \hat\sigma_z] = 0, \tag{7.73}$$


showing that in our simple model (68)-(70), the operator $\hat\sigma_z$ does not evolve in time. What does this mean for the observables? For an arbitrary density matrix of any two-level system,

$$w = \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix}, \tag{7.74}$$


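we can readily calculate the trace of the operator $\hat\sigma_z\hat w$. Indeed, since operator traces are basis-independent, we can do this in any basis, in particular in the usual z-basis:

$$\mathrm{Tr}(\hat\sigma_z\hat w) = \mathrm{Tr}(\sigma_z w) = \mathrm{Tr}\left[\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix}\right] = w_{11} - w_{22} = W_1 - W_2. \tag{7.75}$$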


Since, according to Eq. (5), $\hat\sigma_z$ may be considered the operator for the difference of the number of particles in the basis states 1 and 2, in the case (73) the difference $W_1 - W_2$ does not depend on time, and since the sum of these probabilities is also fixed, $W_1 + W_2 = 1$, both of them are constant. The
physics of this simple result is especially clear for the model shown in Fig. 4: since the potential barrier 
separating the potential wells is so high that tunneling through it is negligible, the interaction with the 
environment cannot move the system from one well into another one. 


It may look like nothing interesting may happen in such a simple situation, but in a minute we 
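will see that this is not true. Due to the time independence of $W_1$ and $W_2$, we may use the von Neumann equation (66) to describe the density matrix evolution. In the usual z-basis:

$$i\hbar\dot w = [H, w] = (c_z + \hat f)\left[\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix}\right] = (c_z + \hat f)\begin{pmatrix} 0 & 2w_{12} \\ -2w_{21} & 0 \end{pmatrix}. \tag{7.76}$$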


This result means that while the diagonal elements, i.e., the probabilities of the states, do not evolve in time (as we already know), the off-diagonal elements do change; for example,

$$i\hbar\dot w_{12} = 2(c_z + \hat f)w_{12}, \tag{7.77}$$


with a similar but complex-conjugate equation for $w_{21}$. The solution of this linear differential equation (77) is straightforward, and yields

$$w_{12}(t) = w_{12}(0)\exp\left\{-i\frac{2c_z}{\hbar}t\right\}\exp\left\{-\frac{2i}{\hbar}\int_0^t\hat f(t')\,dt'\right\}. \tag{7.78}$$


The first exponent is a deterministic c-number factor, while in the second one $\hat f(t) \equiv \hat f\{\lambda(t)\}$ is still an operator in the Hilbert space of the environment, but from the point of view of the two-level system of our interest, it is a random function of time. The time-average part of this function may be included in $c_z$, so in what follows, we will assume that it equals zero.
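For one fixed realization of the force, Eq. (77) is just a linear ODE, and Eq. (78) can be compared with a direct numerical integration. The sketch below takes a hypothetical deterministic $f(t) = 0.3\sin(1.7t)$ and $c_z = 0.6$ in units with $\hbar = 1$, and uses a classical fourth-order Runge-Kutta stepper; these are choices of the example, not of the text.

```python
import cmath
import math

HBAR = 1.0             # units with hbar = 1 (assumption)
C_Z = 0.6              # hypothetical static parameter c_z
AMP, OMEGA = 0.3, 1.7  # hypothetical deterministic force f(t) = AMP * sin(OMEGA t)


def f(t):
    return AMP * math.sin(OMEGA * t)


def rhs(t, w12):
    """Eq. (7.77) solved for the derivative: dw12/dt = -(2i/hbar)(c_z + f(t)) w12."""
    return -2j / HBAR * (C_Z + f(t)) * w12


def rk4(w0, t_end, steps):
    """Classical fourth-order Runge-Kutta integration from t = 0 to t_end."""
    h = t_end / steps
    t, w = 0.0, w0
    for _ in range(steps):
        k1 = rhs(t, w)
        k2 = rhs(t + h / 2, w + h / 2 * k1)
        k3 = rhs(t + h / 2, w + h / 2 * k2)
        k4 = rhs(t + h, w + h * k3)
        w += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return w


def closed_form(w0, t):
    """Eq. (7.78), with int_0^t f dt' = AMP (1 - cos(OMEGA t)) / OMEGA."""
    phase_integral = AMP * (1 - math.cos(OMEGA * t)) / OMEGA
    return w0 * cmath.exp(-2j * C_Z * t / HBAR) * cmath.exp(-2j * phase_integral / HBAR)


w0 = 0.3 - 0.2j
numeric, exact = rk4(w0, 5.0, 5000), closed_form(w0, 5.0)
print(numeric, exact, abs(numeric), abs(w0))
```

Note that for any single realization $|w_{12}|$ stays constant; the decay of the off-diagonal elements appears only after averaging over an ensemble of realizations of f(t), as discussed next.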


Let us start from the limit when the environment behaves classically.²⁸ In this case, the operator in Eq. (78) may be considered as a classical random function of time f(t), provided that we average its effects over a statistical ensemble of many functions f(t) describing many (macroscopically similar) experiments. For a small time interval $t = dt \to 0$, we can use the Taylor expansion of the exponent, truncating it after the quadratic term:


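$$\begin{aligned}
\left\langle\exp\left\{-\frac{2i}{\hbar}\int_0^{dt}f(t')\,dt'\right\}\right\rangle &\approx 1 + \left\langle-\frac{2i}{\hbar}\int_0^{dt}f(t')\,dt'\right\rangle + \frac{1}{2}\left\langle\left(-\frac{2i}{\hbar}\int_0^{dt}f(t')\,dt'\right)\left(-\frac{2i}{\hbar}\int_0^{dt}f(t'')\,dt''\right)\right\rangle \\
&= 1 - \frac{2i}{\hbar}\int_0^{dt}\langle f(t')\rangle\,dt' - \frac{2}{\hbar^2}\int_0^{dt}dt'\int_0^{dt}dt''\,\langle f(t')f(t'')\rangle = 1 - \frac{2}{\hbar^2}\int_0^{dt}dt'\int_0^{dt}dt''\,K_f(t' - t'').
\end{aligned} \tag{7.79}$$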


Here we have used the facts that the statistical average of f(t) is equal to zero, while the second average, 
called the correlation function, in a statistically- (i.e. macroscopically-) stationary state of any 
environment may only depend on the time difference $\tau \equiv t' - t''$:

$$\langle f(t')f(t'')\rangle = K_f(t' - t'') \equiv K_f(\tau). \tag{7.80}$$


If this difference is much larger than some time scale $\tau_c$, called the correlation time of the environment, the values f(t’) and f(t’’) are completely independent (uncorrelated), as illustrated in Fig. 5a, so that at $\tau \to \infty$, the correlation function has to tend to zero. On the other hand, at τ = 0, i.e. t’ = t’’, the correlation function is just the variance of f:


28 This assumption is not in contradiction with the need for the quantum treatment of the two-level system s, 
because a typical environment is large, and hence has a very dense energy spectrum, with the distances between adjacent levels that may be readily bridged by thermal excitations of small energies, often making it essentially classical.
$$K_f(0) = \langle f^2\rangle, \tag{7.81}$$

and has to be positive. As a result, the function looks (semi-quantitatively) as shown in Fig. 5b.

Fig. 7.5. (a) A typical random process and (b) its correlation function $\langle f(t')f(t'')\rangle$ versus $t' - t''$, schematically.


Hence, if we are only interested in time differences τ much longer than $\tau_c$, which is typically very short, we may approximate $K_f(\tau)$ well with a delta function of the time difference. Let us take it in the following form, convenient for later discussion:

$$K_f(\tau) = \hbar^2D_\varphi\,\delta(\tau), \tag{7.82}$$


where $D_\varphi$ is a positive constant called the phase diffusion coefficient. This term stems from the very similar effect of classical diffusion of Brownian particles in a highly viscous medium. Indeed, the particle’s velocity in such a medium is approximately proportional to the external force. Hence, if the random hits of a particle by the medium’s molecules may be described by a force that obeys a law similar to Eq. (82), the velocity (along any Cartesian coordinate) is also delta-correlated:

$$\langle v(t)\rangle = 0, \qquad \langle v(t')v(t'')\rangle = 2D\,\delta(t' - t''). \tag{7.83}$$


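Now we can integrate the kinematic relation $\dot x = v$ to calculate the particle’s displacement from its initial position during a time interval [0, t] and its variance:

$$x(t) - x(0) = \int_0^t v(t')\,dt', \tag{7.84}$$

$$\left\langle\left(x(t) - x(0)\right)^2\right\rangle = \left\langle\int_0^t v(t')\,dt'\int_0^t v(t'')\,dt''\right\rangle = \int_0^t dt'\int_0^t dt''\,\langle v(t')v(t'')\rangle = \int_0^t dt'\int_0^t dt''\,2D\,\delta(t' - t'') = 2Dt. \tag{7.85}$$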


This is the famous law of diffusion, showing that the r.m.s. deviation of the particle from the initial point grows with time as $(2Dt)^{1/2}$, where the constant D is called the diffusion coefficient.
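The diffusion law (85) can be illustrated with a small Monte Carlo sketch. It assumes, beyond the text, that the velocity noise is Gaussian, so that the displacement accumulated over a step Δt is a Gaussian random number with variance 2DΔt, consistent with Eq. (83). The number of walkers, the step, and the seed are arbitrary choices.

```python
import random


def mean_square_displacement(D, dt, n_steps, n_walkers, seed=1):
    """Return <(x(t) - x(0))^2> at t = dt, 2dt, ..., n_steps*dt, averaged over walkers."""
    rng = random.Random(seed)
    sigma = (2 * D * dt) ** 0.5  # r.m.s. displacement per step implied by Eq. (7.83)
    positions = [0.0] * n_walkers
    msd = []
    for _ in range(n_steps):
        positions = [x + rng.gauss(0.0, sigma) for x in positions]
        msd.append(sum(x * x for x in positions) / n_walkers)
    return msd


D, dt = 0.5, 0.01
msd = mean_square_displacement(D, dt, n_steps=40, n_walkers=20000)
for step in (10, 20, 40):
    t = step * dt
    print(f"t = {t:.2f}: <dx^2> = {msd[step - 1]:.4f}, 2Dt = {2 * D * t:.4f}")
```

With 20000 walkers the sampled variances should match 2Dt to about a percent, growing linearly with time.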


Returning to the diffusion of the quantum-mechanical phase, with Eq. (82) the last double integral in Eq. (79) yields $\hbar^2D_\varphi\,dt$, so that the statistical average of Eq. (78) is

$$\langle w_{12}(dt)\rangle = w_{12}(0)\exp\left\{-i\frac{2c_z}{\hbar}dt\right\}\left(1 - 2D_\varphi\,dt\right). \tag{7.86}$$


Applying this formula to sequential time intervals, 


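$$\langle w_{12}(2dt)\rangle = \langle w_{12}(dt)\rangle\exp\left\{-i\frac{2c_z}{\hbar}dt\right\}\left(1 - 2D_\varphi\,dt\right) = w_{12}(0)\exp\left\{-i\frac{2c_z}{\hbar}2dt\right\}\left(1 - 2D_\varphi\,dt\right)^2, \tag{7.87}$$

etc., for a finite time $t = N\,dt$, in the limit $N \to \infty$ and $dt \to 0$ (at fixed t) we get

$$\langle w_{12}(t)\rangle = w_{12}(0)\exp\left\{-i\frac{2c_z}{\hbar}t\right\}\times\lim_{N\to\infty}\left(1 - \frac{2D_\varphi t}{N}\right)^N. \tag{7.88}$$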


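By the definition of the natural logarithm base e,²⁹ this limit is just $\exp\{-2D_\varphi t\}$, so that, finally:

$$\langle w_{12}(t)\rangle = w_{12}(0)\exp\left\{-i\frac{2c_z}{\hbar}t\right\}\exp\{-2D_\varphi t\} \equiv w_{12}(0)\exp\left\{-i\frac{2c_z}{\hbar}t\right\}\exp\left\{-\frac{t}{T_2}\right\}. \tag{7.89}$$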


So, due to coupling to the environment, the off-diagonal elements of the density matrix decay with some dephasing time $T_2 = 1/2D_\varphi$, providing a natural evolution from the density matrix (22) of a pure state to the diagonal matrix (23), with the same probabilities $W_{1,2}$, describing a fully dephased (incoherent) classical mixture.³⁰
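The decay law (89) can also be reproduced by brute-force averaging over an ensemble of noise realizations. In this sketch (units $\hbar = 1$), the integral of f over each step dt is drawn as a Gaussian random number with variance $\hbar^2D_\varphi dt$, which is what Eq. (82) implies for a Gaussian delta-correlated force; the Gaussianity, the parameter values, and the seed are assumptions. The phase then accumulates increments of variance $4D_\varphi dt$, so $|\langle e^{-i\varphi}\rangle|$ should follow $\exp\{-2D_\varphi t\}$. For Gaussian noise this holds for any step size, whereas the text’s derivation relies on the small-dt expansion (79). The last line evaluates the limit in Eq. (88) at a large but finite N.

```python
import cmath
import math
import random

HBAR = 1.0  # units with hbar = 1 (assumption)


def averaged_coherence(D_phi, c_z, dt, n_steps, n_runs, seed=7):
    """Ensemble average of w12(t)/w12(0) from Eq. (7.78) for Gaussian white-noise f(t)."""
    rng = random.Random(seed)
    sigma = HBAR * math.sqrt(D_phi * dt)  # r.m.s. of int f dt' over one step, from Eq. (7.82)
    phases = [0.0] * n_runs
    history = []
    for step in range(1, n_steps + 1):
        phases = [p + 2.0 / HBAR * rng.gauss(0.0, sigma) for p in phases]
        mean = sum(cmath.exp(-1j * p) for p in phases) / n_runs
        history.append(cmath.exp(-2j * c_z * step * dt / HBAR) * mean)
    return history


D_phi, dt = 1.0, 0.0125
hist = averaged_coherence(D_phi, c_z=0.5, dt=dt, n_steps=40, n_runs=10000)
for step in (20, 40):
    t = step * dt
    print(f"t = {t:.2f}: |<w12>/w12(0)| = {abs(hist[step - 1]):.3f}, "
          f"exp(-2 D t) = {math.exp(-2 * D_phi * t):.3f}")

N, t = 10 ** 6, 0.5
print((1 - 2 * D_phi * t / N) ** N, math.exp(-2 * D_phi * t))  # the limit in Eq. (7.88)
```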


This simple model offers a very clear look at the nature of decoherence: the random “force” f(t), exerted by the environment, “shakes” the energy difference between two eigenstates of the system and hence the instantaneous velocity $2(c_z + f)/\hbar$ of their mutual phase shift φ(t); cf. Eq. (22). Due to the randomness of the force, φ(t) performs a random walk around the trigonometric circle, so that the average of its trigonometric functions exp{±iφ} over time gradually tends to zero, killing the off-diagonal elements of the density matrix. Our analysis, however, has left open two important issues:


(i) Is this approach valid for a quantum description of a typical environment?


(ii) If yes, what is physically the $D_\varphi$ that was formally defined by Eq. (82)?


7.4. Fluctuation-dissipation theorem 


Similar questions may be asked about a more general situation, when the Hamiltonian $\hat H_s$ of the system of interest (s), in the composite Hamiltonian (68), is not specified at all, but the interaction between that system and its environment still has a bilinear form similar to Eqs. (70) and (6.130):

$$\hat H_{int} = -\hat F\{\lambda\}\hat x, \tag{7.90}$$

where $\hat x$ is some observable of our system s, say, its generalized coordinate or generalized momentum. It may look incredible that in this very general situation one can still make a very simple and powerful statement about the statistical properties of the generalized force F, under only two (interrelated) conditions, which are satisfied in a huge number of cases of interest:


(i) the coupling of system s of interest to its environment e is weak — in the sense that the 
perturbation theory (see Chapter 6) is applicable, and 


29 See, e.g., MA Eq. (1.2a) with $n = -N/2D_\varphi t$.

30 Note that this result is valid only if the approximation (82) may be applied at time interval dt which, in turn, should be much smaller than the dephasing time $T_2 = 1/2D_\varphi$, i.e. if the dephasing time is much longer than the environment’s correlation time $\tau_c$. This requirement may always be satisfied by making the coupling to the environment sufficiently weak. In addition, in typical environments, $\tau_c$ is very short. For example, in the original Brownian motion experiments with a-few-μm pollen grains in water, it is of the order of the average interval between
sequential molecular impacts, of the order of 107! s. 
(ii) the environment may be considered as staying in thermodynamic equilibrium, with a certain temperature T, regardless of the process in the system of interest.³¹


This famous statement is called the fluctuation-dissipation theorem (FDT).32 Due to the 
importance of this fundamental result, let me derive it.33 Since by writing Eq. (68) we treat the whole 
system (s + e) as a Hamiltonian one, we may use the Heisenberg equation (4.199) to write 


$$i\hbar\dot{\hat F} = [\hat F, \hat H] = [\hat F, \hat H_e], \tag{7.91}$$


because, as was discussed in the last section, the operator $\hat F\{\lambda\}$ commutes with both $\hat H_s$ and $\hat x$. Generally,


very little may be done with this equation, because the time evolution of the environment’s Hamiltonian 
depends, in turn, on that of the force. This is where the perturbation theory becomes indispensable. Let 
us decompose the force operator into the following sum: 


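$$\hat F\{\lambda\} = \langle\hat F\rangle + \tilde{\hat F}(t), \qquad \text{with}\quad \langle\tilde{\hat F}(t)\rangle = 0, \tag{7.92}$$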


where (here and below, until further notice) the sign ⟨...⟩ means the statistical averaging over the environment alone, i.e. over an ensemble with absolutely similar evolutions of the system s, but random states of its environment.³⁴ From the point of view of the system s, the first term of the sum (still an
operator!) describes the average response of the environment to the system dynamics (possibly, 
including such irreversible effects as friction), and has to be calculated with a proper account of their 
interaction — as we will do later in this section. On the other hand, the last term in Eq. (92) represents 
random fluctuations of the environment, which exist even in the absence of the system s. Hence, in the 
first non-zero approximation in the interaction strength, the fluctuation part may be calculated ignoring 
the interaction, i.e. treating the environment as being in thermodynamic equilibrium: 


$$i\hbar\dot{\tilde{\hat F}} = [\tilde{\hat F}, \hat H_e]. \tag{7.93}$$


Since in this approximation the environment’s Hamiltonian does not have an explicit dependence on 
time, the solution of this equation may be written by combining Eqs. (4.190) and (4.175): 


3! The most frequent example of the violation of this condition is the environment’s overheating by the energy 
flow from system s. Let me leave it to the reader to estimate the overheating of a standard physical laboratory 
room by a typical dissipative quantum process — the emission of an optical photon by an atom. (Hint: it is 
extremely small.) 

32 The FDT was first derived by Herbert Callen and Theodore Allen Welton in 1951, on the background of an 
earlier derivation of its classical limit by Harry Nyquist in 1928. 

33 The FDT may be proved in several ways that are shorter than the one given below — see, e.g., either the proof in 
SM Secs. 5.5 and 5.6 (based on H. Nyquist’s arguments), or the original paper by H. Callen and T. Welton, Phys. 
Rev. 83, 34 (1951) — wonderful in its clarity. The longer approach I will describe here, besides giving the 
important Green-Kubo formula (109) as a byproduct, is a very useful exercise in the operator manipulation and 
the perturbation theory in its integral form — different from the differential forms used in Chapter 6. If the reader 
is not interested in this exercise, they may skip the derivation and jump straight to the result expressed by Eq. 
(134), which uses the notions defined by Eqs. (114) and (123). 

34 For usual (“ergodic”) environments, without intrinsic long-term memories, this statistical averaging over an 
ensemble of environments is equivalent to averaging over intermediate times — much longer than the correlation 
time $\tau_c$ of the environment, but still much shorter than the characteristic time of evolution of the system under analysis, such as the dephasing time $T_2$ and the energy relaxation time $T_1$, both still to be calculated.
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F(t) = exp ~ Ht, 


Ss t}(oexp|- ~H, 


Let us use this relation to calculate the correlation function of the fluctuations F(t), defined 
similarly to Eq. (80), but taking care of the order of the time arguments (very soon we will see why): 


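$$\left\langle \hat{\tilde F}(t)\hat{\tilde F}(t')\right\rangle = \left\langle \exp\left\{+\frac{i}{\hbar}\hat H_e t\right\}\hat{\tilde F}(0)\exp\left\{-\frac{i}{\hbar}\hat H_e t\right\}\exp\left\{+\frac{i}{\hbar}\hat H_e t'\right\}\hat{\tilde F}(0)\exp\left\{-\frac{i}{\hbar}\hat H_e t'\right\}\right\rangle. \qquad (7.95)$$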


(Here, for the notation brevity, the thermal equilibrium of the environment is just implied.) We may 
calculate this expectation value in any basis, and the best choice for it is evident: in the environment’s 
stationary-state basis, the density operator of the environment, its Hamiltonian, and hence the exponents 
in Eq. (95) are all represented by diagonal matrices. Using Eq. (5), the correlation function becomes 


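$$
\begin{aligned}
\left\langle \hat{\tilde F}(t)\hat{\tilde F}(t')\right\rangle &= \sum_{n,n'} W_n \exp\left\{+\frac{i}{\hbar}E_n t\right\} F_{nn'} \exp\left\{-\frac{i}{\hbar}E_{n'} t\right\} \exp\left\{+\frac{i}{\hbar}E_{n'} t'\right\} F_{n'n} \exp\left\{-\frac{i}{\hbar}E_n t'\right\} \\
&= \sum_{n,n'} W_n \left|F_{nn'}\right|^2 \exp\left\{+\frac{i}{\hbar}\left(E_n - E_{n'}\right)\left(t - t'\right)\right\}.
\end{aligned}
\qquad (7.96)
$$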


Here $W_n$ are the Gibbs distribution probabilities given by Eq. (24), with the environment’s temperature
$T$, and $F_{nn'} \equiv F_{nn'}(0)$ are the Schrödinger-picture matrix elements of the interaction force operator.




We see that though the correlator (96) is a function of the difference $\tau \equiv t - t'$ only (as it should
be for fluctuations in a macroscopically stationary system), it may depend on the order of its arguments.
This is why we mark this particular correlation function with the upper index “+”,


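$$K_F^+(\tau) \equiv \left\langle \hat{\tilde F}(t)\hat{\tilde F}(t')\right\rangle = \sum_{n,n'} W_n \left|F_{nn'}\right|^2 \exp\left\{+\frac{i\tilde E\tau}{\hbar}\right\}, \quad \text{where } \tilde E \equiv E_n - E_{n'}, \qquad (7.97)$$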


while its counterpart, with the swapped times $t$ and $t'$, is marked with the upper index “−”:


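$$K_F^-(\tau) \equiv K_F^+(-\tau) \equiv \left\langle \hat{\tilde F}(t')\hat{\tilde F}(t)\right\rangle = \sum_{n,n'} W_n \left|F_{nn'}\right|^2 \exp\left\{-\frac{i\tilde E\tau}{\hbar}\right\}. \qquad (7.98)$$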


So, in contrast with classical processes, in quantum mechanics the correlation function of fluctuations
$\tilde F$ is not necessarily time-symmetric:

$$K_F^+(\tau) - K_F^-(\tau) \equiv K_F^+(\tau) - K_F^+(-\tau) = \left\langle \hat{\tilde F}(t)\hat{\tilde F}(t') - \hat{\tilde F}(t')\hat{\tilde F}(t)\right\rangle = 2i\sum_{n,n'} W_n \left|F_{nn'}\right|^2 \sin\frac{\tilde E\tau}{\hbar} \neq 0, \qquad (7.99)$$

so that $\hat{\tilde F}(t)$ gives one more example of a Heisenberg-picture operator whose “values”, taken at
different moments of time, generally do not commute — see Footnote 49 in Chapter 4. (A good sanity
check here is that at $\tau = 0$, i.e. at $t = t'$, the difference (99) between $K_F^+$ and $K_F^-$ vanishes.)
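The asymmetry (99) is easy to check on a toy model. The sketch below is only an illustration, assuming a hypothetical four-level “environment” with made-up energies, a made-up real symmetric (hence Hermitian) force matrix, and units with $\hbar = k_B = 1$; it simply evaluates the sums (97)-(99) and shows that $K_F^+ - K_F^-$ is purely imaginary and vanishes at $\tau = 0$.

```python
import cmath
import math

HBAR = 1.0  # assumption: units with hbar = 1 (and k_B = 1)

# Hypothetical toy environment: level energies, temperature, and a real
# symmetric (hence Hermitian) matrix of the force operator F_nn'.
E = [0.0, 0.7, 1.3, 2.1]
T = 0.5
Z = sum(math.exp(-e / T) for e in E)
W = [math.exp(-e / T) / Z for e in E]  # Gibbs probabilities W_n
F = [[0.0, 0.4, 0.1, 0.2],
     [0.4, 0.0, 0.3, 0.05],
     [0.1, 0.3, 0.0, 0.25],
     [0.2, 0.05, 0.25, 0.0]]
PAIRS = [(n, m) for n in range(len(E)) for m in range(len(E))]


def k_plus(tau):
    """K_F^+(tau) = sum W_n |F_nn'|^2 exp(+i E~ tau / hbar), Eq. (97)."""
    return sum(W[n] * abs(F[n][m]) ** 2 * cmath.exp(1j * (E[n] - E[m]) * tau / HBAR)
               for n, m in PAIRS)


def k_minus(tau):
    """K_F^-(tau) = K_F^+(-tau), Eq. (98)."""
    return k_plus(-tau)


def commutator_average(tau):
    """Right-hand side of Eq. (99): 2i sum W_n |F_nn'|^2 sin(E~ tau / hbar)."""
    return 2j * sum(W[n] * abs(F[n][m]) ** 2 * math.sin((E[n] - E[m]) * tau / HBAR)
                    for n, m in PAIRS)


for tau in (0.0, 0.5, 2.0):
    print(tau, k_plus(tau) - k_minus(tau), commutator_average(tau))
```

Because the Gibbs weights decrease with energy, the sine terms of the pairs $(n, n')$ and $(n', n)$ do not cancel, which is exactly what makes the difference nonzero.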
Now let us return to the force operator’s decomposition (92), and calculate its first (average) 
component. To do that, let us write the formal solution of Eq. (91) as follows: 


$$\hat F(t) = -\frac{i}{\hbar}\int_{-\infty}^{t}\left[\hat F(t'), \hat H_e(t')\right]dt'. \qquad (7.100)$$


On the right-hand side of this relation, we still cannot treat the Hamiltonian of the environment as an 
unperturbed (equilibrium) one, even if the effect of our system (s) on the environment is very weak, 
because this would give zero statistical average of the force F(t). Hence, we should make one more step 
of our perturbative treatment, taking into account the effect of the force on the environment. To do this, 
let us use Eqs. (68) and (90) to write the (so far, exact) Heisenberg equation of motion for the 
environment’s Hamiltonian, 


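$$i\hbar\dot{\hat H}_e = \left[\hat H_e, \hat H\right] = -\hat x\left[\hat H_e, \hat F\right], \qquad (7.101)$$

and its formal solution, similar to Eq. (100), but for time $t'$ rather than $t$:

$$\hat H_e(t') = \frac{i}{\hbar}\int_{-\infty}^{t'}\hat x(t'')\left[\hat H_e(t''), \hat F(t'')\right]dt''. \qquad (7.102)$$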


Plugging this equality into the right-hand side of Eq. (100), and averaging the result (again, over the 
environment only!), we get 


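$$\left\langle \hat F(t)\right\rangle = \frac{1}{\hbar^2}\int_{-\infty}^{t}dt'\int_{-\infty}^{t'}dt''\,\hat x(t'')\left\langle\left[\hat F(t'),\left[\hat H_e(t''),\hat F(t'')\right]\right]\right\rangle. \qquad (7.103)$$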


This is still an exact result, but now it is ready for an approximate treatment, implemented by
averaging in its right-hand side over the unperturbed (thermal-equilibrium) state of the environment.
This may be done exactly as in Eq. (96), at the last step using Eq. (94):


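$$
\begin{aligned}
\left\langle\left[\hat F(t'),\left[\hat H_e(t''),\hat F(t'')\right]\right]\right\rangle &= \mathrm{Tr}\left\{\hat w\left[\hat F(t'),\left[\hat H_e,\hat F(t'')\right]\right]\right\} \\
&= \mathrm{Tr}\left\{\hat w\left(\hat F(t')\hat H_e\hat F(t'') - \hat F(t')\hat F(t'')\hat H_e - \hat H_e\hat F(t'')\hat F(t') + \hat F(t'')\hat H_e\hat F(t')\right)\right\} \\
&= \sum_{n,n'} W_n\left[F_{nn'}(t')E_{n'}F_{n'n}(t'') - F_{nn'}(t')F_{n'n}(t'')E_n - E_nF_{nn'}(t'')F_{n'n}(t') + F_{nn'}(t'')E_{n'}F_{n'n}(t')\right] \\
&= -\sum_{n,n'} W_n\tilde E\left|F_{nn'}\right|^2\left(\exp\left\{\frac{i\tilde E(t'-t'')}{\hbar}\right\} + \text{c.c.}\right).
\end{aligned}
\qquad (7.104)
$$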
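The last equality in Eq. (104) is a pure matrix-algebra statement, so it can be spot-checked numerically. The sketch below assumes a hypothetical three-level environment with $\hbar = k_B = 1$ and a made-up Hermitian force matrix; it computes the trace by brute-force products of plain Python lists (no linear-algebra library) and compares it with the closed form.

```python
import cmath
import math

# Hypothetical three-level environment (hbar = k_B = 1).
E = [0.0, 0.6, 1.5]
T = 0.8
N = len(E)
Z = sum(math.exp(-e / T) for e in E)
W = [math.exp(-e / T) / Z for e in E]   # diagonal of the density matrix w
F0 = [[0.2, 0.5, 0.1],                   # Schrodinger-picture F (real symmetric)
      [0.5, -0.1, 0.3],
      [0.1, 0.3, 0.4]]
H = [[E[i] if i == j else 0.0 for j in range(N)] for i in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def comm(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(N)] for i in range(N)]


def heisenberg(t):
    """F(t) = exp(+iHt) F exp(-iHt), Eq. (94); for diagonal H, F_nn'(t) = F_nn' e^{i(E_n - E_n')t}."""
    return [[F0[i][j] * cmath.exp(1j * (E[i] - E[j]) * t) for j in range(N)] for i in range(N)]


def trace_form(t1, t2):
    """Tr{w [F(t1), [H, F(t2)]]}, the first line of Eq. (104)."""
    m = comm(heisenberg(t1), comm(H, heisenberg(t2)))
    return sum(W[i] * m[i][i] for i in range(N))


def closed_form(t1, t2):
    """Last line of Eq. (104): -sum W_n E~ |F_nn'|^2 (e^{iE~(t1-t2)} + c.c.)."""
    return -sum(W[n] * (E[n] - E[m]) * abs(F0[n][m]) ** 2 * 2 * math.cos((E[n] - E[m]) * (t1 - t2))
                for n in range(N) for m in range(N))


print(trace_form(0.3, -1.2), closed_form(0.3, -1.2))
```

The diagonal elements of the force matrix are deliberately nonzero here: they contribute terms with $\tilde E = 0$, which drop out of both sides.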
Now, if we try to integrate each term of this sum, as Eq. (103) seems to require, we will see that the
lower-limit substitution (at $t', t'' \to -\infty$) is uncertain because the exponents oscillate without decay. This
mathematical difficulty may be overcome by the following physical reasoning. As illustrated by the
example considered in the previous section, coupling to a disordered environment makes the “memory
horizon” of the system of our interest (s) finite: its current state does not depend on its history beyond a
certain time scale.³⁵ As a result, the function under the integrals of Eq. (103), i.e. the sum (104), should


35 Actually, this is true for virtually any real physical system — in contrast to idealized models such as a 
dissipation-free oscillator that swings for ever and ever with the same amplitude and phase, thus “remembering” 
the initial conditions. 
self-average at a certain finite time. A simplistic technique for expressing this fact mathematically is just
dropping the lower-limit substitution; this would give the correct result for Eq. (103). However, a better
(mathematically more acceptable) trick is to first multiply the functions under the integrals by,
respectively, $\exp\{\varepsilon(t'-t)\}$ and $\exp\{\varepsilon(t''-t')\}$, where $\varepsilon$ is a very small positive constant, then carry out
the integration, and after that take the limit $\varepsilon \to 0$. The physical justification of this procedure may be
provided by saying that the system’s behavior should not be affected if its interaction with the
environment were not kept constant but rather turned on gradually — say, exponentially with an
infinitesimal rate $\varepsilon$. With this modification, Eq. (103) becomes


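$$\left\langle \hat F(t)\right\rangle = -\frac{1}{\hbar^2}\sum_{n,n'} W_n\tilde E\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}\int_{-\infty}^{t}dt'\int_{-\infty}^{t'}dt''\,\hat x(t'')\left[\exp\left\{\frac{i\tilde E(t'-t'')}{\hbar}+\varepsilon(t''-t)\right\}+\text{c.c.}\right]. \qquad (7.105)$$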


This double integration is over the area shaded in Fig. 6, which makes it obvious that the order of 
integration may be changed to the opposite one as 


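$$\int_{-\infty}^{t}dt'\int_{-\infty}^{t'}dt''\ldots = \int_{-\infty}^{t}dt''\int_{t''}^{t}dt'\ldots = \int_{-\infty}^{t}dt''\int_{t''-t}^{0}d(t'-t)\ldots = \int_{-\infty}^{t}dt''\int_{0}^{\tau}d\tau'\ldots, \qquad (7.106)$$

where $\tau' \equiv t - t'$, and $\tau \equiv t - t''$.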


Fig. 7.6. The 2D integration area in Eqs. (105) and (106).


As a result, Eq. (105) may be rewritten as a single integral,


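$$\left\langle \hat F(t)\right\rangle = \int_{-\infty}^{t}G(t-t'')\,\hat x(t'')\,dt'' \equiv \int_{0}^{\infty}G(\tau)\,\hat x(t-\tau)\,d\tau, \qquad (7.107)$$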


whose kernel, 


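$$G(\tau) = -\frac{1}{\hbar^2}\sum_{n,n'} W_n\tilde E\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}\int_0^{\tau}\left[\exp\left\{\frac{i\tilde E(\tau-\tau')}{\hbar}\right\}+\text{c.c.}\right]e^{-\varepsilon\tau}d\tau' = -\frac{2}{\hbar}\sum_{n,n'} W_n\left|F_{nn'}\right|^2\sin\frac{\tilde E\tau}{\hbar}, \qquad (7.108)$$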


does not depend on the particular law of evolution of the system (s) under study, i.e. provides a general 
characterization of its coupling to the environment. 


In Eq. (107) we may readily recognize the most general form of the linear response of a system
(in our case, the environment), taking into account the causality principle, where $G(\tau)$ is the response
function (also called the “temporal Green’s function”) of the environment. Now comparing Eq. (108)
with Eq. (99), we get a wonderfully simple universal relation,
$$\left\langle\left[\hat{\tilde F}(0),\hat{\tilde F}(\tau)\right]\right\rangle = i\hbar G(\tau), \qquad (7.109)$$


that emphasizes once again the quantum nature of the correlation function’s time asymmetry. (This 
relation, called the Green-Kubo (or just “Kubo”) formula after the works by Melville Green (1954) and 
Ryogo Kubo (1957), does not come up in the easier derivations of the FDT, mentioned in the beginning 
of this section.) 
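The Green-Kubo relation (109) can likewise be spot-checked on a toy model. The sketch below is an illustration under stated assumptions (a hypothetical three-level environment, $\hbar = k_B = 1$, and the commutator ordering $\langle[\tilde F(0), \tilde F(\tau)]\rangle$): it evaluates the kernel (108) both from its $\tau'$-integral, by a midpoint rule, and from its closed form, and compares $i\hbar G(\tau)$ with the commutator assembled from Eq. (97).

```python
import cmath
import math

HBAR = 1.0
E = [0.0, 0.9, 1.7]                      # hypothetical level energies
T = 0.6                                  # temperature, k_B = 1
Z = sum(math.exp(-e / T) for e in E)
W = [math.exp(-e / T) / Z for e in E]
F = [[0.0, 0.5, 0.2], [0.5, 0.0, 0.35], [0.2, 0.35, 0.0]]   # real symmetric F_nn'
PAIRS = [(n, m) for n in range(len(E)) for m in range(len(E))]


def g_closed(tau):
    """Closed form of the kernel (108): -(2/hbar) sum W_n |F_nn'|^2 sin(E~ tau / hbar)."""
    return -2.0 / HBAR * sum(W[n] * F[n][m] ** 2 * math.sin((E[n] - E[m]) * tau / HBAR)
                             for n, m in PAIRS)


def g_integral(tau, steps=2000):
    """First form of (108) at eps -> 0, with the tau' integral done by the midpoint rule."""
    h = tau / steps
    total = 0.0
    for n, m in PAIRS:
        de = E[n] - E[m]
        integral = h * sum(2.0 * math.cos(de * (tau - (k + 0.5) * h) / HBAR) for k in range(steps))
        total += W[n] * de * F[n][m] ** 2 * integral
    return -total / HBAR ** 2


def commutator_0_tau(tau):
    """<[F(0), F(tau)]> = K_F^+(-tau) - K_F^+(tau), with K_F^+ from Eq. (97)."""
    def k_plus(s):
        return sum(W[n] * F[n][m] ** 2 * cmath.exp(1j * (E[n] - E[m]) * s / HBAR) for n, m in PAIRS)
    return k_plus(-tau) - k_plus(tau)


for tau in (0.4, 1.3):
    print(tau, g_integral(tau), g_closed(tau), commutator_0_tau(tau) / (1j * HBAR))
```

As a further sanity check, $G(\tau)$ comes out positive at small $\tau$ for this thermal toy model, consistent with a positive static susceptibility $\chi(0) = \int_0^\infty G(\tau)d\tau$.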


However, for us the relation between the function $G(\tau)$ and the force’s anti-commutator,


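$$\left\langle\left\{\hat{\tilde F}(t+\tau),\hat{\tilde F}(t)\right\}\right\rangle \equiv \left\langle\hat{\tilde F}(t+\tau)\hat{\tilde F}(t)+\hat{\tilde F}(t)\hat{\tilde F}(t+\tau)\right\rangle \equiv K_F^+(\tau)+K_F^-(\tau), \qquad (7.110)$$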


is much more important, for the following reason. Eqs. (97)-(98) show that the so-called
symmetrized correlation function, 


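$$K_F(\tau) \equiv \frac{K_F^+(\tau)+K_F^-(\tau)}{2} = \frac{1}{2}\left\langle\left\{\hat{\tilde F}(\tau),\hat{\tilde F}(0)\right\}\right\rangle = \sum_{n,n'} W_n\left|F_{nn'}\right|^2\cos\frac{\tilde E\tau}{\hbar}, \qquad (7.111)$$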


which is an even function of the time difference $\tau$, looks very similar to the response function (108),
“only” with another trigonometric function under the sum, and a constant front factor.³⁶ This similarity
may be used to obtain a direct algebraic relation between the Fourier images of these two functions of $\tau$.
Indeed, the function (111) may be represented as the Fourier integral³⁷


$$K_F(\tau) = \int_{-\infty}^{+\infty}S_F(\omega)e^{-i\omega\tau}d\omega = 2\int_0^{+\infty}S_F(\omega)\cos\omega\tau\,d\omega, \qquad (7.112)$$


with the reciprocal transform 


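$$S_F(\omega) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}K_F(\tau)e^{i\omega\tau}d\tau = \frac{1}{\pi}\int_0^{+\infty}K_F(\tau)\cos\omega\tau\,d\tau, \qquad (7.113)$$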


of the symmetrized spectral density of the variable F, defined as 


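$$S_F(\omega)\,\delta(\omega-\omega') = \frac{1}{2}\left\langle\left\{\hat F_\omega,\hat F_{-\omega'}\right\}\right\rangle \equiv \frac{1}{2}\left\langle\hat F_\omega\hat F_{-\omega'}+\hat F_{-\omega'}\hat F_\omega\right\rangle, \qquad (7.114)$$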


where the function $\hat F_\omega$ (also a Heisenberg operator rather than a c-number!) is defined as


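$$\hat F_\omega \equiv \frac{1}{2\pi}\int_{-\infty}^{+\infty}\hat{\tilde F}(t)e^{i\omega t}dt, \quad \text{so that} \quad \hat{\tilde F}(t) = \int_{-\infty}^{+\infty}\hat F_\omega e^{-i\omega t}d\omega. \qquad (7.115)$$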


The physical meaning of the function $S_F(\omega)$ becomes clear if we write Eq. (112) for the
particular case $\tau = 0$:


36 For the heroic reader who has suffered through the calculations up to this point: our conceptual work is done! 
What remains is just some simple math to bring the relation between Eqs. (108) and (111) to an explicit form. 

37 Due to their practical importance, and certain mathematical issues of their justification for random functions, 
Eqs. (112)-(113) have their own grand name, the Wiener-Khinchin theorem, though, math rigor aside, they are
just a straightforward corollary of the standard Fourier integral transform (115). 
$$K_F(0) = \left\langle\hat{\tilde F}^2\right\rangle = \int_{-\infty}^{+\infty}S_F(\omega)d\omega = 2\int_0^{+\infty}S_F(\omega)d\omega. \qquad (7.116)$$


This formula implies that if we pass the function $\tilde F(t)$ through a linear filter cutting from its frequency
spectrum a narrow band $d\omega$ of physical (positive) frequencies, then the variance $\langle\tilde F_f^2\rangle$ of the filtered
signal $\tilde F_f(t)$ would be equal to $2S_F(\omega)d\omega$ — hence the name “spectral density”.³⁸
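As a quick numerical illustration of Eqs. (113) and (116), the sketch below assumes a hypothetical exponentially decaying correlator $K_F(\tau) = K_0 e^{-|\tau|/\tau_c}$, a model chosen only because its transform is known in closed form, $S_F(\omega) = K_0\tau_c/\pi(1+\omega^2\tau_c^2)$. It computes $S_F(\omega)$ by direct quadrature and checks that $2\int_0^\infty S_F(\omega)d\omega$ returns the variance $K_F(0)$.

```python
import math

K0, TAU_C = 1.0, 0.5   # hypothetical variance <F^2> and correlation time


def k_f(tau):
    """Assumed symmetrized correlation function K_F(tau) = K0 exp(-|tau|/tau_c)."""
    return K0 * math.exp(-abs(tau) / TAU_C)


def s_numeric(omega, tau_max=15.0, steps=100000):
    """S_F(w) = (1/pi) int_0^inf K_F(tau) cos(w tau) dtau, Eq. (113), by the midpoint rule."""
    h = tau_max / steps
    return h / math.pi * sum(k_f((k + 0.5) * h) * math.cos(omega * (k + 0.5) * h)
                             for k in range(steps))


def s_exact(omega):
    """Closed-form transform of the assumed K_F: K0 tau_c / (pi (1 + w^2 tau_c^2))."""
    return K0 * TAU_C / (math.pi * (1.0 + (omega * TAU_C) ** 2))


def variance_from_spectrum(omega_max=2000.0, steps=200000):
    """2 int_0^omega_max S_F(w) dw, which should approach K_F(0) as in Eq. (116)."""
    h = omega_max / steps
    return 2.0 * h * sum(s_exact((k + 0.5) * h) for k in range(steps))


for w in (0.0, 1.0, 3.0):
    print(w, s_numeric(w), s_exact(w))
print(variance_from_spectrum(), k_f(0.0))
```

The small shortfall of the last integral comes from truncating the slowly decaying Lorentzian tail of $S_F(\omega)$ at a finite frequency.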


Let us use Eqs. (111) and (113) to calculate the spectral density of fluctuations $\tilde F(t)$ in our
model, using the same $\varepsilon$-trick as in the derivation of Eq. (108), to quench the substitutions at the infinite limits:


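$$
\begin{aligned}
S_F(\omega) &= \frac{1}{2\pi}\sum_{n,n'} W_n\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}\int_{-\infty}^{+\infty}\cos\frac{\tilde E\tau}{\hbar}\,e^{-\varepsilon|\tau|}e^{i\omega\tau}d\tau \\
&= \frac{1}{4\pi}\sum_{n,n'} W_n\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}\left[\frac{1}{i(\tilde E/\hbar+\omega)+\varepsilon}-\frac{1}{i(\tilde E/\hbar+\omega)-\varepsilon}+\frac{1}{i(-\tilde E/\hbar+\omega)+\varepsilon}-\frac{1}{i(-\tilde E/\hbar+\omega)-\varepsilon}\right].
\end{aligned}
\qquad (7.117)
$$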


Now it is a convenient time to recall that each of the two summations here is over the eigenenergies of 
the environment, whose spectrum is virtually continuous because of its large size, so that we may 
transform each sum into an integral — just as this was done in Sec. 6.6: 


$$\sum_{n}\ldots \to \int\ldots\rho(E_n)\,dE_n, \qquad (7.118)$$


where $\rho(E) \equiv dn/dE$ is the environment’s density of states at a given energy. This transformation yields


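$$S_F(\omega) = \frac{1}{4\pi}\lim_{\varepsilon\to0}\int dE_n\,W(E_n)\rho(E_n)\int dE_{n'}\,\rho(E_{n'})\left|F_{nn'}\right|^2\left[\frac{1}{i(\tilde E/\hbar+\omega)+\varepsilon}-\frac{1}{i(\tilde E/\hbar+\omega)-\varepsilon}+\frac{1}{i(-\tilde E/\hbar+\omega)+\varepsilon}-\frac{1}{i(-\tilde E/\hbar+\omega)-\varepsilon}\right]. \qquad (7.119)$$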
Since the expression inside the square bracket depends only on a specific linear combination of two
energies, namely on $\tilde E \equiv E_n - E_{n'}$, it is convenient to introduce also another, linearly-independent
combination of the energies, for example, the average energy $\bar E \equiv (E_n + E_{n'})/2$, so that the state energies
may be represented as


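$$E_n = \bar E + \frac{\tilde E}{2}, \qquad E_{n'} = \bar E - \frac{\tilde E}{2}. \qquad (7.120)$$

With this notation, Eq. (119) becomes

$$
\begin{aligned}
S_F(\omega) = \frac{\hbar}{4\pi}\lim_{\varepsilon\to0}\int d\bar E\int d\tilde E\; W\!\left(\bar E+\frac{\tilde E}{2}\right)\rho\!\left(\bar E+\frac{\tilde E}{2}\right)\rho\!\left(\bar E-\frac{\tilde E}{2}\right)\left|F_{nn'}\right|^2 \Bigg[ &\frac{1}{i(\tilde E+\hbar\omega)+\hbar\varepsilon}-\frac{1}{i(\tilde E+\hbar\omega)-\hbar\varepsilon} \\
&+\frac{1}{i(-\tilde E+\hbar\omega)+\hbar\varepsilon}-\frac{1}{i(-\tilde E+\hbar\omega)-\hbar\varepsilon}\Bigg].
\end{aligned}
\qquad (7.121)
$$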


38 An alternative popular measure of the spectral density of a process $F(t)$ is $S_F(\nu) \equiv \langle F_f^2\rangle/d\nu = 4\pi S_F(\omega)$, where
$\nu = \omega/2\pi$ is the “cyclic” frequency (measured in Hz).
Due to the smallness of the parameter $\hbar\varepsilon$ (which should be much smaller than all genuine energies of
the problem, including $k_BT$, $\hbar\omega$, $E_n$, and $E_{n'}$), each of the internal integrals in Eq. (121) is dominated by
an infinitesimal vicinity of one point, $\tilde E_\pm = \pm\hbar\omega$. In these vicinities, the state densities, the matrix
elements, and the Gibbs probabilities do not change considerably, and may be taken out of the integral,
which may then be worked out explicitly:³⁹


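$$
\begin{aligned}
S_F(\omega) &= \frac{\hbar}{4\pi}\lim_{\varepsilon\to0}\int d\bar E\,\rho_+\rho_-\Bigg\{W_+\left|F_+\right|^2\int d\tilde E\left[\frac{1}{i(-\tilde E+\hbar\omega)+\hbar\varepsilon}-\frac{1}{i(-\tilde E+\hbar\omega)-\hbar\varepsilon}\right] \\
&\qquad + W_-\left|F_-\right|^2\int d\tilde E\left[\frac{1}{i(\tilde E+\hbar\omega)+\hbar\varepsilon}-\frac{1}{i(\tilde E+\hbar\omega)-\hbar\varepsilon}\right]\Bigg\} \\
&= \frac{\hbar}{4\pi}\int d\bar E\,\rho_+\rho_-\left[W_+\left|F_+\right|^2+W_-\left|F_-\right|^2\right]\cdot 2\pi \equiv \frac{\hbar}{2}\int\rho_+\rho_-\left[W_+\left|F_+\right|^2+W_-\left|F_-\right|^2\right]d\bar E,
\end{aligned}
\qquad (7.122)
$$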


where the indices ± mark the functions’ values at the special points $\tilde E_\pm = \pm\hbar\omega$, i.e. $E_{n'} = E_n \mp \hbar\omega$. The
physics of these points becomes simple if we interpret the state $n$, for which the equilibrium Gibbs
distribution function equals $W_n$, as the initial state of the environment, and $n'$ as its final state. Then the
top-sign point corresponds to $E_{n'} = E_n - \hbar\omega$, i.e. to the result of emission of one energy quantum $\hbar\omega$ of
the “observation” frequency $\omega$ by the environment to the system s of our interest, while the bottom-sign
point, $E_{n'} = E_n + \hbar\omega$, corresponds to the absorption of such a quantum by the environment. As Eq. (122)
shows, both processes give similar, positive contributions to the force fluctuations.


The situation is different for the Fourier image of the response function $G(\tau)$,⁴⁰


$$\chi(\omega) \equiv \int_0^{+\infty}G(\tau)e^{i\omega\tau}d\tau, \qquad (7.123)$$


that is usually called either the generalized susceptibility or the response function — in our case, of the
environment. Its physical meaning is that according to Eq. (107), the complex function $\chi(\omega) = \chi'(\omega) + i\chi''(\omega)$
relates the Fourier amplitudes of the generalized coordinate and the generalized force:⁴¹


$$\left\langle\hat F_\omega\right\rangle = \chi(\omega)\,\hat x_\omega. \qquad (7.124)$$


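The physics of its imaginary part $\chi''(\omega)$ is especially clear. Indeed, if $x_\omega$ represents a sinusoidal classical
process, say

$$x(t) = x_0\cos\omega t \equiv \frac{x_0}{2}e^{-i\omega t}+\frac{x_0}{2}e^{+i\omega t}, \quad \text{i.e.} \quad x_\omega = x_{-\omega} = \frac{x_0}{2}, \qquad (7.125)$$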


39 Using, e.g., MA Eq. (6.5a). (The imaginary parts of the integrals vanish, because the integration in infinite
limits may always be re-centered to the finite points $\pm\hbar\omega$.) A math-enlightened reader may have noticed that the
integrals might be taken without the introduction of small $\varepsilon$, using the Cauchy theorem — see MA Eq. (15.1).

40 The integration in Eq. (123) may be extended to the whole time axis, $-\infty < \tau < +\infty$, if we complement the
definition (107) of the function $G(\tau)$ for $\tau > 0$ with its definition as $G(\tau) = 0$ for $\tau < 0$, in correspondence with the
causality principle.

41 In order to prove this relation, it is sufficient to plug the expression $x_s = x_\omega e^{-i\omega t}$, or any sum of such exponents,
into Eq. (107) and then use the definition (123). This (simple) exercise is highly recommended to the reader.
then, in accordance with the correspondence principle, Eq. (124) should hold for the c-number complex
amplitudes $F_\omega$ and $x_\omega$, enabling us to calculate the time dependence of the force as


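$$
\begin{aligned}
F(t) &= F_\omega e^{-i\omega t}+F_{-\omega}e^{+i\omega t} = \chi(\omega)x_\omega e^{-i\omega t}+\chi(-\omega)x_{-\omega}e^{+i\omega t} = \frac{x_0}{2}\left[\chi(\omega)e^{-i\omega t}+\chi^*(\omega)e^{+i\omega t}\right] \\
&= \frac{x_0}{2}\left[(\chi'+i\chi'')e^{-i\omega t}+(\chi'-i\chi'')e^{+i\omega t}\right] = x_0\left[\chi'(\omega)\cos\omega t+\chi''(\omega)\sin\omega t\right].
\end{aligned}
\qquad (7.126)
$$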
We see that $\chi''(\omega)$ weighs the force’s part (frequently called quadrature) that is $\pi/2$-shifted from the
coordinate $x$, i.e. is in phase with its velocity, and hence characterizes the time-average power flow from
the system into its environment, i.e. the energy dissipation rate:⁴²


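$$\bar{\mathcal P} = -\overline{F(t)\dot x(t)} = -\overline{x_0\left[\chi'(\omega)\cos\omega t+\chi''(\omega)\sin\omega t\right]\left(-\omega x_0\sin\omega t\right)} = \frac{x_0^2}{2}\,\omega\chi''(\omega). \qquad (7.127)$$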
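Eq. (127) can be confirmed by brute-force time averaging. The sketch below uses arbitrary illustrative values of $\chi'$, $\chi''$, $\omega$, and $x_0$ (none of them from the text) and averages $-F(t)\dot x(t)$ over one period with the force (126); only the $\chi''$ term survives the averaging.

```python
import math


def mean_power(chi1, chi2, omega, x0, steps=10000):
    """Period average of -F(t) dx/dt for x = x0 cos(wt) and F from Eq. (126)."""
    period = 2 * math.pi / omega
    h = period / steps
    acc = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        force = x0 * (chi1 * math.cos(omega * t) + chi2 * math.sin(omega * t))
        velocity = -omega * x0 * math.sin(omega * t)
        acc += -force * velocity
    return acc * h / period


chi1, chi2, omega, x0 = 0.7, 0.3, 2.0, 1.5   # arbitrary illustrative values
print(mean_power(chi1, chi2, omega, x0), x0 ** 2 * omega * chi2 / 2)
```

For a sum over equally spaced points spanning a full period, the $\cos\omega t\sin\omega t$ terms cancel exactly, so the in-phase part $\chi'$ does no average work.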


Let us calculate this function from Eqs. (108) and (123), just as we have done for the spectral 
density of fluctuations: 


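$$
\begin{aligned}
\chi''(\omega) &= \mathrm{Im}\int_0^{+\infty}G(\tau)e^{i\omega\tau}d\tau = -\frac{2}{\hbar}\sum_{n,n'} W_n\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}\mathrm{Im}\int_0^{+\infty}\sin\frac{\tilde E\tau}{\hbar}\,e^{i\omega\tau-\varepsilon\tau}d\tau \\
&= \sum_{n,n'} W_n\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}\mathrm{Im}\left[\frac{1}{-\tilde E-\hbar\omega-i\hbar\varepsilon}-\frac{1}{\tilde E-\hbar\omega-i\hbar\varepsilon}\right] \\
&= \sum_{n,n'} W_n\left|F_{nn'}\right|^2\lim_{\varepsilon\to0}\left[\frac{\hbar\varepsilon}{(\tilde E+\hbar\omega)^2+(\hbar\varepsilon)^2}-\frac{\hbar\varepsilon}{(\tilde E-\hbar\omega)^2+(\hbar\varepsilon)^2}\right].
\end{aligned}
\qquad (7.128)
$$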


Making the transfer (118) from the double sum to the double integral, and then the integration variable 
transfer (120), we get 


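$$
\begin{aligned}
\chi''(\omega) = \lim_{\varepsilon\to0}\Bigg[ &\int d\bar E\int d\tilde E\; W\!\left(\bar E+\frac{\tilde E}{2}\right)\rho\!\left(\bar E+\frac{\tilde E}{2}\right)\rho\!\left(\bar E-\frac{\tilde E}{2}\right)\left|F_{nn'}\right|^2\frac{\hbar\varepsilon}{(\tilde E+\hbar\omega)^2+(\hbar\varepsilon)^2} \\
&-\int d\bar E\int d\tilde E\; W\!\left(\bar E+\frac{\tilde E}{2}\right)\rho\!\left(\bar E+\frac{\tilde E}{2}\right)\rho\!\left(\bar E-\frac{\tilde E}{2}\right)\left|F_{nn'}\right|^2\frac{\hbar\varepsilon}{(\tilde E-\hbar\omega)^2+(\hbar\varepsilon)^2}\Bigg].
\end{aligned}
\qquad (7.129)
$$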


Now using the same argument about the smallness of parameter $\varepsilon$ as above, we may take the state
densities, the matrix elements of force, and the Gibbs probabilities out of the integrals, and work out the
remaining integrals, getting a result very similar to Eq. (122):


$$\chi''(\omega) = \pi\int\rho_+\rho_-\left[W_-\left|F_-\right|^2-W_+\left|F_+\right|^2\right]d\bar E. \qquad (7.130)$$


42 The sign minus in Eq. (127) is due to the fact that according to Eq. (90), $F$ is the force exerted on our system (s)
by the environment, so that the force exerted by our system on the environment is $-F$. With this sign clarification,
the expression $\mathcal P = -F\dot x = -Fv$ for the instant power flow is evident if $x$ is the usual Cartesian coordinate of a
1D particle. However, according to analytical mechanics (see, e.g., CM Chapters 2 and 10), it is also valid for any
{generalized coordinate, generalized force} pair which forms the interaction Hamiltonian (90).
In order to relate these two results, it is sufficient to notice that according to Eq. (24), the Gibbs
probabilities $W_\pm$ are related by a coefficient depending on only the temperature $T$ and observation
frequency $\omega$:


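$$W_\pm \equiv W\!\left(\bar E\pm\frac{\hbar\omega}{2}\right) = \frac{1}{Z}\exp\left\{-\frac{\bar E\pm\hbar\omega/2}{k_BT}\right\} = W(\bar E)\exp\left\{\mp\frac{\hbar\omega}{2k_BT}\right\}, \qquad (7.131)$$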


so that both the spectral density (122) and the dissipative part (130) of the generalized susceptibility 
may be expressed via the same integral over the environment energies: 


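$$S_F(\omega) = \frac{\hbar}{2}\cosh\frac{\hbar\omega}{2k_BT}\int\rho_+\rho_-W(\bar E)\left[\left|F_+\right|^2+\left|F_-\right|^2\right]d\bar E, \qquad (7.132)$$

$$\chi''(\omega) = \pi\sinh\frac{\hbar\omega}{2k_BT}\int\rho_+\rho_-W(\bar E)\left[\left|F_+\right|^2+\left|F_-\right|^2\right]d\bar E, \qquad (7.133)$$

and hence are universally related as

$$S_F(\omega) = \frac{\hbar}{2\pi}\chi''(\omega)\coth\frac{\hbar\omega}{2k_BT}. \qquad (7.134)$$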


This is, finally, the much-celebrated Callen-Welton’s fluctuation-dissipation theorem (FDT). It 
reveals a fundamental, intimate relationship between these two effects of the environment (“no 
dissipation without fluctuation”) — hence the name. A curious feature of the FDT is that Eq. (134) 
includes the same function of temperature as the average energy (26) of a quantum oscillator of 
frequency $\omega$, though, as the reader could witness, the notion of the oscillator was by no means used in its
derivation. As we will see in the next section, this fact leads to rather interesting consequences and even
conceptual opportunities. 
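For practical estimates it may help to have Eqs. (134) and (135) as small functions. The sketch below uses SI values of $\hbar$ and $k_B$; the helper names are hypothetical, and $\chi''$ is simply passed in as a number. It also makes the remark about the oscillator explicit: the FDT factor $(\hbar/2\pi)\coth(\hbar\omega/2k_BT)$ equals $\langle E\rangle/\pi\omega$, where $\langle E\rangle = (\hbar\omega/2)\coth(\hbar\omega/2k_BT)$ is the average energy of a quantum oscillator.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
K_B = 1.380649e-23       # Boltzmann constant, J/K


def s_fdt(chi2, omega, temp):
    """Symmetrized spectral density from the FDT, Eq. (134)."""
    return HBAR / (2 * math.pi) * chi2 / math.tanh(HBAR * omega / (2 * K_B * temp))


def s_classical(chi2, omega, temp):
    """Classical limit, Eq. (135), valid for hbar*omega << k_B*T."""
    return K_B * temp / math.pi * chi2 / omega


def oscillator_energy(omega, temp):
    """Average energy of a quantum oscillator, (hbar w / 2) coth(hbar w / 2 k_B T)."""
    return HBAR * omega / 2 / math.tanh(HBAR * omega / (2 * K_B * temp))


chi2 = 1e-3                                          # arbitrary chi'' value
w_rf, w_thz = 2 * math.pi * 1e6, 2 * math.pi * 1e12
print(s_fdt(chi2, w_rf, 300.0) / s_classical(chi2, w_rf, 300.0))   # close to 1: classical limit
print(s_fdt(chi2, w_thz, 0.01) / (HBAR * chi2 / (2 * math.pi)))    # close to 1: zero-point limit
```

In the second case $\hbar\omega \gg k_BT$, so the hyperbolic cotangent saturates at 1 and the spectral density reduces to its temperature-independent quantum value $\hbar\chi''/2\pi$.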


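In the classical limit, $\hbar\omega \ll k_BT$, the FDT is reduced to

$$S_F(\omega) = \frac{\hbar}{2\pi}\chi''(\omega)\frac{2k_BT}{\hbar\omega} \equiv \frac{k_BT}{\pi}\frac{\mathrm{Im}\,\chi(\omega)}{\omega}. \qquad (7.135)$$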
In most systems of interest, the last fraction is close to a finite (positive) constant within a substantial 
range of relatively low frequencies. Indeed, expanding the right-hand side of Eq. (123) into the Taylor 
series in small @, we get 


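$$\chi(\omega) = \chi(0)+i\omega\eta+\ldots, \quad \text{with} \quad \chi(0) = \int_0^{+\infty}G(\tau)d\tau, \quad \text{and} \quad \eta \equiv \int_0^{+\infty}G(\tau)\tau\,d\tau. \qquad (7.136)$$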


Since the temporal Green’s function $G$ is real by definition, the Taylor expansion of $\chi''(\omega) \equiv \mathrm{Im}\,\chi(\omega)$ at
$\omega = 0$ starts with the linear term $\omega\eta$, where $\eta$ is a certain real coefficient, and unless $\eta = 0$, is dominated
by this term at small $\omega$. The physical sense of the constant $\eta$ becomes clear if we consider an
environment that provides a force described by a simple, well-known kinematic friction law


$$\left\langle\hat F\right\rangle = -\eta\dot{\hat x}, \quad \text{with} \quad \eta > 0, \qquad (7.137)$$


where $\eta$ is usually called the drag coefficient. For the Fourier images of coordinate and force, this gives
the relation $F_\omega = i\omega\eta x_\omega$, so that according to Eq. (124),
$$\chi(\omega) = i\omega\eta, \quad \text{i.e.} \quad \frac{\chi''(\omega)}{\omega} \equiv \frac{\mathrm{Im}\,\chi(\omega)}{\omega} = \eta \geq 0. \qquad (7.138)$$
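The low-frequency statement behind Eqs. (136) and (138) can be illustrated with a model response function. The sketch below assumes a hypothetical Debye-type kernel $G(\tau) = (\chi_0/\tau_c)e^{-\tau/\tau_c}$ (not taken from the text), for which $\chi(\omega) = \chi_0/(1-i\omega\tau_c)$; it computes $\chi(\omega)$ from Eq. (123) by quadrature and checks that $\mathrm{Im}\,\chi(\omega)/\omega$ at small $\omega$ approaches $\eta = \int_0^\infty G(\tau)\tau\,d\tau = \chi_0\tau_c$.

```python
import cmath
import math

CHI0, TAU_C = 2.0, 0.3   # hypothetical static susceptibility and relaxation time


def g(tau):
    """Assumed Debye-type response function G(tau) for tau > 0."""
    return CHI0 / TAU_C * math.exp(-tau / TAU_C)


def chi(omega, tau_max=12.0, steps=120000):
    """chi(w) = int_0^inf G(tau) exp(i w tau) dtau, Eq. (123), by the midpoint rule."""
    h = tau_max / steps
    return h * sum(g((k + 0.5) * h) * cmath.exp(1j * omega * (k + 0.5) * h) for k in range(steps))


def eta_from_g(tau_max=12.0, steps=120000):
    """eta = int_0^inf G(tau) tau dtau, the linear Taylor coefficient in Eq. (136)."""
    h = tau_max / steps
    return h * sum(g((k + 0.5) * h) * (k + 0.5) * h for k in range(steps))


w_small = 0.01
print(eta_from_g(), chi(w_small).imag / w_small, CHI0 * TAU_C)
```

For this model $\mathrm{Im}\,\chi(\omega)/\omega = \chi_0\tau_c/(1+\omega^2\tau_c^2)$, so the Ohmic form (138) holds only for $\omega\tau_c \ll 1$, which illustrates the “frequency scale of the function $\chi''(\omega)/\omega$” mentioned below.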


With this approximation, and in the classical limit, the FDT (134) is reduced to the well-known Nyquist
formula:⁴³


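$$S_F(\omega) = \frac{k_BT}{\pi}\eta, \quad \text{i.e.} \quad \left\langle\tilde F_f^2\right\rangle = 4k_BT\eta\,d\nu. \qquad (7.139)$$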
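As a numerical illustration of Eq. (139) in its Johnson-Nyquist form (see footnote 43, with $\eta \to R$ for the open-circuit voltage), the sketch below estimates the rms noise voltage of a resistor. The resistance, temperature, and bandwidth are arbitrary example values, and the estimate is valid only in the classical limit $\hbar\omega \ll k_BT$.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K


def johnson_vrms(resistance, temp, bandwidth):
    """RMS open-circuit noise voltage from <V_f^2> = 4 k_B T R dnu (Eq. (139) with eta -> R)."""
    return math.sqrt(4 * K_B * temp * resistance * bandwidth)


# Example values: a 1 kOhm resistor at room temperature, measured in a 1 MHz band.
v = johnson_vrms(1e3, 300.0, 1e6)
print(f"{v * 1e6:.2f} uV")   # about 4.07 uV
```

Because the spectral density is white, the variance grows linearly with the bandwidth, so the rms voltage scales as its square root.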


According to Eq. (112), if such a constant spectral density⁴⁴ persisted at all frequencies, it would
correspond to a delta-correlated process $\tilde F(t)$, with


$$K_F(\tau) = 2\pi S_F(0)\,\delta(\tau) = 2k_BT\eta\,\delta(\tau) \qquad (7.140)$$


— cf. Eqs. (82) and (83). Since in the classical limit the right-hand side of Eq. (109) is negligible, and the
correlation function may be considered an even function of time, the symmetrized function under the
integral in Eq. (113) may be rewritten just as $\langle\tilde F(\tau)\tilde F(0)\rangle$. In the limit of relatively low observation
frequencies (in the sense that $\omega$ is much smaller than not only the quantum frontier $k_BT/\hbar$ but also the
frequency scale of the function $\chi''(\omega)/\omega$), Eq. (138) may be used to recast Eq. (135) in the form⁴⁵


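$$\eta = \lim_{\omega\to0}\frac{\chi''(\omega)}{\omega} = \frac{1}{k_BT}\int_0^{+\infty}\left\langle\tilde F(\tau)\tilde F(0)\right\rangle d\tau. \qquad (7.141)$$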


To conclude this section, let me return for a minute to the questions formulated in our earlier
discussion of dephasing in the two-level model. In that problem, the dephasing time scale is $T_2 = 1/2D_\varphi$.
Hence the classical approach to the dephasing, used in Sec. 3, is adequate if $\hbar D_\varphi \ll k_BT$. Next, we may
identify the operators $\hat f$ and $\hat\sigma_z$ participating in Eq. (70) with, respectively, $-\hat F$ and $\hat x$ participating
in the general Eq. (90). Then the comparison of Eqs. (82), (89), and (140) yields


(7.142) 


43 Actually, the 1928 work by H. Nyquist was about the electronic noise in resistors, just discovered
experimentally by his Bell Labs colleague John Bertrand Johnson. For an Ohmic resistor, as the dissipative
“environment” of the electric circuit it is connected with, Eq. (137) is just Ohm’s law, and may be recast as
either $\langle V\rangle = -R(dQ/dt) = RI$, or $\langle I\rangle = -G(d\Phi/dt) = GV$. Thus for the voltage $V$ across an open circuit, $\eta$
corresponds to its resistance $R$, while for current $I$ in a short circuit, to its conductance $G = 1/R$. In this case, the
fluctuations described by Eq. (139) are referred to as the Johnson-Nyquist noise. (Because of this important
application, any model leading to Eq. (138) is commonly referred to as the Ohmic dissipation, even if the physical
nature of the variables $x$ and $F$ is quite different from voltage and current.)

44 A random process whose spectral density may be reasonably approximated by a constant is frequently called 
the white noise, because it is a random mixture of all possible sinusoidal components with equal weights, 
reminiscent of the spectral composition of the natural white light.

45 Note that in some fields (especially in physical kinetics and chemical physics), this particular limit of the 
Nyquist formula is called the Green-Kubo (or just “Kubo”) formula. However, in view of the FDT
development history (described above), it is much more reasonable to associate these names with Eq. (109) — as it 
is done in most fields of physics. 
so that, for the model described by Eq. (137) with a temperature-independent drag coefficient $\eta$, the rate
of dephasing by a classical environment is proportional to its temperature.


7.5. The Heisenberg-Langevin approach 


The fluctuation-dissipation theorem offers a very simple and efficient, though limited, approach
to the analysis of the system of interest (s in Fig. 1). The approach is to write the Heisenberg equations of
motion (4.199) for the relevant operators, which would now include the environmental force operator, and to
explore these equations using the Fourier transform and the Wiener-Khinchin theorem (112)-(113). This
approach to classical equations of motion is commonly associated with the name of Langevin,⁴⁶ so that
its extension to dynamics of Heisenberg-picture operators is frequently referred to as the Heisenberg-
Langevin (or “quantum Langevin”, or “Langevin-Lax”⁴⁷) approach to open system analysis.


Perhaps the best way to describe this method is to demonstrate how it works for the very 
important case of a 1D harmonic oscillator, so that the generalized coordinate x of Sec. 4 is just the 
oscillator’s coordinate. For the sake of simplicity, let us assume that the environment provides the 
simple Ohmic dissipation described by Eq. (137) — which is a very good approximation in many cases. 
As we already know from Chapter 5, the Heisenberg equations of motion for operators of coordinate and 
momentum of the oscillator, in the presence of an external force F(t), are 


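$$\dot{\hat x} = \frac{\hat p}{m}, \qquad \dot{\hat p} = -m\omega_0^2\hat x+\hat F, \qquad (7.143)$$

so that using Eqs. (92) and (137), we get

$$\dot{\hat x} = \frac{\hat p}{m}, \qquad \dot{\hat p} = -m\omega_0^2\hat x-\eta\dot{\hat x}+\hat{\tilde F}(t). \qquad (7.144)$$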


Combining Eqs. (144), we may write their system as a single differential equation

$$m\ddot{\hat x}+\eta\dot{\hat x}+m\omega_0^2\hat x = \hat{\tilde F}(t), \qquad (7.145)$$


which is similar to the well-known classical equation of motion of a damped oscillator under the effect
of an external force. In view of Eqs. (5.29) and (5.35), whose corollary the Ehrenfest theorem (5.36)
is, this may not look surprising, but please note again that the approach discussed in the previous section
justifies such a quantitative description of the drag force in quantum mechanics — necessarily in parallel
with the accompanying fluctuation force.


For the Fourier images of the operators, defined similarly to Eq. (115), Eq. (145) gives the 
following relation, 


46 A 1908 work by Paul Langevin was the first systematic development of Einstein’s ideas (1905) on the 
Brownian motion, using the random force language, as an alternative to Smoluchowski’s approach using the 
probability density language — see Sec. 6 below. 

47 Indeed, perhaps the largest credit for the extension of the Langevin approach to quantum systems belongs to 
Melvin J. Lax, whose work in the early 1960s was motivated mostly by quantum electronics applications — see, 
e.g., his monograph M. Lax, Fluctuation and Coherent Phenomena in Classical and Quantum Physics, Gordon 
and Breach, 1968, and references therein. 
$$\hat x_\omega = \frac{\hat F_\omega}{m(\omega_0^2-\omega^2)-i\eta\omega}, \qquad (7.146)$$


which should also be well known to the reader from the classical theory of forced oscillations.⁴⁸
However, since these Fourier components are still Heisenberg-picture operators, and their “values” for 
different @ generally do not commute, we have to tread carefully. The best way to proceed is to write a 
copy of Eq. (146) for frequency (-@’), and then combine these equations to form a symmetrical 
combination similar to that used in Eq. (114). The result is 


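$$\frac{1}{2}\left\langle\hat x_\omega\hat x_{-\omega'}+\hat x_{-\omega'}\hat x_\omega\right\rangle = \frac{1}{\left|m(\omega_0^2-\omega^2)-i\eta\omega\right|^2}\,\frac{1}{2}\left\langle\hat F_\omega\hat F_{-\omega'}+\hat F_{-\omega'}\hat F_\omega\right\rangle. \qquad (7.147)$$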
Since the spectral density definition similar to Eq. (114) is valid for any observable, in particular for x, 
Eq. (147) allows us to relate the symmetrized spectral densities of coordinate and force: 


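$$S_x(\omega) = \frac{S_F(\omega)}{\left|m(\omega_0^2-\omega^2)-i\eta\omega\right|^2} \equiv \frac{S_F(\omega)}{m^2(\omega_0^2-\omega^2)^2+(\eta\omega)^2}. \qquad (7.148)$$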
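Now using an analog of Eq. (116) for $x$, we can calculate the coordinate’s variance:

$$\left\langle x^2\right\rangle = K_x(0) = \int_{-\infty}^{+\infty}S_x(\omega)d\omega = 2\int_0^{+\infty}\frac{S_F(\omega)\,d\omega}{m^2(\omega_0^2-\omega^2)^2+(\eta\omega)^2}, \qquad (7.149)$$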


where now, in contrast to the notation used in Sec. 4, the sign $\langle\ldots\rangle$ means averaging over the usual
statistical ensemble of many systems of interest — in our current case, of many harmonic oscillators. 


If the coupling to the environment is so weak that the drag coefficient $\eta$ is small (in the sense
that the oscillator’s dimensionless Q-factor is large, $Q \equiv m\omega_0/\eta \gg 1$), this integral is dominated by the
resonance peak in a narrow vicinity, $|\omega-\omega_0| \equiv |\xi| \ll \omega_0$, of its resonance frequency, and we can take
the relatively smooth function $S_F(\omega)$ out of the integral, thus reducing it to a table form:⁴⁹
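$$
\begin{aligned}
\left\langle x^2\right\rangle &\approx 2S_F(\omega_0)\int_0^{+\infty}\frac{d\omega}{m^2(\omega_0^2-\omega^2)^2+(\eta\omega)^2} \approx 2S_F(\omega_0)\int_{-\infty}^{+\infty}\frac{d\xi}{(2m\omega_0\xi)^2+(\eta\omega_0)^2} \\
&= 2S_F(\omega_0)\frac{1}{(\eta\omega_0)^2}\int_{-\infty}^{+\infty}\frac{d\xi}{(2m\xi/\eta)^2+1} = 2S_F(\omega_0)\frac{1}{(\eta\omega_0)^2}\frac{\pi\eta}{2m} = \frac{\pi S_F(\omega_0)}{\eta m\omega_0^2}.
\end{aligned}
\qquad (7.150)
$$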


Taking into account the FDT (134) and Eq. (138), this gives⁵⁰


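$$\left\langle x^2\right\rangle = \frac{\pi}{\eta m\omega_0^2}\,\frac{\hbar}{2\pi}\eta\omega_0\coth\frac{\hbar\omega_0}{2k_BT} \equiv \frac{\hbar}{2m\omega_0}\coth\frac{\hbar\omega_0}{2k_BT}. \qquad (7.151)$$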


48 If necessary, see CM Sec. 5.1. 

49 See, e.g., MA Eq. (6.5a). 

50 Note that this calculation remains correct even if the dissipation’s dispersion law deviates from the Ohmic
model (138), provided that the drag coefficient $\eta$ is replaced with its effective value $\mathrm{Im}\,\chi(\omega_0)/\omega_0$, because the
effects of the environment are only felt, by the oscillator, at its oscillation frequency.
But this is exactly Eq. (48), which was derived in Sec. 2 from the Gibbs distribution, without any
explicit account of the environment — though keeping it in mind by using the notion of the
thermal-equilibrium ensemble.⁵¹
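The chain (149)-(151) can be checked by direct numerical integration. The sketch below assumes dimensionless units $\hbar = m = \omega_0 = 1$, the Ohmic susceptibility (138), and a hypothetical weak damping $\eta = 0.01$ (i.e. $Q = 100$); it integrates Eq. (149) with the FDT (134) and compares the result with Eq. (151). Agreement is expected only up to corrections of order $1/Q$, which is why the check uses a few-percent tolerance.

```python
import math


def x2_numeric(eta, kT, omega_max=6.0, steps=120000):
    """<x^2> from Eq. (149), with S_F from the FDT (134) and chi''(w) = eta*w, Eq. (138).

    Units: hbar = m = omega_0 = 1 (an assumption of this sketch); kT stands for k_B*T.
    """
    h = omega_max / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h                     # midpoint rule avoids w = 0
        s_f = eta * w / (2 * math.pi) / math.tanh(w / (2 * kT))
        total += s_f / ((1 - w * w) ** 2 + (eta * w) ** 2)
    return 2 * total * h


def x2_gibbs(kT):
    """Eq. (151) in the same units: (1/2) coth(1 / 2kT)."""
    return 0.5 / math.tanh(1 / (2 * kT))


for kT in (0.05, 0.7):
    print(kT, x2_numeric(0.01, kT), x2_gibbs(kT))
```

The grid step is chosen much smaller than the resonance width $\eta/m$, so the quadrature error is negligible compared with the $O(1/Q)$ off-resonance corrections.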


Notice that in the final form of Eq. (151) the coefficient $\eta$, which characterizes the oscillator-to-
environment interaction strength, has canceled! Does this mean that in Sec. 4 we toiled in vain? By no
means. First of all, the result (150), augmented by the FDT (134), has an important conceptual value.
For example, let us consider the low-temperature limit $k_BT \ll \hbar\omega_0$, where Eq. (151) is reduced to


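$$\left\langle x^2\right\rangle = \frac{\hbar}{2\pi}\eta\omega_0\,\frac{\pi}{\eta m\omega_0^2} = \frac{\hbar}{2m\omega_0}. \qquad (7.152)$$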


Let us ask a naive question: what exactly is the origin of this coordinate’s uncertainty? From the point of
view of the usual quantum mechanics of absolutely closed (Hamiltonian) systems, there is no doubt: this
non-vanishing variance of the coordinate is the result of the finite spatial extension of the ground-state
wavefunction (2.275), reflecting Heisenberg’s uncertainty relation — which in turn results from the fact
that the operators of coordinate and momentum do not commute. However, from the point of view of the
Heisenberg-Langevin equation (145), the variance (152) is an inalienable part of the oscillator’s
response to the fluctuation force $\tilde F(t)$ exerted by the environment at frequencies $\omega \approx \omega_0$. Though it is
impossible to refute the former, absolutely legitimate point of view, in many applications it is easier to
subscribe to the latter standpoint and treat the coordinate’s uncertainty as the result of the so-called
quantum noise of the environment, which, in equilibrium, obeys the FDT (134). This notion has
received numerous confirmations in experiments that did not include any oscillators with their own
frequencies $\omega_0$ close to the noise measurement frequency $\omega$.⁵²


The second advantage of the Heisenberg-Langevin approach is that it is possible to use Eq.
(148) to calculate the (experimentally measurable!) distribution $S_x(\omega)$, i.e. decompose the fluctuations
into their spectral components. This procedure is not restricted to the limit of small $\eta$ (i.e. of large $Q$);
for any damping, we may just plug the FDT (134) into Eq. (148). For example, let us have a look at the
so-called quantum diffusion. A free 1D particle, moving in a viscous medium providing it with the
Ohmic damping (137), may be considered as the particular case of a 1D harmonic oscillator (145), but
with $\omega_0 = 0$, so that combining Eqs. (134) and (149), we get


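$$\left\langle x^2\right\rangle = 2\int_0^{+\infty}\frac{S_F(\omega)\,d\omega}{(m\omega^2)^2+(\eta\omega)^2} = 2\int_0^{+\infty}\frac{\hbar}{2\pi}\eta\omega\coth\frac{\hbar\omega}{2k_BT}\,\frac{d\omega}{(m\omega^2)^2+(\eta\omega)^2}. \qquad (7.153)$$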


This integral has two divergences. The first one, of the type $\int d\omega/\omega^2$ at the lower limit, is just a
classical effect: according to Eq. (85), the particle’s displacement variance grows with time, so it cannot
have a finite time-independent value that Eq. (153) tries to calculate. However, we still can use that
result to single out the quantum effects on diffusion — say, by comparing it with a similar but purely
classical case. These effects are prominent at high frequencies, especially if the quantum noise
overcomes the thermal noise before the dynamic cut-off, i.e. if


51 By the way, the simplest way to calculate $S_F(\omega)$, i.e. to derive the FDT, is to require that Eqs. (48) and (150)
give the same result for an oscillator with any eigenfrequency $\omega_0$. This is exactly the approach used by H. Nyquist
(for the classical case) — see also SM Sec. 5.5.

52 See, for example, R. Koch et al., Phys. Rev. B 26, 74 (1982).
$$\frac{k_BT}{\hbar} \ll \frac{\eta}{m}. \qquad (7.154)$$


In this case, there is a broad range of frequencies where the quantum noise gives a substantial 
contribution to the integral: 


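$$\left\langle x^2\right\rangle_Q \approx 2\int_{k_BT/\hbar}^{\eta/m}\frac{\hbar}{2\pi}\eta\omega\,\frac{d\omega}{(\eta\omega)^2} = \frac{\hbar}{\pi\eta}\ln\frac{\hbar\eta}{mk_BT}. \qquad (7.155)$$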


Formally, this contribution diverges at either $m \to 0$ or $T \to 0$, but this logarithmic (i.e. extremely weak)
divergence is readily quenched by almost any change of the environment model at very high
frequencies, where the “Ohmic” approximation (138) becomes unrealistic.
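The logarithmic estimate of the quantum contribution can be compared with a direct evaluation of the quantum part of the integral (153), i.e. of the same integral with $\coth(\hbar\omega/2k_BT)$ replaced by $\coth(\hbar\omega/2k_BT) - 2k_BT/\hbar\omega$, which removes the classical $\int d\omega/\omega^2$ divergence. The sketch assumes units $\hbar = m = \eta = 1$ and integrates over $u = \ln\omega$. Within the condition (154) the ratio of the two should slowly approach 1 as $k_BT$ decreases, since the estimate captures only the leading logarithm.

```python
import math


def coth_minus_inverse(x):
    """coth(x) - 1/x, with its small-x series x/3 - x^3/45 to avoid cancellation."""
    if x < 1e-3:
        return x / 3 - x ** 3 / 45
    return 1 / math.tanh(x) - 1 / x


def quantum_part(kT, w_min=1e-14, w_max=1e4, steps=40000):
    """Quantum part of Eq. (153) in units hbar = m = eta = 1, integrated over u = ln(w).

    The integrand (1/pi) * [coth(w/2kT) - 2kT/w] / (w (w^2 + 1)) dw becomes
    (1/pi) * [coth(w/2kT) - 2kT/w] / (w^2 + 1) du, because dw = w du.
    """
    u_min, u_max = math.log(w_min), math.log(w_max)
    h = (u_max - u_min) / steps
    total = 0.0
    for k in range(steps):
        w = math.exp(u_min + (k + 0.5) * h)
        total += coth_minus_inverse(w / (2 * kT)) / (w * w + 1)
    return total * h / math.pi


def log_estimate(kT):
    """Leading-log estimate (hbar / pi eta) ln(hbar eta / m k_B T) in the same units."""
    return math.log(1 / kT) / math.pi


for kT in (1e-4, 1e-6, 1e-8):
    print(kT, quantum_part(kT) / log_estimate(kT))
```

The residual deficit of the ratio reflects an $O(1)$ constant added to the large logarithm, which the estimate does not attempt to capture.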


The Heisenberg-Langevin approach is very powerful, because its straightforward generalizations 
enable analyses of fluctuations in virtually arbitrary linear systems, i.e. the systems described by linear 
differential (or integro-differential) equations of motion, including those with many degrees of freedom, 
and distributed systems (continua), and such systems prevail in many fields of physics. However, this 
approach also has its limitations. The main one is that if the equations of motion of the Heisenberg
operators are not linear, there is no linear relation, such as Eq. (146), between the Fourier images of the
generalized forces and the generalized coordinates, and as a result, there is no simple relation, such as
Eq. (148), between their spectral densities. In other words, if the Heisenberg equations of motion are 
nonlinear, there is no regular simple way to use them to calculate the statistical properties of the 
observables. 


For example, let us return to the dephasing problem described by Eqs. (68)-(70), and assume that the deterministic and fluctuating parts of the effective force −f exerted by the environment are characterized by relations similar, respectively, to Eqs. (124) and (134). Now writing the Heisenberg equations of motion for the two remaining spin operators, and using the commutation relations between them, we get

\dot{\hat\sigma}_x = \frac{1}{i\hbar}\left[\hat\sigma_x, \hat H\right] = \frac{1}{i\hbar}\left[\hat\sigma_x, (c + \hat f)\hat\sigma_z\right] = -\frac{2}{\hbar}\hat\sigma_y (c + \hat f) = -\frac{2}{\hbar}\hat\sigma_y\left(c - \int_0^\infty G(\tau)\hat\sigma_z(t-\tau)\,d\tau + \tilde f\right) , \qquad (7.156)

and a similar equation for σ̂_y. Such nonlinear equations cannot be used to calculate the statistical properties of the Pauli operators in this system exactly, at least analytically.


For some calculations, this problem may be circumvented by linearization: if we are only interested in small fluctuations of the observables, their nonlinear Heisenberg equations of motion, such as Eq. (156), may be linearized with respect to small deviations of the operators about their (generally, time-dependent) deterministic “values”. The resulting linear equations for the operator variations may then be solved either as has been demonstrated above, or (if the deterministic “values” evolve in time) using their Fourier expansions. Sometimes such an approach gives relatively simple and important results,53 but for many other problems, this approach is insufficient, leaving a lot of space for alternative methods.


53 For example, the formula used by R. Koch et al. (mentioned above) for processing their experimental results had been derived in this way. (This derivation will be suggested to the reader as an exercise.)
7.6. Density matrix approach


The main alternative approach to the dynamics of open quantum systems, which is essentially a 
generalization of the one discussed in Sec. 2, is to extract the final results of interest from the dynamics 
of the density operator of our system s. Let us discuss this approach in detail.54 


We already know that the density matrix allows the calculation of the expectation value of any observable of the system s; see Eq. (5). However, our initial recipe (6) for the density matrix element calculation, which requires the knowledge of the exact state (2) of the whole Universe, is not very practicable. Meanwhile, the von Neumann equation (66) for the density matrix evolution is limited to cases in which the probabilities W_j of the system states are fixed, thus excluding such important effects as the energy relaxation. However, such effects may be analyzed using a different assumption: that the system of interest interacts only with a local environment that is very close to its thermally-equilibrium state described, in the stationary-state basis, by a diagonal density matrix with the elements (24).


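This calculation is facilitated by the following general observation. Let us number the basis states of the full local system (the system of our interest plus its local environment) by l, and use Eq. (5) to write

\langle A \rangle = \mathrm{Tr}(\hat A \hat w_l) = \sum_{l,l'} A_{ll'} w_{l'l} = \sum_{l,l'} \langle l|\hat A|l'\rangle\langle l'|\hat w_l|l\rangle , \qquad (7.157)

where ŵ_l is the density operator of this local system. At a weak interaction between the system s and the local environment e, their states reside in different Hilbert spaces, so that we can write

|l\rangle = |s_j\rangle \otimes |e_k\rangle , \qquad (7.158)

and if the observable A depends only on the coordinates of the system s of our interest, we may reduce Eq. (157) to the form similar to Eq. (5). Because Â acts only in the Hilbert space of s, the factor ⟨e_k|e_{k'}⟩ = δ_{kk'} eliminates the sum over k':

\langle A \rangle = \sum_{j,j',k,k'} \langle e_k| \otimes \langle s_j|\hat A|s_{j'}\rangle \otimes |e_{k'}\rangle\,\langle e_{k'}| \otimes \langle s_{j'}|\hat w_l|s_j\rangle \otimes |e_k\rangle = \sum_{j,j'} A_{jj'}\,\langle s_{j'}|\Big(\sum_k \langle e_k|\hat w_l|e_k\rangle\Big)|s_j\rangle = \mathrm{Tr}_j(\hat A \hat w) , \qquad (7.159)

where

\hat w \equiv \sum_k \langle e_k|\hat w_l|e_k\rangle \equiv \mathrm{Tr}_k\,\hat w_l , \qquad (7.160)

showing how exactly the density operator ŵ of the system s may be calculated from ŵ_l.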
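Eqs. (157)-(160) can be illustrated with small matrices. The sketch below is illustrative only. It assumes hypothetical dimensions (2 states of s, 3 of e), a generic random density matrix ŵ_l of the whole local system (which need not factorize), and an arbitrary Hermitian observable acting on s alone. It then checks that Tr(Âŵ_l) equals Tr_j(Âŵ) with ŵ = Tr_k ŵ_l. The basis ordering l = j·d_e + k is one convenient choice for Eq. (158).

```python
import random

def matmul(a, b):
    """Matrix product for matrices stored as lists of lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def kron(a, b):
    """Kronecker product; index l = j * len(b) + k labels |s_j> (x) |e_k>, as in Eq. (158)."""
    nb = len(b)
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(len(a) * nb)]
            for i in range(len(a) * nb)]

def partial_trace_env(w_l, ds, de):
    """Eq. (160): w_{jj'} = sum_k <s_j, e_k| w_l |s_j', e_k>."""
    return [[sum(w_l[j * de + k][jp * de + k] for k in range(de)) for jp in range(ds)]
            for j in range(ds)]

def random_density_matrix(dim, rng):
    """Hermitian, positive, unit-trace matrix w = B B^dagger / Tr(B B^dagger)."""
    b = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)] for _ in range(dim)]
    bbd = matmul(b, [[b[j][i].conjugate() for j in range(dim)] for i in range(dim)])
    norm = trace(bbd).real
    return [[v / norm for v in row] for row in bbd]

rng = random.Random(0)
ds, de = 2, 3                                    # hypothetical sizes of s and e
w_l = random_density_matrix(ds * de, rng)        # generic (entangled) state of s + e
A = [[0.3, 1 - 0.5j], [1 + 0.5j, -0.7]]          # Hermitian observable of s only
identity_e = [[1.0 if i == j else 0.0 for j in range(de)] for i in range(de)]

lhs = trace(matmul(kron(A, identity_e), w_l))    # Eq. (157): trace over all states l
w = partial_trace_env(w_l, ds, de)               # Eq. (160)
rhs = trace(matmul(A, w))                        # Eq. (159)
print(lhs, rhs)
```

The two printed numbers coincide up to rounding. This shows that, for observables of s alone, the reduced operator ŵ carries all the needed information.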


Now comes the key physical assumption of this approach: since we may select the local environment e to be much larger than the system s of our interest, we may consider the composite system l as a Hamiltonian one, with time-independent probabilities of its stationary states. Hence, for the description of the evolution in time of its full density operator ŵ_l (again, in contrast to the density operator ŵ of the system of our interest), we may use the von Neumann equation (66). Partitioning its right-hand side in accordance with Eq. (68), we get:

i\hbar\,\dot{\hat w}_l = [\hat H_l, \hat w_l] = [\hat H_s, \hat w_l] + [\hat H_{\rm int}, \hat w_l] + [\hat H_e, \hat w_l] . \qquad (7.161)


54 As in Sec. 4, the reader not interested in the derivation of the basic equation (181) of the density matrix 
evolution may immediately jump to the discussion of this equation and its applications. 
The next step is to use the perturbation theory to solve this equation in the lowest order in Ĥ_int, which would yield, for the evolution of ŵ, a non-vanishing contribution due to the interaction. For that, Eq. (161) is not very convenient, because its right-hand side contains two other terms, of a much larger scale than the interaction Hamiltonian. To mitigate this technical difficulty, the interaction picture that was discussed at the end of Sec. 4.6 is very natural. (It is not necessary though, and I will use this picture mostly as an exercise of its application; unfortunately, this is the only example I can afford in this course.)


As a reminder, in that picture (whose entities will be marked with index “I”, with the unmarked operators assumed to be in the Schrödinger picture), both the operators and the state vectors (and hence the density operator) depend on time. However, the time evolution of the operator of any observable A is described by an equation similar to Eq. (67), but with the unperturbed part of the Hamiltonian only; see Eq. (4.214). In model (68), this means

i\hbar\,\dot{\hat A}_I = [\hat A_I, \hat H_0] , \qquad (7.162)

where the unperturbed Hamiltonian consists of two parts defined in different Hilbert spaces:

\hat H_0 = \hat H_s + \hat H_e . \qquad (7.163)


On the other hand, the state vector's dynamics is governed by the interaction evolution operator û_I, which obeys Eq. (4.215). Since this equation, using the interaction-picture Hamiltonian (4.216),

\hat H_I \equiv \hat u_0^\dagger \hat H_{\rm int} \hat u_0 , \qquad (7.164)


is absolutely similar to the ordinary Schrödinger equation using the full Hamiltonian, we may repeat all arguments given at the beginning of Sec. 3 to prove that the dynamics of the density operator in the interaction picture of a Hamiltonian system is governed by the following analog of the von Neumann equation (66):

i\hbar\,\dot{\hat w}_l = [\hat H_I, \hat w_l] , \qquad (7.165)

where the index I of the density operator is dropped for the notation simplicity. Since this equation is similar in structure (with the opposite sign) to the Heisenberg equation (67), we may use the solution, Eq. (4.190), of the latter equation to write its analog:

\hat w_l(t) = \hat u_I(t,0)\,\hat w_l(0)\,\hat u_I^\dagger(t,0) . \qquad (7.166)


It is also straightforward to verify that in this picture, the expectation value of any observable A may be found from an expression similar to the basic Eq. (5):

\langle A \rangle = \mathrm{Tr}(\hat A_I \hat w_l) , \qquad (7.167)

showing again that the interaction and Schrödinger pictures give the same final results.


In the most frequent case of factorable interaction (90),55 Eq. (162) is simplified for both operators participating in that product, for each one in its own way. In particular, for Â = x̂, it yields


55 A similar analysis of a more general case, when the interaction with the environment has to be represented as a sum of products of the type (90), may be found, for example, in the monograph by K. Blum, Density Matrix Theory and Applications, 3rd ed., Springer, 2012.
i\hbar\,\dot{\hat x}_I = [\hat x_I, \hat H_0] = [\hat x_I, \hat H_s] + [\hat x_I, \hat H_e] . \qquad (7.168)


Since the coordinate operator is defined in the Hilbert space of our system s, it commutes with the Hamiltonian of the environment, so that we finally get

i\hbar\,\dot{\hat x}_I = [\hat x_I, \hat H_s] . \qquad (7.169)


On the other hand, if Â = F̂, this operator is defined in the Hilbert space of the environment, and commutes with the Hamiltonian Ĥ_s of the system of our interest. As a result, we get

i\hbar\,\dot{\hat F}_I = [\hat F_I, \hat H_e] . \qquad (7.170)


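This means that with our time-independent unperturbed Hamiltonians, Ĥ_s and Ĥ_e, the time evolution of the interaction-picture operators is rather simple. In particular, the analogy between Eq. (170) and Eq. (93) allows us to immediately write the following analog of Eq. (94):

\hat F_I(t) = \exp\left\{+\frac{i}{\hbar}\hat H_e t\right\}\hat F_I(0)\exp\left\{-\frac{i}{\hbar}\hat H_e t\right\} , \qquad (7.171)

so that in the stationary-state basis n of the environment,

(F_I)_{nn'}(t) = \exp\left\{\frac{i}{\hbar}E_n t\right\}F_{nn'}(0)\exp\left\{-\frac{i}{\hbar}E_{n'} t\right\} = F_{nn'}(0)\exp\left\{\frac{i(E_n - E_{n'})t}{\hbar}\right\} , \qquad (7.172)

and similarly (but in the basis of the stationary states of system s) for operator x. As a result, the right-hand side of Eq. (164) may be also factored:

\hat H_I(t) = \hat u_0^\dagger(t,0)\hat H_{\rm int}\hat u_0(t,0) = \exp\left\{\frac{i}{\hbar}(\hat H_s + \hat H_e)t\right\}(-\hat x\hat F)\exp\left\{-\frac{i}{\hbar}(\hat H_s + \hat H_e)t\right\}
\qquad = -\left(\exp\left\{\frac{i}{\hbar}\hat H_s t\right\}\hat x\exp\left\{-\frac{i}{\hbar}\hat H_s t\right\}\right)\left(\exp\left\{\frac{i}{\hbar}\hat H_e t\right\}\hat F\exp\left\{-\frac{i}{\hbar}\hat H_e t\right\}\right) \equiv -\hat x_I(t)\hat F_I(t) . \qquad (7.173)

So, the transfer to the interaction picture has taken some time, but now it enables a smooth ride.56 Indeed, just as in Sec. 4, we may rewrite Eq. (165) in the integral form: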


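\hat w_l(t) = \frac{1}{i\hbar}\int_{-\infty}^{t}[\hat H_I(t'), \hat w_l(t')]\,dt' ; \qquad (7.174)

plugging this result into the right-hand side of Eq. (165), we get

\dot{\hat w}_l(t) = -\frac{1}{\hbar^2}\int_{-\infty}^{t}\left[\hat H_I(t), [\hat H_I(t'), \hat w_l(t')]\right]dt' = -\frac{1}{\hbar^2}\int_{-\infty}^{t}\left[\hat x(t)\hat F(t), [\hat x(t')\hat F(t'), \hat w_l(t')]\right]dt' , \qquad (7.175)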


where, for the notation’s brevity, from this point on I will strip the operators x and F of their index 
“1”. (I hope their time dependence indicates the interaction picture clearly enough.) 


56 If we used either the Schrödinger or the Heisenberg picture instead, the forthcoming Eq. (175) would pick up a rather annoying multitude of fast-oscillating exponents, of different time arguments, on its right-hand side.
So far, this equation is exact (and cannot be solved analytically), but this is a good time to notice that even if we approximate the density operator on its right-hand side by its unperturbed, factorable “value” (corresponding to no interaction between the system s and its thermally-equilibrium environment e),57

\hat w_l(t') \to \hat w(t')\,\hat w_e , \quad \text{with} \quad \langle e_n|\hat w_e|e_{n'}\rangle = W_n\,\delta_{nn'} , \qquad (7.176)

where e_n are the stationary states of the environment and W_n are the Gibbs probabilities (24), Eq. (175) still describes nontrivial time evolution of the density operator. This is exactly the first non-vanishing approximation (in the weak interaction) we have been looking for. Now using Eq. (160), we find the equation of evolution of the density operator of the system of our interest:

\dot{\hat w}(t) = -\frac{1}{\hbar^2}\int_{-\infty}^{t}\mathrm{Tr}_n\left[\hat x(t)\hat F(t), [\hat x(t')\hat F(t'), \hat w(t')\hat w_e]\right]dt' , \qquad (7.177)


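where the trace is over the stationary states of the environment. To spell out the right-hand side of Eq. (177), note again that the coordinate and force operators commute with each other (but not with themselves at different time moments!) and hence may be swapped at will, so that we may write

\mathrm{Tr}_n\left[\hat x(t)\hat F(t), [\hat x(t')\hat F(t'), \hat w(t')\hat w_e]\right] = \hat x(t)\hat x(t')\hat w(t')\,\mathrm{Tr}_n[\hat F(t)\hat F(t')\hat w_e] - \hat x(t)\hat w(t')\hat x(t')\,\mathrm{Tr}_n[\hat F(t)\hat w_e\hat F(t')] - \hat x(t')\hat w(t')\hat x(t)\,\mathrm{Tr}_n[\hat F(t')\hat w_e\hat F(t)] + \hat w(t')\hat x(t')\hat x(t)\,\mathrm{Tr}_n[\hat w_e\hat F(t')\hat F(t)]
\qquad = \hat x(t)\hat x(t')\hat w(t')\sum_{n,n'}F_{nn'}(t)F_{n'n}(t')W_n - \hat x(t)\hat w(t')\hat x(t')\sum_{n,n'}F_{nn'}(t)W_{n'}F_{n'n}(t') - \hat x(t')\hat w(t')\hat x(t)\sum_{n,n'}F_{nn'}(t')W_{n'}F_{n'n}(t) + \hat w(t')\hat x(t')\hat x(t)\sum_{n,n'}W_nF_{nn'}(t')F_{n'n}(t) . \qquad (7.178)

Since the summation over both indices n and n' in this expression is over the same energy level set (of all stationary states of the environment), we may swap these indices in any of the sums. Doing this only in the terms including the factors W_{n'}, we turn them into W_n, so that this factor becomes common:

\mathrm{Tr}_n\left[\hat x(t)\hat F(t), [\hat x(t')\hat F(t'), \hat w(t')\hat w_e]\right] = \sum_{n,n'}W_n\Big\{\left[\hat x(t)\hat x(t')\hat w(t') - \hat x(t')\hat w(t')\hat x(t)\right]F_{nn'}(t)F_{n'n}(t') + \left[\hat w(t')\hat x(t')\hat x(t) - \hat x(t)\hat w(t')\hat x(t')\right]F_{nn'}(t')F_{n'n}(t)\Big\} . \qquad (7.179)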


Now using Eq. (172), we get

\mathrm{Tr}_n\left[\hat x(t)\hat F(t), [\hat x(t')\hat F(t'), \hat w(t')\hat w_e]\right] = \sum_{n,n'}W_n|F_{nn'}|^2\cos\frac{(E_{n'} - E_n)(t - t')}{\hbar}\left[\hat x(t), [\hat x(t'), \hat w(t')]\right] - i\sum_{n,n'}W_n|F_{nn'}|^2\sin\frac{(E_{n'} - E_n)(t - t')}{\hbar}\left[\hat x(t), \{\hat x(t'), \hat w(t')\}\right] . \qquad (7.180)

Comparing the two double sums participating in this expression with Eqs. (108) and (111), we see that they are nothing else than, respectively, the symmetrized correlation function and the temporal Green's function (multiplied by ħ/2) of the time-difference argument τ = t − t' ≥ 0. As a result, Eq. (177) takes a compact form:

\dot{\hat w}(t) = -\frac{1}{\hbar^2}\int_{-\infty}^{t}K_F(t - t')\left[\hat x(t), [\hat x(t'), \hat w(t')]\right]dt' + \frac{i}{2\hbar}\int_{-\infty}^{t}G(t - t')\left[\hat x(t), \{\hat x(t'), \hat w(t')\}\right]dt' . \qquad (7.181)

57 For the notation simplicity, the fact that here (and in all following formulas) the density operator ŵ of the system s of our interest is taken in the interaction picture is just implied.


Let me hope that the readers (especially the ones who have braved this derivation) enjoy this beautiful result as much as I do. It gives an equation for the time evolution of the density operator of the system of our interest (s), with the effects of its environment represented only by two real, c-number functions of τ. One of them (K_F) describes the fluctuation force exerted by the environment, and the other one (G) represents the environment's ensemble-averaged response to the system's evolution. And most spectacularly, these are exactly the same functions that participate in the alternative, Heisenberg-Langevin approach to the problem, and hence are related to each other by the fluctuation-dissipation theorem (134).
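The only nontrivial algebra between Eqs. (179) and (180) is the regrouping of four operator products into a double commutator and a commutator of an anticommutator. That identity holds for arbitrary matrices, so it can be checked numerically. In the sketch below, random complex 3×3 matrices stand in for x̂(t), x̂(t') and ŵ(t'). These matrices are hypothetical test data, not physical operators. The parameter theta plays the role of (E_{n'} − E_n)(t − t')/ħ for one pair {n, n'}.

```python
import cmath
import math
import random

def mm(a, b):
    """Product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def combine(*terms):
    """Linear combination sum_k c_k * M_k of square matrices, given as (c_k, M_k) pairs."""
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)] for i in range(n)]

def identity_mismatch(x, xp, w, theta):
    """Max |LHS - RHS| of the step from Eq. (179) to Eq. (180) for one pair {n, n'}."""
    xxw, xpwx = mm(mm(x, xp), w), mm(mm(xp, w), x)
    wxpx, xwxp = mm(mm(w, xp), x), mm(mm(x, w), xp)
    em, ep = cmath.exp(-1j * theta), cmath.exp(1j * theta)
    # Eq. (179): F_nn'(t)F_n'n(t') ~ exp(-i theta), F_nn'(t')F_n'n(t) ~ exp(+i theta)
    lhs = combine((em, xxw), (-em, xpwx), (ep, wxpx), (-ep, xwxp))
    double_comm = combine((1, xxw), (-1, xwxp), (-1, xpwx), (1, wxpx))    # [x, [x', w]]
    comm_anticomm = combine((1, xxw), (1, xwxp), (-1, xpwx), (-1, wxpx))  # [x, {x', w}]
    rhs = combine((math.cos(theta), double_comm), (-1j * math.sin(theta), comm_anticomm))
    n = len(x)
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(n) for j in range(n))

rng = random.Random(1)

def random_matrix(n):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]

x, xp, w = random_matrix(3), random_matrix(3), random_matrix(3)
print(identity_mismatch(x, xp, w, theta=0.7))
```

The mismatch is at the level of floating-point rounding, for any value of theta.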


After a short celebration, let us acknowledge that Eq. (181) is still an integro-differential equation, and needs to be solved together with Eq. (169) for the system coordinate's evolution. Such equations do not allow explicit analytical solutions, besides a few very simple (and not very interesting) cases. For most applications, further simplifications should be made. One of them is based on the fact (which was already discussed in Sec. 3) that both environmental functions participating in Eq. (181) tend to zero when their argument τ becomes much larger than the environment's correlation time τ_c, independent of the system-to-environment coupling strength. If the coupling is sufficiently weak, the time scales T_{nn'} of the evolution of the density matrix elements, following from Eq. (181), are much longer than this correlation time, and also longer than the characteristic time scale of the coordinate operator's evolution. In this limit, all arguments t' of the density operator that give substantial contributions to the right-hand side of Eq. (181) are so close to t that it does not matter whether its argument is t' or just t. This simplification, ŵ(t') ≈ ŵ(t), is known as the Markov approximation.58


However, this approximation alone is still insufficient for finding the general solution of Eq. (181). Substantial further progress is possible in two important cases. The most important of them is when the intrinsic Hamiltonian Ĥ_s of the system s of our interest does not depend on time explicitly and has a discrete eigenenergy spectrum E_n,59 with well-separated levels:

\frac{|E_n - E_{n'}|}{\hbar} \gg \frac{1}{T_{nn'}} , \quad \text{for all } n \neq n' . \qquad (7.182)

Let us see what this condition yields for Eq. (181), rewritten for the matrix elements in the stationary-state basis, in the Markov approximation:


58 Named after Andrey Andreyevich Markov (1856-1922; in older Western literature, “Markoff”), a mathematician famous for his general theory of the so-called Markov processes, whose future development is completely determined by their present state, but not by their pre-history.

59 Here, rather reluctantly, I will use this standard notation, E_n, for the eigenenergies of our system of interest (s), in the hope that the reader will not confuse these discrete energy levels with the quasi-continuous energy levels of its environment (e), participating in particular in Eqs. (108) and (111). As a reminder, by this stage of our calculations, the environment levels have disappeared from our formulas, leaving behind their functionals K_F(τ) and G(τ).
\dot w_{nn'} = -\frac{1}{\hbar^2}\int_{-\infty}^{t}K_F(t - t')\left[\hat x(t), [\hat x(t'), \hat w(t)]\right]_{nn'}dt' + \frac{i}{2\hbar}\int_{-\infty}^{t}G(t - t')\left[\hat x(t), \{\hat x(t'), \hat w(t)\}\right]_{nn'}dt' . \qquad (7.183)

After spelling out the commutators, the right-hand side of this expression includes four operator products, which differ “only” by the operator order. Let us first have a look at one of these products,

\left[\hat x(t)\hat x(t')\hat w\right]_{nn'} = \sum_{m,m'}x_{nm}(t)\,x_{mm'}(t')\,w_{m'n'} , \qquad (7.184)

where the indices m and m' run over the same set of stationary states of the system s of our interest as the indices n and n'. According to Eq. (169) with a time-independent Ĥ_s, the matrix elements x_{nn'} (in the stationary-state basis) oscillate in time as exp{iω_{nn'}t}, so that

\left[\hat x(t)\hat x(t')\hat w\right]_{nn'} = \sum_{m,m'}x_{nm}x_{mm'}\exp\{i(\omega_{nm}t + \omega_{mm'}t')\}\,w_{m'n'} , \qquad (7.185)
where on the right-hand side, the coordinate matrix elements are in the Schrödinger picture, and the usual notation (6.85) is used for the quantum transition frequencies:

\hbar\omega_{nn'} \equiv E_n - E_{n'} . \qquad (7.186)


According to the condition (182), frequencies ω_{nn'} with n ≠ n' are much higher than the speed of evolution of the density matrix elements (in the interaction picture!) on both the left-hand and right-hand sides of Eq. (183). Hence, on the right-hand side of Eq. (183), we may keep only the terms that do not oscillate with these frequencies ω_{nn'}, because rapidly-oscillating terms would give negligible contributions to the density matrix dynamics.60 For that, in the double sum (185) we should save only the terms that depend on time only via the difference (t − t'), because they will give (after the integration over t') a slowly changing contribution to the right-hand side.61 These terms should have ω_{nm} + ω_{mm'} = 0, i.e. (E_n − E_m) + (E_m − E_{m'}) = E_n − E_{m'} = 0. For a non-degenerate energy spectrum, this requirement means m' = n; as a result, the double sum is reduced to a single one:

\left[\hat x(t)\hat x(t')\hat w\right]_{nn'} \approx w_{nn'}\sum_m x_{nm}x_{mn}\exp\{i\omega_{nm}(t - t')\} = w_{nn'}\sum_m |x_{nm}|^2\exp\{i\omega_{nm}(t - t')\} . \qquad (7.187)


Another product, [ŵx̂(t')x̂(t)]_{nn'}, which appears on the right-hand side of Eq. (183), may be simplified absolutely similarly, giving

\left[\hat w\hat x(t')\hat x(t)\right]_{nn'} \approx w_{nn'}\sum_m |x_{n'm}|^2\exp\{i\omega_{n'm}(t' - t)\} . \qquad (7.188)

These expressions hold whether n and n' are equal or not. The situation is different for two other products on the right-hand side of Eq. (183), with ŵ sandwiched between x̂(t) and x̂(t'). For example,

\left[\hat x(t)\hat w\hat x(t')\right]_{nn'} = \sum_{m,m'}x_{nm}(t)\,w_{mm'}\,x_{m'n'}(t') = \sum_{m,m'}x_{nm}w_{mm'}x_{m'n'}\exp\{i(\omega_{nm}t + \omega_{m'n'}t')\} . \qquad (7.189)


60 This is essentially the same rotating-wave approximation (RWA) as was used in Sec. 6.5.

61 As was already discussed in Sec. 4, the lower-limit substitution (t' = −∞) in the integrals participating in Eq. (183) gives zero, due to the finite-time “memory” of the system, expressed by the decay of the correlation and response functions at large values of the time delay τ = t − t'.
For this term, the same requirement of having a function of the difference (t − t') only yields a different condition: ω_{nm} + ω_{m'n'} = 0, i.e.

(E_n - E_m) + (E_{m'} - E_{n'}) = 0 . \qquad (7.190)


Here the double sum's reduction is possible only if we make an additional assumption that all interlevel energy distances are unique, i.e. our system of interest has no equidistant levels (such as in the harmonic oscillator). For the diagonal elements (n = n'), the RWA requirement is reduced to m = m', giving sums over all diagonal elements of the density matrix:

\left[\hat x(t)\hat w\hat x(t')\right]_{nn} = \sum_m |x_{nm}|^2\exp\{i\omega_{nm}(t - t')\}\,w_{mm} . \qquad (7.191)

(Another similar term, [x̂(t')ŵx̂(t)]_{nn}, is just the complex conjugate of (191).) However, for off-diagonal matrix elements (n ≠ n'), the situation is different: Eq. (190) may be satisfied only if m = n and also m' = n', so that the double sum is reduced to just one, non-oscillating term:

\left[\hat x(t)\hat w\hat x(t')\right]_{nn'} = x_{nn}\,w_{nn'}\,x_{n'n'} , \quad \text{for } n \neq n' . \qquad (7.192)

The second similar term, [x̂(t')ŵx̂(t)]_{nn'}, is exactly the same, so that in one of the integrals of Eq. (183), these terms add up, while in the second one, they cancel.


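This is why the final equations of evolution look different for diagonal and off-diagonal elements of the density matrix. For the former case (n = n'), Eq. (183) is reduced to the so-called master equation62 relating diagonal elements w_{nn} of the density matrix, i.e. the energy level occupancies W_n:

\dot W_n = -\frac{1}{\hbar^2}\sum_{m\neq n}|x_{nm}|^2\int_0^\infty\Big\{K_F(\tau)(W_n - W_m)\left(e^{i\omega_{nm}\tau} + e^{-i\omega_{nm}\tau}\right) - \frac{i\hbar}{2}G(\tau)(W_n + W_m)\left(e^{i\omega_{nm}\tau} - e^{-i\omega_{nm}\tau}\right)\Big\}d\tau , \qquad (7.193)

where τ = t − t'. Changing the summation index notation from m to n', we may rewrite the master equation in its canonical form

\dot W_n = \sum_{n'\neq n}\left(\Gamma_{n'\to n}W_{n'} - \Gamma_{n\to n'}W_n\right) , \qquad (7.194)

where the coefficients

\Gamma_{n\to n'} \equiv \frac{2}{\hbar^2}|x_{nn'}|^2\int_0^\infty\left[K_F(\tau)\cos\omega_{nn'}\tau + \frac{\hbar}{2}G(\tau)\sin\omega_{nn'}\tau\right]d\tau \qquad (7.195)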


are called the interlevel transition rates.64 Eq. (194) has a very clear physical meaning of the level occupancy dynamics (i.e. the balance of the probability flows ΓW) due to quantum transitions between the energy levels (see Fig. 7), in our current case caused by the interaction between the system of our interest and its environment.

62 The master equations, first introduced to quantum mechanics in 1928 by W. Pauli, are sometimes called the “Pauli master equations”, or “kinetic equations”, or “rate equations”.

63 As Eq. (193) shows, the term with m = n would vanish and thus may be legitimately excluded from the sum.

64 As Eq. (193) shows, the result for Γ_{n'→n} is described by Eq. (195) as well, provided that the indices n and n' are swapped in all components of its right-hand side, including the swap ω_{nn'} → ω_{n'n} = −ω_{nn'}.


Fig. 7.7. Probability flows in a discrete- 
spectrum system. Solid arrows: the 
exchange between the two energy levels, n 
and n’, described by one term in the master 
equation (194); dashed arrows: other 
transitions to/from these two levels. 


The Fourier transforms (113) and (123) enable us to express the two integrals in Eq. (195) via, respectively, the symmetrized spectral density S_F(ω) of environment force fluctuations and the imaginary part χ''(ω) of the generalized susceptibility, both at frequency ω = ω_{nn'}. After that, we may use the fluctuation-dissipation theorem (134) to exclude the former function, getting finally65

\Gamma_{n'\to n} = \frac{2}{\hbar}|x_{nn'}|^2\,\frac{\chi''(\omega_{nn'})}{\exp\{(E_n - E_{n'})/k_BT\} - 1} . \qquad (7.196)


Note that since the imaginary part χ'' of the generalized susceptibility is an odd function of frequency, Eq. (196) is in compliance with the Gibbs distribution for arbitrary temperature. Indeed, according to this equation, the ratio of the “up” and “down” rates for each pair of levels equals

\frac{\Gamma_{n'\to n}}{\Gamma_{n\to n'}} = \frac{\chi''(\omega_{nn'})\big/\left[\exp\{(E_n - E_{n'})/k_BT\} - 1\right]}{\chi''(\omega_{n'n})\big/\left[\exp\{(E_{n'} - E_n)/k_BT\} - 1\right]} = \exp\left\{-\frac{E_n - E_{n'}}{k_BT}\right\} . \qquad (7.197)

On the other hand, according to the Gibbs distribution (24), in thermal equilibrium the level populations should be in the same proportion. Hence, Eq. (196) complies with the so-called detailed balance equation,

\Gamma_{n'\to n}W_{n'} = \Gamma_{n\to n'}W_n , \qquad (7.198)

valid in the equilibrium for each pair {n, n'}, so that all right-hand sides of all Eqs. (194), and hence the time derivatives of all W_n, vanish, as they should. Thus, the stationary solution of the master equations indeed describes the thermal equilibrium correctly.
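The statement that the master equations relax to the Gibbs distribution can be illustrated numerically. The sketch below makes several assumptions that are labeled here and in the code:

- Ohmic dissipation, χ''(ω) = ηω.
- Units ħ = k_B = 1.
- A hypothetical non-equidistant three-level spectrum with hypothetical values of |x_{nn'}|².

The rates are taken from Eq. (196), rewritten (by swapping n and n' and using the oddness of χ'') as Γ_{n→n'} = (2/ħ)|x_{nn'}|²χ''(ω_{nn'})/(1 − exp{−ħω_{nn'}/k_BT}). The code integrates Eq. (194) with a fourth-order Runge-Kutta step.

```python
import math

def rates_ohmic(E, x2, T, eta):
    """G[n][k] = Gamma_{n->k} from Eq. (196), assuming chi''(w) = eta*w; hbar = k_B = 1."""
    N = len(E)
    G = [[0.0] * N for _ in range(N)]
    for n in range(N):
        for k in range(N):
            if k != n:
                w = E[n] - E[k]                    # omega_{nk}; w > 0 is a "down" transition
                G[n][k] = 2.0 * x2[n][k] * eta * w / (1.0 - math.exp(-w / T))
    return G

def master_rhs(W, G):
    """Right-hand side of the master equation (194)."""
    N = len(W)
    return [sum(G[k][n] * W[k] - G[n][k] * W[n] for k in range(N) if k != n) for n in range(N)]

def rk4_step(W, G, dt):
    k1 = master_rhs(W, G)
    k2 = master_rhs([w + 0.5 * dt * d for w, d in zip(W, k1)], G)
    k3 = master_rhs([w + 0.5 * dt * d for w, d in zip(W, k2)], G)
    k4 = master_rhs([w + dt * d for w, d in zip(W, k3)], G)
    return [w + dt * (a + 2 * b + 2 * c + d) / 6 for w, a, b, c, d in zip(W, k1, k2, k3, k4)]

E = [0.0, 1.0, 2.7]                                   # hypothetical level energies
x2 = [[0.0, 1.0, 0.3], [1.0, 0.0, 0.5], [0.3, 0.5, 0.0]]   # hypothetical |x_nn'|^2
T, eta = 0.8, 0.2
G = rates_ohmic(E, x2, T, eta)

W = [0.0, 0.0, 1.0]                                   # start with the top level occupied
for _ in range(4000):                                 # total time 200, many relaxation times
    W = rk4_step(W, G, 0.05)

Z = sum(math.exp(-e / T) for e in E)
gibbs = [math.exp(-e / T) / Z for e in E]
print(W)
print(gibbs)
```

The two printed lists agree closely. A check of G[1][0]·gibbs[1] against G[0][1]·gibbs[0] likewise reproduces the detailed balance relation (198) pair by pair.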


The system of master equations (194), frequently complemented by additional terms on their right-hand sides describing interlevel transitions due to other factors (e.g., by an external ac force with a frequency close to one of ω_{nn'}), is the key starting point for practical analyses of many quantum systems, notably including optical quantum amplifiers and generators (lasers). It is important to remember that they are strictly valid only in the rotating-wave approximation, i.e. if Eq. (182) is well satisfied for all n and n' of substance.

65 It is straightforward (and highly recommended to the reader) to show that at low temperatures (k_BT ≪ |E_n − E_{n'}|), Eq. (196) gives the same result as the Golden Rule formula (6.111), with Â = x̂. (The low-temperature condition ensures that the initial occupancy of the excited level n is negligible, as was assumed at the derivation of Eq. (6.111).)


For a particular but very important case of a two-level system (with, say, E_1 > E_2), the rate Γ_{1→2} may be interpreted as the reciprocal characteristic time 1/T_1 = Γ_{1→2} of the energy relaxation process that brings the diagonal elements of the density matrix to their thermally-equilibrium values (24). This interpretation is especially natural in the low-temperature limit k_BT ≪ ħω_{12} = E_1 − E_2, when Γ_{1→2} ≫ Γ_{2→1}. For the Ohmic dissipation described by Eqs. (137)-(138), Eq. (196) yields

\frac{1}{T_1} = \Gamma_{1\to 2} = \frac{2\eta}{\hbar^2}|x_{12}|^2 \times \begin{cases} \hbar\omega_{12} , & \text{for } k_BT \ll \hbar\omega_{12} , \\ k_BT , & \text{for } \hbar\omega_{12} \ll k_BT . \end{cases} \qquad (7.199)
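The two limits in Eq. (199) can be compared with the full Ohmic form of Eq. (196). For the downward transition 1 → 2, that form reads Γ_{1→2} = (2/ħ)|x_{12}|²ηω_{12}/(1 − exp{−ħω_{12}/k_BT}). The minimal sketch below assumes units ħ = k_B = 1; the function names and parameter values are hypothetical.

```python
import math

def t1_rate(x12_sq, w12, T, eta):
    """1/T_1 = Gamma_{1->2}, Eq. (196) with chi''(w) = eta*w; hbar = k_B = 1, w12 = E_1 - E_2 > 0."""
    return 2.0 * x12_sq * eta * w12 / (1.0 - math.exp(-w12 / T))

def t1_rate_limits(x12_sq, w12, T, eta):
    """The two limiting forms of Eq. (199): (k_B T << hbar w12, hbar w12 << k_B T)."""
    prefactor = 2.0 * eta * x12_sq
    return prefactor * w12, prefactor * T

for T in (1e-3, 1.0, 1e3):
    print(T, t1_rate(1.0, 1.0, T, 0.1), t1_rate_limits(1.0, 1.0, T, 0.1))
```

At T = 10⁻³ the full expression coincides with the low-temperature limit. At T = 10³ it exceeds the high-temperature limit only by a relative amount of about ħω_{12}/2k_BT.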


This relaxation time T_1 should not be confused with the characteristic time T_2 of the off-diagonal element decay, i.e. dephasing, which was already discussed in Sec. 3. In this context, let us see what Eqs. (183) have to say about the dephasing rates. Taking into account our intermediate results (187)-(192), we merge the non-oscillating components (with m = n and m = n') of the sums (187) and (188) with the terms (192), which also do not oscillate in time. This gives the following equation:66

\dot w_{nn'} = -\frac{1}{\hbar^2}\int_0^\infty K_F(\tau)\Big[\sum_{m\neq n}|x_{nm}|^2e^{i\omega_{nm}\tau} + \sum_{m\neq n'}|x_{n'm}|^2e^{-i\omega_{n'm}\tau} + (x_{nn} - x_{n'n'})^2\Big]d\tau\;w_{nn'}
\qquad + \frac{i}{2\hbar}\int_0^\infty G(\tau)\Big[\sum_{m\neq n}|x_{nm}|^2e^{i\omega_{nm}\tau} - \sum_{m\neq n'}|x_{n'm}|^2e^{-i\omega_{n'm}\tau} + x_{nn}^2 - x_{n'n'}^2\Big]d\tau\;w_{nn'} , \quad \text{for } n \neq n' . \qquad (7.200)

In contrast with Eq. (194), the right-hand side of this equation includes both a real and an imaginary part, and hence it may be represented as

\dot w_{nn'} = -\left(\frac{1}{T_{nn'}} + i\Delta_{nn'}\right)w_{nn'} , \qquad (7.201)


where both factors 1/T_{nn'} and Δ_{nn'} are real. As Eq. (201) shows, the second term on the right-hand side of this equation causes slow oscillations of the matrix elements w_{nn'}. After returning to the Schrödinger picture, these oscillations add just small corrections67 to the unperturbed frequencies (186) of the elements' oscillations, and are not important for most applications. More important is the first term, proportional to

\frac{1}{T_{nn'}} = \frac{1}{\hbar^2}\int_0^\infty\Big\{K_F(\tau)\Big[\sum_{m\neq n}|x_{nm}|^2\cos\omega_{nm}\tau + \sum_{m\neq n'}|x_{n'm}|^2\cos\omega_{n'm}\tau + (x_{nn} - x_{n'n'})^2\Big] + \frac{\hbar}{2}G(\tau)\Big[\sum_{m\neq n}|x_{nm}|^2\sin\omega_{nm}\tau + \sum_{m\neq n'}|x_{n'm}|^2\sin\omega_{n'm}\tau\Big]\Big\}d\tau , \quad \text{for } n \neq n' , \qquad (7.202)

because it describes the effect completely absent without the environment coupling: exponential decay of the off-diagonal matrix elements, i.e. the dephasing. Comparing the first two terms of Eq. (202) with Eq. (195), we see that the dephasing rates may be described by a very simple formula:

\frac{1}{T_{nn'}} = \frac{1}{2}\Big(\sum_{m\neq n}\Gamma_{n\to m} + \sum_{m\neq n'}\Gamma_{n'\to m}\Big) + \frac{\pi}{\hbar^2}(x_{nn} - x_{n'n'})^2S_F(0) \equiv \frac{1}{2}\Big(\sum_{m\neq n}\Gamma_{n\to m} + \sum_{m\neq n'}\Gamma_{n'\to m}\Big) + \frac{\eta k_BT}{\hbar^2}(x_{nn} - x_{n'n'})^2 , \quad \text{for } n \neq n' , \qquad (7.203)

where the low-frequency drag coefficient η is again defined as lim_{ω→0} χ''(ω)/ω; see Eq. (138).

66 Sometimes Eq. (200) (in any of its numerous alternative forms) is called the Redfield equation, after the 1965 work by A. Redfield. Note, however, that in the mid-1960s several other authors, notably including (in the alphabetical order) H. Haken, W. Lamb, M. Lax, W. Louisell, and M. Scully, also made major contributions to the very fast development of the density-matrix approach to open quantum systems.

67 Such corrections are sometimes called Lamb shifts, due to their conceptual similarity to the genuine Lamb shift. That effect was first observed experimentally in 1947 by Willis Lamb and Robert Retherford: a minor difference between energy levels of the 2s and 2p states of hydrogen, due to the electric-dipole coupling of hydrogen atoms to the free-space electromagnetic environment. (These energies are equal not only in the non-relativistic theory described in Sec. 3.6 but also in the relativistic theory (see Secs. 6.3, 9.7), if the electromagnetic environment is ignored.) The explanation of the Lamb shift by H. Bethe, in the same year, essentially launched the whole field of quantum electrodynamics, to be briefly discussed in Chapter 9.


This result shows that two effects yield independent contributions to the dephasing. The first of them may be interpreted as a result of “virtual” transitions of the system, from the levels n and n' of our interest, to other energy levels m. According to Eq. (195), this contribution is proportional to the strength of coupling to the environment at relatively high frequencies ω_{nm} and ω_{n'm}. (If the energy quanta ħω of these frequencies are much larger than the thermal fluctuation scale k_BT, then only the lower levels, with E_m < max[E_n, E_{n'}], are important.) On the contrary, the second contribution is due to low-frequency, essentially classical fluctuations of the environment, and hence to the low-frequency dissipative susceptibility. In the Ohmic dissipation case, when the ratio η = χ''(ω)/ω is frequency-independent, both contributions are of the same order, but their exact relation depends on the matrix elements x_{nn'} of a particular system.


For example, let us return for a minute to the two-level system discussed in Sec. 3, described by our current theory with the replacement x̂ → σ̂_z. For it, the high-frequency contributions to dephasing vanish because of the absence of transitions between energy levels, while the low-frequency contribution yields

\frac{1}{T_2} \equiv \frac{1}{T_{12}} = \frac{\eta k_BT}{\hbar^2}(x_{11} - x_{22})^2 \to \frac{\eta k_BT}{\hbar^2}\left[(\sigma_z)_{11} - (\sigma_z)_{22}\right]^2 = \frac{4\eta k_BT}{\hbar^2} , \qquad (7.204)

thus exactly reproducing the result (142) of the Heisenberg-Langevin approach.68 Note also that the expression for T_2 is very close in structure to Eq. (199) for T_1 (in the high-temperature limit). However, for the simple interaction model (70) that was explored in Sec. 3, the off-diagonal elements of the operator x̂ = σ̂_z in the stationary-state z-basis vanish, so that T_1 → ∞, while T_2 stays finite. The physics of this result is very clear, for example, for the two-well implementation of the model (see Fig. 4 and its discussion). The model is suitable for the case of a very high energy barrier between the wells, which inhibits tunneling, and hence any change of the well occupancies. However, T_1 may become finite, and comparable with T_2, if tunneling between the wells is substantial.69
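Eq. (203) can be turned into a small routine. The sketch below assumes Ohmic dissipation, χ''(ω) = ηω, so that πS_F(0) = ηk_BT. It also assumes a real symmetric coordinate matrix x_{nn'} and units ħ = k_B = 1. The function names and the two-level test spectra are hypothetical.

```python
import math

def ohmic_rate(x_sq, w, T, eta):
    """Gamma_{n->m} for omega_{nm} = w, from Eq. (196) with chi''(w) = eta*w (hbar = k_B = 1)."""
    return 2.0 * x_sq * eta * w / (1.0 - math.exp(-w / T))

def dephasing_rate(E, x, n, n2, T, eta):
    """1/T_{nn'} from Eq. (203) for a real symmetric coordinate matrix x (requires n != n2)."""
    levels = range(len(E))
    escape_n = sum(ohmic_rate(x[n][m] ** 2, E[n] - E[m], T, eta) for m in levels if m != n)
    escape_n2 = sum(ohmic_rate(x[n2][m] ** 2, E[n2] - E[m], T, eta) for m in levels if m != n2)
    low_freq = eta * T * (x[n][n] - x[n2][n2]) ** 2    # (pi/hbar^2) S_F(0) (...)^2, Ohmic case
    return 0.5 * (escape_n + escape_n2) + low_freq

E = [1.0, -1.0]                                   # hypothetical two-level spectrum, E_1 > E_2
sigma_z = [[1.0, 0.0], [0.0, -1.0]]
sigma_x = [[0.0, 1.0], [1.0, 0.0]]
print(dephasing_rate(E, sigma_z, 0, 1, T=0.3, eta=0.1))  # only the low-frequency term survives
print(dephasing_rate(E, sigma_x, 0, 1, T=0.3, eta=0.1))  # only the "virtual transition" terms
```

For x̂ = σ̂_z, the routine returns 4ηk_BT/ħ², in agreement with Eq. (204). For x̂ = σ̂_x at k_BT ≪ ħω_{12}, it returns one half of Γ_{1→2}, as Eq. (203) implies when x_{11} = x_{22}.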


68 The first form of Eq. (203), as well as the analysis of Sec. 3, implies that low-frequency fluctuations of any other origin, not taken into account in our current analysis (say, an unintentional noise from experimental equipment), may also contribute to dephasing. Such “technical fluctuations” are indeed a very serious challenge for the experimental implementation of coherent qubit systems (see Sec. 8.5 below).

69 As was discussed in Sec. 5.1, the tunneling may be described by using, instead of Eq. (70), the full two-level 
Hamiltonian (5.3). Let me leave for the reader’s exercise to spell out the equations for the time evolution of the 
density matrix elements of this system, and of the expectation values of the Pauli operators, for this case. 
Because of the reason explained above, the derivation of Eqs. (200)-(204) is not valid for systems with equidistant energy spectra, for example, the harmonic oscillator. This particular but very important system has simple matrix elements x_{nn'}, given by Eqs. (5.92). For it, it is longish but straightforward to repeat the above calculations, starting from (183), to obtain an equation similar in structure to Eq. (200), but with two other terms, proportional to w_{n±1,n'±1}, on its right-hand side. Neglecting the minor Lamb-shift term, the equation reads

\dot w_{nn'} = -\delta\Big\{\left[(n_e + 1)(n + n') + n_e(n + n' + 2)\right]w_{nn'} - 2(n_e + 1)\left[(n + 1)(n' + 1)\right]^{1/2}w_{n+1,n'+1} - 2n_e(nn')^{1/2}w_{n-1,n'-1}\Big\} . \qquad (7.205)

Here δ is the effective damping coefficient,70

\delta \equiv \frac{\mathrm{Im}\,\chi(\omega_0)}{2m\omega_0} \equiv \frac{\chi''(\omega_0)}{2m\omega_0} , \qquad (7.206)

equal to just η/2m for the Ohmic dissipation, and n_e is the equilibrium number of the oscillator's excitations, given by Eq. (26b), with the environment's temperature T. (I am using this new notation because in dynamics, the instant expectation value ⟨n⟩ may be time-dependent, and is generally different from its equilibrium value n_e.)


As a remark: the derivation of Eq. (205) might be started at a bit earlier point, from the Markov approximation applied to Eq. (181), expressing the coordinate operator via the creation-annihilation operators (5.65). This procedure gives the result in the operator (i.e. basis-independent) form:71

\dot{\hat w} = -\delta\Big[(n_e + 1)\left(\{\hat a^\dagger\hat a, \hat w\} - 2\hat a\hat w\hat a^\dagger\right) + n_e\left(\{\hat a\hat a^\dagger, \hat w\} - 2\hat a^\dagger\hat w\hat a\right)\Big] . \qquad (7.207)

In the Fock state basis, this equation immediately reduces to Eq. (205); however, Eq. (207) may be more convenient for some applications.
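The statement that Eq. (207) reduces to Eq. (205) in the Fock basis can be verified with truncated N × N matrices of â and â†. The sketch below evaluates the right-hand side of Eq. (207) for an arbitrary Hermitian test matrix ŵ. Positivity of ŵ is not needed, because both sides are linear in ŵ. It then compares the result element by element with Eq. (205). Only elements with n, n' ≤ N − 2 are compared, because truncation spoils the matrix ââ† in its last row. The values of δ and n_e and the helper names are hypothetical.

```python
import math
import random

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def annihilation(N):
    """Truncated Fock-basis matrix of a-hat: <n|a|n+1> = sqrt(n+1)."""
    return [[math.sqrt(j) if j == i + 1 else 0.0 for j in range(N)] for i in range(N)]

def rhs_operator_form(w, delta, ne):
    """Right-hand side of Eq. (207), built from truncated matrices."""
    N = len(w)
    a = annihilation(N)
    ad = dagger(a)
    ada, aad = matmul(ad, a), matmul(a, ad)
    awad, adwa = matmul(matmul(a, w), ad), matmul(matmul(ad, w), a)
    ada_w, w_ada = matmul(ada, w), matmul(w, ada)
    aad_w, w_aad = matmul(aad, w), matmul(w, aad)
    return [[-delta * ((ne + 1) * (ada_w[i][j] + w_ada[i][j] - 2 * awad[i][j])
                       + ne * (aad_w[i][j] + w_aad[i][j] - 2 * adwa[i][j]))
             for j in range(N)] for i in range(N)]

def rhs_fock_form(w, n, n2, delta, ne):
    """Right-hand side of Eq. (205) for the element w_{nn'} (valid for n, n' <= N - 2)."""
    value = ((ne + 1) * (n + n2) + ne * (n + n2 + 2)) * w[n][n2]
    value -= 2 * (ne + 1) * math.sqrt((n + 1) * (n2 + 1)) * w[n + 1][n2 + 1]
    if n > 0 and n2 > 0:
        value -= 2 * ne * math.sqrt(n * n2) * w[n - 1][n2 - 1]
    return -delta * value

rng = random.Random(2)
N = 7
b = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)] for _ in range(N)]
w = [[(b[i][j] + b[j][i].conjugate()) / 2 for j in range(N)] for i in range(N)]  # Hermitian
R = rhs_operator_form(w, delta=0.3, ne=0.4)
print(max(abs(R[n][n2] - rhs_fock_form(w, n, n2, 0.3, 0.4))
          for n in range(N - 1) for n2 in range(N - 1)))
```

The printed maximal discrepancy is at the level of floating-point rounding.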


Returning to Eq. (205), we see that it relates only the elements w_{nn'} located at the same distance (n − n') from the principal diagonal of the density matrix. This means, in particular, that the dynamics of the diagonal elements w_{nn} of the matrix, i.e. the Fock state probabilities W_n, is independent of the off-diagonal elements. It may be represented in the form (194), truncated to the transitions between the adjacent energy levels only (n' = n ± 1):


70 This coefficient participates prominently in the classical theory of damped oscillations (see, e.g., CM Sec. 5.1),
in particular defining the oscillator's Q-factor as Q = ω_0/2δ, and the decay of the amplitude A and the energy
E of free oscillations: A(t) = A(0)exp{−δt}, E(t) = E(0)exp{−2δt}.

71 Sometimes Eq. (207) is called the Lindblad equation, but I believe this terminology is inappropriate. It is true
that its structure falls into a general category of equations, suggested by G. Lindblad in 1976 for the density
operators in the Markov approximation, whose diagonalized form in the interaction picture is

$$\dot{\hat w} = \sum_j \gamma_j\left(\hat L_j\hat w\hat L_j^\dagger - \frac{1}{2}\left\{\hat L_j^\dagger\hat L_j, \hat w\right\}\right).$$

However, Eq. (207) was derived much earlier (by L. Landau in 1927 for zero temperature, and by M. Lax in 1960
for an arbitrary temperature), and in contrast to the general Lindblad equation, spells out the participating
operators L̂_j and coefficients γ_j for a particular physical system, the harmonic oscillator.
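$$\dot W_n = \left(\Gamma_{n+1\to n}W_{n+1} + \Gamma_{n-1\to n}W_{n-1}\right) - \left(\Gamma_{n\to n+1} + \Gamma_{n\to n-1}\right)W_n, \qquad (7.208)$$
with the following rates:
$$\Gamma_{n+1\to n} = 2\delta(n+1)(n_e+1), \quad \Gamma_{n\to n+1} = 2\delta(n+1)n_e, \quad \Gamma_{n-1\to n} = 2\delta n n_e, \quad \Gamma_{n\to n-1} = 2\delta n(n_e+1). \qquad (7.209)$$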
Since according to the definition of n_e, given by Eq. (26b),

$$n_e = \frac{1}{\exp\{\hbar\omega_0/k_BT\} - 1}, \quad\text{so that}\quad n_e + 1 = \frac{\exp\{\hbar\omega_0/k_BT\}}{\exp\{\hbar\omega_0/k_BT\} - 1} \equiv \frac{-1}{\exp\{-\hbar\omega_0/k_BT\} - 1}, \qquad (7.210)$$

taking into account Eqs. (5.92), (186), (206), and the asymmetry of the function χ''(ω), we see that these
rates are again described by Eq. (196), even though the last formula was derived for non-equidistant
energy spectra.
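The rates (209) and the occupancy formula (210) can be checked against the detailed-balance condition Γ_{n→n+1}/Γ_{n+1→n} = n_e/(n_e + 1) = exp{−ħω_0/k_BT}, which makes the Gibbs distribution stationary. The sketch below is a minimal illustration that assumes the dimensionless variable x ≡ ħω_0/k_BT; the function names and parameter values are hypothetical.

```python
import math

def n_equilibrium(x):
    """Equilibrium number of excitations n_e, first of Eqs. (7.210); x = hbar*omega_0/(k_B*T)."""
    return 1.0 / math.expm1(x)

def rates(n, delta, ne):
    """Transition rates (7.209) of the damped oscillator, keyed by the transition."""
    return {
        "n+1->n": 2 * delta * (n + 1) * (ne + 1),
        "n->n+1": 2 * delta * (n + 1) * ne,
        "n-1->n": 2 * delta * n * ne,
        "n->n-1": 2 * delta * n * (ne + 1),
    }

def net_flow_up(n, delta, x):
    """Net probability flow n -> n+1 in the Gibbs state W_n ~ exp(-n*x); zero in equilibrium."""
    r = rates(n, delta, n_equilibrium(x))
    return r["n->n+1"] * math.exp(-n * x) - r["n+1->n"] * math.exp(-(n + 1) * x)

x, delta = 0.7, 0.05                                   # hypothetical values
ne = n_equilibrium(x)
r3 = rates(3, delta, ne)
print(ne + 1, 1 / (-math.expm1(-x)))                   # two forms of n_e + 1 in Eq. (7.210)
print(r3["n->n+1"] / r3["n+1->n"], math.exp(-x))       # detailed balance
print(max(abs(net_flow_up(n, delta, x)) for n in range(10)))
```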


Hence the only substantial new feature of the master equation for the harmonic oscillator is that
the decay of the off-diagonal elements of its density matrix is scaled by the same parameter (2δ) as the
decay of its diagonal elements, i.e. there is no radical difference between the dephasing and
energy-relaxation times T_2 and T_1. This fact may be interpreted as a result of the independence of the
energy level distances, ħω_0, of the fluctuations F(t) exerted on the oscillator by the environment, so that
their low-frequency density, S_F(0), does not contribute to the dephasing. (This fact also follows formally
from Eq. (203), taking into account that for the oscillator, x_{nn} = x_{n'n'} = 0.)


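The simple equidistant structure of the oscillator's spectrum makes it possible to readily solve
the system of Eqs. (208), with n = 0, 1, 2, ..., for some important cases. In particular, if the initial state
of the oscillator is a classical mixture, with no off-diagonal elements, its further relaxation proceeds as
such a mixture: w_{nn'}(t) = 0 for all n' ≠ n.⁷² Moreover, it is straightforward to use Eq. (208) to verify
that if the initial classical mixture obeys the Gibbs distribution (25), but with a temperature T_i different
from that of the environment (T_e), then the distribution keeps the Gibbs form at all times, with an
effective temperature T_ef(t) evolving from T_i to T_e, while the average number of excitations follows a
simple exponential transient:

$$W_n(t) = \exp\left\{-\frac{n\hbar\omega_0}{k_BT_{ef}(t)}\right\}\left[1 - \exp\left\{-\frac{\hbar\omega_0}{k_BT_{ef}(t)}\right\}\right], \quad\text{with}\quad \frac{1}{\exp\{\hbar\omega_0/k_BT_{ef}(t)\} - 1} = n_i e^{-2\delta t} + n_e\left(1 - e^{-2\delta t}\right), \qquad (7.211)$$

where n_i is the equilibrium excitation number (26b) at temperature T_i. (Only in the classical limit
k_BT_{i,e} >> ħω_0 does this reduce to an exponential transient of the effective temperature itself,
T_ef(t) = T_i e^{−2δt} + T_e(1 − e^{−2δt}).) The corresponding evolution of the expectation value of the full
energy E is (cf. Eq. (26b)):

$$\langle E\rangle(t) = \hbar\omega_0\left(\langle n\rangle(t) + \frac{1}{2}\right), \qquad \langle n\rangle(t) = \frac{1}{\exp\{\hbar\omega_0/k_BT_{ef}(t)\} - 1}. \qquad (7.212)$$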
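Here is a minimal numerical sketch of this Gibbs-to-Gibbs relaxation. It integrates Eqs. (208)-(209) with the classical fourth-order Runge-Kutta scheme, starting from a Gibbs distribution with ⟨n⟩ = n_i, and checks that the distribution stays geometric (i.e. Gibbs-like), while its average follows ⟨n⟩(t) = n_e + (n_i − n_e)e^{−2δt}, as follows from Eqs. (208)-(209). The truncation at 80 levels, the number of time steps, and all parameter values are assumptions of this illustration.

```python
import math

def master_rhs(W, delta, ne):
    """dW_n/dt from Eqs. (7.208)-(7.209), for populations truncated at len(W) levels."""
    N = len(W)
    dW = [0.0] * N
    for n in range(N):
        up = 2 * delta * (n + 1) * ne if n + 1 < N else 0.0   # Gamma_{n->n+1}, blocked at the top
        down = 2 * delta * n * (ne + 1)                        # Gamma_{n->n-1}
        dW[n] -= (up + down) * W[n]
        if n + 1 < N:
            dW[n + 1] += up * W[n]
        if n > 0:
            dW[n - 1] += down * W[n]
    return dW

def evolve(W, delta, ne, t, steps=1000):
    """Fourth-order Runge-Kutta integration of the master equation up to time t."""
    dt = t / steps
    for _ in range(steps):
        k1 = master_rhs(W, delta, ne)
        k2 = master_rhs([w + 0.5 * dt * k for w, k in zip(W, k1)], delta, ne)
        k3 = master_rhs([w + 0.5 * dt * k for w, k in zip(W, k2)], delta, ne)
        k4 = master_rhs([w + dt * k for w, k in zip(W, k3)], delta, ne)
        W = [w + dt * (a + 2 * b + 2 * c + d) / 6 for w, a, b, c, d in zip(W, k1, k2, k3, k4)]
    return W

def gibbs(nbar, N):
    """Gibbs distribution (25) written as W_n = (1 - r) r^n, with r = nbar/(nbar + 1)."""
    r = nbar / (nbar + 1.0)
    return [(1 - r) * r ** n for n in range(N)]

delta, ne, ni, t = 1.0, 0.5, 2.0, 0.5          # hypothetical parameters
W = evolve(gibbs(ni, 80), delta, ne, t)
nbar = sum(n * w for n, w in enumerate(W))
nbar_expected = ne + (ni - ne) * math.exp(-2 * delta * t)
kT_ef = 1 / math.log(1 + 1 / nbar)             # k_B*T_ef in units of hbar*omega_0
print(nbar, nbar_expected, kT_ef)
```

The last printed number is k_BT_ef in units of ħω_0, obtained by inverting ⟨n⟩ = 1/(exp{ħω_0/k_BT_ef} − 1).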


However, if the initial state of the oscillator is different (say, corresponds to some upper Fock
state), the relaxation process, described by Eqs. (208)-(209), is more complex; see, e.g., Fig. 8. At low
temperatures (Fig. 8a), it may be interpreted as a gradual "roll" of the probability distribution down the
energy staircase, with a gradually decreasing velocity |d⟨n⟩/dt| ∝ ⟨n⟩. However, at substantial temperatures,


72 Note, however, that this is not true for many applications, in which a damped oscillator is also under the effect
of an external time-dependent field, which may be described by additional, typically off-diagonal terms on the
right-hand side of Eq. (205).
with k_BT ~ ħω_0 (Fig. 8b), this "roll-down" is saturated when the level occupancies W_n(t) approach their
equilibrium values (25).⁷³


[Figure: panels (a) and (b); energy levels n = 0, 1, 2, ..., spaced by ħω_0]
Fig. 7.8. Relaxation of a harmonic oscillator, initially in its 5th Fock state, at: (a) T = 0, and (b) T > 0. Note
that in the latter case, even the energy levels with n > 5 get populated, due to their thermal excitation.
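At T = 0 (n_e = 0), only the downward rates Γ_{n→n−1} = 2δn survive, and the roll-down of Fig. 8a has a simple closed form: each of the n_0 initial quanta decays independently, surviving with probability e^{−2δt}, so that W_n(t) is a binomial distribution. The sketch below, with hypothetical parameter values, integrates Eq. (208) numerically from the 5th Fock state, compares the result with this binomial formula, and confirms that ⟨n⟩(t) = n_0e^{−2δt}, i.e. that the roll-down velocity is proportional to ⟨n⟩.

```python
import math

def decay_rhs(W, delta):
    """Eqs. (7.208)-(7.209) at T = 0 (n_e = 0): only jumps n -> n-1, at the rate 2*delta*n."""
    dW = [0.0] * len(W)
    for n in range(1, len(W)):
        flow = 2 * delta * n * W[n]
        dW[n] -= flow
        dW[n - 1] += flow
    return dW

def evolve(W, delta, t, steps=2000):
    """Fourth-order Runge-Kutta integration up to time t."""
    dt = t / steps
    for _ in range(steps):
        k1 = decay_rhs(W, delta)
        k2 = decay_rhs([w + 0.5 * dt * k for w, k in zip(W, k1)], delta)
        k3 = decay_rhs([w + 0.5 * dt * k for w, k in zip(W, k2)], delta)
        k4 = decay_rhs([w + dt * k for w, k in zip(W, k3)], delta)
        W = [w + dt * (a + 2 * b + 2 * c + d) / 6 for w, a, b, c, d in zip(W, k1, k2, k3, k4)]
    return W

def binomial(n0, delta, t):
    """Closed form at T = 0: each of n0 quanta survives independently with probability exp(-2*delta*t)."""
    p = math.exp(-2 * delta * t)
    return [math.comb(n0, n) * p ** n * (1 - p) ** (n0 - n) for n in range(n0 + 1)]

n0, delta, t = 5, 1.0, 0.4                    # hypothetical parameters
W0 = [0.0] * (n0 + 1)
W0[n0] = 1.0                                  # the 5th Fock state
W = evolve(W0, delta, t)
print([round(x, 6) for x in W])
print([round(x, 6) for x in binomial(n0, delta, t)])
```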


The analysis of this process may be simplified in the case when W(n, t) ≡ W_n(t) is a smooth
function of the energy level number n, limited to high levels: n >> 1. In this limit, we may use the
Taylor expansion of this function (written for the points Δn = ±1), truncated to three leading terms:

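$$W_{n\pm1}(t) \equiv W(n\pm1, t) \approx W(n,t) \pm \frac{\partial W(n,t)}{\partial n} + \frac{1}{2}\frac{\partial^2 W(n,t)}{\partial n^2}. \qquad (7.213)$$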
Plugging this expression into Eqs. (208)-(209), we get for the function W(n, t) a partial differential
equation, which may be recast in the following form:
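$$\frac{\partial W}{\partial t} = -\frac{\partial}{\partial n}\left[f(n)W\right] + \frac{\partial^2}{\partial n^2}\left[d(n)W\right], \quad\text{with}\quad f(n) = 2\delta(n_e - n), \quad d(n) = 2\delta\left(n_e + \frac{1}{2}\right)n. \qquad (7.214)$$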
Since at n >> 1 the oscillator's energy E is close to ħω_0 n, this energy diffusion equation (sometimes
incorrectly called the Fokker-Planck equation; see below) essentially describes the time evolution of
the continuous probability density w(E, t), which may be defined as w(E, t) ≡ W(E/ħω_0, t)/ħω_0.⁷⁴
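The coefficients f(n) and d(n) of Eq. (214) may be cross-checked against the first two jump moments of the rates (209): the drift is Γ_{n→n+1} − Γ_{n→n−1}, and the diffusion coefficient is half the sum Γ_{n→n+1} + Γ_{n→n−1}. The short sketch below (with hypothetical parameter values) shows that the drift agrees with f(n) exactly, while the diffusion coefficient exceeds d(n) by the relative amount n_e/[(2n_e + 1)n], which is negligible at n >> 1.

```python
def jump_moments(n, delta, ne):
    """Mean jump rate and half the mean squared jump rate for jumps n -> n +/- 1, rates (7.209)."""
    up = 2 * delta * (n + 1) * ne      # Gamma_{n->n+1}
    down = 2 * delta * n * (ne + 1)    # Gamma_{n->n-1}
    return up - down, 0.5 * (up + down)

def f(n, delta, ne):
    """Drift coefficient of Eq. (7.214)."""
    return 2 * delta * (ne - n)

def d(n, delta, ne):
    """Diffusion coefficient of Eq. (7.214)."""
    return 2 * delta * (ne + 0.5) * n

delta, ne = 0.1, 0.6                   # hypothetical parameters
for n in (10, 100, 1000):
    drift, diffusion = jump_moments(n, delta, ne)
    print(n, drift - f(n, delta, ne), diffusion / d(n, delta, ne) - 1)
```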


73 The reader may like to have a look at the results of nice measurements of such functions W_n(t) in microwave
oscillators, performed using their coupling with Josephson-junction circuits: H. Wang et al., Phys. Rev. Lett. 101,
240401 (2008), and with Rydberg atoms: M. Brune et al., Phys. Rev. Lett. 101, 240402 (2008).

74 In the classical limit n_e >> 1, Eq. (214) is analytically solvable for any initial conditions; see, e.g., the paper by
B. Zeldovich et al., Sov. Phys. JETP 28, 308 (1969), which also gives some more intricate solutions of Eqs.
(208)-(209). Note, however, that the most important properties of the damped harmonic oscillator (including its
relaxation dynamics) may be analyzed more simply by using the Heisenberg-Langevin approach discussed in the
previous section.
This continuous approximation naturally reminds us of the need to discuss dissipative systems
with a continuous spectrum. Unfortunately, for such systems the few (relatively :-) simple results that
may be obtained from the basic Eq. (181) are essentially classical in nature, and are discussed in detail
in the SM part of this series. Here, I will give only a simple illustration. Let us consider a 1D particle
that interacts weakly with a thermally-equilibrium environment, but otherwise is free to move along the
x-axis. As we know from Chapters 2 and 5, in this case, the most convenient basis is that of the
momentum eigenstates p. In the momentum representation, the density matrix is just the c-number
function w(p, p'), defined by Eq. (54), which was already discussed in brief in Sec. 2. On the other hand,
the coordinate operator, which participates in the right-hand side of Eq. (181), has the form given by the
first of Eqs. (4.269),


$$\hat x = i\hbar\frac{\partial}{\partial p}, \qquad (7.215)$$

dual to the coordinate-representation formula (4.268). As we already know, such operators are local;
see, e.g., Eq. (4.244). Due to this locality, the whole right-hand side of Eq. (181) is local as well, and
hence (within the framework of our perturbative treatment) the interaction with the environment affects
only the diagonal values w(p, p) of the density matrix, i.e. the momentum probability density w(p).


Let us find the equation governing the evolution of this function in time in the Markov
approximation, when the time scale of the density matrix evolution is much longer than the correlation
time τ_c of the environment, i.e. the time scale of the functions K_F(τ) and G(τ). In this approximation, we
may take the matrix elements out of the first integral of Eq. (181),

$$-\frac{1}{\hbar^2}\int_{-\infty}^{t}K_F(t-t')\left[\hat x(t),\left[\hat x(t'),\hat w(t')\right]\right]dt' \approx -\frac{1}{\hbar^2}\int_0^{\infty}K_F(\tau)\,d\tau\,\left[\hat x,\left[\hat x,\hat w\right]\right], \qquad (7.216)$$


and calculate the last double commutator in the Schrödinger picture. This may be done either using an
explicit expression for the matrix elements of the coordinate operator, or in a simpler way, using the
same trick as in the derivation of the Ehrenfest theorem in Sec. 5.2. Namely, expanding an arbitrary
function f(p) into the Taylor series in p,

$$f(p) = \sum_{k=0}^{\infty}\frac{1}{k!}\frac{\partial^k f}{\partial p^k}\bigg|_{p=0}p^k, \qquad (7.217)$$


and using Eq. (215), we can prove the following simple commutation relation: 


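$$\left[\hat x, f\right] = \sum_{k=0}^{\infty}\frac{1}{k!}\frac{\partial^k f}{\partial p^k}\bigg|_{p=0}\left[\hat x, p^k\right] = \sum_{k=1}^{\infty}\frac{1}{k!}\frac{\partial^k f}{\partial p^k}\bigg|_{p=0}\,i\hbar k\,p^{k-1} = i\hbar\frac{\partial f}{\partial p}. \qquad (7.218)$$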


Now applying this result sequentially, first to w and then to the resulting commutator, we get 


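$$\left[\hat x,\left[\hat x, w\right]\right] = \left[\hat x, i\hbar\frac{\partial w}{\partial p}\right] = i\hbar\frac{\partial}{\partial p}\left(i\hbar\frac{\partial w}{\partial p}\right) = -\hbar^2\frac{\partial^2 w}{\partial p^2}. \qquad (7.219)$$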
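Relations (218) and (219) may also be checked numerically by representing x̂ = iħ∂/∂p through central finite differences and applying the commutators to a test function. In this minimal sketch, ħ = 1, the Gaussian w(p) and the test function φ(p) are hypothetical choices, and the residuals are limited only by the finite-difference step.

```python
import math

HBAR = 1.0    # assumption for this check: units with hbar = 1
STEP = 1e-3   # finite-difference step in p

def apply_x(phi):
    """Coordinate operator in the momentum representation, Eq. (7.215): x = i*hbar*d/dp."""
    return lambda p: 1j * HBAR * (phi(p + STEP) - phi(p - STEP)) / (2 * STEP)

def multiply_by(f):
    """Operator of multiplication by a c-number function f(p)."""
    return lambda phi: (lambda p: f(p) * phi(p))

def commutator_with_x(op):
    """The operator [x, op], acting on test functions phi(p)."""
    return lambda phi: (lambda p: apply_x(op(phi))(p) - op(apply_x(phi))(p))

def w(p):      # hypothetical Gaussian momentum density
    return math.exp(-p * p / 2)

def w1(p):     # its exact first derivative
    return -p * w(p)

def w2(p):     # its exact second derivative
    return (p * p - 1) * w(p)

def phi(p):    # arbitrary smooth test function
    return math.cos(p) + 0.3

single = commutator_with_x(multiply_by(w))
double = commutator_with_x(single)
p0 = 0.4
err_218 = abs(single(phi)(p0) - 1j * HBAR * w1(p0) * phi(p0))       # [x, w] = i*hbar*dw/dp
err_219 = abs(double(phi)(p0) + HBAR ** 2 * w2(p0) * phi(p0))      # [x, [x, w]] = -hbar^2*d2w/dp2
print(err_218, err_219)
```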


It may seem that the second integral in Eq. (181) could be simplified similarly. However, it
vanishes at p' → p and t' → t, so that to calculate the first non-vanishing contribution from that integral
for p = p', we have to take into account the small difference τ ≡ t − t' ~ τ_c between the arguments of the
coordinate operators under that integral. This may be done using Eq. (169) with the free particle's
Hamiltonian consisting of the kinetic-energy contribution alone:

$$\hat x(t') - \hat x(t) \approx -\tau\dot{\hat x} = -\frac{\tau}{i\hbar}\left[\hat x, \hat H\right] = -\frac{\tau}{i\hbar}\left[\hat x, \frac{\hat p^2}{2m}\right] = -\tau\frac{\hat p}{m}, \qquad (7.220)$$


where the exact argument of the operator on the right-hand side is already unimportant and may be
taken for t. As a result, we may use the last of Eqs. (136) to reduce the second term on the right-hand
side of Eq. (181) to

$$-\frac{i\eta}{2\hbar}\left[\hat x,\left\{\frac{\hat p}{m},\hat w\right\}\right]. \qquad (7.221)$$


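In the momentum representation, the momentum operator and the density matrix w are just c-numbers
and commute, so that, applying Eq. (218) to the product pw, we get

$$-\frac{i\eta}{2\hbar}\left[\hat x,\left\{\frac{\hat p}{m},\hat w\right\}\right] = -\frac{i\eta}{2\hbar}\left[\hat x, 2\frac{p}{m}w\right] = \eta\frac{\partial}{\partial p}\left(\frac{p}{m}w\right). \qquad (7.222)$$

Combining this drift term with the diffusion term given by Eqs. (216) and (219), and using the fact that in
the classical limit ∫_0^∞ K_F(τ)dτ = ηk_BT (which follows from the Nyquist formula (139)), we finally get

$$\frac{\partial w}{\partial t} = \eta\frac{\partial}{\partial p}\left(\frac{p}{m}w\right) + \eta k_BT\frac{\partial^2 w}{\partial p^2}. \qquad (7.223)$$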


This is the 1D form of the famous Fokker-Planck equation describing the classical statistics of
motion of a particle (in our particular case, of a free particle) in an environment providing a linear drag
characterized by the coefficient η; it belongs to the same drift-diffusion type as Eq. (214). The first, drift
term on its right-hand side describes the particle's deceleration due to the drag force (137), F = −ηp/m ≡
−ηv, provided by the environment. The second, diffusion term on the right-hand side of Eq. (223)
describes the effect of fluctuations: the random walk of the particle's momentum around its average
(drift-affected, and hence time-dependent) value. The walk obeys a law similar to Eq. (85), but with the
momentum-space diffusion coefficient


$$D_p = \eta k_BT. \qquad (7.224)$$


This is the reciprocal-space version of the fundamental Einstein relation between the dissipation
(friction) and fluctuations, in this classical limit represented by their thermal energy scale k_BT.⁷⁵
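One quick way to see the Einstein relation (224) at work is to check that the Maxwell distribution w(p) ∝ exp{−p²/2mk_BT} is a stationary solution of Eq. (223) only if D_p = ηk_BT: the probability current −(ηp/m)w − D_p∂w/∂p vanishes for this w only with that choice. The finite-difference sketch below is an illustration with hypothetical values of η, m, and k_BT.

```python
import math

def maxwell(p, m, kT):
    """Maxwell momentum distribution, normalized to 1."""
    return math.exp(-p * p / (2 * m * kT)) / math.sqrt(2 * math.pi * m * kT)

def fp_rhs(w, p, eta, m, Dp, h=1e-3):
    """Right-hand side of Eq. (7.223), eta*d(p*w/m)/dp + Dp*d2w/dp2, by central differences."""
    drift = eta * ((p + h) * w(p + h) - (p - h) * w(p - h)) / (2 * h * m)
    diffusion = Dp * (w(p + h) - 2 * w(p) + w(p - h)) / h ** 2
    return drift + diffusion

eta, m, kT = 0.3, 2.0, 0.5          # hypothetical parameters

def w_eq(p):
    return maxwell(p, m, kT)

# With Dp = eta*kT the Maxwell distribution is stationary; with another Dp it is not
residual_einstein = max(abs(fp_rhs(w_eq, p, eta, m, eta * kT)) for p in (-2.0, -0.5, 0.0, 0.7, 1.9))
residual_other = abs(fp_rhs(w_eq, 0.0, eta, m, 2 * eta * kT))
print(residual_einstein, residual_other)
```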


Just for the reader's reference, let me note that the Fokker-Planck equation (223) may be readily
generalized to the 3D motion of a particle under the effect of an additional external force,⁷⁶ and in this


75 Note that Eq. (224), as well as the original Einstein relation between the diffusion coefficient D in the direct
space and temperature, may be derived much more simply by other means, for example, from the Nyquist formula
(139). These issues are discussed in detail in SM Chapter 5.

76 Moreover, Eq. (223) may be generalized to the motion of a quantum particle in an additional periodic potential
U(r). In this case, due to the band structure of the energy spectrum (which was discussed in Secs. 2.7 and 3.4),
the coupling to the environment produces not only a continuous drift-diffusion of the probability density in the
space of the quasimomentum ħq, but also quantum transitions between different energy bands at the same ħq;
see, e.g., K. Likharev and A. Zorin, J. Low Temp. Phys. 59, 347 (1985).
more general form is the basis for many important applications; however, due to its classical character,
its discussion is also left for the SM part of this series.⁷⁷


To summarize our discussion of the two alternative approaches to the analysis of quantum
systems interacting with a thermally-equilibrium environment, described in the last three sections, let
me emphasize again that they give different descriptions of the same phenomena, and are characterized
by the same two functions G(τ) and K_F(τ). Namely, in the Heisenberg-Langevin approach, we describe
the system by operators that change (fluctuate) in time, even in thermal equilibrium, while in the
density-matrix approach, the system is described by non-fluctuating probability functions, such as W_n(t)
or w(p, t), which are stationary in equilibrium. In the cases when a problem may be solved analytically
to the end by both methods (for example, for a harmonic oscillator), they give identical results.


7.8. Exercise problems 


7.1. Calculate the density matrix of a two-level system whose Hamiltonian is described, in a
certain basis, by the following matrix:

$$\mathrm H = \mathbf c\cdot\boldsymbol\sigma \equiv c_x\sigma_x + c_y\sigma_y + c_z\sigma_z,$$

where σ_j are the Pauli matrices and c_j are c-numbers, in thermodynamic equilibrium at temperature T.


7.2. In the usual z-basis, spell out the density matrix of a spin-½ with gyromagnetic ratio γ:


(i) in the pure state with the spin definitely directed along the z-axis,

(ii) in the pure state with the spin definitely directed along the x-axis,

(iii) in thermal equilibrium at temperature T, in a magnetic field directed along the z-axis, and
(iv) in thermal equilibrium at temperature T, in a magnetic field directed along the x-axis.


7.3. Calculate the Wigner function of a harmonic oscillator:


(i) in thermodynamic equilibrium at temperature T,
(ii) in the ground state, and
(iii) in the Glauber state with dimensionless complex amplitude α.


Discuss the relation between the first of the results and the Gibbs distribution.


7.4. Calculate the Wigner function of a harmonic oscillator, with mass m and frequency ω_0, in its
first excited stationary state (n = 1).
7.5. A harmonic oscillator is weakly coupled to an Ohmic environment. 


(i) Use the rotating-wave approximation to write the reduced equations of motion for the 
Heisenberg operators of the complex amplitude of oscillations. 


77 See SM Secs. 5.6-5.7. For a more detailed discussion of quantum effects in dissipative systems with continuous
spectra see, e.g., either U. Weiss, Quantum Dissipative Systems, 2nd ed., World Scientific, 1999, or H.-P. Breuer
and F. Petruccione, The Theory of Open Quantum Systems, Oxford U. Press, 2007.
(ii) Calculate the expectation values of the correlators of the fluctuation force operators
participating in these equations, and express them via the average number ⟨n⟩ of thermally-induced
excitations in equilibrium, given by the second of Eqs. (26b).


7.6. Calculate the average potential energy of long-range electrostatic interaction between two
similar isotropic 3D harmonic oscillators, each with the electric dipole moment d = qs, where s is the
oscillator's displacement from its equilibrium position, at arbitrary temperature T.


7.7. A semi-infinite string with mass μ per unit length is attached to a wall and stretched with a
constant force (tension) 𝒯. Calculate the spectral density of the transverse force exerted on the wall, in
thermal equilibrium at temperature T.


7.8. Calculate the low-frequency spectral density of small fluctuations of the voltage V across a
Josephson junction, shunted with an Ohmic conductor, and biased with a dc external current I > I_c.


Hint: You may use Eqs. (1.73)-(1.74) to describe the junction’s dynamics, and assume that the 
shunting conductor remains in thermal equilibrium. 


7.9. Prove that in the interaction picture of quantum dynamics, the expectation value of an 
arbitrary observable A may be indeed calculated using Eq. (167). 


7.10. Show that the quantum-mechanical Golden Rule (6.149) and the master equation (196)
give the same results for the rate of spontaneous quantum transitions n' → n in a system with a discrete
energy spectrum, weakly coupled to a low-temperature heat bath (with k_BT << ħω_{n'n}).


Hint: You may start by establishing a relation between the function χ''(ω_{nn'}), which participates
in Eq. (196), and the density of states ρ_n, which participates in the Golden Rule formula, using the
particular case of sinusoidal classical oscillations in the system of interest.


7.11. For a harmonic oscillator with weak Ohmic dissipation, use Eqs. (208)-(209) to find the
time evolution of the expectation value ⟨E⟩ of the oscillator's energy for an arbitrary initial state, and
compare the result with that following from the Heisenberg-Langevin approach.


7.12. Derive Eq. (219) in an alternative way, using an expression dual to Eq. (4.244). 


7.13. A particle in a system of two coupled potential wells (see, e.g., Fig. 7.4 in the lecture notes)
is weakly coupled to an Ohmic environment.


(i) Derive equations describing the time evolution of the density matrix elements.

(ii) Solve these equations in the low-temperature limit, when the energy level splitting is much
larger than k_BT, to calculate the time evolution of the probability of finding the particle in one of the
wells, after it had been placed there at t = 0.


7.14.* A spin-½ with gyromagnetic ratio γ is placed into the magnetic field B(t) = B_0 + B̃(t),
with an arbitrary but relatively small time-dependent component, and is also weakly coupled to a
dissipative environment. Derive differential equations describing the time evolution of the expectation
values of the spin's Cartesian components, at arbitrary temperature.


Chapter 7 Page 50 of 50 




Essential Graduate Physics QM: Quantum Mechanics 


Chapter 8. Multiparticle Systems 


This chapter provides a brief introduction to quantum mechanics of systems of similar particles, with 
special attention to the case when they are indistinguishable. For such systems, theory predicts (and 
experiment confirms) very specific effects even in the case of negligible explicit (“direct”) interactions 
between the particles. These effects notably include the Bose-Einstein condensation of bosons and the 
exchange interaction of fermions. 


8.1. Distinguishable and indistinguishable particles 


The importance of quantum systems of many similar particles is probably self-evident; just the
very fact that most atoms include several or many electrons is sufficient to attract our attention. There are
also important systems where the total number of electrons is much higher than in one atom; for
example, a cubic centimeter of a typical metal houses ~10²³ conduction electrons that cannot be
attributed to particular atoms, and have to be considered as common parts of the system as the whole.
Though quantum mechanics offers virtually no exact analytical results for systems of substantially
interacting particles,¹ it reveals very important new quantum effects even in the simplest cases when
particles do not interact, at least explicitly (directly).


If non-interacting particles are either different from each other by their nature, or physically
similar but still distinguishable because of other reasons, everything is simple, at least conceptually.
Then, as was already discussed in Sec. 6.7, a system of two particles, 1 and 2, each in a pure quantum
state, may be described by a state vector which is a direct product,

$$|\alpha\rangle = |\beta\rangle_1 \otimes |\beta'\rangle_2, \qquad (8.1a)$$

of single-particle vectors, describing their states β and β' defined in different Hilbert spaces. (Below, I
will frequently use, for such a direct product, the following convenient shorthand:

$$|\alpha\rangle \equiv |\beta\beta'\rangle, \qquad (8.1b)$$

in which the particle's number is coded by the state symbol's position.) Hence the permuted state

$$\hat{\mathcal P}|\beta\beta'\rangle \equiv |\beta'\beta\rangle \equiv |\beta'\rangle_1 \otimes |\beta\rangle_2, \qquad (8.2)$$

where 𝒫̂ is the permutation operator defined by Eq. (2), is clearly different from the initial one.


This operator may also be used for states of systems of identical particles. In physics, the last
term may be used to describe:


(i) the “really elementary” particles like electrons, which (at least at this stage of development of 
physics) are considered as structure-less entities, and hence are all identical; 


1 As was emphasized in Sec. 7.3, for such systems of similar particles the powerful methods discussed in the last
chapter, based on the separation of the whole Universe into a "system of our interest" and its "environment",
typically do not work well, mostly because the quantum state of the "particle of interest" may be substantially
correlated (in particular, entangled) with those of similar particles forming its "environment"; see below.
(ii) any objects (e.g., hadrons or mesons) that may be considered as a system of "more
elementary" particles (e.g., quarks and gluons), but are placed in the same internal quantum state, most
simply, though not necessarily, in the ground state.²


It is important to note that identical particles still may be distinguishable, say by their clear
spatial separation. Such systems of similar but distinguishable particles (or subsystems) are broadly
discussed nowadays in the context of quantum computing and encryption; see Sec. 5 below. This is
why it is insufficient to use the term "identical particles" if we want to say that they are genuinely
indistinguishable, so below I will use the latter term, despite it being rather unpleasant grammatically.


It turns out that for a quantitative description of systems of indistinguishable particles we need to
use, instead of direct products of the type (1), linear combinations of such products, for example of
|ββ'⟩ and |β'β⟩.³ To see this, let us discuss the properties of the permutation operator defined by Eq. (2).
Consider an observable A, and a system of eigenstates of its operator:

$$\hat A|a_j\rangle = A_j|a_j\rangle. \qquad (8.3)$$

If the particles are indistinguishable, the observable's expectation value should not be affected by their
permutation. Hence the operators Â and 𝒫̂ have to commute and share their eigenstates. This is why
the eigenstates of the operator 𝒫̂ are so important: in particular, they include the eigenstates of the
Hamiltonian, i.e. the stationary states of a system of indistinguishable particles.


Let us have a look at the action of the permutation operator squared on an elementary ket-vector
product:

$$\hat{\mathcal P}^2|\beta\beta'\rangle = \hat{\mathcal P}\left(\hat{\mathcal P}|\beta\beta'\rangle\right) = \hat{\mathcal P}|\beta'\beta\rangle = |\beta\beta'\rangle, \qquad (8.4)$$

i.e. 𝒫̂² brings the state back to its original form. Since any pure state of a two-particle system may be
represented as a linear combination of such products, this result does not depend on the state, and may
be represented as the following operator relation:

$$\hat{\mathcal P}^2 = \hat I. \qquad (8.5)$$

Now let us find the possible eigenvalues 𝒫_j of the permutation operator. Acting by both sides of Eq. (5)
on any of the eigenstates |α_j⟩ of the permutation operator, we get a very simple equation for its eigenvalues:

$$\mathcal P_j^2 = 1, \qquad (8.6)$$


2 Note that from this point of view, even complex atoms or molecules, in the same internal quantum state, may be
considered on the same footing as the "really elementary" particles. For example, the already mentioned recent
spectacular interference experiments by R. Lopes et al., which require particle identity, were carried out with
pairs of ⁴He atoms in the same internal quantum state.

3 A very legitimate question is why, in this situation, we need to introduce the particles' numbers to start with. A
partial answer is that in this approach, it is much simpler to derive (or guess) the system Hamiltonians from the
correspondence principle; see, e.g., Eq. (27) below. Later in this chapter, we will discuss an alternative approach
(the so-called "second quantization"), in which particle numbering is avoided. While that approach is more
logical, writing adequate Hamiltonians (which, in particular, would avoid spurious self-interaction of the
particles) within it is more challenging; see Sec. 3 below.
with two possible solutions:

$$\mathcal P_j = \pm 1. \qquad (8.7)$$


Let us find the eigenstates of the permutation operator in the simplest case when each of the
component particles can be only in one of two single-particle states, say, β and β'. Evidently, none of
the simple products |ββ'⟩ and |β'β⟩, taken alone, qualifies for an eigenstate, unless the states β and
β' are identical. So let us try their linear combination

$$|\alpha_j\rangle = a|\beta\beta'\rangle + b|\beta'\beta\rangle, \qquad (8.8)$$

so that

$$\hat{\mathcal P}|\alpha_j\rangle = \mathcal P_j|\alpha_j\rangle = a|\beta'\beta\rangle + b|\beta\beta'\rangle. \qquad (8.9)$$

For the case 𝒫_j = +1 we have to require the states (8) and (9) to be the same, so that a = b, giving the
so-called symmetric eigenstate⁴

$$|\alpha_+\rangle = \frac{1}{\sqrt 2}\left(|\beta\beta'\rangle + |\beta'\beta\rangle\right), \qquad (8.10)$$

where the front coefficient guarantees the orthonormality of the two-particle state vectors, provided that
the single-particle vectors are orthonormal. Similarly, for 𝒫_j = −1 we get a = −b, i.e. an antisymmetric
eigenstate

$$|\alpha_-\rangle = \frac{1}{\sqrt 2}\left(|\beta\beta'\rangle - |\beta'\beta\rangle\right). \qquad (8.11)$$


These are the simplest (two-particle, two-state) examples of entangled states, defined as multiparticle 
system states whose vectors cannot be factored into a direct product (1) of single-particle vectors. 
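The algebra of Eqs. (4)-(11) is easy to reproduce for two particles with a two-state single-particle basis {β, β'}, by storing a two-particle ket as the 2×2 array of amplitudes c_{ij} of the products |ij⟩, so that 𝒫̂ is just the transposition of this array. The sketch below (all helper names are hypothetical) checks that 𝒫̂² = Î, that the states (10) and (11) are eigenstates with 𝒫_j = ±1, that both are entangled (a 2×2 amplitude array describes a direct product (1) only if its determinant vanishes), and that the antisymmetric combination with β = β' is the null-state.

```python
import math

def permute(c):
    """Permutation operator (8.2): |i j> -> |j i>, i.e. transposition of the amplitude array c[i][j]."""
    n = len(c)
    return [[c[j][i] for j in range(n)] for i in range(n)]

def combination(a, b, i, j):
    """Two-particle ket a|ij> + b|ji> of Eq. (8.8); single-particle states 0 = beta, 1 = beta'."""
    c = [[0.0, 0.0], [0.0, 0.0]]
    c[i][j] += a
    c[j][i] += b
    return c

def scaled(c, k):
    return [[k * x for x in row] for row in c]

def norm(c):
    return math.sqrt(sum(abs(x) ** 2 for row in c for x in row))

def is_direct_product(c, tol=1e-12):
    """A 2x2 amplitude array has the product form c[i][j] = u[i]*v[j] (Eq. 8.1) iff det(c) = 0."""
    return abs(c[0][0] * c[1][1] - c[0][1] * c[1][0]) < tol

r = 1 / math.sqrt(2)
sym = combination(r, r, 0, 1)       # Eq. (8.10)
anti = combination(r, -r, 0, 1)     # Eq. (8.11)
same = combination(r, -r, 0, 0)     # Eq. (8.11) with beta = beta'
generic = [[0.3, -1.2], [0.7, 2.0]]  # arbitrary two-particle state

print(permute(permute(generic)) == generic)                       # P^2 = I
print(permute(sym) == sym, permute(anti) == scaled(anti, -1))     # eigenvalues +1 and -1
print(is_direct_product(sym), is_direct_product(anti), norm(same))
```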


So far, our math does not preclude either sign of 𝒫_j, in particular the possibility that the sign
would depend on the state (i.e. on the index j). Here, however, comes in a crucial fact: all
indistinguishable particles fall into two groups:⁵


(i) bosons, particles with integer spin s, for whose states 𝒫_j = +1, and
(ii) fermions, particles with half-integer spin, with 𝒫_j = −1.


In the non-relativistic theory we are discussing now, this key fact should be considered as an
experimental one. (The relativistic quantum theory, whose elements will be discussed in Chapter 9,
offers proof that the half-integer-spin particles cannot be bosons and the integer-spin ones cannot be
fermions.) However, our discussion of spin in Sec. 5.7 enables the following handwaving interpretation
of the difference between these two particle species. In free space, the permutation of particles 1 and
2 may be viewed as a result of their pair's common rotation by angle φ = ±π about a properly selected
z-axis. As we have seen in Sec. 5.7, at the rotation by this angle, the state vector |β⟩ of a particle with a
definite quantum number m_s acquires an extra factor exp{±im_sπ}. As we know, the quantum number m_s
ranges from −s to +s, in unit steps. As a result, for bosons, with integer s, m_s can take only integer
values, so that exp{±im_sπ} = ±1, and the product of two such factors in the state product |ββ'⟩ is
equal to +1. On the contrary, for fermions with their half-integer s, all m_s are half-integer as well, so
that exp{±im_sπ} = ±i, and the product of two such factors in the vector |ββ'⟩ is equal to (±i)² = −1.


4 As in many situations we have met earlier, the kets given by Eqs. (10) and (11) may be multiplied by exp{iφ}
with an arbitrary real phase φ. However, until we discuss coherent superpositions of various states α_j, there is no
good motivation for taking the phase different from 0; that would only clutter the notation.

5 Sometimes this fact is described as having two different "statistics": the Bose-Einstein statistics of bosons and
Fermi-Dirac statistics of fermions, because their statistical distributions in thermal equilibrium are indeed
different; see, e.g., SM Sec. 2.8. However, this difference is actually deeper: we are dealing with two different
quantum mechanics.


The most impressive corollaries of Eqs. (10) and (11) are for the case when the partial states of
the two particles are the same: β = β'. The corresponding Bose state α_+, defined by Eq. (10), is possible;
in particular, at sufficiently low temperatures, a set of non-interacting Bose particles condenses on the
ground state, forming the so-called Bose-Einstein condensate ("BEC").⁶ The most fascinating feature of the
condensates is that their dynamics is governed by quantum mechanical laws, which may show up in the
behavior of their observables with virtually no quantum uncertainties;⁷ see, e.g., Eqs. (1.73)-(1.74).


On the other hand, if we take β = β' in Eq. (11), we see that state α_− becomes the null-state, i.e.
cannot exist at all. This is the mathematical expression of the Pauli exclusion principle:⁸ two
indistinguishable fermions cannot be placed into the same quantum state. (As will be discussed below,
this is true for systems with more than two fermions as well.) Probably, the key importance of this
principle is self-evident: if it were not valid for electrons (which are fermions), all electrons of each atom
would condense in their ground (1s-like) state, and all the usual chemistry (and biochemistry, and
biology, including dear us!) would not exist. The Pauli principle makes fermions implicitly interacting
even if they do not interact directly, i.e. in the usual sense of this word.


8.2. Singlets, triplets, and the exchange interaction 


Now let us discuss possible approaches to quantitative analyses of identical particles, starting
from a simple case of two spin-½ particles (say, electrons), whose explicit interaction with each other
and the external world does not involve spin. The description of such a system may be based on
factorable states with ket-vectors

$$|\alpha_{12}\rangle = |o_{12}\rangle \otimes |s_{12}\rangle, \qquad (8.12)$$

with the orbital state vector |o_12⟩ and the spin vector |s_12⟩ belonging to different Hilbert spaces. It is
frequently convenient to use the coordinate representation of such a state, sometimes called the spinor:

$$\langle\mathbf r_1,\mathbf r_2|\alpha_{12}\rangle = \langle\mathbf r_1,\mathbf r_2|o_{12}\rangle \otimes |s_{12}\rangle \equiv \psi(\mathbf r_1,\mathbf r_2)|s_{12}\rangle. \qquad (8.13)$$

Since spin-½ particles are fermions, the particle permutation has to change the sign:


6 For a quantitative discussion of the Bose-Einstein condensation see, e.g., SM Sec. 3.4. Examples of such 
condensates include superfluids like helium, Cooper-pair condensates in superconductors, and BECs of weakly 
interacting atoms. 

7 For example, for a coherent condensate of N >> 1 particles, Heisenberg's uncertainty relation takes the form
δx δp = δx(Nm δv) ≥ ħ/2, so that its coordinate x and velocity v may be measured simultaneously with much higher
precision than those of a single particle.

8 It was first formulated for electrons by Wolfgang Pauli in 1925, on the background of less general rules 
suggested by Gilbert Lewis (1916), Irving Langmuir (1919), Niels Bohr (1922), and Edmund Stoner (1924) for 
the explanation of experimental spectroscopic data. 
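$$\hat{\mathcal P}\left[\psi(\mathbf r_1,\mathbf r_2)|s_{12}\rangle\right] \equiv \psi(\mathbf r_2,\mathbf r_1)|s_{21}\rangle = -\psi(\mathbf r_1,\mathbf r_2)|s_{12}\rangle, \qquad (8.14)$$
i.e. the permutation has to change the sign of either the orbital factor or the spin factor.
In particular, in the case of a symmetric orbital factor,

$$\psi(\mathbf r_2,\mathbf r_1) = \psi(\mathbf r_1,\mathbf r_2), \qquad (8.15)$$

the spin factor has to obey the relation

$$|s_{21}\rangle = -|s_{12}\rangle. \qquad (8.16)$$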


Let us use the ordinary z-basis (where z, in the absence of an external magnetic field, is an arbitrary
spatial axis) for both spins. In this basis, the ket-vector of any two spins-½ may be represented as a
linear combination of the following four basis vectors:

$$|\uparrow\uparrow\rangle, \quad |\downarrow\downarrow\rangle, \quad |\uparrow\downarrow\rangle, \quad\text{and}\quad |\downarrow\uparrow\rangle. \qquad (8.17)$$


The first two kets evidently do not satisfy Eq. (16), and cannot participate in the state. Applying to the 
remaining kets the same argumentation as has resulted in Eq. (11), we get 


$$|s_{12}\rangle = |s_-\rangle \equiv \frac{1}{\sqrt 2}\left(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle\right). \qquad (8.18)$$

Such an orbital-symmetric and spin-antisymmetric state is called the singlet.


The origin of this term becomes clear from the analysis of the opposite (orbital-antisymmetric 
and spin-symmetric) case: 


$$\psi(\mathbf r_2,\mathbf r_1) = -\psi(\mathbf r_1,\mathbf r_2), \qquad |s_{21}\rangle = |s_{12}\rangle. \tag{8.19}$$


For the composition of such a symmetric spin state, the first two kets of Eq. (17) are completely 
acceptable (with arbitrary weights), and so is an entangled spin state that is the symmetric combination 
of the two last kets, similar to Eq. (10): 


$$|s_+\rangle \equiv \frac{1}{\sqrt2}\left(|\uparrow\downarrow\rangle + |\downarrow\uparrow\rangle\right), \tag{8.20}$$


so that the general spin state is a triplet: 


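$$|s_{12}\rangle = c_+|\uparrow\uparrow\rangle + c_-|\downarrow\downarrow\rangle + c_0\,\frac{1}{\sqrt2}\left(|\uparrow\downarrow\rangle + |\downarrow\uparrow\rangle\right). \tag{8.21}$$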
Note that any such state (with any values of the coefficients c satisfying the normalization condition) corresponds to the same orbital wavefunction and hence the same energy. However, each of the three component states in Eq. (21) has a specific value of the z-component of the net spin, evidently equal to, respectively, +ħ, −ħ, and 0. Because of this, even a small external magnetic field lifts their degeneracy, splitting the energy level in three; hence the term “triplet”.
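As a quick check of Eqs. (16)-(21), the short Python sketch below (not part of the original notes) stores two-spin kets as 4-component real vectors in the basis (17). The basis ordering and the state labels are choices made only for this illustration. The sketch applies the particle permutation, which swaps the two spin labels, and evaluates ⟨S_z⟩. Only $s_-$ turns out to be antisymmetric, while ↑↑, ↓↓, and $s_+$ are symmetric, with ⟨S_z⟩ = +ħ, −ħ, and 0, respectively.

```python
import math

# Two spins-1/2; basis ordering is a choice of this sketch:
# the first arrow refers to particle 1, the second to particle 2.
BASIS = ["uu", "ud", "du", "dd"]   # |up up>, |up down>, |down up>, |down down>
R = 1 / math.sqrt(2)

STATES = {
    "up-up":     [1.0, 0.0, 0.0, 0.0],
    "down-down": [0.0, 0.0, 0.0, 1.0],
    "s_+":       [0.0, R, R, 0.0],    # Eq. (8.20)
    "s_-":       [0.0, R, -R, 0.0],   # Eq. (8.18)
}

def permute(vec):
    """Particle permutation P: swap the spin labels of particles 1 and 2."""
    out = [0.0] * len(BASIS)
    for amp, label in zip(vec, BASIS):
        out[BASIS.index(label[::-1])] += amp
    return out

def permutation_eigenvalue(vec, tol=1e-12):
    """Return +1 (symmetric) or -1 (antisymmetric); raise if vec is neither."""
    pv = permute(vec)
    for sign in (+1, -1):
        if all(abs(p - sign * v) < tol for p, v in zip(pv, vec)):
            return sign
    raise ValueError("not an eigenstate of the permutation operator")

def sz_over_hbar(vec):
    """Expectation value of S_z = s_1z + s_2z, in units of hbar (real amplitudes)."""
    m = {"u": 0.5, "d": -0.5}
    return sum(amp * amp * (m[lab[0]] + m[lab[1]]) for amp, lab in zip(vec, BASIS))

for name, vec in STATES.items():
    print(f"{name:10s} P = {permutation_eigenvalue(vec):+d}   <S_z>/hbar = {sz_over_hbar(vec):+.1f}")
```

With a symmetric orbital factor (15), Eq. (16) leaves only the state with P = −1. With an antisymmetric orbital factor (19), the three states with P = +1 remain.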




In the particular case when the particles do not interact at all, for example 


$$\hat H = \hat h_1 + \hat h_2, \qquad \hat h_k = \frac{\hat p_k^2}{2m} + \hat u(\mathbf r_k), \qquad \text{with } k = 1, 2, \tag{8.22}$$
the two-particle Schrödinger equation for the symmetrical orbital wavefunction (15) is obviously satisfied by the direct products,

$$\psi(\mathbf r_1,\mathbf r_2) = \psi_n(\mathbf r_1)\,\psi_{n'}(\mathbf r_2), \tag{8.23}$$


of single-particle eigenfunctions, with arbitrary sets n, n′ of quantum numbers. For the particular but very important case n = n′, this means that the eigenenergy of the (only acceptable) singlet state,

$$\frac{1}{\sqrt2}\left(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle\right)\psi_n(\mathbf r_1)\,\psi_n(\mathbf r_2), \tag{8.24}$$

is just $2\varepsilon_n$, where $\varepsilon_n$ is the single-particle energy level.⁹ In particular, for the ground state of the system, such a singlet spin state gives the lowest energy $E_g = 2\varepsilon_g$, while any triplet spin state (19) would require one of the particles to be in a different orbital state, i.e. in a state of higher energy, so that the total energy of the system would also be higher.


Now moving to the systems in which two indistinguishable spin-½ particles do interact, let us consider, as their simplest but important¹⁰ example, the lower energy states of a neutral atom¹¹ of helium (more exactly, ⁴He). Such an atom consists of a nucleus with two protons and two neutrons, with the total electric charge q = +2e, and two electrons “rotating” about the nucleus. Neglecting the small relativistic effects that were discussed in Sec. 6.3, the Hamiltonian describing the electron motion may be expressed as


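$$\hat H = \hat h_1 + \hat h_2 + \hat U_{\text{int}}, \qquad \hat h_k = \frac{\hat p_k^2}{2m} - \frac{2e^2}{4\pi\varepsilon_0 r_k}, \qquad \hat U_{\text{int}} = \frac{e^2}{4\pi\varepsilon_0|\mathbf r_1 - \mathbf r_2|}. \tag{8.25}$$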


As with most problems of multiparticle quantum mechanics, the eigenvalue/eigenstate problem for this Hamiltonian does not have an exact analytical solution, so let us carry out its approximate analysis, considering the electron-electron interaction $U_{\text{int}}$ as a perturbation. As was discussed in Chapter 6, we have to start with the “0th-order” approximation, in which the perturbation is ignored, so that the Hamiltonian is reduced to the sum (22). In this approximation, the ground state of the atom is the singlet (24), with the orbital factor


$$\psi_g(\mathbf r_1,\mathbf r_2) = \psi_{100}(\mathbf r_1)\,\psi_{100}(\mathbf r_2), \tag{8.26}$$


and energy $2\varepsilon_g$. Here each factor $\psi_{100}(\mathbf r)$ is the single-particle wavefunction of the ground (1s) state of the hydrogen-like atom with Z = 2, with quantum numbers n = 1, l = 0, and m = 0 (hence the wavefunctions’ indices). According to Eqs. (3.174) and (3.208),

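$$\psi_{100}(\mathbf r) = Y_0^0(\theta,\varphi)\,\mathcal R_{1,0}(r) = \frac{1}{(\pi r_0^3)^{1/2}}\exp\left\{-\frac{r}{r_0}\right\}, \qquad \text{with } r_0 \equiv \frac{r_B}{Z} = \frac{r_B}{2}, \tag{8.27}$$

and according to Eqs. (3.191) and (3.201), in this approximation the total ground-state energy is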


⁹ In this chapter, I try to use lower-case letters for all single-particle observables (in particular, ε for their energies), in order to distinguish them as clearly as possible from the system’s observables (including the total energy E of the system), which are typeset in capital letters.

¹⁰ Indeed, helium makes up more than 20% of all “ordinary” matter of our Universe.

¹¹ Note that the positive ion He⁺ of this atom, with just one electron, is fully described by the hydrogen-like atom theory with Z = 2, whose ground-state energy, according to Eq. (3.191), is $-Z^2E_H/2 = -2E_H \approx -54.4$ eV.
$$E_g^{(0)} = 2\varepsilon_g = 2\left(-\frac{Z^2E_H}{2n^2}\right)_{n=1,\,Z=2} = -4E_H \approx -109\ \text{eV}. \tag{8.28}$$


This is still somewhat far (though not terribly far!) from the experimental value $E_g \approx -78.8$ eV; see the bottom level in Fig. 1a.


[Figure 8.1 image: (a) measured energies ΔE (eV) of the lower singlet (“parahelium”) and triplet (“orthohelium”) levels, labeled 1s (ground state), 2s, 2p, 3s, 3p, 3d; (b) schematic structure of an excited level $E_{nlm}$.]

Fig. 8.1. The lower energy levels of a helium atom: (a) experimental data and (b) a schematic structure of an excited state in the first order of the perturbation theory. On panel (a), all energies are referred to that ($-2E_H \approx -54.4$ eV) of the ground state of the positive ion He⁺, so that their magnitudes are the (readily measurable) energies of the atom’s single ionization starting from the corresponding state of the neutral atom. Note that the “spin direction” nomenclature on panel (b) is rather crude: it does not reflect the difference between the entangled states $s_+$ and $s_-$.


Making a minor (but very useful) detour from our main topic, let us note that we can get much better agreement with experiment by calculating the electron interaction energy in the 1st order of the perturbation theory. Indeed, in application to our system, Eq. (6.14) reads


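$$E_g^{(1)} = \langle g|\hat U_{\text{int}}|g\rangle = \int d^3r_1\int d^3r_2\,\psi_g^*(\mathbf r_1,\mathbf r_2)\,U_{\text{int}}(\mathbf r_1,\mathbf r_2)\,\psi_g(\mathbf r_1,\mathbf r_2). \tag{8.29}$$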
Plugging in Eqs. (25)-(27), we get 
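$$E_g^{(1)} = \left(\frac{1}{\pi r_0^3}\right)^2\int d^3r_1\int d^3r_2\,\exp\left\{-\frac{2(r_1 + r_2)}{r_0}\right\}\frac{e^2}{4\pi\varepsilon_0|\mathbf r_1 - \mathbf r_2|}. \tag{8.30}$$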


As may be readily evaluated analytically (this exercise is left for the reader), this expression equals $(5/4)E_H$, so that the corrected ground-state energy,

$$E_g \approx E_g^{(0)} + E_g^{(1)} = \left(-4 + \frac{5}{4}\right)E_H \approx -74.8\ \text{eV}, \tag{8.31}$$

is much closer to experiment.
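The value $(5/4)E_H$ quoted above can be checked numerically. The sketch below is an illustration, not the author's derivation. It assumes the hydrogen-like orbital (27) and uses the standard fact that, for spherically symmetric charge clouds, the angular average of 1/|r₁ − r₂| equals 1/max(r₁, r₂). This reduces the 6D integral (30) to a 1D one, which is evaluated with a hand-written Simpson rule. The numerical value $E_H \approx 27.211$ eV is an input assumption.

```python
import math

E_H_EV = 27.211  # Hartree energy E_H in eV (numerical value assumed here)

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def mean_inverse_r12():
    """<1/|r1 - r2|> for two electrons in psi_100 of Eq. (8.27), in units of 1/r_0.

    With the radial probability density p(r) = 4 r^2 exp(-2r) (r in units of r_0)
    and its cumulative F(r) = 1 - exp(-2r)(1 + 2r + 2r^2), averaging 1/max(r1, r2)
    gives 2 * int_0^inf p(r) F(r) / r dr.
    """
    def integrand(r):
        cumulative = 1.0 - math.exp(-2 * r) * (1 + 2 * r + 2 * r * r)
        return 2 * 4 * r * math.exp(-2 * r) * cumulative
    return simpson(integrand, 0.0, 40.0, 4000)

Z = 2
avg = mean_inverse_r12()     # analytically 5/8
E1 = Z * avg                 # E_g^(1) in units of E_H, because r_0 = r_B / Z
E0 = -Z ** 2                 # Eq. (8.28): E_g^(0) = -4 E_H for Z = 2
print(f"<1/r12> = {avg:.6f}/r_0,  E_g^(1) = {E1:.4f} E_H")
print(f"E_g^(0) + E_g^(1) = {(E0 + E1) * E_H_EV:.1f} eV")
```

Working in units of $r_0$ makes the Z dependence explicit: ⟨1/r₁₂⟩ = 5/(8r₀), so that $E_g^{(1)} = (5Z/8)E_H$, which is $(5/4)E_H$ for Z = 2.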


There is still room here for a ready improvement, using the variational method discussed in Sec. 2.9. For our particular case of the ⁴He atom, we may try to use, as the trial state, the orbital wavefunction given by Eqs. (26)-(27), but with the atomic number Z considered as an adjustable parameter $Z_{\text{ef}} < Z = 2$ rather than a fixed number. The physics behind this approach is that the electric charge density $\rho(\mathbf r) = -e|\psi(\mathbf r)|^2$ of each electron forms a negatively charged “cloud” that reduces the effective charge of the nucleus, as seen by the other electron, to $Z_{\text{ef}}e$, with some $Z_{\text{ef}} < 2$. As a result, the single-particle wavefunction spreads further in space (with the scale $r_0 = r_B/Z_{\text{ef}} > r_B/Z$), while keeping its functional form (27) nearly intact. Since the kinetic energy T in the system’s Hamiltonian (25) is proportional to $r_0^{-2} \propto Z_{\text{ef}}^2$, while the potential energy is proportional to $r_0^{-1} \propto Z_{\text{ef}}$, we can write


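$$E_g(Z_{\text{ef}}) = \langle T\rangle_{Z=2}\left(\frac{Z_{\text{ef}}}{2}\right)^2 + \langle U\rangle_{Z=2}\,\frac{Z_{\text{ef}}}{2}. \tag{8.32}$$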


Now we can use the fact that, according to Eq. (3.212), for any stationary state of a hydrogen-like atom (just as for the classical circular motion in the Coulomb potential), $\langle U\rangle = 2E$, and hence $\langle T\rangle = E - \langle U\rangle = -E$. Using Eq. (28) for $E = E_g^{(0)} = -4E_H$, and adding the first-order correction $(5/4)E_H$, given by Eq. (30), to the potential energy, we get


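$$E_g(Z_{\text{ef}}) = \left[4\left(\frac{Z_{\text{ef}}}{2}\right)^2 + \left(-8 + \frac{5}{4}\right)\frac{Z_{\text{ef}}}{2}\right]E_H. \tag{8.33}$$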


This expression allows an elementary calculation of the optimal value of $Z_{\text{ef}}$ and the corresponding minimum of the function $E_g(Z_{\text{ef}})$:


$$(Z_{\text{ef}})_{\text{opt}} = 2 - \frac{5}{16} = 1.6875, \qquad (E_g)_{\min} \approx -2.85E_H \approx -77.5\ \text{eV}. \tag{8.34}$$


Given the trial state’s crudeness, this number is in surprisingly good agreement with the experimental 
value cited above, with a difference of the order of 1%. 
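Since Eq. (33) is just a quadratic function of $Z_{\text{ef}}$, its minimum can be found exactly. The small sketch below is an illustration using Python's standard `fractions` module, with $E_H \approx 27.211$ eV assumed for the conversion to electron-volts. It reproduces the numbers in Eq. (34) and also recovers Eq. (31) at $Z_{\text{ef}} = 2$.

```python
from fractions import Fraction

E_H_EV = 27.211  # Hartree energy E_H in eV (numerical value assumed here)

# Eq. (8.33) in units of E_H: E_g(Z) = 4 (Z/2)^2 + (-8 + 5/4)(Z/2) = A Z^2 + B Z
A = 4 * Fraction(1, 2) ** 2               # = 1
B = (Fraction(-8) + Fraction(5, 4)) / 2   # = -27/8

def trial_energy(z_ef):
    """Variational ground-state energy (8.33) of the He atom, in units of E_H."""
    return A * z_ef ** 2 + B * z_ef

z_opt = -B / (2 * A)          # vertex of the parabola: 27/16 = 2 - 5/16
e_min = trial_energy(z_opt)

print(z_opt, float(z_opt))               # 27/16 1.6875
print(e_min, float(e_min))               # -729/256 -2.84765625
print(round(float(e_min) * E_H_EV, 1))   # -77.5 (eV)
print(trial_energy(2))                   # -11/4, i.e. Eq. (8.31) at Z_ef = 2
```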


Now let us return to the main topic of this section: the effects of particle (in this case, electron) indistinguishability. As we have just seen, the ground-level energy of the helium atom is not affected directly by this fact, but the situation is different for its excited states, even the lowest ones. The reasonably good precision of the perturbation theory, which we have seen for the ground state, tells us that we can base our analysis of the lowest excited-state orbital wavefunctions on products like $\psi_{100}(\mathbf r_k)\,\psi_{nlm}(\mathbf r_{k'})$, with n > 1. To satisfy the fermion permutation rule, $\mathcal P = -1$, we have to take the orbital factor of the state in either the symmetric or the antisymmetric form:


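$$\psi_\pm(\mathbf r_1,\mathbf r_2) = \frac{1}{\sqrt2}\left[\psi_{100}(\mathbf r_1)\,\psi_{nlm}(\mathbf r_2) \pm \psi_{nlm}(\mathbf r_1)\,\psi_{100}(\mathbf r_2)\right], \tag{8.35}$$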


with the proper total permutation asymmetry provided by the corresponding spin factor (18) or (21), so that the upper/lower sign in Eq. (35) corresponds to the singlet/triplet spin state. Let us calculate the expectation values of the total energy of the system in the first order of the perturbation theory. Plugging Eq. (35) into the 0th-order expression


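$$\langle E_\pm\rangle^{(0)} = \int d^3r_1\int d^3r_2\,\psi_\pm^*(\mathbf r_1,\mathbf r_2)\left(\hat h_1 + \hat h_2\right)\psi_\pm(\mathbf r_1,\mathbf r_2), \tag{8.36}$$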


we get two groups of similar terms that differ only by the particle index. We can merge the terms of each pair by changing the notation as ($\mathbf r_1 \to \mathbf r$, $\mathbf r_2 \to \mathbf r'$) in one of them, and ($\mathbf r_1 \to \mathbf r'$, $\mathbf r_2 \to \mathbf r$) in the counterpart term. Using Eq. (25) and the mutual orthogonality of the wavefunctions $\psi_{100}(\mathbf r)$ and $\psi_{nlm}(\mathbf r)$, we get the following result:
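$$\langle E_\pm\rangle^{(0)} = \int \psi_{100}^*(\mathbf r)\left(-\frac{\hbar^2}{2m}\nabla^2 - \frac{2e^2}{4\pi\varepsilon_0 r}\right)\psi_{100}(\mathbf r)\,d^3r + \int \psi_{nlm}^*(\mathbf r')\left(-\frac{\hbar^2}{2m}\nabla'^2 - \frac{2e^2}{4\pi\varepsilon_0 r'}\right)\psi_{nlm}(\mathbf r')\,d^3r' = \varepsilon_{100} + \varepsilon_{nlm}, \tag{8.37}$$

with n > 1.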


It may be interpreted as the sum of eigenenergies of two separate single particles, one in the ground state 100 and another in the excited state nlm, although actually the electron states are entangled. Thus, in the 0th order of the perturbation theory, the electron entanglement does not affect their energy.


However, the potential energy of the system also includes the interaction term $U_{\text{int}}$, which does not allow such separation. Indeed, in the 1st approximation of the perturbation theory, the total energy $E_\pm$ of the system may be expressed as $\varepsilon_{100} + \varepsilon_{nlm} + E_{\text{int}}^{(1)}$, with


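$$E_{\text{int}}^{(1)} = \langle \hat U_{\text{int}}\rangle = \int d^3r_1\int d^3r_2\,\psi_\pm^*(\mathbf r_1,\mathbf r_2)\,U_{\text{int}}(\mathbf r_1,\mathbf r_2)\,\psi_\pm(\mathbf r_1,\mathbf r_2). \tag{8.38}$$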


Plugging Eq. (35) into this result, using the symmetry of the function $U_{\text{int}}$ with respect to the particle number permutation, and the same particle coordinate re-numbering as above, we get


$$E_{\text{int}}^{(1)} = E_{\text{dir}} \pm E_{\text{ex}}, \tag{8.39}$$


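with the following, deceptively similar, expressions for the two components of this sum/difference:

$$E_{\text{dir}} = \int d^3r\int d^3r'\,\psi_{100}^*(\mathbf r)\,\psi_{nlm}^*(\mathbf r')\,U_{\text{int}}(\mathbf r,\mathbf r')\,\psi_{100}(\mathbf r)\,\psi_{nlm}(\mathbf r'), \tag{8.40}$$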


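$$E_{\text{ex}} = \int d^3r\int d^3r'\,\psi_{100}^*(\mathbf r)\,\psi_{nlm}^*(\mathbf r')\,U_{\text{int}}(\mathbf r,\mathbf r')\,\psi_{nlm}(\mathbf r)\,\psi_{100}(\mathbf r'). \tag{8.41}$$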


Since the single-particle orbitals can always be made real, both components are positive, or at least non-negative. However, their physics and magnitude are different. The integral (40), called the direct interaction energy, allows a simple semi-classical interpretation as the Coulomb energy of interacting electrons, each distributed in space with the electric charge density $\rho(\mathbf r) = -e\psi^*(\mathbf r)\psi(\mathbf r)$:¹²


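$$E_{\text{dir}} = \int d^3r\int d^3r'\,\frac{\rho_{100}(\mathbf r)\,\rho_{nlm}(\mathbf r')}{4\pi\varepsilon_0|\mathbf r - \mathbf r'|} = \int \rho_{100}(\mathbf r)\,\phi_{nlm}(\mathbf r)\,d^3r = \int \phi_{100}(\mathbf r)\,\rho_{nlm}(\mathbf r)\,d^3r, \tag{8.42}$$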
where $\phi(\mathbf r)$ are the electrostatic potentials created by the electron “charge clouds”:¹³
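$$\phi_{100}(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\int d^3r'\,\frac{\rho_{100}(\mathbf r')}{|\mathbf r - \mathbf r'|}, \qquad \phi_{nlm}(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\int d^3r'\,\frac{\rho_{nlm}(\mathbf r')}{|\mathbf r - \mathbf r'|}. \tag{8.43}$$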


However, the integral (41), called the exchange interaction energy, evades a classical interpretation and (as is clear from its derivation) is a direct corollary of the electrons’ indistinguishability. The magnitude of $E_{\text{ex}}$ is also very different from that of $E_{\text{dir}}$, because the function under the integral (41) vanishes in the regions where the single-particle wavefunctions $\psi_{100}(\mathbf r)$ and $\psi_{nlm}(\mathbf r)$ do not overlap. This is in full agreement with the discussion in Sec. 1: if two particles are identical but well separated, i.e. their wavefunctions do not overlap, the exchange interaction disappears,


12 See, e.g., EM Sec. 1.3, in particular Eq. (1.54). 

13 Note that the result for $E_{\text{dir}}$ correctly reflects the basic fact that a charged particle does not interact with itself, even if its wavefunction is quantum-mechanically spread over a finite space volume. Unfortunately, this is not true for some popular approximate theories of multiparticle systems; see Sec. 4 below.
i.e. measurable effects of particle indistinguishability vanish. (In contrast, the integral (40) decreases
with the growing electron separation only slowly, due to the long-range Coulomb interaction.) 
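To see how different the two integrals can be, here is a numerical sketch for the lowest s-type excitation, n = 2, l = m = 0. It uses unscreened hydrogen-like orbitals with Z = 2, as in Eq. (35). For two s-states the angular average of 1/|r − r′| is 1/max(r, r′), so Eqs. (40)-(41) reduce to 2D radial integrals. Lengths are in units of $r_B$ and energies in units of $E_H$; the function names and grid parameters are arbitrary choices of this sketch. For hydrogen-like orbitals these particular integrals are known in closed form, $E_{\text{dir}} = (17Z/81)E_H$ and $E_{\text{ex}} = (16Z/729)E_H$, and the printout can be compared with them.

```python
import math

E_H_EV = 27.211  # Hartree energy in eV (numerical value assumed here)

def p_1s(r, z):
    """r * R_10(r) for a hydrogen-like atom; r in units of r_B."""
    return 2.0 * z ** 1.5 * r * math.exp(-z * r)

def p_2s(r, z):
    """r * R_20(r) for a hydrogen-like atom; r in units of r_B."""
    return z ** 1.5 / math.sqrt(2.0) * r * (1.0 - z * r / 2.0) * math.exp(-z * r / 2.0)

def s_wave_coulomb(a, b, r_max=40.0, n=8000):
    """int int a(r) b(r') / max(r, r') dr dr'  (in units of E_H for r in units of r_B).

    a and b are products of radial functions r*R(r); cumulative trapezoid sums
    keep the cost proportional to n.
    """
    h = r_max / n
    rs = [i * h for i in range(n + 1)]
    av = [a(r) for r in rs]
    bv = [b(r) for r in rs]
    below = [0.0] * (n + 1)       # below[i] = int_0^{r_i} b(s) ds
    for i in range(1, n + 1):
        below[i] = below[i - 1] + 0.5 * h * (bv[i - 1] + bv[i])
    b_over_r = [0.0] + [bv[i] / rs[i] for i in range(1, n + 1)]
    above = [0.0] * (n + 1)       # above[i] = int_{r_i}^{r_max} b(s)/s ds
    for i in range(n - 1, -1, -1):
        above[i] = above[i + 1] + 0.5 * h * (b_over_r[i] + b_over_r[i + 1])
    f = [0.0] + [av[i] * (below[i] / rs[i] + above[i]) for i in range(1, n + 1)]
    return h * (sum(f) - 0.5 * (f[0] + f[-1]))

Z = 2  # unscreened orbitals, as in the first-order treatment above
E_dir = s_wave_coulomb(lambda r: p_1s(r, Z) ** 2, lambda r: p_2s(r, Z) ** 2)
E_ex = s_wave_coulomb(lambda r: p_1s(r, Z) * p_2s(r, Z), lambda r: p_1s(r, Z) * p_2s(r, Z))
print(f"E_dir = {E_dir:.4f} E_H = {E_dir * E_H_EV:.2f} eV   (closed form: 34/81 E_H)")
print(f"E_ex  = {E_ex:.4f} E_H = {E_ex * E_H_EV:.2f} eV   (closed form: 32/729 E_H)")
```

The exchange energy comes out roughly ten times smaller than the direct one. In this first-order picture, the splitting between the singlet and triplet 1s2s levels equals $2E_{\text{ex}}$.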


Figure 1b shows the structure of an excited energy level, with certain quantum numbers n > 1, l, and m, given by Eqs. (39)-(41). The upper, so-called parahelium¹⁴ level, with the energy

$$E_{\text{para}} \equiv \langle E_+\rangle = \varepsilon_{100} + \varepsilon_{nlm} + E_{\text{dir}} + E_{\text{ex}}, \tag{8.44}$$

corresponds to the symmetric orbital state and hence to the singlet spin state (18), while the lower, orthohelium level, with

$$E_{\text{ortho}} \equiv \langle E_-\rangle = \varepsilon_{100} + \varepsilon_{nlm} + E_{\text{dir}} - E_{\text{ex}} < E_{\text{para}}, \tag{8.45}$$

corresponds to the degenerate triplet spin state (21).


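This degeneracy may be lifted by an external magnetic field, whose effect on the electron spins¹⁵ is described by the following evident generalization of the Pauli Hamiltonian (4.163):

$$\hat H_{\text{field}} = -\gamma\,\hat{\mathbf s}_1\cdot\boldsymbol{\mathscr B} - \gamma\,\hat{\mathbf s}_2\cdot\boldsymbol{\mathscr B} \equiv -\gamma\,\hat{\mathbf S}\cdot\boldsymbol{\mathscr B}, \qquad \text{with } \gamma = \gamma_e \approx -\frac{e}{m}, \tag{8.46}$$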


where

$$\hat{\mathbf S} \equiv \hat{\mathbf s}_1 + \hat{\mathbf s}_2 \tag{8.47}$$

is the operator of the (vector) sum of the system of two spins.¹⁶ To analyze this effect, we first need to make one more detour, to address the general issue of spin addition. The main rule¹⁷ here is that, in full analogy with the net spin of a single particle, defined by Eq. (5.170), the net spin operator (47) of any system of two spins, and its component $\hat S_z$ along the (arbitrarily selected) z-axis, obey the same commutation relations (5.168) as the component operators, and hence have properties similar to those expressed by Eqs. (5.169) and (5.175):


$$\hat S^2|S, M_S\rangle = \hbar^2S(S+1)|S, M_S\rangle, \qquad \hat S_z|S, M_S\rangle = \hbar M_S|S, M_S\rangle, \qquad \text{with } -S \le M_S \le +S, \tag{8.48}$$


where the ket-vectors correspond to the coupled basis of joint eigenstates of the operators $\hat S^2$ and $\hat S_z$ (but not necessarily of all component operators; see again the Venn diagram shown in Fig. 5.12 and its discussion, with the replacements S, L → $s_{1,2}$ and J → S). Repeating the discussion of Sec. 5.7 with these replacements, we see that in both the coupled and uncoupled bases, the net magnetic number $M_S$ is simply expressed via those of the components:


14 This terminology reflects the historic fact that the observation of two different hydrogen-like spectra, corresponding to the opposite signs in Eq. (39), was first taken as evidence for two different species of ⁴He, which were called, respectively, the “orthohelium” and the “parahelium”.

15 As we know from Sec. 6.4, the field also affects the orbital motion of the electrons, so that the simple analysis based on Eq. (46) is strictly valid only for the s excited state (l = 0, and hence m = 0). However, the orbital effects of a weak magnetic field do not affect the triplet level splitting we are analyzing now.

16 Note that, similarly to Eqs. (22) and (25), here the uppercase notation of the component spins is replaced with their lowercase notation, to avoid any possibility of their confusion with the total spin of the system.

17 Since we already know that the spin of a particle is physically nothing more than a (specific) part of its angular momentum, the similarity of the properties (48) of the sum (47) of spins of different particles to those of the sum (5.170) of different angular momentum components of the same particle is very natural, but still has to be considered as a new fact, confirmed by a vast body of experimental data.
$$M_S = (m_s)_1 + (m_s)_2. \tag{8.49}$$


However, the net spin quantum number S (in contrast to the Nature-given spins $s_{1,2}$ of its elementary components) is not universally definite, and we may immediately say only that it has to obey the following analog of the relation $|l - s| \le j \le (l + s)$ discussed in Sec. 5.7:

$$|s_1 - s_2| \le S \le s_1 + s_2. \tag{8.50}$$

What exactly S is (within these limits) depends on the spin state of the system.


For the simplest case of two spin-½ components, each with s = ½ and $m_s = \pm\frac12$, Eq. (49) gives three possible values of $M_S$, equal to 0 and ±1, while Eq. (50) limits the possible values of S to just either 0 or 1. Using the last of Eqs. (48), we see that the possible combinations of the quantum numbers are


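$$\begin{matrix} S = 0, \\ M_S = 0, \end{matrix} \qquad \text{and} \qquad \begin{matrix} S = 1, \\ M_S = 0, \pm1. \end{matrix} \tag{8.51}$$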


It is virtually evident that the singlet spin state $s_-$ belongs to the first class, while the simple (separable) triplet states ↑↑ and ↓↓ belong to the second class, with $M_S = +1$ and $M_S = -1$, respectively. However, for the entangled triplet state $s_+$, evidently with $M_S = 0$, the value of S is less obvious. Perhaps the easiest way to recover it¹⁸ is to use the “rectangular diagram”, similar to that shown in Fig. 5.14, but redrawn for our case of two spins, i.e., with the replacements $m_l \to (m_s)_1 = \pm\frac12$, $m_s \to (m_s)_2 = \pm\frac12$; see Fig. 2.


Fig. 8.2. The “rectangular diagram” showing the relation between the uncoupled-representation states (dots) and the coupled-representation states (straight lines) of a system of two spins-½; cf. Fig. 5.14.


Just as at the addition of various angular momenta of a single particle, the top-right and bottom-left corners of this diagram correspond to the factorable triplet states ↑↑ and ↓↓, which participate in both the uncoupled-representation and coupled-representation bases and have the largest value of S, i.e. 1. However, the entangled states $s_\pm$, which are linear combinations of the uncoupled-representation states ↑↓ and ↓↑, cannot have the same value of S, so that for the triplet state $s_+$, S has to take a value different from that (0) of the singlet state, i.e. 1. With that, the first of Eqs. (48) gives the following expectation values for the square of the net spin operator:


$$\langle S^2\rangle = \begin{cases} 2\hbar^2, & \text{for each triplet state},\\ 0, & \text{for the singlet state}. \end{cases} \tag{8.52}$$


18 Another, a bit longer but perhaps more prudent way is to calculate directly the expectation values of $\hat S^2$ for the states $s_\pm$, and then find S by comparing the results with the first of Eqs. (48); this is highly recommended to the reader as a useful exercise.
Note that for the entangled triplet state $s_+$, whose ket-vector (20) is a linear superposition of two kets of states with opposite spins, this result is highly counter-intuitive, and shows how careful we should be when interpreting entangled quantum states. (As will be discussed in Chapter 10, entanglement brings even more surprises for quantum measurements.)
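Following the suggestion of footnote 18, the expectation values (52) may also be calculated directly. The sketch below builds $\hat{\mathbf S} = \hat{\mathbf s}_1 + \hat{\mathbf s}_2$ from the Pauli matrices via direct (Kronecker) products in plain Python, so it needs no third-party libraries. It sets ħ = 1 and assumes the basis ordering |↑↑⟩, |↑↓⟩, |↓↑⟩, |↓↓⟩. It also prints $M_S = \langle\hat S_z\rangle/\hbar$, which sets the field-induced shifts (54).

```python
import math

def kron(a, b):
    """Direct (Kronecker) product of two square matrices given as nested lists."""
    m = len(b)
    size = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)] for i in range(size)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def madd(*ms):
    return [[sum(m[i][j] for m in ms) for j in range(len(ms[0][0]))] for i in range(len(ms[0]))]

# Single-spin operators s_i = sigma_i / 2 in units of hbar; basis (up, down)
SX = [[0, 0.5], [0.5, 0]]
SY = [[0, -0.5j], [0.5j, 0]]
SZ = [[0.5, 0], [0, -0.5]]
I2 = [[1, 0], [0, 1]]

# Net spin S = s_1 + s_2, Eq. (8.47), in the basis |uu>, |ud>, |du>, |dd>
S_COMPONENTS = [madd(kron(s, I2), kron(I2, s)) for s in (SX, SY, SZ)]
S_SQUARED = madd(*(matmul(c, c) for c in S_COMPONENTS))

def expectation(op, vec):
    """<vec|op|vec> for a normalized state vector (list of 4 amplitudes)."""
    op_vec = [sum(op[i][j] * vec[j] for j in range(4)) for i in range(4)]
    return sum(vec[i].conjugate() * op_vec[i] for i in range(4)).real

R = 1 / math.sqrt(2)
STATES = {
    "up-up":     [1.0, 0.0, 0.0, 0.0],
    "down-down": [0.0, 0.0, 0.0, 1.0],
    "s_+":       [0.0, R, R, 0.0],
    "s_-":       [0.0, R, -R, 0.0],
}
for name, vec in STATES.items():
    s2 = expectation(S_SQUARED, vec)         # in units of hbar^2; S(S+1) by Eq. (8.48)
    ms = expectation(S_COMPONENTS[2], vec)   # M_S, which sets Delta E in Eq. (8.54)
    print(f"{name:10s} <S^2>/hbar^2 = {s2:.3f}   M_S = {ms:+.3f}")
```

The value 2 = S(S + 1) for $s_+$ confirms that S = 1 for this entangled state, just as for ↑↑ and ↓↓.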


Now we may return to the particular issue of the magnetic field effect on the triplet state of the ⁴He atom. Directing the z-axis along the field, we may reduce Eq. (46) to


$$\hat H_{\text{field}} = -\gamma\,\mathscr B\,\hat S_z = 2\mu_B\mathscr B\,\frac{\hat S_z}{\hbar}. \tag{8.53}$$


Since all three triplet states (21) are eigenstates, in particular, of the operator $\hat S_z$, and hence of the Hamiltonian (53), we may use the second of Eqs. (48) to calculate their energy change simply as


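$$\Delta E_{\text{field}} = 2\mu_B\mathscr B M_S = 2\mu_B\mathscr B \times \begin{cases} +1, & \text{for the factorable triplet state } \uparrow\uparrow,\\ 0, & \text{for the entangled triplet state } s_+,\\ -1, & \text{for the factorable triplet state } \downarrow\downarrow. \end{cases} \tag{8.54}$$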


This splitting of the “orthohelium” level is schematically shown in Fig. 1b.¹⁹


8.3. Multiparticle systems 


Leaving several other problems on two-particle systems for the reader’s exercise, let me proceed 
to the discussion of systems with N > 2 indistinguishable particles, whose list notably includes atoms, 
molecules, and condensed-matter systems. In this case, Eq. (7) for fermions is generalized as 


$$\hat{\mathcal P}_{kk'}|\alpha_-\rangle = -|\alpha_-\rangle, \qquad \text{for all } k, k' = 1, 2, \ldots, N, \tag{8.55}$$


where the operator $\hat{\mathcal P}_{kk'}$ permutes particles with numbers k and k′. As a result, for systems of non-directly-interacting fermions, the Pauli principle forbids any state in which any two particles have identical single-particle wavefunctions (with spin included). Nevertheless, it permits two fermions to have identical orbital wavefunctions, provided that their spins are in the singlet state (18), because this satisfies the permutation requirement (55). This fact is of paramount importance for the ground state of systems whose Hamiltonians do not depend on spin, because it allows the fermions to be in their orbital single-particle ground states, with the two electrons of the spin singlet sharing the same orbital state. Hence, for the limited (but very important!) goal of finding ground-state energies of multi-fermion systems with negligible direct interaction, we may ignore the actual singlet spin structure, and reduce the Pauli

19 It is interesting that another very important two-electron system, the hydrogen (H₂) molecule, which was briefly discussed in Sec. 2.6, also has two similarly named forms, parahydrogen and orthohydrogen. However, their difference is due to two possible (respectively, singlet and triplet) states of the system of two spins of the two hydrogen nuclei (protons), which are also spin-½ particles. The resulting energy of parahydrogen is lower than that of orthohydrogen by only ~15 meV per molecule, a difference comparable with $k_BT$ at room temperature (~26 meV). As a result, at ambient conditions, the equilibrium ratio of these two spin isomers is close to 3:1. Curiously, the theoretical prediction of this minor effect by W. Heisenberg (together with F. Hund) in 1927 was cited in his 1932 Nobel Prize award as the most noteworthy application of quantum theory.
exclusion principle to the simple picture of single-particle orbital energy levels, each “occupied with two fermions”.


As a very simple example, let us find the ground-state energy of five spin-½ fermions confined in a hard-wall, cubic-shaped 3D volume of side a, ignoring their direct interaction. From Sec. 1.7, we know the single-particle energy spectrum of the system:

$$\varepsilon_{n_x,n_y,n_z} = \varepsilon_0\left(n_x^2 + n_y^2 + n_z^2\right), \qquad \text{with } \varepsilon_0 \equiv \frac{\pi^2\hbar^2}{2ma^2}, \quad \text{and } n_x, n_y, n_z = 1, 2, \ldots, \tag{8.56}$$

so that the lowest-energy states are:
- one ground state with $\{n_x, n_y, n_z\} = \{1,1,1\}$, and energy $\varepsilon_{111} = (1^2+1^2+1^2)\varepsilon_0 = 3\varepsilon_0$, and
- three excited states, with $\{n_x, n_y, n_z\}$ equal to either {2,1,1}, or {1,2,1}, or {1,1,2}, with equal energies $\varepsilon_{211} = \varepsilon_{121} = \varepsilon_{112} = (2^2+1^2+1^2)\varepsilon_0 = 6\varepsilon_0$.


According to the above simple formulation of the Pauli principle, each of these orbital states can accommodate up to two fermions. Hence the lowest-energy (ground) state of the five-fermion system is achieved by placing two of them on the ground level $\varepsilon_{111} = 3\varepsilon_0$, and the remaining three particles in any of the degenerate “excited” states of energy $6\varepsilon_0$, so that the ground-state energy of the system is

$$E_g = 2\times3\varepsilon_0 + 3\times6\varepsilon_0 = 24\varepsilon_0 \equiv \frac{12\pi^2\hbar^2}{ma^2}. \tag{8.57}$$
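The filling procedure used above is easy to automate. In the sketch below, the function name and the cutoff `n_max` are hypothetical choices made for illustration. The sketch sorts the orbital energies (56) in units of ε₀ and fills each orbital with at most two fermions. A guard checks that the cutoff did not omit any lower-lying orbital.

```python
from itertools import product

def ground_energy(n_fermions, n_max=8, per_orbital=2):
    """Ground-state energy, in units of eps_0, of non-interacting spin-1/2 fermions
    in a hard-wall cube, with orbital energies eps_0 (nx^2 + ny^2 + nz^2), Eq. (8.56).

    Each orbital state holds at most `per_orbital` = 2 fermions (a spin singlet);
    orbitals are filled in order of increasing energy.
    """
    energies = sorted(nx * nx + ny * ny + nz * nz
                      for nx, ny, nz in product(range(1, n_max + 1), repeat=3))
    total, remaining = 0, n_fermions
    for e in energies:
        if remaining == 0:
            break
        # any orbital with some quantum number above n_max has energy >= (n_max+1)^2 + 2
        if e >= (n_max + 1) ** 2 + 2:
            raise ValueError("n_max too small for this number of particles")
        take = min(per_orbital, remaining)
        total += take * e
        remaining -= take
    if remaining:
        raise ValueError("n_max too small for this number of particles")
    return total

print(ground_energy(5))  # 24, i.e. E_g = 24 eps_0, as in Eq. (8.57)
```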


Moreover, in many cases, relatively weak interaction between fermions does not blow up such a 
simple quantum state classification scheme qualitatively, and the Pauli principle allows tracing the order 
of single-particle state filling. This is exactly the simple approach that has been used in our discussion of 
atoms in Sec. 3.7. Unfortunately, it does not allow for a more specific characterization of the ground 
states of most atoms, in particular the evaluation of the corresponding values of the quantum numbers S, 
L, and J that characterize the net angular momenta of the atom, and hence its response to an external 
magnetic field. These numbers are defined by relations similar to Eqs. (48), each for the corresponding 
vector operator of the net angular momenta: 


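$$\hat{\mathbf S} = \sum_{k=1}^{N}\hat{\mathbf s}_k, \qquad \hat{\mathbf L} = \sum_{k=1}^{N}\hat{\mathbf l}_k, \qquad \hat{\mathbf J} = \sum_{k=1}^{N}\hat{\mathbf j}_k; \tag{8.58}$$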


note that these definitions are consistent with Eq. (5.170) applied both to the angular momenta $\mathbf s_k$, $\mathbf l_k$, and $\mathbf j_k$ of each particle, and to the full vectors S, L, and J. When the numbers S, L, and J for a state are known, they are traditionally recorded in the form of the so-called Russell-Saunders symbols:²⁰


$${}^{2S+1}\mathcal L_J, \tag{8.59}$$


where S and J are the corresponding values of these quantum numbers, while $\mathcal L$ is a capital letter encoding the quantum number L via the same spectroscopic notation as for single particles (see Sec. 3.6): $\mathcal L$ = S for L = 0, $\mathcal L$ = P for L = 1, $\mathcal L$ = D for L = 2, etc. (The reason why the front superscript of the Russell-Saunders symbol lists 2S + 1 rather than just S is that, according to the last of Eqs. (48), it


20 Named after H. Russell and F. Saunders, whose pioneering (circa 1925) processing of experimental spectral- 
line data has established the very idea of vector addition of the electron spins, described by the first of Eqs. (58). 
shows the number of possible values of the quantum number $M_S$, which characterizes the state’s spin degeneracy, and is called its multiplicity.)


For example, for the simplest, hydrogen atom (Z = 1), with its single electron in the ground 1s state, L = l = 0, S = s = ½, and J = S = ½, so that its Russell-Saunders symbol is ${}^2S_{1/2}$. Next, the discussion of the helium atom (Z = 2) in the previous section has shown that in its ground state L = 0 (because of the 1s orbital state of both electrons) and S = 0 (because of the singlet spin state), so that the total angular momentum also vanishes: J = 0. As a result, the Russell-Saunders symbol is ${}^1S_0$. The structure of the next atom, lithium (Z = 3), is also easy to predict, because, as was discussed in Sec. 3.7, its ground-state electron configuration is $1s^22s^1$. It includes two electrons in the “helium shell”, i.e. on the 1s orbitals (now we know that they are actually in a singlet spin state), and one electron in the 2s state, of higher energy, also with zero orbital momentum, l = 0. As a result, the total L in this state is evidently equal to 0, and S is equal to ½, so that J = ½, meaning that the Russell-Saunders symbol of lithium is ${}^2S_{1/2}$. Even for the next atom, beryllium (Z = 4), with the ground-state configuration $1s^22s^2$, the symbol is readily predictable, because none of its electrons has non-zero orbital momentum, giving L = 0. Also, each electron pair is in the singlet spin state, i.e. we have S = 0, so that J = 0. This quantum number set is described by the Russell-Saunders symbol ${}^1S_0$, just as for helium.


However, for the next, boron atom (Z = 5), with its ground-state electron configuration $1s^22s^22p^1$ (see, e.g., Fig. 3.24), there is no obvious way to predict the result. Indeed, this atom has two pairs of electrons, with opposite spins, on its two lowest s-orbitals, giving zero contributions to the net S, L, and J. Hence these total quantum numbers may be contributed only by the last, fifth electron, with s = ½ and l = 1, giving S = ½, L = 1. As was discussed in Sec. 5.7 for the single-particle case, the vector addition of the angular momenta S and L enables two values of the quantum number J: either L + S = 3/2 or L − S = ½. Experiment shows that the difference between the energies of these two states is very small (~2 meV), so that at room temperature (with $k_BT \approx 26$ meV) they are both occupied, with the genuine ground state having J = ½, so that its Russell-Saunders symbol is ${}^2P_{1/2}$.


Such energy differences, which become larger for heavier atoms, are determined by both the Coulomb and spin-orbit²¹ interactions between the electrons. Their quantitative analysis is rather involved (see below), but the results tend to follow the simple phenomenological Hund rules, with the following hierarchy:


Rule 1. For a given electron configuration, the ground state has the largest possible S, and hence the largest multiplicity 2S + 1.


Rule 2. For a given S, the ground state has the largest possible L.


Rule 3. For given S and L, J has its smallest possible value, |L − S|, if the given sub-shell {n, l} is filled not more than by half, while in the opposite case, J has its largest possible value, L + S.


Let us see how these rules work for the boron atom we have just discussed. For it, Hund Rules 1 and 2 are satisfied automatically, while the sub-shell {n = 2, l = 1}, which can house up to 2×(2l + 1) = 6 electrons, is filled with just one 2p electron, i.e. less than half of its capacity. As a result, Hund Rule 3 predicts the ground state’s value J = ½, in agreement with experiment.


21 In light atoms, the spin-orbit interaction is so weak that it may be reasonably well described as an interaction of the total momenta L and S of the system, the so-called LS (or “Russell-Saunders”) coupling. On the other hand, in very heavy atoms, the interaction is effectively between the net momenta $\mathbf j_k = \mathbf l_k + \mathbf s_k$ of the individual electrons, the so-called jj coupling. This is the reason why Hund Rule 3 may be violated in such atoms.
Generally, for lighter atoms, the Hund rules are well obeyed. However, the lower down the Hund rule
hierarchy, the less “powerful” the rules are, i.e. the more often they are violated in heavier atoms. 
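For a single open sub-shell outside closed shells, the three Hund rules translate into a short procedure. First maximize S by putting electrons with parallel spins into different orbitals, then maximize L by occupying the largest $m_l$ values first, and finally choose J by Rule 3. The sketch below implements this recipe. It works only under the LS-coupling assumption discussed in footnote 21, so, as noted above, it may fail for heavier atoms; the function name is hypothetical.

```python
from fractions import Fraction

LETTERS = "SPDFGHIK"  # spectroscopic letters for L = 0, 1, 2, ... (the letter J is skipped)

def hund_ground_term(l, k):
    """Russell-Saunders symbol (8.59) of the ground state of an open sub-shell {n, l}
    holding k electrons, as predicted by Hund Rules 1-3 (closed shells add nothing).
    """
    orbitals = 2 * l + 1
    capacity = 2 * orbitals
    if not 0 <= k <= capacity:
        raise ValueError("sub-shell occupancy out of range")
    m_values = list(range(l, -l - 1, -1))        # m_l = +l, +l-1, ..., -l
    # Rule 1: maximal S, i.e. parallel spins in different orbitals first
    n_up = min(k, orbitals)
    n_down = k - n_up
    S = Fraction(n_up - n_down, 2)
    # Rule 2: maximal L for this S, i.e. occupy the largest m_l values first
    L = sum(m_values[:n_up]) + sum(m_values[:n_down])
    # Rule 3: J = |L - S| if the sub-shell is at most half filled, else J = L + S
    J = abs(L - S) if k <= orbitals else L + S
    return f"{int(2 * S + 1)}{LETTERS[L]}_{str(J)}"

for atom, l, k in [("B", 1, 1), ("C", 1, 2), ("N", 1, 3), ("O", 1, 4), ("F", 1, 5)]:
    print(atom, hund_ground_term(l, k))
```

For boron, the output `2P_1/2` is the symbol ${}^2P_{1/2}$ discussed above, written in plain text.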


Now let us discuss possible approaches to a quantitative theory of multiparticle systems — not 
only atoms. As was discussed in Sec. 1, if fermions do not interact directly, the stationary states of the 
system have to be the antisymmetric eigenstates of the permutation operator, i.e. satisfy Eq. (55). To 
understand how such states may be formed from the single-electron ones, let us return for a minute to 
the case of two electrons, and rewrite Eq. (11) in the following compact form: 


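$$|\alpha_-\rangle = \frac{1}{\sqrt2}\left(|\beta\rangle|\beta'\rangle - |\beta'\rangle|\beta\rangle\right) \equiv \frac{1}{\sqrt2}\begin{vmatrix} |\beta\rangle & |\beta'\rangle \\ |\beta\rangle & |\beta'\rangle \end{vmatrix}, \tag{8.60a}$$

where the columns of the determinant correspond to the single-particle states (state 1 and state 2), its rows correspond to particle numbers 1 and 2, and the direct product signs are just implied. In this way, the Pauli principle is mapped onto the well-known property of matrix determinants: if any two columns of a matrix coincide, its determinant vanishes. This Slater determinant approach²² may be readily generalized to N fermions occupying any N (not necessarily the lowest-energy) single-particle states β, β′, β″, etc.: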


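$$|\alpha_-\rangle = \frac{1}{(N!)^{1/2}}\begin{vmatrix} |\beta\rangle & |\beta'\rangle & |\beta''\rangle & \cdots \\ |\beta\rangle & |\beta'\rangle & |\beta''\rangle & \cdots \\ |\beta\rangle & |\beta'\rangle & |\beta''\rangle & \cdots \\ \cdots & \cdots & \cdots & \cdots \end{vmatrix}, \tag{8.60b}$$

with the columns listing the single-particle states and the rows listing the particles 1, 2, …, N.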

The Slater determinant form is extremely nice and compact — in comparison with direct writing 
of a sum of N! products, each of N ket factors. However, there are two major problems with using it for 
practical calculations: 


(i) For the calculation of any bra-ket product (say, within the perturbation theory), we still need to spell out each bra- and ket-vector as a sum of component terms. Even for a limited number of electrons (say N ~ 10² in a typical atom), the number N! ~ 10¹⁵⁸ of terms in such a sum is impracticably large for any analytical or numerical calculation.


(ii) In the case of interacting fermions, the Slater determinant does not describe the eigenvectors of the system; rather, the stationary state is a superposition of such basis functions, i.e. of Slater determinants, each for a specific selection of N states from the full set of single-particle states, which is generally larger than N.
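The N!-term structure of the Slater determinant (60b) and its two key properties, antisymmetry (55) and vanishing for coinciding columns, can be illustrated with a brute-force expansion. The sketch below uses plain Python; the state labels and function names are arbitrary. It is only practical for small N, which is exactly the point made in item (i).

```python
import math
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation (tuple of indices), from the number of inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def slater_state(states):
    """Expand the Slater determinant (8.60b) for particles 1..N occupying `states`.

    Returns {(state of particle 1, ..., state of particle N): amplitude}, built as the
    sum over all N! permutations (Leibniz expansion of the determinant) with the
    1/sqrt(N!) prefactor; terms that cancel are dropped.
    """
    n = len(states)
    norm = 1 / math.sqrt(math.factorial(n))
    amps = {}
    for p in permutations(range(n)):
        key = tuple(states[i] for i in p)
        amps[key] = amps.get(key, 0.0) + perm_sign(p) * norm
    return {key: a for key, a in amps.items() if abs(a) > 1e-12}

def swap_particles(amps, k1, k2):
    """Action of the permutation operator P_{k1 k2} of Eq. (8.55)."""
    out = {}
    for key, a in amps.items():
        new = list(key)
        new[k1], new[k2] = new[k2], new[k1]
        out[tuple(new)] = a
    return out

psi = slater_state(["beta", "beta'", "beta''"])
swapped = swap_particles(psi, 0, 2)
print(len(psi))                                             # 6 = 3! terms
print(all(abs(swapped[k] + psi[k]) < 1e-12 for k in psi))   # True: antisymmetric
print(slater_state(["beta", "beta", "beta''"]))            # {}: two equal columns
print(f"{math.factorial(100):.2e}")                         # 9.33e+157 terms for N = 100
```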


For atoms and simple molecules, whose filled-shell electrons may be excluded from an explicit analysis (by describing their effects, approximately, with effective pseudo-potentials), the effective number N may be reduced to a smaller number $N_{\text{ef}}$ of the order of 10, so that $N_{\text{ef}}! < 10^7$, and the Slater determinants may be used for numerical calculations, for example, in the Hartree-Fock theory (see the next section). However, for condensed-matter systems, such as metals and semiconductors, with the


22 It was suggested in 1929 by John C. Slater. 
number of free electrons of the order of 10²³ per cm³, this approach is generally unacceptable, though
with some smart tricks (such as using the crystal’s periodicity) it may be still used for some approximate 
(also mostly numerical) calculations. 


These challenges make the development of a more general theory that would not use particle numbers (which are superficial for indistinguishable particles to start with) a must for getting any final analytical results for multiparticle systems. The most effective formalism for this purpose, which avoids particle numbering altogether, is called the second quantization.²³ Actually, we have already discussed a particular version of this formalism, for the case of the 1D harmonic oscillator, in Sec. 5.4. As a reminder, after the definition (5.65) of the “creation” and “annihilation” operators via those of the particle’s coordinate and momentum, we derived their key properties (5.89),


$$\hat a|n\rangle = n^{1/2}|n-1\rangle, \qquad \hat a^\dagger|n\rangle = (n+1)^{1/2}|n+1\rangle, \tag{8.61}$$


where $|n\rangle$ are the stationary (Fock) states of the oscillator. This property allows an interpretation of the operators' actions as the creation/annihilation of a single excitation with the energy $\hbar\omega$, thus justifying the operator names. In the next chapter, we will show that such an excitation of an electromagnetic field mode may be interpreted as a massless boson with s = 1, called the photon.
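As a minimal numerical illustration of Eq. (61), the operators can be represented by matrices in a Fock basis truncated at an assumed cutoff `n_max`. The NumPy sketch below (function and variable names are illustrative) checks both relations (61) and the fact that $\hat a^\dagger\hat a$ is diagonal with the elements n.

```python
import numpy as np


def ladder_operators(n_max: int):
    """Annihilation and creation matrices in the Fock basis |0>, ..., |n_max>.

    The basis is truncated at n_max, so the creation operator maps |n_max> to zero;
    this is an artifact of the truncation, not of the physics.
    """
    a = np.diag(np.sqrt(np.arange(1, n_max + 1)), k=1)   # <n-1| a |n> = sqrt(n), Eq. (8.61)
    return a, a.T.copy()                                   # real matrix: dagger = transpose


a, ad = ladder_operators(5)
fock = np.eye(6)                 # column k is the Fock state |k>

print(a @ fock[:, 3])            # should equal sqrt(3) |2>
print(ad @ fock[:, 3])           # should equal sqrt(4) |4> = 2 |4>
print(np.diag(ad @ a))           # number operator a^dagger a: 0, 1, ..., 5
```

The truncation spoils the commutator $[\hat a, \hat a^\dagger] = \hat I$ (cf. Eq. (5.68)) only in its last diagonal element, so any numerical check of it must exclude the highest retained state.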


In order to generalize this approach to arbitrary bosons, not appealing to a specific system, we 
may use relations similar to Eq. (61) to define the creation and annihilation operators. The definitions 
look simple in the language of the so-called Dirac states, described by ket-vectors 


$$|N_1, N_2, \ldots, N_j, \ldots\rangle, \tag{8.62}$$


where $N_j$ is the state occupancy, i.e. the number of bosons in the single-particle state j. Let me emphasize that here the indices 1, 2, ..., j, ... number single-particle states (including their spin parts) rather than particles. Thus the very notion of an individual particle's number is completely (and for indistinguishable particles, very appropriately) absent from this formalism. Generally, the set of single-particle states participating in the Dirac state may be selected arbitrarily, provided that it is full and orthonormal in the sense


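$$\langle N_1', N_2', \ldots, N_j', \ldots|N_1, N_2, \ldots, N_j, \ldots\rangle = \delta_{N_1N_1'}\,\delta_{N_2N_2'}\ldots\delta_{N_jN_j'}\ldots, \tag{8.63}$$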


though for systems of non- (or weakly) interacting bosons, using the stationary states of individual 
particles in the system under analysis is almost always the best choice. 


Now we can define the particle annihilation operator as follows: 


$$\hat a_j|N_1, N_2, \ldots, N_j, \ldots\rangle = N_j^{1/2}|N_1, N_2, \ldots, N_j - 1, \ldots\rangle. \tag{8.64}$$


Note that the pre-ket coefficient, similar to that in the first of Eqs. (61), guarantees that any attempt to 
annihilate a particle in an initially unpopulated state gives the non-existing (“null”) state: 


$$\hat a_j|N_1, N_2, \ldots, 0_j, \ldots\rangle = 0, \tag{8.65}$$


23 It was invented (first for photons and then for arbitrary bosons) by P. Dirac in 1927, and then modified in 1928 
for fermions by E. Wigner and P. Jordan. Note that the term “second quantization” is rather misleading for the 
non-relativistic applications we are discussing here, but finds certain justification in the quantum field theory. 
where the symbol $0_j$ means zero occupancy of the jth state. According to Eq. (63), an equivalent way to write Eq. (64) is


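$$\langle N_1', N_2', \ldots, N_j', \ldots|\hat a_j|N_1, N_2, \ldots, N_j, \ldots\rangle = N_j^{1/2}\,\delta_{N_1N_1'}\,\delta_{N_2N_2'}\ldots\delta_{N_j',N_j-1}\ldots \tag{8.66}$$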
According to the general Eq. (4.65), the matrix element of the Hermitian conjugate operator $\hat a_j^\dagger$ is


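$$\begin{aligned}\langle N_1, N_2, \ldots, N_j, \ldots|\hat a_j^\dagger|N_1', N_2', \ldots, N_j', \ldots\rangle &= \langle N_1', N_2', \ldots, N_j', \ldots|\hat a_j|N_1, N_2, \ldots, N_j, \ldots\rangle^* \\ &= N_j^{1/2}\,\delta_{N_1N_1'}\,\delta_{N_2N_2'}\ldots\delta_{N_j',N_j-1}\ldots \\ &= (N_j'+1)^{1/2}\,\delta_{N_1N_1'}\,\delta_{N_2N_2'}\ldots\delta_{N_j,N_j'+1}\ldots,\end{aligned} \tag{8.67}$$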
meaning that 


$$\hat a_j^\dagger|N_1, N_2, \ldots, N_j, \ldots\rangle = (N_j+1)^{1/2}|N_1, N_2, \ldots, N_j+1, \ldots\rangle, \tag{8.68}$$


in total compliance with the second of Eqs. (61). In particular, this particle creation operator allows the 
description of the generation of a single particle from the vacuum (not null!) state $|0, 0, \ldots\rangle$:


$$\hat a_j^\dagger|0, 0, \ldots, 0_j, \ldots\rangle = |0, 0, \ldots, 1_j, \ldots\rangle, \tag{8.69}$$
and hence a product of such operators may create, from the vacuum, a multiparticle state with an arbitrary set of occupancies:24


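$$\underbrace{\hat a_1^\dagger\hat a_1^\dagger\ldots\hat a_1^\dagger}_{N_1\text{ times}}\;\underbrace{\hat a_2^\dagger\hat a_2^\dagger\ldots\hat a_2^\dagger}_{N_2\text{ times}}\ldots|0, 0, \ldots\rangle = (N_1!\,N_2!\ldots)^{1/2}\,|N_1, N_2, \ldots\rangle. \tag{8.70}$$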
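The definitions (64) and (68) can be implemented literally by storing a multiparticle state as a dictionary that maps occupation tuples to amplitudes. The following sketch (all names are hypothetical) builds the state (70) from the vacuum and then checks Eq. (71) and the reverse-order product that the text evaluates next; note that it numbers the single-particle states from 0, while the text starts from 1.

```python
from math import factorial, isclose, sqrt

# A multiparticle state: {occupation tuple (N_1, N_2, ...): amplitude}
State = dict


def annihilate(j: int, psi: State) -> State:
    """Boson a_j, Eq. (8.64): N_j -> N_j - 1 with the factor N_j^(1/2)."""
    out: State = {}
    for occ, amp in psi.items():
        if occ[j] == 0:
            continue                                  # null state, Eq. (8.65)
        new = occ[:j] + (occ[j] - 1,) + occ[j + 1:]
        out[new] = out.get(new, 0.0) + sqrt(occ[j]) * amp
    return out


def create(j: int, psi: State) -> State:
    """Boson a_j^dagger, Eq. (8.68): N_j -> N_j + 1 with the factor (N_j + 1)^(1/2)."""
    out: State = {}
    for occ, amp in psi.items():
        new = occ[:j] + (occ[j] + 1,) + occ[j + 1:]
        out[new] = out.get(new, 0.0) + sqrt(occ[j] + 1) * amp
    return out


# Eq. (8.70) with N_1 = 2, N_2 = 1, N_3 = 0 (Python index 0 is state 1 of the text)
psi = {(0, 0, 0): 1.0}                                # the vacuum state
for j, times in enumerate((2, 1, 0)):
    for _ in range(times):
        psi = create(j, psi)
print(psi)                                            # amplitude (2! 1! 0!)^(1/2) = 2^(1/2)

ket = {(2, 1, 0): 1.0}
print(create(0, annihilate(0, ket)))                  # Eq. (8.71): factor N_1 = 2
print(annihilate(0, create(0, ket)))                  # reverse order: factor N_1 + 1 = 3
```

Since operators of different modes commute, the order in which the modes are filled does not affect the result.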


Next, combining Eqs. (64) and (68), we get 


$$\hat a_j^\dagger\hat a_j|N_1, N_2, \ldots, N_j, \ldots\rangle = N_j|N_1, N_2, \ldots, N_j, \ldots\rangle, \tag{8.71}$$


so that, just as for the particular case of harmonic oscillator excitations, the operator 


$$\hat N_j \equiv \hat a_j^\dagger\hat a_j \tag{8.72}$$

"counts" the number of particles in the jth single-particle state, while preserving the whole multiparticle state. Acting on a state by the creation-annihilation operators in the reverse order, we get


$$\hat a_j\hat a_j^\dagger|N_1, N_2, \ldots, N_j, \ldots\rangle = (N_j+1)|N_1, N_2, \ldots, N_j, \ldots\rangle. \tag{8.73}$$


Eqs. (71) and (73) show that for any state of a multiparticle system (which may be represented as a linear superposition of Dirac states with all possible sets of numbers $N_j$), we may write


$$\hat a_j\hat a_j^\dagger - \hat a_j^\dagger\hat a_j \equiv [\hat a_j, \hat a_j^\dagger] = \hat I, \tag{8.74}$$


24 The resulting Dirac state is not an eigenstate of every multiparticle Hamiltonian. However, we will see below 
that for a set of non-interacting particles it is a stationary state, so that the full set of such states may be used as a 
good basis in perturbation theories of systems of weakly interacting particles. 
again in agreement with what we had for the 1D oscillator (cf. Eq. (5.68)). According to Eqs. (63), (64), and (68), the creation and annihilation operators corresponding to different single-particle states do commute, so that Eq. (74) may be generalized as

$$[\hat a_j, \hat a_{j'}^\dagger] = \hat I\,\delta_{jj'}, \tag{8.75}$$

$$[\hat a_j, \hat a_{j'}] = [\hat a_j^\dagger, \hat a_{j'}^\dagger] = \hat 0. \tag{8.76}$$


As was mentioned earlier, a major challenge in the Dirac approach is to rewrite the Hamiltonian of a multiparticle system, which naturally carries particle numbers k (see, e.g., Eq. (22) for k = 1, 2), in the second quantization language, where these numbers are absent. Let us start with single-particle components of such Hamiltonians, i.e. operators of the type


$$\hat F = \sum_{k=1}^{N}\hat f_k, \tag{8.77}$$

where all N operators $\hat f_k$ are similar, except that each of them acts on one specific (kth) particle, and N is the total number of particles in the system, which is evidently equal to the sum of single-particle state occupancies:


$$N = \sum_j N_j. \tag{8.78}$$


The most important examples of such operators are the kinetic energy of N similar single particles, and 
their potential energy in an external field: 


$$\hat F = \sum_{k=1}^{N}\frac{\hat p_k^2}{2m}, \qquad \hat F = \sum_{k=1}^{N}u(\hat{\mathbf r}_k). \tag{8.79}$$


For bosons, instead of the Slater determinant (60), we have to write a similar expression, but 
without the sign alternation at permutations: 


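$$|N_1, \ldots, N_j, \ldots\rangle = \left(\frac{N_1!\ldots N_j!\ldots}{N!}\right)^{1/2}\sum_{\substack{\text{all different}\\ \text{permutations}}}|\underbrace{\beta\beta'\beta''\ldots}_{N\text{ operands}}\rangle, \tag{8.80}$$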


sometimes called the permanent. Note again that the left-hand side of this relation is written in the Dirac notation (which does not use particle numbering), while on its right-hand side, just as in the relations of Secs. 8.1 and 8.2, the particle numbers are coded with the positions of the single-particle states inside the state vectors, and the summation is over all different permutations of the states in the ket (cf. Eq. (10)). (According to basic combinatorics,25 there are $N!/(N_1!\ldots N_j!\ldots)$ such permutations, so that the front coefficient in Eq. (80) ensures the normalization of the Dirac state, provided that the single-particle states $\beta, \beta', \ldots$ are normalized.) Let us use Eq. (80) to spell out the following matrix element for a system with (N − 1) particles:


25 See, e.g., MA Eq. (2.3). 
$$\langle\ldots N_j\ldots N_{j'}-1\ldots|\hat F|\ldots N_j-1\ldots N_{j'}\ldots\rangle = \frac{N_1!\ldots(N_j-1)!\ldots(N_{j'}-1)!\ldots}{(N-1)!}\,(N_jN_{j'})^{1/2}\sum_{P(N-1)}\;\sum_{P(N-1)}\langle\ldots\beta\beta'\beta''\ldots|\sum_{k=1}^{N-1}\hat f_k|\ldots\beta\beta'\beta''\ldots\rangle, \tag{8.81}$$

where the sums run over all different permutations of the (N − 1) state vectors, and all non-specified occupation numbers in the corresponding positions of the bra- and ket-vectors are equal to each other. Each single-particle operator $\hat f_k$ participating in the operator sum acts on the bra- and ket-vectors of states only in one (kth) position, giving the following result, independent of the position number:


$$\langle\beta_j|\hat f_k|\beta_{j'}\rangle\Big|_{\text{both in the }k\text{th position}} = \langle\beta_j|\hat f|\beta_{j'}\rangle \equiv f_{jj'}. \tag{8.82}$$
Since in both permutation sets participating in Eq. (81), with (N − 1) state vectors each, all positions are equivalent, we can fix the position (say, take the first one) and replace the sum over k with the multiplication of the bracket by (N − 1). In that position, only the fraction $N_j/(N-1)$ of the bra-vector permutations has the necessary state (with number j), and only the fraction $N_{j'}/(N-1)$ of the ket-vector permutations has the necessary state (with number j'); all other terms vanish due to the orthogonality of the remaining state vectors. As a result, the permutation sum in Eq. (81) reduces to

$$(N-1)\,f_{jj'}\sum_{P(N-2)}\;\sum_{P(N-2)}\langle\ldots\beta\beta'\beta''\ldots|\ldots\beta\beta'\beta''\ldots\rangle, \tag{8.83}$$

where our specific position is now excluded from both the bra- and ket-vector permutations. Each of these permutations now includes only $(N_j - 1)$ states j and $(N_{j'} - 1)$ states j', so that, using the state orthonormality, we finally arrive at a very simple result:


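$$\begin{aligned}\langle\ldots N_j\ldots N_{j'}-1\ldots|\hat F|\ldots N_j-1\ldots N_{j'}\ldots\rangle &= \frac{N_1!\ldots(N_j-1)!\ldots(N_{j'}-1)!\ldots}{(N-1)!}\,(N_jN_{j'})^{1/2}\,(N-1)\,f_{jj'}\,\frac{(N-2)!}{N_1!\ldots(N_j-1)!\ldots(N_{j'}-1)!\ldots} \\ &= (N_jN_{j'})^{1/2}f_{jj'}.\end{aligned} \tag{8.84}$$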
On the other hand, let us calculate matrix elements of the following operator:

$$\sum_{j,j'}f_{jj'}\,\hat a_j^\dagger\hat a_{j'}. \tag{8.85}$$
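A direct application of Eqs. (64) and (68) shows that its only non-vanishing elements between different Dirac states are

$$\langle\ldots N_j\ldots N_{j'}-1\ldots|f_{jj'}\,\hat a_j^\dagger\hat a_{j'}|\ldots N_j-1\ldots N_{j'}\ldots\rangle = (N_jN_{j'})^{1/2}f_{jj'}. \tag{8.86}$$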


But this is exactly the last form of Eq. (84), so that in the basis of Dirac states, the operator (77) may be 
represented as 


$$\hat F = \sum_{j,j'}f_{jj'}\,\hat a_j^\dagger\hat a_{j'}. \tag{8.87}$$


This beautifully simple relation is the key formula of the second quantization theory, and is essentially the Dirac-language analog of Eq. (4.59) of the single-particle quantum mechanics. Each term of the sum (87) may be described by a very simple mnemonic rule: for each pair of single-particle states j and j', annihilate a particle in the state j', create one in the state j, and weight the result with the corresponding single-particle matrix element. One of the corollaries of Eq. (87) is that the expectation value of an operator whose eigenstates coincide with the Dirac states is


$$\langle F\rangle \equiv \langle N_1, N_2, \ldots|\hat F|N_1, N_2, \ldots\rangle = \sum_j f_{jj}N_j, \tag{8.88}$$


with an evident physical interpretation as the sum of single-particle expectation values over all states, weighted by the occupancy of each state.
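The key result (87) can be verified by brute force for a small system. The sketch below (with illustrative sizes, M = 3 single-particle states and N = 2 bosons, a random Hermitian matrix $f_{jj'}$, and hypothetical helper names) builds the Dirac states as normalized permanents (80) in the N-particle product space, forms $\hat F = \sum_k \hat f_k$ directly, and compares its matrix elements with those of $\sum f_{jj'}\hat a_j^\dagger\hat a_{j'}$. The diagonal elements also reproduce the sum $\sum_j f_{jj}N_j$ of Eq. (88).

```python
import itertools
from math import factorial, prod, sqrt

import numpy as np

M, N = 3, 2                      # single-particle states and bosons (illustrative sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
f = x + x.conj().T               # arbitrary Hermitian single-particle matrix f_jj'


def occupations(m: int, n: int):
    """All occupation tuples (N_1, ..., N_m) with N_1 + ... + N_m = n, Eq. (8.78)."""
    return [occ for occ in itertools.product(range(n + 1), repeat=m) if sum(occ) == n]


def permanent_state(occ) -> np.ndarray:
    """Dirac state as the normalized permanent (8.80); the position of a factor
    in the Kronecker product plays the role of the particle number."""
    labels = [j for j, nj in enumerate(occ) for _ in range(nj)]
    vec = np.zeros(M ** N, dtype=complex)
    for perm in set(itertools.permutations(labels)):      # all *different* permutations
        term = np.ones(1, dtype=complex)
        for j in perm:
            term = np.kron(term, np.eye(M)[j])
        vec += term
    return sqrt(prod(factorial(nj) for nj in occ) / factorial(N)) * vec


# First quantization, Eq. (8.77): F = sum_k f_k, with f acting in the kth position only
F = sum(np.kron(np.kron(np.eye(M ** k), f), np.eye(M ** (N - k - 1))) for k in range(N))


def second_quantized_element(bra, ket) -> complex:
    """<bra| sum_{j,j'} f_jj' a_j^dagger a_j' |ket> for bosons, Eq. (8.87)."""
    total = 0j
    for j, jp in itertools.product(range(M), repeat=2):
        if ket[jp] == 0:
            continue                       # a_j' gives the null state, Eq. (8.65)
        mid = list(ket)
        amp = sqrt(mid[jp])                # a_j', Eq. (8.64)
        mid[jp] -= 1
        amp *= sqrt(mid[j] + 1)            # a_j^dagger, Eq. (8.68)
        mid[j] += 1
        if tuple(mid) == tuple(bra):       # orthonormality, Eq. (8.63)
            total += f[j, jp] * amp
    return total


basis = occupations(M, N)
states = {occ: permanent_state(occ) for occ in basis}
first = np.array([[states[b].conj() @ F @ states[k] for k in basis] for b in basis])
second = np.array([[second_quantized_element(b, k) for k in basis] for b in basis])
print(np.max(np.abs(first - second)))    # should be at the rounding-error level
```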


Proceeding to fermions, which have to obey the Pauli principle, we immediately notice that any occupation number $N_j$ may only take two values, 0 or 1. To account for that, and also to make the key relation (87) valid for fermions as well, the creation-annihilation operators are defined by the following relations:


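$$\hat a_j|N_1, N_2, \ldots, 0_j, \ldots\rangle = 0, \qquad \hat a_j|N_1, N_2, \ldots, 1_j, \ldots\rangle = (-1)^{\Sigma(1,\,j-1)}|N_1, N_2, \ldots, 0_j, \ldots\rangle, \tag{8.89}$$

$$\hat a_j^\dagger|N_1, N_2, \ldots, 0_j, \ldots\rangle = (-1)^{\Sigma(1,\,j-1)}|N_1, N_2, \ldots, 1_j, \ldots\rangle, \qquad \hat a_j^\dagger|N_1, N_2, \ldots, 1_j, \ldots\rangle = 0, \tag{8.90}$$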


where the symbol $\Sigma(J, J')$ means the sum of all occupancy numbers in the states with numbers from J to J', including the border points:

$$\Sigma(J, J') \equiv \sum_{j=J}^{J'}N_j, \tag{8.91}$$


so that the sum participating in Eqs. (89)-(90) is the total occupancy of all states with the numbers below j. (The states are supposed to be numbered in a fixed albeit arbitrary order.) As a result, these relations may be conveniently summarized in the following verbal form: if an operator replaces the jth state's occupancy with the opposite one (either 1 with 0, or vice versa), it also changes the sign before the result if (and only if) the total number of particles in the states with j' < j is odd.


Let us use this (perhaps somewhat counter-intuitive) sign alternation rule to spell out the ket- 
vector |11) of a completely filled two-state system, formed from the vacuum state |00) in two different 
ways. If we start by creating a fermion in the state 1, we get 


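$$\hat a_1^\dagger|0,0\rangle = (-1)^0|1,0\rangle = |1,0\rangle, \qquad \hat a_2^\dagger\hat a_1^\dagger|0,0\rangle = \hat a_2^\dagger|1,0\rangle = (-1)^1|1,1\rangle = -|1,1\rangle, \tag{8.92a}$$

while if the operator order is different, the result is

$$\hat a_2^\dagger|0,0\rangle = (-1)^0|0,1\rangle = |0,1\rangle, \qquad \hat a_1^\dagger\hat a_2^\dagger|0,0\rangle = \hat a_1^\dagger|0,1\rangle = (-1)^0|1,1\rangle = |1,1\rangle, \tag{8.92b}$$

so that

$$\left(\hat a_1^\dagger\hat a_2^\dagger + \hat a_2^\dagger\hat a_1^\dagger\right)|0,0\rangle = 0. \tag{8.93}$$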
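The sign rule (89)-(91) is simple to code: each operator just counts the particles in the states preceding j. This sketch (helper names are hypothetical, and Python's 0-based index j corresponds to state j + 1 of the text) reproduces Eqs. (92a) and (92b).

```python
def fermion_create(j: int, occ: tuple):
    """a_j^dagger of Eq. (8.90): returns (sign, new occupations), or None for the null state."""
    if occ[j] == 1:
        return None                                   # Pauli principle
    sign = (-1) ** sum(occ[:j])                       # (-1)^Sigma(1, j-1), Eq. (8.91)
    return sign, occ[:j] + (1,) + occ[j + 1:]


def fermion_annihilate(j: int, occ: tuple):
    """a_j of Eq. (8.89), with the same sign convention."""
    if occ[j] == 0:
        return None
    sign = (-1) ** sum(occ[:j])
    return sign, occ[:j] + (0,) + occ[j + 1:]


def apply_product(ops, occ):
    """Apply an operator product written left to right; the rightmost factor acts first.
    Returns (coefficient, occupations), or (0, None) for the null state."""
    coef = 1
    for op, j in reversed(ops):
        result = op(j, occ)
        if result is None:
            return 0, None
        sign, occ = result
        coef *= sign
    return coef, occ


cr = fermion_create
print(apply_product([(cr, 1), (cr, 0)], (0, 0)))   # a_2^dagger a_1^dagger |0,0>, Eq. (8.92a)
print(apply_product([(cr, 0), (cr, 1)], (0, 0)))   # a_1^dagger a_2^dagger |0,0>, Eq. (8.92b)
```

The two calls print `(-1, (1, 1))` and `(1, (1, 1))`; their sum vanishes, in agreement with Eq. (93).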


Since the action of any of these operator products on any initial state other than the vacuum one also gives the null ket, we may write the following operator equality:


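$$\hat a_1^\dagger\hat a_2^\dagger + \hat a_2^\dagger\hat a_1^\dagger \equiv \{\hat a_1^\dagger, \hat a_2^\dagger\} = \hat 0. \tag{8.94}$$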


It is straightforward to check that this result is valid for Dirac vectors of an arbitrary length, and does 
not depend on the occupancy of other states, so that we may generalize it as 
$$\{\hat a_j^\dagger, \hat a_{j'}^\dagger\} = \hat 0, \qquad \{\hat a_j, \hat a_{j'}\} = \hat 0; \tag{8.95}$$

note that these equalities hold for j = j' as well. On the other hand, an absolutely similar calculation shows that the mixed creation-annihilation anticommutators do depend on whether the states are different or not:26

$$\{\hat a_j, \hat a_{j'}^\dagger\} = \hat I\,\delta_{jj'}. \tag{8.96}$$
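To check the anticommutation relations (94)-(96) for more than two states, one can tabulate the sign rule (89) as matrices acting on all $2^M$ Dirac states. This NumPy sketch (M = 3 is an arbitrary choice, and the names are illustrative) does so, taking $\hat a_j^\dagger$ as the transpose of the real matrix $\hat a_j$, which agrees with Eq. (90). The construction coincides with what is commonly called the Jordan-Wigner representation.

```python
import itertools

import numpy as np

M = 3                                                # number of single-particle states
basis = list(itertools.product((0, 1), repeat=M))    # Dirac states |N_1, ..., N_M>
index = {occ: k for k, occ in enumerate(basis)}


def annihilation_matrix(j: int) -> np.ndarray:
    """Matrix of the fermion operator a_j defined by Eq. (8.89)."""
    a = np.zeros((len(basis), len(basis)))
    for occ in basis:
        if occ[j] == 1:
            emptied = occ[:j] + (0,) + occ[j + 1:]
            a[index[emptied], index[occ]] = (-1) ** sum(occ[:j])   # sign (-1)^Sigma(1, j-1)
    return a


a = [annihilation_matrix(j) for j in range(M)]
ad = [op.T for op in a]                              # real matrices, so dagger = transpose
identity = np.eye(len(basis))


def anticommutator(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    return x @ y + y @ x


for j, jp in itertools.product(range(M), repeat=2):
    assert np.allclose(anticommutator(a[j], ad[jp]), identity * (j == jp))   # Eq. (8.96)
    assert np.allclose(anticommutator(a[j], a[jp]), 0.0)                     # Eq. (8.95)
    assert np.allclose(anticommutator(ad[j], ad[jp]), 0.0)                   # Eqs. (8.94)-(8.95)
print("Eqs. (8.94)-(8.96) hold for all pairs of", M, "states")
```

The same matrices also show that $\hat a_j^\dagger\hat a_j$ is diagonal, with the occupancy $N_j$ on the diagonal, as stated in footnote 26.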


These equations look very much like Eqs. (75)-(76) for bosons, “only” with the replacement of 
commutators with anticommutators. Since the core laws of quantum mechanics, including the operator 
compatibility (Sec. 4.5) and the Heisenberg equation (4.199) of operator evolution in time, involve 
commutators rather than anticommutators, one might think that all the behavior of bosonic and 
fermionic multiparticle systems should be dramatically different. However, the difference is not as big 
as one could expect; indeed, a straightforward check shows that the sign factors in Eqs. (89)-(90) just 
compensate those in the Slater determinant, and thus make the key relation (87) valid for the fermions as 
well. (Indeed, this is the very goal of the introduction of these factors.) 


To illustrate this fact on the simplest example, let us examine what the second quantization formalism says about the dynamics of non-interacting particles in the system whose single-particle properties we have discussed repeatedly, namely two nearly-similar potential wells, coupled by tunneling through the separating potential barrier (see, e.g., Figs. 2.21 or 7.4). If the coupling is so small that the states localized in the wells are only weakly perturbed, then in the basis of these states, the single-particle Hamiltonian of the system may be represented by the 2×2 matrix (5.3). With the energy reference selected at the middle between the energies of unperturbed states, the coefficient b vanishes, and this matrix is reduced to


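$$\mathrm h = \mathbf c\cdot\boldsymbol\sigma \equiv \begin{pmatrix} c_z & c_- \\ c_+ & -c_z\end{pmatrix}, \qquad \text{with } c_\pm \equiv c_x \pm ic_y, \tag{8.97}$$

and its eigenvalues to

$$\varepsilon_\pm = \pm c, \qquad \text{with } c \equiv |\mathbf c| = \left(c_x^2 + c_y^2 + c_z^2\right)^{1/2}. \tag{8.98}$$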
Now following the recipe (87), we can use Eq. (97) to represent the Hamiltonian of the whole system of 
particles in terms of the creation-annihilation operators: 


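$$\hat H = c_z\hat a_1^\dagger\hat a_1 + c_-\hat a_1^\dagger\hat a_2 + c_+\hat a_2^\dagger\hat a_1 - c_z\hat a_2^\dagger\hat a_2, \tag{8.99}$$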


where $\hat a_{1,2}^\dagger$ and $\hat a_{1,2}$ are the operators of creation and annihilation of a particle in the corresponding potential well. (Again, in the second quantization approach the particles are not numbered at all!) As Eq. (72) shows, the first and the last terms of the right-hand side of Eq. (99) describe the particle energies $\varepsilon_{1,2} = \pm c_z$ in uncoupled wells,

$$c_z\hat a_1^\dagger\hat a_1 = c_z\hat N_1 \equiv \varepsilon_1\hat N_1, \qquad -c_z\hat a_2^\dagger\hat a_2 = -c_z\hat N_2 \equiv \varepsilon_2\hat N_2, \tag{8.100}$$


26 A by-product of this calculation is proof that the operator defined by Eq. (72) counts the number of particles $N_j$ (now equal to either 1 or 0), just as it does for bosons.




while the sum of the middle two terms is the second-quantization description of tunneling between the 
wells. 


Now we can use the general Eq. (4.199) of the Heisenberg picture to spell out the equations of 
motion of the creation-annihilation operators. For example, 


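$$i\hbar\dot{\hat a}_1 = [\hat a_1, \hat H] = c_z[\hat a_1, \hat a_1^\dagger\hat a_1] + c_-[\hat a_1, \hat a_1^\dagger\hat a_2] + c_+[\hat a_1, \hat a_2^\dagger\hat a_1] - c_z[\hat a_1, \hat a_2^\dagger\hat a_2]. \tag{8.101}$$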


Since the Bose and Fermi operators satisfy different commutation relations, one could expect the right- 
hand side of this equation to be different for bosons and fermions. However, it is not so. Indeed, all 
commutators on the right-hand side of Eq. (101) have the following form: 


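$$[\hat a_j, \hat a_{j'}^\dagger\hat a_{j''}] = \hat a_j\hat a_{j'}^\dagger\hat a_{j''} - \hat a_{j'}^\dagger\hat a_{j''}\hat a_j. \tag{8.102}$$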
As Eqs. (75) and (96) show, the first pair product of operators on the right-hand side may be recast as


$$\hat a_j\hat a_{j'}^\dagger = \hat I\,\delta_{jj'} \pm \hat a_{j'}^\dagger\hat a_j, \tag{8.103}$$
where the upper sign pertains to bosons and the lower one to fermions, while according to Eqs. (76) and 
(95), the very last pair product in Eq. (102) is 

$$\hat a_{j''}\hat a_j = \pm\hat a_j\hat a_{j''}, \tag{8.104}$$


with the same sign convention. Plugging these expressions into Eq. (102), we see that regardless of the 
particle type, there is a universal (and generally very useful) commutation relation 


$$[\hat a_j, \hat a_{j'}^\dagger\hat a_{j''}] = \hat a_{j''}\,\delta_{jj'}, \tag{8.105}$$


valid for both bosons and fermions. As a result, the Heisenberg equation of motion for the operator $\hat a_1$, and the equation for $\hat a_2$ (which may be obtained absolutely similarly), are also universal:27
$$i\hbar\dot{\hat a}_1 = c_z\hat a_1 + c_-\hat a_2, \qquad i\hbar\dot{\hat a}_2 = c_+\hat a_1 - c_z\hat a_2. \tag{8.106}$$
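The universality of Eq. (105), on which Eqs. (106) rest, can also be checked numerically. In this sketch (the names and the cutoff `n_max` are assumptions), boson operators for two modes are truncated Fock-space matrices, so the check is restricted to input states whose occupancies lie below the cutoff, where the truncation cannot matter; the fermion operators for two states are built with the sign rule (89). Both calls should return `True`.

```python
import itertools

import numpy as np

n_max = 4                                            # per-mode cutoff of the boson space (assumption)
b = np.diag(np.sqrt(np.arange(1, n_max + 1)), k=1)   # single-mode annihilation matrix, Eq. (8.61)


def boson_ops():
    """Two-mode boson operators a_1 = b (x) I and a_2 = I (x) b in a truncated Fock space."""
    eye = np.eye(n_max + 1)
    return [np.kron(b, eye), np.kron(eye, b)]


def fermion_ops():
    """Two-mode fermion operators obeying the sign rule (8.89): a_1 = c (x) I, a_2 = Z (x) c."""
    c = np.array([[0.0, 1.0], [0.0, 0.0]])           # |1> -> |0> in the basis (|0>, |1>)
    z = np.diag([1.0, -1.0])                         # the factor (-1)^N_1
    return [np.kron(c, np.eye(2)), np.kron(z, c)]


def satisfies_105(a, columns) -> bool:
    """Check [a_j, a_j'^dagger a_j''] = delta_jj' a_j'' on the selected input basis states."""
    for j, jp, jpp in itertools.product(range(2), repeat=3):
        pair = a[jp].T @ a[jpp]                      # real matrices: dagger = transpose
        lhs = a[j] @ pair - pair @ a[j]
        rhs = (j == jp) * a[jpp]
        if not np.allclose(lhs[:, columns], rhs[:, columns]):
            return False
    return True


# For bosons, test only input states with both occupancies below the cutoff
safe = [k for k, (n1, n2) in enumerate(itertools.product(range(n_max + 1), repeat=2))
        if n1 < n_max and n2 < n_max]
print(satisfies_105(boson_ops(), safe), satisfies_105(fermion_ops(), list(range(4))))
```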


This is a system of two coupled, linear differential equations, which is similar to the equations 
for the c-number probability amplitudes of single-particle wavefunctions of a two-level system — see, 
e.g., Eq. (2.201) and the model solution of Problem 4.25. Their general solution is a linear superposition 


$$\hat a_{1,2}(t) = \sum_\pm\hat\alpha_{1,2\pm}\exp\{\lambda_\pm t\}. \tag{8.107}$$


As usual, to find the exponents $\lambda_\pm$, it is sufficient to plug in a particular solution $\hat a_{1,2}(t) = \hat\alpha_{1,2}\exp\{\lambda t\}$ into Eq. (106) and require that the determinant of the resulting homogeneous, linear system for the "coefficients" (actually, time-independent operators) $\hat\alpha_{1,2}$ equals zero. This gives us the following characteristic equation


27 Equations of motion for the creation operators $\hat a_{1,2}^\dagger$ are just the Hermitian conjugates of Eqs. (106), and do not add any new information about the system's dynamics.
$$\begin{vmatrix} c_z - i\hbar\lambda & c_- \\ c_+ & -c_z - i\hbar\lambda\end{vmatrix} = 0, \tag{8.108}$$


with two roots $\lambda_\pm = \pm i\Omega/2$, where $\Omega \equiv 2c/\hbar$ (cf. Eq. (5.20)). Now plugging each of the roots, one by one, into the system of equations for $\hat\alpha_{1,2}$, we can find these operators, and hence the general solution of system (106) for arbitrary initial conditions.


Let us consider the simple case $c_y = c_z = 0$ (meaning in particular that the wells are exactly aligned, see Fig. 2.21), so that $\hbar\Omega/2 = c = c_x$; then the solution of Eq. (106) is


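$$\hat a_1(t) = \hat a_1(0)\cos\frac{\Omega t}{2} - i\hat a_2(0)\sin\frac{\Omega t}{2}, \qquad \hat a_2(t) = -i\hat a_1(0)\sin\frac{\Omega t}{2} + \hat a_2(0)\cos\frac{\Omega t}{2}. \tag{8.109}$$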


Multiplying the first of these relations by its Hermitian conjugate, and ensemble-averaging the result, we 
get 


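$$\begin{aligned}\langle N_1\rangle = \langle\hat a_1^\dagger(t)\hat a_1(t)\rangle &= \langle\hat a_1^\dagger(0)\hat a_1(0)\rangle\cos^2\frac{\Omega t}{2} + \langle\hat a_2^\dagger(0)\hat a_2(0)\rangle\sin^2\frac{\Omega t}{2} \\ &\quad - i\langle\hat a_1^\dagger(0)\hat a_2(0) - \hat a_2^\dagger(0)\hat a_1(0)\rangle\sin\frac{\Omega t}{2}\cos\frac{\Omega t}{2}.\end{aligned} \tag{8.110}$$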


Let the initial state of the system be a single Dirac state, i.e. have a definite number of particles in each well; in this case, only the first two terms on the right-hand side of Eq. (110) are different from zero, giving:28


$$\langle N_1\rangle = N_1(0)\cos^2\frac{\Omega t}{2} + N_2(0)\sin^2\frac{\Omega t}{2}. \tag{8.111}$$


For one particle, initially placed in either well, this gives us our old result (2.181) describing the usual quantum oscillations of the particle between two wells with the frequency Ω. However, Eq. (111) is valid for any set of initial occupancies; let us use this fact. For example, starting from two particles, with initially one particle in each well, we get $\langle N_{1,2}\rangle = 1$, regardless of time. So, the occupancies do not oscillate, and no experiment may detect the quantum oscillations, though their frequency is still formally present in the time evolution equations. This fact may be interpreted as the simultaneous quantum oscillations of two particles between the wells, exactly in anti-phase. For bosons, we can go on to even larger occupancies by preparing the system, for example, in the state with $N_1(0) = N$, $N_2(0) = 0$. According to Eq. (111), in this case the quantum oscillation amplitude increases N-fold; this is a particular manifestation of the general fact that bosons can be (and evolve in time) in the same quantum state. On the other hand, for fermions we cannot increase the initial occupancies beyond 1, so that the largest oscillation amplitude is obtained if we initially fill just one well.
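Because the Hamiltonian (99) conserves the total number of particles, the evolution of a Dirac state can also be computed exactly in the (N + 1)-dimensional subspace of the states $|n, N - n\rangle$ and compared with Eq. (111). This sketch (assuming ħ = 1, $c_y = c_z = 0$, and bosons; all names are illustrative) does this for the three initial conditions discussed above, and also checks the eigenvalues (98) of the matrix (97) for arbitrary sample values of $c_x$, $c_y$, $c_z$.

```python
import numpy as np


def two_well_hamiltonian(n_total: int, c: float) -> np.ndarray:
    """Eq. (8.99) with c_x = c, c_y = c_z = 0: H = c (a1^dag a2 + a2^dag a1),
    for n_total bosons in the basis |n, n_total - n>, n = 0, ..., n_total."""
    h = np.zeros((n_total + 1, n_total + 1))
    for n in range(n_total):
        # <n+1, N-n-1| a1^dag a2 |n, N-n> = sqrt((n + 1)(N - n)), from Eqs. (8.64) and (8.68)
        h[n + 1, n] = c * np.sqrt((n + 1) * (n_total - n))
    return h + h.T                     # the a2^dag a1 term is the Hermitian conjugate


def mean_n1(n_total, n1_initial, c, times, hbar=1.0):
    """<N_1>(t) for the initial Dirac state |n1_initial, n_total - n1_initial>."""
    energies, vecs = np.linalg.eigh(two_well_hamiltonian(n_total, c))
    psi0 = np.zeros(n_total + 1)
    psi0[n1_initial] = 1.0
    coeffs = vecs.T @ psi0             # expansion over the energy eigenstates
    n1 = np.arange(n_total + 1)
    result = []
    for t in times:
        psi_t = vecs @ (np.exp(-1j * energies * t / hbar) * coeffs)
        result.append(np.sum(n1 * np.abs(psi_t) ** 2))
    return np.array(result)


c = 1.0
omega = 2 * c                          # Omega = 2c/hbar with hbar = 1
t = np.linspace(0.0, 10.0, 201)
errors = {}
for n_total, n1_0 in [(1, 1), (2, 1), (5, 5)]:
    eq111 = n1_0 * np.cos(omega * t / 2) ** 2 + (n_total - n1_0) * np.sin(omega * t / 2) ** 2
    errors[(n_total, n1_0)] = np.max(np.abs(mean_n1(n_total, n1_0, c, t) - eq111))
print(errors)                          # deviations from Eq. (8.111), at the rounding level

# Eq. (8.98): the eigenvalues of the single-particle matrix (8.97) are +-|c|
cx, cy, cz = 0.3, -0.4, 1.2
h1 = np.array([[cz, cx - 1j * cy], [cx + 1j * cy, -cz]])
print(np.linalg.eigvalsh(h1), np.sqrt(cx ** 2 + cy ** 2 + cz ** 2))
```

For fermions, the state with one particle in each well is the only two-particle state of this system, and hence trivially stationary, which gives the same result $\langle N_{1,2}\rangle = 1$.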


The Dirac approach may be readily generalized to more complex systems. For example, Eq. (99) 
implies that an arbitrary system of potential wells with weak tunneling coupling between the adjacent 
wells may be described by the Hamiltonian 


$$\hat H = \sum_j\varepsilon_j\,\hat a_j^\dagger\hat a_j + \sum_{\{j,j'\}}\delta_{jj'}\,\hat a_j^\dagger\hat a_{j'}, \tag{8.112}$$


28 For the second well's occupancy, the result is complementary, $N_2(t) = N_1(0)\sin^2(\Omega t/2) + N_2(0)\cos^2(\Omega t/2)$, giving in particular a good sanity check: $N_1(t) + N_2(t) = N_1(0) + N_2(0) = \text{const}$.
where the symbol {j, j'} means that the second sum is restricted to pairs of next-neighbor wells (see, e.g., Eq. (2.203) and its discussion). Note that this Hamiltonian is still a quadratic form of the creation-annihilation operators, so the Heisenberg-picture equations of motion of these operators are still linear, and their exact solutions, though possibly cumbersome, may be studied in detail. Due to this fact, the Hamiltonian (112) is widely used for the study of some phenomena, for example, the very interesting Anderson localization effects, in which a random distribution of the localized-site energies $\varepsilon_j$ prevents tunneling particles, within a certain energy range, from spreading to unlimited distances.29
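Since the Hamiltonian (112) is quadratic, its dynamics is governed by the single-particle matrix with the energies $\varepsilon_j$ on the diagonal and the tunneling amplitudes between adjacent wells off the diagonal. The sketch below assumes a 1D chain with a uniform coupling δ and site energies drawn uniformly from a band of width W, with all numbers illustrative. It compares the eigenstates of a uniform chain with those of a chain with random site energies, using the inverse participation ratio (IPR), a common localization measure that the text does not use.

```python
import numpy as np


def chain_hamiltonian(site_energies: np.ndarray, delta: float) -> np.ndarray:
    """Single-particle matrix behind Eq. (8.112) for a 1D chain of wells: energies eps_j on
    the diagonal and a tunneling amplitude delta between adjacent wells (open ends)."""
    n = len(site_energies)
    return np.diag(site_energies) + delta * (np.eye(n, k=1) + np.eye(n, k=-1))


def mean_ipr(h: np.ndarray) -> float:
    """Inverse participation ratio sum_j |psi_j|^4, averaged over all eigenstates:
    about 1/n for states spread over the chain, of order 1 for states on a few sites."""
    _, vecs = np.linalg.eigh(h)
    return float(np.mean(np.sum(vecs ** 4, axis=0)))


n_sites, delta, disorder = 400, 1.0, 5.0          # illustrative parameters (assumptions)
rng = np.random.default_rng(1)
clean = mean_ipr(chain_hamiltonian(np.zeros(n_sites), delta))
random_eps = rng.uniform(-disorder / 2, disorder / 2, size=n_sites)
dirty = mean_ipr(chain_hamiltonian(random_eps, delta))
print(f"mean IPR, uniform chain: {clean:.4f}; random site energies: {dirty:.4f}")
```

For the uniform chain, the mean IPR equals 3/[2(n + 1)] exactly, while random site energies raise it by more than an order of magnitude, signaling eigenstates confined to a few wells.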


8.4. Perturbative approaches 


The situation becomes much more difficult if we need to account for direct interactions between the particles. Let us assume that the interaction may be reduced to that between their pairs (as is the case for the Coulomb interaction and most other interactions), so that it may be described by the following "pair-interaction" Hamiltonian


$$\hat U_{\rm int} = \frac{1}{2}\sum_{\substack{k,k'=1\\ k\neq k'}}^{N}\hat u_{\rm int}(\mathbf r_k, \mathbf r_{k'}), \tag{8.113}$$


with the front factor of ½ compensating the double-counting of each particle pair. The translation of this operator to the second-quantization form may be done absolutely similarly to the derivation of Eq. (87), and gives a similar (though naturally more involved) result


$$\hat U_{\rm int} = \frac{1}{2}\sum_{j,j',l,l'}u_{jj'll'}\,\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_{l'}\hat a_l, \tag{8.114}$$


where the two-particle matrix elements are defined similarly to Eq. (82):

$$u_{jj'll'} \equiv \langle\beta_j\beta_{j'}|\hat u_{\rm int}|\beta_l\beta_{l'}\rangle. \tag{8.115}$$


The only new feature of Eq. (114) is a specific order of the indices of the creation operators. Note the mnemonic rule of writing this expression, similar to that for Eq. (87): each term corresponds to moving a pair of particles from states l and l' to states j' and j (in this order!), factored with the corresponding two-particle matrix element (115).


However, with such a term taken into account, the resulting Heisenberg equations of the time evolution of the creation/annihilation operators are nonlinear, so that solving them and calculating observables from the results is usually impossible, at least analytically. The only case when some general results may be obtained is the weak interaction limit. In this case, the unperturbed Hamiltonian contains only single-particle terms such as (79), and we can always (at least conceptually :-) find such a basis of orthonormal single-particle states $\beta_j$ in which that Hamiltonian is diagonal in the Dirac representation:


29 For a review of the 1D version of this problem, see, e.g., J. Pendry, Adv. Phys. 43, 461 (1994). 

30 A simple but important example from the condensed matter theory is the so-called Hubbard model, in which 
particle repulsion limits their number on each of localized sites to either 0, or 1, or 2, with negligible interaction of 
the particles on different sites — though the next-neighbor sites are still connected by tunneling — as in Eq. (112). 
$$\hat H^{(0)} = \sum_j\varepsilon_j^{(0)}\,\hat a_j^\dagger\hat a_j. \tag{8.116}$$


Now we can use Eq. (6.14), in this basis, to calculate the interaction energy as a first-order perturbation: 


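$$\begin{aligned}E_{\rm int}^{(1)} &\equiv \langle N_1, N_2, \ldots|\hat U_{\rm int}|N_1, N_2, \ldots\rangle = \frac12\langle N_1, N_2, \ldots|\sum_{j,j',l,l'}u_{jj'll'}\,\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_{l'}\hat a_l|N_1, N_2, \ldots\rangle \\ &= \frac12\sum_{j,j',l,l'}u_{jj'll'}\,\langle N_1, N_2, \ldots|\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_{l'}\hat a_l|N_1, N_2, \ldots\rangle.\end{aligned} \tag{8.117}$$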


Since, according to Eq. (63), the Dirac states with different occupancies are orthogonal, the last long 
bracket is different from zero only for three particular subsets of its indices: 


(i) j ≠ j', l = j, and l' = j'. In this case, the four-operator product in Eq. (117) is equal to $\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_{j'}\hat a_j$, and applying the commutation rules twice, we can bring it to a form in which each creation operator stands immediately to the left of the corresponding annihilation operator, thus forming the particle number operators (72):

$$\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_{j'}\hat a_j = \hat a_j^\dagger\hat a_j\,\hat a_{j'}^\dagger\hat a_{j'} = \hat N_j\hat N_{j'}, \tag{8.118}$$


with a similar sign of the final result for bosons and fermions. 


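(ii) j ≠ j', l = j', and l' = j. In this case, the four-operator product is equal to $\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_j\hat a_{j'}$, and bringing it to the form $\hat N_j\hat N_{j'}$ requires only one commutation:

$$\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_j\hat a_{j'} = \hat a_j^\dagger\left(\pm\hat a_j\hat a_{j'}^\dagger\right)\hat a_{j'} = \pm\hat a_j^\dagger\hat a_j\,\hat a_{j'}^\dagger\hat a_{j'} = \pm\hat N_j\hat N_{j'}, \tag{8.119}$$

with the upper sign for bosons and the lower sign for fermions.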
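(iii) All indices are equal to each other, giving $\hat a_j^\dagger\hat a_{j'}^\dagger\hat a_{l'}\hat a_l = \hat a_j^\dagger\hat a_j^\dagger\hat a_j\hat a_j$. For fermions, such an operator (that "tries" to create or to kill two particles in a row, in the same state) immediately gives the null-vector. In the case of bosons, we may use Eq. (74) to commute the internal pair of operators, getting

$$\hat a_j^\dagger\hat a_j^\dagger\hat a_j\hat a_j = \hat a_j^\dagger\left(\hat a_j\hat a_j^\dagger - \hat I\right)\hat a_j = \hat N_j\left(\hat N_j - \hat I\right). \tag{8.120}$$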


Note, however, that this expression formally covers the fermion case as well (always giving zero). As a 
result, Eq. (117) may be rewritten in the following universal form: 


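$$E_{\rm int}^{(1)} = \frac12\sum_{\substack{j,j'\\ j\neq j'}}N_jN_{j'}\left(u_{jj'jj'} \pm u_{jj'j'j}\right) + \frac12\sum_j u_{jjjj}\,N_j(N_j - 1). \tag{8.121}$$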


The corollaries of this important result are very different for bosons and fermions. In the former case, the last term usually dominates, because the matrix elements (115) are typically the largest when all basis functions coincide. Note that this term allows a very simple interpretation: the number of the diagonal matrix elements it sums up for each state (j) is just the number of interacting particle pairs residing in that state.
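Eq. (121) can be checked against a brute-force evaluation of the matrix element (117). The sketch below (names are hypothetical) implements the operators on Dirac basis states, with the sign rule (89)-(90) for fermions, and uses random numbers in place of the matrix elements (115). Since the derivation of Eq. (121) relies only on the commutation rules, the agreement should hold for any such set of numbers.

```python
import itertools
from math import isclose, sqrt

import numpy as np


def apply_op(kind: str, dagger: bool, j: int, occ: tuple):
    """a_j or a_j^dagger acting on a Dirac basis state; returns (coefficient, new state).
    Bosons follow Eqs. (8.64) and (8.68), fermions Eqs. (8.89)-(8.90)."""
    n = occ[j]
    if kind == "boson":
        coef = sqrt(n + 1) if dagger else sqrt(n)
    else:
        allowed = (n == 0) if dagger else (n == 1)
        coef = (-1) ** sum(occ[:j]) if allowed else 0
    if coef == 0:
        return 0, occ                          # null state
    return coef, occ[:j] + ((n + 1) if dagger else (n - 1),) + occ[j + 1:]


def brute_force_energy(kind: str, u: np.ndarray, occ: tuple) -> float:
    """(1/2) sum u_jj'll' <occ| a_j^dag a_j'^dag a_l' a_l |occ>, i.e. Eq. (8.117) with (8.114)."""
    total = 0.0
    for j, jp, l, lp in itertools.product(range(len(occ)), repeat=4):
        coef, state = 1.0, occ
        # the rightmost operator acts first: a_l, then a_l', a_j'^dagger, a_j^dagger
        for dagger, k in ((False, l), (False, lp), (True, jp), (True, j)):
            c, state = apply_op(kind, dagger, k, state)
            coef *= c
            if coef == 0:
                break
        if coef != 0 and state == occ:         # orthonormality of Dirac states, Eq. (8.63)
            total += 0.5 * u[j, jp, l, lp] * coef
    return total


def energy_eq_121(kind: str, u: np.ndarray, occ: tuple) -> float:
    """Eq. (8.121): direct and exchange terms (+ for bosons, - for fermions), plus same-state pairs."""
    sign = 1 if kind == "boson" else -1
    m = len(occ)
    e = sum(0.5 * occ[j] * occ[jp] * (u[j, jp, j, jp] + sign * u[j, jp, jp, j])
            for j, jp in itertools.product(range(m), repeat=2) if j != jp)
    return e + sum(0.5 * u[j, j, j, j] * occ[j] * (occ[j] - 1) for j in range(m))


rng = np.random.default_rng(2)
u = rng.normal(size=(4, 4, 4, 4))              # arbitrary real "matrix elements" (illustrative)
for kind, occ in [("boson", (3, 0, 2, 1)), ("fermion", (1, 0, 1, 1))]:
    print(kind, brute_force_energy(kind, u, occ), energy_eq_121(kind, u, occ))
```

The two printed numbers should coincide for each statistics, up to rounding.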
In contrast, for fermions the last term is zero, and the interaction energy is proportional to the difference of the two terms inside the first parentheses. To spell them out, let us consider the case when there is no direct spin-orbit interaction. Then the vectors $|\beta_j\rangle$ of the single-particle state basis may be represented as direct products $|o_j\rangle\otimes|m_j\rangle$ of their orbital and spin-orientation parts. (Here, for the brevity of notation, I am using m instead of $m_s$.) For spin-½ particles, such as electrons, $m_j$ may equal only either +½ or −½. Since the interaction does not act on spins, the direct and exchange matrix elements in Eq. (121) include, respectively, the spin factors

$$(\langle m|\otimes\langle m'|)(|m\rangle\otimes|m'\rangle) \quad\text{and}\quad (\langle m|\otimes\langle m'|)(|m'\rangle\otimes|m\rangle), \tag{8.122}$$


where, as in the general Eq. (115), the position of a particular state vector in each direct product is 
encoding the particle’s number. Since the spins of different particles are defined in different Hilbert 
spaces, we may move their state vectors around to get 


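$$(\langle m|\otimes\langle m'|)(|m\rangle\otimes|m'\rangle) = (\langle m|m\rangle)_1\times(\langle m'|m'\rangle)_2 = 1, \tag{8.123}$$

$$(\langle m|\otimes\langle m'|)(|m'\rangle\otimes|m\rangle) = (\langle m|m'\rangle)_1\times(\langle m'|m\rangle)_2 = \delta_{mm'}. \tag{8.124}$$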


In this case, it is convenient to rewrite Eq. (121) in the coordinate representation, using single- 
particle wavefunctions called spin-orbitals 


$$\psi_j(\mathbf r) \equiv \langle\mathbf r|\beta_j\rangle = \left(\langle\mathbf r|o\rangle\otimes|m\rangle\right)_j. \tag{8.125}$$


They differ from the spatial parts of the usual orbital wavefunctions of the type (4.233) only in that their index j should be understood as the set of the orbital-state and the spin-orientation indices.31 Also, due to the Pauli-principle restriction of the numbers $N_j$ to either 0 or 1, Eq. (121) may be also rewritten without the explicit occupancy numbers, with the understanding that the summation is extended only over the pairs of occupied states. As a result, it becomes


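$$E_{\rm int}^{(1)} = \frac12\sum_{\substack{j,j'\\ j\neq j'}}\int d^3r\int d^3r'\left[\psi_j^*(\mathbf r)\psi_{j'}^*(\mathbf r')\,u_{\rm int}(\mathbf r,\mathbf r')\,\psi_j(\mathbf r)\psi_{j'}(\mathbf r') - \psi_j^*(\mathbf r)\psi_{j'}^*(\mathbf r')\,u_{\rm int}(\mathbf r,\mathbf r')\,\psi_{j'}(\mathbf r)\psi_j(\mathbf r')\right]. \tag{8.126}$$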

In particular, for a system of two electrons, we may limit the summation to just two states (j, j' = 1, 2). As a result, we return to Eqs. (39)-(41), with the bottom (minus) sign in Eq. (39), corresponding to the triplet spin states. Hence, Eq. (126) may be considered as the generalization of the direct and exchange interaction balance picture to an arbitrary number of orbitals and an arbitrary total number N of electrons. Note, however, that this formula cannot correctly describe the energy of the singlet spin
states, corresponding to the plus sign in Eq. (39), and also of the entangled triplet states.32 The reason is 


31 The spin-orbitals (125) are also close to spinors (13), except that the former definition takes into account that the spin s of a single particle is fixed, so that the spin-orbital may be indexed by the spin's orientation $m \equiv m_s$ only. Also, if an orbital index is used, it should be clearly distinguished from j, i.e. the set of the orbital and spin indices. This is why I believe that the frequently met notation of spin-orbitals as $\psi_{j,s}(\mathbf r)$ may lead to confusion.

32 Indeed, due to the condition j' ≠ j and Eq. (124), the calculated negative exchange interaction is limited to electron state pairs with the same spin direction, such as the factorable triplet states (↑↑ and ↓↓) of a two-electron system, in which the contribution of $E_{\rm ex}$, given by Eq. (41), to the total energy is also negative.
that the description of entangled spin states, given in particular by Eqs. (18) and (20), requires linear superpositions of different Dirac states. (A proof of this fact is left for the reader's exercise.)


Now comes a very important fact: the approximate result (126), added to the sum of unperturbed energies $\varepsilon_j^{(0)}$, equals the sum of exact eigenenergies of the so-called Hartree-Fock equation:33


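$$\left[-\frac{\hbar^2}{2m}\nabla^2 + u(\mathbf r)\right]\psi_j(\mathbf r) + \sum_{j'\neq j}\int\left[\psi_{j'}^*(\mathbf r')\,u_{\rm int}(\mathbf r,\mathbf r')\,\psi_{j'}(\mathbf r')\,\psi_j(\mathbf r) - \psi_{j'}^*(\mathbf r')\,u_{\rm int}(\mathbf r,\mathbf r')\,\psi_j(\mathbf r')\,\psi_{j'}(\mathbf r)\right]d^3r' = \varepsilon_j\psi_j(\mathbf r), \tag{8.127}$$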


where u(r) is the external-field potential acting on each particle separately — see the second of Eqs. (79). 
An advantage of this equation in comparison with Eq. (126) is that it allows the (approximate) 
calculation of not only the energy spectrum of the system, but also the corresponding spin-orbitals, 
taking into account their electron-electron interaction. Of course, Eq. (127) is an integro-differential 
rather than just a differential equation. There are, however, efficient methods of numerical solution of 
such equations, typically based on iterative methods. One more important practical trick is the exclusion 
of the filled internal electron shells (see Sec. 3.7) from the explicit calculations, because the shell states 
are virtually unperturbed by the valence electron effects involved in typical atomic phenomena and 
chemical reactions. In this approach, the Coulomb field of the shells, described by fixed, pre-calculated, 
and tabulated pseudo-potentials, is added to that of the nuclei. This approach dramatically cuts the 
computing resources necessary for systems of relatively heavy atoms, enabling a pretty accurate 
simulation of electronic and chemical properties of rather complex molecules, with thousands of electrons.34 As a result, the Hartree-Fock approximation has become the de-facto baseline of all so-called ab-initio ("first-principle") calculations in the very important field of quantum chemistry.35


In departures from this baseline, there are two opposite trends. For higher accuracy (and typically smaller systems), several more complex "post-Hartree-Fock methods", notably including the configuration interaction method,36 have been developed.


There is also a strong opposite trend of extending such ab initio ("first-principle") methods to larger systems while sacrificing some of the results' accuracy and reliability. The ultimate limit of this trend is applicable when the single-particle wavefunction overlaps are small and hence the exchange interaction is negligible. In this limit, the last term in the square brackets in Eq. (127) may be ignored, and the multiplier $\psi_j(\mathbf r)$ taken out of the integral, which is thus reduced to a differential equation, formally just the Schrödinger equation for a single particle in the following self-consistent effective potential:


33 This equation was suggested in 1929 by Douglas Hartree for the direct interaction and extended to the exchange interaction by Vladimir Fock in 1930. To verify its compliance with Eq. (126), it is sufficient to multiply all terms of Eq. (127) by $\psi_j^*(\mathbf r)$, integrate them over all r-space (so that the right-hand side would give $\varepsilon_j$), and then sum these single-particle energies over all occupied states j.

34 For condensed-matter systems, this and other computational methods are applied to single elementary 
spatial cells, with a limited number of electrons in them, using cyclic boundary conditions. 

35 See, e.g., A. Szabo and N. Ostlund, Modern Quantum Chemistry, Revised ed., Dover, 1996. 

36 That method, in particular, allows the calculation of proper linear superpositions of the Dirac states (such as the 
entangled states for N = 2, discussed above) which are missing in the generic Hartree-Fock approach — see, e.g., 
the just-cited monograph by Szabo and Ostlund. 
$$u_{\rm ef}(\mathbf r) = u(\mathbf r) + \sum_{j'\neq j}\int \psi^*_{j'}(\mathbf r')\,U_{\rm int}(\mathbf r,\mathbf r')\,\psi_{j'}(\mathbf r')\,d^3r'. \qquad (8.128)$$


This is the so-called Hartree approximation, which gives reasonable results for some systems,37 especially those with low electron density.


However, in dense electron systems (such as typical atoms, molecules, and condensed matter), the exchange interaction, described by the second term in the square brackets of Eqs. (126)-(127), may be as high as ~30% of the direct interaction, and frequently cannot be ignored. The trend toward taking this interaction into account in the simplest possible form is currently dominated by the Density Functional Theory,38 universally known by its acronym DFT. In this approach, the equation solved for each eigenfunction ψ_j(r) is a differential, Schrödinger-like Kohn-Sham equation


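$$\left[-\frac{\hbar^2}{2m}\nabla^2 + u(\mathbf r) + u^{(H)}(\mathbf r) + u_{xc}(\mathbf r)\right]\psi_j(\mathbf r) = \varepsilon_j\,\psi_j(\mathbf r), \qquad (8.129)$$
where
$$u^{(H)}(\mathbf r) = -e\phi(\mathbf r), \qquad \phi(\mathbf r) = \frac{1}{4\pi\varepsilon_0}\int\frac{\rho(\mathbf r')}{|\mathbf r - \mathbf r'|}\,d^3r', \qquad \rho(\mathbf r) = -e\,n(\mathbf r), \qquad (8.130)$$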


and n(r) is the total electron density at a particular point, calculated as

$$n(\mathbf r) = \sum_j \psi_j^*(\mathbf r)\,\psi_j(\mathbf r). \qquad (8.131)$$


The most important feature of the Kohn-Sham Hamiltonian is the simplified description of the exchange and correlation effects by the effective exchange-correlation potential u_xc(r). This potential is calculated in various approximations, most of them valid only in the limit when the number of electrons in the system is very high. The simplest of them (proposed by Kohn et al. in the 1960s) is the Local Density Approximation (LDA), in which the effective exchange potential at each point r is a function only of the electron density n at the same point, taken from the theory of a uniform gas of free electrons.39 However, for many tasks of quantum chemistry, the accuracy given by the LDA is insufficient, because inside molecules the density n typically changes very fast. As a result, DFT became widely accepted in that field only after the introduction, in the 1980s, of more accurate, though more cumbersome, models for u_xc(r), notably the so-called Generalized Gradient Approximations (GGAs). Due to its relative simplicity, DFT enables the calculation, with the same computing resources and reasonable precision, of some properties of much larger systems than the methods based on the Hartree-Fock theory. As a result, it has become a very popular tool of ab initio calculations. This popularity is enhanced by the availability of several advanced DFT software packages, some of them in the public domain.


37 An extreme example of the Hartree approximation is the Thomas-Fermi model of heavy atoms (with Z ≫ 1), in which atomic electrons, at each distance r from the nucleus, are treated as an ideal, uniform Fermi gas, with a certain density n(r) corresponding to the local value u_ef(r), but a global value of their highest full single-particle energy, ε = 0, to ensure the equilibrium. (The analysis of this model is left for the reader’s exercise.)

38 It was developed by Walter Kohn and his associates (notably Pierre Hohenberg) in 1965-66, and eventually (in 1998) was marked with a Nobel Prize in Chemistry for W. Kohn.

39 Just for the reader’s reference: for a uniform, degenerate Fermi gas of electrons (with the Fermi energy ε_F ≫ k_BT), the most important, exchange part u_x of u_xc may be calculated analytically: u_x = −(3/4π)e²k_F/4πε_0, where the Fermi momentum k_F = (2m_eε_F)^{1/2}/ħ is defined by the electron density: n = 2(4π/3)k_F³/(2π)³ ≡ k_F³/3π².
Please note, however, that despite this indisputable success, this approach has its problems. From my personal point of view, the most offensive of them is the implicit assumption of the unphysical Coulomb interaction of an electron with itself (by dropping, on the way from Eq. (128) to Eq. (130), the condition j' ≠ j in the calculation of u^(H)). As a result, for a reasonable description of some effects, the available DFT packages are either inapplicable altogether or require substantial artificial tinkering.40


Unfortunately, because of lack of time/space, for details I have to refer the interested reader to specialized literature.41


8.5. Quantum computation and cryptography 


Now I have to review the emerging fields of quantum computation and encryption. (Since these fields are closely related, they are often referred to under the common title of “quantum information science”, though this term is somewhat misleading, de-emphasizing the physical aspects of the topic.) These fields are currently the subject of intensive research and development efforts, which have already brought, besides an enormous body of hype, some results of general importance. My coverage, by necessity short, will focus on these results, referring the reader interested in details to specialized literature.42 Because the fields are at a very active stage, I will also provide quite a few references to recent publications, making the style of this section closer to a brief literature review than to a textbook’s section.


Presently, most work on quantum computation and encryption is based on systems of spatially separated (and hence distinguishable) two-level systems, in this context universally called qubits.43
Due to this distinguishability, the issues that were the focus of the first sections of this chapter, including 
the second quantization approach, are irrelevant here. On the other hand, systems of qubits have some 
interesting properties that have not been discussed in this course yet. 


First of all, a system of N ≫ 1 qubits may contain much more information than the same number
of N classical bits. Indeed, according to the discussions in Chapter 4 and Sec. 5.1, an arbitrary pure state 
of a single qubit may be represented by its ket vector (4.37) — see also Eq. (5.1): 


$$|\alpha\rangle = \alpha_1|u_1\rangle + \alpha_2|u_2\rangle, \qquad (8.132)$$


where {u_j} is any orthonormal two-state basis. It is natural and common to employ, as u_j, the eigenstates a_j of the observable A that is eventually measured in the particular physical implementation of the qubit, say, a certain Cartesian component of spin-½. It is also common to write the kets of these base states as |0⟩ and |1⟩, so that Eq. (132) takes the form


40 As just a few examples, see N. Simonian et al., J. Appl. Phys. 113, 044504 (2013); M. Medvedev et al., Science 355, 49 (2017); A. Hutama et al., J. Phys. Chem. C 121, 14888 (2017).

41 See, e.g., either the monograph by R. Parr and W. Yang, Density-Functional Theory of Atoms and Molecules, 
Oxford U. Press, 1994, or the later textbook J. A. Steckel and D. Sholl, Density Functional Theory: Practical 
Introduction, Wiley, 2009. For a popular review and references to more recent work in this still-developing field, 
see A. Zangwill, Phys. Today 68, 34 (July 2015). 

42 Despite the recent flood of new books on the field, one of its first surveys, by M. Nielsen and I. Chuang, 
Quantum Computation and Quantum Information, Cambridge U. Press, 2000, is perhaps still the best one. 

43 In some texts, the term qubit (or “Qbit”, or “Q-bit”) is used instead for the information contents of a two-level
system — very much like the classical bit of information (in this context, frequently called “Cbit” or “C-bit”) 
describes the information contents of a classical bistable system — see, e.g., SM Sec. 2.2. 
$$|\alpha\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle \equiv \sum_{j=0,1}\alpha_j|j\rangle. \qquad (8.133)$$


(Here, and in the balance of this section, the letter j is used to denote an integer equal to either 0 or 1.) According to this relation, any state α of a qubit is completely defined by two complex c-numbers α_j, i.e. by 4 real numbers. Moreover, due to the normalization condition |α_0|² + |α_1|² = 1, we need just 3 independent real numbers, say, the Bloch sphere coordinates θ and φ (see Fig. 5.3), plus the common phase γ, which becomes important only when we consider coherent states of a several-qubit system.


This is a good time to note that a qubit is very much different from any classical bistable system 
used to store single bits of information — such as two possible voltage states of the usual SRAM cell 
(essentially, a positive-feedback loop of two transistor-based inverters). Namely, the stationary states of 
a classical bistable system, due to its nonlinearity, are stable with respect to small perturbations, so that 
they may be very robust to unintentional interaction with their environment. In contrast, the qubit’s state 
may be disturbed (i.e. its representation point on the Bloch sphere shifted) by even minor perturbations, 
because it does not have such an internal state stabilization mechanism.44 For this reason, qubit-based
systems are rather vulnerable to environment-induced drifts, including the dephasing and relaxation 
discussed in the previous chapter, creating major experimental challenges — see below. 


Now, if we have a system of 2 qubits, the vectors of its arbitrary pure state may be represented as a sum of 2² = 4 terms,45


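$$|\alpha\rangle_2 = \alpha_{00}|00\rangle + \alpha_{01}|01\rangle + \alpha_{10}|10\rangle + \alpha_{11}|11\rangle \equiv \sum_{j_1,j_2=0,1}\alpha_{j_1j_2}|j_1j_2\rangle, \qquad (8.134)$$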


with four complex coefficients, i.e. eight real numbers, subject to just one normalization condition, which follows from the requirement ⟨α|α⟩ = 1:

$$\sum_{j_1,j_2=0,1}|\alpha_{j_1j_2}|^2 = 1. \qquad (8.135)$$


The evident generalization of Eqs. (133)-(134) to an arbitrary pure state of an N-qubit system is a sum of 2^N terms:

$$|\alpha\rangle_N = \sum_{j_1,j_2,\ldots,j_N=0,1}\alpha_{j_1j_2\ldots j_N}|j_1j_2\ldots j_N\rangle, \qquad (8.136)$$

including all possible combinations of 0s and 1s for the indices j_k, so that the state is fully described by 2^N complex numbers, i.e. 2·2^N = 2^{N+1} real numbers, with only one constraint, similar to Eq. (135), imposed by the normalization condition. Let me emphasize that this exponential growth of the information contents would not be possible without the qubit state entanglement. Indeed, in the particular case when the qubit states are not entangled, i.e. are factorable:


44 In this aspect as well, the information processing systems based on qubits are closer to classical analog 
computers (which were popular once, but nowadays are used for a few special applications only) rather than 
classical digital ones. 

45 Here and in most instances below I use the same shorthand notation as was used at the beginning of this chapter (cf. Eq. (1b)). In this short form, the qubit’s number is coded by the order of its state index inside a full ket-vector, while in the long form, such as in Eq. (137), it is coded by the order of its single-qubit vector in a full direct product.
$$|\alpha\rangle_N = |\alpha_1\rangle|\alpha_2\rangle\ldots|\alpha_N\rangle, \qquad (8.137)$$


where each |α_k⟩ is described by an equality similar to Eq. (133) with its individual expansion coefficients, the system state description requires only 2N + 1 real numbers: e.g., N pairs of Bloch sphere angles {θ_k, φ_k}, plus one common phase γ (the individual phases of the factors merge into a single overall phase).
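The contrast between the exponential count for Eq. (136) and the linear count for Eq. (137) can be tabulated in a few lines. This is only an illustrative Python sketch, and the function names are hypothetical. Both counts include the common phase, so for N = 1 they agree.

```python
def real_params_general(n_qubits):
    """Independent real numbers for an arbitrary pure N-qubit state, Eq. (8.136):
    2**N complex amplitudes = 2**(N+1) reals, less one normalization condition."""
    return 2 ** (n_qubits + 1) - 1


def real_params_factorable(n_qubits):
    """Independent real numbers for an unentangled state, Eq. (8.137):
    two Bloch-sphere angles per qubit plus a single common phase."""
    return 2 * n_qubits + 1


for n in (1, 2, 10, 50):
    print(n, real_params_general(n), real_params_factorable(n))
```

For N = 2 the counts are 7 and 5. Already for N = 50 the general count exceeds 2 × 10^15, while the factorable one is just 101.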


However, it would be wrong to project this exponential growth of information contents directly 
on the capabilities of quantum computation, because this process has to include the output information 
readout, i.e. qubit state measurements. Due to the fundamental intrinsic uncertainty of quantum systems, 
the measurement of a single qubit, even in a pure state (133), may generally give either of two results, with probabilities W_0 = |α_0|² and W_1 = |α_1|². To comply with the general notion of computation, any quantum computer has to provide certain (or virtually certain) results, and hence the probabilities W_j have to be very close to either 0 or 1, so that before the measurement, each measured qubit has to be in a basis state, either 0 or 1. This means that the computational system with N output qubits, just before the final readout, has to be in one of the factorable states

$$|\alpha\rangle_N = |j_1\rangle|j_2\rangle\ldots|j_N\rangle \equiv |j_1j_2\ldots j_N\rangle, \qquad (8.138)$$


which is a very small subset even of the set of all unentangled states (137), and whose maximum 
information contents is just N classical bits. 


Now the reader may start thinking that this constraint strips quantum computations of any advantages over their classical counterparts, but such a view is also superficial. To show that, let us consider the scheme of the most actively explored type of quantum computation,46 shown in Fig. 3.


[Figure: classical bits (j_1)_in ... (j_N)_in of the input number → qubit state preparation → |α⟩_in → unitary transform U → |α⟩_out → qubit state measurement → classical bits (j_1)_out ... (j_N)_out of the output number.]

Fig. 8.3. The baseline scheme of quantum computation.


Here each horizontal line (sometimes called a “wire”47) corresponds to a single qubit, tracing its time evolution in the same direction as in the usual plots of time functions: from left to right. This means


46 Numerous modifications of this “baseline” scheme have been suggested, for example with the number of output 
qubits different from that of input qubits, etc. Some other options are discussed at the end of this section. 

47 The notion of “wires” stems from the similarity between such quantum schemes and the drawings describing 
classical computation circuits — see, e.g., Fig. 4a below. In the classical case, the lines may be indeed understood 
as physical wires connecting physical devices: logic gates and/or memory cells. In this context, note that classical 
computer components also have non-zero time delays, so that even in this case the left-to-right device ordering is 
useful to indicate the timing of (and frequently the causal relation between) the signals. 
that the left column |α⟩_in of ket-vectors describes the initial state of the qubits,48 while the right column |α⟩_out describes their final (but pre-measurement) state. The box labeled U represents the qubit evolution in time due to their specially arranged interactions with each other and/or external drive “forces”. Besides these forces, during this evolution the system is supposed to be ideally isolated from the dephasing and energy-dissipating environment, so that the process may be described by a unitary operator defined in the 2^N-dimensional Hilbert space of N qubits:

$$|\alpha\rangle_{\rm out} = \hat U|\alpha\rangle_{\rm in}. \qquad (8.139)$$


With the condition that the input and output states have the simple form (138), this equality reads

$$\hat U|j_1j_2\ldots j_N\rangle_{\rm in} = |j'_1j'_2\ldots j'_N\rangle_{\rm out}. \qquad (8.140)$$


The art of quantum computer design consists of selecting such unitary operators U that would: 


- satisfy Eq. (140), 
- be physically implementable, and 


- enable substantial performance advantages of the quantum computation over its classical 
counterparts with similar functionality, at least for some digital functions (algorithms). 


I will have time/space to demonstrate the possibility of such advantages on just one, perhaps the simplest, example: the so-called Deutsch problem,49 on the way discussing several common notions and issues of this field. Let us consider the family of single-bit classical Boolean functions j_out = f(j_in). Since both j are Boolean variables, i.e. may take only values 0 and 1, there are evidently only 4 such functions; see the first four columns of the following table:


    f     f(0)   f(1)   class      F    f(1) − f(0)
    f1    0      0      constant   0     0
    f2    0      1      balanced   1     1               (8.141)
    f3    1      0      balanced   1    −1
    f4    1      1      constant   0     0


Of them, the functions f_1 and f_4, whose values are independent of their arguments, are called constant, while the functions f_2 (called “YES” or “IDENTITY”) and f_3 (“NOT” or “INVERSION”) are called balanced. The Deutsch problem is to determine the class of a single-bit function, implemented in a “black box”, as being either constant or balanced, using just one experiment.


48 As was discussed in Chapter 7, the preparation of a pure state (133) is (conceptually :-) straightforward. Placing a qubit into a weak contact with an environment of temperature T ≪ Δ/k_B, where Δ is the difference between energies of the eigenstates 0 and 1, we may achieve its relaxation into the lowest-energy state. Then, if the qubit must be set into a different pure state, it may be driven there by the application of a pulse of a proper external classical “force”. For example, if an actual spin-½ is used as the qubit, a pulse of a magnetic field, with proper direction and duration, may be applied to arrange its precession to the required Bloch sphere point; see Fig. 5.3c. However, in most physical implementations of qubits, a more practicable way for that step is to use a proper part of the Rabi oscillation period; see Sec. 6.5.

49 It is named after David Elieser Deutsch, whose 1985 paper (motivated by an inspirational but not very specific publication by Richard Feynman in 1982) launched the whole field of quantum computation.
Classically, this is clearly impossible, and the simplest way to perform the function’s classification involves two similar black boxes f (see Fig. 4a).50 It also uses the so-called exclusive-OR (XOR for short) gate, whose output is described by the following function F of its two Boolean arguments j_1 and j_2:51

$$F(j_1,j_2) = j_1\oplus j_2 \equiv \begin{cases}0, & \text{if } j_1 = j_2,\\ 1, & \text{if } j_1\neq j_2.\end{cases} \qquad (8.142)$$
In the particular circuit shown in Fig. 4a, the gate produces the following output:

$$F = f(0)\oplus f(1), \qquad (8.143)$$

which is equal to 1 if f(0) ≠ f(1), i.e. if the function f is balanced, and to 0 in the opposite case; see column F in the table of Eq. (141).
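The classical scheme of Fig. 4a may be mimicked in a few lines of Python. This is an illustrative sketch with hypothetical names; its point is that it has to evaluate f twice, at 0 and at 1, before combining the results by XOR.

```python
# The four single-bit Boolean functions of the table (8.141).
FUNCTIONS = {
    "f1": lambda j: 0,      # constant
    "f2": lambda j: j,      # "YES" / "IDENTITY", balanced
    "f3": lambda j: 1 - j,  # "NOT" / "INVERSION", balanced
    "f4": lambda j: 1,      # constant
}


def xor(j1, j2):
    """Exclusive OR, Eq. (8.142): 0 if j1 == j2, and 1 otherwise."""
    return j1 ^ j2


def classify_classically(f):
    """Scheme of Fig. 8.4a: two evaluations of f, combined as in Eq. (8.143)."""
    F = xor(f(0), f(1))
    return "balanced" if F == 1 else "constant"


for name, f in FUNCTIONS.items():
    print(name, f(0), f(1), classify_classically(f))
```

The printed lines reproduce the first four columns of the table (8.141).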


Fig. 8.4. The simplest (a) classical and (b) quantum ways to classify a single-bit Boolean function f.


On the other hand, as will be shown below, any of the four functions f may be implemented quantum-mechanically, for example (Fig. 5a) as a unitary transform of two input qubits, acting as follows on each basis component |j_1j_2⟩ = |j_1⟩|j_2⟩ of the general input state (134):

$$\hat f|j_1j_2\rangle = |j_1\rangle|j_2\oplus f(j_1)\rangle, \qquad (8.144)$$

where f is the corresponding classical Boolean function; see the table in Eq. (141).


[Figure: (a) gate f: |j_1⟩|j_2⟩ → |j_1⟩|j_2 ⊕ f(j_1)⟩; (b) gate C: |j_1⟩|j_2⟩ → |j_1⟩|j_2 ⊕ j_1⟩.]

Fig. 8.5. Two-qubit quantum gates: (a) a two-qubit function f and (b) its particular case C (CNOT), and their actions on a basis state.


In the particular case when f in Eq. (144) is just the YES function, f(j) = f_2(j) = j, this “circuit” is reduced to the so-called CNOT gate, a key ingredient of many other quantum computation schemes, performing the following two-qubit transform:


50 Alternatively, we may perform two sequential experiments on the same black box f, first recording, and then 
recalling the first experiment’s result. However, the Deutsch problem calls for a single-shot experiment. 

51 The XOR sign ⊕ should not be confused with the sign ⊗ of the direct product of state vectors (which in this section is just implied).
$$\hat C|j_1j_2\rangle = |j_1\rangle|j_2\oplus j_1\rangle. \qquad (8.145a)$$

Let us use Eq. (142) to spell out this function for all four possible input qubit combinations:

$$\hat C|00\rangle = |00\rangle, \quad \hat C|01\rangle = |01\rangle, \quad \hat C|10\rangle = |11\rangle, \quad \hat C|11\rangle = |10\rangle. \qquad (8.145b)$$


In plain English, this means that, acting on a basis state j_1j_2, the CNOT gate leaves the state of the first, source qubit (shown by the upper horizontal line in Fig. 5) intact, but flips the state of the second, target qubit if the first one is in the basis state 1. In even simpler words, the state j_1 of the source qubit controls the NOT function acting on the target qubit; hence the gate’s name CNOT, the semi-acronym of “Controlled NOT”.
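The action (144) on basis states, and its CNOT particular case (145), can be tabulated with a short Python sketch. The names are hypothetical, and only basis states are handled here, so superpositions are not covered by this snippet.

```python
def f_gate(f, j1, j2):
    """Eq. (8.144) on a basis state: |j1 j2> -> |j1>|j2 XOR f(j1)>."""
    return j1, j2 ^ f(j1)


def cnot(j1, j2):
    """Eq. (8.145a): the particular case f(j) = j of the gate f."""
    return f_gate(lambda j: j, j1, j2)


for j1 in (0, 1):
    for j2 in (0, 1):
        k1, k2 = cnot(j1, j2)
        print(f"C|{j1}{j2}> = |{k1}{k2}>")
```

The four printed lines reproduce Eq. (145b). Note also that, for each of the four functions f, the map (144) merely permutes the four basis states. This is why it may be extended by linearity to a unitary operator.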


For the quantum function (144), with an arbitrary and unknown f, the Deutsch problem may be solved within the general scheme shown in Fig. 3, with the particular structure of the unitary-transform box U spelled out in Fig. 4b, which involves just one implementation of the function f. Here the single-qubit quantum gate ℋ performs the Hadamard (or “Walsh-Hadamard”, or “Walsh”) transform,52 whose operator is defined by the following actions on the qubit’s basis states:

$$\mathcal H|0\rangle = \frac{1}{\sqrt2}\left(|0\rangle + |1\rangle\right), \qquad \mathcal H|1\rangle = \frac{1}{\sqrt2}\left(|0\rangle - |1\rangle\right); \qquad (8.146)$$


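see also the two leftmost state label columns in Fig. 4b.53 Since this operator has to be linear (to be quantum-mechanically realistic), it needs to perform the action (146) on the basis states even when they are parts of a linear superposition, as they are, e.g., for the two right Hadamard gates in Fig. 4b. In particular, as immediately follows from Eqs. (146) and the operator’s linearity,

$$\mathcal H\left(\mathcal H|0\rangle\right) = \mathcal H\left[\frac{1}{\sqrt2}\left(|0\rangle + |1\rangle\right)\right] = \frac{1}{\sqrt2}\left(\mathcal H|0\rangle + \mathcal H|1\rangle\right) = \frac12\left(|0\rangle + |1\rangle\right) + \frac12\left(|0\rangle - |1\rangle\right) = |0\rangle. \qquad (8.147a)$$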


Absolutely similarly, we may get54

$$\mathcal H\left(\mathcal H|1\rangle\right) = |1\rangle. \qquad (8.147b)$$
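In the basis {|0⟩, |1⟩}, Eqs. (146) define a 2×2 matrix whose columns are the kets ℋ|0⟩ and ℋ|1⟩. The following Python sketch (plain lists, hypothetical names) checks Eqs. (147), i.e. ℋ² = I. It also checks the action on an arbitrary normalized state α_0|0⟩ + α_1|1⟩, which is worked out in Eq. (153) below. The sample amplitudes 0.6 and 0.8i are an arbitrary choice.

```python
import math

S = 1 / math.sqrt(2)
# Matrix of the Hadamard operator (8.146) in the {|0>, |1>} basis;
# its columns are the kets H|0> and H|1>.
HAD = [[S, S],
       [S, -S]]


def apply(matrix, state):
    """Act with a 2x2 matrix on a single-qubit state [a0, a1]."""
    return [matrix[0][0] * state[0] + matrix[0][1] * state[1],
            matrix[1][0] * state[0] + matrix[1][1] * state[1]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


HH = matmul(HAD, HAD)        # Eqs. (8.147): should be the identity matrix
a0, a1 = 0.6, 0.8j           # an arbitrary normalized state (8.133)
print(HH)
print(apply(HAD, [a0, a1]))  # expected: [(a0 + a1)/sqrt(2), (a0 - a1)/sqrt(2)]
```

Up to rounding errors of order 10^-16, HH is the identity matrix.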


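Now let us carry out a sequential analysis of the “circuit” shown in Fig. 4b. Since the input states of the gate f in this particular circuit are described by Eqs. (146), its output state’s ket is

$$\hat f\left(\mathcal H|0\rangle\right)\left(\mathcal H|1\rangle\right) = \hat f\left[\frac{1}{\sqrt2}\left(|0\rangle + |1\rangle\right)\frac{1}{\sqrt2}\left(|0\rangle - |1\rangle\right)\right] = \frac12\left(\hat f|00\rangle - \hat f|01\rangle + \hat f|10\rangle - \hat f|11\rangle\right). \qquad (8.148)$$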


Now we may apply Eq. (144) to each component in the parentheses: 


52 Named after mathematicians J. Hadamard (1865-1963) and J. Walsh (1895-1973). To avoid any chance of confusion between the Hadamard transform’s operator ℋ and the general Hamiltonian operator Ĥ, in these notes they are typeset using different fonts.
53 Note that according to Eq. (146), the operator ℋ does not belong to the class of transforms U described by Eq. (140), while the whole “circuit” shown in Fig. 4b does; see below.
54 Since the states 0 and 1 form a full basis of a single qubit, both Eqs. (147) may be summarized as an operator equality: ℋ² = I. It is also easy to verify that the Hadamard transform of an arbitrary state may be represented on the Bloch sphere (Fig. 5.3) as a rotation about the direction that bisects the angle between the x- and z-axes.
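$$\begin{aligned}
\hat f|00\rangle - \hat f|01\rangle + \hat f|10\rangle - \hat f|11\rangle &= \hat f|0\rangle|0\rangle - \hat f|0\rangle|1\rangle + \hat f|1\rangle|0\rangle - \hat f|1\rangle|1\rangle\\
&= |0\rangle|0\oplus f(0)\rangle - |0\rangle|1\oplus f(0)\rangle + |1\rangle|0\oplus f(1)\rangle - |1\rangle|1\oplus f(1)\rangle\\
&= |0\rangle\left(|0\oplus f(0)\rangle - |1\oplus f(0)\rangle\right) + |1\rangle\left(|0\oplus f(1)\rangle - |1\oplus f(1)\rangle\right).
\end{aligned} \qquad (8.149)$$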


Note that the contents of the first parentheses of the last expression, characterizing the state of the target qubit, is equal to (|0⟩ − |1⟩) ≡ (−1)^0(|0⟩ − |1⟩) if f(0) = 0 (and hence 0⊕f(0) = 0 and 1⊕f(0) = 1), and to (|1⟩ − |0⟩) ≡ (−1)^1(|0⟩ − |1⟩) in the opposite case f(0) = 1. Both cases may thus be described in one shot by rewriting the parentheses as (−1)^{f(0)}(|0⟩ − |1⟩). The second pair of parentheses is absolutely similarly controlled by the value of f(1), so that the outputs of the gate f are unentangled:


$$\hat f\left(\mathcal H|0\rangle\right)\left(\mathcal H|1\rangle\right) = \frac12\left[(-1)^{f(0)}|0\rangle + (-1)^{f(1)}|1\rangle\right]\left(|0\rangle - |1\rangle\right) = \pm\frac{1}{\sqrt2}\left(|0\rangle + (-1)^F|1\rangle\right)\frac{1}{\sqrt2}\left(|0\rangle - |1\rangle\right), \qquad (8.150)$$

where the last step has used the fact that the classical Boolean function F, defined by Eq. (142), is equal to ±[f(1) − f(0)]; please compare the last two columns in Eq. (141). The front sign ± in Eq. (150) may be prescribed to any of the component ket-vectors, for example to that of the target qubit, as shown by the third column of state labels in Fig. 4b.


This intermediate result is already rather remarkable. Indeed, it shows that, despite the 
superficial impression one could get from Fig. 5, the gates f and C, being “controlled” by the source 
qubit, may change that qubit’s state as well! This fact (partly reflected by the vertical direction of the 
control lines in Figs. 4 and 5, symbolizing the same stage of the system’s time evolution) shows how 
careful one should be in interpreting quantum-computational “circuits”, thriving on qubits’ entanglement,
because the “signals” on different sections of a “wire” may differ — see Fig. 4b again. 


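At the last stage of the circuit shown in Fig. 4b, the qubit components of the state (150) are fed into one more pair of Hadamard gates, whose outputs therefore are

$$\mathcal H\left[\frac{1}{\sqrt2}\left(|0\rangle + (-1)^F|1\rangle\right)\right] = \frac{1}{\sqrt2}\left(\mathcal H|0\rangle + (-1)^F\mathcal H|1\rangle\right), \qquad \mathcal H\left[\pm\frac{1}{\sqrt2}\left(|0\rangle - |1\rangle\right)\right] = \pm\frac{1}{\sqrt2}\left(\mathcal H|0\rangle - \mathcal H|1\rangle\right). \qquad (8.151)$$

Now using Eqs. (146) again, we see that the output state ket-vectors of the source and target qubits are, respectively,

$$\frac{1+(-1)^F}{2}|0\rangle + \frac{1-(-1)^F}{2}|1\rangle \qquad \text{and} \qquad \pm|1\rangle. \qquad (8.152)$$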


Since, according to Eq. (142), the Boolean function F may take only values 0 or 1, the final state of the source qubit is always one of its basis states j, namely the one with j = F. Its measurement tells us whether the function f, participating in Eq. (144), is constant or balanced; see Eq. (141) again.55
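The whole circuit of Fig. 4b can be followed numerically with a minimal state-vector simulation. This Python sketch is an illustration only. The amplitude ordering state[2*j1 + j2] (source qubit j1, target qubit j2) and all function names are assumptions made here. The gate f is applied once per run, using Eq. (144) extended by linearity, and the input state is |0⟩|1⟩, as in Fig. 4b.

```python
import math

S = 1 / math.sqrt(2)


def hadamard(state, qubit):
    """Apply the Hadamard transform (8.146) to one qubit of a two-qubit state.

    Amplitudes are stored as state[2*j1 + j2]; qubit=0 is the source (j1),
    qubit=1 is the target (j2)."""
    shift = 1 - qubit
    new = [0j] * 4
    for idx, amp in enumerate(state):
        j = (idx >> shift) & 1              # current value of the chosen qubit
        idx0 = idx & ~(1 << shift)          # same basis state with that qubit = 0
        idx1 = idx | (1 << shift)           # same basis state with that qubit = 1
        # H|0> = (|0> + |1>)/sqrt2,  H|1> = (|0> - |1>)/sqrt2
        new[idx0] += S * amp
        new[idx1] += (S if j == 0 else -S) * amp
    return new


def f_gate(state, f):
    """Eq. (8.144), extended by linearity: |j1 j2> -> |j1>|j2 XOR f(j1)>."""
    new = [0j] * 4
    for idx, amp in enumerate(state):
        j1, j2 = idx >> 1, idx & 1
        new[2 * j1 + (j2 ^ f(j1))] += amp
    return new


def deutsch(f):
    """Circuit of Fig. 8.4b: input |0>|1>, Hadamard on both qubits,
    a single use of f, then Hadamard on both qubits again."""
    state = [0j, 1 + 0j, 0j, 0j]                          # |01>
    state = hadamard(hadamard(state, 0), 1)
    state = f_gate(state, f)
    state = hadamard(hadamard(state, 0), 1)
    p_source_1 = abs(state[2]) ** 2 + abs(state[3]) ** 2  # probability W(j1 = 1)
    return ("balanced" if p_source_1 > 0.5 else "constant"), state


for name, f in {"f1": lambda j: 0, "f2": lambda j: j,
                "f3": lambda j: 1 - j, "f4": lambda j: 1}.items():
    verdict, final = deutsch(f)
    print(name, verdict, [round(abs(a) ** 2, 12) for a in final])
```

In agreement with Eq. (152), each final state is ±|F⟩|1⟩, so the source qubit yields F with certainty after a single call of f.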


Thus, the quantum circuit shown in Fig. 4b indeed solves the Deutsch problem in one shot. Reviewing our analysis, we may see that this is possible because the unitary transform performed by the quantum gate f is applied to the superposition states (146) rather than to the basis states. Due to this trick, the quantum state components depending on f(0) and f(1) are processed simultaneously, in parallel. This


55 Note that the last Hadamard transform of the target qubit (i.e. the Hadamard gate shown in the lower right 
corner of Fig. 4b) is not necessary for the Deutsch problem’s solution — though it should be included if we want 
the whole circuit to satisfy the condition (140). 
quantum parallelism may be extended to circuits with many (N ≫ 1) qubits and, for some tasks, provide a dramatic performance increase, for example, reducing the necessary circuit component number from O(2^N) to O(N^p), where p is a finite (and not very big) number.


However, this efficiency comes at a high price. Indeed, let us discuss the possible physical 
implementation of quantum gates, starting from the single-qubit case, on an example of the Hadamard 
gate (146). With the linearity requirement, its action on the arbitrary state (133) should be 


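$$\mathcal H|\alpha\rangle = \alpha_0\mathcal H|0\rangle + \alpha_1\mathcal H|1\rangle = \alpha_0\frac{1}{\sqrt2}\left(|0\rangle + |1\rangle\right) + \alpha_1\frac{1}{\sqrt2}\left(|0\rangle - |1\rangle\right) = \frac{1}{\sqrt2}\left(\alpha_0 + \alpha_1\right)|0\rangle + \frac{1}{\sqrt2}\left(\alpha_0 - \alpha_1\right)|1\rangle, \qquad (8.153)$$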


meaning that the state probability amplitudes at the end (t = T) and at the beginning (t = 0) of the qubit evolution in time have to be related as

$$a_0(T) = \frac{a_0(0) + a_1(0)}{\sqrt2}, \qquad a_1(T) = \frac{a_0(0) - a_1(0)}{\sqrt2}. \qquad (8.154)$$

This task may again be performed using the Rabi oscillations, which were discussed in Sec. 6.5, i.e. by applying to the qubit (a two-level system), for a limited time period T, a weak sinusoidal external signal of frequency ω equal to the intrinsic quantum oscillation frequency ω_nn', defined by Eq. (6.85). The analysis of the Rabi oscillations was carried out in Sec. 6.5, even for non-vanishing (though small) detuning Δ = ω − ω_nn', but only for the particular initial conditions when at t = 0 the system was fully in one of the basis states (there labeled n'), i.e. the counterpart state (there labeled n) was empty. For our current purposes, we need to find the amplitudes a_{0,1}(t) for arbitrary initial conditions a_{0,1}(0), subject only to the time-independent normalization condition |a_0|² + |a_1|² = 1. For the case of exact tuning, Δ = 0, solving the system (6.94) is elementary,56 and gives:57

$$a_0(t) = a_0(0)\cos\Omega t - i\,a_1(0)\,e^{i\varphi}\sin\Omega t, \qquad a_1(t) = a_1(0)\cos\Omega t - i\,a_0(0)\,e^{-i\varphi}\sin\Omega t, \qquad (8.155)$$


where Ω is the Rabi oscillation frequency (6.99), in the exact-tuning case proportional to the amplitude |A| of the external ac drive A = |A|exp{iφ}; see Eq. (6.86). Comparing these expressions with Eqs. (154), we see that for t = T = π/4Ω and φ = π/2 they “almost” coincide, besides the opposite sign of a_1(T). Conceptually, the simplest way to correct this deficiency is to follow the ac “π/4-pulse”, just discussed, by a short dc “π-pulse” of duration τ = πħ/δ, which temporarily creates a small additional energy difference δ between the basis states 0 and 1. According to the basic Eq. (1.62), such a difference creates an additional phase difference τδ/ħ between the states, equal to π for the “π-pulse”.
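The statement that the π/4-pulse reproduces Eq. (154) up to the sign of a_1(T), and that the dc π-pulse repairs it, is easy to check numerically. This is an illustrative Python sketch. Ωt is passed as a single dimensionless argument, and the dc π-pulse is modeled as a sign flip of a_1 (i.e. the common phase is dropped). Both simplifications, and the function names, are assumptions made here.

```python
import cmath
import math


def rabi(a0, a1, omega_t, phi):
    """Exact-resonance Rabi evolution, Eqs. (8.155), with omega_t = Omega*t."""
    c, s = math.cos(omega_t), math.sin(omega_t)
    return (a0 * c - 1j * a1 * cmath.exp(1j * phi) * s,
            a1 * c - 1j * a0 * cmath.exp(-1j * phi) * s)


def dc_pi_pulse(a0, a1):
    """Extra phase difference tau*delta/hbar = pi between the basis states,
    modeled here as a sign flip of a1 (global phase dropped)."""
    return a0, -a1


a0, a1 = 0.6, 0.8j                                             # arbitrary normalized state
b0, b1 = rabi(a0, a1, math.pi / 4, math.pi / 2)                # ac "pi/4-pulse", phi = pi/2
h0, h1 = (a0 + a1) / math.sqrt(2), (a0 - a1) / math.sqrt(2)    # target values, Eqs. (8.154)
c0, c1 = dc_pi_pulse(b0, b1)
print(abs(b0 - h0), abs(b1 + h1))   # ~0: a1(T) comes out with the opposite sign
print(abs(c0 - h0), abs(c1 - h1))   # ~0: the dc pulse fixes the sign
```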


Another way (that may also be useful for two-qubit operations) is to use another, auxiliary energy level E_2, whose distances from the basic levels E_1 and E_0 are significantly different from the difference (E_1 − E_0); see Fig. 6a. In this case, the weak external ac field tuned to any of the three potential quantum transition frequencies ω_jj' = (E_j − E_j')/ħ initiates such transitions between the corresponding states only, with a negligible perturbation of the third state. (Such transitions may be


56 An alternative way to analyze the qubit evolution is to use the Bloch equation (5.21), with an appropriate function Ω(t) describing the control field.
57 To comply with our current notation, the coefficients a_n' and a_n of Sec. 6.5 are replaced with a_0 and a_1.
again described by Eqs. (155), with the appropriate index changes.) For the Hadamard transform implementation, it is sufficient to apply (after the already discussed π/4-pulse of frequency ω_10, and with the initially empty level E_2) an additional π-pulse of frequency ω_20, with any phase φ. Indeed, according to the first of Eqs. (155), with the due replacement a_1(0) → a_2(0) = 0, such a pulse flips the sign of the amplitude a_0(t), while the amplitude a_1(t), not involved in this additional transition, remains unchanged.
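The auxiliary-level recipe can be checked the same way. In this Python sketch, each pulse acts on one pair of levels according to Eqs. (155), with the appropriate index changes, while the third level is left untouched. The phase value phi_aux = 0.3 and all function names are arbitrary, hypothetical choices. The result equals ℋ|α⟩ multiplied by a common factor −1, which is a physically inconsequential overall phase, and the auxiliary level ends up empty.

```python
import cmath
import math


def rabi_pair(x, y, omega_t, phi):
    """Eqs. (8.155) for a resonantly driven pair of levels with amplitudes
    x (lower) and y (upper); the third level is left untouched."""
    c, s = math.cos(omega_t), math.sin(omega_t)
    return (x * c - 1j * y * cmath.exp(1j * phi) * s,
            y * c - 1j * x * cmath.exp(-1j * phi) * s)


def hadamard_via_aux(a0, a1, phi_aux=0.3):
    """pi/4-pulse on the 0-1 transition (phi = pi/2), then a pi-pulse on the
    0-2 transition with an arbitrary phase phi_aux; level 2 starts empty."""
    a2 = 0j
    a0, a1 = rabi_pair(a0, a1, math.pi / 4, math.pi / 2)
    a0, a2 = rabi_pair(a0, a2, math.pi, phi_aux)
    return a0, a1, a2


a0, a1 = 0.6, 0.8j
h0, h1 = (a0 + a1) / math.sqrt(2), (a0 - a1) / math.sqrt(2)   # H|alpha>, Eq. (8.153)
r0, r1, r2 = hadamard_via_aux(a0, a1)
print(r0 / h0, r1 / h1, abs(r2))   # both ratios ~ -1; auxiliary level ~ empty
```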


[Figure: (a) single-qubit levels E_0, E_1 and the auxiliary level E_2; (b, c) two-qubit levels |00⟩, |01⟩, |10⟩, |11⟩, with |01⟩ and |10⟩ degenerate in panel (b).]

Fig. 8.6. Energy-level schemes used for unitary transformations of (a) single qubits and (b, c) two-qubit systems.


Now let me describe the conceptually simplest (though, for some qubit types, not the most practically convenient) scheme for the implementation of two-qubit gates, on the example of the CNOT gate, whose operation is described by Eq. (145). For that, evidently, the involved qubits have to interact for some time T. As was repeatedly discussed in the last two chapters, in most cases such interaction of two subsystems is factorable; see Eq. (6.145). For qubits, i.e. two-level systems, each of the component operators may be represented by a 2×2 matrix in the basis of states 0 and 1. According to Eq. (4.106), such a matrix may always be expressed as a linear combination (bI + c·σ), where b and the three Cartesian components of the vector c are c-numbers. Let us consider the simplest form of such a factorable interaction Hamiltonian:

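$$\hat H_{\rm int}(t) = \begin{cases}\kappa\,\hat\sigma_z^{(1)}\hat\sigma_z^{(2)}, & \text{for } 0 < t < T,\\ 0, & \text{otherwise},\end{cases} \qquad (8.156)$$

where the upper index is the qubit number and κ is a c-number constant.58 According to Eq. (4.175), by the end of the interaction period, this Hamiltonian produces the following unitary transform:

$$\hat U_{\rm int} = \exp\left\{-\frac{i}{\hbar}\hat H_{\rm int}T\right\} = \exp\left\{-\frac{i}{\hbar}\kappa T\,\hat\sigma_z^{(1)}\hat\sigma_z^{(2)}\right\}. \qquad (8.157)$$

Since in the basis of unperturbed two-qubit basis states |j_1j_2⟩ the product operator σ_z^(1)σ_z^(2) is diagonal, so is the unitary operator (157), with the following action on these states: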


58 The assumption of simultaneous time independence of the basis state vectors and the interaction operator (within the time interval 0 < t < T) is possible only if the basis state energy difference Δ of both qubits is exactly the same. In this case, the simple physical explanation of the time evolution (156) follows from Figs. 6b,c, which show the spectrum of the total energy E = E_1 + E_2 of the two-qubit system. In the absence of interaction (Fig. 6b), the energies of two basis states, |01⟩ and |10⟩, are equal, enabling even a weak qubit interaction to cause their substantial evolution in time; see Sec. 6.7. If the qubit energies are different (Fig. 6c), the interaction may still be reduced, in the rotating-wave approximation, to Eq. (156), by compensating the energy difference (Δ_1 − Δ_2) with an external ac signal of frequency ω = (Δ_1 − Δ_2)/ħ; see Sec. 6.5.
$$\hat U_{\rm int}|j_1j_2\rangle = \exp\{i\theta\sigma_1\sigma_2\}|j_1j_2\rangle, \qquad (8.158)$$

where θ ≡ −κT/ħ, and σ_k are the eigenvalues of the Pauli matrix σ_z for the basis states of the corresponding qubit: σ_k = +1 for |j_k⟩ = |0⟩, and σ_k = −1 for |j_k⟩ = |1⟩. Let me, for clarity, spell out Eq. (158) for the particular case θ = −π/4 (corresponding to the qubit coupling time T = πħ/4κ):


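$$\hat U_{\rm int}|00\rangle = e^{-i\pi/4}|00\rangle, \quad \hat U_{\rm int}|01\rangle = e^{i\pi/4}|01\rangle, \quad \hat U_{\rm int}|10\rangle = e^{i\pi/4}|10\rangle, \quad \hat U_{\rm int}|11\rangle = e^{-i\pi/4}|11\rangle. \qquad (8.159)$$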


In order to compensate the undesirable parts of this joint phase shift of the basis states, let us
now apply similar individual “rotations” of each qubit by angle $\theta' = +\pi/4$, using the following product
of two independent operators, plus (just for the result’s clarity) a common, and hence inconsequential,
phase shift $\theta'' = -\pi/4$:⁵⁹


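$$\hat U_{\rm com} = \exp\left\{i\theta'\left(\hat\sigma_z^{(1)} + \hat\sigma_z^{(2)}\right) + i\theta''\right\} = \exp\left\{i\frac{\pi}{4}\hat\sigma_z^{(1)}\right\}\exp\left\{i\frac{\pi}{4}\hat\sigma_z^{(2)}\right\}e^{-i\pi/4}. \tag{8.160}$$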


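Since this operator is also diagonal in the $|j_1 j_2\rangle$ basis, it is easy to calculate the change of the basis states
by the total unitary operator $\hat U_{\rm tot} = \hat U_{\rm com}\hat U_{\rm int}$:

$$\hat U_{\rm tot}|00\rangle = |00\rangle, \quad \hat U_{\rm tot}|01\rangle = |01\rangle, \quad \hat U_{\rm tot}|10\rangle = |10\rangle, \quad \hat U_{\rm tot}|11\rangle = -|11\rangle. \tag{8.161}$$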
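The phase bookkeeping behind Eqs. (158)-(161) is easy to check numerically. The sketch below is a minimal illustration, not part of the original derivation: it assumes NumPy, orders the two-qubit basis as |00⟩, |01⟩, |10⟩, |11⟩ with qubit 1 as the left tensor factor, and uses a hypothetical helper name `exp_i_diagonal`.

```python
import numpy as np

# Pauli matrix sigma_z and the 2x2 identity
SZ = np.diag([1.0, -1.0])
I2 = np.eye(2)

# Two-qubit operators in the basis |00>, |01>, |10>, |11> (qubit 1 is the left factor)
SZ1 = np.kron(SZ, I2)
SZ2 = np.kron(I2, SZ)

def exp_i_diagonal(d):
    """exp(i*D) for a real diagonal matrix D, taken element by element on its diagonal."""
    return np.diag(np.exp(1j * np.diag(d)))

theta = -np.pi / 4         # theta = -kappa*tau/hbar, Eq. (158)
theta_1 = np.pi / 4        # individual rotation angle theta'
theta_2 = -np.pi / 4       # common phase shift theta''

U_int = exp_i_diagonal(theta * SZ1 @ SZ2)                              # Eq. (157)
U_com = exp_i_diagonal(theta_1 * (SZ1 + SZ2) + theta_2 * np.eye(4))   # Eq. (160)
U_tot = U_com @ U_int

print(np.round(np.diag(U_tot).real, 12))   # diagonal of U_tot, cf. Eq. (161)
```

Because every operator here is diagonal, the exponentials reduce to element-wise phase factors, which keeps the check transparent.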


This result already shows the main “miracle action” of two-qubit gates, such as the one shown in Fig.
4b: the source qubit is left intact (only if it is in one of the basis states!), while the state of the target
qubit is altered. True, this change (of the sign) is still different from the CNOT operator’s action (145),
but may be readily used for its implementation by sandwiching the transform $\hat U_{\rm tot}$ between two
Hadamard transforms of the target qubit alone:


$$\hat C = \hat H^{(2)}\hat U_{\rm tot}\hat H^{(2)}. \tag{8.162}$$
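A similar check confirms Eq. (162). In this sketch (assuming NumPy and the basis ordering |00⟩, |01⟩, |10⟩, |11⟩, with the second qubit as the target), the diagonal matrix diag(1, 1, 1, −1) produced by the coupling-plus-rotation sequence is sandwiched between two Hadamard transforms of the target qubit and compared with the CNOT matrix of Eq. (145).

```python
import numpy as np

# Single-qubit Hadamard gate, Eq. (146), and the identity
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

# Conditional sign flip: only |11> changes sign
U_tot = np.diag([1, 1, 1, -1])

# Hadamard acting on the target (second) qubit only
H_target = np.kron(I2, H)

# Eq. (162): sandwich U_tot between two Hadamard transforms of the target qubit
C = H_target @ U_tot @ H_target

# CNOT: flips the target qubit when the source qubit is in state 1
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
print(np.allclose(C, CNOT))
```

The reason is visible block by block: for the source in state 0 the target sees $\hat H\hat H = \hat I$, while for the source in state 1 it sees $\hat H\hat\sigma_z\hat H = \hat\sigma_x$, i.e. a bit flip.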
So, we have spent quite a bit of time on the discussion of the CNOT gate,⁶⁰ and now I can
reward the reader for their effort with a bit of good news: it has been proved that an arbitrary unitary
transform that satisfies Eq. (140), i.e. may be used within the general scheme outlined in Fig. 3, may be
decomposed into a set of CNOT gates, possibly augmented with simpler single-qubit gates, for
example, the Hadamard gate plus the $\pi/2$ rotation discussed above.⁶¹ Unfortunately, I have no time for a


59 As Eq. (4.175) shows, each of the component unitary transforms $\exp\{i\theta'\hat\sigma_z\}$ may be created by applying to
each qubit, for the time interval $\tau = \hbar\theta'/\kappa'$, a constant external field described by the Hamiltonian $\hat H = -\kappa'\hat\sigma_z$. We
already know that for a charged spin-½ particle, such a Hamiltonian may be created by applying a z-oriented
external dc magnetic field (see Eq. (4.163)). For most other physical implementations of qubits, the organization
of such a Hamiltonian is also straightforward; see, e.g., Fig. 7.4 and its discussion.

60 As was discussed above, this gate is identical to the two-qubit gate shown in Fig. 5a for $f(j) = j$. The
implementation of the gate for the 3 other possible functions $f$ requires straightforward modifications, whose
analysis is left for the reader’s exercise.

61 This fundamental importance of the CNOT gate was perhaps a major reason why David Wineland, the leader of
the NIST group that had demonstrated its first experimental implementation in 1995 (following the theoretical 
suggestion by J. Cirac and P. Zoller), was awarded the 2012 Nobel Prize in Physics — shared with Serge Haroche, 
the leader of another group working towards quantum computation. 
detailed discussion of more complex circuits.⁶² The most famous of them is the scheme for integer
number factoring, suggested in 1994 by Peter Williston Shor.⁶³ Due to its potential practical importance
for breaking broadly used communication encryption schemes such as the RSA code,⁶⁴ this opportunity
has incited much enthusiasm and triggered experimental efforts to implement quantum gates and circuits
using a broad variety of two-level quantum systems. By now, the following experimental options have
given the most significant results:⁶⁵


(i) Trapped ions. The first experimental demonstrations of quantum state manipulation 
(including the already mentioned first CNOT gate) have been carried out using deeply cooled atoms in 
optical traps, similar to those used in frequency and time standards. Their total spins are natural qubits, 
whose states may be manipulated using the Rabi transfers excited by suitably tuned lasers. The spin 
interactions with the environment may be very weak, resulting in large dephasing times $T_2$ (up to a few
seconds). Since the distances between ions in the traps are relatively large (of the order of a micron),
their direct spin-spin interaction is even weaker, but the ions may be made effectively interacting either
via their mechanical oscillations about the potential minima of the trapping field, or via photons in
external electromagnetic resonators (“cavities”).⁶⁶ Perhaps the main challenge of using this approach for
quantum computation is poor “scalability”, i.e. the enormous experimental difficulty of creating and
managing large ordered systems of individually addressable qubits. So far, only systems of a few qubits
have been demonstrated.⁶⁷


(ii) Nuclear spins are also typically very weakly connected to their environment, with dephasing
times $T_2$ exceeding 10 seconds in some cases. Their eigenenergies $E_0$ and $E_1$ may be split by external dc
magnetic fields (typically, of the order of 10 T), while the interstate Rabi transfers may be readily
achieved by using nuclear magnetic resonance, i.e. the application of external ac fields with
frequencies $\omega = (E_1 - E_0)/\hbar$, typically of a few hundred MHz. The challenges of this option include the
weakness of spin-spin interactions (typically mediated through molecular electrons), resulting in a very
slow spin evolution, whose time scale $\hbar/\kappa$ may become comparable with $T_2$, and also very small level
separations $E_1 - E_0$, corresponding to temperatures of the order of 10 mK, i.e. much lower than room temperature,
which makes qubit state preparation a challenge.⁶⁸ Despite these challenges, the nuclear spin option was used for the
first implementation of the Shor algorithm for factoring of a small number (15 = 5×3) as early as 2001.⁶⁹
However, the extension of this success to larger systems, beyond the set of spins inside one molecule, is 
extremely challenging. 
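The smallness of the level separation can be put in numbers by converting a transition frequency $f$ into the temperature $T = hf/k_B$ and evaluating the equilibrium population difference $\tanh(hf/2k_BT)$ of a two-level system at room temperature, which limits how well an ensemble of such qubits starts out in its ground state. This is a minimal sketch with exact SI constants; the three frequencies are illustrative choices within the “few hundred MHz” range mentioned above, not data from the text.

```python
import math

# Exact SI values of the Planck and Boltzmann constants (2019 redefinition)
H_PLANCK = 6.62607015e-34   # J*s
K_BOLTZMANN = 1.380649e-23  # J/K

def splitting_temperature(freq_hz):
    """Temperature equivalent T = (E1 - E0)/k_B of a level splitting E1 - E0 = h*f."""
    return H_PLANCK * freq_hz / K_BOLTZMANN

def polarization(freq_hz, temp_k):
    """Equilibrium population difference W0 - W1 = tanh((E1 - E0)/2k_BT) of a two-level system."""
    return math.tanh(splitting_temperature(freq_hz) / (2.0 * temp_k))

for f in (200e6, 400e6, 600e6):   # illustrative NMR frequencies
    print(f"{f / 1e6:4.0f} MHz: T = {1e3 * splitting_temperature(f):5.1f} mK, "
          f"polarization at 300 K = {polarization(f, 300.0):.1e}")
```

At room temperature the population difference comes out at a few parts in $10^5$, which is why the initial state of such spin qubits is so far from pure.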


62 For that, the reader may be referred to either the monographs by Nielsen-Chuang and Rieffel-Polak, cited
above, or to a shorter (but much more formal) textbook by N. Mermin, Quantum Computer Science, Cambridge 
U. Press, 2007. 

63 A clear description of this algorithm may be found in several accessible sources, including Wikipedia — see the 
article Shor’s Algorithm. 

64 Named after R. Rivest, A. Shamir, and L. Adleman, the authors of the first open publication of the code in 
1977, but actually invented earlier (in 1973) by C. Cocks. 

65 For a discussion of other possible implementations (such as quantum dots and dopants in crystals) see, e.g., T. 
Ladd et al., Nature 464, 45 (2010), and references therein. 

66 A brief discussion of such interactions (so-called Cavity QED) will be given in Sec. 9.4 below. 

67 See, e.g., S. Debnath et al., Nature 536, 63 (2016). Note also the related work on arrays of trapped, optically- 
coupled neutral atoms — see, e.g., J. Perczel et al., Phys. Rev. Lett. 119, 023603 (2017) and references therein. 

68 This challenge may be partly mitigated using ingenious spin manipulation techniques such as refocusing — see, 
e.g., either Sec. 7.7 in Nielsen and Chuang, or J. Keeler’s monograph cited at the end of Sec. 6.5.

69 B. Lanyon et al., Phys. Rev. Lett. 99, 250505 (2007).
(iii) Josephson-junction devices. Much better scalability may be achieved with solid-state
devices, especially using superconductor integrated circuits including weak contacts, Josephson
junctions (see their brief discussion in Sec. 1.6). The qubits of this type are based on the fact that the
energy $U$ of such a junction is a highly nonlinear function of the Josephson phase difference $\varphi$ (see Sec.
1.6). Indeed, combining Eqs. (1.73) and (1.74), we can readily calculate $U(\varphi)$ as the work $W$ of an
external circuit increasing the phase from, say, zero to some value $\varphi$:


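$$U(\varphi) - U(0) = \int_{\varphi'=0}^{\varphi'=\varphi} dW = \int_{\varphi'=0}^{\varphi'=\varphi} IV\,dt = \int_{\varphi'=0}^{\varphi'=\varphi} I_c\sin\varphi'\,\frac{\hbar}{2e}\frac{d\varphi'}{dt}\,dt = \frac{\hbar I_c}{2e}\left(1 - \cos\varphi\right). \tag{8.163}$$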
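Eq. (163) can be cross-checked by brute-force numerical integration of the power $IV$ over a phase ramp. This minimal sketch assumes NumPy and dimensionless units with $I_c = 1$ and $\hbar/2e = 1$, so the result should approach $1 - \cos\varphi$ whatever ramp $\varphi(t)$ is chosen; the quadratic ramp below is an arbitrary choice.

```python
import numpy as np

# Dimensionless units (an assumption of this sketch): I_c = 1 and hbar/2e = 1,
# so that Eq. (1.73) gives I = sin(phi) and Eq. (1.74) gives V = d(phi)/dt.
phi_final = 2.0
t = np.linspace(0.0, 1.0, 100_001)
phi = phi_final * t**2                 # an arbitrary smooth phase ramp from 0 to phi_final

current = np.sin(phi)                  # I = I_c sin(phi)
voltage = np.gradient(phi, t)          # V = (hbar/2e) d(phi)/dt, by finite differences
power = current * voltage              # dW/dt = I V

# Trapezoidal rule for the work W = integral of I V dt
work = np.sum(0.5 * (power[1:] + power[:-1]) * np.diff(t))
print(f"numerical work: {work:.8f}, Eq. (163): {1.0 - np.cos(phi_final):.8f}")
```

Changing the ramp shape leaves the result unchanged, since $I\,V\,dt = I_c\sin\varphi\,(\hbar/2e)\,d\varphi$ depends only on the phase increment.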


There are several options of using this nonlinearity for creating qubits;⁷⁰ currently the leading
option, called the phase qubit, uses the two lowest eigenstates localized in one of the potential wells of
the periodic potential (163). A major problem of such qubits is that at the very bottom of this well the
potential $U(\varphi)$ is almost quadratic, so that the energy levels are nearly equidistant (cf. Eqs. (2.262),
(6.16), and (6.23)). This is even more true for the so-called “transmons” (and “Xmons”, and “Gatemons”,
and several other very similar devices⁷¹), the currently used versions of phase qubits, in which a Josephson
junction is made a part of an external electromagnetic oscillator, making its relative total nonlinearity
(anharmonism) even smaller. As a result, the external rf drive of frequency $\omega = (E_1 - E_0)/\hbar$, used to
arrange the state transforms described by Eq. (155), may induce simultaneous undesirable transitions to
(and between) higher energy levels. This effect may be mitigated by a reduction of the ac drive
amplitude, but at the price of a proportional increase of the operation time and hence of dephasing (see
below). (I am leaving a quantitative estimate of such an increase for the reader’s exercise.)
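The near-equidistance of the lowest levels in one well of the potential (163) can be illustrated numerically. The sketch below treats $\varphi$ as the coordinate of a 1D particle (as in footnote 70) with the dimensionless Hamiltonian $-(1/2M)\,d^2/d\varphi^2 + (1 - \cos\varphi)$, energies in units of $\hbar I_c/2e$; the effective “mass” $M$ is a hypothetical parameter standing in for the junction capacitance. The harmonic approximation would give the level spacing $M^{-1/2} = 0.1$ in these units. The wavefunction is set to zero at $\varphi = \pm\pi$, which is harmless for the deeply localized low-lying states.

```python
import numpy as np

# Dimensionless model (assumed for this sketch): H = -(1/2M) d^2/dphi^2 + (1 - cos phi),
# with energies in units of hbar*I_c/2e; M is a hypothetical large "mass" parameter.
M = 100.0
n_grid = 1000
phi = np.linspace(-np.pi, np.pi, n_grid + 2)[1:-1]   # interior points; psi = 0 at phi = +-pi
step = phi[1] - phi[0]

# Three-point finite-difference kinetic energy plus the potential (163)
off_diag = np.ones(n_grid - 1)
kinetic = (2.0 * np.eye(n_grid) - np.diag(off_diag, 1) - np.diag(off_diag, -1)) / (2.0 * M * step**2)
hamiltonian = kinetic + np.diag(1.0 - np.cos(phi))

energies = np.linalg.eigvalsh(hamiltonian)[:3]   # eigenvalues come out in ascending order
gap01 = energies[1] - energies[0]
gap12 = energies[2] - energies[1]
print(f"E1 - E0 = {gap01:.5f}, E2 - E1 = {gap12:.5f}, ratio = {gap12 / gap01:.4f}")
```

The two gaps differ by only about one percent for this choice of $M$, so a drive resonant with $E_1 - E_0$ is also nearly resonant with the $1 \to 2$ transition. Increasing $M$ makes the spectrum even more harmonic.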


Since the coupling of Josephson-junction qubits may be most readily controlled (and, very 
importantly, kept stable if so desired), they have been used to demonstrate the largest prototype quantum 
computing systems to date, despite quite modest dephasing times $T_2$: for purely integrated circuits, in
the tens of microseconds at best, even at operating temperatures in tens of mK. By the time of this
writing (mid-2019), several groups have announced chips with a few dozen such qubits, but to the
best of my knowledge, only their smaller subsets could be used for high-fidelity quantum operations.⁷²


(iv) Optical systems, attractive because of their inherently enormous bandwidth, pose a special 
challenge for quantum computation: due to the virtual linearity of most electromagnetic media at 
reasonable light power, the implementation of qubits (i.e. two-level systems), and interaction 
Hamiltonians such as the one given by Eq. (156), is problematic. In 2001, a very smart way around this 


70 The “most quantum” option in this technology is to use Josephson junctions very weakly coupled to their 
dissipative environment (so that the effective resistance shunting the junction is much higher than the quantum
resistance unit $R_Q = (\pi/2)\hbar/e^2 \sim 10^4\ \Omega$). In this case, the Josephson phase variable $\varphi$ behaves as a coordinate of a
1D quantum particle, moving in the $2\pi$-periodic potential (163), forming the energy band structure $E(q)$ similar to
those discussed in Sec. 2.7. Both theory and experiment show that in this case, the quantum states in adjacent 
Brillouin zones differ by the charge of one Cooper pair 2e. (This is exactly the effect responsible for the Bloch 
oscillations of frequency (2.252).) These two states may be used as the basis states of charge qubits. 
Unfortunately, such qubits are rather sensitive to charged impurities, randomly located in the junction’s vicinity, 
causing uncontrollable changes of its parameters, so that currently, to the best of my knowledge, this option is not 
actively pursued. 

71 For a recent review of these devices see, e.g., G. Wendin, Repts. Progr. Phys. 80, 106001 (2017), and 
references therein. 

72 See, e.g., C. Song et al., Phys. Rev. Lett. 119, 180511 (2017) and references therein. 
hurdle was invented.⁷³ In this KLM scheme (also called “linear optical quantum computing”),
nonlinear elements are not needed at all, and quantum gates may be composed just of linear devices
(such as optical waveguides, mirrors, and beam splitters), plus single-photon sources and detectors.
However, estimates show that this approach requires a much larger number of physical components than
those using nonlinear quantum systems such as usual qubits,⁷⁴ so that right now it is not very popular.


So, despite more than two decades of large-scale efforts, the progress of quantum computing 
development has been rather modest. The main culprit here is the unintentional coupling of qubits to 
their environment, leading most importantly to their state dephasing, and eventually to errors. Let me 
discuss this major issue in detail. 


Of course, some error probability exists in classical digital logic gates and memory cells as
well.⁷⁵ However, in this case, there is no conceptual problem with the device state measurement, so that
the error may be detected and corrected in many ways. Conceptually,⁷⁶ the simplest of them is the so-called
majority voting logic: using several similar logic circuits working in parallel and fed with
identical input data. Evidently, two such devices can detect a single error in one of them, while three 
devices in parallel may correct such error, by taking two coinciding output signals for the genuine one. 
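The majority-voting idea can be sketched in a few lines. In this illustration (a hypothetical setup, not from the text) each of three devices independently flips the correct output bit with probability $p$, and the voted result is wrong only if at least two devices err, which happens with probability $3p^2 - 2p^3$, much smaller than $p$ when $p \ll 1$.

```python
import random
from collections import Counter

def majority_vote(outputs):
    """Return the output value reported by most of the parallel devices."""
    value, _count = Counter(outputs).most_common(1)[0]
    return value

def noisy_device(correct_bit, p_error, rng):
    """A hypothetical device that returns the correct bit, flipped with probability p_error."""
    return correct_bit ^ int(rng.random() < p_error)

rng = random.Random(2019)
p = 0.05                      # illustrative single-device error probability
trials = 100_000
wrong = sum(majority_vote([noisy_device(0, p, rng) for _ in range(3)]) != 0
            for _ in range(trials))
error_rate = wrong / trials
print(f"single device: {p}, three-device vote: {error_rate:.5f} "
      f"(expected 3p^2 - 2p^3 = {3 * p**2 - 2 * p**3:.5f})")
```

The same approach fails for qubits for the two reasons discussed next: their errors are continuous rather than discrete flips, and their states cannot simply be copied into several devices.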


For quantum computation, the general idea of using several devices (say, qubits) for coding the 
same information remains valid; however, there are two major complications. First, as we know from 
Chapter 7, the environment’s dephasing effect may be described as a slow random drift of the 
probability amplitudes $a_j$, leading to the deviation of the output state from the required form (140),
and hence to a non-vanishing probability of wrong qubit state readout (see Fig. 3). Hence the quantum
error correction has to protect the result not against possible random state flips $0 \leftrightarrow 1$, as in classical
digital computers, but against these “creeping” analog errors.


Second, the qubit state is impossible to copy exactly (clone) without disturbing it, as the following
simple calculation shows.⁷⁷ Cloning some state $\alpha$ of one qubit to another qubit that is
initially in an independent state (say, the basis state 0), without any change of $\alpha$, means the following
transformation of the two-qubit ket: $|\alpha 0\rangle \to |\alpha\alpha\rangle$. If we want such a transform to be performed by a real
quantum system, whose evolution is described by a unitary operator $\hat u$, and to be correct for an arbitrary
state $\alpha$, it has to work not only for both basis states of the qubit:


$$\hat u|00\rangle = |00\rangle, \quad \hat u|10\rangle = |11\rangle, \tag{8.164}$$


but also for their arbitrary linear combination (133). Since the operator $\hat u$ has to be linear, we may use
that relation, and then Eq. (164), to write


73 E. Knill et al., Nature 409, 46 (2001). 

74 See, e.g., Y. Li et al., Phys. Rev. X 5, 041007 (2015). 

75 In modern integrated circuits, such “soft” (runtime) errors are created mostly by the high-energy neutron
component of cosmic rays, and also by the α-particles emitted by radioactive impurities in silicon chips and their
packaging. 

76 Practically, the majority voting logic increases circuit complexity and power consumption, so that it is used 
only in most critical points. Since in modern digital integrated circuits the bit error rate is very small (< 10°), in 
most of them, less radical but also less penalizing schemes are used — if used at all. 

77 Amazingly, this simple no-cloning theorem was discovered as late as 1982 (to the best of my knowledge,
independently by W. Wootters and W. Zurek, and by D. Dieks), in the context of work toward quantum
cryptography; see below.


Chapter 8 Page 41 of 52 


Essential Graduate Physics QM: Quantum Mechanics 


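$$\hat u|\alpha 0\rangle = \hat u\left(a_0|0\rangle + a_1|1\rangle\right)|0\rangle = a_0\hat u|00\rangle + a_1\hat u|10\rangle = a_0|00\rangle + a_1|11\rangle. \tag{8.165}$$
On the other hand, the desired result of the state cloning is
$$|\alpha\alpha\rangle = \left(a_0|0\rangle + a_1|1\rangle\right)\left(a_0|0\rangle + a_1|1\rangle\right) = a_0^2|00\rangle + a_0a_1\left(|10\rangle + |01\rangle\right) + a_1^2|11\rangle, \tag{8.166}$$
i.e. is evidently different (unless $a_0a_1 = 0$, i.e. unless $\alpha$ is one of the basis states), so that, for an arbitrary state $\alpha$ and an arbitrary unitary operator $\hat u$,
$$\hat u|\alpha 0\rangle \neq |\alpha\alpha\rangle, \tag{8.167}$$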


meaning that the qubit state cloning is indeed impossible.⁷⁸ This problem may, however, be indirectly
circumvented, for example, in the way shown in Fig. 7a.


Fig. 8.7. (a) Quasi-cloning, and (b) detection and correction of dephasing errors in a single qubit. 


Here the CNOT gate, whose action is described by Eq. (145), entangles an arbitrary input state 
(133) of the source qubit with a basis initial state of an ancillary target qubit — frequently called the 
ancilla. Using Eq. (145), we can readily calculate the output two-qubit state’s vector: 


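$$|\alpha_{\rm out}\rangle = \hat C\left(a_0|0\rangle + a_1|1\rangle\right)|0\rangle = a_0\hat C|00\rangle + a_1\hat C|10\rangle = a_0|00\rangle + a_1|11\rangle. \tag{8.168}$$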


We see that this circuit does perform the operation (165), i.e. gives the initial source qubit’s probability 
amplitudes $a_0$ and $a_1$ equally to two qubits, i.e. duplicates the input information. However, in contrast
with the “genuine” cloning, it changes the state of the source qubit as well, making it entangled with the 
target (ancilla) qubit. Such “quasi-cloning” is the key element of most suggested quantum error 
correction techniques. 
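Eqs. (165)-(168) can be illustrated with explicit vectors. The sketch below (assuming NumPy; the amplitudes 0.6 and 0.8 are arbitrary) uses the CNOT matrix as one concrete linear operator satisfying Eq. (164). Applied to $|\alpha 0\rangle$, it produces the entangled state $a_0|00\rangle + a_1|11\rangle$ rather than the product state $|\alpha\alpha\rangle$; the squared overlap of the two is below 1 whenever both amplitudes are nonzero.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# CNOT matrix in the basis |00>, |01>, |10>, |11>: one concrete operator obeying Eq. (164)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
assert np.allclose(CNOT @ np.kron(ket0, ket0), np.kron(ket0, ket0))   # |00> -> |00>
assert np.allclose(CNOT @ np.kron(ket1, ket0), np.kron(ket1, ket1))   # |10> -> |11>

a0, a1 = 0.6, 0.8                         # arbitrary amplitudes of the state (133)
alpha = a0 * ket0 + a1 * ket1

output = CNOT @ np.kron(alpha, ket0)      # Eqs. (165) and (168): a0|00> + a1|11>
clone = np.kron(alpha, alpha)             # Eq. (166): the desired product |alpha alpha>
overlap = abs(np.vdot(clone, output))**2  # equals 1 only for a perfect clone

print("output:", output)
print("clone: ", clone)
print("squared overlap:", round(overlap, 6))
```

Any other unitary obeying Eq. (164) gives the same output, since linearity alone fixes its action on $|\alpha 0\rangle$.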


Consider, for example, the three-qubit “circuit” shown in Fig. 7b, which uses two ancilla qubits 
— see the two lower “wires”. At its first two stages, the double application of the quasi-cloning produces 
an intermediate state A with the following ket-vector: 


$$|A\rangle = a_0|000\rangle + a_1|111\rangle, \tag{8.169}$$


which is an evident generalization of Eq. (168).⁷⁹ Next, subjecting the source qubit to the Hadamard
transform (146), we get the three-qubit state B represented by the state vector 


78 Note that this does not mean that two (or several) qubits cannot be put into the same, arbitrary quantum state — 
theoretically, with arbitrary precision. Indeed, they may be first set into their lowest-energy stationary states, and 
then driven into the same arbitrary state (133) by exerting on them similar classical external fields. So, the no-cloning
theorem pertains only to qubits in unknown states $\alpha$, but this is exactly what we need for error correction
(see below).
$$|B\rangle = a_0\frac{1}{\sqrt 2}\left(|0\rangle + |1\rangle\right)|00\rangle + a_1\frac{1}{\sqrt 2}\left(|0\rangle - |1\rangle\right)|11\rangle. \tag{8.170}$$


Now let us assume that at this stage, the source qubit comes into contact with a dephasing
environment, symbolized in Fig. 7b by the single-qubit “gate” $g$. As we know from Chapter 7 (see
Eq. (7.22) and its discussion, and also Sec. 7.3), its effect may be described by a random shift of the
relative phase of two states:⁸⁰


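$$|0\rangle \to e^{i\varphi}|0\rangle, \quad |1\rangle \to e^{-i\varphi}|1\rangle. \tag{8.171}$$
As a result, for the intermediate state C (see Fig. 7b) we may write
$$|C\rangle = a_0\frac{1}{\sqrt 2}\left(e^{i\varphi}|0\rangle + e^{-i\varphi}|1\rangle\right)|00\rangle + a_1\frac{1}{\sqrt 2}\left(e^{i\varphi}|0\rangle - e^{-i\varphi}|1\rangle\right)|11\rangle. \tag{8.172}$$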


At this stage, in this simple theoretical model, the coupling with the environment is completely 
stopped (ahh, if this could be possible! we might have quantum computers by now :-), and the source 
qubit is fed into one more Hadamard gate. Using Eqs. (146) again, for the state D after this gate we get 


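$$|D\rangle = a_0\left(\cos\varphi\,|0\rangle + i\sin\varphi\,|1\rangle\right)|00\rangle + a_1\left(i\sin\varphi\,|0\rangle + \cos\varphi\,|1\rangle\right)|11\rangle. \tag{8.173}$$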


Now the qubits are passed through the second, similar pair of CNOT gates (see Fig. 7b). Using Eq.
(145), for the resulting state E we readily get the following expression:


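$$|E\rangle = a_0\cos\varphi\,|000\rangle + a_0 i\sin\varphi\,|111\rangle + a_1 i\sin\varphi\,|011\rangle + a_1\cos\varphi\,|100\rangle, \tag{8.174a}$$
whose right-hand side may evidently be grouped as
$$|E\rangle = \left(a_0|0\rangle + a_1|1\rangle\right)\cos\varphi\,|00\rangle + \left(a_1|0\rangle + a_0|1\rangle\right)i\sin\varphi\,|11\rangle. \tag{8.174b}$$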


This is already a rather remarkable result. It shows that if we measured the ancilla qubits at stage 
E, and both results corresponded to states 0, we might be 100% sure that the source qubit (which is not 
affected by these measurements!) is in its initial state even after the interaction with the environment. 
The only result of an increase of this unintentional interaction (as quantified by the r.m.s. magnitude of
the random phase shift $\varphi$) is the growth of the probability,


$$W = \sin^2\varphi, \tag{8.175}$$


of getting the opposite result, which signals a dephasing-induced error in the source qubit. Such implicit
measurement, without disturbing the source qubit, is called quantum error detection.
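The three-qubit algebra of Eqs. (169)-(175) can be verified by brute force. This sketch assumes NumPy, orders the qubits as (source, ancilla 1, ancilla 2) with the source as the most significant bit, and uses arbitrary test amplitudes and a hypothetical helper `permutation_gate`. It builds every gate of Fig. 7b up to stage E as an 8×8 matrix, applies the phase error (171) to the source qubit, and compares the result with Eq. (174b) and the probability of finding both ancillas in state 1 with Eq. (175).

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)      # Hadamard gate, Eq. (146)

def on_source(gate):
    """Single-qubit gate acting on the source (leftmost) qubit of |source, ancilla1, ancilla2>."""
    return np.kron(gate, np.eye(4))

def permutation_gate(rule):
    """8x8 matrix of a classical reversible gate mapping basis state (s, a, b) to rule(s, a, b)."""
    U = np.zeros((8, 8))
    for s in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                s2, a2, b2 = rule(s, a, b)
                U[4 * s2 + 2 * a2 + b2, 4 * s + 2 * a + b] = 1.0
    return U

CNOT_1 = permutation_gate(lambda s, a, b: (s, a ^ s, b))   # source controls ancilla 1
CNOT_2 = permutation_gate(lambda s, a, b: (s, a, b ^ s))   # source controls ancilla 2

def state_E(a0, a1, phi):
    """Three-qubit state at stage E of Fig. 7b for a dephasing phase shift phi, Eq. (171)."""
    psi = np.kron(np.array([a0, a1], dtype=complex), [1, 0, 0, 0])   # |alpha>|00>
    dephase = np.diag([np.exp(1j * phi), np.exp(-1j * phi)])
    for U in (CNOT_1, CNOT_2, on_source(H), on_source(dephase), on_source(H), CNOT_1, CNOT_2):
        psi = U @ psi
    return psi

a0, a1, phi = 0.6, 0.8j, 0.3          # arbitrary test values
psi_E = state_E(a0, a1, phi)
W = abs(psi_E[3])**2 + abs(psi_E[7])**2   # both ancillas found in state 1: |011>, |111>
print(f"W = {W:.6f}, sin^2(phi) = {np.sin(phi)**2:.6f}")
```

Note that $W$ does not depend on $a_0$ and $a_1$: the ancilla readout reveals only the error, not the encoded state.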


An even more impressive result may be achieved by the last component of the circuit, the so- 
called Toffoli (or “CCNOT”) gate, denoted by the rightmost symbol in Fig. 7b. This three-qubit gate is 
conceptually similar to the CNOT gate discussed above, except that it flips the basis state of its target
qubit only if both source qubits are in state 1. (In the circuit shown in Fig. 7b, the former role is played 


79 Such a state is also the 3-qubit example of the so-called Greenberger-Horne-Zeilinger (GHZ) states, which are
frequently called the “most entangled” states of a system of $N > 2$ qubits.

80 Let me emphasize again that Eq. (171) is strictly valid only if the interaction with the environment is a pure 
dephasing, i.e. does not include the energy relaxation of the qubit or its thermal activation to the higher-energy 
eigenstate; however, it is a reasonable description of errors in the frequent case when $T_2 \ll T_1$.
by our source qubit, while the latter role, by the two ancilla qubits.) According to its definition, the
Toffoli gate does not affect the first parentheses in Eq. (174b), but flips the source qubit’s states in the
second parentheses, so that for the output three-qubit state F we get


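$$|F\rangle = \left(a_0|0\rangle + a_1|1\rangle\right)\cos\varphi\,|00\rangle + \left(a_0|0\rangle + a_1|1\rangle\right)i\sin\varphi\,|11\rangle. \tag{8.176a}$$
Obviously, this result may be factored as
$$|F\rangle = \left(a_0|0\rangle + a_1|1\rangle\right)\left(\cos\varphi\,|00\rangle + i\sin\varphi\,|11\rangle\right), \tag{8.176b}$$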


showing that now the source qubit is again fully unentangled from the ancilla qubits. Moreover, 
calculating the norm squared of the second operand, we get 


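$$\left(\cos\varphi\,\langle 00| - i\sin\varphi\,\langle 11|\right)\left(\cos\varphi\,|00\rangle + i\sin\varphi\,|11\rangle\right) = \cos^2\varphi + \sin^2\varphi = 1, \tag{8.177}$$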


so that the final state of the source qubit exactly coincides with its initial state. This is the famous 
miracle of quantum state correction, taking place “automatically”: without any qubit measurements,
and for any random phase shift $\varphi$.
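Extending the brute-force approach by the Toffoli gate reproduces the automatic correction of Eq. (176b): for any phase shift $\varphi$, the source qubit returns to $a_0|0\rangle + a_1|1\rangle$ and factors out of the ancilla state $\cos\varphi|00\rangle + i\sin\varphi|11\rangle$. As before, NumPy, the qubit ordering (source, ancilla 1, ancilla 2), the test amplitudes, and the helper names are assumptions of this illustration.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)      # Hadamard gate, Eq. (146)

def on_source(gate):
    """Single-qubit gate acting on the source (leftmost) qubit of |source, ancilla1, ancilla2>."""
    return np.kron(gate, np.eye(4))

def permutation_gate(rule):
    """8x8 matrix of a classical reversible gate mapping basis state (s, a, b) to rule(s, a, b)."""
    U = np.zeros((8, 8))
    for s in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                s2, a2, b2 = rule(s, a, b)
                U[4 * s2 + 2 * a2 + b2, 4 * s + 2 * a + b] = 1.0
    return U

CNOT_1 = permutation_gate(lambda s, a, b: (s, a ^ s, b))          # source controls ancilla 1
CNOT_2 = permutation_gate(lambda s, a, b: (s, a, b ^ s))          # source controls ancilla 2
TOFFOLI = permutation_gate(lambda s, a, b: (s ^ (a & b), a, b))   # flips source if a = b = 1

def run_circuit(a0, a1, phi):
    """Propagate |alpha>|00> through the whole circuit of Fig. 7b with phase error phi."""
    psi = np.kron(np.array([a0, a1], dtype=complex), [1, 0, 0, 0])
    dephase = np.diag([np.exp(1j * phi), np.exp(-1j * phi)])      # Eq. (171)
    for U in (CNOT_1, CNOT_2, on_source(H), on_source(dephase),
              on_source(H), CNOT_1, CNOT_2, TOFFOLI):
        psi = U @ psi
    return psi

a0, a1 = 0.6, 0.8j                                   # arbitrary normalized amplitudes
rng = np.random.default_rng(0)
for phi in rng.uniform(-np.pi, np.pi, size=4):
    psi_F = run_circuit(a0, a1, phi)
    ancillas = np.array([np.cos(phi), 0, 0, 1j * np.sin(phi)])
    expected = np.kron([a0, a1], ancillas)          # Eq. (176b)
    print(f"phi = {phi:+.3f}: matches Eq. (176b): {np.allclose(psi_F, expected)}")
```

Since the final state is a product, tracing out the ancillas leaves the source qubit in exactly its initial pure state, in agreement with Eq. (177).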


The circuit shown in Fig. 7b may be further improved by adding Hadamard gate pairs, similar to 
that used for the source qubit, to the ancilla qubits as well. It is straightforward to show that if the 
dephasing is small in the sense that the W given by Eq. (175) is much less than 1, this modified circuit 
may provide a substantial error probability reduction (to $\sim W^2$) even if the ancilla qubits are also
subjected to a similar dephasing as the source qubit, at the same stage, i.e. between the two
Hadamard gates. Such perfect automatic correction of any error (not only of an inner dephasing of a 
qubit and its relaxation/excitation, but also of the mutual dephasing between qubits) of any used qubit 
needs even more parallelism. The first circuit of that kind, based on nine parallel qubits, which is a 
natural generalization of the circuit discussed above, was invented in 1995 by the same P. Shor. Later, 
five-qubit circuits enabling similar error correction were suggested. (The further parallelism reduction 
has been proved impossible.) 


However, all these results assume that the error correction circuits as such are perfect, i.e. 
completely isolated from the environment. In the real world, this cannot be done. Now the key question 
is what maximum level $W_{\max}$ of the error probability in each gate (including those in the used error
correction scheme) can be automatically corrected, and how many qubits with $W < W_{\max}$ would be
required to implement quantum computers producing important practical results, first of all, factoring
of large numbers.⁸¹ To the best of my knowledge, estimates of these two related numbers have been
made only for some very specific approaches, and they are rather pessimistic. For example, using the so-called
surface codes, which employ many physical qubits for coding an informational one, and hence
increase its fidelity, $W_{\max}$ may be increased to a few times $10^{-3}$, but then we would need $\sim 10^8$ physical
qubits for the Shor’s algorithm implementation.⁸² This is very far from what currently looks doable
using the existing approaches. 


Because of this hard situation, the current development of quantum computing is focused on 
finding at least some problems that could be within the reach of either the existing systems, or their 
immediate extensions, and simultaneously would present some practical interest — a typical example of a 


81 In order to compete with the existing classical factoring algorithms, such numbers should have at least $10^3$ bits.
82 A. Fowler et al., Phys. Rev. A 86, 032324 (2012). 
technology in the search for applications. Currently, to the best of my knowledge, all suggested
problems of this kind address either specially crafted mathematical problems,⁸³ or properties of some
simple physical systems, such as molecular hydrogen⁸⁴ or the deuteron (the deuterium nucleus,
i.e. the proton-neutron system).⁸⁵ In the latter case, the interaction between the qubits of the
computational system is organized so that the system’s Hamiltonian is similar to that of the quantum
system of interest. (For this work, quantum simulation is a more adequate name than “quantum
computation”.⁸⁶)


Such simulations are pursued by some teams using schemes different from that shown in Fig. 3.
Of those, the most developed is the so-called adiabatic quantum computation,⁸⁷ which drops the hardest
requirement of negligible interaction with the environment. In this approach, the qubit system is first
prepared in a certain initial state, and then is allowed to evolve on its own, with no effort to couple or uncouple
qubits by external control signals during the evolution.⁸⁸ Due to the interaction with the environment, in
particular the dephasing and the energy dissipation it imposes, the system eventually relaxes to a final
incoherent state, which is then measured. (This resembles the scheme shown in Fig. 3, with the important
difference that the transform $U$ need not be unitary.) From numerous runs of such an
experiment, the outcome statistics may be revealed. Thus, in this approach the interaction with the
environment is allowed to play a certain role in the system evolution, though every effort is made to
reduce it, thus slowing down the relaxation process; hence the word “adiabatic” in the name of this
approach. This slowness allows the system to exhibit some quantum properties, in particular quantum
tunneling⁸⁹ through the energy barriers separating close energy minima in the multi-dimensional space
of states. This tunneling creates a substantial difference in the final-state statistics from that in purely
classical systems, where such barriers may be overcome only by thermally-activated jumps over them.⁹⁰


Due to technical difficulties of the organization and precise control of long-range interaction in 
multi-qubit systems, the adiabatic quantum computing demonstrations so far have been limited to a few 
simple arrays described by the so-called extended quantum Ising (“spin-glass”) model 


$$\hat H = \sum_{\{j,j'\}} J_{jj'}\,\hat\sigma_z^{(j)}\hat\sigma_z^{(j')} - \sum_j h_j\,\hat\sigma_z^{(j)}, \tag{8.178}$$

where the curly brackets denote the summation over pairs of close (though not necessarily closest) 

neighbors. Though the Hamiltonian (178) is the traditional playground of phase transitions theory (see, 


83 F. Arute et al., Nature 574, 505 (2019). Note that the claim of the first achievement of “quantum supremacy”,
made in this paper, refers only to an artificial, specially crafted mathematical problem, and does not change my 
assessment of the current status of this technology. 

84 P. O’Malley et al., Phys. Rev. X 6, 031007 (2016).

85 E. Dumitrescu et al., Phys. Rev. Lett. 120, 210501 (2018).

86 To the best of my knowledge, this idea was first put forward by Yuri I. Manin in his book Computable and
Incomputable published in 1980, i.e. before the famous 1982 paper by Richard Feynman. Unfortunately, since the 
book was in Russian, this suggestion was acknowledged by the international community only much later. 

87 Note that the qualifier “quantum” is important in this term, to distinguish this research direction from the 
classical adiabatic (or “reversible”) computation — see, e.g., SM Sec. 2.3 and references therein. 

88 Recently, some hybrids of this approach with the “usual” scheme of quantum computation have been 
demonstrated, in particular, using some control of inter-bit coupling during the relaxation process — see, e.g., R. 
Barends et al., Nature 534, 222 (2016). 

89 As a reminder, this process was repeatedly discussed in this course, starting from Sec. 2.3. 

90 A quantitative discussion of such jumps may be found in SM Sec. 5.6. 
e.g., SM Chapter 4), to the best of my knowledge there are not many practically important tasks that
could be achieved by studying the statistics of its solutions. Moreover, even for this limited task, the
speed of the largest experimental adiabatic quantum “computers”, with several hundreds of Josephson-junction
qubits,⁹¹ is still comparable with that of classical, off-the-shelf semiconductor processors (with
the dollar cost lower by many orders of magnitude), and no dramatic change of this comparison is 
predicted for realistic larger systems. 


To summarize the current (circa mid-2019) situation with the quantum computation 
development, it faces a very hard challenge of mitigating the effects of unintentional coupling with the 
environment. This problem is exacerbated by the lack of algorithms, beyond Shor’s factoring, that 
would give quantum computation a substantial advantage over the classical competition in solving real- 
world problems, and hence a much broader potential customer base that would provide the field with the 
necessary long-term motivation and resources. So far, even the leading experts in this field abstain from 
predictions on when quantum computation may become a self-supporting commercial technology.⁹²


There seem to be somewhat better prospects for another application of entangled qubit systems, 
namely to telecommunication cryptography.⁹³ The goal here is more modest: to replace the currently
dominating classical encryption, based on the public-key RSA code mentioned above, that may be 
broken by factoring very large numbers, with a quantum encryption system that would be fundamentally 
unbreakable. The basis of this opportunity is the measurement postulate and the no-cloning theorem: if a 
message is carried over by a qubit, it is impossible for an eavesdropper (in cryptography, traditionally 
called Eve) to either measure or copy it faithfully, without also disturbing its state. However, as we have 
seen from the discussion of Fig. 7a, state quasi-cloning using entangled qubits is possible, so that the
issue is far from being simple, especially if we want to use a publicly distributed quantum key, in some
sense similar to the classical public key used in RSA encryption. Unfortunately, I do not have
time/space to discuss various options for quantum encryption, but cannot help demonstrating how
counter-intuitive they may be, on the famous example of the so-called quantum teleportation (Fig. 8).⁹⁴


Suppose that some party A (in cryptography, traditionally called Alice) wants to send to party B 
(Bob) the full information about the pure quantum state $\alpha$ of a qubit, unknown to either party. Instead of
sending her qubit directly to Bob, Alice asks him to send her one qubit ($\beta$) of a pair of other qubits,
prepared in a certain entangled state, for example in the singlet state described by Eq. (11); in our 
current notation 


$$|\beta\beta'\rangle = \frac{1}{\sqrt 2}\left(|01\rangle - |10\rangle\right). \tag{8.179}$$


The initial state of the whole three-qubit system may be represented in the form 


91 See, e.g., R. Harris et al., Science 361, 162 (2018). Similar demonstrations with trapped-ion systems so far have 
been on a smaller scale, with a few tens of qubits — see, e.g., J. Zhang et al., Nature 551, 601 (2017). 

92 See the publication Quantum Computing: Progress and Prospects, The National Academies Press, 2019. 

93 This field was pioneered in the 1970s by S. Wiesner. Its important theoretical aspect (which I, unfortunately,
also will not be able to cover) is the distinguishability of different but close quantum states — for example, of an 
original qubit set, and that slightly corrupted by noise. A good introduction to this topic may be found, for 
example, in Chapter 9 of the monograph by Nielsen and Chuang, cited above. 

94 This procedure was first suggested in 1993 by C. H. Bennett et al., and then repeatedly demonstrated
experimentally — see, e.g., L. Steffen et al., Nature 500, 319 (2013), and literature therein. 
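$$|\alpha\beta\beta'\rangle = \left(\alpha_0|0\rangle + \alpha_1|1\rangle\right) \otimes \frac{1}{\sqrt{2}}\left(|01\rangle - |10\rangle\right) = \frac{\alpha_0}{\sqrt{2}}|001\rangle - \frac{\alpha_0}{\sqrt{2}}|010\rangle + \frac{\alpha_1}{\sqrt{2}}|101\rangle - \frac{\alpha_1}{\sqrt{2}}|110\rangle, \qquad (8.180a)$$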
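which may be equivalently rewritten as the following linear superposition,

$$|\alpha\beta\beta'\rangle = \frac{1}{2}|\alpha\beta\rangle_0^+\left(-\alpha_1|0\rangle + \alpha_0|1\rangle\right) + \frac{1}{2}|\alpha\beta\rangle_0^-\left(\alpha_1|0\rangle + \alpha_0|1\rangle\right) + \frac{1}{2}|\alpha\beta\rangle_1^+\left(-\alpha_0|0\rangle + \alpha_1|1\rangle\right) + \frac{1}{2}|\alpha\beta\rangle_1^-\left(-\alpha_0|0\rangle - \alpha_1|1\rangle\right), \qquad (8.180b)$$

of the following four states of the qubit pair $\alpha\beta$:

$$|\alpha\beta\rangle_0^{\pm} \equiv \frac{1}{\sqrt{2}}\left(|00\rangle \pm |11\rangle\right), \qquad |\alpha\beta\rangle_1^{\pm} \equiv \frac{1}{\sqrt{2}}\left(|01\rangle \pm |10\rangle\right). \qquad (8.181)$$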
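As a numerical sanity check of the regrouping (8.180b), one may compare the two forms of the three-qubit state for a random qubit $\alpha$. The sketch below assumes NumPy, orders the qubits as $\alpha$, $\beta$, $\beta'$ (leftmost tensor factor first), and uses the shorthand labels "0+", "0-", "1+", "1-" (our own, chosen for illustration) for the pair states $(|00\rangle \pm |11\rangle)/\sqrt{2}$ and $(|01\rangle \pm |10\rangle)/\sqrt{2}$; all function names are hypothetical.

```python
import numpy as np

# Single-qubit basis kets |0> and |1>
K0 = np.array([1, 0], dtype=complex)
K1 = np.array([0, 1], dtype=complex)
S = 1 / np.sqrt(2)


def ket(*factors):
    """Tensor product of kets; the leftmost factor is the leftmost qubit."""
    out = np.ones(1, dtype=complex)
    for f in factors:
        out = np.kron(out, f)
    return out


# Four states of the pair (alpha, beta); the labels are our own shorthand
PAIR = {
    "0+": S * (ket(K0, K0) + ket(K1, K1)),
    "0-": S * (ket(K0, K0) - ket(K1, K1)),
    "1+": S * (ket(K0, K1) + ket(K1, K0)),
    "1-": S * (ket(K0, K1) - ket(K1, K0)),
}


def initial_state(a0, a1):
    """Qubit alpha = a0|0> + a1|1>, times the singlet of (beta, beta')."""
    singlet = S * (ket(K0, K1) - ket(K1, K0))
    return ket(a0 * K0 + a1 * K1, singlet)


def regrouped_state(a0, a1):
    """The same state, written as a superposition over the four pair states."""
    return 0.5 * (ket(PAIR["0+"], -a1 * K0 + a0 * K1)
                  + ket(PAIR["0-"], a1 * K0 + a0 * K1)
                  + ket(PAIR["1+"], -a0 * K0 + a1 * K1)
                  + ket(PAIR["1-"], -a0 * K0 - a1 * K1))


rng = np.random.default_rng(0)
a0, a1 = rng.normal(size=2) + 1j * rng.normal(size=2)
norm = np.sqrt(abs(a0) ** 2 + abs(a1) ** 2)
a0, a1 = a0 / norm, a1 / norm
print(np.allclose(initial_state(a0, a1), regrouped_state(a0, a1)))
```

For any normalized choice of $\alpha_0$ and $\alpha_1$ this should print `True`.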


Fig. 8.8 (columns: Alice, Bob). Sequential stages of a “quantum teleportation” procedure: (a) the initial state with entangled qubits $\beta$ and $\beta'$, (b) the back transfer of the qubit $\beta$, (c) the measurement of the pair $\alpha\beta$, (d) the forward transfer of two classical bits with the measurement results, and (e) the final state, with the state of the qubit $\beta'$ mirroring the initial state of the qubit $\alpha$.


After having received qubit $\beta$ from Bob, Alice measures which of these four states the pair
$\alpha\beta$ is in. This may be achieved, for example, by measuring one observable represented by the
operator $\hat\sigma_z^\alpha\hat\sigma_z^\beta$ and another one corresponding to $\hat\sigma_x^\alpha\hat\sigma_x^\beta$; cf. Eq. (156). (Since all four states (181)
are eigenstates of both these operators, these two measurements do not affect each other and may be
performed in any order.) The measured eigenvalue of the former operator distinguishes the
couples of states (181) with different values of the lower index, while the latter measurement
distinguishes the states with different upper indices.
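These claims about the two observables are easy to check with explicit $4\times4$ matrices. In the sketch below (assuming NumPy, with our own labels for the four pair states), `ZZ` and `XX` stand for $\hat\sigma_z^\alpha\hat\sigma_z^\beta$ and $\hat\sigma_x^\alpha\hat\sigma_x^\beta$, the operator pair assumed here for the measurement; any other commuting pair with the same four eigenstates would serve equally well.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)   # Pauli sigma_x
Z = np.array([[1, 0], [0, -1]], dtype=complex)  # Pauli sigma_z
ZZ = np.kron(Z, Z)   # sigma_z^alpha sigma_z^beta, basis |00>, |01>, |10>, |11>
XX = np.kron(X, X)   # sigma_x^alpha sigma_x^beta

S = 1 / np.sqrt(2)
PAIR = {  # labels "0+", ... are our own shorthand for the four pair states
    "0+": S * np.array([1, 0, 0, 1], dtype=complex),
    "0-": S * np.array([1, 0, 0, -1], dtype=complex),
    "1+": S * np.array([0, 1, 1, 0], dtype=complex),
    "1-": S * np.array([0, 1, -1, 0], dtype=complex),
}


def eigenvalue(op, state):
    """Return lam such that op|state> = lam|state>; raise if not an eigenstate."""
    image = op @ state
    lam = np.vdot(state, image)          # <state|op|state> for a normalized state
    if not np.allclose(image, lam * state):
        raise ValueError("state is not an eigenstate of op")
    return float(lam.real)


print("ZZ and XX commute:", np.allclose(ZZ @ XX, XX @ ZZ))
for name, state in PAIR.items():
    print(f"{name}: ZZ -> {eigenvalue(ZZ, state):+.0f}, "
          f"XX -> {eigenvalue(XX, state):+.0f}")
```

`ZZ` should return +1 for both states with the lower index 0 and -1 for both with the lower index 1, while `XX` separates the "+" states from the "-" ones.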


Then Alice reports the measurement result (which may be coded with just two classical bits) to
Bob over a classical communication channel. Since the measurement places the pair $\alpha\beta$ definitely into
the corresponding state, Bob's remaining qubit $\beta'$ is now definitely in the unentangled single-qubit
state represented by the corresponding parentheses in Eq. (180b). Note that each of these
parentheses contains both coefficients $\alpha_{0,1}$, i.e. the whole information about the state that the qubit
$\alpha$ had initially. If Bob wishes, he may now use appropriate single-qubit operations, similar to those
discussed earlier in this section, to put his qubit $\beta'$ into a state identical to the initial state of
qubit $\alpha$. (This fact does not violate the no-cloning theorem (167), because the measurement has already
changed the state of $\alpha$.) This is, of course, a “teleportation” only in a very special sense of this term, but
it is a good example of the importance of preserving qubit entanglement during the qubits' spatial transfer. For this
course, it is also a good primer for the forthcoming discussion of the EPR paradox and Bell’s
inequalities in Chapter 10.
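The whole protocol can be simulated in a few lines. This is a minimal sketch, assuming NumPy; the correction table `CORRECTION` is our own reading of Eq. (8.180b): for each of the four outcomes, it maps the parenthesized state of $\beta'$ back to $\alpha_0|0\rangle + \alpha_1|1\rangle$ up to an unobservable global phase ($\hat\sigma_z\hat\sigma_x$, $\hat\sigma_x$, $\hat\sigma_z$, and the identity, respectively). The names, the random seed, and the test state $\alpha_0 = 0.6$, $\alpha_1 = 0.8i$ are arbitrary choices for illustration.

```python
import numpy as np

K0, K1 = np.eye(2, dtype=complex)            # |0>, |1>
S = 1 / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

PAIR = {  # four states of the pair (alpha, beta); labels are our own shorthand
    "0+": S * (np.kron(K0, K0) + np.kron(K1, K1)),
    "0-": S * (np.kron(K0, K0) - np.kron(K1, K1)),
    "1+": S * (np.kron(K0, K1) + np.kron(K1, K0)),
    "1-": S * (np.kron(K0, K1) - np.kron(K1, K0)),
}
# Bob's single-qubit fix-up for each reported outcome (correct up to a global phase)
CORRECTION = {"0+": Z @ X, "0-": X, "1+": Z, "1-": np.eye(2, dtype=complex)}


def teleport(a0, a1, rng):
    """Simulate one run; return Alice's outcome and Bob's corrected qubit beta'."""
    alpha = np.array([a0, a1], dtype=complex)
    singlet = S * (np.kron(K0, K1) - np.kron(K1, K0))
    # Rows: basis states of the pair (alpha, beta); columns: basis states of beta'
    state = np.kron(alpha, singlet).reshape(4, 2)
    # Unnormalized state of beta' for each outcome: <pair state| acting on (alpha, beta)
    bob = {name: pair.conj() @ state for name, pair in PAIR.items()}
    probs = np.array([np.vdot(v, v).real for v in bob.values()])
    outcome = str(rng.choice(list(bob), p=probs / probs.sum()))   # Alice's measurement
    beta_prime = bob[outcome] / np.linalg.norm(bob[outcome])      # collapse of beta'
    return outcome, CORRECTION[outcome] @ beta_prime              # two bits -> Bob


rng = np.random.default_rng(seed=1)
a0, a1 = 0.6, 0.8j                         # a hypothetical "unknown" state of alpha
for _ in range(4):
    outcome, out = teleport(a0, a1, rng)
    fidelity = abs(np.vdot([a0, a1], out)) ** 2
    print(outcome, f"fidelity = {fidelity:.12f}")
```

Each of the four outcomes occurs with probability 1/4, and the fidelity should equal 1 for every one of them; Bob, however, cannot pick the right correction until the two classical bits arrive.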
Returning for just a minute to quantum cryptography: since its most common quantum key
distribution protocols95 require just a few simple quantum gates, whose experimental implementation is
not a large technological challenge, the main focus of the current effort is on decreasing the single-photon
dephasing in long electromagnetic-wave transmission channels,96 with sufficiently high qubit
transfer fidelity. The recent progress has been rather impressive, with the demonstrated transfer of entangled
qubits over landlines longer than 100 km,97 and over at least one satellite-based line longer than 1,000
km,98 and also the whole quantum key distribution over a comparable distance, though for now at a very
low rate.99 Let me hope that if not the author of this course, then its readers will see this technology
used in practical secure telecommunication systems.


8.6. Exercise problems 
8.1. Prove that Eq. (30) indeed yields $E_{\rm int} = (5/4)E_{\rm H}$.


8.2. For a dilute gas of helium atoms in their ground state, with n atoms per unit volume, 
calculate its: 


(i) electric susceptibility $\chi_e$, and
(ii) magnetic susceptibility $\chi_m$,


and compare the results. 


Hint: You may use the model solution of Problems 6.8 and 6.14, and the results of the variational 
description of the helium atom’s ground state in Sec. 2. 


8.3. Calculate the expectation values of the following observables: $\hat{\mathbf{s}}_1\cdot\hat{\mathbf{s}}_2$, $\hat S^2 \equiv (\hat{\mathbf{s}}_1 + \hat{\mathbf{s}}_2)^2$, and $\hat S_z \equiv
\hat s_{1z} + \hat s_{2z}$, for the singlet and triplet states of the system of two spins-½, defined by Eqs. (18) and (21),
directly, without using the general Eq. (48). Compare the results with those for the system of two
classical geometric vectors of length $\hbar/2$ each.


8.4. Discuss the factors $\pm 1/\sqrt{2}$ that participate in Eqs. (18) and (20) for the entangled states of the
system of two spins-½, in terms of Clebsch-Gordan coefficients similar to those discussed in Sec. 5.7.


8.5. Use the perturbation theory to calculate the contribution to the so-called hyperfine
splitting of the ground energy of the hydrogen atom,100 due to the interaction between the spins of its
nucleus (proton) and electron.


95 Two of them are BB84, suggested in 1984 by C. Bennett and G. Brassard, and EPRBE, suggested in
1991 by A. Ekert. For details, see, e.g., either Sec. 12.6 in the repeatedly cited monograph by Nielsen and
Chuang, or the review by N. Gisin et al., Rev. Mod. Phys. 74, 145 (2002).

96 For their quantitative discussion see, e.g., EM Sec. 7.8.

97 See, e.g., T. Herbst et al., Proc. Natl. Acad. Sci. 112, 14202 (2015), and references therein.

98 J. Yin et al., Science 356, 1140 (2017). 

99 H.-L. Yin et al., Phys. Rev. Lett. 117, 190501 (2016). 

100 This effect was discovered experimentally by A. Michelson in 1881 and explained theoretically by W. Pauli in 
1924. 
Hint: The proton’s magnetic moment operator is described by the same Eq. (4.115) as that of the
electron, but with a positive gyromagnetic factor $\gamma_p = g_p e/2m_p \approx 2.675\times10^8\ {\rm s^{-1}T^{-1}}$, whose magnitude is
much smaller than that of the electron ($|\gamma_e| \approx 1.761\times10^{11}\ {\rm s^{-1}T^{-1}}$), due to the much higher mass, $m_p \approx
1.673\times10^{-27}$ kg $\approx 1,836\,m_e$. (The g-factor of the proton is also different: $g_p \approx 5.586$.101)


8.6. In the simple case of just two similar spin-interacting particles, distinguishable by their
spatial location, the famous Heisenberg model of ferromagnetism102 is reduced to the following
Hamiltonian:

$$\hat H = -J\hat{\mathbf{s}}_1\cdot\hat{\mathbf{s}}_2 - \gamma\mathscr{B}\cdot\left(\hat{\mathbf{s}}_1 + \hat{\mathbf{s}}_2\right),$$

where $J$ is the spin interaction constant, $\gamma$ is the gyromagnetic ratio of each particle, and $\mathscr{B}$ is the
external magnetic field. Find the stationary states and energies of this system for spin-½ particles.


8.7. Two particles, both with spin ½ but different gyromagnetic ratios $\gamma_1$ and $\gamma_2$, are placed into an
external magnetic field $\mathscr{B}$. In addition, their spins interact as in the Heisenberg model:

$$\hat H_{\rm int} = -J\hat{\mathbf{s}}_1\cdot\hat{\mathbf{s}}_2.$$

Find the eigenstates and eigenenergies of the system.


8.8. Two similar spin-½ particles, with gyromagnetic ratio $\gamma$, localized at two points separated by
distance $a$, interact via the field of their magnetic dipole moments. Calculate the stationary states and
energies of the system.


8.9. Consider the permutation of two identical particles, each of spin s. How many different 
symmetric and antisymmetric spin states can the system have? 


8.10. For a system of two identical particles with s = 1: 


(i) List all spin states forming the uncoupled-representation basis. 

(ii) List all possible pairs $\{S, M_S\}$ of the quantum numbers describing the states of the coupled-representation
basis (see Eq. (48)).

(iii) Which of the $\{S, M_S\}$ pairs describe symmetric states, and which describe antisymmetric
states, with respect to the particle permutation?


8.11. Represent the operators of the total kinetic energy and the total orbital angular momentum
of a system of two particles, with masses $m_1$ and $m_2$, as combinations of terms describing the center-of-mass
motion and the relative motion. Use the results to calculate the energy spectrum of the so-called


101 The anomalously large value of the proton’s g-factor results from the composite quark-gluon structure of this
particle. (An exact calculation of $g_p$ remains a challenge for quantum chromodynamics.)

102 It was suggested in 1926, independently by W. Heisenberg and P. Dirac. A discussion of thermal motion 
effects on this and other similar systems (especially the Ising model of ferromagnetism) may be found in SM 
Chapter 4. 
positronium, a metastable “atom”103 consisting of one electron and its positively charged antiparticle,
the positron.


8.12. Two particles with similar masses $m$ and charges $q$ are free to move along a round, plane
ring of radius $R$. In the limit of strong Coulomb interaction of the particles, find the lowest eigenenergies
of the system, and sketch the system of its energy levels. Discuss possible effects of particle
indistinguishability.


8.13. Low-energy spectra of many diatomic molecules may be well described by modeling the 
molecule as a system of two particles connected with a light and elastic, but very stiff spring. Calculate 
the energy spectrum of a molecule within this model. Discuss possible effects of nuclear spins on
the spectra of the so-called homonuclear diatomic molecules, formed by two similar atoms.


8.14. Two indistinguishable spin-½ particles attract each other at contact:

$$U(x_1, x_2) = -\mathscr{W}\delta(x_1 - x_2), \qquad \text{with } \mathscr{W} > 0,$$

but are otherwise free to move along the x-axis. Find the energy and the orbital wavefunction of the
ground state of the system.


8.15. Calculate the energy spectrum of the system of two identical spin-½ particles, moving
along the x-axis, which is described by the following Hamiltonian:

$$\hat H = \frac{\hat p_1^2}{2m_0} + \frac{\hat p_2^2}{2m_0} + \frac{m_0\omega_0^2}{2}\left(x_1^2 + x_2^2 + \varepsilon x_1x_2\right),$$

and the degeneracy of each energy level.


8.16. Two indistinguishable spin-½ particles are confined to move around a circle of radius $R$,
and interact only at a very short arc distance $l = R\varphi \equiv R(\varphi_1 - \varphi_2)$ between them, so that the interaction
potential $U$ may be well approximated with a delta function of $\varphi$. Find the ground state and its energy,
for the following two cases:


(i) the “orbital” (spin-independent) repulsion: $U = \mathscr{W}\delta(\varphi)$,
(ii) the spin-spin interaction: $U = -\mathscr{W}\,\hat{\mathbf{s}}_1\cdot\hat{\mathbf{s}}_2\,\delta(\varphi)$,


both with constant $\mathscr{W} > 0$. Analyze the trends of your results in the limits $\mathscr{W} \to 0$ and $\mathscr{W} \to \infty$.


8.17. Two particles of mass $M$, separated by two much lighter particles of mass
$m \ll M$, are placed on a ring of radius $R$ (see the figure on the right). The particles
strongly repel each other at contact, but otherwise, each of them is free to move along the ring.
Calculate the lower part of the energy spectrum of the system.


103 Its lifetime (about 0.125 ns for the antiparallel, and about 142 ns for the parallel configuration of the component
spins) is limited by the annihilation of its components, with the emission of two or three gamma-ray photons,
respectively.
8.18. $N$ indistinguishable spin-½ particles move in a spherically-symmetric quadratic potential
$U(r) = m\omega_0^2r^2/2$. Neglecting the direct interaction of the particles, find the ground-state energy of the
system.


8.19. Use the Hund rules to find the values of the quantum numbers L, S, and J in the ground 
states of the atoms of carbon and nitrogen. Write down the Russell-Saunders symbols for these states. 


8.20. $N \gg 1$ indistinguishable, non-interacting quantum particles are placed in a hard-wall,
rectangular box with sides $a_x$, $a_y$, and $a_z$. Calculate the ground-state energy of the system, and the
average forces it exerts on each face of the box. Can we characterize the forces by a certain pressure $\mathscr{P}$?


Hint: Consider separately the cases of bosons and fermions. 


8.21.* Explore the Thomas-Fermi model104 of a heavy atom, with the nuclear charge $Q = Ze \gg
e$, in which the interaction between electrons is limited to their contribution to the common electrostatic
potential $\phi(\mathbf{r})$. In particular, derive the ordinary differential equation obeyed by the radial distribution of
the potential, and use it to estimate the effective radius of the atom.


8.22.* Use the Thomas-Fermi model, explored in the previous problem, to calculate the total
binding energy of a heavy atom. Compare the result with that for the simpler model, in which the Coulomb
electron-electron interaction is completely ignored.


8.23. A system of three similar spin-½ particles is described by the Heisenberg Hamiltonian (cf.
Problems 6 and 7):

$$\hat H = -J\left(\hat{\mathbf{s}}_1\cdot\hat{\mathbf{s}}_2 + \hat{\mathbf{s}}_2\cdot\hat{\mathbf{s}}_3 + \hat{\mathbf{s}}_3\cdot\hat{\mathbf{s}}_1\right),$$

where $J$ is the spin interaction constant. Find the stationary states and energies of this system, and give
an interpretation of your results.


8.24. For a system of three spins-½, find the common eigenstates and eigenvalues of the
operators $\hat S_z$ and $\hat S^2$, where

$$\hat{\mathbf{S}} \equiv \hat{\mathbf{s}}_1 + \hat{\mathbf{s}}_2 + \hat{\mathbf{s}}_3$$

is the vector operator of the total spin of the system. Do the corresponding quantum numbers $S$ and $M_S$
obey Eqs. (48)?


8.25. Explore basic properties of the Heisenberg model (which was the subject of Problems 6, 7,
and 23), for a 1D chain of $N$ spins-½:

$$\hat H = -J\sum_{\{j,j'\}}\hat{\mathbf{s}}_j\cdot\hat{\mathbf{s}}_{j'} - \gamma\mathscr{B}\cdot\sum_j\hat{\mathbf{s}}_j, \qquad \text{with } J > 0,$$

where the summation is over all $N$ spins, with the symbol $\{j, j'\}$ meaning that the first sum is only over
the adjacent spin pairs. In particular, find the ground state of the system and its lowest excited states in
the absence of external magnetic field $\mathscr{B}$, and also the dependence of their energies on the field.


104 It was suggested in 1927, independently, by L. Thomas and E. Fermi.
Hint: For the sake of simplicity, you may assume that the first sum includes the term $\hat{\mathbf{s}}_N\cdot\hat{\mathbf{s}}_1$ as
well. (Physically, this means that the chain is bent into a closed loop.105)


8.26. Compose the simplest model Hamiltonians, in terms of the second quantization formalism, 
for systems of indistinguishable particles moving in the following external potentials: 


(i) two weakly coupled potential wells, with on-site particle interactions (giving additional
energy $J$ for each pair of particles in the same potential well), and
(ii) a periodic 1D potential, with the same particle interactions, in the tight-binding limit. 


8.27. For each of the Hamiltonians composed in the previous problem, derive the Heisenberg 
equations of motion for particle creation/annihilation operators: 


(i) for bosons, and
(ii) for fermions.


8.28. Express the ket-vectors of all possible Dirac states for the system of three indistinguishable


(i) bosons, and
(ii) fermions,


via those of the single-particle states $\beta$, $\beta'$, and $\beta''$ they occupy.


8.29. Explain why the general perturbative result (8.126), when applied to the $^4$He atom, gives
the correct106 expression (8.29) for the ground singlet state, and correct Eqs. (8.39)-(8.42) (with the
minus sign in the first of these relations) for the excited triplet states, but cannot describe these results,
with the plus sign in Eq. (8.39), for the excited singlet state.


8.30. For a system of two distinct qubits (i.e. two-level systems), introduce a reasonable
uncoupled-representation z-basis, and write in this basis the $4\times4$ matrix of the operator that swaps their
states.


8.31. Find a time-independent Hamiltonian that can cause the qubit evolution described by Eqs.
(155). Discuss the relation between your result and the time-dependent Hamiltonian (6.86).


105 Note that for dissipative spin systems, differences between low-energy excitations of open-end and closed-end
1D chains may be substantial even in the limit $N \to \infty$ (see, e.g., SM Sec. 4.5). However, for our Hamiltonian
(and hence dissipation-free) system, the differences are relatively small.

106 Correct in the sense of the first order of the perturbation theory. 
Chapter 9. Introduction to Relativistic Quantum Mechanics


The brief introduction to relativistic quantum mechanics, presented in this chapter, consists of two very
different parts. Its first part is a discussion of the basic elements of the quantum theory of the
electromagnetic field (usually called quantum electrodynamics, QED), including the field quantization
scheme, photon statistics, radiative atomic transitions, the spontaneous and stimulated radiation, and
the so-called cavity QED. We will see, in particular, that QED may be considered as the relativistic
quantum theory of particles with zero rest mass: photons. The second part of the chapter is a brief
review of the relativistic quantum theory of particles with non-zero rest mass, including the Dirac
theory of spin-½ particles. These theories mark the point of entry into a more complete relativistic
quantum theory, the quantum field theory, which is beyond the scope of this course.1


9.1. Electromagnetic field quantization2


Classical physics gives us3 the following general relativistic relation between the momentum $p$
and energy $E$ of a free particle with rest mass $m$, which may be simplified in two limits, non-relativistic
and ultra-relativistic:

$$E = \left[(pc)^2 + (mc^2)^2\right]^{1/2} \approx \begin{cases} mc^2 + p^2/2m, & \text{for } p \ll mc, \\ pc, & \text{for } p \gg mc. \end{cases} \qquad (9.1)$$


In both limits, the transfer from classical to quantum mechanics is easier than in the general case. Since
all the previous parts of this course were devoted to the first, non-relativistic limit, I will now jump to a
brief discussion of the ultra-relativistic limit $p \gg mc$, for a particular but very important system: the
electromagnetic field. Since the excitations of this field, called photons, are currently believed to have
zero rest mass $m$,4 the ultra-relativistic relation $E = pc$ is exactly valid for any photon energy $E$, and the
quantization scheme is rather straightforward.
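A quick numerical look at the two limits of Eq. (9.1): the sketch below (plain Python, in units with $m = c = 1$, an arbitrary choice) compares the exact energy with its non-relativistic and ultra-relativistic approximations, and confirms that for $m = 0$ the relation $E = pc$ is exact rather than approximate. The function names are our own.

```python
import math


def energy(p, m, c=1.0):
    """Exact free-particle energy, E = [(pc)^2 + (mc^2)^2]^(1/2)."""
    return math.hypot(p * c, m * c**2)


def energy_nonrelativistic(p, m, c=1.0):
    """Limit p << mc: rest energy plus the usual kinetic energy."""
    return m * c**2 + p**2 / (2 * m)


def energy_ultrarelativistic(p, c=1.0):
    """Limit p >> mc; exact for m = 0."""
    return p * c


m = 1.0
for p in (1e-3, 1e3):
    exact = energy(p, m)
    print(f"p = {p:g}: exact {exact:.9f}, "
          f"non-rel. {energy_nonrelativistic(p, m):.9f}, "
          f"ultra-rel. {energy_ultrarelativistic(p):.9f}")
```

At $p = 10^{-3}mc$ the non-relativistic form agrees with the exact energy to about $10^{-13}$, and at $p = 10^3mc$ the ultra-relativistic form agrees to a relative accuracy of about $5\times10^{-7}$.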


As usual, the quantization has to be based on the classical theory of the system, in this case the
Maxwell equations. As the simplest case, let us consider the electromagnetic field inside a finite free-space
volume limited by ideal walls, which reflect incident waves perfectly.5 Inside the volume, the
Maxwell equations give a simple wave equation6 for the electric field

$$\nabla^2\mathscr{E} - \frac{1}{c^2}\frac{\partial^2\mathscr{E}}{\partial t^2} = 0, \qquad (9.2)$$


1 Note that some material covered in this chapter is frequently taught as a part of the quantum field theory. I will
focus on the most important results that may be obtained without starting the heavy engines of that theory. 

2 The described approach was pioneered by the same P. A. M. Dirac as early as 1927. 

3 See, e.g., EM Chapter 9. 

4 By now this fact has been verified experimentally with an accuracy of at least ~10°* m. — see S. Eidelman et al., 
Phys. Lett. B 592, 1 (2004). 

5 In the case of finite energy absorption in the walls, or in the wave propagation media (say, described by complex
constants $\varepsilon$ and $\mu$), the system is not energy-conserving (Hamiltonian), i.e. it interacts with some dissipative
environment. Specific cases of such interaction will be considered in Sections 2 and 3 below.

6 See, e.g., EM Eq. (7.3), for the particular case $\varepsilon = \varepsilon_0$, $\mu = \mu_0$, so that $v^2 = 1/\varepsilon\mu = 1/\varepsilon_0\mu_0 \equiv c^2$.
and an absolutely similar equation for the magnetic field $\mathscr{B}$. We may look for the general solution of Eq.
(2) in the variable-separating form

$$\mathscr{E}(\mathbf{r},t) = -\sum_j p_j(t)\,\mathbf{e}_j(\mathbf{r}). \qquad (9.3)$$


Physically, each term of this sum is a standing wave whose spatial distribution and polarization
(“mode”) are described by the vector function $\mathbf{e}_j(\mathbf{r})$, while its temporal dynamics is described by the function $p_j(t)$.
Plugging an arbitrary term of this sum into Eq. (2), and separating the variables exactly as we did, for
example, in the Schrödinger equation in Sec. 1.5, we get

$$\frac{c^2\nabla^2\mathbf{e}_j}{\mathbf{e}_j} = \frac{\ddot p_j}{p_j} = \text{const} \equiv -k_j^2c^2, \qquad (9.4)$$

so that the spatial distribution of the mode satisfies the 3D Helmholtz equation:

$$\nabla^2\mathbf{e}_j + k_j^2\mathbf{e}_j = 0. \qquad (9.5)$$


The set of solutions of this equation, with appropriate boundary conditions, determines the set of the
functions $\mathbf{e}_j$, and simultaneously the spectrum of the wave number magnitudes $k_j$. The latter values
determine the mode eigenfrequencies, following from Eq. (4):

$$\ddot p_j + \omega_j^2p_j = 0, \qquad \text{with } \omega_j = k_jc. \qquad (9.6)$$
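To get a feeling for the spectrum of $k_j$, one may solve the Helmholtz equation in the simplest geometry. The sketch below assumes a cubic box of side $L$ and, as a deliberate simplification, a scalar field vanishing on the walls, so it ignores the vector character of $\mathbf{e}_j$ and hence the polarization degeneracy (which roughly doubles the mode count for the electromagnetic field). It counts the modes with $\omega_j$ below a given value and compares the result with the leading large-$\omega$ estimate $N \approx (\omega L)^3/6\pi^2c^3$ for this scalar model. The function names and the units $L = c = 1$ are our own choices.

```python
import numpy as np


def scalar_box_frequencies(n_max, L=1.0, c=1.0):
    """Sorted eigenfrequencies omega_j = k_j c of the Helmholtz equation in a cube
    of side L, for a scalar field vanishing on the walls (polarization ignored).
    Wave numbers: k = (pi/L) * sqrt(nx^2 + ny^2 + nz^2), with nx, ny, nz >= 1."""
    n = np.arange(1, n_max + 1)
    nx, ny, nz = np.meshgrid(n, n, n, indexing="ij")
    k = (np.pi / L) * np.sqrt(nx**2 + ny**2 + nz**2)
    return np.sort(c * k.ravel())


def mode_count(omegas, omega):
    """Number of modes with omega_j < omega (omegas must be sorted)."""
    return int(np.searchsorted(omegas, omega))


omegas = scalar_box_frequencies(100)   # complete for omega up to 100*pi
for w in (10 * np.pi, 20 * np.pi, 40 * np.pi, 80 * np.pi):
    # Leading (Weyl) estimate for this scalar model: N ~ (omega L)^3 / (6 pi^2 c^3)
    print(f"omega = {w:7.1f}: N = {mode_count(omegas, w):7d}, "
          f"estimate = {w**3 / (6 * np.pi**2):9.0f}")
```

Doubling $\omega$ should multiply the mode count by a factor somewhat above 8, approaching 8 as the boundary corrections become relatively small; that is, the number of modes grows as $\omega^3$, and their density as $\omega^2$.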


There is a big philosophical difference between the quantum-mechanical approaches to Eqs. (5)
and (6), despite their common origin (4). The first (Helmholtz) equation may be rather difficult to solve in
realistic geometries,7 but it remains intact in basic quantum electrodynamics, with the scalar
components of the vector functions $\mathbf{e}_j(\mathbf{r})$ still treated (at each point $\mathbf{r}$) as c-numbers. In contrast, the
classical Eq. (6) is readily solvable (giving sinusoidal oscillations with frequency $\omega_j$), but this is exactly
where we can make the transfer to quantum mechanics, because we already know how to quantize a
mechanical 1D harmonic oscillator, which in classics obeys the same equation.


As usual, we need to start with the appropriate Hamiltonian, i.e. the operator corresponding to the
classical Hamiltonian function $H$ of the proper set of generalized coordinates and momenta. The
electromagnetic field’s Hamiltonian function (which in this case coincides with the field’s energy) is8

$$H = \int d^3r\left(\frac{\varepsilon_0\mathscr{E}^2}{2} + \frac{\mathscr{B}^2}{2\mu_0}\right). \qquad (9.7)$$


Let us represent the magnetic field in a form similar to Eq. (3),9


7 See, e.g., various problems discussed in EM Chapter 7, especially in Sec. 7.9. 

8 See, e.g., EM Sec. 9.8, in particular, Eq. (9.225). Here I am using SI units, with $\varepsilon_0\mu_0 = c^{-2}$; in the Gaussian units,
the coefficients $\varepsilon_0$ and $\mu_0$ disappear, but there is an additional common factor $1/4\pi$ in the equation for energy.
However, if we modify the normalization conditions (see below) accordingly, all the subsequent results, starting
from Eq. (10), look similar in any system of units.

9 Here I am using the letter $q_j$, instead of $x_j$, for the generalized coordinate of the field oscillator, in order to
emphasize the difference between the former variable and one of the Cartesian coordinates, i.e. one of the
arguments of the c-number functions $\mathbf{e}$ and $\mathbf{b}$.
$$\mathscr{B}(\mathbf{r},t) = \sum_j\omega_jq_j(t)\,\mathbf{b}_j(\mathbf{r}). \qquad (9.8)$$

Since, according to the Maxwell equations, in our case the magnetic field satisfies an equation similar
to Eq. (2), the time-dependent amplitude $q_j$ of each of its modes $\mathbf{b}_j(\mathbf{r})$ obeys an equation similar to Eq.
(6), i.e. in the classical theory also changes in time sinusoidally, with the same frequency $\omega_j$. Plugging
Eqs. (3) and (8) into Eq. (7), we may recast it as

$$H = \sum_j\left[\frac{p_j^2}{2}\int\varepsilon_0\mathbf{e}_j^2(\mathbf{r})\,d^3r + \frac{\omega_j^2q_j^2}{2}\int\frac{1}{\mu_0}\mathbf{b}_j^2(\mathbf{r})\,d^3r\right]. \qquad (9.9)$$


Since the distribution of constant factors between two multiplication operands in each term of Eq. (3) is
so far arbitrary, we may fix it by requiring the first integral in Eq. (9) to equal 1. It is straightforward to
check that, according to the Maxwell equations, which give a specific relation between the vectors $\mathscr{E}$ and
$\mathscr{B}$,10 this normalization makes the second integral in Eq. (9) equal 1 as well, and Eq. (9) becomes

$$H = \sum_jH_j, \qquad H_j = \frac{p_j^2}{2} + \frac{\omega_j^2q_j^2}{2}. \qquad (9.10a)$$


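Note that $p_j$ is the legitimate generalized momentum corresponding to the generalized coordinate $q_j$,
because it is equal to $\partial L/\partial\dot q_j$, where $L$ is the Lagrangian function of the field (see EM Eq. (9.217)):

$$L = \int d^3r\left(\frac{\varepsilon_0\mathscr{E}^2}{2} - \frac{\mathscr{B}^2}{2\mu_0}\right) = \sum_j\left(\frac{\dot q_j^2}{2} - \frac{\omega_j^2q_j^2}{2}\right). \qquad (9.10b)$$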


Hence we can carry out the standard quantization procedure, namely declare $H_j$, $p_j$, and $q_j$
quantum-mechanical operators related exactly as in Eq. (10a),

$$\hat H_j = \frac{\hat p_j^2}{2} + \frac{\omega_j^2\hat q_j^2}{2}. \qquad (9.11)$$


We see that this Hamiltonian coincides with that of a 1D harmonic oscillator with the mass $m_j$ formally
equal to 1,11 and the eigenfrequency equal to $\omega_j$. However, in order to use Eq. (11) in the general Eq.
(4.199) for the time evolution of the Heisenberg-picture operators $\hat p_j$ and $\hat q_j$, we need to know the
commutation relations between these operators. To find them, let us calculate the Poisson bracket (4.204)
for the functions $A = q_j$ and $B = p_{j'}$, taking into account that in the classical Hamiltonian mechanics, all
generalized coordinates $q_j$ and the corresponding momenta $p_j$ have to be considered independent
arguments of $H$. As a result, only one term of the sum over $j''$ (the one with $j'' = j = j'$), and only the
second of its two products, gives a non-zero value ($-1$), so that

$$\{q_j, p_{j'}\}_P = \sum_{j''}\left(\frac{\partial q_j}{\partial p_{j''}}\frac{\partial p_{j'}}{\partial q_{j''}} - \frac{\partial q_j}{\partial q_{j''}}\frac{\partial p_{j'}}{\partial p_{j''}}\right) = -\delta_{jj'}. \qquad (9.12)$$


Hence, according to the general quantization rule (4.205), the commutation relation of the operators
corresponding to $q_j$ and $p_{j'}$ is


10 See, e.g., EM Eq. (7.6). 
11 Selecting a different normalization of the functions $\mathbf{e}_j(\mathbf{r})$ and $\mathbf{b}_j(\mathbf{r})$, we could readily arrange any value of $m_j$,
and the choice $m_j = 1$ is the best one just for notational simplicity.
$$\left[\hat q_j, \hat p_{j'}\right] = i\hbar\delta_{jj'}, \qquad (9.13)$$
i.e. is exactly the same as for the usual Cartesian components of the radius-vector and momentum of a 
mechanical particle — see Eq. (2.14). 


As the reader already knows, Eqs. (11) and (13) open for us several alternative ways to proceed:


(i) Use the Schrödinger-picture wave mechanics based on wavefunctions $\Psi(q_j, t)$. As we know
from Sec. 2.9, this way is inconvenient for most tasks, because the eigenfunctions of the harmonic
oscillator are rather clumsy.


(ii) A substantially better way (for the harmonic oscillator case) is to write the equations of the
time evolution of the operators $\hat q_j(t)$ and $\hat p_j(t)$ in the Heisenberg picture of quantum dynamics.

(iii) An even more convenient approach is to use equations similar to Eqs. (5.65) to decompose
the Heisenberg operators $\hat q_j(t)$ and $\hat p_j(t)$ into the creation-annihilation operators $\hat a_j^\dagger(t)$ and $\hat a_j(t)$, and
work with these operators.


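In this chapter, I will mostly use the last route. Replacing $m$ with $m_j = 1$, and $\omega$ with $\omega_j$, the last
forms of Eqs. (5.65) become

$$\hat a_j = \left(\frac{\omega_j}{2\hbar}\right)^{1/2}\left(\hat q_j + i\frac{\hat p_j}{\omega_j}\right), \qquad \hat a_j^\dagger = \left(\frac{\omega_j}{2\hbar}\right)^{1/2}\left(\hat q_j - i\frac{\hat p_j}{\omega_j}\right). \qquad (9.14)$$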


Due to Eq. (13), the creation-annihilation operators obey the commutation relation similar to Eq. (5.68),

$$\left[\hat a_j, \hat a_{j'}^\dagger\right] = \hat I\delta_{jj'}. \qquad (9.15)$$


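As a result, according to Eqs. (3) and (8), the quantum-mechanical operators of the electric and
magnetic fields are sums over all field oscillators:

$$\hat{\mathscr{E}}(\mathbf{r},t) = -i\sum_j\left(\frac{\hbar\omega_j}{2}\right)^{1/2}\left[\hat a_j^\dagger(t) - \hat a_j(t)\right]\mathbf{e}_j(\mathbf{r}), \qquad (9.16a)$$

$$\hat{\mathscr{B}}(\mathbf{r},t) = \sum_j\left(\frac{\hbar\omega_j}{2}\right)^{1/2}\left[\hat a_j^\dagger(t) + \hat a_j(t)\right]\mathbf{b}_j(\mathbf{r}), \qquad (9.16b)$$

while the Hamiltonian (11) of each field oscillator takes the form

$$\hat H_j = \hbar\omega_j\left(\hat a_j^\dagger\hat a_j + \frac{1}{2}\right), \qquad (9.17)$$

absolutely similar to Eq. (5.72) for a mechanical oscillator.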
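The operator relations (9.14)-(9.17) can be checked with finite matrices. The sketch below (NumPy, with $\hbar = 1$ and an arbitrary mode frequency $\omega_j = 2.5$) builds $\hat a_j$ and $\hat a_j^\dagger$ in a Fock basis truncated at 12 levels, reconstructs $\hat q_j$ and $\hat p_j$ by inverting Eq. (9.14), and confirms that $\hat p_j^2/2 + \omega_j^2\hat q_j^2/2$ is diagonal with eigenvalues $\hbar\omega_j(n_j + 1/2)$. The truncation spoils these relations only in the highest retained level, which is therefore excluded from the comparison; the function name is our own.

```python
import numpy as np


def ladder_operators(n_levels):
    """Matrices of a and a^dagger in the truncated Fock basis |0>, ..., |n_levels-1>,
    built from a|n> = n^(1/2)|n-1>."""
    a = np.diag(np.sqrt(np.arange(1, n_levels)), k=1).astype(complex)
    return a, a.conj().T


hbar, omega, n_levels = 1.0, 2.5, 12     # arbitrary units and mode frequency
a, ad = ladder_operators(n_levels)
q = np.sqrt(hbar / (2 * omega)) * (a + ad)          # Eq. (9.14), inverted
p = 1j * np.sqrt(hbar * omega / 2) * (ad - a)
H = p @ p / 2 + omega**2 * q @ q / 2                # Eq. (9.11), with m_j = 1

# The last Fock level is an artifact of truncation, so compare only the block above it
block = slice(0, n_levels - 1)
commutator = a @ ad - ad @ a
print(np.allclose(commutator[block, block], np.eye(n_levels - 1)))    # [a, a^+] = I
print(np.allclose(H[block, block],
                  np.diag(hbar * omega * (np.arange(n_levels - 1) + 0.5))))
```

Both lines should print `True`; the same matrices also reproduce $[\hat q_j, \hat p_j] = i\hbar$ within the retained block.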


Now comes a very important conceptual step. From Sec. 5.4 we know that the eigenfunctions
(Fock states) $n_j$ of the Hamiltonian (17) have energies

$$E_j = \hbar\omega_j\left(n_j + \frac{1}{2}\right), \qquad n_j = 0, 1, 2, \ldots, \qquad (9.18)$$

and, according to Eq. (5.89), the operators $\hat a_j$ and $\hat a_j^\dagger$ act on the eigenkets of these partial states as

$$\hat a_j|n_j\rangle = n_j^{1/2}|n_j - 1\rangle, \qquad \hat a_j^\dagger|n_j\rangle = (n_j + 1)^{1/2}|n_j + 1\rangle, \qquad (9.19)$$


regardless of the quantum states of other modes. These rules coincide with the definitions (8.64) and
(8.68) of bosonic creation-annihilation operators, and hence their action may be considered as the
creation/annihilation of certain bosons. Such a “particle” (actually, an excitation, with energy $\hbar\omega_j$, of an
electromagnetic field oscillator) is exactly what is, strictly speaking, called a photon. Note immediately
that according to Eq. (16), such an excitation does not change the spatial distribution of the $j$-th mode of
the field. So, such a “global” photon is an excitation created simultaneously at all points of the field
confinement region.


If this picture seems too far from the intuitive image of a particle, please recall that in Chapter 2,
we discussed a similar situation with the fundamental solutions of the Schrödinger equation of a free
non-relativistic particle: they represent sinusoidal de Broglie waves existing simultaneously at all points
of the particle confinement region. The (partial :-) reconciliation with the classical picture of a moving
particle might be obtained by using the linear superposition principle to assemble a quasi-localized wave
packet, as a group of sinusoidal waves with close wave numbers. Similarly, we may use a linear
superposition of the “global” photons with close values of $k_j$ (and hence $\omega_j$)
to form a quasi-localized photon. An additional simplification here is that the dispersion relation for
electromagnetic waves (at least in free space) is linear:

$$\frac{\partial\omega_j}{\partial k_j} = c = \text{const}, \qquad \text{i.e.} \qquad \frac{\partial^2\omega_j}{\partial k_j^2} = 0, \qquad (9.20)$$

so that, according to Eq. (2.39a), the electromagnetic wave packets (i.e. space-localized photons) do not
spread out during their propagation. Note also that due to the fundamental classical relations $\mathbf{p} = \mathbf{n}E/c$
for the linear momentum of a traveling electromagnetic wave packet of energy $E$, propagating along
the direction $\mathbf{n} = \mathbf{k}/k$, and $\mathbf{L} = \pm\mathbf{n}E/\omega$ for its angular momentum,12 such a photon may be prescribed the
linear momentum $\mathbf{p} = \mathbf{n}\hbar\omega/c \equiv \hbar\mathbf{k}$ and the angular momentum $\mathbf{L} = \pm\mathbf{n}\hbar$, with the sign depending on the
direction of its circular polarization (“helicity”).
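The absence of spreading for a linear dispersion law can be seen in a short numerical experiment. The sketch below (NumPy, with $\hbar = m = 1$ and illustrative values of the packet width, central wave number, and time) propagates the same 1D Gaussian packet with $\omega = ck$ and with the non-relativistic $\omega = k^2/2$, using a fast Fourier transform on a periodic grid. For simplicity it applies $\omega = ck$ to all Fourier components, which is harmless here because the packet contains practically no components with $k < 0$. The function names are our own.

```python
import numpy as np


def evolve(psi0, x, t, omega_of_k):
    """Propagate psi0(x) for time t, giving each plane-wave component the phase
    exp(-i omega(k) t). Periodic grid, so packets must stay away from the edges."""
    k = 2 * np.pi * np.fft.fftfreq(x.size, d=x[1] - x[0])
    return np.fft.ifft(np.fft.fft(psi0) * np.exp(-1j * omega_of_k(k) * t))


def width(psi, x):
    """Root-mean-square width of the probability density |psi|^2."""
    w = np.abs(psi) ** 2
    w = w / w.sum()
    mean = np.sum(w * x)
    return np.sqrt(np.sum(w * (x - mean) ** 2))


x = np.linspace(-200.0, 200.0, 4096, endpoint=False)
sigma, k0, c, t = 2.0, 2.0, 1.0, 50.0          # illustrative values, hbar = m = 1
psi0 = np.exp(-x**2 / (2 * sigma**2) + 1j * k0 * x)

linear = evolve(psi0, x, t, lambda k: c * k)          # omega = ck
quadratic = evolve(psi0, x, t, lambda k: k**2 / 2)    # free non-relativistic particle

print(f"initial width   {width(psi0, x):.4f}")
print(f"omega = ck      {width(linear, x):.4f}")
print(f"omega = k^2/2   {width(quadratic, x):.4f}")
```

The first packet should simply move by $ct$, keeping its initial r.m.s. width $\sigma/\sqrt{2} \approx 1.41$, while the second one should broaden to $(\sigma/\sqrt{2})[1 + (t/\sigma^2)^2]^{1/2} \approx 17.7$.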


This electromagnetic field quantization scheme should look very straightforward, but it raises an
important conceptual issue of the ground-state energy. Indeed, Eq. (18) implies that the total ground-state
(i.e., the lowest) energy of the field is

$$E_g = \sum_j\frac{\hbar\omega_j}{2}. \qquad (9.21)$$

Since for any realistic model of the field-confining volume, either finite or infinite, the density of
electromagnetic field modes only grows with frequency,13 this sum diverges at its upper limit, leading
to infinite ground-state energy per unit volume. This infinite-energy paradox cannot be dismissed by
declaring the ground-state energy of field oscillators unobservable, because this would contradict
numerous experimental observations, starting perhaps from the famous Casimir effect.14 The


12 See, e.g., EM Sections 7.7 and 9.8. 

13 See, e.g., Eq. (1.1), which is similar to Eq. (1.90) for the de Broglie waves, derived in Sec. 1.7. 

14 This effect was predicted in 1948 by Hendrik Casimir and Dirk Polder, and confirmed semi-quantitatively in 
experiments by M. Sparnaay, Nature 180, 334 (1957). After this and several other experiments, a decisive error
bar reduction (to about 5%), providing a quantitative confirmation of the Casimir formula (23), was achieved by
conceptually simplest implementation of this effect involves two parallel, perfectly conducting plates of
area $A$, separated by a vacuum gap of thickness $t \ll A^{1/2}$ (Fig. 1).


Fig. 9.1. The simplest geometry of the Casimir effect manifestation.


Rather counter-intuitively, the plates attract each other with a force $F$ proportional to the area $A$
and rapidly increasing with the decrease of $t$, even in the absence of any explicit electromagnetic field
sources. The effect’s explanation is that the energy of each electromagnetic field mode, including its
ground-state energy, exerts an average pressure,

$$\langle P_j\rangle = -\frac{\partial E_j}{\partial V}, \qquad (9.22)$$


on the walls constraining it to volume $V$. While the field’s pressure on the external surfaces of the plates
is due to the contributions (22) of all free-space modes, with arbitrary values of $k_z$ (the z-component of
the wave vector $\mathbf{k}$), in the gap between the plates the spectrum of $k_z$ is limited to the multiples of $\pi/t$, so
that the pressure on the internal surfaces is lower. This is why the net force exerted on the plates may be
calculated as the sum of the contributions (22) from all “missing” low-frequency modes in the gap, with
the minus sign. In the simplest model, when the plates are made of an ideal conductor, which provides the
boundary conditions $\mathscr{E}_\tau = \mathscr{B}_n = 0$ on their surfaces,15 such a calculation is quite straightforward (and is
hence left for the reader’s exercise), and its result is

$$F = -\frac{\pi^2\hbar c}{240}\frac{A}{t^4}. \qquad (9.23)$$
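For a sense of scale, the ideal-conductor Casimir result, $|F|/A = \pi^2\hbar c/240t^4$, is easy to evaluate. The short sketch below uses the CODATA values of $\hbar$ and $c$ (the function name is our own); keep in mind that, as discussed in footnote 15, real conductors deviate noticeably from this formula at the smallest gaps.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s (CODATA 2018)
C = 299_792_458.0        # speed of light, m/s (exact)


def casimir_pressure(t):
    """Magnitude of F/A = pi^2 hbar c / (240 t^4) for ideal plates; t in m, result in Pa."""
    return math.pi**2 * HBAR * C / (240 * t**4)


for t in (2e-6, 1e-6, 100e-9):
    print(f"t = {t * 1e9:6.0f} nm:  |F|/A = {casimir_pressure(t):.3e} Pa")
```

At $t = 1$ μm this gives about 1.3 mPa, and the $t^{-4}$ law raises it to about 13 Pa at $t = 100$ nm.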


Note that for such a calculation, the high-frequency divergence of Eq. (21) is not important,
because it contributes to the forces exerted on all surfaces of each plate, and cancels out from the net
pressure. In this way, the Casimir effect not only confirms Eq. (21), but also teaches us an important
lesson on how to deal with the divergences of such sums at $\omega_j \to \infty$. The lesson is: just get accustomed
to the idea that the divergence exists, and ignore this fact while you can, i.e. if the final result you are
interested in is finite. However, for some more complex problems of quantum electrodynamics (and the


S. Lamoreaux, Phys. Rev. Lett. 78, 5 (1997) and by U. Mohideen and A. Roy, Phys. Rev. Lett. 81, 4549 (1998). 
Note also that there are other experimental confirmations of the reality of the ground-state electromagnetic field, 
including, for example, the experiments by R. Koch et al. already discussed in Sec. 7.5, and the recent spectacular 
direct observations by C. Riek et al., Science 350, 420 (2015). 

15 For realistic conductors, the reduction of $t$ below ~1 μm causes significant deviations from this simple model, 
and hence from Eq. (23). The reason is that for gaps so narrow, the depth of field penetration into the conductors 
(see, e.g., EM Sec. 6.2), at the important frequencies $\omega \sim c/t$, becomes comparable with $t$, and an adequate theory 
of the Casimir effect has to involve a certain model of the penetration. (It is curious that in-depth analyses of this 
problem, pioneered in 1956 by E. Lifshitz, have revealed a deep relation between the Casimir effect and the 
London dispersion force, which was the subject of Problems 3.16, 5.15, and 6.18; for a review see, e.g., either I. 
Dzhyaloshinskii et al., Sov. Phys. Uspekhi 4, 153 (1961), or K. Milton, The Casimir Effect, World Scientific, 
2001.) Recent experiments in the 100 nm to 2 μm range of $t$, with an accuracy better than 1%, have made it possible 
not only to observe the effects of field penetration on the Casimir force, but even to select between some 
approximate models of the penetration; see D. Garcia-Sanchez et al., Phys. Rev. Lett. 109, 027202 (2012). 
quantum theory of any other fields), this simplest approach becomes impossible, and then more 
complex renormalization techniques become necessary. For their study, I have to refer the reader to a 
quantum field theory course; see the references at the end of this chapter. 


9.2. Photon absorption and counting 


As a matter of principle, the Casimir effect may be used to measure quantum effects not only in 
the free-space electromagnetic field but also in the field arriving from active sources (lasers, etc.). 
However, such studies are usually done with simpler detectors, in which the absorption of a photon by 
a single atom leads to its ionization. This ionization, i.e. the emission of a free electron, triggers an 
avalanche reaction (e.g., an electric discharge in a Geiger-type counter), which may be readily registered 
using appropriate electronic circuitry. In good photon counters, the first step, the "trigger" atom 
ionization, is the bottleneck of the whole process (the photon count), so that to analyze their statistics, it 
is sufficient to consider the field's interaction with just this atom. 


Its ionization is a quantum transition from a discrete initial state of the atom to its final, ionized 
state with a continuous energy spectrum, induced by an external electromagnetic field. This is exactly 
the situation shown in Fig. 6.12, so we may apply to it the Golden Rule of quantum mechanics in the 
form (6.149), with the system a associated with the electromagnetic field, and system b with the trigger 
atom. The atom’s size is typically much smaller than the radiation wavelength, so that the field-atom 
interaction may be adequately described in the electric dipole approximation (6.146), 

$$\hat H_{\rm int} = -\hat{\mathscr{E}}\cdot\hat{\mathbf{d}}, \qquad (9.24)$$


where $\hat{\mathbf{d}}$ is the dipole moment's operator. Hence we may associate this operator with the operand $B$ in 
Eqs. (6.145)-(6.149), while the electric field operator $\hat{\mathscr{E}}$ is associated with the operand $A$ in those 
relations. First, let us assume that our field consists of only one mode $\mathbf{e}(\mathbf{r})$ of frequency $\omega$. Then we can 
keep only one term in the sum (16a), and drop the index $j$, so that Eq. (6.149) may be rewritten as 
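$$\Gamma = \frac{2\pi}{\hbar}\left|\langle\text{fin}|\hat{\mathscr{E}}(\mathbf{r},t)|\text{ini}\rangle\right|^2\left|\langle\text{fin}|\hat{\mathbf{d}}(t)\cdot\mathbf{n}_e|\text{ini}\rangle\right|^2\rho_b = \pi\omega\left|\langle\text{fin}|[\hat a^\dagger(t)-\hat a(t)]\,e(\mathbf{r})|\text{ini}\rangle\right|^2\left|\langle\text{fin}|\hat{\mathbf{d}}(t)\cdot\mathbf{n}_e|\text{ini}\rangle\right|^2\rho_b, \qquad (9.25)$$

where $\mathbf{n}_e = \mathbf{e}(\mathbf{r})/e(\mathbf{r})$ is the local direction of the vector $\mathbf{e}(\mathbf{r})$, the symbols "ini" and "fin" denote the initial 
and final states of the corresponding system (the electromagnetic field in the first long bracket, and the 
atom in the second bracket), and the density $\rho_b$ of the continuous atomic states should be calculated at its 
final energy $E_{\rm fin} = E_{\rm ini} + \hbar\omega$. 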


As a reminder, in the Heisenberg picture of quantum dynamics, the initial and final states are 
time-independent, while the creation-annihilation operators are functions of time. In the Golden Rule 
formula (25), as in any perturbative result, this time dependence has to be calculated ignoring the 
perturbation, in this case the field-atom interaction. For the field's creation-annihilation operators, this 
dependence coincides with that of the usual 1D oscillator (see Eq. (5.141)), in which $\omega_0$ should be, in 
our current notation, replaced with $\omega$: 

$$\hat a(t) = \hat a(0)\,e^{-i\omega t}, \qquad \hat a^\dagger(t) = \hat a^\dagger(0)\,e^{+i\omega t}. \qquad (9.26)$$
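Hence Eq. (25) becomes 

$$\Gamma = \pi\omega\left|\langle\text{fin}|[\hat a^\dagger(0)e^{i\omega t} - \hat a(0)e^{-i\omega t}]\,e(\mathbf{r})|\text{ini}\rangle\right|^2\left|\langle\text{fin}|\hat{\mathbf{d}}(t)\cdot\mathbf{n}_e|\text{ini}\rangle\right|^2\rho_b. \qquad (9.27a)$$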


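Now let us multiply the first long bracket by $\exp\{i\omega t\}$, and the second one by $\exp\{-i\omega t\}$: 

$$\Gamma = \pi\omega\left|\langle\text{fin}|[\hat a^\dagger(0)e^{2i\omega t} - \hat a(0)]\,e(\mathbf{r})|\text{ini}\rangle\right|^2\left|\langle\text{fin}|\hat{\mathbf{d}}(t)\cdot\mathbf{n}_e\,e^{-i\omega t}|\text{ini}\rangle\right|^2\rho_b. \qquad (9.27b)$$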


This mathematically equivalent form of the previous relation shows more clearly that at resonant 
photon absorption, only the annihilation operator gives a significant time-averaged contribution to the 
first bracket's matrix element. (As a reminder, the quantum-mechanical Golden Rule for time-dependent 
perturbations is a result of averaging over a time interval much longer than $1/\omega$; see Sec. 6.6.) Similarly, 
according to Eq. (4.199), the Heisenberg operator of the dipole moment, corresponding to the increase 
of the atom's energy by $\hbar\omega$, has Fourier components that differ in frequency from $\omega$ only by $\sim\Gamma \ll \omega$, 
so that its time dependence virtually compensates the additional factor in the second bracket of Eq. (27b), 
and this bracket may also have a substantial time average. Hence, in the first bracket we may 
neglect the fast-oscillating term, whose average over a time interval $\sim 1/\Gamma$ is very close to zero.¹⁶ 
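The rotating-wave argument can be checked numerically. The sketch below averages the fast factor $e^{2i\omega t}$, which multiplies $\hat a^\dagger(0)$ in the first bracket of Eq. (27b), over windows of increasing length $T$, while the slow factor multiplying $\hat a(0)$ is simply 1. The optical frequency and the window lengths are arbitrary illustrative assumptions.

```python
import numpy as np

def fast_term_average(omega, T, samples=100_000):
    """|time average of exp(2i*omega*t)| over 0 <= t < T, from uniform samples."""
    t = np.linspace(0.0, T, samples, endpoint=False)
    return abs(np.exp(2j * omega * t).mean())

omega = 2 * np.pi * 5e14                 # illustrative optical angular frequency, rad/s
for periods in (1.3, 10.3, 1000.3):
    T = periods * 2 * np.pi / omega      # averaging window, in units of the field period
    print(f"T = {periods:7.1f} periods: |<exp(2i w t)>| = "
          f"{fast_term_average(omega, T):.2e},  bound 1/(wT) = {1 / (omega * T):.2e}")
```

The fast term's average decays roughly as $1/\omega T$, so over the long averaging intervals implied by the Golden Rule it is negligible compared with the slow (resonant) term.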


Now let us assume, first, that we use the same detector, characterized by the same matrix 
element of the quantum transition, i.e. the same second bracket in Eq. (27), and the same final state 
density $\rho_b$, for measurement of various electromagnetic fields, or just of the same field at different 
points $\mathbf{r}$. Then we are only interested in the behavior of the first, field-related bracket, and may write 

$$\Gamma \propto \left|\langle\text{fin}|\hat a\,e(\mathbf{r})|\text{ini}\rangle\right|^2 = \langle\text{fin}|\hat a\,e(\mathbf{r})|\text{ini}\rangle^*\langle\text{fin}|\hat a\,e(\mathbf{r})|\text{ini}\rangle = \langle\text{ini}|\hat a^\dagger e^*(\mathbf{r})|\text{fin}\rangle\langle\text{fin}|\hat a\,e(\mathbf{r})|\text{ini}\rangle, \qquad (9.28)$$


where the creation-annihilation operators are implied to be taken at $t = 0$, i.e. in the Schrödinger picture, 
and the initial and final states are those of the field alone. Second, let us now calculate the total rate of 
transitions to all available final states of the given mode $e(\mathbf{r})$. If such states formed a full and 
orthonormal set, we could use the closure relation (4.44), applied to the final states, to write 


$$\Gamma \propto \sum_{\rm fin}\langle\text{ini}|\hat a^\dagger e^*(\mathbf{r})|\text{fin}\rangle\langle\text{fin}|\hat a\,e(\mathbf{r})|\text{ini}\rangle = \langle\text{ini}|\hat a^\dagger\hat a|\text{ini}\rangle\,e^*(\mathbf{r})e(\mathbf{r}) = \langle n\rangle_{\rm ini}\,|e(\mathbf{r})|^2, \qquad (9.29)$$

where, for a given field mode, $\langle n\rangle_{\rm ini}$ is the expectation value of the operator $\hat n = \hat a^\dagger\hat a$ for the initial state 
of the electromagnetic field. In the more realistic case of fields in relatively large volumes, $V \gg \lambda^3$, with 
their virtually continuous spectrum of final states, the middle equality in this relation is not strictly valid, 
but it is correct to a constant multiplier,¹⁷ which we are currently not interested in. Note, however, that 
Eq. (29) may be substantially wrong for high-$Q$ electromagnetic resonators ("cavities"), which may 
make just one (or a few) modes available for transitions. (Quantum electrodynamics of such cavities will 
be briefly discussed in Sec. 4 below.) 


Let us apply Eq. (29) to several possible quantum states of the mode. 


16 This is essentially the same rotating wave approximation (RWA), which was already used in Sec. 6.5 and 
beyond — see, e.g., the transition from Eq. (6.90) to the first of Eqs. (6.94). 


17 As the Golden Rule shows, this multiplier is proportional to the density of the final states of the 
field. 
(i) First, as a sanity check, the ground initial state, $n = 0$, gives no photon absorption at all. The 
interpretation is easy: the ground-state field cannot emit a photon that would ionize an atom in the 
counter. Again, this does not mean that the ground-state "motion" is not observable (if you still think so, 
please review the Casimir effect discussion in Sec. 1), just that it cannot ionize the trigger atom, 
because it does not have any spare energy for doing that. 


(ii) All other coherent states (Fock, Glauber, squeezed, etc.) of the field oscillator give the same 
counting rate, provided that their $\langle n\rangle_{\rm ini}$ is the same. This result may be less evident when we apply Eq. (29) 
to the interference of two light beams from the same source, say, in the double-slit or the Bragg-scattering 
configurations. In this case, we may represent the spatial distribution of the field as a sum 

$$e(\mathbf{r}) = e_1(\mathbf{r}) + e_2(\mathbf{r}). \qquad (9.30)$$

Here each term describes one possible wave path, so that the product $e^*(\mathbf{r})e(\mathbf{r})$ in Eq. (29) may be a 
rapidly changing function of the detector position. For this configuration, our result (29) means that the 
interference pattern (and its contrast) is independent of the particular state of the electromagnetic 
field's mode. 


(iii) Surprisingly, the last statement is also valid for a classical mixture of the different 
eigenstates of the same field mode, for example for its thermal-equilibrium state. Indeed, in this case we 
need to average Eq. (29) over the corresponding classical ensemble, but this would only change the 
meaning of the averaging of $n$ in that equation; the field part describing the interference pattern is not 
affected. 
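A minimal numerical sketch of points (ii) and (iii): it builds photon-number distributions for a Fock state, a Glauber state, and a thermal mixture, all with $\langle n\rangle = 4$, and multiplies $\langle n\rangle$ by a hypothetical two-path mode function $|e_1(x)+e_2(x)|^2$, following Eqs. (29) and (30). The plane-wave form of $e_{1,2}$, the fringe wave number, and the Fock-space truncation are illustrative assumptions.

```python
import math
import numpy as np

N_MAX = 400                      # Fock-space truncation (ample for <n> = 4)
n = np.arange(N_MAX)

def fock(n0):
    p = np.zeros(N_MAX)
    p[n0] = 1.0
    return p

def glauber(mean_n):
    # Poisson photon-number distribution, evaluated in log form to avoid overflow
    log_p = -mean_n + n * math.log(mean_n) - np.array([math.lgamma(k + 1) for k in n])
    return np.exp(log_p)

def thermal(mean_n):
    lam = mean_n / (1.0 + mean_n)          # lambda = exp(-hbar*omega/k_B T)
    return (1.0 - lam) * lam**n            # W_n = lambda^n / Z, with Z = 1/(1 - lambda)

def mean_photon_number(p):
    return float(np.dot(n, p))

x = np.linspace(-2.0, 2.0, 401)            # detector position, arbitrary units
q = 2.0 * np.pi                             # hypothetical fringe wave number
e_mode = np.exp(0.5j * q * x) + np.exp(-0.5j * q * x)   # e(x) = e1(x) + e2(x)

patterns = {}
for name, p in {"Fock": fock(4), "Glauber": glauber(4.0), "thermal": thermal(4.0)}.items():
    rate = mean_photon_number(p) * np.abs(e_mode) ** 2  # Eq. (9.29), up to a constant
    patterns[name] = rate
    contrast = (rate.max() - rate.min()) / (rate.max() + rate.min())
    print(f"{name:8s} <n> = {mean_photon_number(p):.6f}, fringe contrast = {contrast:.4f}")
```

All three states give the same $\langle n\rangle$ and identical fringe patterns, so a single detector cannot tell them apart; that is the motivation for the correlation technique described next.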


The last result may look a bit counter-intuitive because common sense tells us that the 
stochasticity associated with thermal equilibrium has to suppress the interference pattern contrast. These 
expectations are (partly :-) justified because a typical thermal source of radiation produces many field 
modes $j$, rather than the single mode we have analyzed. These modes may have different wave numbers $k_j$ and 
hence different field distribution functions $e_j(\mathbf{r})$, resulting in shifted interference patterns. Their 
summation would indeed smear the interference, suppressing its contrast. 


So the use of one photon detector is not the best way to distinguish different quantum states of an 
electromagnetic field mode. This task, however, may be achieved using the photon counting correlation 
technique shown in Fig. 2.¹⁸ 


Fig. 9.2. Photon count correlation measurement. [Schematic: incoming light, a semi-transparent mirror, a controllable delay, detector 1, detector 2, and the count statistics calculation.] 


18 It was pioneered as early as the mid-1950s (i.e. before the advent of lasers) by Robert Hanbury Brown and 
Richard Twiss. Their second experiment was also remarkable for its rather unusual light source: the star Sirius! 
(Their work was an effort to improve astrophysical interferometry techniques.) 
In this experiment, the counter rate correlation may be characterized by the so-called second-order 
correlation function of the counting rates, 

$$g^{(2)}(\tau) = \frac{\langle\Gamma_1(t)\,\Gamma_2(t-\tau)\rangle}{\langle\Gamma_1(t)\rangle\langle\Gamma_2(t)\rangle}, \qquad (9.31)$$

where the averaging may be carried out either over many similar experiments, or (with usual field sources, 
due to their ergodicity) over a relatively long time interval, much longer than $\tau$. Using the normalized correlation 
function (31) is very convenient because the characteristics of both detectors and the beam splitter (e.g., 
a semi-transparent mirror, see Fig. 2) drop out from this fraction. 
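To illustrate why the normalization in Eq. (31) is convenient, the sketch below feeds two simulated count-rate records into that formula. It assumes, purely for illustration, a classically fluctuating intensity made of independent exponentially distributed samples, split by a mirror of hypothetical reflectance `R` between detectors with hypothetical efficiencies `eta1` and `eta2`; these constant factors cancel exactly from $g^{(2)}$.

```python
import numpy as np

def g2(rate1, rate2, lag):
    """Estimate Eq. (9.31): <G1(t) G2(t - lag)> / (<G1> <G2>), with lag in samples."""
    r1 = rate1[lag:]                  # Gamma_1 at time t
    r2 = rate2[:len(rate2) - lag]     # Gamma_2 at time t - lag
    return np.mean(r1 * r2) / (np.mean(rate1) * np.mean(rate2))

rng = np.random.default_rng(seed=1)
intensity = rng.exponential(scale=1.0, size=200_000)   # toy fluctuating intensity I(t)

def detector_rates(intensity, R, eta1, eta2):
    """Count rates proportional to the intensity reaching each detector."""
    return eta1 * R * intensity, eta2 * (1.0 - R) * intensity

ideal = detector_rates(intensity, R=0.5, eta1=1.0, eta2=1.0)
real = detector_rates(intensity, R=0.3, eta1=0.2, eta2=0.9)
for lag in (0, 1):
    print(lag, g2(*ideal, lag), g2(*real, lag))
```

The two columns coincide up to rounding for every lag. For this particular toy intensity, the zero-lag value is close to 2 and the one-sample-lag value is close to 1, because successive samples were drawn independently.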


Very unexpectedly for the mid-1950s, Hanbury Brown and Twiss discovered that the correlation 
function depends on the time delay $\tau$ in the way shown (schematically) with the solid line in Fig. 3. It is 
evident from Eq. (31) that if the counting events are completely independent, $g^{(2)}(\tau)$ should be equal to 1, 
which is always the case in the limit $\tau \to \infty$. (As will be shown in the next section, the characteristic 
time of this approach is usually between $10^{-11}$ s and $10^{-8}$ s, so that for its measurement, the delay time 
control may be provided just by moving one of the detectors by a human-scale distance, from a few 
millimeters to a few meters.) Hence, the observed behavior at $\tau \to 0$ corresponds to a positive 
correlation of detector counts at small time delays, i.e. to a higher probability of the nearly simultaneous 
arrival of photons to both counters. This counter-intuitive effect is called photon bunching. 


Fig. 9.3. Photon bunching (solid line) and antibunching for various $n$ (dashed lines). The lines approach the level $g^{(2)} = 1$ at $\tau \to \infty$ (on the time scale depending on the light source). 


Let us use our simple single-mode model to analyze this experiment. Now the elementary 
quantum process characterized by the numerator of Eq. (31) is the correlated, simultaneous ionization 
of two trigger atoms, at two spatial-temporal points $\{\mathbf{r}_1, t\}$ and $\{\mathbf{r}_2, t-\tau\}$, by the same field mode, so 
that we need to make the following replacement in the first form of Eq. (25): 

$$\hat{\mathscr{E}}(\mathbf{r},t) \to \text{const}\times\hat{\mathscr{E}}(\mathbf{r}_1,t)\,\hat{\mathscr{E}}(\mathbf{r}_2,t-\tau). \qquad (9.32)$$


Repeating all the manipulations done above for the single-counter case, we get 


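$$\langle\Gamma_1(t)\,\Gamma_2(t-\tau)\rangle \propto \langle\text{ini}|\hat a^\dagger(t)\,\hat a^\dagger(t-\tau)\,\hat a(t-\tau)\,\hat a(t)|\text{ini}\rangle\,e^*(\mathbf{r}_1)e^*(\mathbf{r}_2)e(\mathbf{r}_2)e(\mathbf{r}_1). \qquad (9.33)$$

Plugging this expression, as well as Eq. (29) for single-counter rates, into Eq. (31), we see that the field 
distribution factors (as well as the detector-specific brackets and the density of states $\rho_b$) cancel, giving a 
very simple final expression: 

$$g^{(2)}(\tau) = \frac{\langle\hat a^\dagger(t)\,\hat a^\dagger(t-\tau)\,\hat a(t-\tau)\,\hat a(t)\rangle}{\langle\hat a^\dagger(t)\,\hat a(t)\rangle^2}, \qquad (9.34)$$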
where the averaging should be carried out, as before, over the initial state of the field. 


Still, the calculation of this expression for arbitrary $\tau$ may be quite complex, because in many 
cases the relaxation of the correlation function to its asymptotic value $g^{(2)}(\infty)$ is due to the interaction of 
the light source with its environment, and hence requires the open-system techniques that were 
discussed in Chapter 7. However, the zero-delay value $g^{(2)}(0)$ may be calculated straightforwardly, 
because the time arguments of all operators are equal, so that we may write 

$$g^{(2)}(0) = \frac{\langle\hat a^\dagger\hat a^\dagger\hat a\hat a\rangle}{\langle\hat a^\dagger\hat a\rangle^2}. \qquad (9.35)$$


Let us evaluate this ratio for the simplest states of the field. 


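(i) The $n$th Fock state. In this case, it is convenient to act with the annihilation operators upon the 
ket-vectors, and with the creation operators upon the bra-vectors, using Eqs. (19): 

$$g^{(2)}(0) = \frac{\langle n|\hat a^\dagger\hat a^\dagger\hat a\hat a|n\rangle}{\langle n|\hat a^\dagger\hat a|n\rangle^2} = \frac{\langle n-2|[n(n-1)]^{1/2}[n(n-1)]^{1/2}|n-2\rangle}{\left(\langle n-1|n^{1/2}n^{1/2}|n-1\rangle\right)^2} = \frac{n(n-1)}{n^2} = 1 - \frac{1}{n}. \qquad (9.36)$$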

We see that the correlation function at small delays is suppressed rather than enhanced; see the dashed 
lines in Fig. 3. This photon antibunching effect has a very simple handwaving explanation: a single 
photon emitted by the wave source may be absorbed by just one of the detectors. For the initial state $n = 1$, 
this is the only option, and it is very natural that Eq. (36) predicts no simultaneous counts at $\tau = 0$. 
Despite this theoretical simplicity, reliable observations of antibunching were not carried out 
until 1977,¹⁹ due to the experimental difficulty of driving electromagnetic field oscillators into their 
Fock states; see Sec. 4 below. 
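Eq. (36) is easy to verify with truncated matrix representations of the ladder operators. The truncation dimension below is an arbitrary choice, large enough that it does not affect the low Fock states used; this is a sketch for checking the algebra, not a model of any particular light source.

```python
import numpy as np

def annihilation(dim):
    """Truncated annihilation operator with a|n> = sqrt(n)|n-1>, cf. Eqs. (9.19)."""
    return np.diag(np.sqrt(np.arange(1, dim)), k=1).astype(complex)

def g2_zero(psi, a):
    """Eq. (9.35) for a pure state psi: <a+ a+ a a> / <a+ a>^2."""
    ad = a.conj().T
    numerator = np.vdot(psi, ad @ ad @ a @ a @ psi).real
    denominator = np.vdot(psi, ad @ a @ psi).real ** 2
    return numerator / denominator

dim = 12
a = annihilation(dim)
for n in range(1, 6):
    psi = np.zeros(dim, dtype=complex)
    psi[n] = 1.0                                  # the n-th Fock state
    print(f"n = {n}: g2(0) = {g2_zero(psi, a):.4f},  1 - 1/n = {1 - 1 / n:.4f}")
```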


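(ii) The Glauber state $\alpha$. A similar procedure, but now using Eq. (5.124) and its Hermitian 
conjugate, $\langle\alpha|\hat a^\dagger = \langle\alpha|\alpha^*$, yields 

$$g^{(2)}(0) = \frac{\langle\alpha|\hat a^\dagger\hat a^\dagger\hat a\hat a|\alpha\rangle}{\langle\alpha|\hat a^\dagger\hat a|\alpha\rangle^2} = \frac{\alpha^*\alpha^*\alpha\alpha}{(\alpha^*\alpha)^2} = 1, \qquad (9.37)$$

for any parameter $\alpha$. We see that the result is different from that for the Fock states, unless in the latter 
case $n \to \infty$. (We know that the Fock and Glauber properties should also coincide for the ground state, 
but in that state the correlation function's value is undefined, because there are no photon counts at all.) 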
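The same kind of truncated-matrix check, applied to a Glauber state built from its Fock-state expansion $|\alpha\rangle \propto \sum_n (\alpha^n/\sqrt{n!})\,|n\rangle$, reproduces Eq. (37). The value of $\alpha$ and the truncation dimension are arbitrary illustrative choices; the helper functions are redefined so that the block is self-contained.

```python
import numpy as np

def annihilation(dim):
    """Truncated annihilation operator with a|n> = sqrt(n)|n-1>."""
    return np.diag(np.sqrt(np.arange(1, dim)), k=1).astype(complex)

def glauber_state(alpha, dim):
    """Normalized, truncated Glauber state; coefficients alpha^n / sqrt(n!) built recursively."""
    c = np.zeros(dim, dtype=complex)
    c[0] = 1.0
    for n in range(1, dim):
        c[n] = c[n - 1] * alpha / np.sqrt(n)
    return c / np.linalg.norm(c)

def g2_zero(psi, a):
    """Eq. (9.35): <a+ a+ a a> / <a+ a>^2 for a pure state psi."""
    ad = a.conj().T
    return np.vdot(psi, ad @ ad @ a @ a @ psi).real / np.vdot(psi, ad @ a @ psi).real ** 2

dim = 80
a = annihilation(dim)
alpha = 1.5 + 0.5j
psi = glauber_state(alpha, dim)
print(np.allclose(a @ psi, alpha * psi, atol=1e-12))   # eigenstate of a, Eq. (5.124)
print(g2_zero(psi, a))                                 # close to 1, Eq. (9.37)
```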


(iii) Classical mixture. From Chapter 7, we know that such statistical ensembles cannot be 
described by single state vectors, and require the density matrix $w$ for their description. Here, we may 
combine Eqs. (35) and (7.5) to write 

$$g^{(2)}(0) = \frac{\text{Tr}\,(\hat w\,\hat a^\dagger\hat a^\dagger\hat a\hat a)}{\left[\text{Tr}\,(\hat w\,\hat a^\dagger\hat a)\right]^2}. \qquad (9.38)$$


19 By H. J. Kimble et al., Phys. Rev. Lett. 39, 691 (1977). For a detailed review of photon antibunching, see, e.g., 
H. Paul, Rev. Mod. Phys. 54, 1061 (1982). 
Spelling out this expression is easy for the field in thermal equilibrium at some temperature $T$, 
because its density matrix is diagonal in the basis of Fock states $n$; see Eqs. (7.24): 

$$w_{nn'} = W_n\delta_{nn'}, \qquad W_n = \frac{1}{Z}\exp\left\{-\frac{E_n}{k_BT}\right\} = \frac{\lambda^n}{\sum_{n'=0}^{\infty}\lambda^{n'}}, \qquad \text{where } \lambda \equiv \exp\left\{-\frac{\hbar\omega}{k_BT}\right\}. \qquad (9.39)$$


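So, for the operators in the numerator and denominator of Eq. (38) we also need just the diagonal terms 
of the operator products, which have already been calculated; see Eq. (36). As a result, we get 

$$g^{(2)}(0) = \frac{\sum_{n=0}^{\infty}W_n\,n(n-1)}{\left(\sum_{n=0}^{\infty}W_n\,n\right)^2} = \frac{\sum_{n=0}^{\infty}\lambda^n n(n-1)\,\sum_{n=0}^{\infty}\lambda^n}{\left(\sum_{n=0}^{\infty}\lambda^n n\right)^2}. \qquad (9.40)$$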


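One of the three series involved in this expression is just the usual geometric progression, 

$$\sum_{n=0}^{\infty}\lambda^n = \frac{1}{1-\lambda}, \qquad (9.41)$$

and the remaining two series may be readily calculated by its differentiation over the parameter $\lambda$: 

$$\sum_{n=0}^{\infty}\lambda^n n(n-1) = \lambda^2\frac{d^2}{d\lambda^2}\sum_{n=0}^{\infty}\lambda^n = \lambda^2\frac{d^2}{d\lambda^2}\frac{1}{1-\lambda} = \frac{2\lambda^2}{(1-\lambda)^3}, \qquad \sum_{n=0}^{\infty}\lambda^n n = \lambda\frac{d}{d\lambda}\sum_{n=0}^{\infty}\lambda^n = \lambda\frac{d}{d\lambda}\frac{1}{1-\lambda} = \frac{\lambda}{(1-\lambda)^2}, \qquad (9.42)$$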

and for the correlation function we get an extremely simple result, independent of the parameter $\lambda$ and 
hence of temperature: 

$$g^{(2)}(0) = \frac{[2\lambda^2/(1-\lambda)^3]\,[1/(1-\lambda)]}{[\lambda/(1-\lambda)^2]^2} = 2. \qquad (9.43)$$
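A direct numerical check of Eqs. (40)-(43): the sketch sums the three series for a few illustrative values of $\hbar\omega/k_BT$, compares them with the closed forms (41)-(42), and forms the ratio (40). The series cut-off `n_max` is an assumption, chosen so that the neglected tail is negligible for these temperatures.

```python
import math

def thermal_series(lam, n_max=3000):
    """Truncated sums of lambda^n, n*lambda^n, and n(n-1)*lambda^n over n = 0..n_max-1."""
    s0 = sum(lam**n for n in range(n_max))
    s1 = sum(n * lam**n for n in range(n_max))
    s2 = sum(n * (n - 1) * lam**n for n in range(n_max))
    return s0, s1, s2

results = {}
for x in (0.1, 1.0, 5.0):                 # x = hbar*omega / (k_B T), illustrative values
    lam = math.exp(-x)
    s0, s1, s2 = thermal_series(lam)
    closed = (1 / (1 - lam), lam / (1 - lam) ** 2, 2 * lam**2 / (1 - lam) ** 3)  # (9.41)-(9.42)
    g2 = s2 * s0 / s1**2                  # Eq. (9.40)
    results[x] = (s0, s1, s2, closed, g2)
    print(f"hbar*omega/k_BT = {x}: g2(0) = {g2:.10f}")
```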


This is exactly the photon bunching effect first observed by Hanbury Brown and Twiss; see Fig. 
3. We see that in contrast to antibunching, this is an essentially classical (statistical) effect. Indeed, Eq. 
(43) allows a purely classical derivation. In the classical theory, the counting rate (of a single counter) is 
proportional to the wave intensity $I$, so that Eq. (31) with $\tau = 0$ is reduced to 

$$g^{(2)}(0) = \frac{\langle I^2\rangle}{\langle I\rangle^2}, \qquad \text{with } I \propto \mathscr{E}^*(t)\,\mathscr{E}(t). \qquad (9.44)$$


For a sinusoidal field, the intensity is constant, and $g^{(2)}(0) = 1$. (This is also evident from Eq. (37), 
because the classical state may be considered as a Glauber state with $\alpha \to \infty$.) On the other hand, if the 
intensity fluctuates (either in time, or from one experiment to another), the averages in Eq. (44) should 
be calculated as 

$$\langle I^k\rangle = \int_0^\infty w(I)\,I^k\,dI, \qquad \text{with } \int_0^\infty w(I)\,dI = 1, \quad \text{and } k = 1, 2, \qquad (9.45)$$
where $w(I)$ is the probability density. For classical statistics, the probability is an exponential function of 
the electromagnetic field energy, and hence of its intensity: 

$$w(I) = C\,e^{-\beta I}, \qquad \text{where } \beta \propto 1/k_BT, \qquad (9.46)$$

so that Eqs. (45) yield:²⁰ 

$$\int_0^\infty C\,e^{-\beta I}\,dI = \frac{C}{\beta} = 1, \quad \text{and hence } C = \beta; \qquad \langle I^k\rangle = \int_0^\infty w(I)\,I^k\,dI = C\int_0^\infty \exp\{-\beta I\}\,I^k\,dI = \begin{cases} 1/\beta, & \text{for } k = 1,\\ 2/\beta^2, & \text{for } k = 2.\end{cases} \qquad (9.47)$$

Plugging these results into Eq. (44), we get $g^{(2)}(0) = 2$, in complete agreement with Eq. (43). 
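The classical derivation (44)-(47) can be mimicked by sampling. The sketch below draws intensities from the exponential distribution (46), with an arbitrary value of $\beta$, and evaluates $\langle I^2\rangle/\langle I\rangle^2$; for comparison, it also treats the constant intensity of a sinusoidal field. The sample size and random seed are illustrative choices.

```python
import numpy as np

def g2_classical(intensity):
    """Eq. (9.44): <I^2> / <I>^2 over an ensemble of intensity values."""
    intensity = np.asarray(intensity, dtype=float)
    return np.mean(intensity**2) / np.mean(intensity) ** 2

rng = np.random.default_rng(seed=0)
beta = 2.5                                    # arbitrary; g2(0) should not depend on it
thermal_I = rng.exponential(scale=1 / beta, size=1_000_000)   # w(I) = beta*exp(-beta*I)
steady_I = np.full(1000, 3.7)                 # constant intensity of a sinusoidal field

print(f"fluctuating intensity: g2(0) = {g2_classical(thermal_I):.3f}")   # near 2
print(f"constant intensity:    g2(0) = {g2_classical(steady_I):.3f}")    # 1
```

The statistical scatter of the sampled value around 2 shrinks as the inverse square root of the sample size.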


For some field states, including the squeezed ground states discussed at the end of Sec. 5.5, 
the values of $g^{(2)}(0)$ may be even higher than 2, the so-called super-bunching. Analyses of two cases of such 
super-bunching are offered for the reader's exercise; see the problem list at the chapter's end. 


9.3. Photon emission: spontaneous and stimulated 


In our simple model of photon counting, considered in the last section, the trigger atom in the 
counter absorbed a photon. Now let us have a look at the opposite process of spontaneous emission of 
photons by an atom in an excited state, still using the same electric-dipole approximation (24) for the 
atom-to-field interaction. For this, we may still use the Golden Rule for the model depicted in Fig. 6.12, 
but now the roles have changed: we have to associate the operator $A$ with the electric dipole moment of 
the atom, and the operator $B$ with the electric field, so that the continuous spectrum of the system $b$ 
represents the plurality of the electromagnetic field modes into which the spontaneous radiation may 
happen. Since now the transition increases the energy of the electromagnetic field, and decreases that of 
the atom, after the multiplication of the field bracket in Eq. (27a) by $\exp\{-i\omega t\}$, and of the second bracket by 
$\exp\{+i\omega t\}$, we may keep only the photon creation operator, whose time evolution (26) compensates this 
additional fast "rotation". As a result, the Golden Rule takes the following form: 

$$\Gamma_s = \pi\omega\left|\langle\text{fin}|\hat a^\dagger|0\rangle\right|^2\left|\langle\text{fin}|\hat{\mathbf{d}}\cdot\mathbf{e}(\mathbf{r})|\text{ini}\rangle\right|^2\rho_f, \qquad (9.48)$$


where all operators and states are time-independent (i.e. taken in the Schrödinger picture), and $\rho_f$ is the 
density of final states of the electromagnetic field, which in this problem plays the role of the atom's 
environment.²¹ Here the electromagnetic field oscillator has been assumed to be initially in the ground 
state, an assumption that will be lifted later in this section. 

This relation, together with Eq. (19), shows that for the field's matrix element to be different from 
zero, the final state of the field has to be the first excited Fock state, $n = 1$. (By the way, this is exactly 


20 See, e.g., MA Eq. (6.7c) with n = 0 and n = 1. 

21 Here the sum over all electromagnetic field modes $j$ may be smuggled back. Since in the quasi-static 
approximation $ka \ll 1$, which is necessary for the interaction representation by Eq. (24), the matrix elements in 
Eq. (48) are virtually independent of the direction of the wave vectors, and their magnitudes are fixed by $\omega$, the 
summation is reduced to the calculation of the total density of states of all modes, and the averaging of $e^2(\mathbf{r})$; see below. 
the most practicable way of generating an excited Fock state of a field oscillator.) With that, Eq. (48) 
yields 

$$\Gamma_s = \pi\omega\left|\langle\text{fin}|\hat{\mathbf{d}}\cdot\mathbf{e}(\mathbf{r})|\text{ini}\rangle\right|^2\rho_f = \pi\omega\left|\langle\text{fin}|\hat d|\text{ini}\rangle\right|^2 e_d^2(\mathbf{r})\,\rho_f, \qquad (9.49)$$
where the density $\rho_f$ of the excited electromagnetic field states should be calculated at the energy $E = \hbar\omega$, 
and $e_d$ is the Cartesian component of the vector $\mathbf{e}(\mathbf{r})$ along the electric dipole's direction. The 
expression for the density $\rho_f$ was our first formula in this course; see Eq. (1.1).²² From it, we get 

$$\rho_f = \frac{dN}{dE} = \frac{V\omega^2}{\pi^2\hbar c^3}, \qquad (9.50)$$


where the bounding volume $V$ should be large enough to ensure the spectrum's virtual continuity: $V \gg \lambda^3 = 
(2\pi c/\omega)^3$. Because of that, in the normalization condition used to simplify Eq. (9), we may consider $e^2(\mathbf{r})$ 
constant. Let us represent this square as a sum of squares of the three Cartesian components of the 
vector $\mathbf{e}(\mathbf{r})$, one of which ($e_d$) is aligned with the dipole's direction; due to the space isotropy we may write 

$$e^2 = e_d^2 + e_{d'}^2 + e_{d''}^2 = 3e_d^2. \qquad (9.51)$$

As a result, the normalization condition yields 

$$e_d^2 = \frac{1}{3\varepsilon_0 V}, \qquad (9.52)$$
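and Eq. (49) gives the famous (and very important) formula²³ 

$$\Gamma_s = \frac{1}{4\pi\varepsilon_0}\,\frac{4\omega^3}{3\hbar c^3}\left|\langle\text{fin}|\hat{\mathbf{d}}|\text{ini}\rangle\right|^2 \equiv \frac{1}{4\pi\varepsilon_0}\,\frac{4\omega^3}{3\hbar c^3}\,\langle\text{fin}|\hat{\mathbf{d}}|\text{ini}\rangle\cdot\langle\text{ini}|\hat{\mathbf{d}}|\text{fin}\rangle. \qquad (9.53)$$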
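The chain (49)-(52) leading to Eq. (53) can be checked numerically. The sketch evaluates $\pi\omega|d|^2 e_d^2\rho_f$, with $e_d^2$ from Eq. (52) and the density of states taken as $\rho_f = V\omega^2/\pi^2\hbar c^3$, and compares it with the right-hand side of Eq. (53) for several box volumes $V$, which should drop out. The frequency and dipole values are arbitrary illustrative inputs.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
HBAR = 1.054571817e-34    # reduced Planck constant, J*s
C = 299_792_458.0         # speed of light, m/s

def gamma_golden_rule(omega, d, volume):
    """pi*omega*|d|^2*e_d^2*rho_f, i.e. Eq. (9.49) with Eqs. (9.50) and (9.52) substituted."""
    e_d_squared = 1.0 / (3.0 * EPS0 * volume)                  # Eq. (9.52)
    rho_f = volume * omega**2 / (math.pi**2 * HBAR * C**3)     # Eq. (9.50)
    return math.pi * omega * d**2 * e_d_squared * rho_f

def gamma_dirac(omega, d):
    """Eq. (9.53) for a dipole matrix element of magnitude d (C*m)."""
    return (1.0 / (4.0 * math.pi * EPS0)) * (4.0 * omega**3 / (3.0 * HBAR * C**3)) * d**2

omega, d = 2.0e15, 1.0e-29        # illustrative values, in rad/s and C*m
for volume in (1e-15, 1e-9, 1.0):
    print(volume, gamma_golden_rule(omega, d, volume), gamma_dirac(omega, d))
```

Both columns agree for every volume, because $V$ enters $e_d^2$ and $\rho_f$ with opposite powers.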


Leaving a comparison of this formula with the classical theory of radiation,²⁴ and the exact 
evaluation of $\Gamma_s$ for a particular transition in the hydrogen atom, for the reader's exercises, let me just 
estimate its order of magnitude. Assuming that $d \sim er_B = e\hbar^2/m(e^2/4\pi\varepsilon_0)$ and $\hbar\omega \sim E_H = m(e^2/4\pi\varepsilon_0)^2/\hbar^2$, 
and taking into account the definition (6.62) of the fine structure constant $\alpha \approx 1/137$, we get 

$$\frac{\Gamma_s}{\omega} \sim \left(\frac{e^2}{4\pi\varepsilon_0\hbar c}\right)^3 = \alpha^3 \sim 3\times10^{-7}. \qquad (9.54)$$


This estimate shows that the emission lines at atomic transitions are typically very sharp. With the 
present-day availability of high-speed electronics, it also makes sense to evaluate the time scale $\tau = 1/\Gamma_s$ 
of the typical quantum transition: for a typical optical frequency $\omega \sim 3\times10^{15}\ \text{s}^{-1}$, it is close to 1 ns. This is 


22 If the same atom is placed into a high-$Q$ resonant cavity (see, e.g., EM Sec. 7.9), the rate of its photon emission is 
strongly suppressed at frequencies between the cavity resonances (where $\rho_f \to 0$); see, e.g., the review by S. 
Haroche and D. Kleppner, Phys. Today 42, 24 (Jan. 1989). On the other hand, the emission is strongly enhanced (by a factor 
$\sim(\lambda^3/V)Q$, where $V$ is the cavity's volume) at resonance frequencies: the so-called Purcell effect, 
discovered by E. Purcell in the 1940s. For a brief discussion of this and other quantum electrodynamic effects in 
cavities, see the next section. 

23 This was the breakthrough result obtained by P. Dirac in 1927, which jumpstarted the whole field of quantum 
electrodynamics. An equivalent expression was obtained from more formal arguments in 1930 by V. Weisskopf 
and E. Wigner, so that sometimes Eq. (53) is (very unfairly) called the “Weisskopf-Wigner formula”. 

24 See, e.g., EM Sec. 8.2, in particular Eq. (8.29). 
exactly the time constant that determines the time-delay dependence of the photon counting statistics of 
the spontaneously emitted radiation; see Fig. 3. Colloquially, this is the temporal scale of the photon 
emitted by an atom.²⁵ 
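The order-of-magnitude estimate (54) and the resulting time scale are quick to reproduce. The sketch uses the CODATA value of $\alpha$ and the typical optical frequency $\omega \sim 3\times10^{15}$ s$^{-1}$ quoted above and, like Eq. (54), drops all factors of order unity.

```python
ALPHA = 7.2973525693e-3        # fine-structure constant (CODATA 2018)
omega = 3e15                   # typical optical angular frequency, s^-1

gamma_over_omega = ALPHA**3    # Eq. (9.54), order of magnitude only
gamma_s = gamma_over_omega * omega
tau = 1.0 / gamma_s            # transition time scale, s

print(f"Gamma_s / omega ~ {gamma_over_omega:.1e}")
print(f"tau = 1/Gamma_s ~ {tau * 1e9:.2f} ns")
```

This gives $\Gamma_s/\omega \approx 3.9\times10^{-7}$ and $\tau \approx 0.86$ ns, consistent with the "close to 1 ns" quoted in the text.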


Note, however, that the above estimate of $\tau$ is only valid for a transition with a non-zero electric-dipole 
matrix element. If it equals zero, i.e. the transition does not satisfy the selection rules²⁶ (say, 
due to the initial and final state symmetry), it is "forbidden". A "forbidden" transition may still take 
place due to a different, weaker interaction (say, via a magnetic dipole field of the atom, or its 
quadrupole electric field²⁷), but takes much longer. In some cases the increase of $\tau$ is rather dramatic, 
sometimes to hours! Such long-lasting radiation is called luminescence, or fluorescence if the 
initial atom's excitation was due to external radiation of a higher frequency, followed first by non-radiative 
transitions down the ladder of energy levels. 


Now let us consider a more general case when the electromagnetic field mode of frequency $\omega$ is 
initially in an arbitrary Fock state $n$, and from it may either get energy $\hbar\omega$ from the atomic system 
(photon emission) or, vice versa, give such energy back to the atom (photon absorption). For the photon 
emission rate, an evident generalization of Eq. (48) gives 

$$\frac{\Gamma_e}{\Gamma_s} = \frac{\left|\langle\text{fin}|\hat a^\dagger|n\rangle\right|^2}{\left|\langle 1|\hat a^\dagger|0\rangle\right|^2}, \qquad (9.55)$$

where both brackets should be calculated in the Schrödinger picture, and $\Gamma_s$ is the spontaneous emission 
rate (48) of the same atomic system. According to the second of Eqs. (19), at photon emission the 
final field state has to be the Fock state with $n' = n + 1$, and Eq. (55) yields 

$$\Gamma_e = \Gamma_s(n + 1). \qquad (9.56)$$


Thus the initial field increases the photon emission rate; this effect is called the stimulated emission of 
radiation. Note that the spontaneous emission may be considered as a particular case of the stimulated 
emission for n = 0, and hence interpreted as the emission stimulated by the ground state of the 
electromagnetic field — one more manifestation of the non-trivial nature of this “vacuum” state. 


On the other hand, following the arguments of Sec. 2,²⁸ for the description of radiation 
absorption the photon creation operator has to be replaced with the annihilation operator, giving the 
rate ratio 


25 The scale $c\tau$ of the spatial extension of the corresponding wave packet is surprisingly macroscopic — in the 
range of a few millimeters. Such a “human” size of spontaneously emitted photons makes the usual optical table, 
with its 1-cm-scale components, the key equipment for many optical experiments — see, e.g., Fig. 2. 

26 As was already discussed in Sec. 5.6, for a single spin-less particle moving in a spherically-symmetric potential 
(e.g., a hydrogen-like atom), the orbital selection rules are simple: the only allowed electric-dipole transitions are 
those with $\Delta l \equiv l_{\rm fin} - l_{\rm ini} = \pm1$ and $\Delta m \equiv m_{\rm fin} - m_{\rm ini} = 0$ or $\pm1$. The simplest example of a transition that does not 
satisfy this rule, i.e. is "forbidden", is that between the s-states ($l = 0$) with $n = 2$ and $n = 1$; because of that, the 
lifetime of the lowest excited s-state of a hydrogen atom is as long as ~0.15 s. 

27 See, e.g., EM Sec. 8.9. 

28 Note, however, a major difference between the rate $\Gamma$ discussed in Sec. 2, and $\Gamma_a$ in Eq. (57). In our current 
case, the atomic transition is still between two discrete energy levels (see Fig. 4 below), so that the rate $\Gamma_a$ is 
$$\frac{\Gamma_a}{\Gamma_s} = \frac{\left|\langle\text{fin}|\hat a|n\rangle\right|^2}{\left|\langle 1|\hat a^\dagger|0\rangle\right|^2}. \qquad (9.57)$$

According to this relation and the first of Eqs. (19), the final state of the field at photon absorption 
has to be the Fock state with $n' = n - 1$, and Eq. (57) yields 

$$\Gamma_a = \Gamma_s\,n. \qquad (9.58)$$


The results (56) and (58) are usually formulated in terms of relations between the Einstein 
coefficients $A$ and $B$ defined in the way shown in Fig. 4, where the two energy levels are those of the 
atom, $\Gamma_a$ is the rate of energy absorption from the electromagnetic field in its $n$th Fock state, and $\Gamma_e$ is that 
of energy emission into the field, initially in the same state. In this notation, Eqs. (56) and (58) yield²⁹ 

$$A_{21} = B_{21} = B_{12}, \qquad (9.59)$$

because each of these coefficients equals the spontaneous emission rate $\Gamma_s$. 
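Eqs. (55)-(58) reduce to matrix elements of the ladder operators, which the sketch below reads off a truncated matrix representation (an illustrative device; any truncation larger than the largest $n$ used gives the same numbers). It confirms $\Gamma_e/\Gamma_s = n+1$ and $\Gamma_a/\Gamma_s = n$: the $n$-independent part of $\Gamma_e$ is the spontaneous rate, and the coefficients of $n$ in $\Gamma_e$ and $\Gamma_a$ coincide, which is the content of Eq. (59).

```python
import numpy as np

dim = 20
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)    # a|n> = sqrt(n)|n-1>
a_dag = a.T                                      # real matrix, so the adjoint is the transpose

spontaneous = abs(a_dag[1, 0]) ** 2              # |<1|a+|0>|^2, denominator of Eqs. (9.55), (9.57)
emission, absorption = [], []
for n in range(6):
    emission.append(abs(a_dag[n + 1, n]) ** 2 / spontaneous)                  # Gamma_e / Gamma_s
    absorption.append(abs(a[n - 1, n]) ** 2 / spontaneous if n > 0 else 0.0)  # Gamma_a / Gamma_s
    print(f"n = {n}: Gamma_e/Gamma_s = {emission[-1]:.1f}, Gamma_a/Gamma_s = {absorption[-1]:.1f}")
```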


Fig. 9.4. The Einstein coefficients on the atomic quantum transition diagram (cf. Fig. 7.6). 
I cannot resist the temptation to use this point for a small detour: an alternative derivation of the 
Bose-Einstein statistics for photons. Indeed, in thermodynamic equilibrium, the average probability 
flows between levels 1 and 2 (see Fig. 4 again) should be equal:³⁰ 

$$W_2\langle\Gamma_e\rangle = W_1\langle\Gamma_a\rangle, \qquad (9.60)$$

where $W_1$ and $W_2$ are the probabilities for the atomic system to occupy the corresponding levels, so that 
Eqs. (56) and (58) yield 

$$W_2\Gamma_s(1 + \langle n\rangle) = W_1\Gamma_s\langle n\rangle, \quad \text{i.e.} \quad \frac{W_2}{W_1} = \frac{\langle n\rangle}{\langle n\rangle + 1}, \qquad (9.61)$$

where $\langle n\rangle$ is the average number of photons in the field causing the interstate transitions. But, on the 
other hand, for an atomic subsystem only weakly coupled to its electromagnetic environment, we ought 
to have the Gibbs distribution of these probabilities: 

$$\frac{W_2}{W_1} = \frac{\exp\{-E_2/k_BT\}}{\exp\{-E_1/k_BT\}} = \exp\left\{-\frac{\hbar\omega}{k_BT}\right\}. \qquad (9.62)$$


proportional to $\rho_f$, the density of final states of the electromagnetic field, i.e. the same density as in Eq. (48) and 
beyond, while the rate (27) is proportional to $\rho_b$, the density of final (ionized) states of the "trigger" atom, or more 
exactly, of the electron released at its ionization. 

29 These relations were conjectured, from very general arguments, by Albert Einstein as early as 1916. 

30 This is just a particular embodiment of the detailed balance equation (7.198). 
Requiring Eqs. (61) and (62) to give the same result for the probability ratio, we get the Bose-Einstein 
distribution for the electromagnetic field in thermal equilibrium: 

$$\langle n\rangle = \frac{1}{\exp\{\hbar\omega/k_BT\} - 1}, \qquad (9.63)$$

which is the same result as that obtained in Sec. 7.1 by other means; see Eq. (7.26b). 
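The detailed-balance argument can also be run numerically. The sketch solves Eq. (61) combined with Eq. (62), $\langle n\rangle/(\langle n\rangle+1) = \exp\{-\hbar\omega/k_BT\}$, for $\langle n\rangle$ by simple bisection and compares the root with the Bose-Einstein formula (63). The bracketing interval and iteration count are illustrative assumptions that work as long as $\hbar\omega/k_BT$ is not extremely small.

```python
import math

def occupancy_from_detailed_balance(x, hi=1e6, iterations=200):
    """Solve n/(n+1) = exp(-x), with x = hbar*omega/(k_B T), for n >= 0 by bisection."""
    target = math.exp(-x)            # W2/W1 from the Gibbs distribution, Eq. (9.62)

    def mismatch(n):                 # W2/W1 from Eq. (9.61) minus the Gibbs value
        return n / (n + 1.0) - target

    lo = 0.0                         # mismatch(0) < 0, and mismatch(hi) > 0 for x > ~1e-6
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for x in (0.01, 0.5, 3.0):
    bose_einstein = 1.0 / math.expm1(x)           # Eq. (9.63)
    print(x, occupancy_from_detailed_balance(x), bose_einstein)
```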


Now returning to the discussion of Eqs. (56) and (58), their very important implication is the 
possibility to achieve the stimulated emission of coherent radiation using level occupancy inversion. 
Indeed, if the ratio $W_2/W_1$ is larger than that given by Eq. (62), the net power flow from the atomic 
system into the electromagnetic field, 

$$\text{Power} = \hbar\omega\,\Gamma_s\left[W_2(\langle n\rangle + 1) - W_1\langle n\rangle\right], \qquad (9.64)$$

may be positive. The necessary inversion may be produced in several ways, notably by intensive 
quantum transitions to level 2 from an even higher energy level (which, in turn, is populated, e.g., by 
absorption of external radiation, usually called pumping, at a higher frequency). 
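A small sketch of Eq. (64), in units where $\hbar\omega\Gamma_s = 1$. It holds $\langle n\rangle$ at its thermal value (63), which is an illustrative simplification, since in a working laser $\langle n\rangle$ is itself set by the energy balance. With that assumption the net power vanishes for Gibbs occupancies and becomes positive as soon as $W_2/W_1$ exceeds the ratio (62).

```python
import math

def net_power(w1, w2, mean_n, hbar_omega=1.0, gamma_s=1.0):
    """Eq. (9.64): net power flow from the two-level system into the field mode."""
    return hbar_omega * gamma_s * (w2 * (mean_n + 1.0) - w1 * mean_n)

x = 2.0                                   # hbar*omega / (k_B T), illustrative
n_thermal = 1.0 / math.expm1(x)           # Eq. (9.63)
gibbs_ratio = math.exp(-x)                # W2/W1 in equilibrium, Eq. (9.62)

for ratio in (gibbs_ratio, 1.1 * gibbs_ratio, 2.0):   # W2/W1; the last one is an inversion
    w1 = 1.0 / (1.0 + ratio)
    w2 = ratio * w1                       # so that W1 + W2 = 1
    print(f"W2/W1 = {ratio:.3f}: net power = {net_power(w1, w2, n_thermal):+.4f}")
```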


A less obvious, but crucial feature of the stimulated emission is spelled out by Eq. (55): as was 
mentioned above, it shows that the final state of the field after the absorption of energy $\hbar\omega$ from the 
atom is a pure (coherent) Fock state $(n + 1)$. Colloquially, one may say that the new, $(n + 1)$th photon 
emitted from the atom is automatically in phase with the $n$ photons that had been in the field mode 
initially, i.e. joins them coherently.³¹ The idea of stimulated emission of coherent radiation using 
population inversion³² was first implemented in the early 1950s in the microwave range (masers) and in 
1960 in the optical range (lasers). Nowadays, lasers are ubiquitous components of almost all high-tech 
systems and constitute one of the cornerstones of our technological civilization. 


A quantitative discussion of laser operation is well beyond the framework of this course, and I have to refer the reader to special literature,³³ but I would still like to briefly mention two key points:


(i) In a typical laser, each generated electromagnetic field mode is in its Glauber (rather than Fock) state, so that Eqs. (56) and (58) are applicable only with n averaged over the Fock-state decomposition of the Glauber state; see Eq. (5.134).


(ii) Since in a typical laser ⟨n⟩ ≫ 1, its operation may be well described by quasiclassical theories that use Eq. (64) to describe the electromagnetic energy balance (with the addition of a term describing the energy loss due to field absorption in external components of the laser, including the useful load), plus the equation describing the balance of the occupancies W₁,₂ due to all interlevel transitions, similar to Eq. (60) but also including the contribution(s) from the particular population inversion mechanism used in the laser. In this approach, the role of quantum mechanics in laser science is essentially reduced to the calculation of the parameter Γₛ for the particular system.


This role becomes more prominent when one needs to describe fluctuations of the laser field. Here two approaches are possible, following the two options discussed in Chapter 7. If the fluctuations are relatively small, one can linearize the Heisenberg equations of motion of the field oscillator operators near their stationary-lasing "values", with the Langevin "forces" (also time-dependent operators) describing the fluctuation sources, and use these Heisenberg-Langevin equations to calculate the radiation fluctuations, just as was described in Sec. 7.5. On the other hand, near the lasing threshold, the field fluctuations are relatively large, smearing the phase transition between the no-lasing and lasing states. Here linearization is not an option, but one can use the density-matrix approach described in Sec. 7.6 for the fluctuation analysis.³⁴ Note that while laser fluctuations may look like a peripheral issue, pioneering research in that field has led to the development of the general theory of open quantum systems, which was discussed in Chapter 7.


31 It is straightforward to show that this fact is also true if the field is initially in a Glauber state, which is more typical for modes in practical lasers.

32 This idea may be traced back at least to an obscure 1939 publication by V. Fabrikant.

33 I can recommend, for example, P. Milonni and J. Eberly, Laser Physics, 2nd ed., Wiley, 2010, and a less technical text by A. Yariv, Quantum Electronics, 3rd ed., Wiley, 1989.


9.4. Cavity QED 


Now I have to visit, at least in passing, the field of cavity quantum electrodynamics (usually called cavity QED for short): the art and science of creating and using entanglement between quantum states of an atomic system (an atom, an ion, a molecule, etc.) and the electromagnetic field in a macroscopic volume called the resonant cavity (or just "resonator", or just "cavity"). This field is very popular nowadays, especially in the context of the quantum computation and communication research discussed in Sec. 8.5.³⁵


The discussion in the previous section was based on the implicit assumption that the energy spectrum of the electromagnetic field interacting with an atomic subsystem is essentially continuous, so that its final state is spread among many field modes, effectively losing its coherence with the quantum state of the atomic subsystem. This assumption justified using the quantum-mechanical Golden Rule for the calculation of the spontaneous and stimulated transition rates. However, the assumption becomes invalid if the electromagnetic field is contained inside a relatively small volume, with its linear size comparable with the radiation wavelength. If the walls of such a cavity mostly reflect, rather than absorb, radiation, then in the 0th approximation the energy dissipation may be disregarded, and the particular solutions e_j(r) of the Helmholtz equation (5) correspond to discrete, well-separated mode wave numbers k_j and hence well-separated frequencies ω_j.³⁶ Due to energy conservation, an atomic transition corresponding to the energy ΔE = |E_fin − E_ini| may be effective only if the corresponding quantum transition frequency Ω = ΔE/ħ is close to one of these resonance frequencies.³⁷ As a result of such resonant interaction, the quantum states of the atomic system and the resonant electromagnetic mode may become entangled.


34 This path has been developed (also in the mid-1960s) by several researchers, notably including M. Scully and W. Lamb; see, e.g., M. Sargent III, M. Scully, and W. Lamb, Jr., Laser Physics, Westview, 1977.

35 This popularity was demonstrated, for example, by the award of the 2012 Nobel Prize in Physics to cavity QED experimentalists S. Haroche and D. Wineland.

36 The calculation of such modes and corresponding frequencies for several simple cavity geometries was the subject of EM Sec. 7.8 of this series.

37 On the contrary, if Ω is far from any ω_j, the interaction is suppressed; in particular, the spontaneous emission rate may be much lower than that given by Eq. (53), so that this result is not as fundamental as it may look.

38 After the pioneering work by I. Rabi in 1936-37.


A very popular approximation for the quantitative description of this effect is the so-called Rabi model,³⁸ in which the atom is treated as a two-level system interacting with a single electromagnetic field mode of the resonant cavity. (As was shown in Sec. 6.5, this model is justified, e.g., if transitions between all other energy level pairs have considerably different frequencies.) As the reader knows well from Chapters 4-6 (see in particular Sec. 5.1), any two-level system may be described, just as a spin-½,


by the Hamiltonian bÎ + c·σ̂. Since we may always select the energy origin so that b = 0, and the state basis in which the vector c is directed along the z-axis (c = c n_z), the Hamiltonian of the atomic subsystem may be taken in the diagonal form

$$\hat H_a = c\,\hat\sigma_z \equiv \frac{\hbar\Omega}{2}\hat\sigma_z, \tag{9.65}$$

where ħΩ = 2c = ΔE is the difference between the energy levels in the absence of interaction with the field. Next, according to Eq. (17), ignoring the constant ground-state energy ħω/2 (which may always be added to the energy at the end, if necessary), the contribution of a single field mode of frequency ω to the total Hamiltonian of the system is


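$$\hat H_f = \hbar\omega\,\hat a^\dagger\hat a. \tag{9.66}$$

Finally, according to Eq. (16a), the electric field of the mode may be represented as

$$\hat{\boldsymbol{\mathcal E}}(\mathbf r,t) = i\left(\frac{\hbar\omega}{2}\right)^{1/2}\mathbf e(\mathbf r)\left(\hat a - \hat a^\dagger\right), \tag{9.67}$$

so that in the electric-dipole approximation (24), the cavity-atom interaction may be represented as a product of the field by some (say, y-) Cartesian component³⁹ of the Pauli spin-½ operator:

$$\hat H_{\rm int} = \text{const}\times\hat\sigma_y\times\hat{\boldsymbol{\mathcal E}} = \text{const}\times\hat\sigma_y\times i\left(\frac{\hbar\omega}{2}\right)^{1/2}\left(\hat a - \hat a^\dagger\right) \equiv i\hbar\kappa\,\hat\sigma_y\left(\hat a - \hat a^\dagger\right), \tag{9.68}$$

where κ is a coupling constant (with the dimension of frequency). The sum of these three terms,

$$\hat H = \frac{\hbar\Omega}{2}\hat\sigma_z + \hbar\omega\,\hat a^\dagger\hat a + i\hbar\kappa\,\hat\sigma_y\left(\hat a - \hat a^\dagger\right), \tag{9.69}$$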


giving a very reasonable description of the system, is called the Rabi Hamiltonian. Despite its apparent simplicity, using this Hamiltonian for calculations is not that straightforward.⁴⁰ Only when the electromagnetic field is large, and hence may be treated classically, are the results following from Eq. (69) reduced to Eqs. (6.94), describing, in particular, the Rabi oscillations discussed in Sec. 6.3.


39 The exact component is not important for final results, while intermediate formulas simplify if the interaction is proportional to either pure σ̂_x or pure σ̂_y.

40 For example, an exact quasi-analytical expression for its eigenenergies (as zeros of a Taylor series in the parameter κ, with coefficients determined by a recurrence relation) was found only recently; see D. Braak, Phys. Rev. Lett. 107, 100401 (2011).


The situation becomes simpler in the most important case when the frequencies Ω and ω are very close, enabling an effective interaction between the cavity field and the atom even if the coupling constant κ is relatively small. Indeed, if both κ and the so-called detuning (defined similarly to the parameter Δ used in Sec. 6.5),

$$\xi \equiv \Omega - \omega, \tag{9.70}$$

are much smaller than Ω ≈ ω, the Rabi Hamiltonian may be simplified using the rotating-wave approximation, already used several times in this course. For this, it is convenient to use the spin ladder operators, defined similarly to those of the orbital angular momentum; see Eqs. (5.153):


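$$\hat\sigma_\pm \equiv \hat\sigma_x \pm i\hat\sigma_y, \qquad \text{so that} \qquad \hat\sigma_y = \frac{\hat\sigma_+ - \hat\sigma_-}{2i}. \tag{9.71}$$

From Eq. (4.105), it is very easy to find the matrices of these operators in the standard z-basis,

$$\sigma_+ = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}, \qquad \sigma_- = \begin{pmatrix} 0 & 0 \\ 2 & 0 \end{pmatrix}, \tag{9.72}$$

and their commutation rules, which turn out to be naturally similar to Eqs. (5.154):

$$\left[\hat\sigma_+, \hat\sigma_-\right] = 4\hat\sigma_z, \qquad \left[\hat\sigma_z, \hat\sigma_\pm\right] = \pm 2\hat\sigma_\pm. \tag{9.73}$$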
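The ladder-operator algebra (71)-(73) can be checked mechanically. The sketch below does this in plain Python (no linear-algebra library is assumed; the helper names `matmul`, `lin`, `comm`, and `close` are ad hoc, introduced only for this illustration):

```python
# Pure-Python 2x2 matrices as lists of rows
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def lin(*terms):
    """Linear combination sum(c * M) over (c, M) pairs."""
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)] for i in range(n)]

def comm(a, b):
    return lin((1, matmul(a, b)), (-1, matmul(b, a)))

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Pauli matrices in the standard z-basis, Eq. (4.105)
sx = [[0, 1], [1, 0]]
sy = [[0, -1j], [1j, 0]]
sz = [[1, 0], [0, -1]]

# Ladder operators (71)
sp = lin((1, sx), (1j, sy))
sm = lin((1, sx), (-1j, sy))

assert close(sp, [[0, 2], [0, 0]]) and close(sm, [[0, 0], [2, 0]])   # Eq. (72)
assert close(comm(sp, sm), lin((4, sz)))                              # Eq. (73)
assert close(comm(sz, sp), lin((2, sp))) and close(comm(sz, sm), lin((-2, sm)))
assert close(lin((1 / 2j, sp), (-1 / 2j, sm)), sy)                   # sigma_y via (71)
print("Eqs. (71)-(73) check out")
```

Note the factor 2 in Eq. (72): with the definition (71), σ̂_+ takes |↓⟩ to 2|↑⟩, unlike the normalized raising operator used in some other texts.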
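In this notation, the Rabi Hamiltonian becomes

$$\hat H = \frac{\hbar\Omega}{2}\hat\sigma_z + \hbar\omega\,\hat a^\dagger\hat a + \frac{\hbar\kappa}{2}\left(\hat\sigma_+ - \hat\sigma_-\right)\left(\hat a - \hat a^\dagger\right), \tag{9.74}$$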


and it is straightforward to use Eqs. (4.199) and (73) to derive the Heisenberg-picture equations of motion for the involved operators. (Doing this, we have to remember that operators of the "spin" subsystem, on one hand, and of the field mode, on the other hand, are defined in different Hilbert spaces and hence commute, at least at coinciding time moments.) The result (so far, exact!) is


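$$\begin{aligned}
\dot{\hat a} &= -i\omega\hat a + \frac{i\kappa}{2}\left(\hat\sigma_+ - \hat\sigma_-\right), &\qquad \dot{\hat a}^\dagger &= i\omega\hat a^\dagger + \frac{i\kappa}{2}\left(\hat\sigma_+ - \hat\sigma_-\right),\\
\dot{\hat\sigma}_\pm &= \pm i\Omega\hat\sigma_\pm + 2i\kappa\,\hat\sigma_z\left(\hat a - \hat a^\dagger\right), &\qquad \dot{\hat\sigma}_z &= i\kappa\left(\hat a^\dagger - \hat a\right)\left(\hat\sigma_+ + \hat\sigma_-\right).
\end{aligned} \tag{9.75}$$

At negligible coupling, κ → 0, these equations have simple solutions,

$$\hat a(t) \propto e^{-i\omega t}, \qquad \hat a^\dagger(t) \propto e^{i\omega t}, \qquad \hat\sigma_\pm(t) \propto e^{\pm i\Omega t}, \qquad \hat\sigma_z(t) \approx \text{const}, \tag{9.76}$$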


and the small terms proportional to κ on the right-hand sides of Eqs. (75) cannot affect these time evolution laws dramatically even if κ is not exactly zero. Of those terms, the ones with frequencies close to the "basic" frequency of each variable act in resonance and hence may have a substantial impact on the system's dynamics, while non-resonant terms may be ignored. In this rotating-wave approximation, Eqs. (75) are reduced to a much simpler system of equations:

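$$\begin{aligned}
&\dot{\hat a} = -i\omega\hat a - \frac{i\kappa}{2}\hat\sigma_-, \qquad \dot{\hat a}^\dagger = i\omega\hat a^\dagger + \frac{i\kappa}{2}\hat\sigma_+,\\
&\dot{\hat\sigma}_+ = i\Omega\hat\sigma_+ - 2i\kappa\,\hat a^\dagger\hat\sigma_z, \qquad \dot{\hat\sigma}_- = -i\Omega\hat\sigma_- + 2i\kappa\,\hat a\hat\sigma_z, \qquad \dot{\hat\sigma}_z = i\kappa\left(\hat a^\dagger\hat\sigma_- - \hat a\hat\sigma_+\right).
\end{aligned} \tag{9.77}$$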

Alternatively, these equations of motion may be obtained exactly from the Rabi Hamiltonian (74), if it is preliminarily cleared of the terms proportional to σ̂_+â† and σ̂_−â, which oscillate fast and hence self-average to produce virtually zero effect:




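$$\hat H_{\rm JC} = \frac{\hbar\Omega}{2}\hat\sigma_z + \hbar\omega\,\hat a^\dagger\hat a + \frac{\hbar\kappa}{2}\left(\hat\sigma_+\hat a + \hat\sigma_-\hat a^\dagger\right), \qquad \kappa, |\xi| \ll \omega, \Omega. \tag{9.78}$$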


This is the famous Jaynes-Cummings Hamiltonian,⁴¹ which is the basic model used in cavity QED and its applications.⁴² To find its eigenstates and eigenenergies, let us note that at negligible interaction (κ → 0), the spectrum of the total energy E of the system, which in this limit is the sum of two independent contributions from the atomic and cavity-field subsystems,

$$E = \pm\frac{\hbar\Omega}{2} + \hbar\omega n, \qquad \text{with } n = 0, 1, 2, \ldots, \tag{9.79}$$

consists⁴³ of close level pairs (Fig. 5) centered at the values

$$E_n = \hbar\omega\left(n - \frac{1}{2}\right). \tag{9.80}$$


(At the exact resonance ω = Ω, i.e. at ξ = 0, each pair merges into one doubly-degenerate level E_n.) Since at κ → 0 the two subsystems do not interact, the eigenstates corresponding to the sublevels of the nth pair may be represented by direct products of their independent state vectors:


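$$|+\rangle = |\!\uparrow\rangle\otimes|n-1\rangle \qquad \text{and} \qquad |-\rangle = |\!\downarrow\rangle\otimes|n\rangle, \tag{9.81}$$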


where the first ket of each product represents the state of the two-level (spin-½-like) atomic subsystem, and the second ket, that of the field oscillator.
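The operator equations (77) can be cross-checked numerically against the Jaynes-Cummings Hamiltonian (78) in a truncated Fock space. The sketch below is plain Python. The truncation N = 6, the parameter values, and all helper names are arbitrary assumptions of the illustration, with ħ = 1. It computes −i[X̂, Ĥ] for X̂ = σ̂_+, σ̂_z, and â, and compares the results with the right-hand sides of Eqs. (77). It also shows that the excitation number â†â + σ̂_z/2 commutes with Ĥ_JC but not with the full Rabi Hamiltonian (74), whose counter-rotating terms change it by ±2:

```python
import math

def kron(a, b):
    """Kronecker (direct) product of square matrices given as lists of rows."""
    nb = len(b)
    size = len(a) * nb
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(size)]
            for i in range(size)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def lin(*terms):
    """Linear combination sum(c * M) over (c, M) pairs."""
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)] for i in range(n)]

def comm(a, b):
    return lin((1, matmul(a, b)), (-1, matmul(b, a)))

def max_diff(a, b, keep=lambda i: True):
    """Largest |a_ij - b_ij| over the rows and columns selected by `keep`."""
    return max(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(len(a))
               if keep(i) and keep(j))

N = 6                              # Fock-space truncation (assumption of the sketch)
OMEGA, W, KAPPA = 1.3, 1.0, 0.05   # Omega, omega, kappa; hbar = 1

iN = [[float(i == j) for j in range(N)] for i in range(N)]
a = [[math.sqrt(j) if j == i + 1 else 0.0 for j in range(N)] for i in range(N)]
ad = [list(row) for row in zip(*a)]          # a^dagger (a is real)
Sz = kron([[1, 0], [0, -1]], iN)
Sp = kron([[0, 2], [0, 0]], iN)              # sigma_+ of Eq. (72)
Sm = kron([[0, 0], [2, 0]], iN)
A, Ad = kron([[1, 0], [0, 1]], a), kron([[1, 0], [0, 1]], ad)

# Jaynes-Cummings Hamiltonian (78) and Rabi Hamiltonian (74), hbar = 1
H_JC = lin((OMEGA / 2, Sz), (W, matmul(Ad, A)),
           (KAPPA / 2, matmul(Sp, A)), (KAPPA / 2, matmul(Sm, Ad)))
H_RABI = lin((OMEGA / 2, Sz), (W, matmul(Ad, A)),
             (KAPPA / 2, matmul(lin((1, Sp), (-1, Sm)), lin((1, A), (-1, Ad)))))

def heisenberg(x, h):
    """dX/dt = [X, H] / (i hbar), with hbar = 1."""
    return lin((-1j, comm(x, h)))

# Compare with the right-hand sides of Eqs. (77)
err_sp = max_diff(heisenberg(Sp, H_JC),
                  lin((1j * OMEGA, Sp), (-2j * KAPPA, matmul(Ad, Sz))))
err_sz = max_diff(heisenberg(Sz, H_JC),
                  lin((1j * KAPPA, matmul(Ad, Sm)), (-1j * KAPPA, matmul(A, Sp))))
# [a, a^dagger] = 1 fails at the top level of a truncated Fock space,
# so the a-equation is compared only on states with n <= N - 2.
err_a = max_diff(heisenberg(A, H_JC), lin((-1j * W, A), (-0.5j * KAPPA, Sm)),
                 keep=lambda i: i % N < N - 1)

# Excitation number a^dagger a + sigma_z / 2: conserved by (78) but not by (74)
N_EXC = lin((1, matmul(Ad, A)), (0.5, Sz))
leak_jc = max(abs(x) for row in comm(N_EXC, H_JC) for x in row)
leak_rabi = max(abs(x) for row in comm(N_EXC, H_RABI) for x in row)
print(err_sp, err_sz, err_a, leak_jc, leak_rabi)
```

Truncating the field oscillator is harmless for the σ̂-equations, because every matrix element they involve is exact at any N; only the â-equation needs the top Fock level excluded.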


[Figure: level diagrams labeled "atom", "field", and "total system". The total-system levels ±ħΩ/2 + nħω form close pairs around E_1 = ħω/2, E_2 = 3ħω/2, ..., above the non-degenerate ground level E_0 = −ħΩ/2.]


Fig. 9.5. The energy spectrum (79) of the Jaynes-Cummings Hamiltonian in the limit κ ≪ |ξ|. Note again that the energy is referred to the ground-state energy ħω/2 of the cavity field.


41 It was first proposed and analyzed in 1963 by two engineers, Edwin Jaynes and Fred Cummings, in a Proc. IEEE publication, and it took the physics community a while to recognize and acknowledge the fundamental importance of that work.

42 For most applications, the baseline Hamiltonian (78) has to be augmented by additional term(s) describing, for example, the incoming radiation and/or the system's coupling to the environment, e.g., due to the electromagnetic energy loss in a finite-Q-factor cavity; see Eq. (7.68).

43 Only the ground-state level E_0 = −ħΩ/2 is non-degenerate; see Fig. 5.


As we know from Chapter 6, even a weak interaction may lead to strong coherent mixing⁴⁴ of quantum states with close energies (in this case, the two states (81) within each pair with the same n), while their mixing with states of more distant energies is still negligible. Hence, at 0 < κ, |ξ| ≪ ω ≈ Ω, a good approximation of the eigenstate with E ≈ E_n is given by a linear superposition of the states (81):


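$$|\alpha_n\rangle = c_+|+\rangle + c_-|-\rangle \equiv c_+|\!\uparrow\rangle\otimes|n-1\rangle + c_-|\!\downarrow\rangle\otimes|n\rangle, \tag{9.82}$$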


with certain c-number coefficients c_±. This relation describes the entanglement of the atomic eigenstates ↑ and ↓ with, respectively, the Fock states number n − 1 and n of the field mode. Let me leave the (straightforward) calculation of the coefficients (c_±)_± for each of the two entangled states (for each n) for the reader's exercise. (The result for the corresponding two eigenenergies (E_n)_± may again be represented by the same anticrossing diagram as shown in Figs. 2.29 and 5.1, now with the detuning ξ as the argument.) This calculation shows, in particular, that at ξ = 0 (i.e. at ω = Ω), |c_+| = |c_−| = 1/√2 for both states of the pair. This fact may be interpreted as a (coherent!) equal sharing of an energy quantum ħω = ħΩ by the atom and the cavity field at the exact resonance.
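The calculation left as an exercise can be sketched numerically. Assume ħ = 1 and count energies from E_n. Then the Hamiltonian (78) restricted to the pair (81) is the 2×2 matrix [[ξ/2, κ√n], [κ√n, −ξ/2]]. The off-diagonal element follows from σ̂_+|↓⟩ = 2|↑⟩ and â|n⟩ = √n|n − 1⟩. The sketch below (function names are ad hoc) diagonalizes this block via a mixing angle, checks the eigenpairs independently, and prints the anticrossing of the n = 1 doublet:

```python
import math

def jc_doublet(n, xi, kappa):
    """Eigenpairs of the n-th Jaynes-Cummings doublet (n >= 1), with hbar = 1.

    Basis: |+> = |up>|n-1>, |-> = |down>|n>; energies are counted from E_n.
    Returns [(energy, (c_plus, c_minus)) for the upper state, then for the lower].
    """
    d, g = xi / 2.0, kappa * math.sqrt(n)   # diagonal and off-diagonal elements
    r = math.hypot(d, g)                     # half of the doublet splitting
    theta = 0.5 * math.atan2(g, d)           # mixing angle: tan(2 theta) = g / d
    return [(r, (math.cos(theta), math.sin(theta))),
            (-r, (-math.sin(theta), math.cos(theta)))]

def residual(n, xi, kappa, energy, vec):
    """|H v - E v| for the 2x2 block, as an independent check of the eigenpairs."""
    d, g = xi / 2.0, kappa * math.sqrt(n)
    hv0, hv1 = d * vec[0] + g * vec[1], g * vec[0] - d * vec[1]
    return math.hypot(hv0 - energy * vec[0], hv1 - energy * vec[1])

# Anticrossing of the n = 1 doublet as a function of the detuning xi
for xi in (-0.4, -0.1, 0.0, 0.1, 0.4):
    (e_up, (cp, cm)), (e_dn, _) = jc_doublet(1, xi, kappa=0.05)
    print(f"xi = {xi:+.2f}: E - E_1 = {e_up:+.4f}, {e_dn:+.4f}; "
          f"|c+|, |c-| = {abs(cp):.3f}, {abs(cm):.3f}")
```

The resulting levels are E_n ± ħ(ξ²/4 + nκ²)^{1/2}, so the minimal splitting at ξ = 0 is 2ħκ√n and grows with the photon number n.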


A (hopefully, self-evident) by-product of the calculation of c_± is the fact that the dynamics of the state α_n described by Eq. (82) is similar to that of the generic two-level system that was repeatedly discussed in this course, for the first time in Sec. 2.6 and then in Chapters 4-6. In particular, if the composite system had been initially prepared in one component state, for example |↑⟩⊗|0⟩ (i.e. with the atom excited while the cavity is in its ground state), and then allowed to evolve on its own, after some time interval Δt ~ 1/κ it may be found definitely in the counterpart state |↓⟩⊗|1⟩, including the first excited Fock state n = 1 of the field mode. If the process is allowed to continue, after an equal time interval Δt the system returns to the initial state |↑⟩⊗|0⟩, etc. This most striking prediction of the Jaynes-Cummings model was directly observed, by G. Rempe et al., only in 1987, although less directly this model was repeatedly confirmed by numerous experiments carried out in the 1960s and 1970s.
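The oscillation |↑⟩⊗|0⟩ ↔ |↓⟩⊗|1⟩ can be reproduced by integrating the Schrödinger equation within the n = 1 doublet. In the sketch below, ħ = 1, and the RK4 integrator, step count, and parameter values are arbitrary choices. The numerical occupancy of |↓⟩⊗|1⟩ is compared with the standard two-level (Rabi) formula P(t) = [nκ²/(nκ² + ξ²/4)] sin²[(nκ² + ξ²/4)^{1/2} t]. At ξ = 0 the transfer is complete at t = π/2κ, consistent with Δt ~ 1/κ:

```python
import math

def evolve(n, xi, kappa, t_end, steps=4000):
    """Integrate i dc/dt = H c in the n-th doublet by RK4 (hbar = 1).

    Starts from c = (1, 0), i.e. |up>|n-1>; returns the final (c_plus, c_minus).
    """
    d, g = xi / 2.0, kappa * math.sqrt(n)

    def rhs(c):
        cp, cm = c
        return (-1j * (d * cp + g * cm), -1j * (g * cp - d * cm))

    c, h = (1.0 + 0j, 0j), t_end / steps
    for _ in range(steps):
        k1 = rhs(c)
        k2 = rhs(tuple(ci + 0.5 * h * ki for ci, ki in zip(c, k1)))
        k3 = rhs(tuple(ci + 0.5 * h * ki for ci, ki in zip(c, k2)))
        k4 = rhs(tuple(ci + h * ki for ci, ki in zip(c, k3)))
        c = tuple(ci + h / 6.0 * (a1 + 2 * a2 + 2 * a3 + a4)
                  for ci, a1, a2, a3, a4 in zip(c, k1, k2, k3, k4))
    return c

def rabi_formula(n, xi, kappa, t):
    """Occupancy of |down>|n> from the standard two-level (Rabi) formula."""
    d, g = xi / 2.0, kappa * math.sqrt(n)
    r = math.hypot(d, g)
    return (g / r) ** 2 * math.sin(r * t) ** 2

KAPPA = 0.05
t_swap = math.pi / (2 * KAPPA)     # at resonance, sin^2(kappa t) reaches 1 here
_, cm = evolve(1, 0.0, KAPPA, t_swap)
print(abs(cm) ** 2)                # close to 1: the excitation is now a cavity photon
cp, _ = evolve(1, 0.0, KAPPA, 2 * t_swap)
print(abs(cp) ** 2)                # close to 1: back to |up>|0>
```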


This quantized version of the Rabi oscillations can only persist in time if the inevitable electromagnetic energy losses (not described by the basic Jaynes-Cummings Hamiltonian) are somehow compensated, for example, by passing a beam of particles, externally excited into the higher-energy state ↑, through the cavity. If the losses become higher, the dissipation suppresses quantum coherence (in our case, the coherence between the two components of each pair (82)), as was discussed in Chapter 7. As a result, the transition from the higher-energy atomic state ↑ to the lower-energy state ↓, giving energy ħω to the cavity (n − 1 → n), which is then rapidly drained into the environment, becomes incoherent, so that the system's dynamics is reduced to the Purcell effect, already mentioned in Sec. 3. A quantitative analysis of this effect is left for the reader's exercise.


The number of interesting physics games one can play with such systems, say by adding external sources of radiation at a frequency close to ω and Ω, in particular with manipulated time-dependent amplitude and/or phase, is virtually unlimited.⁴⁵ Unfortunately, my time/space allowance for cavity QED is over, and for further discussion, I have to refer the interested reader to special literature.⁴⁶


44 In some fields, especially chemistry, such mixing is frequently called hybridization.

45 Most of them may be described by adding new terms to the basic Jaynes-Cummings Hamiltonian (78).

46 I can recommend, for example, either C. Gerry and P. Knight, Introductory Quantum Optics, Cambridge U. Press, 2005, or G. Agarwal, Quantum Optics, Cambridge U. Press, 2012.
9.5. The Klein-Gordon and relativistic Schrödinger equations


Now let me switch gears and discuss the basics of relativistic quantum mechanics of particles with a non-zero rest mass m. In the ultra-relativistic limit pc ≫ mc², the quantization scheme of such particles may be essentially the same as for electromagnetic waves, but for the intermediate energy range, pc ~ mc², a more general approach is necessary. Historically, the first attempts⁴⁷ to extend non-relativistic wave mechanics into the relativistic energy range were based on performing the same transitions from classical observables to their quantum-mechanical operators as in the non-relativistic limit:

$$\mathbf p \to \hat{\mathbf p} = -i\hbar\nabla, \qquad E \to \hat H = i\hbar\frac{\partial}{\partial t}. \tag{9.83}$$

The substitution of these operators, acting on the Schrödinger-picture wavefunction Ψ(r, t), into the classical relation (1) between the energy E and momentum p of a free particle leads to the following formulas:


Table 9.1. Deriving the Klein-Gordon equation for a free relativistic particle.⁴⁸

                       Non-relativistic limit            Relativistic case
Classical mechanics    E = p²/2m                         E² = (cp)² + (mc²)²
Wave mechanics         iħ ∂Ψ/∂t = (1/2m)(−iħ∇)²Ψ         (iħ ∂/∂t)²Ψ = [c²(−iħ∇)² + (mc²)²]Ψ


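The resulting equation for the non-relativistic limit, in the left-bottom cell of the table, is just the usual Schrödinger equation (1.28) for a free particle. Its relativistic generalization, in the right-bottom cell, usually rewritten as

$$\left(\frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2 + \mu^2\right)\Psi = 0, \qquad \text{with } \mu \equiv \frac{mc}{\hbar}, \tag{9.84}$$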


is called the Klein-Gordon (or sometimes "Klein-Gordon-Fock") equation. The fundamental solutions of this equation are the same plane, monochromatic waves

$$\Psi(\mathbf r, t) \propto \exp\{i(\mathbf k\cdot\mathbf r - \omega t)\}, \tag{9.85}$$

as in the non-relativistic case. Indeed, such waves are eigenstates of the operators (83), with eigenvalues, respectively,

$$\mathbf p = \hbar\mathbf k \qquad \text{and} \qquad E = \hbar\omega, \tag{9.86}$$


so that their substitution into Eq. (84) immediately returns us to Eq. (1) with the replacements (86):

$$E_\pm \equiv \hbar\omega_\pm = \pm\left[(\hbar ck)^2 + (mc^2)^2\right]^{1/2}. \tag{9.87}$$


47 This approach was suggested in 1926-1927, i.e. virtually simultaneously, by (at least) V. Fock, E. Schrödinger, O. Klein and W. Gordon, J. Kudar, T. de Donder and F.-H. van der Dungen, and L. de Broglie.

48 Note that in the left, non-relativistic column of this table, the energy is referred to the rest energy mc², while in its right, relativistic column, it is referred to zero; see Eq. (1).


Though one may say that this dispersion relation is just a simple combination of the classical relation (1) and the same basic quantum-mechanical relations (86) as in the non-relativistic limit, it draws our attention to the fact that the energy ħω as a function of the momentum ħk has two branches, with E_−(p) = −E_+(p); see Fig. 6a. Historically, this fact has played a very important role in spurring the fundamental idea of particle-antiparticle pairs. In this idea (very similar to the concept of electrons and holes in semiconductors, which was discussed in Sec. 2.8), what we call the "vacuum" actually corresponds to all quantum states of the lower branch, with energies E_−(p) < 0, being completely filled, while the states on the upper branch, with energies E_+(p) > 0, are empty. Then an externally supplied energy,

$$\Delta E = E_+ - E_- \equiv E_+ + (-E_-) \geq 2mc^2 > 0, \tag{9.88}$$

may bring the system from the lower branch to the upper one (Fig. 6b). The resulting excited state is interpreted as a combination of a particle (formally, of infinite spatial extension) with the energy E_+ and the momentum p, and a "hole" (antiparticle) of the positive energy (−E_−) and the momentum −p. This idea⁴⁹ has led to a search for, and the discovery of, the positron (the electron's antiparticle, with charge q = +e) in 1932, and later of the antiproton and other antiparticles.
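Both statements lend themselves to a quick numerical sketch, in units with ħ = c = m = 1 (an assumption made only for illustration). First, a 1D plane wave (85) satisfies the Klein-Gordon equation (84), up to finite-difference error, for either sign of ω in Eq. (87), but not for, e.g., the massless relation ω = k. Second, the excitation energy E_+(p) − E_−(p), for a particle at momentum p and a hole at −p, never drops below 2mc². For electrons this is about 1.022 MeV; the value m_ec² ≈ 0.511 MeV is the standard CODATA value, not taken from the text:

```python
import cmath
import math

MU = 1.0   # mu = mc/hbar in units with hbar = c = m = 1 (assumption of the sketch)

def omega(k, branch):
    """Dispersion relation (87): branch = +1 (E_+) or -1 (E_-)."""
    return branch * math.sqrt(k * k + MU * MU)

def kg_residual(k, w, x=0.3, t=0.7, h=1e-3):
    """|(d2/dt2 - d2/dx2 + mu^2) psi| for the 1D plane wave (85), by central differences."""
    def psi(xx, tt):
        return cmath.exp(1j * (k * xx - w * tt))
    d2t = (psi(x, t + h) - 2 * psi(x, t) + psi(x, t - h)) / h**2
    d2x = (psi(x + h, t) - 2 * psi(x, t) + psi(x - h, t)) / h**2
    return abs(d2t - d2x + MU**2 * psi(x, t))

for k in (0.0, 0.5, 3.0):
    # columns: k, residual on the E_+ branch, on the E_- branch, for omega = k
    print(k, kg_residual(k, omega(k, +1)), kg_residual(k, omega(k, -1)), kg_residual(k, k))

ELECTRON_REST_MEV = 0.51099895    # m_e c^2 in MeV (CODATA 2018)

def pair_energy_mev(pc_mev, rest_mev=ELECTRON_REST_MEV):
    """E_+(p) - E_-(p) = 2[(pc)^2 + (mc^2)^2]^(1/2): particle at p, hole at -p."""
    return 2.0 * math.hypot(pc_mev, rest_mev)

print(min(pair_energy_mev(0.01 * i) for i in range(-100, 101)))  # equals 2 m_e c^2
```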


Fig. 9.6. (a) The free-particle dispersion relation resulting from the Klein-Gordon and Dirac equations, and (b) the scheme of creation of a particle-antiparticle pair from the vacuum.


Free particles of a finite spatial extension may be described, in this approach, just as in the non-relativistic Schrödinger equation, by wave packets, i.e. linear superpositions of de Broglie waves (85) with close wave vectors k and the corresponding values of ω given by Eq. (87), with the positive sign for "usual" particles and the negative sign for antiparticles; see Fig. 6a above. Note that to form, from a particle's wave packet, a similar wave packet for the antiparticle, with the same phase and group velocities (2.33a) in each direction, we need to change the sign not only before ω, but also before k, i.e. to replace all component wavefunctions (85), and hence the full wavefunction, with their complex conjugates.


Among the more formal properties of Eq. (84), it is easy to prove that its solutions satisfy the same continuity equation (1.52), with the probability current density j still given by Eq. (1.47), but with a different expression for the probability density w, which becomes very similar to that for j:


$$w = \frac{i\hbar}{2mc^2}\left(\Psi^*\frac{\partial\Psi}{\partial t} - \text{c.c.}\right), \qquad \mathbf j = -\frac{i\hbar}{2m}\left(\Psi^*\nabla\Psi - \text{c.c.}\right), \tag{9.89}$$

very much in the spirit of relativity theory, which treats space and time on an equal footing. (In the non-relativistic limit p/mc → 0, Eq. (84) allows the reduction of this expression for w to the non-relativistic Eq. (1.22): w → Ψ*Ψ.)


49 Due to the same P. A. M. Dirac!
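The continuity property can be checked numerically for a superposition of a few plane waves (85). The sketch below assumes units ħ = c = m = 1, arbitrary amplitudes and wave numbers, and finite-difference derivatives. It evaluates w and j from Eq. (89) and shows that ∂w/∂t + ∂j/∂x vanishes up to discretization error, while ∂w/∂t itself does not. It also shows that w, unlike Ψ*Ψ, is negative for a wave on the negative-energy branch of Eq. (87):

```python
import cmath
import math

HBAR = C = M = 1.0          # units with hbar = c = m = 1 (assumption of the sketch)
MU = M * C / HBAR

def omega(k, branch=+1):
    """Dispersion relation (87)."""
    return branch * C * math.sqrt(k * k + MU * MU)

# 1D superposition of Klein-Gordon plane waves: (amplitude, k, branch); arbitrary choice
MODES = [(1.0, 0.4, +1), (0.6, -0.9, +1), (0.3, 1.5, +1)]

def psi(x, t, modes=MODES):
    return sum(a * cmath.exp(1j * (k * x - omega(k, b) * t)) for a, k, b in modes)

def w_and_j(x, t, modes=MODES, h=1e-5):
    """Probability density and current of Eq. (89), with central-difference derivatives."""
    p = psi(x, t, modes)
    p_t = (psi(x, t + h, modes) - psi(x, t - h, modes)) / (2 * h)
    p_x = (psi(x + h, t, modes) - psi(x - h, t, modes)) / (2 * h)
    w = 1j * HBAR / (2 * M * C**2) * (p.conjugate() * p_t - p * p_t.conjugate())
    j = -1j * HBAR / (2 * M) * (p.conjugate() * p_x - p * p_x.conjugate())
    return w.real, j.real

def continuity(x, t, modes=MODES, h=1e-3):
    """Return (dw/dt + dj/dx, dw/dt); the first entry should vanish."""
    dw_dt = (w_and_j(x, t + h, modes)[0] - w_and_j(x, t - h, modes)[0]) / (2 * h)
    dj_dx = (w_and_j(x + h, t, modes)[1] - w_and_j(x - h, t, modes)[1]) / (2 * h)
    return dw_dt + dj_dx, dw_dt

print(continuity(0.3, 0.7))                       # residual ~0, while dw/dt is not
print(w_and_j(0.1, 0.2, [(1.0, 0.5, -1)])[0])     # negative: w < 0 on the E_- branch
```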


The Klein-Gordon equation may be readily generalized to describe a particle moving in external fields; for example, the electromagnetic field effects on a particle with charge q may be described by the same replacement as in the non-relativistic limit (see Sec. 3.1):

$$\hat{\mathbf p} \to \hat{\mathbf P} - q\mathbf A(\mathbf r, t), \qquad \hat H \to \hat H - q\phi(\mathbf r, t), \tag{9.90}$$

where P̂ = −iħ∇ is the canonical momentum operator (3.25), and the vector and scalar potentials, A and φ, should be treated appropriately: either as c-number functions if the electromagnetic field quantization is not important for the particular problem, or as operators (see Secs. 1-4 above) if it is.


However, the practical value of the resulting relativistic Schrödinger equation is rather limited, for two main reasons. First of all, it does not give the correct description of particles with spin. For example, for the hydrogen-like atom/ion problem, i.e. the motion of an electron with electric charge −e in the Coulomb central field of an immobile nucleus with charge +Ze, the equation may be readily solved exactly⁵⁰ and yields the following spectrum of (doubly-degenerate) energy levels:


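$$E = mc^2\left[1 + \frac{Z^2\alpha^2}{(n - \delta_l)^2}\right]^{-1/2}, \qquad \text{with } \delta_l \equiv \left(l + \frac{1}{2}\right) - \left[\left(l + \frac{1}{2}\right)^2 - Z^2\alpha^2\right]^{1/2}, \tag{9.91}$$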


where n = 1, 2, ... and l = 0, 1, ..., n − 1 are the same quantum numbers as in the non-relativistic theory (see Sec. 3.6), and α ≈ 1/137 is the fine structure constant (6.62). The three leading terms of the Taylor expansion of this result in the small parameter Zα are as follows:


$$E = mc^2\left[1 - \frac{Z^2\alpha^2}{2n^2} - \frac{Z^4\alpha^4}{2n^4}\left(\frac{n}{l + 1/2} - \frac{3}{4}\right) + \ldots\right]. \tag{9.92}$$
The first of these terms is just the rest energy of the particle. The second term, 


$$E_n = -mc^2\frac{Z^2\alpha^2}{2n^2} \equiv -\frac{mZ^2e^4}{(4\pi\varepsilon_0)^2\hbar^2}\frac{1}{2n^2} \equiv -\frac{E_0}{2n^2}, \qquad \text{with } E_0 = Z^2E_H, \tag{9.93}$$
reproduces Bohr's non-relativistic formula (3.201). Finally, the third term,
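$$-mc^2\frac{Z^4\alpha^4}{2n^4}\left(\frac{n}{l + 1/2} - \frac{3}{4}\right) \equiv -\frac{2E_n^2}{mc^2}\left(\frac{n}{l + 1/2} - \frac{3}{4}\right), \tag{9.94}$$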


is just the perturbative kinetic-relativistic contribution (6.51) to the fine structure of the Bohr levels (93). However, as we already know from Sec. 6.3, for a spin-½ particle such as the electron, the spin-orbit interaction (6.55) gives an additional contribution to the fine structure, of the same order, so that the net result, confirmed by experiment, is given by Eq. (6.60), i.e. it differs from Eq. (94). This is very natural, because the relativistic Schrödinger equation does not have the very notion of spin.
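Equations (91)-(93) are easy to compare numerically. The sketch below uses the CODATA 2018 recommended values of α and m_ec², which are not taken from the text. It evaluates the exact level (91) and its three-term expansion (92) for n = 1, l = 0, showing that their difference scales as (Zα)⁶. It then recovers the familiar 13.6 eV hydrogen binding energy from the second term (93). Note that Eq. (91) requires Zα ≤ l + 1/2, so the l = 0 levels are real only for Z ≤ 68:

```python
import math

ALPHA = 7.2973525693e-3          # fine-structure constant (CODATA 2018)
ELECTRON_REST_EV = 0.51099895e6  # electron rest energy m_e c^2, eV (CODATA 2018)

def kg_level(n, l, z):
    """Exact Klein-Gordon level (91), in units of m c^2."""
    za2 = (z * ALPHA) ** 2
    delta = (l + 0.5) - math.sqrt((l + 0.5) ** 2 - za2)
    return (1.0 + za2 / (n - delta) ** 2) ** -0.5

def kg_series(n, l, z):
    """Three leading terms (92) of the expansion in Z*alpha, in units of m c^2."""
    za = z * ALPHA
    return 1.0 - za**2 / (2 * n**2) - za**4 / (2 * n**4) * (n / (l + 0.5) - 0.75)

for z in (1, 20, 60):
    diff = kg_level(1, 0, z) - kg_series(1, 0, z)
    print(f"Z = {z:2d}: exact - series = {diff:.3e}, (Z alpha)^6 = {(z * ALPHA) ** 6:.3e}")

# Second term of (92), i.e. Eq. (93), for hydrogen (Z = 1, n = 1), in eV
print(ELECTRON_REST_EV * ALPHA**2 / 2)   # about 13.606 eV
```

The sketch also makes visible that, unlike the non-relativistic theory, the level (91) depends on l at fixed n.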


Second, even for massive spinless particles (such as π mesons), for which this equation is believed to be valid, the most important problems are related to particle interactions at high energies of the order of ΔE ~ 2mc² and beyond; see Eq. (88). Due to the possibility of creation and annihilation of particle-antiparticle pairs at such energies, the number of particles participating in such interactions is typically considerable (and variable), and the adequate description of the system is given not by the relativistic Schrödinger equation (which is formulated in single-particle terms), but by quantum field theory, to which I will devote only a few sentences at the very end of this chapter.


50 This task is left for the reader's exercise.


9.6. Dirac’s theory 


The real breakthrough toward the quantum relativistic theory of electrons (and other spin-½ fermions) was achieved in 1928 by P. A. M. Dirac. For that time, the structure of his theory was highly nontrivial. Namely, while formally preserving, in the coordinate representation, the same Schrödinger-picture equation of quantum dynamics as in non-relativistic quantum mechanics,⁵¹

$$i\hbar\frac{\partial\Psi}{\partial t} = \hat H\Psi, \tag{9.95}$$

it postulates that the wavefunction Ψ it describes is not a scalar complex function of time and coordinates, but a four-component column-vector (sometimes called a bispinor) of such functions, its Hermitian-conjugate bispinor Ψ† being a 4-component row-vector of their complex conjugates:


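$$\Psi = \begin{pmatrix}\Psi_1(\mathbf r,t)\\ \Psi_2(\mathbf r,t)\\ \Psi_3(\mathbf r,t)\\ \Psi_4(\mathbf r,t)\end{pmatrix}, \qquad \Psi^\dagger = \left(\Psi_1^*(\mathbf r,t), \Psi_2^*(\mathbf r,t), \Psi_3^*(\mathbf r,t), \Psi_4^*(\mathbf r,t)\right), \tag{9.96}$$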


and that the Hamiltonian participating in Eq. (95) is a 4×4 matrix defined in the Hilbert space of bispinors Ψ. For a free particle, the postulated Hamiltonian looks amazingly simple:⁵²

$$\hat H = c\,\hat{\boldsymbol\alpha}\cdot\hat{\mathbf p} + \hat\beta mc^2, \tag{9.97}$$

where p̂ = −iħ∇ is the same 3D vector operator of momentum as in the non-relativistic case, while the operators α̂ and β̂ may be represented in the following shorthand 2×2 form:

$$\hat{\boldsymbol\alpha} = \begin{pmatrix} 0 & \hat{\boldsymbol\sigma} \\ \hat{\boldsymbol\sigma} & 0 \end{pmatrix}, \qquad \hat\beta = \begin{pmatrix} \hat I & 0 \\ 0 & -\hat I \end{pmatrix}. \tag{9.98a}$$


51 After the "naturally-relativistic" form of the Klein-Gordon equation (84), this apparent return to the non-relativistic Schrödinger equation may look very counter-intuitive. However, it becomes a bit less surprising taking into account the fact (whose proof is left for the reader's exercise) that Eq. (84) may also be recast into the form (95) for a two-component column-vector (sometimes called a spinor), with a Hamiltonian which may be represented by a 2×2 matrix, and hence expressed via the Pauli matrices (4.105) and the identity matrix I.

52 Moreover, if the time derivative participating in Eq. (95), and the three coordinate derivatives participating (via the momentum operator) in Eq. (97), are merged into one 4-vector operator ∂/∂x_k ≡ {∇, ∂/∂(ict)}, the Dirac equation (95) may be rewritten in an even simpler, manifestly Lorentz-invariant 4-vector form (with the implied summation over the repeated index k = 1, ..., 4; see, e.g., EM Sec. 9.4):

$$\left(\gamma_k\frac{\partial}{\partial x_k} + \mu\right)\Psi = 0, \qquad \text{where } \gamma_{1,2,3} \equiv -i\hat\beta\hat\alpha_{1,2,3} = \begin{pmatrix} 0 & -i\hat\sigma_{1,2,3} \\ i\hat\sigma_{1,2,3} & 0 \end{pmatrix}, \quad \gamma_4 \equiv \hat\beta,$$

where μ ≡ mc/ħ, just as in Eq. (84). Note also that, very counter-intuitively, the Dirac Hamiltonian (97) is linear in the momentum, while the non-relativistic Hamiltonian of a particle, as well as the relativistic Schrödinger equation, are quadratic in p. In my humble opinion, the Dirac theory (including the concept of antiparticles it has inspired) may compete for the title of the most revolutionary theoretical idea in physics of all times, despite such strong contenders as Newton's laws, Maxwell's equations, Gibbs' statistical distribution, Bohr's theory of the hydrogen atom, and Einstein's general relativity.


The operator α̂, composed of the Pauli vector operators σ̂, is also a vector in the usual 3D space, with each of its 3 Cartesian components being a 4×4 matrix. The particular form of the 2×2 matrices corresponding to the operators σ̂ and Î in Eq. (98a) depends on the basis selected for the spin state representation; for example, in the standard z-basis, in which the Cartesian components of σ̂ are represented by the Pauli matrices (4.105), the 4×4 matrix form of Eq. (98a) is


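$$\alpha_x = \begin{pmatrix}0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0\end{pmatrix}, \quad \alpha_y = \begin{pmatrix}0&0&0&-i\\0&0&i&0\\0&-i&0&0\\i&0&0&0\end{pmatrix}, \quad \alpha_z = \begin{pmatrix}0&0&1&0\\0&0&0&-1\\1&0&0&0\\0&-1&0&0\end{pmatrix}, \quad \beta = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&-1&0\\0&0&0&-1\end{pmatrix}. \tag{9.98b}$$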


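It is straightforward to use Eqs. (98) to verify that the matrices α_x, α_y, α_z, and β satisfy the following relations:

$$\alpha_x^2 = \alpha_y^2 = \alpha_z^2 = \beta^2 = I, \tag{9.99}$$

$$\alpha_x\alpha_y + \alpha_y\alpha_x = \alpha_y\alpha_z + \alpha_z\alpha_y = \alpha_z\alpha_x + \alpha_x\alpha_z = \alpha_x\beta + \beta\alpha_x = \alpha_y\beta + \beta\alpha_y = \alpha_z\beta + \beta\alpha_z = 0, \tag{9.100}$$

i.e. they anticommute.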
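As a quick consistency check of Eqs. (98)-(100), the sketch below builds α̂ and β̂ from 2×2 blocks, as in Eq. (98a), and verifies all the relations (99)-(100) in plain Python. No linear-algebra library is assumed, and the helper names are ad hoc:

```python
# 4x4 complex matrices as lists of rows
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def lin(*terms):
    """Linear combination sum(c * M) over (c, M) pairs."""
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)] for i in range(n)]

def block(tl, tr, bl, br):
    """Assemble a 4x4 matrix from four 2x2 blocks, as in Eq. (98a)."""
    return [tl[0] + tr[0], tl[1] + tr[1], bl[0] + br[0], bl[1] + br[1]]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

Z2, I2 = [[0, 0], [0, 0]], [[1, 0], [0, 1]]
SIGMA = {"x": [[0, 1], [1, 0]], "y": [[0, -1j], [1j, 0]], "z": [[1, 0], [0, -1]]}
ALPHA = {c: block(Z2, s, s, Z2) for c, s in SIGMA.items()}   # Eq. (98a)
BETA = block(I2, Z2, Z2, [[-1, 0], [0, -1]])
I4 = block(I2, Z2, Z2, I2)

mats = [ALPHA["x"], ALPHA["y"], ALPHA["z"], BETA]
for i, a in enumerate(mats):
    for j, b in enumerate(mats):
        anti = lin((1, matmul(a, b)), (1, matmul(b, a)))
        target = lin((2.0 if i == j else 0.0, I4))      # Eqs. (99) and (100)
        assert close(anti, target), (i, j)
print("Eqs. (99)-(100) hold for the matrices (98b)")
```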


Using these anticommutation relations, and acting essentially as in Sec. 1.4, it is straightforward to show that any solution to the Dirac equation obeys the probability conservation law, i.e. the continuity equation (1.52), with the probability density

$$w = \Psi^\dagger\Psi, \tag{9.101}$$

and the probability current

$$\mathbf j = c\,\Psi^\dagger\hat{\boldsymbol\alpha}\Psi, \tag{9.102}$$

looking almost as in non-relativistic wave mechanics; cf. Eqs. (1.22) and (1.47). Note, however, the Hermitian conjugation used in these formulas instead of complex conjugation, to form the scalars w, j_x, j_y, and j_z from the 4-component state vectors (96).


This close similarity extends to the fundamental, plane-wave solutions of the Dirac equation in free space. Indeed, plugging such a solution, in the form

$$\Psi = \mathbf u\,e^{i(\mathbf k\cdot\mathbf r - \omega t)} \equiv \begin{pmatrix}u_1\\u_2\\u_3\\u_4\end{pmatrix}e^{i(\mathbf k\cdot\mathbf r - \omega t)}, \tag{9.103}$$

into Eqs. (95) and (97), we see that they are indeed satisfied, provided that a system of four coupled, linear algebraic equations for the four complex c-number amplitudes u_{1,2,3,4} is satisfied. The condition of its consistency yields the same dispersion relation (87), i.e. the same two-branch diagram shown in Fig. 6, as follows from the Klein-Gordon equation. The difference is that plugging each value of ω, given by Eq. (87), back into the system of linear equations for the four amplitudes u, we get two solutions for the vector u ≡ (u_1, u_2, u_3, u_4) for each of the two energy branches; see Fig. 6 again. In the standard z-basis of spin operators, they may be represented as follows:


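$$\text{for } E = E_+ > 0: \quad \mathbf u_{+\uparrow} = c_+\begin{pmatrix}1\\0\\ \dfrac{cp_z}{E_+ + mc^2}\\ \dfrac{cp_+}{E_+ + mc^2}\end{pmatrix}, \quad \mathbf u_{+\downarrow} = c_+\begin{pmatrix}0\\1\\ \dfrac{cp_-}{E_+ + mc^2}\\ \dfrac{-cp_z}{E_+ + mc^2}\end{pmatrix}, \tag{9.104a}$$

$$\text{for } E = E_- < 0: \quad \mathbf u_{-\uparrow} = c_-\begin{pmatrix}\dfrac{cp_z}{E_- - mc^2}\\ \dfrac{cp_+}{E_- - mc^2}\\ 1\\ 0\end{pmatrix}, \quad \mathbf u_{-\downarrow} = c_-\begin{pmatrix}\dfrac{cp_-}{E_- - mc^2}\\ \dfrac{-cp_z}{E_- - mc^2}\\ 0\\ 1\end{pmatrix}, \tag{9.104b}$$

where p_± ≡ p_x ± ip_y, and c_± are normalization coefficients.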


The simplest interpretation of these solutions is that Eq. (103), with the vectors u_+ given by Eq. (104a), represents a spin-½ particle (say, an electron), while with the vectors u_− given by Eq. (104b), it represents an antiparticle (a positron), and the two solutions for each particle, indexed with opposite arrows, correspond to the two possible directions of the spin-½, σ_z = ±1, i.e. S_z = ±ħ/2. This interpretation is indeed solid in the non-relativistic limit, when the two last components of the vectors (104a), and the two first components of the vectors (104b), are negligibly small:

$$\mathbf u_{+\uparrow} \to \begin{pmatrix}1\\0\\0\\0\end{pmatrix}, \quad \mathbf u_{+\downarrow} \to \begin{pmatrix}0\\1\\0\\0\end{pmatrix}, \quad \mathbf u_{-\uparrow} \to \begin{pmatrix}0\\0\\1\\0\end{pmatrix}, \quad \mathbf u_{-\downarrow} \to \begin{pmatrix}0\\0\\0\\1\end{pmatrix}, \qquad \text{for } \frac{p}{mc} \to 0. \tag{9.105}$$


However, at arbitrary energies, the physical picture is more complex. To show this, let us use the Dirac equation to calculate the Heisenberg-picture law of time evolution of the operator of some Cartesian component of the orbital angular momentum $\mathbf{L} = \mathbf{r}\times\mathbf{p}$, for example of $L_x = yp_z - zp_y$, taking into account that the Dirac operators (98a) commute with those of $\mathbf{r}$ and $\mathbf{p}$, and also the Heisenberg commutation relations (2.14):

$$ i\hbar\dot{\hat{L}}_x = [\hat{L}_x, \hat{H}] = c\,\hat{\boldsymbol{\alpha}}\cdot[(\hat{y}\hat{p}_z - \hat{z}\hat{p}_y), \hat{\mathbf{p}}] = -i\hbar c(\hat{\alpha}_z\hat{p}_y - \hat{\alpha}_y\hat{p}_z), \tag{9.106} $$
with similar relations for the two other Cartesian components. Since the right-hand sides of these equations are different from zero, the orbital angular momentum is generally not conserved, even for a free particle! Let us, however, consider the following vector operator,


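$$ \hat{\mathbf{S}} \equiv \frac{\hbar}{2}\hat{\boldsymbol{\Sigma}}, \quad\text{where}\quad \hat{\boldsymbol{\Sigma}} \equiv \begin{pmatrix}\hat{\boldsymbol{\sigma}} & 0\\ 0 & \hat{\boldsymbol{\sigma}}\end{pmatrix}, \tag{9.107a} $$

$$ \hat{S}_x = \frac{\hbar}{2}\begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix},\quad \hat{S}_y = \frac{\hbar}{2}\begin{pmatrix}0&-i&0&0\\i&0&0&0\\0&0&0&-i\\0&0&i&0\end{pmatrix},\quad \hat{S}_z = \frac{\hbar}{2}\begin{pmatrix}1&0&0&0\\0&-1&0&0\\0&0&1&0\\0&0&0&-1\end{pmatrix}. \tag{9.107b} $$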
Let us calculate the Heisenberg-picture law of time evolution of these components, for example

$$ i\hbar\dot{\hat{S}}_x = [\hat{S}_x, \hat{H}] = c[\hat{S}_x, (\hat{\alpha}_x\hat{p}_x + \hat{\alpha}_y\hat{p}_y + \hat{\alpha}_z\hat{p}_z)]. \tag{9.108} $$
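A direct calculation of the commutators of the matrices (98) and (107) yields

$$ [\hat{S}_x, \hat{\alpha}_x] = 0, \quad [\hat{S}_x, \hat{\alpha}_y] = i\hbar\hat{\alpha}_z, \quad [\hat{S}_x, \hat{\alpha}_z] = -i\hbar\hat{\alpha}_y, \tag{9.109} $$

so that we finally get

$$ i\hbar\dot{\hat{S}}_x = i\hbar c(\hat{\alpha}_z\hat{p}_y - \hat{\alpha}_y\hat{p}_z), \tag{9.110} $$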


with similar expressions for the other two components of the operator. Comparing this result with Eq. (106), we see that any Cartesian component of the operator defined similarly to Eq. (5.170),

$$ \hat{\mathbf{J}} = \hat{\mathbf{L}} + \hat{\mathbf{S}}, \tag{9.111} $$

is an integral of motion,53 so that this operator may be interpreted as the one representing the total angular momentum of the particle. Hence, the operator (107) may be interpreted as the spin operator of a spin-½ particle (e.g., an electron). As follows from the last of Eqs. (107b), in the non-relativistic limit the columns (105) represent the eigenkets of the z-component of that operator, with eigenvalues $S_z = \pm\hbar/2$, the sign corresponding to the arrow index. So, the Dirac theory provides a justification for spin-½, or, somewhat more humbly, replaces the Pauli Hamiltonian postulate (4.163) with that of a simpler (and hence more plausible), Lorentz-invariant Hamiltonian (97).
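The matrix relations (109), and the cancellation between Eqs. (106) and (110), are easy to confirm numerically. The sketch below again assumes the standard Dirac representation for Eq. (98), works in units with $\hbar = 1$, and treats the momentum as a c-number, which is enough to check the spin part $[\hat S_x, \hat H]$ (the orbital part (106) comes from the commutators of $\hat{\mathbf r}$ with $\hat{\mathbf p}$ and is not modeled here). It also confirms that the columns (105) are eigenvectors of $\hat S_z$ with eigenvalues $\pm 1/2$. Helper names are hypothetical.

```python
# Units with hbar = 1. Assumption: Eq. (98) is the standard Dirac representation.
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]
ZERO = [[0, 0], [0, 0]]
ONE = [[1, 0], [0, 1]]
MINUS_ONE = [[-1, 0], [0, -1]]


def block(a, b, c, d):
    """Assemble the 4x4 matrix [[a, b], [c, d]] from 2x2 blocks."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]


ALPHA = [block(ZERO, s, s, ZERO) for s in (SX, SY, SZ)]
BETA = block(ONE, ZERO, ZERO, MINUS_ONE)
# Eq. (107): S = (1/2) diag(sigma, sigma) in these units
SPIN = [[[0.5 * x for x in row] for row in block(s, ZERO, ZERO, s)] for s in (SX, SY, SZ)]


def lincomb(*terms):
    """Sum of (coefficient, 4x4 matrix) pairs."""
    return [[sum(k * mat[i][j] for k, mat in terms) for j in range(4)] for i in range(4)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def comm(a, b):
    return lincomb((1, matmul(a, b)), (-1, matmul(b, a)))


def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(4) for j in range(4))


def dirac_matrix(p, m=1.0, c=1.0):
    """c alpha.p + beta m c^2 for a c-number momentum p."""
    return lincomb((c * p[0], ALPHA[0]), (c * p[1], ALPHA[1]),
                   (c * p[2], ALPHA[2]), (m * c**2, BETA))


if __name__ == "__main__":
    sx = SPIN[0]
    ax, ay, az = ALPHA
    p, c = (0.7, -1.1, 0.4), 1.0
    checks = {
        "[Sx, ax] = 0": max_diff(comm(sx, ax), lincomb((0, ax))),
        "[Sx, ay] = i az": max_diff(comm(sx, ay), lincomb((1j, az))),
        "[Sx, az] = -i ay": max_diff(comm(sx, az), lincomb((-1j, ay))),
        "[Sx, H] = i c (az py - ay pz)": max_diff(
            comm(sx, dirac_matrix(p, c=c)),
            lincomb((1j * c * p[1], az), (-1j * c * p[2], ay))),
    }
    for name, err in checks.items():
        print(f"{name}: max deviation {err:.1e}")
    print("Sz diagonal:", [SPIN[2][k][k] for k in range(4)])
```

Note that $\hat\beta$ commutes with $\hat S_x$ (both are block-diagonal), so the mass term does not contribute to $[\hat S_x, \hat H]$.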


Note, however, that this simple interpretation, fully separating a particle from its antiparticle, is not valid for the exact solutions (103)-(104), so that generally the eigenstates of the Dirac Hamiltonian are certain linear (coherent) superpositions of the components describing the particle and its antiparticle, each with both directions of spin. This fact leads to several interesting effects, including the so-called Klein paradox at the reflection of a relativistic electron from a potential barrier.54


53 It is straightforward to show that this result remains valid for a particle in any central field U(r). 
54 See, e.g., A. Calogeracos and N. Dombey, Contemp. Phys. 40, 313 (1999). 
9.7. Low-energy limit


The generalization of Dirac’s theory to the case of a (spin-½) particle with an electric charge q, moving in a classically described electromagnetic field, may be obtained using the same replacement (90). As a result, Eq. (95) turns into

$$ \left[c\hat{\boldsymbol{\alpha}}\cdot(-i\hbar\nabla - q\mathbf{A}) + mc^2\hat{\beta} + (q\phi - \hat{H})\right]\Psi = 0, \tag{9.112} $$


where the Hamiltonian operator $\hat{H}$ is understood in the sense of Eq. (95), i.e. as the partial time derivative with the multiplier $i\hbar$. Let us prepare this equation for a low-energy approximation by acting on it, from the left, with a similar square bracket, but with the opposite sign before the last parentheses (also an operator!). Using Eqs. (99) and (100), and the fact that the space- and time-independent operators $\hat{\boldsymbol{\alpha}}$ and $\hat{\beta}$ commute with the spin-independent, c-number functions $\mathbf{A}(\mathbf{r},t)$ and $\phi(\mathbf{r},t)$, as well as with the Hamiltonian operator $i\hbar\partial/\partial t$, the result is

$$ \left\{c^2[\hat{\boldsymbol{\alpha}}\cdot(-i\hbar\nabla - q\mathbf{A})]^2 + (mc^2)^2 + c[\hat{\boldsymbol{\alpha}}\cdot(-i\hbar\nabla - q\mathbf{A}), (q\phi - \hat{H})] - (q\phi - \hat{H})^2\right\}\Psi = 0. \tag{9.113} $$
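A direct calculation of the first square bracket, using Eqs. (98) and (107), yields

$$ [\hat{\boldsymbol{\alpha}}\cdot(-i\hbar\nabla - q\mathbf{A})]^2 = (-i\hbar\nabla - q\mathbf{A})^2 - 2q\hat{\mathbf{S}}\cdot\nabla\times\mathbf{A}. \tag{9.114} $$

But the last vector product on the right-hand side is just the magnetic field; see, e.g., Eqs. (3.21):

$$ \mathbf{B} = \nabla\times\mathbf{A}. \tag{9.115} $$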
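The step leading to Eq. (114) may be spelled out as follows. This is a sketch that assumes the matrices (98) obey $\hat\alpha_j\hat\alpha_k = \delta_{jk} + i\varepsilon_{jkl}\hat\Sigma_l$, with $\hat{\boldsymbol\Sigma} = 2\hat{\mathbf S}/\hbar$ as in Eq. (107) (true in the standard Dirac representation), and it uses the shorthand $\hat{\boldsymbol\pi} \equiv -i\hbar\nabla - q\mathbf{A}$, which is introduced here only for brevity. Since $\hat{\boldsymbol\alpha}$ commutes with the spin-independent $\hat{\boldsymbol\pi}$, the only nontrivial ingredient is the non-commutativity of the components of $\hat{\boldsymbol\pi}$:

```latex
(\hat{\boldsymbol\alpha}\cdot\hat{\boldsymbol\pi})^2
  = \sum_{j,k}\hat\alpha_j\hat\alpha_k\,\hat\pi_j\hat\pi_k
  = \hat{\boldsymbol\pi}^2 + i\hat{\boldsymbol\Sigma}\cdot(\hat{\boldsymbol\pi}\times\hat{\boldsymbol\pi}),
\qquad
(\hat{\boldsymbol\pi}\times\hat{\boldsymbol\pi})_x = [\hat\pi_y, \hat\pi_z]
  = i\hbar q\left(\frac{\partial A_z}{\partial y} - \frac{\partial A_y}{\partial z}\right)
  = i\hbar q\,(\nabla\times\mathbf{A})_x ,
\\[4pt]
\Rightarrow\quad
(\hat{\boldsymbol\alpha}\cdot\hat{\boldsymbol\pi})^2
  = \hat{\boldsymbol\pi}^2 - \hbar q\,\hat{\boldsymbol\Sigma}\cdot\nabla\times\mathbf{A}
  = \hat{\boldsymbol\pi}^2 - 2q\,\hat{\mathbf S}\cdot\nabla\times\mathbf{A}.
```

The $y$- and $z$-components of $\hat{\boldsymbol\pi}\times\hat{\boldsymbol\pi}$ follow by cyclic permutation of indices.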


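Similarly, we may use the first of Eqs. (3.21), for the electric field,

$$ \boldsymbol{\mathcal{E}} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t}, \tag{9.116} $$

to simplify the commutator participating in Eq. (113):

$$ [\hat{\boldsymbol{\alpha}}\cdot(-i\hbar\nabla - q\mathbf{A}), (q\phi - \hat{H})] = -q\hat{\boldsymbol{\alpha}}\cdot[\hat{H}, \mathbf{A}] - i\hbar q\hat{\boldsymbol{\alpha}}\cdot[\nabla, \phi] = -i\hbar q\hat{\boldsymbol{\alpha}}\cdot\frac{\partial\mathbf{A}}{\partial t} - i\hbar q\hat{\boldsymbol{\alpha}}\cdot\nabla\phi = i\hbar q\hat{\boldsymbol{\alpha}}\cdot\boldsymbol{\mathcal{E}}. \tag{9.117} $$

As a result, Eq. (113) becomes

$$ \left\{c^2(-i\hbar\nabla - q\mathbf{A})^2 - (q\phi - \hat{H})^2 + (mc^2)^2 - 2qc^2\hat{\mathbf{S}}\cdot\mathbf{B} + i\hbar cq\hat{\boldsymbol{\alpha}}\cdot\boldsymbol{\mathcal{E}}\right\}\Psi = 0. \tag{9.118} $$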


So far, this is an exact result, equivalent to Eq. (112), but it is more convenient for an analysis of the low-energy limit, in which not only the energy offset $E - mc^2$ (which is just the energy used in non-relativistic mechanics), but also the electrostatic energy of the particle, $|q\phi|$, are much smaller than the rest energy $mc^2$. In this limit, the second and third terms of Eq. (118) almost cancel, and introducing the offset Hamiltonian

$$ \tilde{H} \equiv \hat{H} - mc^2\hat{I}, \tag{9.119} $$

we may approximate their difference, up to the first non-zero term, as

$$ (q\phi - \hat{H})^2 - (mc^2)^2\hat{I} = (q\phi - mc^2\hat{I} - \tilde{H})^2 - (mc^2)^2\hat{I} \approx 2mc^2(\tilde{H} - q\phi). \tag{9.120} $$


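As a result, after dividing all terms by $2mc^2$, Eq. (118) may be approximated as

$$ \tilde{H}\Psi = \left[\frac{(-i\hbar\nabla - q\mathbf{A})^2}{2m} + q\phi - \frac{q}{m}\hat{\mathbf{S}}\cdot\mathbf{B} + \frac{i\hbar q}{2mc}\hat{\boldsymbol{\alpha}}\cdot\boldsymbol{\mathcal{E}}\right]\Psi. \tag{9.121} $$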


Let us discuss this important result. The first two terms in the square brackets give the non-relativistic Hamiltonian (3.26), which was extensively used in Chapter 3 for the discussion of charged particle motion. Note again that the contribution of the vector potential A to that Hamiltonian is essentially relativistic, in the following sense: when it is used to describe the magnetic interaction of two charged particles due to their orbital motion with speed $v \ll c$, the magnetic interaction is a factor of $(v/c)^2$ smaller than the electrostatic interaction of the particles.55 The reason why we did discuss the effects of A in Chapter 3 was that it was used there to describe external magnetic fields, keeping our analysis valid even when that field is strong because it is produced by relativistic effects, such as the aligned spins of a permanent magnet.


The next, third term in the square brackets of Eq. (121) should also be familiar to the reader: this is the Pauli Hamiltonian; see Eqs. (4.3), (4.5), and (4.163). When justifying this form of interaction in Chapter 4, I referred mostly to the results of Stern-Gerlach-type experiments, but it is extremely pleasing that this result56 follows from such a fundamental relativistic treatment as Dirac's theory. As we already know from the discussion of the Zeeman effect in Sec. 6.4, the magnetic field effects on the orbital motion of an electron (described by the orbital angular momentum L) and on its spin S are of the same order, though quantitatively different.


Finally, the last term in the square brackets of Eq. (121) is also not quite new for us: in particular, it describes the spin-orbit interaction. Indeed, in the case of a classical, spherically symmetric electric field $\boldsymbol{\mathcal{E}}$ corresponding to the potential $\phi(r) = U(r)/q$, this term may be reduced to Eq. (6.56):

$$ \hat{H}_{\rm so} = \frac{1}{2m^2c^2}\frac{1}{r}\frac{dU}{dr}\hat{\mathbf{S}}\cdot\hat{\mathbf{L}}. \tag{9.122} $$


The proof of this correspondence requires a bit of additional work.57 Indeed, in Eq. (121), the term responsible for the spin-orbit interaction acts on 4-component wavefunctions, while the Hamiltonian (122) is supposed to act on non-relativistic state vectors with an account of spin, whose coordinate representation may be given by 2-component spinors:58


55 This difference may be traced by classical means — see, e.g., EM Sec. 5.1. 

56 Note that in this result, the g-factor of the particle is still exactly 2; see Eq. (4.115) and its discussion in Sec. 4.4. In order to describe the small deviation of $g_e$ from 2, the electromagnetic field should be quantized (just as was discussed in Secs. 1-4 of this chapter), and its potentials A and φ, participating in Eq. (121), should be treated as operators, rather than as the c-number functions assumed above.

57 The only facts immediately evident from Eq. (121) are that the term we are discussing is proportional to the electric field, as required by Eq. (122), and that it is of the proper order of magnitude. Indeed, Eqs. (101)-(102) imply that in the Dirac theory, $c\hat{\boldsymbol{\alpha}}$ plays the role of the velocity operator, so that the expectation values of the term are of the order of $\hbar qv\mathcal{E}/2mc^2$. Since the expectation values of the operators participating in the Hamiltonian (122) scale as $S \sim \hbar/2$ and $L \sim mvr$, the spin-orbit interaction energy has the same order of magnitude.

58 In this course, the notion of spinor (popular in some textbooks) was not used much; it was introduced earlier only for two-particle states; see Eq. (8.13). For a single particle, such a definition is reduced to $\psi(\mathbf{r})|s\rangle$, whose representation in a particular spin-½ basis is the column (123). Note that such spinors may be used as a basis for an expansion of the spin-orbitals $\psi_j(\mathbf{r})$ defined by Eq. (8.125), where the index j is used for numbering both the spin's orientation (i.e. the particular component of the spinor's column) and the orbital eigenfunction.
$$ \psi = \begin{pmatrix}\psi_\uparrow\\ \psi_\downarrow\end{pmatrix}. \tag{9.123} $$


The simplest way to prove the equivalence of these two expressions is not to use Eq. (121) 
directly, but to return to the Dirac equation (112), for the particular case of motion in a static electric 
field but no magnetic field, when Dirac’s Hamiltonian is reduced to 


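$$ \hat{H} = c\hat{\boldsymbol{\alpha}}\cdot\hat{\mathbf{p}} + \hat{\beta}mc^2 + U(\mathbf{r}), \quad\text{with } U = q\phi. \tag{9.124} $$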


Since this Hamiltonian is time-independent, we may look for its 4-component eigenfunctions in the form

$$ \Psi(\mathbf{r},t) = \begin{pmatrix}\psi_+(\mathbf{r})\\ \psi_-(\mathbf{r})\end{pmatrix}\exp\left\{-i\frac{E}{\hbar}t\right\}, \tag{9.125} $$

where each of $\psi_\pm$ is a 2-component column of the type (123), representing the two spin states of the particle (index +) and its antiparticle (index −). Plugging Eq. (125) into Eq. (95) with the Hamiltonian (124), and using Eq. (98a), we get the following system of two linear equations:

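$$ [E - mc^2 - U(\mathbf{r})]\psi_+ - c\hat{\boldsymbol{\sigma}}\cdot\hat{\mathbf{p}}\,\psi_- = 0, \qquad [E + mc^2 - U(\mathbf{r})]\psi_- - c\hat{\boldsymbol{\sigma}}\cdot\hat{\mathbf{p}}\,\psi_+ = 0. \tag{9.126} $$

Expressing $\psi_-$ from the latter equation, and plugging the result into the former one, we get the following single equation for the particle's spinor:

$$ \left[E - mc^2 - U(\mathbf{r}) - c^2\hat{\boldsymbol{\sigma}}\cdot\hat{\mathbf{p}}\,\frac{1}{E + mc^2 - U(\mathbf{r})}\,\hat{\boldsymbol{\sigma}}\cdot\hat{\mathbf{p}}\right]\psi_+ = 0. \tag{9.127} $$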
So far, this is an exact equation for the eigenstates and eigenvalues of the Hamiltonian (124), but it may be substantially simplified in the low-energy limit, when both the potential energy59 and the non-relativistic eigenenergy

$$ \tilde{E} \equiv E - mc^2 \tag{9.128} $$

are much lower than $mc^2$. Indeed, in this case, the expression in the denominator of the last term in the brackets of Eq. (127) is close to $2mc^2$. Since $(\hat{\boldsymbol{\sigma}}\cdot\hat{\mathbf{p}})^2 = \hat{p}^2$, with that replacement Eq. (127) is reduced to the non-relativistic Schrödinger equation, identical for both spin components of $\psi_+$, and hence giving spin-degenerate energy levels. To recover the small relativistic and spin-orbit effects, we need a slightly more accurate approximation:


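$$ \frac{1}{E + mc^2 - U(\mathbf{r})} = \frac{1}{2mc^2 + \tilde{E} - U(\mathbf{r})} \approx \frac{1}{2mc^2}\left[1 - \frac{\tilde{E} - U(\mathbf{r})}{2mc^2}\right], \tag{9.129} $$

in which Eq. (127) is reduced to

$$ \left[\tilde{E} - U(\mathbf{r}) - \frac{\hat{p}^2}{2m} + \hat{\boldsymbol{\sigma}}\cdot\hat{\mathbf{p}}\,\frac{\tilde{E} - U(\mathbf{r})}{(2mc)^2}\,\hat{\boldsymbol{\sigma}}\cdot\hat{\mathbf{p}}\right]\psi_+ = 0. \tag{9.130} $$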

As Eq. (5.34) shows, the operators of momentum and of a function of coordinates commute as

$$ [\hat{\mathbf{p}}, U(\mathbf{r})] = -i\hbar\nabla U, \tag{9.131} $$


59 Strictly speaking, this requirement is imposed on the expectation values of U(r) in the eigenstates to be found. 
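so that the last term in the square brackets of Eq. (130) may be rewritten as

$$ \hat{\boldsymbol{\sigma}}\cdot\hat{\mathbf{p}}\,\frac{\tilde{E} - U(\mathbf{r})}{(2mc)^2}\,\hat{\boldsymbol{\sigma}}\cdot\hat{\mathbf{p}} = \frac{\tilde{E} - U(\mathbf{r})}{(2mc)^2}\hat{p}^2 + \frac{i\hbar}{(2mc)^2}(\hat{\boldsymbol{\sigma}}\cdot\nabla U)(\hat{\boldsymbol{\sigma}}\cdot\hat{\mathbf{p}}). \tag{9.132} $$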


Since in the low-energy limit both terms on the right-hand side of this relation are much smaller than the three leading terms of Eq. (130), we may replace the first term's numerator with its non-relativistic approximation $\hat{p}^2/2m$. With this replacement, the term coincides with the first relativistic correction to the kinetic energy operator; see Eq. (6.47). The second term, proportional to the electric field $\boldsymbol{\mathcal{E}} = -\nabla\phi = -\nabla U/q$, may be transformed further, using a readily verifiable identity


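$$ (\hat{\boldsymbol{\sigma}}\cdot\nabla U)(\hat{\boldsymbol{\sigma}}\cdot\hat{\mathbf{p}}) = (\nabla U)\cdot\hat{\mathbf{p}} + i\hat{\boldsymbol{\sigma}}\cdot[(\nabla U)\times\hat{\mathbf{p}}]. \tag{9.133} $$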
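The identity (133) is the Pauli-matrix relation $(\hat{\boldsymbol\sigma}\cdot\mathbf{a})(\hat{\boldsymbol\sigma}\cdot\mathbf{b}) = \mathbf{a}\cdot\mathbf{b} + i\hat{\boldsymbol\sigma}\cdot(\mathbf{a}\times\mathbf{b})$, which holds for operator-valued $\mathbf{a}$ and $\mathbf{b}$ acting on the orbital degrees of freedom as long as their order is preserved, as in Eq. (133). With $\mathbf{a} = \mathbf{b} = \hat{\mathbf{p}}$ it also gives the equality $(\hat{\boldsymbol\sigma}\cdot\hat{\mathbf{p}})^2 = \hat p^2$ used after Eq. (127). The sketch below checks the relation numerically only for c-number (commuting) vectors with random complex components; it is an illustration rather than a proof of the operator version, and the function names are hypothetical.

```python
import random

# Pauli matrices sigma_x, sigma_y, sigma_z
SIGMA = ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])


def sigma_dot(v):
    """2x2 matrix sigma.v for a vector v with c-number (commuting) components."""
    return [[sum(v[k] * SIGMA[k][i][j] for k in range(3)) for j in range(2)] for i in range(2)]


def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def pauli_identity_residual(a, b):
    """max |(sigma.a)(sigma.b) - [(a.b) I + i sigma.(a x b)]| over matrix elements."""
    lhs = mul2(sigma_dot(a), sigma_dot(b))
    ab = sum(x * y for x, y in zip(a, b))      # bilinear product, no complex conjugation
    s = sigma_dot(cross(a, b))
    rhs = [[(ab if i == j else 0) + 1j * s[i][j] for j in range(2)] for i in range(2)]
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))


def random_vector(rng):
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(3)]


if __name__ == "__main__":
    rng = random.Random(1)
    worst = max(pauli_identity_residual(random_vector(rng), random_vector(rng))
                for _ in range(1000))
    print(f"largest residual over 1000 random pairs: {worst:.1e}")
```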


Of the two terms on the right-hand side of this relation, only the second one depends on spin, giving the following spin-orbit interaction contribution to the Hamiltonian,

$$ \hat{H}_{\rm so} = \frac{\hbar}{(2mc)^2}\hat{\boldsymbol{\sigma}}\cdot[(\nabla U)\times\hat{\mathbf{p}}] = \frac{q}{2m^2c^2}\hat{\mathbf{S}}\cdot[(\nabla\phi)\times\hat{\mathbf{p}}]. \tag{9.134} $$

For a central potential $\phi(r)$, its gradient has only the radial component: $\nabla\phi = (d\phi/dr)\,\mathbf{r}/r = -\mathcal{E}\,\mathbf{r}/r$, and with the angular momentum definition (5.147), Eq. (134) is (finally!) reduced to Eq. (122).


As was shown in Sec. 6.3, the perturbative treatment of Eq. (122), together with the kinetic-relativistic correction (6.47), in the hydrogen-like atom/ion problem, leads to the fine structure of each Bohr level $E_n$, given by Eq. (6.60):

$$ \Delta E_{\rm fine} = \frac{E_n^2}{2mc^2}\left(3 - \frac{4n}{j + 1/2}\right). \tag{9.135} $$


This result is confirmed by the surprising fact that for the hydrogen-like atom/ion problem, the Dirac equation may be solved exactly, without any assumptions. I do not have time/space to reproduce the solution,61 and will only list the final result for the energy spectrum:

$$ \frac{E}{mc^2} = \left\{1 + \frac{Z^2\alpha^2}{\left[n - (j + 1/2) + \left((j + 1/2)^2 - Z^2\alpha^2\right)^{1/2}\right]^2}\right\}^{-1/2}. \tag{9.136} $$
Here n = 1, 2, ... is the same principal quantum number as in Bohr’s theory, while j is the quantum number specifying the eigenvalues (5.175) of $J^2$, in our case of a spin-½ particle taking half-integer values: $j = l \pm 1/2 = 1/2, 3/2, 5/2, \ldots$; see Eq. (5.189). This is natural, because due to the spin-orbit interaction, the orbital momentum and spin are not conserved, while their vector sum, $\mathbf{J} = \mathbf{L} + \mathbf{S}$, is, at least in the absence of an external field. Each energy level (136) is doubly degenerate, with two eigenstates representing two directions of the spin. (In the low-energy limit, we may say: corresponding to two values of $l = j \pm 1/2$, at fixed j.)


60 The first term gives a small spin-independent energy shift, which is very difficult to verify experimentally. 
61 Good descriptions of the solution are available in many textbooks (the older the better :-); see, e.g., Sec. 53 in L. Schiff, Quantum Mechanics, 3rd ed., McGraw-Hill (1968).
Speaking of that limit (when $E - mc^2 \sim E_{\rm H} \ll mc^2$): since, according to Eq. (1.13) for $E_{\rm H}$, the square of the fine-structure constant $\alpha = e^2/4\pi\varepsilon_0\hbar c$ may be represented as the ratio $E_{\rm H}/mc^2$, we may follow this limit by expanding Eq. (136) into a Taylor series in $(Z\alpha)^2 \ll 1$. The result,

$$ \frac{E}{mc^2} \approx 1 - \frac{(Z\alpha)^2}{2n^2} - \frac{(Z\alpha)^4}{2n^4}\left(\frac{n}{j + 1/2} - \frac{3}{4}\right), \tag{9.137} $$

has the same structure, and allows the same interpretation as Eq. (92), but with the last term coinciding 


with Eq. (6.60) — and with experimental results. Historically, this correct description of the fine structure 
of the atomic levels provided the decisive proof of Dirac’s theory. 
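The relations between Eqs. (135), (136), and (137) are easy to explore numerically. The sketch below hard-codes CODATA 2018 values of the constants (an assumption of this example, not part of the text), computes $\alpha$ from them, and compares the exact Dirac energies (136) with the expansion (137) and with the Bohr energy $E_n = -(Z\alpha)^2mc^2/2n^2$ plus the fine-structure correction (135). The function names are illustrative.

```python
import math

# CODATA 2018 values (SI). e and c are exact; hbar is rounded from its exact
# value; eps0 and the electron rest energy are measured quantities.
E_CHARGE = 1.602176634e-19   # C
HBAR = 1.054571817e-34       # J s
C_LIGHT = 299792458.0        # m/s
EPS0 = 8.8541878128e-12      # F/m
ME_C2_EV = 0.51099895000e6   # electron rest energy mc^2, eV

ALPHA = E_CHARGE**2 / (4 * math.pi * EPS0 * HBAR * C_LIGHT)


def dirac_exact(n, j, z, alpha=ALPHA):
    """E/mc^2 from Eq. (136)."""
    k, x = j + 0.5, (z * alpha) ** 2
    return (1 + x / (n - k + math.sqrt(k * k - x)) ** 2) ** -0.5


def dirac_expanded(n, j, z, alpha=ALPHA):
    """E/mc^2 from the Taylor expansion (137)."""
    k, x = j + 0.5, (z * alpha) ** 2
    return 1 - x / (2 * n**2) - x**2 / (2 * n**4) * (n / k - 0.75)


def bohr_plus_fine(n, j, z, alpha=ALPHA):
    """1 + (E_n + Delta E_fine)/mc^2, with E_n = -(Z alpha)^2 mc^2/2n^2 and Eq. (135)."""
    e_n = -(z * alpha) ** 2 / (2 * n**2)      # Bohr level in units of mc^2
    return 1 + e_n + e_n**2 / 2 * (3 - 4 * n / (j + 0.5))


if __name__ == "__main__":
    print(f"alpha = {ALPHA:.10e}, 1/alpha = {1 / ALPHA:.6f}")
    for n, j in [(1, 0.5), (2, 0.5), (2, 1.5)]:
        print(n, j, dirac_exact(n, j, 1) - dirac_expanded(n, j, 1))
    split = dirac_exact(2, 1.5, 1) - dirac_exact(2, 0.5, 1)   # hydrogen, n = 2
    print(f"n = 2 fine-structure splitting: {split * ME_C2_EV:.3e} eV")
```

For hydrogen ($Z = 1$), the residuals of the expansion are of the order of $(Z\alpha)^6$, and the splitting between the $j = 3/2$ and $j = 1/2$ levels with $n = 2$ comes out at about $4.5\times10^{-5}$ eV, i.e. roughly 11 GHz in frequency units, dominated by the $(Z\alpha)^4$ term of Eq. (137).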


However, even such an impressive theory does not have too many direct applications. The main reason for that was already discussed briefly at the end of Sec. 5: due to the possibility of creation and annihilation of particle-antiparticle pairs by an energy influx higher than $2mc^2$, the number of particles participating in high-energy interactions is not fixed. An adequate general description of such situations is given by quantum field theory, in which the particle’s wavefunction is treated as a field to be quantized, using so-called field operators $\hat{\Psi}(\mathbf{r},t)$, very similar to the electromagnetic field operators (16). The Dirac equation follows from such a theory in the single-particle approximation.

As was mentioned above on several occasions, quantum field theory is well beyond the time/space limits of this course, and I have to stop here, referring the interested reader to one of several excellent textbooks on this discipline.62 However, I would strongly encourage students going in this direction to start by playing with the field operators on their own, taking clues from Eqs. (16), but replacing the creation/annihilation operators $\hat{a}_j^\dagger$ and $\hat{a}_j$ of the electromagnetic field oscillators with those of the general second quantization formalism outlined in Sec. 8.3.


9.8. Exercise problems 


9.1. Prove the Casimir formula, given by Eq. (23), by calculating the net force F = PA exerted by the electromagnetic field, in its ground state, on two perfectly conducting parallel plates of area A, separated by a vacuum gap of width $t \ll A^{1/2}$.


Hint: Calculate the field energy in the gap volume with and without the account of the plate effect, and then apply the Euler-Maclaurin formula63 to the difference between these two results.


9.2. Electromagnetic radiation by some single-mode quantum sources may have such a high 
degree of coherence that it is possible to observe the interference of waves from two independent 
sources with virtually the same frequency, incident on one detector. 


(i) Generalize Eq. (29) to this case.


62 For a gradual introduction see, e.g., either L. Brown, Quantum Field Theory, Cambridge U. Press (1994) or R. 
Klauber, Student Friendly Quantum Field Theory, Sandtrove (2013). On the other hand, M. Srednicki, Quantum 
Field Theory, Cambridge U. Press (2007) and A. Zee, Quantum Field Theory in a Nutshell, 2nd ed., Princeton
(2010), among others, offer steeper learning curves. 

63 See, e.g., MA Eq. (2.12a). 
(ii) Use this generalized expression to show that incident waves in different Fock states do not
create an interference pattern. 


9.3. Calculate the zero-delay value $g^{(2)}(0)$ of the second-order correlation function of a single-mode electromagnetic field in the so-called Schrödinger-cat state:64 a coherent superposition of two Glauber states, with equal but sign-opposite parameters α, and a certain phase shift between them.


9.4. Calculate the zero-delay value $g^{(2)}(0)$ of the second-order correlation function of a single-mode electromagnetic field in the squeezed ground state ζ defined by Eq. (5.142).


9.5. Calculate the rate of spontaneous photon emission (into unrestricted free space) by a hydrogen atom, initially in the 2p state (n = 2, l = 1) with m = 0. Would the result be different for m = ±1? For the 2s state (n = 2, l = 0, m = 0)? Discuss the relation between these quantum-mechanical results and those given by the classical theory of radiation for the simplest classical model of the atom.


9.6. An electron has been placed on the lowest excited level of a spherically-symmetric, quadratic potential well $U(r) = m_e\omega^2r^2/2$. Calculate the rate of its relaxation to the ground state, with the emission of a photon (into unrestricted free space). Compare the rate with that for a similar transition of the hydrogen atom, for the case when the radiation frequencies of these two systems are equal.


9.7. Derive an analog of Eq. (53) for the spontaneous photon emission into the free space, due to 
a change of the magnetic dipole moment m of a small-size system. 


9.8. A spin-½ particle, with a gyromagnetic ratio γ, is in its orbital ground state in a dc magnetic field $\mathbf{B}_0$. Calculate the rate of its spontaneous transition from the higher to the lower energy level, with the emission of a photon into free space. Evaluate this rate for an electron in a field of 10 T, and discuss the implications of this result for laboratory experiments with electron spins.


9.9. Calculate the rate of spontaneous transitions between the two sublevels of the ground state 
of a hydrogen atom, formed as a result of its hyperfine splitting. Discuss the implications of the result 
for the width of the 21-cm spectral line of hydrogen. 


9.10. Find the eigenstates and eigenvalues of the Jaynes-Cummings Hamiltonian (78), and discuss their behavior near the resonance point $\omega = \Omega$.


9.11. Analyze the Purcell effect, mentioned in Secs. 3 and 4, quantitatively; in particular, calculate the so-called Purcell factor $F_P$, defined as the ratio of the rate $\Gamma_s$ of an atom’s spontaneous emission into a resonant cavity tuned exactly to the quantum transition frequency, to that into free space.


9.12. Prove that the Klein-Gordon equation (84) may be rewritten in a form similar to the non-relativistic Schrödinger equation (1.25), but for a two-component wavefunction, with the Hamiltonian represented (in the usual z-basis) by the following 2×2 matrix:


64 Its name stems from the well-known Schrödinger cat paradox, which is (very briefly) discussed in Sec. 10.1.
$$ \hat{H} = (\hat{\sigma}_z + i\hat{\sigma}_y)\frac{\hat{p}^2}{2m} + mc^2\hat{\sigma}_z. $$

Use your solution to discuss the physical meaning of the wavefunction’s components.


9.13. Calculate and discuss the energy spectrum of a relativistic, spinless, charged particle placed into an external uniform, time-independent magnetic field B. Use the result to formulate the condition of validity of the non-relativistic theory in this situation.


9.14. Prove Eq. (91) for the energy spectrum of a hydrogen-like atom/ion, starting from the relativistic Schrödinger equation.


Hint: A mathematical analysis of Eq. (3.193) shows that its eigenvalues are given by Eq. (3.201), $\varepsilon_n = -1/2n^2$, with $n = l + 1 + n_r$, where $n_r = 0, 1, 2, \ldots$, even if the parameter l is not an integer.


9.15. Derive a general expression for the differential cross-section of elastic scattering of a spinless relativistic particle by a static potential U(r), in the Born approximation, and formulate the conditions of its validity. Use these results to calculate the differential cross-section of scattering of a particle with the electric charge −e by the Coulomb electrostatic potential $\phi(r) = Ze/4\pi\varepsilon_0 r$.


9.16. Starting from Eqs. (95)-(98), prove that the probability density w given by Eq. (101) and the probability current density j defined by Eq. (102) do indeed satisfy the continuity equation (1.52): $\partial w/\partial t + \nabla\cdot\mathbf{j} = 0$.


9.17. Calculate the commutator of the operator $\hat{L}^2$ and Dirac’s Hamiltonian of a free particle. Compare the result with that for the non-relativistic Hamiltonian, and interpret the difference.


9.18. Calculate the commutators of the operators $\hat{S}^2$ and $\hat{J}^2$ with Dirac’s Hamiltonian (97), and give an interpretation of the results.


9.19. In the Heisenberg picture of quantum dynamics, derive an equation describing the time evolution of a free electron’s velocity in the Dirac theory. Solve the equation for the simplest state, with definite energy and momentum, and discuss the solution.


9.20. Calculate the eigenstates and eigenenergies of a relativistic spin-½ particle with charge q, placed into a uniform, time-independent external magnetic field B. Compare the calculated energy spectrum with those following from the non-relativistic theory and the relativistic Schrödinger equation.


9.21.* Following the discussion at the very end of Section 7, introduce quantum field operators $\hat{\psi}$ that would be related to the usual wavefunctions ψ just as the electromagnetic field operators (16) are related to the classical electromagnetic fields, and explore basic properties of these operators. (For this preliminary study, consider the fixed-time situation.)
Chapter 10. Making Sense of Quantum Mechanics

This (rather brief) chapter addresses some conceptually important issues of quantum measurements and quantum state interpretation. Please note that some of these issues are still subjects of debate!1 Fortunately, they do not affect the practical results of quantum mechanics discussed in the previous chapters.


10.1. Quantum measurements 


The knowledge base outlined in the previous chapters gives us a sufficient background for a (by necessity, very brief) discussion of quantum measurements.2 Let me start by reminding the reader of the only postulate of quantum theory that relates it to experiment, so far meaning perfect measurements. In the simplest case when the system is in a coherent (pure) quantum state, its ket-vector may be represented as a linear superposition


$$ |\alpha\rangle = \sum_j \alpha_j|a_j\rangle, \tag{10.1} $$

where $a_j$ are the eigenstates of the operator of an observable A, related to its eigenvalues $A_j$ by Eq. (4.68):

$$ \hat{A}|a_j\rangle = A_j|a_j\rangle. \tag{10.2} $$

In such a state, the outcome of every single measurement of the observable A may be uncertain, but is restricted to the set of eigenvalues $A_j$, with the j-th outcome probability equal to

$$ W_j = |\alpha_j|^2. \tag{10.3} $$
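Eq. (3) may be illustrated with a short simulation. This is a minimal sketch, assuming a finite set of outcomes and using Python's pseudo-random generator as a stand-in for a series of single measurements on identically prepared systems. It normalizes the amplitudes $\alpha_j$ of Eq. (1), computes $W_j$, and compares them with the frequencies of the sampled outcomes; the function names are illustrative.

```python
import cmath
import random


def outcome_probabilities(amplitudes):
    """W_j = |alpha_j|^2, Eq. (10.3), after normalizing the ket (10.1)."""
    norm2 = sum(abs(a) ** 2 for a in amplitudes)
    return [abs(a) ** 2 / norm2 for a in amplitudes]


def simulate_measurements(amplitudes, shots, seed=0):
    """Sample `shots` single measurements and return the observed outcome frequencies."""
    w = outcome_probabilities(amplitudes)
    rng = random.Random(seed)
    counts = [0] * len(w)
    for j in rng.choices(range(len(w)), weights=w, k=shots):
        counts[j] += 1
    return [n / shots for n in counts]


if __name__ == "__main__":
    superposition = [0.6, 0.8j]                 # two eigenstates, with a relative phase
    print(outcome_probabilities(superposition))
    print(simulate_measurements(superposition, 100_000, seed=1))
    eigenstate = [0, cmath.exp(0.7j), 0]        # definite state, only a phase factor
    print(simulate_measurements(eigenstate, 1000))
```

The last case corresponds to the trivial situation described in footnote 3: with only one nonzero amplitude, every simulated measurement gives the same outcome, regardless of the phase factor.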


As was discussed in Chapter 7, the state of the system (or rather of the statistical ensemble of macroscopically similar systems we are using for this particular series of similar experiments) may be not coherent, and hence even more uncertain than the state described by Eq. (1). Thus, the measurement postulate means that even if the system is in this (the least uncertain) state, the measurement outcomes are still probabilistic.3


If we believe that a particular measurement may be done perfectly, and do not worry too much about how exactly, we are subscribing to the mathematical notion of measurement that was, rather reluctantly, used in these notes up to this point. However, actual (physical) measurements are always imperfect, first of all because of the huge gap between the energy-time scale $\hbar \sim 10^{-34}$ J·s of quantum phenomena in “microscopic” systems such as atoms, and the “macroscopic” scale of direct human perception, so that the role of the instruments bridging this gap (Fig. 1) is highly nontrivial.


1 For an excellent review of these controversies, as presented in a few leading textbooks, I highly recommend J. Bell’s paper in the collection by A. Miller (ed.), Sixty-Two Years of Uncertainty, Plenum, 1989.

2 “Quantum measurements” is a very unfortunate and misleading term; it would be more sensible to speak about 
“measurements of observables in quantum mechanical systems”. However, the former term is so common and 
compact that I will use it — albeit rather reluctantly. 

3 The measurement outcomes become definite only in the trivial case when the system is definitely in one of the eigenstates $a_j$, say $a_{j_0}$; then $\alpha_j = \delta_{j,j_0}\exp\{i\varphi\}$, and $W_j = \delta_{j,j_0}$.
Fig. 10.1. The general scheme of a quantum measurement. (Diagram: a quantum system interacts with an instrument, which exerts back action on it; the instrument's macroscopic pointer passes the result to a human observer.)


Besides the famous Bohr-Einstein discussion in the mid-1930s, which will be briefly reviewed in Sec. 3, the founding fathers of quantum mechanics did not pay much attention to these issues, apparently for the following reason. At that time it looked like the experimental instruments (at least the best of them :-) were doing exactly what the measurement postulate was telling. For example, the z-oriented Stern-Gerlach experiment (Fig. 4.1) turns two complex coefficients $\alpha_\uparrow$ and $\alpha_\downarrow$, describing the spin state of the incoming electrons, into a set of particle-counter clicks, with rates proportional to, respectively, $|\alpha_\uparrow|^2$ and $|\alpha_\downarrow|^2$. The crude internal nature of these instruments makes more detailed questions unnatural. For example, each click of a Geiger counter involves an effective disappearance of one observed electron in a zillion-particle electric discharge avalanche it has triggered. A century ago, it looked much more important to extend the newly born quantum mechanics to more complex systems (such as atomic nuclei, etc.) than to think about the physics of such instruments.


However, since that time the experimental techniques, notably including high-vacuum and low- 
temperature systems, micro- and nano-fabrication, and low-noise electronics, have improved quite 
dramatically. In particular, we now may observe quantum-mechanical behavior of more and more 
macroscopic objects — such as the micromechanical oscillators mentioned in Sec. 2.9. Moreover, some 
“macroscopic quantum systems” (in particular, special systems of Josephson junctions, see below) have 
properties enabling their use as essential parts of measurement setups. Such developments are making 
the line separating the “micro” and “macro” worlds finer and finer, so that more inquisitive inquiries 
into the physical nature of quantum measurements are not so hopeless now. In my personal scheme of 
things,4 these inquiries may be grouped as follows:


(i) Does a quantum measurement involve any laws besides those of quantum mechanics? In 
particular, should it necessarily involve a human/intelligent observer? (The last question is not as 
laughable as it may look — see below.) 


(ii) What is the state of the measured system just after a single-shot measurement — meaning a 
measurement process limited to a time interval much shorter than the time scale of the measured 
system’s evolution? (This question is a necessary part of any discussion of repeated measurements and 
of their ultimate form — continuous monitoring of a certain observable.) 


(iii) If a measurement of an observable A has produced a certain outcome A;, what statements 
may be made about the state of the system just before the measurement? (This question is most closely 
related to various interpretations of quantum mechanics.) 


Let me discuss these issues in the listed order. First of all, I am happy to report that there is a 
virtual consensus of physicists on some aspects of these issues. According to this consensus, any 
reasonable quantum measurement needs to result in a certain, distinguishable state of a macroscopic 
output component of the measurement instrument; see Fig. 1. (Traditionally, this component is called a


4 Again, this list and some other issues discussed in the balance of this section are still controversial. 
pointer, though its role may be played by a printer or a plotter, an electronic circuit sending out the
result as a number, etc.). This requirement implies that the measurement process should have the 
following features: 


- provide a large “signal gain”, i.e. some means of mapping the quantum state, with its ħ-scale of action (i.e. of the energy-by-time product), onto a macroscopic position of the pointer with a much larger action scale, and


- if we want to approach the fundamental limit of uncertainty, given by Eq. (3), the instrument should introduce as little additional fluctuation (“noise”) as permitted by the laws of physics.


Both these requirements are fulfilled in a well-designed Stern-Gerlach experiment — see Fig. 4.1 
again. Indeed, the magnetic field gradient, splitting the electron beam, turns the minuscule (microscopic) 
energy difference (4.167) between two spin-polarized states into a macroscopic difference between the 
final positions of two output beams, where their detectors may be located. However, as was noted 
above, the internal physics of the particle detectors (say, Geiger counters) at this measurement is rather 
complex, and would not allow us to discuss some aspects of the measurement, in particular to answer 
the second of the inquiries we are working on.


This is why I will describe the scheme of an almost similar “single-shot” measurement of a two-level quantum system, which shares the simplicity, high gain, and low internal noise of the Stern-Gerlach apparatus, but has the advantage that, in some of its hardware implementations,5 the measurement process allows a thorough, quantitative theoretical description. Let us measure a particle trapped in a double-well potential (Fig. 2), where x is some continuous generalized coordinate, not necessarily a mechanical displacement. Let the particle be initially in a pure quantum state, with the energy close to the wells’ bottoms. Then, as we know from the discussion of such systems in Secs. 2.6 and 5.1, the state may be described by a ket-vector similar to that of spin-½:


$$|\alpha\rangle = \alpha_\rightarrow|\rightarrow\rangle + \alpha_\leftarrow|\leftarrow\rangle, \tag{10.4}$$


where the component states → and ← are described by wavefunctions localized near the potential well
bottoms at x ≈ ±x0 (see the blue lines in Fig. 2). Our goal is to measure in which well the particle resides
at a certain time instant, say at t = 0. For that, let us rapidly change, at that moment, the potential profile
of the system, so that at t > 0, near the origin, it may be well described by an inverted parabola:


$$U(x, t) \approx -\frac{m\lambda^2 x^2}{2}, \quad \text{for } t > 0, \ |x| \ll x_0. \tag{10.5}$$


5 The scheme may be implemented, for example, using a simple Josephson-junction circuit called the balanced
comparator; see, e.g., T. Walls et al., IEEE Trans. on Appl. Supercond. 17, 136 (2007), and references therein.
Experiments have demonstrated that this system may have a measurement variance dominated by the theoretically
expected quantum-mechanical uncertainty, under practicable experimental conditions (at temperatures below ~1 K).
A conceptual advantage of this system is that it is based on externally shunted Josephson junctions, i.e. devices
whose quantum-mechanical model, including its part describing the coupling to the environment, is in
quantitative agreement with experiment; see, e.g., D. Schwartz et al., Phys. Rev. Lett. 55, 1547 (1985).
Colloquially, the balanced comparator is a high-gain instrument with a “well-documented Hamiltonian”,
eliminating the need for speculation about the environmental effects. In particular, the dephasing process in it,
and its time T2, are well described by Eqs. (7.89) and (7.142), with the coefficients η equal to the Ohmic
conductances G of the shunts.
It is straightforward to verify that the Heisenberg equations of motion in such an inverted
potential describe exponential growth of the operator x̂ in time (proportional to exp{λt}), and hence a
similar, proportional growth of the expectation value ⟨x⟩ and its r.m.s. uncertainty δx.6 At this “inflation”
stage, the coherence between the two component states → and ← is still preserved, i.e. the time
evolution of the system is, in principle, reversible.
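For readers who want to check the claimed exponential growth, here is a minimal derivation. It assumes that the inverted potential (5) is normalized as U = −mλ²x²/2, with the usual kinetic energy p̂²/2m, so that λ is the rate appearing in exp{λt}; this normalization is our assumption.

```latex
\dot{\hat x} = \frac{\hat p}{m}, \qquad
\dot{\hat p} = -\frac{\partial U}{\partial \hat x} = m\lambda^2 \hat x
\quad\Longrightarrow\quad \ddot{\hat x} = \lambda^2 \hat x,
\\[4pt]
\hat x(t) = \hat x(0)\cosh\lambda t + \frac{\hat p(0)}{m\lambda}\sinh\lambda t
\;\approx\; \frac{e^{\lambda t}}{2}\left[\hat x(0) + \frac{\hat p(0)}{m\lambda}\right]
\quad \text{for } \lambda t \gg 1 .
```

At λt >> 1, the coordinate operator is a fixed combination of the initial operators multiplied by the c-number e^{λt}/2, so both ⟨x⟩ and δx are multiplied by the same factor and their ratio stops changing, in line with footnote 6.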


Fig. 10.2. The potential inversion, as viewed on the (a) “macroscopic”
and (b) “microscopic” scales of the generalized coordinate x.


Now let the system be weakly coupled, also at t > 0, to a dissipative (e.g., Ohmic) environment.
As we know from Chapter 7, such coupling ensures the state’s dephasing on some time scale T2. If


$$x_0 \ll x_0\exp\{\lambda T_2\} \ll x_f, \tag{10.6}$$


then the process, after the potential inversion, consists of two stages, well separated in time:
- the already discussed “inflation” stage, preserving the coherence of the component states, and


- the dephasing stage, at which the coherence of the component states → and ← is gradually
suppressed as described by Eq. (7.89), i.e. the density matrix of the system is gradually reduced to the
diagonal form describing a classical mixture of two probability packets with the probabilities (3) equal
to, respectively, W→ = |α→|² and W← = |α←|² = 1 − |α→|².


Besides dephasing, the environment gives the motion a certain kinematic friction, with the drag
coefficient η (7.141), so that the system eventually settles to rest at one of the macroscopically separated
minima x = ±x_f of the inverted potential (Fig. 2a), thus ensuring a high “signal gain” x_f/x_0 >> 1. As a
result, the final probability density distribution w(x) along the x-axis has two narrow, well-separated
peaks. But this is just the situation that was discussed in Sec. 2.5 (see, in particular, Fig. 2.17). Since
that discussion is very important, let me repeat, or rather rephrase, it. The final state of the system is a
classical mixture of two well-separated states, with the respective probabilities W→ and W←, whose sum
equals 1. Now let us use some detector to test whether the system is in one of these states, say the right


6 Somewhat counter-intuitively, the latter growth improves the measurement’s fidelity. Indeed, it does not affect
the intrinsic “signal-to-noise ratio” δx/⟨x⟩, while making the intrinsic (say, quantum-mechanical) uncertainty much
larger than the possible noise contribution of the later measurement stage(s).
one. (If x_f is sufficiently large, the noise contribution of this detector to the measurement uncertainty is
negligible,7 and its physics is unimportant.) If the system has been found at this location (again, the
probability of this outcome is W→ = |α→|²), the probability to find it at the counterpart (left) location at a
subsequent detection drops to zero.


This probability “reduction” is a purely classical (or if you like, mathematical) effect of the
statistical ensemble’s re-definition: W← equals zero not in the initial ensemble of all similar experiments
(where it equals |α←|²), but only in the re-defined ensemble of experiments in which the system had been
found at the right location. Of course, which ensemble to use, i.e. what probabilities to register/publish,
is a purely accounting decision, which should be made by a human (or otherwise intelligent :-) observer.
If we are only interested in an objective recording of results of a pre-fixed sequence of experiments (i.e.
the members of a pre-defined, fixed statistical ensemble), there is no need to include such an observer in
any discussion. In any case, this detection/registration process, very common in classical statistics,
leaves no space for any mysterious “wave packet reduction”, understood as a hypothetical process that
would not obey the regular laws of quantum-mechanical evolution.
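A toy Monte Carlo can make the "accounting" nature of this reduction explicit. It assumes, as the text does, that dephasing is already complete, so each run simply ends in the right or the left well. The function names and the value w_right = 0.7 are hypothetical. Nothing quantum happens in the code: the drop of the left-well frequency to zero comes entirely from selecting a sub-ensemble of runs.

```python
import random


def run_ensemble(w_right, n_runs, seed=0):
    """Final location ('R' or 'L') in each run of a hypothetical classical mixture.

    Assumes dephasing is complete, so each run ends in the right well with
    probability w_right and in the left well otherwise.
    """
    rng = random.Random(seed)
    return ["R" if rng.random() < w_right else "L" for _ in range(n_runs)]


def left_fraction(outcomes):
    """Fraction of runs in which a detector looking at the left well fires."""
    return sum(o == "L" for o in outcomes) / len(outcomes)


outcomes = run_ensemble(w_right=0.7, n_runs=100_000)

# Full (initial) ensemble: the left-well frequency is close to 1 - w_right.
p_left_all = left_fraction(outcomes)

# Re-defined sub-ensemble: only the runs in which the first detector found "R".
found_right = [o for o in outcomes if o == "R"]
p_left_given_right = left_fraction(found_right)  # 0.0 by construction
```

The two numbers differ only because they are computed over different lists of runs, which is the "ensemble re-definition" described above.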


The state dephasing and ensemble re-definition at measurements are at the core of several
paradoxes, of which the so-called quantum Zeno paradox is perhaps the most spectacular.8 Let us return
to a two-level system with the unperturbed Hamiltonian given by Eq. (4.166), the quantum oscillation
period 2π/Ω much longer than the single-shot measurement time, and the system initially (at t = 0)
definitely in one of the partial quantum states, for example, in a certain potential well of the double-well
potential. Then, as we know from Secs. 2.6 and 4.6, the probability to find the system in this initial state
at time t > 0 is


$$W(t) = \cos^2\frac{\Omega t}{2} = 1 - \sin^2\frac{\Omega t}{2}. \tag{10.7}$$

If the time is small enough (t = dt << 1/Ω), we may use the Taylor expansion to write

$$W(dt) \approx 1 - \frac{\Omega^2 dt^2}{4}. \tag{10.8}$$


Now, let us use some good measurement scheme (say, the potential inversion discussed above) 
to measure whether the system is still in this initial state. If it is (as Eq. (8) shows, the probability of 
such an outcome is nearly 100%), then the system, after the measurement, is in the same state. Let us 
allow it to evolve again, with the same Hamiltonian. Then the evolution of W will follow the same law 


7 In the balanced-comparator implementation mentioned above, the final state detection may be readily performed
using a “SQUID” magnetometer based on the same Josephson junction technology; see, e.g., EM Sec. 6.5. In
this case, the distance between the potential minima ±x_f is close to one superconducting flux quantum (3.38),
while the additional uncertainty induced by the SQUID may be as low as a few millionths of that amount.

8 This name, coined by E. Sudarshan and B. Misra in 1977 (though the paradox had been discussed in detail by
A. Turing in 1954), is due to its superficial similarity to the classical paradoxes by the ancient Greek philosopher
Zeno of Elea. By the way, just for fun, let us have a look at what happens when Mother Nature is discussed by
people who do not understand math and physics. The most famous of the classical Zeno paradoxes is the case of
Achilles and the Tortoise: the fast runner Achilles can apparently never overtake the slower Tortoise, because (in
Aristotle’s words) “the pursuer must first reach the point whence the pursued started, so that the slower must
always hold a lead”. For a physicist, the paradox has a trivial, obvious resolution, but here is what a philosopher
writes about it, not in some year BC, but in 2010 AD: “Given the history of ‘final resolutions’, from Aristotle
onwards, it's probably foolhardy to think we've reached the end.” For me, this is a sad symbol of modern
philosophy.
as in Eq. (7). Thus, when the system is measured again at time 2dt, the probability to find it in the same
state both times is

$$W(2dt) = W^2(dt) \approx \left(1 - \frac{\Omega^2 dt^2}{4}\right)^2. \tag{10.9}$$

After repeating this cycle N times (with the total time t = N dt still much less than N^{1/2}/Ω), the probability
that the system is still in its initial state is

$$W(N dt) = W^N(dt) \approx \left(1 - \frac{\Omega^2 dt^2}{4}\right)^N = \left(1 - \frac{\Omega^2 t^2}{4N^2}\right)^N \approx 1 - \frac{\Omega^2 t^2}{4N}. \tag{10.10}$$
Comparing this result with Eq. (7), we see that the process of the system’s transfer to the opposite partial
state has been slowed down rather dramatically, and in the limit N → ∞ (at fixed t), its evolution is
virtually stopped by the measurement process. There is of course nothing mysterious here; the evolution
slowdown is due to the quantum state dephasing at each measurement.
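The slowdown can be tabulated directly from Eqs. (7) and (10). This is a minimal sketch assuming idealized, instantaneous measurements, each of which fully dephases the state so that the evolution restarts from the initial state; the function names are hypothetical. It uses the exact product W(dt)^N rather than the Taylor-expanded form.

```python
import math


def survival_free(omega, t):
    """Eq. (10.7): probability to remain in the initial state, no intermediate measurements."""
    return math.cos(omega * t / 2) ** 2


def survival_measured(omega, t, n):
    """Exact version of Eq. (10.10): n ideal measurements spaced by dt = t/n.

    Each measurement is assumed to dephase the state completely, so the
    evolution restarts from the initial state and W(n*dt) = W(dt)**n.
    """
    dt = t / n
    return survival_free(omega, dt) ** n


omega, t = 1.0, math.pi   # without measurements, W(pi/omega) = 0: complete transfer
for n in (1, 10, 100, 1000):
    exact = survival_measured(omega, t, n)
    approx = 1 - omega**2 * t**2 / (4 * n)   # last form of Eq. (10.10)
    print(f"N = {n:5d}:  W = {exact:.6f},  1 - (Omega t)^2/4N = {approx:.6f}")
```

For N = 1 the approximate form even goes negative, a reminder that it is valid only for t << N^{1/2}/Ω; for large N the two columns converge while W approaches 1.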


This may be the only acceptable occasion for me to mention, very briefly, one more famous, or
rather infamous, Schrödinger cat paradox, so much overplayed in popular publications.9 For this thought
experiment, there is no need to discuss the (rather complicated :-) physics of the cat. As soon as the
charged particle, produced at the radioactive decay, reaches the Geiger counter, the initial coherent
superposition of the two possible quantum states (“the decay has happened”/“the decay has not
happened”) of the system is rapidly dephased, i.e. reduced to their classical mixture, leading,
correspondingly, to the classical mixture of the final macroscopic states “cat dead”/“cat alive”. So,
despite attempts by numerous authors without a proper physics background to represent this situation
as a mystery whose discussion needs the involvement of professional philosophers, hopefully the reader
knows enough about dephasing from Chapter 7 to ignore all this babble.


10.2. QND measurements


I hope that the above discussion has sufficiently illuminated the issues of group (i), so let me
proceed to question group (ii), in particular to the general issue of the back action of the instrument
upon the system under measurement, symbolized with the back arrow in Fig. 1. In instruments like
the Geiger counter, such back action is large: the instrument essentially destroys (“demolishes”) the
state of the system under measurement. Even the “cleaner” potential-inversion measurement, shown in
Fig. 2, fully destroys the initial coherence of the system, i.e. perturbs it rather substantially.


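However, in the 1970s it was understood that this is not really necessary. For example, in Sec.
7.3, we have already discussed an example of a two-level system coupled with its environment and
described by the Hamiltonian (7.68)-(7.70):

$$\hat H = \hat H_s + \hat H_e\{\lambda\} + \hat H_{int}, \quad \text{with} \quad \hat H_s = c_z\hat\sigma_z \quad \text{and} \quad \hat H_{int} = -\hat f\{\lambda\}\hat\sigma_z, \tag{10.11}$$

so that

$$[\hat H_s, \hat H_{int}] = 0. \tag{10.12}$$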


9 I fully agree with S. Hawking, who has been quoted as saying, “When I hear about the Schrödinger cat, I reach for
my gun.” The only good aspect of this popularity is that the formulation of this paradox should be so well
known to the reader that I do not need to waste time/space repeating it.
Combining this equality with Eq. (4.199), applied to the explicitly time-independent Hamiltonian Ĥ_s,

$$i\hbar\dot{\hat H}_s = [\hat H_s, \hat H] = [\hat H_s, (\hat H_s + \hat H_e + \hat H_{int})] = [\hat H_s, \hat H_{int}] = 0, \tag{10.13}$$

we see (taking into account that Ĥ_s and Ĥ_e act on different degrees of freedom, and hence commute)
that in the Heisenberg picture, the Hamiltonian operator (and hence the energy) of the system of
our interest does not change in time. On the other hand, if the “environment” in this discussion is the
instrument used for the measurement (see Fig. 1 again), the interaction can change its state, so it may be
used to measure the system’s energy, or another observable whose operator commutes with the
interaction Hamiltonian. Such measurements are called quantum non-demolition (QND), or sometimes
“back-action-evading”, measurements.10 Due to the lack of back action of the instrument on the
corresponding variable, such measurements allow its continuous monitoring. Let me present a fine
example of an actual measurement of this kind (see Fig. 3).11
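The QND condition (12) is easy to check numerically in a toy model. This sketch assumes a hypothetical 4-level "instrument" whose operators f̂ and Ĥ_e are random Hermitian matrices, and an arbitrary value of c_z. It confirms that [Ĥ_s, Ĥ] vanishes for the σ̂_z coupling of Eq. (11) and, for contrast, that a (hypothetical) σ̂_x coupling would make Ĥ_s change in time.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_hermitian(n):
    """Random n x n Hermitian matrix (a stand-in for an instrument operator)."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 2


def commutator(a, b):
    return a @ b - b @ a


sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)

n_env = 4                       # hypothetical dimension of the instrument's Hilbert space
f = random_hermitian(n_env)     # hypothetical operator f
h_e = random_hermitian(n_env)   # hypothetical instrument Hamiltonian H_e
c_z = 0.7                       # arbitrary coefficient

eye_s, eye_e = np.eye(2), np.eye(n_env)
H_s = c_z * np.kron(sigma_z, eye_e)     # system operators act on the first factor
H_e = np.kron(eye_s, h_e)               # instrument operators act on the second factor

# QND coupling of Eq. (10.11): -f sigma_z
H_qnd = H_s + H_e - np.kron(sigma_z, f)
# Contrast: a coupling via sigma_x instead of sigma_z
H_non_qnd = H_s + H_e - np.kron(sigma_x, f)

# By Eq. (10.13), i*hbar*dH_s/dt = [H_s, H]; its norm measures how fast H_s changes.
qnd_rate = np.linalg.norm(commutator(H_s, H_qnd))
non_qnd_rate = np.linalg.norm(commutator(H_s, H_non_qnd))
```

Only the structure of the coupling matters here: any instrument operator multiplied by σ̂_z commutes with Ĥ_s, whatever f̂ and Ĥ_e are.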


Fig. 10.3. QND measurements of a single electron’s energy by Peil and Gabrielse: (a) the
experimental setup’s core, and (b) a record of the thermal excitation and spontaneous relaxation
of the Fock states. © 1999 APS; reproduced with permission.


In this experiment, a single electron is captured in a Penning trap, a combination of a (virtually)
uniform magnetic field ℬ and a quadrupole electric field.12 This electric field stabilizes the cyclotron
orbits but does not have any noticeable effect on the electron’s motion in the plane perpendicular to the
magnetic field, and hence on its Landau level energies; see Eq. (3.50):

$$E_n = \hbar\omega_c\left(n + \frac{1}{2}\right), \quad \text{with} \quad \omega_c = \frac{e\mathscr{B}}{m_e} \tag{10.14}$$

(in the cited work, with ℬ ≈ 5.3 T, the cyclic frequency ω_c/2π was about 147 GHz, so that the Landau
level splitting ħω_c was close to 10⁻²² J, i.e. corresponded to k_BT at T ~ 10 K, while the physical
temperature of the system might be reduced well below that, down to 80 mK). Now note that the


10 For a detailed discussion of this field see, e.g., V. Braginsky and F. Khalili (ed. by K. Thorne), Quantum
Measurement, Cambridge U. Press, 1992; for an earlier review, see V. Braginsky et al., Science 209, 547 (1980).
11 S. Peil and G. Gabrielse, Phys. Rev. Lett. 83, 1287 (1999).

12 It is similar to the 2D system discussed in EM Sec. 2.7, but with additional rotation about one of the axes.
analogy between a Landau-level particle and a harmonic oscillator goes beyond the energy spectrum
(14). Indeed, since the Hamiltonian of a 2D particle in a perpendicular magnetic field may be reduced to
Eq. (3.47), similar to that of a 1D oscillator, we may repeat all procedures of Sec. 5.4 and rewrite this
effective Hamiltonian in terms of the creation-annihilation operators (see Eq. (5.72)):

$$\hat H_s = \hbar\omega_c\left(\hat a^\dagger\hat a + \frac{1}{2}\right). \tag{10.15}$$
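A quick arithmetic check of the numbers quoted for the Peil and Gabrielse experiment, using Eq. (14) with standard SI values of the constants; the only experimental input is ℬ = 5.3 T, and the function name is illustrative. With these inputs ω_c/2π comes out near 148 GHz, close to the quoted ~147 GHz, ħω_c is about 1×10⁻²² J, and ħω_c/k_B is several kelvin, i.e. "T ~ 10 K" in order of magnitude. The same frequency gives the number of cyclotron periods per minute, which is of order 10¹³.

```python
import math

# SI values of the constants (exact 2019 SI values; CODATA 2018 electron mass)
E_CHARGE = 1.602176634e-19       # C
M_ELECTRON = 9.1093837015e-31    # kg
H_PLANCK = 6.62607015e-34        # J*s
K_BOLTZMANN = 1.380649e-23       # J/K


def landau_estimates(b_tesla):
    """Cyclotron frequency, Landau-level splitting, and equivalent temperature, Eq. (10.14)."""
    omega_c = E_CHARGE * b_tesla / M_ELECTRON   # rad/s
    f_c = omega_c / (2 * math.pi)               # Hz
    splitting = H_PLANCK * f_c                  # equals hbar * omega_c, in J
    return f_c, splitting, splitting / K_BOLTZMANN


f_c, splitting, t_equiv = landau_estimates(5.3)
orbits_per_minute = 60.0 * f_c
print(f"f_c = {f_c / 1e9:.1f} GHz, hbar*omega_c = {splitting:.2e} J, "
      f"T = {t_equiv:.1f} K, orbits per minute = {orbits_per_minute:.1e}")
```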


In the Peil and Gabrielse experiment, the trapped electron had one more degree of freedom: motion
along the magnetic field. The electric field of the Penning trap created a soft confining potential along
this direction (vertical in Fig. 3a; I will take it for the z-axis), so that small electron oscillations along
that axis could be well described as those of a 1D harmonic oscillator of a much lower eigenfrequency, in
that particular experiment with ω_z/2π ≈ 64 MHz. This frequency could be measured very accurately
(with an error of ~1 Hz) by sensitive electronics whose electric field does affect the z-motion of the
electron, but not its motion in the perpendicular plane. In an exactly uniform magnetic field, the two modes of
electron motion would be completely uncoupled. However, the experimental setup included two special
superconducting rings made of niobium (see Fig. 3a), which slightly distorted the magnetic field and
created an interaction between the modes, which might be well approximated by the Hamiltonian13


$$\hat H_{int} = \text{const}\times\left(\hat a^\dagger\hat a + \frac{1}{2}\right)\hat z^2, \tag{10.16}$$


so that the main condition (12) of a QND measurement was very closely satisfied. At the same time, the
coupling (16) ensured that a change of the Landau level number n by 1 changed the z-oscillation
eigenfrequency by ~12.4 Hz. Since this shift was substantially larger than the electronics’ noise, rare
spontaneous changes of n (due to a weak uncontrolled coupling of the electron to the environment)
could be readily measured, and moreover, continuously monitored (see Fig. 3b). The record shows
spontaneous excitations of the electron to higher Landau levels, with their subsequent relaxation, just as
described by Eqs. (7.208)-(7.210). The detailed statistical analysis of the data showed that there was virtually
no effect of the measuring instrument on these processes, at least on the scale of minutes, i.e. as many
as ~10¹³ cyclotron orbit periods.14


It is important, however, to note that no measurement, QND or not, can avoid the
uncertainty relations between incompatible variables; in the particular case described above, continuous
monitoring of the Landau state number n does not allow the simultaneous monitoring of its quantum
phase (which may be defined exactly as in the harmonic oscillator). In this context, it is natural to
wonder whether the QND measurement concept may be extended from quadratic-form variables like
energy to “usual” observables such as coordinates and momenta, whose uncertainties are bound by the
ordinary Heisenberg relation (1.35). The answer is yes, but the required methods are a bit more tricky.


For example, let us place an electrically charged particle into a uniform electric field ℰ = n_xℰ(t)
of an instrument, so that their interaction Hamiltonian is


13 Here I have simplified the real situation a bit. Actually, in that experiment, there was an electron spin’s
contribution to the interaction Hamiltonian as well, but since the high magnetic field used polarized the spins
quite reliably, their only effect was a constant shift of the frequency ω_z, which is not important for our discussion.
14 See also the conceptually similar experiments, performed by different means: G. Nogues et al., Nature 400, 239
(1999).
$$\hat H_{int} = -q\mathscr{E}(t)\hat x. \tag{10.17}$$


Such interaction may certainly pass the information on the time evolution of the coordinate x to the 
instrument. However, in this case, Eq. (12) is not satisfied — at least for the kinetic-energy part of the 
particle’s Hamiltonian; as a result, the interaction distorts its time evolution. Indeed, writing the 
Heisenberg equation (4.199) for the x-component of the momentum, we get 


$$\dot{\hat p}_x\Big|_{int} = \frac{1}{i\hbar}\left[\hat p_x, \hat H_{int}\right] = q\mathscr{E}(t). \tag{10.18}$$


On the other hand, integrating Eq. (5.139) for the coordinate operator evolution,15 we get the expression

$$\hat x(t) = \hat x(t_0) + \frac{1}{m}\int_{t_0}^{t}\hat p_x(t')\,dt', \tag{10.19}$$


which shows that the perturbations (18) of the momentum eventually find their way to the coordinate 
evolution, not allowing its unperturbed sequential measurements. 


However, for such an important particular system as a harmonic oscillator, the following trick is
possible. For this system, Eqs. (5.139) with the addition (18) may be readily combined to give a second-
order differential equation for the coordinate operator, which is fully similar to the classical equation
of motion of the system, and has a similar solution:16

$$\hat x(t) = \hat x(t)\Big|_{\mathscr{E}=0} + \frac{q}{m\omega_0}\int_{-\infty}^{t}\mathscr{E}(t')\sin\omega_0(t-t')\,dt'. \tag{10.20}$$


This formula confirms that generally, the external field ℰ(t) (in our case, the sensing field of the
measurement instrument) affects the time evolution law, of course. However, Eq. (20) shows that if the
field is applied only at moments t'_k, separated by intervals τ/2, where τ = 2π/ω_0 is the oscillation period,
its effect on the coordinate vanishes at similarly spaced observation instants t_m = t'_k + (m + 1/2)τ. This is the
idea of stroboscopic QND measurements. Of course, according to Eq. (18), even such measurement
strongly perturbs the oscillator momentum, so that even if the values x_m are measured with high
accuracy, the Heisenberg uncertainty relation is not violated.
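The cancellation can be checked with the impulse-response form of Eq. (20). This sketch assumes idealized, instantaneous pulses of the sensing field (so that ℰ(t) becomes a set of delta functions), sets m = ω_0 = 1, and absorbs the charge q into arbitrary kick strengths dp; the function name coordinate_shift is hypothetical. It tracks only the linear response of the coordinate, not the measurement itself.

```python
import math


def coordinate_shift(t, kicks, omega0=1.0, mass=1.0):
    """Shift of <x>(t) produced by instantaneous momentum kicks, from Eq. (10.20).

    A short pulse of the sensing field at time tk transfers momentum
    dp = q * (integral of E over the pulse); by Eq. (10.20) it then adds
    dp / (mass * omega0) * sin(omega0 * (t - tk)) to the coordinate for t > tk.
    `kicks` is a list of (tk, dp) pairs.
    """
    return sum(dp / (mass * omega0) * math.sin(omega0 * (t - tk))
               for tk, dp in kicks if t > tk)


omega0 = 1.0
tau = 2 * math.pi / omega0                     # oscillation period
# Hypothetical kicks every tau/2 with arbitrary strengths, starting at t'_0 = 0.
kicks = [(k * tau / 2, (-1) ** k * 0.3 + 0.1 * k) for k in range(8)]

# Stroboscopic observation instants t_m = t'_0 + (m + 1/2) * tau, all after the last kick.
strobe = [coordinate_shift((m + 0.5) * tau, kicks) for m in range(4, 8)]
# A generic instant, not on the stroboscopic grid.
off_grid = coordinate_shift(4.3 * tau, kicks)
```

Every entry of strobe vanishes up to rounding, because each kick contributes sin of an integer multiple of π there, while off_grid is of order one: the kicks do move the coordinate, just not at the chosen instants.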


A direct implementation of the stroboscopic measurements is technically complicated, but this
initial idea has opened a way to more practicable solutions. For example, it is straightforward to use the
Heisenberg equations of motion to show that if the coupling of two harmonic oscillators, with
coordinates x and X, and unperturbed frequencies ω and Ω, is modulated in time as

$$\hat H_{int} \propto \hat x\hat X\cos\omega t\,\cos\Omega t, \tag{10.21}$$


15 This simple relation is limited to 1D systems with Hamiltonians of the type (1.41), but by now the reader
certainly knows enough to understand that this discussion may be readily generalized to many other systems.

16 Note in particular that the function sin ω_0τ (with τ = t − t') under the integral, divided by ω_0, is nothing more
than the temporal Green’s function G(τ) of a loss-free harmonic oscillator; see, e.g., CM Sec. 5.1.
then the process in one of the oscillators (say, that with frequency Ω) does not affect the dynamics of one of
the quadrature components of the counterpart oscillator, defined by the relations17

$$\hat x_1 = \hat x\cos\omega t - \frac{\hat p}{m\omega}\sin\omega t, \qquad \hat x_2 = \hat x\sin\omega t + \frac{\hat p}{m\omega}\cos\omega t, \tag{10.22}$$

while this component’s motion does affect the dynamics of one of the quadrature components of the
counterpart oscillator. (For the counterpart couple of quadrature components, the information transfer
goes in the opposite direction.) This scheme has been successfully used for QND measurements.18
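The statement of footnote 17, that the quadratures x1 and x2 stay constant for unperturbed oscillations, is easy to verify. This sketch uses the classical (equivalently, Heisenberg-picture) solution of a free oscillator, with arbitrary test values of m, ω, x(0), and p(0); the function names are illustrative.

```python
import math


def free_oscillator(x0, p0, t, m, omega):
    """Unperturbed oscillator: x(t), p(t) from the initial values x0, p0."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    return x0 * c + p0 / (m * omega) * s, p0 * c - m * omega * x0 * s


def quadratures(x, p, t, m, omega):
    """Quadrature components x1, x2 defined by Eq. (10.22)."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    return x * c - p / (m * omega) * s, x * s + p / (m * omega) * c


x0, p0, m, omega = 0.4, -1.3, 2.0, 3.0          # arbitrary test values
samples = [quadratures(*free_oscillator(x0, p0, t, m, omega), t, m, omega)
           for t in (0.0, 0.37, 1.9, 12.5)]
# Each (x1, x2) pair should equal (x0, p0 / (m * omega)) up to rounding.
```

Any coupling that leaves one of these two numbers unchanged therefore leaves that quadrature free of back action, which is what the modulated coupling (21) achieves.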


Please note that the last two QND measurement examples are based on the idea of a periodic
change of a certain parameter in time, either in the short-pulse form or in the sinusoidal form. If the only
goal of a QND measurement is a sensitive measurement of a weak classical force acting on a quantum
probe system, i.e. a 1D oscillator of eigenfrequency ω_0, it may be implemented much more simply, just by
modulating one of the oscillator’s parameters with a frequency ω ≈ 2ω_0. From classical dynamics, we know
that if the depth of such modulation exceeds a certain threshold value, it results in the excitation of the
so-called degenerate parametric oscillations with frequency ω/2 ≈ ω_0, and one of two opposite phases.19
In the language of Eq. (22), the parametric excitation means exponential growth of one of the quadrature
components (with its sign depending on initial conditions), while the counterpart component is
suppressed. Close to, but below the excitation threshold, the parameter modulation boosts all
fluctuations of the almost-excited component, including its quantum-mechanical uncertainty, and
suppresses (squeezes) those of the counterpart component. The result is a squeezed state, already
discussed in Sec. 5.5 of this course (see in particular Eq. (5.143) and Fig. 5.8), which allows one to
notice the effect of an external force on the oscillator against the backdrop of a quantum uncertainty much
smaller than the standard quantum limit (5.99).
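The growth of one quadrature and the suppression of the other can be seen in a purely classical sketch. It assumes (our choice, not the text's) a lossless unit-mass oscillator whose spring constant is modulated as ω_0²(1 + ε cos 2ω_0t). Without damping, the threshold of this model is zero, so it shows the above-threshold behavior in its simplest form, and it says nothing about quantum noise. A first-order averaging estimate (also ours) gives the rates ±εω_0/4 for the rotated combinations x1 ∓ x2 of the quadratures (22); which combination grows is set by the phase of the modulation.

```python
import math


def rhs(t, x, v, omega0, eps):
    """x'' + omega0**2 * (1 + eps*cos(2*omega0*t)) * x = 0 (unit mass, no damping)."""
    return v, -omega0**2 * (1.0 + eps * math.cos(2.0 * omega0 * t)) * x


def integrate(x, v, omega0=1.0, eps=0.02, periods=100, steps_per_period=400):
    """Classical 4th-order Runge-Kutta over an integer number of periods 2*pi/omega0."""
    dt = 2.0 * math.pi / omega0 / steps_per_period
    t = 0.0
    for _ in range(periods * steps_per_period):
        k1x, k1v = rhs(t, x, v, omega0, eps)
        k2x, k2v = rhs(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v, omega0, eps)
        k3x, k3v = rhs(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v, omega0, eps)
        k4x, k4v = rhs(t + dt, x + dt * k3x, v + dt * k3v, omega0, eps)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
    return t, x, v


def quadratures(t, x, v, omega0=1.0):
    """Quadratures x1, x2 of Eq. (10.22) with m = 1, so that p = v."""
    c, s = math.cos(omega0 * t), math.sin(omega0 * t)
    return x * c - v / omega0 * s, x * s + v / omega0 * c


# Start with x1 = 1, x2 = 0 (x = 1, p = 0): equal parts of both rotated quadratures.
t, x, v = integrate(1.0, 0.0)
x1, x2 = quadratures(t, x, v)
amplified = x1 - x2   # grows roughly as exp(+eps*omega0*t/4)
squeezed = x1 + x2    # shrinks roughly as exp(-eps*omega0*t/4)
```

With the default parameters, εω_0t/4 ≈ π at the end of the run, so the two combinations, initially equal, end up differing by a factor of several hundred.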


In electrical engineering, this fact may be conveniently formulated in terms of the noise parameter
Θ_N of a linear amplifier, essentially the tool for continuous monitoring of an input “signal”, e.g., a
microwave or optical waveform.20 Namely, for “usual” (say, transistor or maser) amplifiers, which are
equally sensitive to both quadrature components of the signal, Θ_N has the minimum value ħω/2, due to
the quantum uncertainty pertinent to the quantum state of the amplifier itself (which therefore plays the
role of its “quantum noise”), a fact recognized in the early 1960s.21 On the other hand, a


17 The physical sense of these relations should be clear from Fig. 5.8: they define a system of coordinates rotating
clockwise with the angular velocity equal to ω, so that the point representing unperturbed classical oscillations
with that frequency is at rest in this rotating frame. (The “probability cloud” representing a Glauber state is also
stationary in the coordinates [x1, x2].) The reader familiar with the classical theory of oscillations may notice that the
observables x1 and x2 so defined are just the Poincaré plane coordinates (“RWA variables”); see, e.g., CM Secs.
5.3-5.6, and especially Fig. 5.9, where these coordinates are denoted as u and v.

18 The first, initially imperfect QND experiments were reported by R. Slusher et al., Phys. Rev. Lett. 55, 2409
(1985), and by other groups soon after, using nonlinear interactions of optical waves. Later, the results were
much improved; see, e.g., P. Grangier et al., Nature 396, 537 (1998), and references therein. Recently, such
experiments were extended to mechanical systems; see, e.g., F. Lecocq et al., Phys. Rev. X 5, 041037 (2015).

19 See, e.g., CM Sec. 5.5, and also Fig. 5.8 and its discussion in Sec. 5.6.

20 For a quantitative definition of the latter parameter, suitable for the quantum sensitivity range (Θ_N ~ ħω) as
well, see, e.g., I. Devyatov et al., J. Appl. Phys. 60, 1808 (1986). In the classical noise limit (Θ_N >> ħω), it
coincides with k_BT_N, where T_N is a more popular measure of electronics’ noise, called the noise temperature.

21 See, e.g., H. Haus and J. Mullen, Phys. Rev. 128, 2407 (1962).
degenerate parametric amplifier, sensitive to just one quadrature component, may have Θ_N well below
ħω/2, due to its ground-state squeezing.22


Let me note that the parameter-modulation schemes of QND measurements are not limited to
harmonic oscillators, and may be applied to other important quantum systems, notably including two-
level (i.e. spin-½-like) systems.23 Such measurements may be an important tool for further progress
in quantum computation and cryptography.24


Finally, let me mention that composite systems consisting of a quantum subsystem and a
classical subsystem, which performs a continuous weakly-perturbing measurement of the former and uses its
results to provide a specially crafted feedback to it, may have some curious properties; in particular,
they may mimic a quantum system detached from the environment.25


10.3. Hidden variables and local reality 


Now we are ready to proceed to the discussion of the last, hardest group (iii) of the questions
posed in Sec. 1, namely on the state of a quantum system just before its measurement. After a very
important but inconclusive discussion of this issue by Albert Einstein and his collaborators on one side,
and Niels Bohr on the other side, in the mid-1930s, such discussions resumed in the 1950s.26 They
have led to a key contribution by John Stewart Bell in the early 1960s, summarized as the so-called Bell’s
inequalities, and then to experimental work on better and better verification of these inequalities.
(Besides that work, the recent progress, in my humble view, has been rather marginal.)


The central question may be formulated as follows: what had been the “real” state of a quantum-
mechanical system just before a virtually perfect single-shot measurement was performed on it and
gave a certain, documented outcome? To be specific, let us focus again on the example of Stern-Gerlach
measurements of spin-½ particles, because of their conceptual simplicity.27 For a single-component
system (in this case a single spin-½), the answer to the posed question may look evident. Indeed, as we
know, if the spin is in a pure (least-uncertain) state α, i.e. its ket-vector may be expressed in a form
similar to Eq. (4),

$$|\alpha\rangle = \alpha_\uparrow|\uparrow\rangle + \alpha_\downarrow|\downarrow\rangle, \tag{10.23}$$

where, as usual, ↑ and ↓ denote the states with definite spin orientations along the z-axis, then the
probabilities of the corresponding outcomes of the z-oriented Stern-Gerlach experiment are W↑ = |α↑|²
and W↓ = |α↓|². Then it looks natural to suggest that if a particular experiment gave the outcome
corresponding to the state ↑, the spin had been in that state just before the experiment. For a classical


22 See, e.g., the spectacular experiments by B. Yurke et al., Phys. Rev. Lett. 60, 764 (1988). Note also that the 
squeezed ground states of light are now used to improve the sensitivity of interferometers in gravitational wave 
detectors — see, e.g., the recent review by R. Schnabel, Phys. Repts. 684, 1 (2017), and the later paper by F. 
Acernese et al., Phys. Rev. Lett. 123, 231108 (2019). 

23 See, e.g., D. Averin, Phys. Rev. Lett. 88, 207901 (2002). 

24 See, e.g., G. Jaeger, Quantum Information: An Overview, Springer, 2006. 

25 See, e.g., the monograph by H. Wiseman and G. Milburn, Quantum Measurement and Control, Cambridge U. 
Press (2009), more recent experiments by R. Vijay et al., Nature 490, 77 (2012), and references therein. 

26 See, e.g., J. Wheeler and W. Zurek (eds.), Quantum Theory and Measurement, Princeton U. Press, 1983. 

27 As was discussed in Sec. 1, the Stern-Gerlach-type experiments may be readily made virtually perfect, provided 
that we do not care about the evolution of the system after the single-shot measurement. 
system such an answer would be certainly correct, and the fact that the probability W↑ = |α↑|², defined for
the statistical ensemble of all experiments (regardless of their outcome), may be less than 1, would
merely reflect our ignorance about the real state of this particular system before the measurement,
which just reveals the real situation.


However, as was first argued in the famous EPR paper published in 1935 by A. Einstein, B.
Podolsky, and N. Rosen, such an answer becomes impossible in the case of an entangled quantum
system, if only one of its components is measured with an instrument. The original EPR paper discussed
thought experiments with a pair of 1D particles prepared in a quantum state in which both the sum of their
momenta and the difference of their coordinates simultaneously have definite values: p1 + p2 = 0, x1 − x2
= a.28 However, usually this discussion is recast into an equivalent Stern-Gerlach experiment shown in
Fig. 4a.29 A source emits rare pairs of spin-½ particles, propagating in opposite directions. The particle
spin states are random, but with the net spin of the pair definitely equal to zero. After the spatial
separation of the particles has become sufficiently large (see below), the spin state of each of them is
measured with a Stern-Gerlach detector, with one of them (in Fig. 4a, SG1) somewhat closer to the
particle source, so it makes the measurement first, at a time t1 < t2.


Fig. 10.4. (a) General scheme of two-particle Stern-Gerlach experiments (a particle pair, with
Stern-Gerlach detectors on both sides), and (b) the orientation of the detectors, assumed in
Wigner’s derivation of Bell’s inequality (36).


First, let both detectors be oriented along the same direction, say the z-axis. Evidently, the
probability of each detector to give any of the values S_z = ±ħ/2 is 50%. However, if the first detector has
given the result S_z = −ħ/2, then even before the second detector’s measurement, we know that the latter
will give the result S_z = +ħ/2 with 100% probability. So far, this situation still allows for a classical
interpretation, just as for the single-particle measurements: we may fancy that the second particle has a
definite spin before the measurement, and the first measurement just removes our ignorance about that
reality. In other words, the change of the probability of the outcome S_z = +ħ/2 at the second detection
from 50% to 100% is due to the statistical ensemble re-definition: the 50% probability of this detection
belongs to the ensemble of all experiments, while the 100% probability, to the sub-ensemble of
experiments with the S_z = −ħ/2 outcome of the first measurement.


However, let the source generate the spin pairs in the entangled, singlet state (8.18),

$$|s_{12}\rangle = \frac{1}{\sqrt{2}}\left(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle\right), \tag{10.24}$$


28 This is possible because the corresponding operators commute: [p̂1 + p̂2, x̂1 − x̂2] = [p̂1, x̂1] − [p̂2, x̂2] = 0.

29 Another equivalent but experimentally more convenient (and as a result, frequently used) technique is the 
degenerate parametric excitation of entangled optical photon pairs — see, e.g., the publications cited at the end of 
this section. 
that certainly satisfies the above assumptions: the probability of each value of $S_z$ of any particle is 50%, 
and the sum of both $S_z$ is definitely zero, so that if the first detector’s result is $S_z = -\hbar/2$, then the state of 
the remaining particle is $\uparrow$, with zero uncertainty. Now let us use Eqs. (4.123) to represent the same state 
(24) in a different form:


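$$|s_{12}\rangle = \frac{1}{\sqrt{2}}\left[\frac{1}{2}\left(|\rightarrow\rangle + |\leftarrow\rangle\right)\left(|\rightarrow\rangle - |\leftarrow\rangle\right) - \frac{1}{2}\left(|\rightarrow\rangle - |\leftarrow\rangle\right)\left(|\rightarrow\rangle + |\leftarrow\rangle\right)\right]. \tag{10.25}$$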


Opening the parentheses (carefully, without swapping the ket-vector order, which encodes the particle 
numbers!), we get an expression similar to Eq. (24), but now for the x-basis: 


$$|s_{12}\rangle = \frac{1}{\sqrt{2}}\left(|\leftarrow\rightarrow\rangle - |\rightarrow\leftarrow\rangle\right). \tag{10.26}$$


Hence if we use the first detector (closest to the particle source) to measure $S_x$ rather than $S_z$, then after it 
has given a certain result (say, $S_x = -\hbar/2$), we know for sure, before the second particle spin’s 
measurement, that its $S_x$ component definitely equals $+\hbar/2$.
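The change of basis from Eq. (24) to Eq. (26) can be checked numerically. The minimal Python sketch below assumes the convention $|\rightarrow\rangle = (|\uparrow\rangle + |\downarrow\rangle)/\sqrt{2}$, $|\leftarrow\rangle = (|\uparrow\rangle - |\downarrow\rangle)/\sqrt{2}$ for Eqs. (4.123), which are not reproduced here. It represents two-particle kets by Kronecker products, so the order of the factors encodes the particle numbers. The helper name `pair` is hypothetical.

```python
import numpy as np

# z-basis kets of a single spin-1/2 particle
up = np.array([1.0, 0.0])
down = np.array([0.0, 1.0])

# x-basis kets (assumed convention for Eqs. (4.123))
right = (up + down) / np.sqrt(2)
left = (up - down) / np.sqrt(2)

def pair(first, second):
    """Two-particle ket; the first factor is particle 1, the second is particle 2."""
    return np.kron(first, second)

# Eq. (24): the singlet in the z-basis
singlet_z = (pair(up, down) - pair(down, up)) / np.sqrt(2)
# Eq. (26): the same state in the x-basis
singlet_x = (pair(left, right) - pair(right, left)) / np.sqrt(2)
same_state = np.allclose(singlet_z, singlet_x)

# Suppose detector 1 finds S_x = -hbar/2, i.e. particle 1 in the "left" x-state.
# amp[i, j] is the amplitude of particle 1 in z-state i and particle 2 in z-state j.
amp = singlet_z.reshape(2, 2)
state2 = left @ amp                      # project particle 1 onto <left| (real vector)
state2 = state2 / np.linalg.norm(state2)
prob_right = abs(right @ state2) ** 2    # probability that detector 2 finds S_x = +hbar/2

print(same_state, prob_right)
```

The last step contracts the particle-1 index with the bra of the "left" state, which mimics the first detector's outcome $S_x = -\hbar/2$. The normalized remainder then coincides with $|\rightarrow\rangle$.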


So, depending on the experiment performed on the first particle, the second particle, before its 
measurement, may be in one of two states: either with a definite component $S_z$ or with a definite 
component $S_x$, in each case with zero uncertainty. Evidently, this situation cannot be interpreted in 
classical terms if the particles do not interact during the measurements. A. Einstein was deeply unhappy 
with such a situation because it did not satisfy what, in his view, was the general requirement to any 
theory, which nowadays is called the local reality. His definition of this requirement was as follows: 
“The real factual situation of system 2 is independent of what is done with system 1 that is spatially 
separated from the former”. (Here the term “spatially separated” is not defined, but from the context, it 
is clear that Einstein meant the detector separation by a superluminal interval, i.e. by distance

$$|\mathbf{r}_1 - \mathbf{r}_2| > c\,|t_1 - t_2|, \tag{10.27}$$


where the measurement time difference on the right-hand side includes the measurement duration.) In 
Einstein’s view, since quantum mechanics did not satisfy the local reality condition, it could not be 
considered a complete theory of Nature. 
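As a rough numerical illustration of condition (27), here is a minimal Python sketch. The function name `spacelike_separated` and the example numbers (a 1-microsecond measurement window, 100 m and 400 m separations) are hypothetical and are not taken from any particular experiment.

```python
C_LIGHT = 299_792_458.0   # speed of light in vacuum, m/s (exact)

def spacelike_separated(distance_m, dt_s):
    """Condition (27): |r1 - r2| > c|t1 - t2|, where dt_s should already
    include the measurement durations."""
    return distance_m > C_LIGHT * abs(dt_s)

dt = 1e-6                      # hypothetical: both measurements done within 1 microsecond
print(C_LIGHT * dt)            # minimal detector separation, about 300 m
print(spacelike_separated(400.0, dt), spacelike_separated(100.0, dt))
```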


This situation naturally raises the question of whether something (usually called hidden 
variables) may be added to the quantum-mechanical description to enable it to satisfy the local reality 
requirement. The first definite statement in this regard was John von Neumann’s “proof”³⁰ (first famous, 
then infamous :-) that such variables cannot be introduced; for a while, his work satisfied the quantum 
mechanics practitioners, who apparently did not pay much attention.³¹ A major new contribution to the 
problem was made only in the 1960s by J. Bell.³² First of all, he found an elementary (in his words, 
“foolish”) error in von Neumann’s logic, which voids his “proof”. Second, he demonstrated that 
Einstein’s local reality condition is incompatible with conclusions of quantum mechanics, which had 
been, by that time, confirmed by too many experiments to be seriously questioned.


30 In his very early book J. von Neumann, Mathematische Grundlagen der Quantenmechanik [Mathematical 
Foundations of Quantum Mechanics], Springer, 1932. (The first English translation was published only in 1955.) 
31 Perhaps it would not satisfy A. Einstein, but reportedly he did not know about von Neumann’s publication 
before signing the EPR paper.

32 See, e.g., either J. Bell, Rev. Mod. Phys. 38, 447 (1966) or J. Bell, Foundations of Physics 12, 158 (1982).
Let me describe a particular version of Bell’s result (suggested by E. Wigner), using the same 
EPR pair experiment (Fig. 4a), in which each SG detector may be oriented in any of 3 directions: a, b, or c 
(see Fig. 4b). As we already know from Chapter 4, if a fully-polarized beam of spin-½ particles is 
passed through a Stern-Gerlach apparatus forming angle $\varphi$ with the polarization axis, the probabilities of 
the two alternative outcomes of the experiment are

$$W(+) = \cos^2\frac{\varphi}{2}, \qquad W(-) = \sin^2\frac{\varphi}{2}. \tag{10.28}$$
Let us use this formula to calculate all joint probabilities of measurement outcomes, starting from the 
detectors 1 and 2 oriented, respectively, in the directions a and c. Since the angle between the negative 
direction of the a-axis and the positive direction of the c-axis is $\varphi_{a-,c+} = \pi - \varphi$ (see the dashed arrow in 
Fig. 4b), we get

$$W(a_+ \wedge c_+) = W(a_+)\,W(c_+|a_+) = W(a_+)\,W(c_+|a_-) = \frac{1}{2}\cos^2\frac{\pi - \varphi}{2} = \frac{1}{2}\sin^2\frac{\varphi}{2}, \tag{10.29}$$

where $W(x \wedge y)$ is the joint probability of both outcomes x and y, while $W(x|y)$ is the conditional 
probability of the outcome x, provided that the outcome y has happened. (The first equality in Eq. (29) is 
the well-known identity of probability theory, while the second one reflects the fact that the outcome $a_+$ at the 
first detector leaves the second particle in the state $a_-$.) Absolutely similarly,

$$W(c_+ \wedge b_+) = W(c_+)\,W(b_+|c_+) = \frac{1}{2}\sin^2\frac{\varphi}{2}, \tag{10.30}$$

$$W(a_+ \wedge b_+) = W(a_+)\,W(b_+|a_+) = \frac{1}{2}\cos^2\frac{\pi - 2\varphi}{2} = \frac{1}{2}\sin^2\varphi. \tag{10.31}$$
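Equations (29)-(31) can be cross-checked by computing the joint probabilities directly from the singlet state (24). The Python sketch below works under stated assumptions, because Fig. 4b is not reproduced:

- all three directions lie in one plane;
- c is at angle $\varphi$ from a, and b is at angle $2\varphi$ from a, which is the geometry Eqs. (29)-(31) imply;
- the "+" state along a direction at angle $\theta$ from the z-axis is $\cos(\theta/2)|\uparrow\rangle + \sin(\theta/2)|\downarrow\rangle$.

The function names are hypothetical.

```python
import numpy as np

up = np.array([1.0, 0.0])
down = np.array([0.0, 1.0])
singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)   # Eq. (24)

def spin_plus(theta):
    """'+' spin state along a direction at angle theta from z (assumed x-z plane)."""
    return np.cos(theta / 2) * up + np.sin(theta / 2) * down

def joint_plus_plus(theta1, theta2):
    """Joint probability of '+' at detector 1 (angle theta1) and '+' at detector 2 (angle theta2)."""
    amplitude = np.kron(spin_plus(theta1), spin_plus(theta2)) @ singlet
    return amplitude ** 2    # all amplitudes are real here

phi = np.pi / 5                                 # arbitrary test angle
theta_a, theta_c, theta_b = 0.0, phi, 2 * phi   # assumed geometry of Fig. 4b

W_ac = joint_plus_plus(theta_a, theta_c)   # compare with Eq. (29)
W_cb = joint_plus_plus(theta_c, theta_b)   # compare with Eq. (30)
W_ab = joint_plus_plus(theta_a, theta_b)   # compare with Eq. (31)
print(W_ac, 0.5 * np.sin(phi / 2) ** 2)
print(W_ab, 0.5 * np.sin(phi) ** 2)
```

For any two detector angles, the amplitude reduces to $\sin[(\theta_2 - \theta_1)/2]/\sqrt{2}$. The joint probability therefore depends only on the angle between the two detectors.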


Now note that for any angle $\varphi$ smaller than $\pi/2$ (as in the case shown in Fig. 4b), trigonometry gives

$$\frac{1}{2}\sin^2\varphi \geq \frac{1}{2}\sin^2\frac{\varphi}{2} + \frac{1}{2}\sin^2\frac{\varphi}{2} \equiv \sin^2\frac{\varphi}{2}. \tag{10.32}$$

(Indeed, the difference between the two sides equals $\sin^2(\varphi/2)\cos\varphi$. For example, for $\varphi \to 0$ the left-hand 
side of this inequality tends to $\varphi^2/2$, while the right-hand side tends to $\varphi^2/4$.) Hence the quantum-mechanical 
result gives, in particular,

$$W(a_+ \wedge b_+) \geq W(a_+ \wedge c_+) + W(c_+ \wedge b_+), \quad \text{for } |\varphi| < \frac{\pi}{2}. \tag{10.33}$$

On the other hand, we can get a different inequality for these probabilities without calculating 
them from any particular theory, but using the local reality assumption. For that, let us prescribe some 
probability to each of $2^3 = 8$ possible outcomes of a set of three spin measurements. (Due to the zero net 
spin of particle pairs, the probabilities of the sets shown in both columns of the table have to be equal.)


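$$\begin{array}{c|c|c}
\text{Detector 1} & \text{Detector 2} & \text{Probability} \\ \hline
a_+ \wedge b_+ \wedge c_+ & a_- \wedge b_- \wedge c_- & W_1 \\
a_+ \wedge b_+ \wedge c_- & a_- \wedge b_- \wedge c_+ & W_2 \\
a_+ \wedge b_- \wedge c_+ & a_- \wedge b_+ \wedge c_- & W_3 \\
a_+ \wedge b_- \wedge c_- & a_- \wedge b_+ \wedge c_+ & W_4 \\
a_- \wedge b_+ \wedge c_+ & a_+ \wedge b_- \wedge c_- & W_5 \\
a_- \wedge b_+ \wedge c_- & a_+ \wedge b_- \wedge c_+ & W_6 \\
a_- \wedge b_- \wedge c_+ & a_+ \wedge b_+ \wedge c_- & W_7 \\
a_- \wedge b_- \wedge c_- & a_+ \wedge b_+ \wedge c_+ & W_8
\end{array}$$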
From the local-reality point of view, these measurement options are independent, so we may 
write (see the arrows on the left of the table):

$$W(a_+ \wedge c_+) = W_2 + W_4, \quad W(c_+ \wedge b_+) = W_3 + W_7, \quad W(a_+ \wedge b_+) = W_3 + W_4. \tag{10.34}$$

On the other hand, since no probability may be negative (by its very definition), we may always write

$$W_3 + W_4 \leq (W_2 + W_4) + (W_3 + W_7). \tag{10.35}$$

Plugging into this inequality the values of these two parentheses, given by Eq. (34), we get

$$W(a_+ \wedge b_+) \leq W(a_+ \wedge c_+) + W(c_+ \wedge b_+). \tag{10.36}$$


This is Bell’s inequality, which has to be satisfied by any local-reality theory; it directly 
contradicts the quantum-mechanical result (33), opening the issue to direct experimental testing. Such 
tests were started in the late 1960s, but the first results were vulnerable to two criticisms:


(i) The detectors were not fast enough and not far enough apart to have the relation (27) satisfied. This 
is why, as a matter of principle, there was a chance that information on the first measurement outcome 
had been transferred (by some, mostly implausible, means) to particles before the second measurement: 
the so-called locality loophole.


(ii) The particle/photon detection efficiencies were too low to have sufficiently small error bars 
for both sides of the inequality: the detection loophole.


Gradually, these loopholes have been closed.³³ As expected, substantial violations of Bell’s 
inequalities (36) (or their equivalent forms) have been demonstrated, essentially rejecting any possibility to 
reconcile quantum mechanics with Einstein’s local reality requirement.
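The contradiction between Eqs. (33) and (36) can be made quantitative with a short Python sketch. It evaluates the right-hand side minus the left-hand side of Eq. (36) in two ways:

- from Eq. (34), with randomly drawn non-negative weights $W_1, \ldots, W_8$ for the table rows (the random sampling only illustrates "any local-reality assignment");
- from the quantum-mechanical Eqs. (29)-(31), scanned over $0 < \varphi < \pi/2$.

The function names are hypothetical.

```python
import numpy as np

def quantum_joint(phi):
    """Eqs. (29)-(31): W(a+ & c+), W(c+ & b+), W(a+ & b+) for the singlet."""
    w_ac = 0.5 * np.sin(phi / 2) ** 2
    w_cb = 0.5 * np.sin(phi / 2) ** 2
    w_ab = 0.5 * np.sin(phi) ** 2
    return w_ac, w_cb, w_ab

def local_joint(W):
    """Eq. (34); W[k] is the probability of the k-th table row (W[0] unused)."""
    return W[2] + W[4], W[3] + W[7], W[3] + W[4]

def bell_gap(w_ac, w_cb, w_ab):
    """Right-hand side minus left-hand side of Eq. (36); negative means violation."""
    return w_ac + w_cb - w_ab

rng = np.random.default_rng(seed=0)
local_gaps = []
for _ in range(10_000):
    # random non-negative weights for the 8 rows, summing to 1
    W = np.concatenate(([0.0], rng.dirichlet(np.ones(8))))
    local_gaps.append(bell_gap(*local_joint(W)))

phis = np.linspace(0.01, np.pi / 2 - 0.01, 200)
quantum_gaps = np.array([bell_gap(*quantum_joint(p)) for p in phis])

print("smallest local-reality gap:", min(local_gaps))
print("largest quantum gap:", quantum_gaps.max())
print("quantum gap at phi = pi/3:", bell_gap(*quantum_joint(np.pi / 3)))
```

For the weights of Eq. (34), the gap reduces to $W_2 + W_7 \geq 0$. The quantum-mechanical gap equals $-\sin^2(\varphi/2)\cos\varphi$, and its largest magnitude, 1/8, is reached at $\varphi = \pi/3$.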


10.4. Interpretations of quantum mechanics 


The fact that quantum mechanics is incompatible with local reality makes its reconciliation with 
our (classically-bred) “common sense” rather challenging. Here is a brief list of the major interpretations 
of quantum mechanics that try to provide at least a partial reconciliation of this kind.


(i) The so-called Copenhagen interpretation, to which most physicists adhere. This 
“interpretation” does not really interpret anything; it just accepts the intrinsic stochasticity of 
measurement results in quantum mechanics, and the absence of local reality, essentially saying: “Do not 
worry; this is just how it is; live with it”. I generally subscribe to this school of thought, with the 
following qualification. While the Copenhagen interpretation implies statistical ensembles (otherwise, 
how would you define the probability? See Sec. 1.3), its most frequently stated formulations³⁴ do not 
put sufficient emphasis on their role, in particular on the ensemble re-definition as the only point of 
the human observer’s involvement in a nearly-perfect measurement process (see Sec. 1 above). The most


33 Important milestones along that way were the experiments by A. Aspect et al., Phys. Rev. Lett. 49, 91 (1982) and
M. Rowe et al., Nature 409, 791 (2001). Detailed reviews of the experimental situation were given, for example, 
by M. Genovese, Phys. Repts. 413, 319 (2005) and A. Aspect, Physics 8, 123 (2015); see also the later paper by J. 
Handsteiner et al., Phys. Rev. Lett. 118, 060401 (2017). Presently, a high-fidelity demonstration of the Bell 
inequality violation has become a standard test in virtually every experiment with entangled qubits used for 
quantum encryption research — see Sec. 8.5, in particular the paper by J. Lin cited there. 

34 With certain pleasant exceptions; see, e.g., L. Ballentine, Rev. Mod. Phys. 42, 358 (1970).
famous objection to the Copenhagen interpretation belongs to A. Einstein: “God does not play dice.” 
OK, when Einstein speaks, we all should listen, but perhaps when God speaks (through experimental 
results), we have to pay even more attention. 


(ii) Non-local reality. After the dismissal of J. von Neumann’s “proof” by J. Bell, to the best of 
my knowledge, there has been no proof that hidden parameters could not be introduced, provided that 
they do not imply local reality. Of constructive approaches, perhaps the most notable contribution 
was made by David Joseph Bohm,³⁵ who developed Louis de Broglie’s initial interpretation of the 
wavefunction as a “pilot wave”, making it quantitative. In the wave-mechanics version of this concept, 
the wavefunction, governed by the Schrödinger equation, just guides a “real”, point-like classical particle 
whose coordinates serve as hidden variables. However, this concept does not satisfy the notion of local 
reality. For example, the measurement of the particle’s coordinate at a certain point $\mathbf{r}_1$ has to instantly 
change the wavefunction everywhere in space, including the points $\mathbf{r}_2$ in the superluminal range (27). 
After A. Einstein’s private criticism, D. Bohm essentially abandoned his theory.³⁶


(iii) The many-world interpretation, introduced in 1957 by Hugh Everett and popularized in the 
1960s and 1970s by Bryce DeWitt. In this interpretation, all possible measurement outcomes do happen, 
splitting the Universe into the corresponding number of “parallel multiverses”, so that from one of them, 
other multiverses and hence other outcomes cannot be observed. Let me leave to the reader an estimate 
of the rate at which the parallel multiverses have to be constantly generated (say, per second), taking 
into account that such generation should take place not only at explicit lab experiments but at every 
irreversible process, such as the fission of every atomic nucleus or the absorption/emission of every photon, 
everywhere in each multiverse, whether its result is formally recorded or not. Nicolaas van Kampen 
has called this a “mind-boggling fantasy”.³⁷ Even the main proponent of this interpretation, B. DeWitt, 
has confessed: “The idea is not easy to reconcile with common sense.” I agree.


(iv) Quantum logic. In desperation, some physicists turned philosophers have decided to dismiss 
the formal logic we are using — in science and elsewhere. From what (admittedly, very little) I have read 
about this school of thought, it seems that from its point of view, definite statements like “the SG 
detector has found the spin to be directed along the magnetic field” should not necessarily be either true 
or false. OK, if we dismiss the formal logic, I do not know how we can use any scientific theory to make 
any predictions — until the quantum logic experts tell us what to replace it with. To the best of my 
knowledge, so far they have not done that. I personally trust the opinion of J. Bell, who certainly gave 
more thought to these issues: “It is my impression that the whole vast subject of Quantum Logic has 
arisen [...] from the misuse of a word.” 


As far as I know, none of these interpretations has yet provided a suggestion on how it might 
be tested experimentally to exclude the other ones. On the positive side, there is a virtual consensus that 
quantum mechanics makes correct (if sometimes probabilistic) predictions, which do not contradict any 
reliable experimental results we are aware of. Maybe this is not that bad for a scientific theory.³⁸


35 D. Bohm, Phys. Rev. 85, 165; 180 (1952).

36 See, e.g., Sec. 22.19 of his (generally very good) textbook D. Bohm, Quantum Theory, Dover, 1979. 

37 N. van Kampen, Physica A 153, 97 (1988). By the way, I highly recommend the very reasonable summary of 
the quantum measurement issues given in this paper, though I believe that the quantitative theory of dephasing, 
discussed in Chapter 7 of this course, might give additional clarity to some of van Kampen’s statements. 

38 For the reader who is not satisfied with this “positivistic” approach, and wants to improve the situation, my 
earnest advice is to start not from square one, but from reading what other (including some very clever!) people 
thought about it. The review collection by J. Wheeler and W. Zurek, cited above, may be a good starting point. 
Chapter 1. Review of Thermodynamics 


This chapter starts with a brief discussion of the subject of statistical physics and thermodynamics, and 
the relation between these two disciplines. Then I proceed to a review of the basic notions and relations 
of thermodynamics. Most of this material is supposed to be known to the reader from their 
undergraduate studies,¹ so the discussion is rather brief. 


1.1. Introduction: Statistical physics and thermodynamics 


Statistical physics (alternatively called “statistical mechanics”) and thermodynamics are two 
different but related approaches to the same goal: an approximate description of the “internal”² 
properties of large physical systems, notably those consisting of $N \gg 1$ identical particles (or other 
components). The traditional example of such a system is a human-scale portion of gas, with the number 
N of atoms/molecules³ of the order of the Avogadro number $N_A \sim 10^{24}$ (see Sec. 4 below).


The motivation for the statistical approach to such systems is straightforward: even if the laws 
governing the dynamics of each particle and their interactions were exactly known, and we had infinite 
computing resources at our disposal, calculating the exact evolution of the system in time would be 
impossible, at least because it is completely impracticable to measure the exact initial state of each 
component — in the classical case, the initial position and velocity of each particle. The situation is 
further exacerbated by the phenomena of chaos and turbulence,⁴ and the quantum-mechanical 
uncertainty, which do not allow the exact calculation of final positions and velocities of the component 
particles even if their initial state is known with the best possible precision. As a result, in most 
situations, only statistical predictions about the behavior of such systems may be made, with the 
probability theory becoming a major tool of the mathematical arsenal. 


However, the statistical approach is not as bad as it may look. Indeed, it is almost self-evident 
that any measurable macroscopic variable characterizing a stationary system of $N \gg 1$ particles as a 
whole (think, e.g., about the stationary pressure P of the gas contained in a fixed volume V) is almost 
constant in time. Indeed, as we will see below, besides certain exotic exceptions, the relative magnitude 
of fluctuations of such a variable (either in time, or among many macroscopically similar systems) is 
of the order of $1/N^{1/2}$, and for $N \sim N_A$ is extremely small. As a result, the average values of appropriate 
macroscopic variables may characterize the state of the system quite well, satisfactorily for nearly all 
practical purposes. The calculation of relations between such average values is the only task of 
thermodynamics and the main task of statistical physics. (Fluctuations may be important, but due to 
their smallness, in most cases their analysis may be based on perturbative approaches; see Chapter 5.)
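The $1/N^{1/2}$ estimate is easy to see in a toy model. The Python sketch below assumes, purely for illustration, N statistically independent components whose energies are exponentially distributed with unit mean. It compares the relative r.m.s. spread of the total energy over an ensemble of similar systems with $1/N^{1/2}$. The sampling shortcut relies on a standard fact: a sum of N independent unit-mean exponential variables has the gamma distribution with shape N and unit scale.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N_SYSTEMS = 20_000   # size of the ensemble of macroscopically similar systems

def relative_fluctuation(n_particles):
    """Relative r.m.s. spread of the total energy of n_particles independent
    components with exponentially distributed energies (toy model)."""
    # Sum of n independent unit-mean exponentials ~ Gamma(shape=n, scale=1),
    # so the totals can be sampled directly.
    totals = rng.gamma(shape=n_particles, scale=1.0, size=N_SYSTEMS)
    return totals.std() / totals.mean()

results = {n: relative_fluctuation(n) for n in (10, 1_000, 100_000)}
for n, r in results.items():
    print(f"N = {n:>7}: simulated {r:.2e}, 1/sqrt(N) = {1 / np.sqrt(n):.2e}")

# Extrapolation to N of the order of the Avogadro number
print(f"1/sqrt(6.022e23) = {1 / np.sqrt(6.022e23):.1e}")
```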


1 For remedial reading, I can recommend, for example (in alphabetical order): C. Kittel and H. Kroemer, 
Thermal Physics, 2nd ed., W. H. Freeman (1980); F. Reif, Fundamentals of Statistical and Thermal Physics, 
Waveland (2008); D. V. Schroeder, Introduction to Thermal Physics, Addison Wesley (1999). 

2 Here “internal” is an (admittedly loose) term meaning all the physics unrelated to the motion of the system as a 
whole. The most important example of internal dynamics is the thermal motion of atoms and molecules. 

3 This is perhaps my best chance for a reverent mention of Democritus (circa 460-370 BC) — the Ancient Greek 
genius who was apparently the first one to conjecture the atomic structure of matter. 

4 See, e.g., CM Chapters 8 and 9. 
Now let us have a fast look at the typical macroscopic variables that statistical physics and 
thermodynamics should operate with. Since I have already mentioned pressure P and volume V, let me 
start with this famous pair of variables. First of all, note that volume is an extensive variable, i.e. a 
variable whose value for a system consisting of several non-interacting parts is the sum of those of its 
parts. On the other hand, pressure is an example of an intensive variable whose value is the same for 
different parts of a system, if they are in equilibrium. To understand why P and V form a natural pair of 
variables, let us consider the classical playground of thermodynamics: a portion of gas contained in a 
cylinder, closed with a movable piston of area A (Fig. 1).


Fig. 1.1. Compressing gas. 


Neglecting the friction between the walls and the piston, and assuming that it is being moved so 
slowly that the pressure P is virtually the same for all parts of the volume at any instant, the elementary 
work of the external force $\mathscr{F} = PA$, compressing the gas, at a small piston displacement $dx = -dV/A$, is

$$d\mathscr{W} = \mathscr{F}\,dx = \left(\frac{\mathscr{F}}{A}\right)(A\,dx) = -P\,dV. \tag{1.1}$$


Of course, the last expression is more general than the model shown in Fig. 1, and does not depend on 
the particular shape of the system’s surface.⁵ (Note that in the notation of Eq. (1), which will be used 
throughout the course, the elementary work done by the gas on the external system equals $-d\mathscr{W}$.)
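For a finite quasi-static compression, Eq. (1) has to be integrated along the path, $\mathscr{W} = -\int P\,dV$. The Python sketch below does this with the trapezoidal rule. The pressure law $P(V) = C/V$, with C = 1 in arbitrary units, is only an illustrative assumption, chosen because its integral, $C\ln(V_1/V_2)$, is known in closed form. The function name `work_on_gas` is hypothetical.

```python
import numpy as np

def work_on_gas(pressure, v_start, v_end, steps=10_000):
    """Total work done on the gas, W = -integral of P dV from v_start to v_end,
    for a quasi-static process, using the trapezoidal rule."""
    v = np.linspace(v_start, v_end, steps + 1)
    p = pressure(v)
    return -np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(v))

C = 1.0                        # illustrative constant, arbitrary units

def pressure(v):
    return C / v               # assumed pressure law P(V) = C/V

# Compression from V = 2 to V = 1: dV < 0, so the work done on the gas is positive
W = work_on_gas(pressure, v_start=2.0, v_end=1.0)
print(W, C * np.log(2.0))
```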


From the point of view of analytical mechanics,⁶ V and (−P) form just one of many possible canonical pairs 
of generalized coordinates $q_j$ and generalized forces $\mathscr{F}_j$, whose products $d\mathscr{W}_j = \mathscr{F}_j\,dq_j$ give independent 
contributions to the total work of the environment on the system under analysis. For example, the reader 
familiar with the basics of electrostatics knows that if the spatial distribution $\mathscr{E}(\mathbf{r})$ of an external electric 
field does not depend on the electric polarization $\mathscr{P}(\mathbf{r})$ of a dielectric medium placed into the field, its 
elementary work on the medium is

$$d\mathscr{W} = \int \mathscr{E}(\mathbf{r}) \cdot d\mathscr{P}(\mathbf{r})\, d^3r \equiv \sum_{j=1}^{3} \int \mathscr{E}_j(\mathbf{r})\, d\mathscr{P}_j(\mathbf{r})\, d^3r. \tag{1.2a}$$

The most important cases when this condition is fulfilled (and hence Eq. (2a) is valid) are, first, long 
cylindrical samples in a parallel external field (see, e.g., EM Fig. 3.13) and, second, the polarization of a 
sample (of any shape) due to that of discrete electric dipoles $\mathbf{p}_k$, whose electric interaction is negligible. 
In the latter case, Eq. (2a) may also be rewritten as the sum over the single dipoles, located at points $\mathbf{r}_k$:⁷


5 In order to prove that, it is sufficient to integrate the scalar product $d\mathscr{W} = d\vec{\mathscr{F}} \cdot d\mathbf{r}$, with $d\vec{\mathscr{F}} = -\mathbf{n}P\,d^2r$, where $d\mathbf{r}$ 
is the surface displacement vector (see, e.g., CM Sec. 7.1), and $\mathbf{n}$ is the outer normal, over the surface.

6 See, e.g., CM Chapters 2 and 10. 

7 Some of my students needed an effort to reconcile the positive signs in Eqs. (2) with the negative sign in the 
well-known relation $dU_k = -\mathscr{E}(\mathbf{r}_k) \cdot d\mathbf{p}_k$ for the potential energy of a dipole in an external electric field; see, e.g.,
$$d\mathscr{W} = \sum_k d\mathscr{W}_k, \quad \text{with } d\mathscr{W}_k = \mathscr{E}(\mathbf{r}_k) \cdot d\mathbf{p}_k. \tag{1.2b}$$


Very similarly, and at similar conditions on the external magnetic field $\mathscr{H}(\mathbf{r})$, its elementary work on 
a magnetic medium may also be represented in either of two forms:⁸

$$d\mathscr{W} = \mu_0 \int \mathscr{H}(\mathbf{r}) \cdot d\mathscr{M}(\mathbf{r})\, d^3r \equiv \mu_0 \sum_{j=1}^{3} \int \mathscr{H}_j(\mathbf{r})\, d\mathscr{M}_j(\mathbf{r})\, d^3r, \tag{1.3a}$$

$$d\mathscr{W} = \sum_k d\mathscr{W}_k, \quad \text{with } d\mathscr{W}_k = \mu_0\, \mathscr{H}(\mathbf{r}_k) \cdot d\mathbf{m}_k, \tag{1.3b}$$

where $\mathscr{M}$ and $\mathbf{m}$ are the vectors of, respectively, the medium’s magnetization and the magnetic moment 
of a single dipole. Formulas (2) and (3) show that the roles of generalized coordinates may be played by 
Cartesian components of the vectors $\mathscr{P}$ (or $\mathbf{p}$) and $\mathscr{M}$ (or $\mathbf{m}$), with the components of the electric and 
magnetic fields playing the roles of the corresponding generalized forces. This list may be extended to 
other interactions (such as gravitation, surface tension in fluids, etc.). Following tradition, I will use the 
$\{-P, V\}$ pair in almost all the formulas below, but the reader should remember that they all are valid for 
any other pair $\{\mathscr{F}_j, q_j\}$.⁹


Again, the specific relations between the variables of each pair listed above may depend on the 
statistical properties of the system under analysis, but their definitions are not based on statistics. The 
situation is very different for a very specific pair of variables, temperature T and entropy S, although 
these “sister variables” participate in many formulas of thermodynamics exactly as if they were just one 
more canonical pair $\{\mathscr{F}_j, q_j\}$. However, the very existence of these two notions is due to statistics. 
Namely, temperature T is an intensive variable that characterizes the degree of thermal “agitation” of the 
system’s components. On the contrary, the entropy S is an extensive variable that in most cases evades 
immediate perception by human senses; it is a qualitative measure of the disorder of the system, i.e. the 
degree of our ignorance about its exact microscopic state.¹⁰


The reason for the appearance of the $\{T, S\}$ pair of variables in formulas of thermodynamics and 
statistical mechanics is that the statistical approach to large systems of particles brings some 
qualitatively new results, most notably the notion of the irreversible time evolution of collective 
(macroscopic) variables describing the system. On the one hand, the irreversibility looks absolutely natural 
in such phenomena as the diffusion of an ink drop in a glass of water. In the beginning, the ink 
molecules are located in a certain small part of the system’s volume, i.e. to some extent ordered, while at 
the late stages of diffusion, the position of each molecule in the glass is essentially random. However, on 
second thought, the irreversibility is rather surprising, taking into account that the laws governing the

EM Eqs. (3.15). The resolution of this paradox is simple: each term of Eq. (2b) describes the work $d\mathscr{W}_k$ of the 
electric field on the internal degrees of freedom of the kth dipole, changing its internal energy $E_k$: $dE_k = d\mathscr{W}_k$. This 
energy change may be viewed as coming from the dipole’s potential energy in the field: $dE_k = -dU_k$.

8 Here, as in all my series, I am using the SI units; for their translation to the Gaussian units, I have to refer the 
reader to the EM part of the series. 

9 Note that in systems of discrete particles, most generalized forces, including the fields $\mathscr{E}$ and $\mathscr{H}$, differ from the 
pressure P in the sense that their work may be explicitly partitioned into single-particle components (see Eqs. 
(2b) and (3b)). This fact gives some discretion for the calculations based on thermodynamic potentials (see Sec. 4).
10 The notion of entropy was introduced into thermodynamics in 1865 by Rudolf Julius Emanuel Clausius on a 
purely phenomenological basis. In the absence of a clue about the entropy’s microscopic origin (which had to 
wait for the works by L. Boltzmann and J. Maxwell), this was an amazing intellectual achievement. 
motion of the system’s components are time-reversible, such as Newton’s laws or the basic laws of 
quantum mechanics.¹¹ Indeed, if at a late stage of the diffusion process, we reversed the velocities of all 
molecules exactly and simultaneously, the ink molecules would again gather (for a moment) into the 
original spot.¹² The problem is that getting the information necessary for the exact velocity reversal is 
not practicable. This example shows a deep connection between statistical mechanics and information 
theory.


A qualitative discussion of the reversibility-irreversibility dilemma requires a strict definition of 
the basic notion of statistical mechanics (and indeed of the probability theory), the statistical ensemble, 
and I would like to postpone it until the beginning of Chapter 2. In particular, in that chapter, we will see 
that the basic law of irreversible behavior is an increase of the entropy S in any closed system. Thus, 
statistical mechanics, without defying the “microscopic” laws governing the evolution of the system’s 
components, introduces on top of them some new “macroscopic” laws, intrinsically related to the 
evolution of information, i.e. the degree of our knowledge of the microscopic state of the system.


To conclude this brief discussion of variables, let me mention that as in all fields of physics, a 
very special role in statistical mechanics is played by the energy E. To emphasize the commitment to 
disregard the motion of the system as a whole in this subfield of physics, the E considered in 
thermodynamics is frequently called the internal energy, though just for brevity, I will skip this 
adjective in most cases. The simplest example of such E is the sum of kinetic energies of molecules in a 
dilute gas at their thermal motion, but in general, the internal energy includes not only the 
individual energies of the system’s components but also their interactions with each other. Besides a few 
“pathological” cases of very-long-range interactions, these interactions may be treated as local; in this 
case the internal energy is proportional to N, i.e. is an extensive variable. As will be shown below, other 
extensive variables with the dimension of energy are often very useful as well, including the 
(Helmholtz) free energy F, the Gibbs energy G, the enthalpy H, and the grand potential Ω. (The 
collective name for such variables is thermodynamic potentials.)


Now, we are ready for a brief discussion of the relationship between statistical physics and 
thermodynamics. While the task of statistical physics is to calculate the macroscopic variables discussed 
above¹³ for various microscopic models of the system, the main role of thermodynamics is to derive 
some general relations between the average values of the macroscopic variables (also called 
thermodynamic variables) that do not depend on specific models. Surprisingly, it is possible to 
accomplish such a feat using just a few either evident or very plausible general assumptions (sometimes 
called the laws of thermodynamics), which find their proof in statistical physics.¹⁴ Such general relations 
allow for a substantial reduction of the number of calculations we have to do in statistical physics: in 
most cases, it is sufficient to calculate from the statistics just one or two variables, and then use general


11 Because of that, the possibility of the irreversible macroscopic behavior of microscopically reversible systems 
was questioned by some serious scientists as recently as the late 19th century, notably by J. Loschmidt in 1876.
12 While quantum-mechanical effects, with their intrinsic uncertainty, may be quantitatively important in this 
example, our qualitative discussion does not depend on them. Another classical example is the chaotic motion of a 
ball on a 2D Sinai billiard — see CM Chapter 9 and in particular Fig. 9.8 and its discussion. 

13 Several other quantities, for example the heat capacity C, may be calculated as partial derivatives of the basic 
variables discussed above. Also, at certain conditions, the number of particles N in a certain system may not be 
fixed, and may also be considered as an (extensive) variable; see Sec. 5 below.

14 Admittedly, some of these proofs are based on other plausible but deeper postulates, for example the central 
statistical hypothesis (Sec. 2.2), whose best proof, to my knowledge, is just the whole body of experimental data. 
thermodynamic relations to get all other properties of interest. Thus thermodynamics, sometimes 
snubbed as a phenomenology, deserves every respect not only as a useful theoretical tool but also as a 
discipline more general than any particular statistical model. This is why the balance of this chapter is 
devoted to a brief review of thermodynamics. 


1.2. The 2nd law of thermodynamics, entropy, and temperature 


Thermodynamics accepts a phenomenological approach to the entropy S, postulating that there is 
such a unique extensive measure of the aggregate disorder, and that in a closed system (defined as a 
system completely isolated from its environment, i.e. the system with its internal energy fixed) it may 
only grow in time, reaching its constant (maximum) value at equilibrium:¹⁵

$$dS \geq 0. \tag{1.4}$$

This postulate is called the 2nd law of thermodynamics, arguably its only substantial new law.¹⁶


Rather surprisingly, this law, together with the additivity of S in composite systems of non-interacting 
parts (as an extensive variable), is sufficient for a formal definition of temperature, and a 
derivation of its basic properties that comply with our everyday notion of this key variable. Indeed, let 
us consider a closed system consisting of two fixed-volume subsystems (Fig. 2) whose internal 
relaxation is very fast in comparison with the rate of the thermal flow (i.e. the energy and entropy 
exchange) between the parts. In this case, on the latter time scale, each part is always in some 
quasi-equilibrium state, which may be described by a unique relation E(S) between its energy and entropy.¹⁷


Fig. 1.2. A composite thermodynamic system. 


Neglecting the energy of interaction between the parts (which is always possible at $N \gg 1$, and 
in the absence of long-range interactions), we may use the extensive character of the variables E and S 
to write

$$E = E_1(S_1) + E_2(S_2), \qquad S = S_1 + S_2, \tag{1.5}$$

for the full energy and entropy of the system. Now let us use them to calculate the following derivative:


15 Implicitly, this statement also postulates the existence, in a closed system, of thermodynamic equilibrium, an 
asymptotically reached state in which all macroscopic variables, including entropy, remain constant. Sometimes 
this postulate is called the 0th law of thermodynamics. 

16 Two initial formulations of this law, later proved equivalent, were put forward independently by Lord Kelvin 
(born William Thomson) in 1851 and by Rudolf Clausius in 1854. 

17 Here we strongly depend on a very important (and possibly the least intuitive) aspect of the 2nd law, namely that 
the entropy is a unique measure of disorder. 
$$ \frac{dS}{dE_1} = \frac{dS_1}{dE_1} + \frac{dS_2}{dE_1} = \frac{dS_1}{dE_1} + \frac{dS_2}{dE_2}\frac{dE_2}{dE_1} = \frac{dS_1}{dE_1} + \frac{dS_2}{dE_2}\frac{d(E - E_1)}{dE_1}. \qquad (1.6) $$


Since the total energy E of the closed system is fixed and hence independent of its re-distribution
between the subsystems, we have to take dE/dE1 = 0, and Eq. (6) yields
$$ \frac{dS}{dE_1} = \frac{dS_1}{dE_1} - \frac{dS_2}{dE_2}. \qquad (1.7) $$
According to the 2nd law of thermodynamics, when the two parts have reached thermodynamic
equilibrium, the total entropy S reaches its maximum, so that dS/dE1 = 0, and Eq. (7) yields
$$ \frac{dS_1}{dE_1} = \frac{dS_2}{dE_2}. \qquad (1.8) $$
This equality shows that if a thermodynamic system may be partitioned into weakly interacting
macroscopic parts, their derivatives dS/dE should be equal in equilibrium. The reciprocal of this
derivative is called temperature. Taking into account that our analysis pertains to the situation (Fig. 2)
when both volumes V1,2 are fixed, we may write this definition as


$$ \left(\frac{\partial E}{\partial S}\right)_V \equiv T, \qquad (1.9) $$


the subscript V meaning that volume is kept constant at the differentiation. (Such notation is common 
and very useful in thermodynamics, with its broad range of variables.) 
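The maximization argument of Eqs. (1.5)-(1.8) can be checked numerically. The Python sketch below rests on assumptions not made in the text: each part is given a hypothetical entropy S_i = C_i ln E_i + const (which corresponds to a constant heat capacity C_i, so that E_i = C_i T_i), and the names C1, C2, and E_TOTAL are arbitrary illustrative choices. Bisection on dS/dE1 locates the entropy maximum, where the two temperatures defined by Eq. (1.9) should coincide.

```python
import math

# Hypothetical parameters, chosen only for illustration
C1, C2 = 1.0, 3.0   # heat capacities of the two parts
E_TOTAL = 8.0       # fixed total energy of the closed system, Eq. (1.5)


def total_entropy(e1):
    """S = S1(E1) + S2(E - E1), with assumed S_i = C_i*ln(E_i) + const."""
    return C1 * math.log(e1) + C2 * math.log(E_TOTAL - e1)


def ds_de1(e1):
    """Eq. (1.7): dS/dE1 = dS1/dE1 - dS2/dE2."""
    return C1 / e1 - C2 / (E_TOTAL - e1)


# dS/dE1 decreases monotonically, so bisection finds the entropy maximum
lo, hi = 1e-9, E_TOTAL - 1e-9
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if ds_de1(mid) > 0:
        lo = mid
    else:
        hi = mid

e1 = 0.5 * (lo + hi)
t1 = e1 / C1               # T1 = (dS1/dE1)^-1, Eq. (1.9)
t2 = (E_TOTAL - e1) / C2   # T2 = (dS2/dE2)^-1
print(round(e1, 6), round(t1, 6), round(t2, 6))  # 2.0 2.0 2.0
```

The energy splits as E1/C1 = E2/C2, i.e. exactly so that the temperatures are equal, in agreement with Eq. (1.8).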


Note that according to Eq. (9), if the temperature is measured in energy units18 (as I will do in
this course for the brevity of notation), then S is dimensionless. The transfer to the SI or Gaussian units,
i.e. to the temperature T_K measured in kelvins (not “Kelvins”, and not “degrees Kelvin”, please!), is
given by the relation T = k_B T_K, where the Boltzmann constant k_B ≈ 1.38×10⁻²³ J/K = 1.38×10⁻¹⁶ erg/K.19
In those units, the entropy becomes dimensional: S_K = k_B S.
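The conversion between the two temperature conventions is simple arithmetic; a minimal Python sketch follows. It uses the exact SI values k_B = 1.380649×10⁻²³ J/K and e = 1.602176634×10⁻¹⁹ C (the latter only to express energies in electron-volts); the function names are illustrative, not part of any library.

```python
K_B = 1.380649e-23                   # Boltzmann constant, J/K (exact in SI)
ELEMENTARY_CHARGE = 1.602176634e-19  # C, used only to express energies in eV


def temperature_in_energy_units(t_kelvin):
    """T = k_B * T_K, i.e. temperature expressed in joules."""
    return K_B * t_kelvin


def entropy_in_si(s_dimensionless):
    """S_K = k_B * S, i.e. entropy expressed in J/K."""
    return K_B * s_dimensionless


t_room = temperature_in_energy_units(300.0)
print(f"T = {t_room:.3e} J = {t_room / ELEMENTARY_CHARGE * 1e3:.1f} meV")
# prints: T = 4.142e-21 J = 25.9 meV
```

Room temperature thus corresponds to an energy of roughly 26 meV, a handy number to keep in mind.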


The definition of temperature, given by Eq. (9), is of course in sharp contrast with the popular
notion of T as a measure of the average energy of one particle. However, as we will repeatedly see
below, in many cases these two notions may be reconciled, with Eq. (9) being more general. In
particular, the so-defined T is in semi-quantitative agreement with our everyday notion of temperature:20


(i) according to Eq. (9), the temperature is an intensive variable (since both E and S are
extensive), i.e., in a system of similar particles, it is independent of the particle number N;


18 Here I have to mention a traditional unit of thermal energy, the calorie, still being used in some applied fields.
In the most common modern definition (as the so-called thermochemical calorie), it equals exactly 4.184 J.

19 For the more exact values of this and other constants, see appendix CA: Selected Physical Constants. Note that
both T and T_K define the natural absolute (also called “thermodynamic”) scale of temperature, vanishing at the
same point, in contrast to such artificial scales as the degrees Celsius (“centigrades”), defined as T_C = T_K −
273.15, or the degrees Fahrenheit: T_F = (9/5)T_C + 32.

20 Historically, such a notion was initially qualitative, just something distinguishing “hot” from “cold”. After
the invention of thermometers (the first one by Galileo Galilei in 1592), mostly based on the thermal expansion of
fluids, this notion became quantitative but not very deep, being understood as something “that the
thermometer measures”, until its physical sense as a measure of thermal motion’s intensity was revealed in the
19th century.
(ii) temperatures of all parts of a system are equal at equilibrium; see Eq. (8);
(iii) in a closed system whose parts are not in equilibrium, thermal energy (heat) always flows
from a warmer part (with higher T) to the colder part.


In order to prove the last property, let us revisit the closed, composite system shown in Fig. 2, 
and consider another derivative: 


$$ \frac{dS}{dt} = \frac{dS_1}{dt} + \frac{dS_2}{dt} = \frac{dS_1}{dE_1}\frac{dE_1}{dt} + \frac{dS_2}{dE_2}\frac{dE_2}{dt}. \qquad (1.10) $$


If the internal state of each part is very close to equilibrium (as was assumed from the very beginning) at
each moment of time, we can use Eq. (9) to replace the derivatives dS1,2/dE1,2 with 1/T1,2, getting
$$ \frac{dS}{dt} = \frac{1}{T_1}\frac{dE_1}{dt} + \frac{1}{T_2}\frac{dE_2}{dt}. \qquad (1.11) $$


Since in a closed system E = E1 + E2 = const, these time derivatives are related as dE2/dt = −dE1/dt, and
Eq. (11) yields
$$ \frac{dS}{dt} = \left(\frac{1}{T_1} - \frac{1}{T_2}\right)\frac{dE_1}{dt}. \qquad (1.12) $$
But according to the 2nd law of thermodynamics, this derivative cannot be negative: dS/dt ≥ 0. Hence,
$$ \left(\frac{1}{T_1} - \frac{1}{T_2}\right)\frac{dE_1}{dt} \geq 0. \qquad (1.13) $$


For example, if T1 > T2, then dE1/dt ≤ 0, i.e. the warmer part gives energy to its colder counterpart.


Note also that such a heat exchange, at fixed volumes V1,2 and T1 ≠ T2, increases the total
system’s entropy without performing any “useful” mechanical work; see Eq. (1).
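Eqs. (1.10)-(1.13) can be illustrated by a small numerical sketch in Python. It assumes, beyond the text, that both parts have constant heat capacities C1 and C2 (so that dE_i = C_i dT_i and S_i = C_i ln T_i + const) and that energy flows according to a hypothetical linear law dE1/dt = κ(T2 − T1); the coefficient kappa, the step size, and the function name are illustrative only. The total entropy should grow monotonically while the temperatures equalize.

```python
import math


def simulate_heat_exchange(c1, c2, t1, t2, kappa=0.1, dt=0.01, steps=20_000):
    """Euler integration of dE1/dt = kappa*(T2 - T1), dE2/dt = -dE1/dt.

    Assumes constant heat capacities, so that dE_i = C_i dT_i and
    S_i = C_i ln T_i + const (temperatures in energy units).
    Returns the final temperatures and the history of S = S1 + S2.
    """
    entropies = []
    for _ in range(steps):
        entropies.append(c1 * math.log(t1) + c2 * math.log(t2))
        de1 = kappa * (t2 - t1) * dt   # energy received by part 1
        t1 += de1 / c1
        t2 -= de1 / c2                 # E1 + E2 stays constant
    return t1, t2, entropies


t1, t2, s_hist = simulate_heat_exchange(c1=1.0, c2=2.0, t1=3.0, t2=1.0)
print(round(t1, 4), round(t2, 4))  # both approach (C1*T1 + C2*T2)/(C1 + C2) = 5/3
print(s_hist[-1] > s_hist[0])      # True: the total entropy has grown
```

The sign of the energy flow is built into the assumed law, so the sketch only illustrates, and does not prove, Eq. (1.13); what it does show is that the entropy bookkeeping of Eq. (1.12) is then never violated.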


1.3. The 1st and 3rd laws of thermodynamics, and heat capacity


Now let us consider a thermally insulated system whose volume V may be changed by force — 
see, for example, Fig. 1. Such a system is different from the fully closed one, because its energy E may 
be changed by the external force’s work — see Eq. (1): 


$$ dE = dW = -P\,dV. \qquad (1.14) $$


Let the volume change be so slow (dV/dt → 0) that the system is virtually at equilibrium at any instant.
Such a slow process is called reversible, and in the particular case of a thermally insulated system, it is
also called adiabatic. If the pressure P (or any generalized external force ℱj) is deterministic, i.e. is a
predetermined function of time, independent of the state of the system under analysis, it may be 
considered as coming from a fully ordered system, i.e. the one having zero entropy, with the total system 
(the system under our analysis plus the source of the force) completely closed. Since the entropy of the 
total closed system should stay constant (see the second of Eqs. (5) above), S of the system under 
analysis should stay constant on its own. Thus we arrive at a very important conclusion: at an adiabatic 
process, the entropy of a system cannot change. (Sometimes such a process is called isentropic.) This 
means that we may use Eq. (14) to write 
$$ P = -\left(\frac{\partial E}{\partial V}\right)_S. \qquad (1.15) $$


Now let us consider a more general thermodynamic system that may also exchange thermal 
energy (“heat”) with its environment (Fig. 3). 


Fig. 1.3. An example of the thermodynamic process involving both the mechanical work by the environment and the heat exchange with it.


For such a system, our previous conclusion about the entropy’s constancy is not valid, so that S,
in equilibrium, may be a function of not only the system’s energy E, but also of its volume: S = S(E, V).
Let us consider this relation resolved for energy: E = E(S, V), and write the general mathematical 
expression for the full differential of E as a function of these two independent arguments: 


$$ dE = \left(\frac{\partial E}{\partial S}\right)_V dS + \left(\frac{\partial E}{\partial V}\right)_S dV. \qquad (1.16) $$


This formula, based on the stationary relation E = E(S, V), is evidently valid not only in equilibrium but
also for all very slow, reversible21 processes. Now, using Eqs. (9) and (15), we may rewrite Eq. (16) as


$$ dE = T\,dS - P\,dV. \qquad (1.17) $$


According to Eq. (1), the second term on the right-hand side of this equation is just the work of the 
external force, so that due to the conservation of energy,22 the first term has to be equal to the heat dQ
transferred from the environment to the system (see Fig. 3): 


$$ dE = dQ + dW, \qquad (1.18) $$
$$ dQ = T\,dS. \qquad (1.19) $$


The last relation, divided by T and then integrated along an arbitrary (but reversible!) process,
$$ S = \int \frac{dQ}{T} + \text{const}, \qquad (1.20) $$


is sometimes used as an alternative definition of entropy S, provided that temperature is defined not by
Eq. (9), but in some independent way. It is useful to recognize that entropy (like energy) may be defined
only up to an arbitrary constant, which does not affect any other thermodynamic observables. The common
convention is to take
$$ S \to 0 \quad \text{at} \quad T \to 0. \qquad (1.21) $$

21 Let me emphasize again that any adiabatic process is reversible, but not vice versa.

22 Such conservation, expressed by Eqs. (18)-(19), is commonly called the 1st law of thermodynamics. While it (in
contrast with the 2nd law) does not present any new law of nature, and in particular was already used de facto to
write the first of Eqs. (5) and also Eq. (14), such a grand name was absolutely justified in the 19th century, when
the mechanical nature of the internal energy (the thermal motion) was not at all clear. In this context, the names of
three scientists, Benjamin Thompson (who gave, in 1799, convincing arguments that heat cannot be anything but
a form of particle motion), Julius Robert von Mayer (who conjectured the conservation of the sum of the thermal
and macroscopic mechanical energies in 1841), and James Prescott Joule (who proved this conservation
experimentally two years later), have to be reverently mentioned.


This condition is sometimes called the “3rd law of thermodynamics”, but it is important to realize that
this is just a convention rather than a real law.23 Indeed, the convention corresponds well to the notion of
full order at T = 0 in some systems (e.g., separate atoms or perfect crystals), but creates ambiguity
for other systems, e.g., amorphous solids (like the usual glasses) that may remain highly disordered for
“astronomic” times, even at T → 0.


Now let us discuss the notion of heat capacity that, by definition, is the ratio dQ/dT, where dQ is
the amount of heat that should be given to a system to raise its temperature by a small amount dT.24
(This notion is important because the heat capacity may be most readily measured experimentally.) The
heat capacity depends, naturally, on whether the heat dQ goes only into an increase of the internal
energy dE of the system (as it does if its volume V is constant), or also into the mechanical work (−dW)
performed by the system at its expansion, as happens, for example, if the pressure P, rather than the
volume V, is fixed (the so-called isobaric process; see Fig. 4).


Fig. 1.4. The simplest example of 
the isobaric process. 


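Hence we should distinguish at least two heat capacities,25
$$ C_V \equiv \left(\frac{dQ}{dT}\right)_V, \qquad (1.22) $$
$$ C_P \equiv \left(\frac{dQ}{dT}\right)_P, \qquad (1.23) $$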


23 Actually, the 3rd law (also called the Nernst theorem) as postulated by Walther Hermann Nernst in 1912 was
different, and really meaningful: “It is impossible for any procedure to lead to the isotherm T = 0 in a finite
number of steps.” I will discuss this theorem at the end of Sec. 6.

24 By this definition, the full heat capacity of a system is an extensive variable, but it may be used to form such 
intensive variables as the heat capacity per particle, called the specific heat capacity, or just the specific heat. 
(Please note that the last terms are rather ambiguous: they are used for the heat capacity per unit mass, per unit 
volume, and sometimes even for the heat capacity of the system as a whole, so that some caution is in order.)

25 Dividing both sides of Eq. (19) by dT, we get the general relation dQ/dT = T dS/dT, which may be used to
rewrite the definitions (22) and (23) in the following forms:
$$ C_V = T\left(\frac{\partial S}{\partial T}\right)_V, \qquad C_P = T\left(\frac{\partial S}{\partial T}\right)_P, $$
more convenient for some applications.
and expect that for all “normal” (mechanically stable) systems, C_P ≥ C_V. The difference between C_P and
C_V is rather minor for most liquids and solids, but may be very substantial for gases; see Sec. 4.
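Combining Eq. (1.20) with the entropy form of the heat-capacity definitions in footnote 25, the entropy change of a body heated reversibly at fixed volume is S(T2) − S(T1) = ∫ C_V dT/T. The minimal Python sketch below assumes, for illustration only, a temperature-independent C_V, and compares a midpoint-rule integral with the closed form C_V ln(T2/T1). The arbitrary constant of Eq. (1.20) drops out of such differences.

```python
import math


def entropy_change(heat_capacity, t_initial, t_final, n_steps=10_000):
    """Midpoint-rule estimate of S(t_final) - S(t_initial) = integral of dQ/T.

    Assumes a reversible path with dQ = C dT and a temperature-independent C.
    """
    dt = (t_final - t_initial) / n_steps
    total = 0.0
    for k in range(n_steps):
        t_mid = t_initial + (k + 0.5) * dt
        total += heat_capacity * dt / t_mid   # dQ / T for one small step
    return total


numeric = entropy_change(heat_capacity=1.5, t_initial=1.0, t_final=2.0)
exact = 1.5 * math.log(2.0)   # C_V * ln(T2/T1)
print(abs(numeric - exact) < 1e-8)  # True
```

For a temperature-dependent heat capacity, only the line computing dQ/T would have to change.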


1.4. Thermodynamic potentials 


Since for a fixed volume, dW = −PdV = 0, and Eq. (18) yields dQ = dE, we may rewrite Eq.
(22) in another convenient form
$$ C_V = \left(\frac{\partial E}{\partial T}\right)_V, \qquad (1.24) $$
so that to calculate C_V from a certain statistical-physics model, we only need to calculate E as a function
of temperature and volume. If we want to obtain a similarly convenient expression for C_P, the best way
is to introduce a new notion of so-called thermodynamic potentials, whose introduction and effective
use is perhaps one of the most impressive techniques of thermodynamics. For that, let us combine Eqs.
(1) and (18) to write the 1st law of thermodynamics in its most common form


$$ dQ = dE + P\,dV. \qquad (1.25) $$
At an isobaric process (Fig. 4), i.e. at P = const, this expression is reduced to
$$ (dQ)_P = dE_P + d(PV)_P = d(E + PV)_P. \qquad (1.26) $$


Thus, if we introduce a new function with the dimensionality of energy:26
$$ H \equiv E + PV, \qquad (1.27) $$
called enthalpy (or, sometimes, the “heat function” or the “heat contents”),27 we may rewrite Eq. (23) as
$$ C_P = \left(\frac{\partial H}{\partial T}\right)_P. \qquad (1.28) $$
Comparing Eqs. (28) and (24), we see that for the heat capacity, the enthalpy H plays the same role at
fixed pressure as the internal energy E plays at fixed volume.


Now let us explore properties of the enthalpy at an arbitrary reversible process, i.e. lifting the
restriction P = const, but keeping the definition (27). Differentiating this equality, we get


$$ dH = dE + P\,dV + V\,dP. \qquad (1.29) $$


Plugging into this relation Eq. (17) for dE, we see that the terms ±PdV cancel, yielding a very simple
expression
$$ dH = T\,dS + V\,dP, \qquad (1.30) $$


whose right-hand side differs from Eq. (17) only by the swap of P and V in the second term, with the
simultaneous change of its sign. Formula (30) shows that if H has been found (say, experimentally
measured or calculated for a certain microscopic model) as a function of the entropy S and the pressure
P of a system, we can calculate its temperature T and volume V by simple partial differentiation:
$$ T = \left(\frac{\partial H}{\partial S}\right)_P, \qquad V = \left(\frac{\partial H}{\partial P}\right)_S. \qquad (1.31) $$
The comparison of the first of these relations with Eq. (9) shows that not only for the heat capacity but
for temperature as well, enthalpy plays the same role at fixed pressure as played by internal energy at
fixed volume.

26 From the point of view of mathematics, Eq. (27) is a particular case of the so-called Legendre transformations.
27 This function (as well as the Gibbs free energy G, see below) had been introduced in 1875 by J. Gibbs, though
the term “enthalpy” was coined (much later) by H. Kamerlingh Onnes.


This success immediately raises the question of whether we could develop this idea further,
by defining other useful thermodynamic potentials, i.e. variables with the dimensionality of energy that
would have similar properties. First of all, we would like a potential that would enable a similar swap of T and S in its
full differential, in comparison with Eq. (30). We already know that an adiabatic process is the
reversible process with fixed entropy, inviting analysis of a reversible process with fixed temperature.
Such an isothermal process may be implemented, for example, by placing the system under
consideration into thermal contact with a much larger system (called either the heat bath, or “heat
reservoir”, or “thermostat”) that remains in thermodynamic equilibrium at all times; see Fig. 5.


Fig. 1.5. The simplest example of the isothermal process.


Due to its very large size, the heat bath temperature T does not depend on what is being done
with our system, and if the change is being done sufficiently slowly (i.e. reversibly), this
temperature is also the temperature of our system; see Eq. (8) and its discussion. Let us calculate the
elementary mechanical work dW (1) at such a reversible isothermal process. According to the general
Eq. (18), dW = dE − dQ. Plugging dQ from Eq. (19) into this equality, for T = const we get


$$ (dW)_T = dE - T\,dS = d(E - TS) \equiv dF, \qquad (1.32) $$


where the following combination,
$$ F \equiv E - TS, \qquad (1.33) $$
is called the free energy (or the “Helmholtz free energy”, or just the “Helmholtz energy”28). Just as we
have done for the enthalpy, let us establish properties of this new thermodynamic potential for an
arbitrarily small, reversible (now not necessarily isothermal!) variation of variables, while keeping the
definition (33). Differentiating this relation and then using Eq. (17), we get
$$ dF = -S\,dT - P\,dV. \qquad (1.34) $$

28 It was named after Hermann von Helmholtz (1821-1894). The last of the listed terms for F was recommended
by the most recent (1988) IUPAC decision, but I will use the first term, which prevails in physics literature. The
adjective “free” stems from Eq. (32): F may be interpreted as the part of the internal energy that is
“free” to be transferred to mechanical work at the (most common) reversible, isothermal process.
Thus, if we know the function F(T, V), we can calculate S and P by simple differentiation:


$$ S = -\left(\frac{\partial F}{\partial T}\right)_V, \qquad P = -\left(\frac{\partial F}{\partial V}\right)_T. \qquad (1.35) $$


Now we may notice that the system of all partial derivatives may be made full and symmetric if
we introduce one more thermodynamic potential. Indeed, we have already seen that each of the three
already introduced thermodynamic potentials (E, H, and F) has an especially simple full differential if it
is considered as a function of its two canonical arguments: one of the “thermal variables” (either S or T)
and one of the “mechanical variables” (either P or V):29


$$ E = E(S, V), \qquad H = H(S, P), \qquad F = F(T, V). \qquad (1.36) $$


In this list of pairs of four arguments, only one pair is missing: {T, P}. The thermodynamic function of
this pair, which gives the two remaining variables (S and V) by simple differentiation, is called the 
Gibbs energy (or sometimes the “Gibbs free energy”): G = G(T, P). The way to define it in a symmetric 
way is evident from the so-called circular diagram shown in Fig. 6. 


Fig. 1.6. (a) The circular diagram and (b) an example of its use for variable calculation. The thermodynamic potentials are typeset in red, each flanked with its two canonical arguments.


In this diagram, each thermodynamic potential is placed between its two canonical arguments;
see Eq. (36). The left two arrows in Fig. 6a show the way the potentials H and F have been obtained
from energy E; see Eqs. (27) and (33). This diagram hints that G has to be defined as shown by either
of the two right arrows on that panel, i.e. as


$$ G \equiv F + PV = H - TS = E - TS + PV. \qquad (1.37) $$


In order to verify this idea, let us calculate the full differential of this new thermodynamic potential, 
using, e.g., the first form of Eq. (37) together with Eq. (34): 


$$ dG = dF + d(PV) = (-S\,dT - P\,dV) + (P\,dV + V\,dP) \equiv -S\,dT + V\,dP, \qquad (1.38) $$


so that if we know the function G(T, P), we can indeed readily calculate both entropy and volume:
$$ S = -\left(\frac{\partial G}{\partial T}\right)_P, \qquad V = \left(\frac{\partial G}{\partial P}\right)_T. \qquad (1.39) $$


29 Note the similarity of this situation with that in analytical mechanics (see, e.g., CM Chapters 2 and 10): the
Lagrangian function may be used to derive the equations of motion if it is expressed as a function of generalized 
coordinates and their velocities, while to use the Hamiltonian function in a similar way, it has to be expressed as a 
function of the generalized coordinates and the corresponding momenta. 
The circular diagram completed in this way is a good mnemonic tool for describing Eqs. (9),
(15), (31), (35), and (39), which express thermodynamic variables as partial derivatives of the
thermodynamic potentials. Indeed, the variable in any corner of the diagram may be found as a partial
derivative of either of the two potentials that are not its immediate neighbors, over the variable in the opposite
corner. For example, the green line in Fig. 6b corresponds to the second of Eqs. (39), while the blue line
corresponds to the second of Eqs. (31). In this procedure, all the derivatives giving the variables of the upper row (S
and P) have to be taken with negative signs, while those giving the variables of the bottom row (V and
T) are taken with positive signs.30
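The same rules also imply the Maxwell relations mentioned in footnote 30. For example, Eqs. (1.35) give (∂S/∂V)_T = −∂²F/∂V∂T = (∂P/∂T)_V. The Python sketch below checks this numerically with central finite differences; free_energy is a hypothetical smooth F(T, V) with no physical meaning, chosen only so that both sides are easy to verify by hand (analytically, both equal 1/V − 0.2V, i.e. 0.1 at V = 2).

```python
import math

STEP = 1e-4  # finite-difference step


def free_energy(t, v):
    """Hypothetical smooth F(T, V), used only to test the identity."""
    return -t * math.log(v) - 0.5 * t**2 + 0.1 * t * v**2


def entropy(t, v):
    """S = -(dF/dT)_V, first of Eqs. (1.35), by a central difference."""
    return -(free_energy(t + STEP, v) - free_energy(t - STEP, v)) / (2 * STEP)


def pressure(t, v):
    """P = -(dF/dV)_T, second of Eqs. (1.35), by a central difference."""
    return -(free_energy(t, v + STEP) - free_energy(t, v - STEP)) / (2 * STEP)


t0, v0 = 1.2, 2.0
ds_dv = (entropy(t0, v0 + STEP) - entropy(t0, v0 - STEP)) / (2 * STEP)
dp_dt = (pressure(t0 + STEP, v0) - pressure(t0 - STEP, v0)) / (2 * STEP)
print(round(ds_dv, 6), round(dp_dt, 6))  # both close to 1/V - 0.2V = 0.1
```

Any other smooth F(T, V) would pass the same test, which is the point: the relation follows from the equality of mixed derivatives, not from a particular model.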


Now I have to justify the collective name “thermodynamic potentials” used for E, H, F, and G.
For that, let us consider an irreversible process, for example, a direct thermal contact of two bodies with
different initial temperatures. As was discussed in Sec. 2, at such a process, the entropy may grow even
without the external heat flow: dS > 0 at dQ = 0; see Eq. (12). This means that at a more general
process with dQ ≠ 0, the entropy may grow faster than predicted by Eq. (19), which has been derived
for a reversible process, so that


$$ dS \geq \frac{dQ}{T}, \qquad (1.40) $$

with the equality approached in the reversible limit. Plugging Eq. (40) into Eq. (18) (which, being just
the energy conservation law, remains valid for irreversible processes as well), we get


$$ dE \leq T\,dS - P\,dV. \qquad (1.41) $$


We can use this relation to have a look at the behavior of other thermodynamic potentials in 
irreversible situations, still keeping their definitions given by Eqs. (27), (33), and (37). Let us start from 
the (very common) case when both the temperature T and the volume V of a system are kept constant. If 
the process is reversible, then according to Eq. (34), the full time derivative of the free energy F would 
equal zero. Eq. (41) says that at an irreversible process, this is not necessarily so: if dT = dV = 0, then
$$ \frac{dF}{dt} = \frac{d}{dt}(E - TS) = \frac{dE}{dt} - T\frac{dS}{dt} \leq 0. \qquad (1.42) $$
Hence, in the general (irreversible) situation, F can only decrease, but not increase in time. This means
that F eventually approaches its minimum value F(T, V), given by the equations of reversible
thermodynamics. To re-phrase this important conclusion, in the case T = const, V = const, the free
energy F, i.e. the difference E − TS, plays the role of the potential energy in the classical mechanics of
dissipative processes: its minimum corresponds to the (in the case of F, thermodynamic) equilibrium of
the system. This is one of the key results of thermodynamics, and I invite the reader to give it some
thought. One possible handwaving interpretation of this fact is that the heat bath with fixed T > 0,
i.e. with a substantial thermal agitation of its components, “wants” to impose thermal disorder in the
system immersed into it, by “rewarding” it with lower F for any increase of disorder.
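The decrease of F in Eq. (1.42) can be illustrated with a body relaxing inside a heat bath. The Python sketch below assumes, beyond the text, a constant heat capacity c (so that E = cT_sys and S = c ln T_sys up to constants) and a hypothetical linear heat-flow law dE/dt = κ(T_bath − T_sys). The quantity tracked is E − T_bath S, with T in Eq. (1.42) understood as the fixed bath temperature. Its minimum, reached at T_sys = T_bath, equals c T_bath (1 − ln T_bath), i.e. 1 for the default parameters.

```python
import math


def relax_in_bath(c=1.0, t_bath=1.0, t_sys=3.0, kappa=0.5, dt=0.001, steps=60_000):
    """Track E - T_bath*S while a body at fixed volume relaxes in a heat bath.

    Assumptions not made in the text: constant heat capacity c, so that
    E = c*T_sys and S = c*ln(T_sys) up to constants, and a linear law
    dE/dt = kappa*(T_bath - T_sys) for the heat inflow.
    """
    history = []
    for _ in range(steps):
        energy = c * t_sys
        entropy = c * math.log(t_sys)
        history.append(energy - t_bath * entropy)   # F with T = T_bath
        t_sys += kappa * (t_bath - t_sys) * dt / c  # dE = c dT_sys
    return t_sys, history


t_final, f_hist = relax_in_bath()
print(round(t_final, 6))  # 1.0, the bath temperature
# F never increases (a tiny tolerance allows for float rounding)
print(all(b <= a + 1e-12 for a, b in zip(f_hist, f_hist[1:])))
```

Starting the body colder than the bath (say, t_sys=0.2) gives the same picture: F still decreases monotonically toward the same minimum.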


30 There is also a wealth of other relations between thermodynamic variables that may be represented as second
derivatives of the thermodynamic potentials, including four Maxwell relations such as (∂S/∂V)_T = (∂P/∂T)_V, etc.
(They may be readily recovered from the well-known property of a function of two independent arguments, say,
f(x, y): ∂(∂f/∂x)/∂y = ∂(∂f/∂y)/∂x.) In this chapter, I will list only the thermodynamic relations that will be used
later in the course; a more complete list may be found, e.g., in Sec. 16 of the book by L. Landau and E. Lifshitz,
Statistical Physics, Part 1, 3rd ed., Pergamon, 1980 (and its later re-printings).
Repeating the calculation for a different case, T = const, P = const, it is easy to see that in this
case the same role is played by the Gibbs energy:
$$ \frac{dG}{dt} = \frac{d}{dt}(E - TS + PV) = \frac{dE}{dt} - T\frac{dS}{dt} + P\frac{dV}{dt} \leq \left(T\frac{dS}{dt} - P\frac{dV}{dt}\right) - T\frac{dS}{dt} + P\frac{dV}{dt} = 0, \qquad (1.43) $$


so that the thermal equilibrium now corresponds to the minimum of G rather than F. 


For the two remaining thermodynamic potentials, E and H, calculations similar to Eqs. (42)
and (43) make less sense because that would require keeping S = const (with V = const for E, and P =
const for H) for an irreversible process, but it is usually hard to prevent the entropy from growing if
initially it had been lower than its equilibrium value, at least on a long-term basis.31 Thus the circular
diagram is not so symmetric after all: G and F are somewhat more useful for most practical calculations
than E and H.


Note that the difference G − F = PV between the two “more useful” potentials has very little to
do with thermodynamics at all, because this difference exists (although is not much advertised) in
classical mechanics as well.32 Indeed, the difference may be generalized as G − F = −ℱq, where q is a
generalized coordinate, and ℱ is the corresponding generalized force. The minimum of F corresponds
to the equilibrium of an autonomous system (with ℱ = 0), while the equilibrium position of the same
system under the action of external force ℱ is given by the minimum of G. Thus the external force
“wants” the system to yield to its effect, “rewarding” it with lower G.
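The spring example of footnote 32 makes this concrete: minimizing U = κx²/2 alone gives x = 0, while minimizing U_G = U − ℱx gives x0 = ℱ/κ. The Python sketch below finds both minima with golden-section search (a standard derivative-free minimizer, chosen here only for simplicity), using hypothetical values κ = 4 and ℱ = 2.

```python
def golden_section_min(func, a, b, tol=1e-10):
    """Minimize a unimodal function on [a, b] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2          # 1/golden ratio, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) < func(d):
            b = d                          # minimum lies in [a, d]
        else:
            a = c                          # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


KAPPA = 4.0   # spring constant (hypothetical value)
FORCE = 2.0   # external force (hypothetical value)


def u_internal(x):
    return 0.5 * KAPPA * x**2         # U = kappa x^2 / 2


def u_gibbs(x):
    return u_internal(x) - FORCE * x  # U_G = U - F x


x_u = golden_section_min(u_internal, -10.0, 10.0)
x_g = golden_section_min(u_gibbs, -10.0, 10.0)
print(f"{x_g:.6f}")  # 0.500000, i.e. FORCE / KAPPA
```

The shift of the minimum from x = 0 to x0 = ℱ/κ is exactly the mechanical counterpart of the switch from F to G as the quantity to be minimized.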


Moreover, the difference between F and G becomes a bit ambiguous (approach-dependent) when
the product ℱq may be partitioned into single-particle components, just as it is done in Eqs. (2b) and
(3b) for the electric and magnetic fields. Here the applied field may be taken into account on the
microscopic level, by including its effect directly into the energy εk of each particle. In this case, the field
contributes to the total internal energy E directly, and hence the thermodynamic equilibrium (at T =
const) is described as the minimum of F. (We may say that in this case F = G, unless a difference
between these thermodynamic potentials is created by the actual mechanical pressure P.) However, in
some cases, typically for condensed systems with their strong interparticle interactions, the easier (and
sometimes the only practicable33) way to account for the field is on the macroscopic level, taking G
= F − ℱq. In this case, the same equilibrium state is described as the minimum of G. (Several examples
of this dichotomy will be given later in this course.) Whatever the choice, one should be careful not to take the
same field effect into account twice.


31 There are a few practicable systems, notably including the so-called adiabatic magnetic refrigerators (to be
discussed in Chapter 2), where the unintentional growth of S is so slow that the condition S = const may be 
closely approached. 

32 It is convenient to describe it as the difference between the “usual” (internal) potential energy U of the system
and its “Gibbs potential energy” U_G; see CM Sec. 1.4. For the readers who skipped that discussion: my pet
example is the usual elastic spring with U = κx²/2, under the effect of an external force ℱ, whose equilibrium
position (x0 = ℱ/κ) evidently corresponds to the minimum of U_G = U − ℱx, rather than just U.

33 An example of such an extreme situation is the case when an external magnetic field ℋ is applied to a
superconductor in its so-called intermediate state, in which the sample partitions into domains of the “normal”
phase with B = μ0ℋ, and the superconducting phase with B = 0. In this case, the field is effectively applied to
the interfaces between the domains, very similarly to the mechanical pressure applied to a gas portion via a piston;
see Fig. 1 again.
One more important conceptual question I would like to discuss here is why statistical
physics usually pursues the calculation of thermodynamic potentials, rather than just of a relation between P, V,
and T. (Such a relation is called the equation of state of the system.) Let us explore this issue on the
particular but important example of an ideal classical gas in thermodynamic equilibrium, for which the 
equation of state should be well known to the reader from undergraduate physics:34 


$$ PV = NT, \qquad (1.44) $$


where N is the number of particles in volume V. (In Chapter 3, we will derive Eq. (44) from statistics.)
Let us try to use it for the calculation of all thermodynamic potentials, and all other thermodynamic
variables discussed above. We may start, for example, from the calculation of the free energy F. Indeed,
integrating the second of Eqs. (35) with the pressure calculated from Eq. (44), P = NT/V, we get

$$ F = -\int P\,dV\Big|_{T=\text{const}} = -NT\int\frac{dV}{V} = -NT\int\frac{d(V/N)}{V/N} = -NT\ln\frac{V}{N} + Nf(T), \qquad (1.45) $$
where V has been divided by N in both instances just to represent F as a manifestly extensive variable, in
this uniform system proportional to N. The integration “constant” f(T) is some function of temperature,
which cannot be recovered from the equation of state. This function affects all other thermodynamic
potentials, and the entropy as well. Indeed, using the first of Eqs. (35) together with Eq. (45), we get


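$$ S = -\left(\frac{\partial F}{\partial T}\right)_V = N\left[\ln\frac{V}{N} - \frac{df}{dT}\right], \qquad (1.46) $$
and now may combine Eqs. (33) with (46) to calculate the (internal) energy of the gas,35
$$ E = F + TS = \left[-NT\ln\frac{V}{N} + Nf(T)\right] + TN\left[\ln\frac{V}{N} - \frac{df}{dT}\right] = N\left[f(T) - T\frac{df}{dT}\right], \qquad (1.47) $$
then use Eqs. (27), (44) and (47) to calculate its enthalpy,
$$ H = E + PV = E + NT = N\left[f(T) - T\frac{df}{dT} + T\right], \qquad (1.48) $$
and, finally, plug Eqs. (44) and (45) into Eq. (37) to calculate the Gibbs energy
$$ G = F + PV = N\left[-T\ln\frac{V}{N} + f(T) + T\right]. \qquad (1.49) $$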
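Eqs. (1.45)-(1.49) can be reproduced numerically once some f(T) is chosen. Because the equation of state does not determine f(T), the Python sketch below simply assumes a hypothetical form f(T) = −cT ln T with c = 1.5 (and N = 1), computes S and P from F by central finite differences, and then builds E, H, and G from their definitions. With this choice, Eq. (1.47) gives E = cNT, independent of V, in agreement with footnote 35.

```python
import math

N = 1.0   # number of particles
C = 1.5   # hypothetical constant in the assumed f(T) = -C*T*ln(T)


def f(t):
    """Hypothetical f(T); the equation of state alone cannot fix it."""
    return -C * t * math.log(t)


def free_energy(t, v):
    return -N * t * math.log(v / N) + N * f(t)   # Eq. (1.45)


def potentials(t, v, h=1e-5):
    """Derive S, P, E, H, G from F by central finite differences."""
    s = -(free_energy(t + h, v) - free_energy(t - h, v)) / (2 * h)  # Eq. (1.46)
    p = -(free_energy(t, v + h) - free_energy(t, v - h)) / (2 * h)  # gives NT/V
    e = free_energy(t, v) + t * s                                   # Eq. (1.47)
    return {"S": s, "P": p, "E": e,
            "H": e + p * v,                                         # Eq. (1.48)
            "G": free_energy(t, v) + p * v}                         # Eq. (1.49)


a = potentials(2.0, 1.0)
b = potentials(2.0, 5.0)   # same temperature, five times larger volume
print(round(a["E"], 6), round(b["E"], 6))  # 3.0 3.0: E = C*N*T, independent of V
```

Replacing f(t) with any other smooth function leaves the pressure, and hence the equation of state, unchanged, while E, H, and G change.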


34 The long history of the gradual discovery of this relation includes the very early (circa 1662) work by R. Boyle
and R. Towneley, followed by contributions from H. Power, E. Mariotte, J. Charles, J. Dalton, and J. Gay-Lussac.
It was fully formulated by Benoit Paul Emile Clapeyron in 1834, in the form PV = nRT_K, where n is the number of
moles in the gas sample, and R ≈ 8.31 J/(mole·K) is the so-called gas constant. This form is equivalent to Eq. (44),
taking into account that R = k_B N_A, where N_A = 6.022 140 76×10²³ mole⁻¹ is the Avogadro number, i.e. the number
of molecules per mole. (By the mole’s definition, N_A is just the reciprocal mass, in grams, of the 1/12th part of the
¹²C atom, which is close to the mass of one proton or neutron; see Appendix CA: Selected Physical Constants.)
Historically, this equation of state was the main argument for the introduction of the absolute temperature T,
because only with it does the equation acquire the spectacularly simple form (44).

35 Note that Eq. (47), in particular, describes a very important property of the ideal classical gas: its energy 
depends only on temperature (and the number of particles), but not on volume or pressure. 
One might ask whether the function f(T) is physically significant, or whether it is something like the
inconsequential, arbitrary constant, like the one that may always be added to the potential energy in
non-relativistic mechanics. In order to address this issue, let us calculate, from Eqs. (24) and (28), both 
heat capacities, which are evidently measurable quantities: 


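$$ C_V = \left(\frac{\partial E}{\partial T}\right)_V = -NT\frac{d^2f}{dT^2}, \qquad (1.50) $$
$$ C_P = \left(\frac{\partial H}{\partial T}\right)_P = -NT\frac{d^2f}{dT^2} + N = C_V + N. \qquad (1.51) $$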

We see that the function f(T), or at least its second derivative, is measurable.36 (In Chapter 3, we
will calculate this function for two simple “microscopic” models of the ideal classical gas.) The meaning
of this function is evident from the physical picture of the ideal gas: the pressure P exerted on the walls
of the containing volume is produced only by the translational motion of the gas molecules, while their
internal energy E (and hence other thermodynamic potentials) may be also contributed by the internal
dynamics of the molecules: their rotations, vibrations, etc. Thus, the equation of state does not give us
the full thermodynamic description of a system, while the thermodynamic potentials do.
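This conclusion can be made concrete with a short Python sketch. Two hypothetical choices, f(T) = −cT ln T with c = 1.5 and c = 2.5 (the larger c loosely mimicking extra internal degrees of freedom), are plugged into Eq. (1.45). They give identical pressures but different C_V, computed from Eq. (1.50) by a central second difference; C_P then follows from Eq. (1.51). The functional form of f and all names are illustrative assumptions.

```python
import math

N = 1.0  # number of particles


def make_f(c):
    """Hypothetical f(T) = -c*T*ln(T); a larger c mimics internal motion."""
    return lambda t: -c * t * math.log(t)


def pressure(f, t, v, h=1e-5):
    """P = -(dF/dV)_T for F from Eq. (1.45), by a central difference."""
    def free_energy(vol):
        return -N * t * math.log(vol / N) + N * f(t)
    return -(free_energy(v + h) - free_energy(v - h)) / (2 * h)


def heat_capacity_v(f, t, h=1e-4):
    """Eq. (1.50): C_V = -N*T*f''(T), by a central second difference."""
    second = (f(t + h) - 2 * f(t) + f(t - h)) / h**2
    return -N * t * second


for c in (1.5, 2.5):
    f = make_f(c)
    p = pressure(f, t=2.0, v=4.0)
    c_v = heat_capacity_v(f, t=2.0)
    print(f"c={c}: P={p:.4f}, C_V={c_v:.4f}, C_P={c_v + N:.4f}")  # Eq. (1.51)
```

Both runs print P=0.5000 (that is, NT/V), while C_V comes out as 1.5000 and 2.5000, respectively: the equation of state cannot tell the two models apart, but a calorimeter can.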


1.5. Systems with a variable number of particles 


Now we have to consider one more important case: when the number N of particles in a system 
is not rigidly fixed, but may change as a result of a thermodynamic process. A typical example of such a 
system is a gas sample separated from the environment by a penetrable partition — see Fig. 7.37 


Fig. 1.7. An example of a system with a variable number of particles.


Let us analyze this situation for the simplest case when all the particles are similar. (In Sec. 4.1,
this analysis will be extended to systems with particles of several sorts.) In this case, we may consider N
as an independent thermodynamic variable whose variation may change the energy E of the system, so
that (for a slow, reversible process) Eq. (17) should now be generalized as


$$ dE = T\,dS - P\,dV + \mu\,dN, \qquad (1.52) $$


36 Note, however, that the difference C_P − C_V = N is independent of f(T). (If the temperature is measured in
kelvins, this relation takes a more familiar form C_P − C_V = nR.) It is straightforward (and hence left for the reader’s
exercise) to show that the difference C_P − C_V of any system is fully determined by its equation of state.

37 Another important example is a gas in contact with the open-surface liquid of similar molecules.
where μ is a new function of state, called the chemical potential.³⁸ Keeping the definitions of other
thermodynamic potentials, given by Eqs. (27), (33), and (37), intact, we see that the expressions for their
differentials should be generalized as


$$dH = T\,dS + V\,dP + \mu\,dN, \qquad (1.53a)$$
$$dF = -S\,dT - P\,dV + \mu\,dN, \qquad (1.53b)$$
$$dG = -S\,dT + V\,dP + \mu\,dN, \qquad (1.53c)$$


so that the chemical potential may be calculated as any of the following partial derivatives:³⁹


$$\mu = \left(\frac{\partial E}{\partial N}\right)_{S,V} = \left(\frac{\partial H}{\partial N}\right)_{S,P} = \left(\frac{\partial F}{\partial N}\right)_{T,V} = \left(\frac{\partial G}{\partial N}\right)_{T,P}. \qquad (1.54)$$

Despite the formal similarity of all Eqs. (54), one of them is more consequential than the others.
Indeed, the Gibbs energy G is the only thermodynamic potential that is a function of two intensive
parameters, T and P. However, like all thermodynamic potentials, G has to be extensive, so that in a
system of similar particles it has to be proportional to N:


$$G = Ng, \qquad (1.55)$$
where g is some function of T and P. Plugging this expression into the last of Eqs. (54), we see that μ
equals exactly this function, so that


$$\mu = \frac{G}{N}, \qquad (1.56)$$


i.e. the chemical potential is just the Gibbs energy per particle. 
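The relations (1.54) and (1.56) can be checked numerically once some explicit potential is available. The Python sketch below assumes, purely for illustration, the ideal-classical-gas free energy $F = NT[\ln(N/n_Q V) - 1]$ (derived later in the course, not in this section), with temperature in energy units and a hypothetical constant $n_Q$ (`N_Q` in the code). It obtains P and μ from F by finite differences and compares μ with $G/N = (F + PV)/N$.

```python
import math

# Hypothetical constant of the assumed model, temperature in energy units (k_B = 1)
N_Q = 1.0e3

def free_energy(T, V, N):
    """Assumed ideal-gas free energy F = N*T*[ln(N/(N_Q*V)) - 1]."""
    return N * T * (math.log(N / (N_Q * V)) - 1.0)

def pressure(T, V, N, h=1e-6):
    """P = -(dF/dV) at fixed T and N, by a central finite difference."""
    return -(free_energy(T, V + h, N) - free_energy(T, V - h, N)) / (2 * h)

def chemical_potential(T, V, N, h=1e-3):
    """mu = (dF/dN) at fixed T and V, one of the forms listed in Eq. (1.54)."""
    return (free_energy(T, V, N + h) - free_energy(T, V, N - h)) / (2 * h)

T, V, N = 2.0, 5.0, 1.0e4
P = pressure(T, V, N)
G = free_energy(T, V, N) + P * V      # G = F + PV
mu = chemical_potential(T, V, N)

print(f"P   = {P:.6f}   (N*T/V = {N * T / V:.6f})")
print(f"mu  = {mu:.9f}")
print(f"G/N = {G / N:.9f}")
```

For this model, both printed values of the chemical potential should agree, illustrating that μ is the Gibbs energy per particle.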


In order to demonstrate how vital the notion of chemical potential may be, let us consider the 
situation (parallel to that shown in Fig. 2) when a system consists of two parts, with equal pressure and 
temperature, that can exchange particles at a relatively slow rate (much slower than the speed of the 
internal relaxation of each part). Then we can write two equations similar to Eqs. (5): 


$$N = N_1 + N_2, \qquad G = G_1 + G_2, \qquad (1.57)$$
where N = const, and Eq. (56) may be used to describe each component of G:
$$G = \mu_1 N_1 + \mu_2 N_2. \qquad (1.58)$$
Plugging $N_2$, expressed from the first of Eqs. (57) as $N_2 = N - N_1$, into Eq. (58), we see that


$$\frac{dG}{dN_1} = \mu_1 - \mu_2, \qquad (1.59)$$

so that the minimum of G is achieved at $\mu_1 = \mu_2$. Hence, under the conditions of fixed temperature and
pressure, i.e. when G is the appropriate thermodynamic potential, the chemical potentials of the system
parts should be equal: the so-called chemical equilibrium.
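The minimization argument can be illustrated with a short sketch. To keep it simple, it swaps the fixed-pressure setting of the text for two compartments with fixed volumes $V_1$ and $V_2$ at the same temperature, where the free energy F plays the role of G. It assumes the ideal-gas free energy $F = NT[\ln(N/n_Q V) - 1]$ with a hypothetical constant $n_Q$ (`N_Q`), scans all integer splits $N_1 + N_2 = N$, and checks that the split minimizing the total potential has $\mu_1 = \mu_2$.

```python
import math

N_Q = 1.0e3  # hypothetical constant of the assumed ideal-gas model (k_B = 1)

def free_energy(T, V, N):
    """Assumed ideal-gas free energy F = N*T*[ln(N/(N_Q*V)) - 1]."""
    return N * T * (math.log(N / (N_Q * V)) - 1.0)

def chemical_potential(T, V, N):
    """Analytic mu = (dF/dN) at fixed T and V for the assumed model."""
    return T * math.log(N / (N_Q * V))

T, V1, V2, N_total = 1.0, 1.0, 3.0, 4000

def total_free_energy(N1):
    # Part 1 holds N1 particles, part 2 the rest (the analog of Eq. (1.57))
    return free_energy(T, V1, N1) + free_energy(T, V2, N_total - N1)

# Scan all integer splits and keep the one with the lowest total free energy
best_N1 = min(range(1, N_total), key=total_free_energy)
mu1 = chemical_potential(T, V1, best_N1)
mu2 = chemical_potential(T, V2, N_total - best_N1)
print(best_N1, mu1, mu2)
```

In this model, equal chemical potentials mean equal particle densities, so the optimum puts one quarter of the particles into the compartment with one quarter of the total volume.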


38 This name, of historical origin, is misleading: as evident from Eq. (52), μ has a clear physical sense of the
average energy cost of adding one more particle to a system of N ≫ 1 particles.

39 Note that, strictly speaking, Eqs. (9), (15), (31), (35), and (39) should now be generalized by adding another
lower index, N, to the corresponding derivatives; I will just imply this.
Finally, later in the course, we will also run into several cases when the volume V of a system, its
temperature T, and the chemical potential μ are all fixed. (The last condition may be readily
implemented by allowing the system of our interest to exchange particles with an environment so large
that its μ stays constant.) The thermodynamic potential appropriate for this case may be obtained by
subtracting the product μN from the free energy F, resulting in the so-called grand thermodynamic
(or “Landau”) potential:


$$\Omega \equiv F - \mu N = F - G = -PV. \qquad (1.60)$$


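Indeed, for a reversible process, the full differential of this potential is


$$d\Omega = dF - d(\mu N) = (-S\,dT - P\,dV + \mu\,dN) - (\mu\,dN + N\,d\mu) = -S\,dT - P\,dV - N\,d\mu, \qquad (1.61)$$


so that if Ω has been calculated as a function of T, V, and μ, other thermodynamic variables may be
found as

$$S = -\left(\frac{\partial \Omega}{\partial T}\right)_{V,\mu}, \qquad P = -\left(\frac{\partial \Omega}{\partial V}\right)_{T,\mu}, \qquad N = -\left(\frac{\partial \Omega}{\partial \mu}\right)_{T,V}. \qquad (1.62)$$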
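Under an assumed model (an ideal classical gas with $F = NT[\ln(N/n_Q V) - 1]$ and a hypothetical constant $n_Q$, neither of which is derived in this section), the sketch below builds Ω(T, V, μ) from Eq. (1.60), then checks that Ω = −PV and that the last two of Eqs. (1.62), with derivatives taken by finite differences, return the correct N and P.

```python
import math

N_Q = 1.0e3  # hypothetical constant of the assumed ideal-gas model (k_B = 1)

def particle_number(T, V, mu):
    """Inverts mu = T*ln(N/(N_Q*V)) of the assumed model to get N(T, V, mu)."""
    return N_Q * V * math.exp(mu / T)

def grand_potential(T, V, mu):
    """Omega = F - mu*N, Eq. (1.60), using the assumed F = N*T*[ln(N/(N_Q*V)) - 1]."""
    N = particle_number(T, V, mu)
    F = N * T * (math.log(N / (N_Q * V)) - 1.0)
    return F - mu * N

T, V, mu, h = 1.5, 2.0, 0.3, 1e-6
N = particle_number(T, V, mu)
Omega = grand_potential(T, V, mu)

# Eq. (1.62): N = -(dOmega/dmu) at fixed T, V and P = -(dOmega/dV) at fixed T, mu
N_from_Omega = -(grand_potential(T, V, mu + h) - grand_potential(T, V, mu - h)) / (2 * h)
P_from_Omega = -(grand_potential(T, V + h, mu) - grand_potential(T, V - h, mu)) / (2 * h)

print(f"Omega = {Omega:.6f}, -PV = -N*T = {-N * T:.6f}")
print(f"N = {N:.6f}, -dOmega/dmu = {N_from_Omega:.6f}")
print(f"P = N*T/V = {N * T / V:.6f}, -dOmega/dV = {P_from_Omega:.6f}")
```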


Now, acting exactly as we have done for other potentials, it is straightforward to prove that an
irreversible process with fixed T, V, and μ provides dΩ/dt < 0, so that the system’s equilibrium indeed
corresponds to the minimum of the grand potential Ω. We will repeatedly use this fact in this course.


1.6. Thermal machines 


In order to complete this brief review of thermodynamics, I cannot completely skip the topic of
thermal machines, not because it will be used much in this course, but mostly because of its practical
and historical significance.⁴⁰ Figure 8a shows the generic scheme of a thermal machine that may perform
mechanical work on its environment (in our notation, equal to $-\mathscr{W}$) during each cycle of the
expansion/compression of some “working gas”, by transferring different amounts of heat from a
high-temperature heat bath ($Q_H$) and to the low-temperature bath ($Q_L$).


Fig. 1.8. (a) The simplest implementation of a thermal machine, and (b) the graphic representation of the
mechanical work it performs. On panel (b), the solid arrow indicates the heat engine cycle direction,
while the dashed arrow indicates the refrigerator cycle direction.


40 The whole field of thermodynamics was spurred by the famous 1824 work by Nicolas Léonard Sadi Carnot, in
which he, in particular, gave an alternative, indirect form of the 2nd law of thermodynamics (see below).
One relation between the three amounts $Q_H$, $Q_L$, and $\mathscr{W}$ is immediately given by energy
conservation (i.e. by the 1st law of thermodynamics):


$$Q_H - Q_L = -\mathscr{W}. \qquad (1.63)$$


From Eq. (1), the mechanical work during the cycle may be calculated as
$$-\mathscr{W} = \oint P\,dV, \qquad (1.64)$$


and hence represented by the area encircled by the state-representing point on the [P, V] plane (see
Fig. 8b). Note that the sign of this circular integral depends on the direction of the point’s rotation; in
particular, the work ($-\mathscr{W}$) done by the working gas is positive for clockwise rotation (pertinent to heat
engines) and negative in the opposite case (implemented in refrigerators and heat pumps; see below).
Evidently, the work depends on the exact form of the cycle, which in turn may depend not only on $T_H$
and $T_L$, but also on the working gas’ properties.


An exception to this rule is the famous Carnot cycle, consisting of two isothermal and two
adiabatic processes (all reversible!). In its heat engine form, the cycle may start, for example, from an
isothermal expansion of the working gas in contact with the hot bath (i.e. at $T = T_H$). It is followed by its
additional adiabatic expansion (with the gas disconnected from both heat baths) until its
temperature drops to $T_L$. Then an isothermal compression of the gas is performed in contact with the
cold bath (at $T = T_L$), followed by its additional adiabatic compression to raise T to $T_H$ again, after which
the cycle is repeated again and again. Note that during this cycle the working gas is never in contact
with both heat baths simultaneously, thus avoiding the irreversible heat transfer between them. The
cycle’s shape on the [V, P] plane (Fig. 9a) depends on the exact properties of the working gas and may
be rather complicated. However, since the system’s entropy is constant in any reversible adiabatic process, the
Carnot cycle’s shape on the [S, T] plane is always rectangular (see Fig. 9b).


Fig. 1.9. Representation of the
Carnot cycle: (a) on the [V, P]
plane (schematically), and (b) on
the [S, T] plane. The meaning of
the arrows is the same as in Fig. 8.


Since during each isothermal stage the working gas is brought into thermal contact only with the
corresponding heat bath, i.e. its temperature is constant, the relation (19), dQ = T dS, may be
immediately integrated to yield


$$Q_H = T_H(S_2 - S_1), \qquad Q_L = T_L(S_2 - S_1). \qquad (1.65)$$
Hence the ratio of these two heat flows is completely determined by their temperature ratio:
$$\frac{Q_H}{Q_L} = \frac{T_H}{T_L}, \qquad (1.66)$$
regardless of the working gas properties. Formulas (63) and (66) are sufficient to find the ratio of the
work ($-\mathscr{W}$) to any of $Q_H$ and $Q_L$. For example, the main figure-of-merit of a thermal machine used as a
heat engine ($Q_H > 0$, $Q_L > 0$, $-\mathscr{W} = |\mathscr{W}| > 0$) is its efficiency


$$\eta \equiv \frac{-\mathscr{W}}{Q_H} = \frac{Q_H - Q_L}{Q_H} = 1 - \frac{Q_L}{Q_H} \leq 1. \qquad (1.67)$$


For the Carnot cycle, Eq. (66) turns this expression into the famous result⁴¹


$$\eta_{\text{Carnot}} = 1 - \frac{T_L}{T_H}, \qquad (1.68)$$


which shows that at a given $T_L$ (typically the ambient temperature, ~300 K), the efficiency may be
increased, ultimately to 1, by raising the temperature $T_H$ of the heat source.⁴²
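Eq. (1.68) can be cross-checked by integrating $\oint P\,dV$ around an explicit Carnot cycle. The sketch below assumes an ideal classical working gas, $P = NT/V$, with a temperature-independent heat capacity $c_V$ per particle, so that its adiabats obey $T^{c_V}V = \text{const}$ (cf. Problem 1.5); the volumes and temperatures are arbitrary illustrative values. Each of the four legs is integrated with the trapezoidal rule, and the result is compared with $1 - T_L/T_H$.

```python
def integrate(f, a, b, n=20_000):
    """Composite trapezoidal rule for the integral of f from a to b (b < a is allowed)."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

def isotherm_pressure(N, T):
    # Assumed ideal classical gas: P = N*T/V, temperature in energy units
    return lambda V: N * T / V

def adiabat_pressure(N, c_V, V_ref, T_ref):
    # With constant c_V per particle, T**c_V * V = const along an adiabat,
    # so T(V) = T_ref * (V_ref/V)**(1/c_V), and again P = N*T/V
    return lambda V: N * T_ref * (V_ref / V) ** (1.0 / c_V) / V

def carnot_efficiency_numeric(T_H, T_L, V1, V2, N=1.0, c_V=1.5):
    V3 = V2 * (T_H / T_L) ** c_V   # end of the adiabatic expansion (T drops to T_L)
    V4 = V1 * (T_H / T_L) ** c_V   # start of the adiabatic compression (T rises to T_H)
    legs = [
        integrate(isotherm_pressure(N, T_H), V1, V2),           # isothermal expansion
        integrate(adiabat_pressure(N, c_V, V2, T_H), V2, V3),   # adiabatic expansion
        integrate(isotherm_pressure(N, T_L), V3, V4),           # isothermal compression
        integrate(adiabat_pressure(N, c_V, V1, T_H), V4, V1),   # adiabatic compression
    ]
    work_by_gas = sum(legs)   # the cyclic integral of P dV, i.e. -W in Eq. (1.64)
    Q_H = legs[0]             # E is constant on an isotherm, so heat taken in = work done
    return work_by_gas / Q_H

eta = carnot_efficiency_numeric(T_H=500.0, T_L=300.0, V1=1.0, V2=2.0)
print(f"numerical eta = {eta:.8f}, 1 - T_L/T_H = {1 - 300.0 / 500.0:.8f}")
```

Changing `c_V`, `V1`, or `V2` changes the shape of the cycle on the [V, P] plane but, within numerical accuracy, not the efficiency.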


The unique nature of the Carnot cycle (see Fig. 9b again) makes its efficiency (68) the upper
limit for any heat engine.⁴³ Indeed, in this cycle, the transfer of heat between any heat bath and the
working gas is performed reversibly, when their temperatures are equal. (If this is not so, some heat
might flow from the hotter to the colder body without performing any work.) In particular, Eq. (68) shows that
$\eta_{\max} = 0$ at $T_H = T_L$, i.e., no heat engine can perform mechanical work in the absence of temperature gradients.⁴⁴


On the other hand, if the cycle is reversed (see the dashed arrows in Figs. 8 and 9), the same
thermal machine may serve as a refrigerator, providing heat removal from the low-temperature bath
($Q_L < 0$) at the cost of consuming external mechanical work: $\mathscr{W} > 0$. This reversal does not affect the basic
relation (63), which now may be used to calculate the relevant figure-of-merit, called the cooling
coefficient of performance, $(\text{COP})_{\text{cooling}}$:


$$(\text{COP})_{\text{cooling}} \equiv \frac{|Q_L|}{\mathscr{W}} = \frac{Q_L}{Q_H - Q_L}. \qquad (1.69)$$


Notice that this coefficient may be above unity; in particular, for the Carnot cycle we may use Eq. (66)
(which is also unaffected by the cycle reversal) to get


$$(\text{COP}_{\text{cooling}})_{\text{Carnot}} = \frac{T_L}{T_H - T_L}, \qquad (1.70)$$


41 Curiously, S. Carnot derived his key result still believing that heat is some specific fluid (“caloric”), whose flow 
is driven by the temperature difference, rather than just a form of particle motion. 

42 Semi-quantitatively, such a trend is also valid for other, less efficient but more practicable heat engine cycles;
see Problems 13-16. This trend is the leading reason why internal combustion engines, with $T_H$ of the order of
1,500 K, are more efficient than steam engines, with the difference $T_H - T_L$ of at most a few hundred K.

43 In some alternative axiomatic systems of thermodynamics, this fact is postulated and serves the role of the 2nd
law. This is why it is under persistent (predominantly theoretical) attack by suggestions of more efficient heat
engines, recently mostly of quantum systems using sophisticated protocols such as the so-called shortcut-to-adiabaticity;
see, e.g., the recent paper by O. Abah and E. Lutz, Europhysics Lett. 118, 40005 (2017), and
references therein. To the best of my knowledge, reliable analyses of all the suggestions put forward so far have
confirmed that the Carnot efficiency (68) is the highest possible even in quantum systems.

44 Such a hypothetical heat engine, which would violate the 2nd law of thermodynamics, is called the “perpetual
motion machine of the 2nd kind”, in contrast to any (also hypothetical) “perpetual motion machine of the 1st
kind” that would violate the 1st law, i.e., energy conservation.
so that this value is larger than 1 at $T_H < 2T_L$, and may be even much larger than that when the
temperature difference $(T_H - T_L)$ sustained by the refrigerator tends to zero. For example, in a typical
air-conditioning system, this difference is of the order of 10 K, while $T_L \sim 300$ K, so that $(T_H - T_L) \sim
T_L/30$, i.e. the Carnot value of $(\text{COP})_{\text{cooling}}$ is as high as ~30. (In state-of-the-art commercial HVAC
systems it is within the range of 3 to 4.) This is why the term “cooling efficiency”, used in some
textbooks instead of $(\text{COP})_{\text{cooling}}$, may be misleading.
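The air-conditioning estimate above can be reproduced in a few lines. The sketch evaluates Eq. (1.70) directly and, as an independent check, via Eqs. (1.63), (1.66), and (1.69), for a hypothetical heat $|Q_L| = 1$ (arbitrary units) removed from the cold side per cycle, using the text's sign convention for the reversed cycle.

```python
def carnot_cop_cooling(T_H, T_L):
    """Eq. (1.70), with absolute temperatures."""
    return T_L / (T_H - T_L)

def cop_cooling_from_heats(Q_L, Q_H):
    """Eq. (1.69) for a reversed cycle (Q_L < 0, Q_H < 0 in the text's convention)."""
    W = Q_L - Q_H        # Eq. (1.63): Q_H - Q_L = -W, so the consumed work W > 0
    return abs(Q_L) / W

T_L, T_H = 300.0, 310.0      # the air-conditioning example from the text, in kelvins
Q_L = -1.0                    # hypothetical heat removed from the cold side per cycle
Q_H = Q_L * T_H / T_L         # Eq. (1.66)

print(carnot_cop_cooling(T_H, T_L))        # 30.0
print(cop_cooling_from_heats(Q_L, Q_H))
```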


Since in the reversed cycle $Q_H = -\mathscr{W} + Q_L < 0$, i.e. the system provides heat flow into the
high-temperature heat bath, it may be used as a heat pump for heating purposes. The figure-of-merit
appropriate for this application is different from Eq. (69):


$$(\text{COP})_{\text{heating}} \equiv \frac{|Q_H|}{\mathscr{W}} = \frac{Q_H}{Q_H - Q_L}, \qquad (1.71)$$
so that for the Carnot cycle, using Eq. (66) again, we get
$$(\text{COP}_{\text{heating}})_{\text{Carnot}} = \frac{T_H}{T_H - T_L}. \qquad (1.72)$$


Note that this COP is always larger than 1, meaning that the Carnot heat pump is always more
efficient than the direct conversion of work into heat (when $Q_H = -\mathscr{W}$, so that $(\text{COP})_{\text{heating}} = 1$), though
practical electricity-driven heat pumps are substantially more complex, and hence more expensive, than
simple electric heaters. Such heat pumps, with typical $(\text{COP})_{\text{heating}}$ values around 4 in summer and 2 in
winter, are frequently used for heating large buildings.
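Since Eq. (1.63) gives $|Q_H| = |Q_L| + \mathscr{W}$ for the reversed cycle, $(\text{COP})_{\text{heating}}$ always exceeds $(\text{COP})_{\text{cooling}}$ by exactly 1, which is another way to see that it is larger than 1. The sketch below tabulates the Carnot values (1.70) and (1.72) for a few hypothetical temperature pairs.

```python
def carnot_cop_heating(T_H, T_L):
    """Eq. (1.72): (COP_heating)_Carnot = T_H / (T_H - T_L), absolute temperatures."""
    return T_H / (T_H - T_L)

def carnot_cop_cooling(T_H, T_L):
    """Eq. (1.70): (COP_cooling)_Carnot = T_L / (T_H - T_L)."""
    return T_L / (T_H - T_L)

# Hypothetical (hot side, cold side) temperature pairs, in kelvins
for T_H, T_L in [(310.0, 300.0), (295.0, 270.0), (295.0, 250.0)]:
    heating = carnot_cop_heating(T_H, T_L)
    cooling = carnot_cop_cooling(T_H, T_L)
    print(f"T_H = {T_H} K, T_L = {T_L} K: COP_heating = {heating:.2f}, COP_cooling = {cooling:.2f}")
```

As the outdoor temperature $T_L$ drops, the ideal heating COP falls, which is consistent with the lower practical values quoted above for winter operation.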


Finally, note that according to Eq. (70), the $(\text{COP})_{\text{cooling}}$ of the Carnot cycle tends to zero at
$T_L \to 0$, making it impossible to reach the absolute zero of temperature, and hence illustrating the meaningful
(Nernst’s) formulation of the 3rd law of thermodynamics, cited in Sec. 3. Indeed, let us prescribe a finite
but very large heat capacity C(T) to the low-temperature bath, and use the definition of this variable to
write the following expression for the relatively small change of its temperature as a result of dn similar
refrigeration cycles:
$$C(T_L)\,dT_L = Q_L\,dn. \qquad (1.73)$$


Together with Eq. (66), this relation yields


$$\frac{C(T_L)\,dT_L}{T_L} = -\frac{|Q_H|}{T_H}\,dn. \qquad (1.74)$$


If $T_L \to 0$, so that $T_H \gg T_L$ and $|Q_H| \approx \mathscr{W} = \text{const}$, the right-hand side of this equation does not depend
on $T_L$, so that we may integrate it over many ($n \gg 1$) cycles, getting the following simple relation between
the initial and final values of $T_L$:


$$\int_{T_{\text{ini}}}^{T_{\text{fin}}} \frac{C(T)\,dT}{T} = -\frac{|Q_H|}{T_H}\,n. \qquad (1.75)$$
For example, if C(T) is a constant, Eq. (75) yields an exponential law,
$$T_{\text{fin}} = T_{\text{ini}} \exp\left\{-\frac{|Q_H|}{C\,T_H}\,n\right\}, \qquad (1.76)$$
with the absolute zero of temperature not reached at any finite n. Even for an arbitrary function C(T)
that does not vanish at $T \to 0$, Eq. (74) proves the Nernst theorem, because dn diverges at $T_L \to 0$.⁴⁵
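The approach to absolute zero described by Eqs. (1.74)-(1.76) can be visualized by integrating Eq. (1.74) step by step. The sketch below is a minimal Euler integration that assumes a constant heat capacity C and a fixed $|Q_H|$ per cycle (the $T_L \to 0$ limit discussed above); all parameter values are hypothetical. It compares the result with the exponential law (1.76) and shows that $T_L$ stays positive after any finite number of cycles.

```python
import math

def cool_down(T_ini, n_cycles, C, Q_H_abs, T_H, substeps=100):
    """Euler integration of Eq. (1.74) with constant C:
    dT_L = -(|Q_H| / (C * T_H)) * T_L * dn, with |Q_H| held fixed (the T_L -> 0 limit)."""
    rate = Q_H_abs / (C * T_H)
    dn = 1.0 / substeps
    T_L = T_ini
    for _ in range(n_cycles * substeps):
        T_L -= rate * T_L * dn
    return T_L

# Hypothetical parameters (temperatures in K, energies in arbitrary consistent units)
T_ini, C, Q_H_abs, T_H = 1.0, 100.0, 30.0, 300.0
for n in (10, 100, 1000):
    T_numeric = cool_down(T_ini, n, C, Q_H_abs, T_H)
    T_formula = T_ini * math.exp(-Q_H_abs * n / (C * T_H))   # Eq. (1.76)
    print(f"n = {n:4d}: T_fin = {T_numeric:.6f} (Eq. (1.76): {T_formula:.6f})")
```

Each cycle removes only a fixed fraction of the remaining temperature, so the sequence of $T_L$ values decays geometrically and never reaches zero.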


1.7. Exercise problems 


1.1. Two bodies, with temperature-independent heat capacities $C_1$ and $C_2$, and different initial
temperatures $T_1$ and $T_2$, are placed into weak thermal contact. Calculate the change of the total entropy
of the system before it reaches thermal equilibrium.


1.2. A gas portion has the following properties: 


(i) its heat capacity $C_V = aT^b$, and
(ii) the work $\mathscr{W}_T$ needed for its isothermal compression from $V_2$ to $V_1$ equals $cT\ln(V_2/V_1)$,


where a, b, and c are some constants. Find the equation of state of the gas, and calculate the temperature
dependence of its entropy S and thermodynamic potentials E, H, F, G, and Ω.


1.3. A closed volume with an ideal classical gas of similar molecules is separated with a partition 
in such a way that the number N of molecules in each part is the same, but their volumes are different. 
The gas is initially in thermal equilibrium, and its pressure in one part is $P_1$, and in the other part, $P_2$.
Calculate the change of entropy resulting from a fast removal of the partition, and analyze the result. 


1.4. An ideal classical gas of N particles is initially confined to volume V, and is in thermal
equilibrium with a heat bath of temperature T. Then the gas is allowed to expand to volume V’ > V in
one of the following ways: 


(i) The expansion is slow, so that due to the sustained thermal contact with the heat bath, the gas 
temperature remains equal to T. 

(ii) The partition separating the volumes V and (V’ —V) is removed very fast, allowing the gas to 
expand rapidly. 


For each process, calculate the eventual changes of pressure, temperature, energy, and entropy of 
the gas at its expansion. 


1.5. For an ideal classical gas with temperature-independent specific heat, derive the relation 
between P and V at an adiabatic expansion/compression. 


1.6. Calculate the speed and the wave impedance of acoustic waves propagating in an ideal 
classical gas with temperature-independent specific heat, in the limits when the propagation may be 
treated as: 


(i) an isothermal process, and
(ii) an adiabatic process.

Which of these limits is achieved at higher wave frequencies?


45 Note that for such metastable systems as glasses the situation may be more complicated. (For a detailed
discussion of this issue see, e.g., J. Wilks, The Third Law of Thermodynamics, Oxford U. Press, 1961.)
Fortunately, this issue does not affect other aspects of statistical physics, at least those to be discussed in this
course.


1.7. As will be discussed in Sec. 3.5, the so-called “hardball” models of classical particle
interaction yield the following equation of state of a gas of such particles:


$$P = T\varphi(n),$$


where n = N/V is the particle density, and the function φ(n) is generally different from that ($\varphi_{\text{ideal}}(n) = n$)
of the ideal gas, but still independent of temperature. For such a gas, with temperature-independent $c_V$,
calculate:


(i) the energy of the gas, and
(ii) its pressure as a function of n at the adiabatic compression.


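1.8. For an arbitrary thermodynamic system with a fixed number of particles, prove the
following four Maxwell relations (already mentioned in Sec. 4):


$$\left(\frac{\partial S}{\partial V}\right)_T = \left(\frac{\partial P}{\partial T}\right)_V, \qquad \left(\frac{\partial V}{\partial S}\right)_P = \left(\frac{\partial T}{\partial P}\right)_S,$$
$$\left(\frac{\partial S}{\partial P}\right)_T = -\left(\frac{\partial V}{\partial T}\right)_P, \qquad \left(\frac{\partial P}{\partial S}\right)_V = -\left(\frac{\partial T}{\partial V}\right)_S,$$


and also the following relation:
$$\left(\frac{\partial E}{\partial V}\right)_T = T\left(\frac{\partial P}{\partial T}\right)_V - P.$$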


1.9. Express the heat capacity difference, $C_P - C_V$, via the equation of state P = P(V, T) of the
system.


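1.10. Prove that the isothermal compressibility⁴⁶


$$\kappa_T \equiv -\frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_{T,N}$$


in a single-phase system may be expressed in two different ways:
$$\kappa_T = \frac{V^2}{N^2}\left(\frac{\partial^2 P}{\partial \mu^2}\right)_T = \frac{V}{N^2}\left(\frac{\partial N}{\partial \mu}\right)_{T,V}.$$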


1.11. A reversible process, performed with a fixed portion of an ideal 
classical gas, may be represented on the [V, P] plane with the straight line 
shown in the figure on the right. Find the point at which the heat flow into/out 
of the gas changes its direction. 


1.12. Two bodies have equal, temperature-independent heat capacities
C, but different temperatures, $T_1$ and $T_2$. Calculate the maximum mechanical work obtainable from this
system, using a heat engine.


46 Note that the compressibility is just the reciprocal bulk modulus, κ = 1/K; see, e.g., CM Sec. 7.3.


1.13. Express the efficiency η of a heat engine that uses the so-called
Joule cycle, consisting of two adiabatic and two isobaric
processes (see the figure on the right), via the minimum and maximum
values of pressure, and compare the result with $\eta_{\text{Carnot}}$. Assume an ideal
classical working gas with temperature-independent $C_P$ and $C_V$.


1.14. Calculate the efficiency of a heat engine using the Otto
cycle,⁴⁷ which consists of two adiabatic and two isochoric (constant-volume)
reversible processes (see the figure on the right). Explore how
the efficiency depends on the ratio $r = V_{\max}/V_{\min}$, and compare it with the
Carnot cycle’s efficiency. Assume an ideal classical working gas with
temperature-independent heat capacity.


1.15. A heat engine’s cycle consists of two isothermal (T = const)
and two isochoric (V = const) reversible processes (see the figure
on the right).⁴⁸


(i) Assuming that the working gas is an ideal classical gas of N
particles, calculate the mechanical work performed by the engine during
one cycle.

(ii) Are the specified conditions sufficient to calculate the 
engine’s efficiency? (Justify your answer.) 


1.16. The Diesel cycle (an approximate model of the Diesel
internal combustion engine’s operation) consists of two adiabatic
processes, one isochoric process, and one isobaric process (see the
figure on the right). Assuming an ideal working gas with temperature-independent
$C_V$ and $C_P$, express the efficiency η of the heat engine using
this cycle via the gas temperature values in its transitional states
corresponding to the corners of the cycle diagram.


47 This name stems from the fact that the cycle is an approximate model of operation of the four-stroke internal 
combustion engine, which was improved and made practicable (though not invented!) by N. Otto in 1876. 

48 The reversed cycle of this type is a reasonable approximation for the operation of the Stirling and Gifford- 
McMahon (GM) refrigerators, broadly used for cryocooling — for a recent review see, e.g., A. de Waele, J. Low 
Temp. Phys. 164, 179 (2011). 
Chapter 2. Principles of Physical Statistics


This chapter is the keystone of this course. It starts with a brief discussion of such basic notions of
statistical physics as statistical ensembles, probability, and ergodicity. Then the so-called
microcanonical distribution postulate is formulated, simultaneously with the statistical definition of the 
entropy. This allows a derivation of the famous Gibbs (“canonical”) distribution — the most frequently 
used tool of statistical physics. Then we will discuss one more, “grand canonical” distribution, which is 
more convenient for some tasks. In particular, it is immediately used for the derivation of the most 
important Boltzmann, Fermi-Dirac, and Bose-Einstein statistics of independent particles, which will be 
repeatedly utilized in the following chapters. 


2.1. Statistical ensembles and probability 


As has already been discussed in Sec. 1.1, statistical physics deals with situations when either
unknown initial conditions, or the system’s complexity, or the laws of its motion (as in the case of quantum
mechanics) do not allow a definite prediction of measurement results. The main formalism for the
analysis of such systems is probability theory, so let me start with a very brief review of its basic
concepts, using an informal “physical” language, less rigorous but (hopefully) more transparent than
standard mathematical treatments,¹ and quite sufficient for our purposes.


Consider N ≫ 1 independent similar experiments carried out with apparently similar systems
(i.e. systems with identical macroscopic parameters such as volume, pressure, etc.), but still giving, for
any of the reasons listed above, different results of measurements. Such a collection of experiments,
together with a fixed method of result processing, is a good example of a statistical ensemble. Let us
start from the case when the experiments may have M different discrete outcomes, and the numbers of
experiments giving the corresponding different results are $N_1, N_2, \ldots, N_M$, so that


$$\sum_{m=1}^{M} N_m = N. \qquad (2.1)$$


The probability of each outcome, for the given statistical ensemble, is then defined as
$$W_m \equiv \lim_{N \to \infty} \frac{N_m}{N}. \qquad (2.2)$$


Though this definition is so close to our everyday experience that it is almost self-evident, a few remarks 
may still be relevant. 


First, the probabilities $W_m$ depend on the exact statistical ensemble they are defined for, notably
including the method of result processing. As the simplest example, consider throwing a standard
cubic die many times. For the ensemble of all thrown and counted dice, the probability of each
outcome (say, “1”) is 1/6. However, nothing prevents us from defining another statistical ensemble of
dice-throwing experiments in which all outcomes “1” are discounted. Evidently, the probability of
finding outcomes “1” in this modified (but legitimate) ensemble is 0, while for all other five outcomes
(“2” to “6”), it is 1/5 rather than 1/6.


1 For the reader interested in a more rigorous approach, I can recommend, for example, Chapter 18 of the
handbook by G. Korn and T. Korn; see MA Sec. 16(ii).


Second, a statistical ensemble does not necessarily require N similar physical systems, e.g., N
distinct dice. It is intuitively clear that tossing the same die N times constitutes an ensemble with similar
statistical properties. More generally, a set of N experiments with the same system gives a statistical
ensemble equivalent to the set of experiments with N different systems, provided that the experiments
are kept independent, i.e. that outcomes of past experiments do not affect those of the experiments to
follow. Moreover, for many physical systems of interest, no special preparation of each experiment is
necessary, and N experiments separated by sufficiently long time intervals form a “good” statistical
ensemble: the property called ergodicity.²
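The first two remarks can be illustrated with a short simulation in which Python's seeded pseudo-random generator stands in for a fair die (an assumption of this sketch). One “die” is thrown many times, which by the second remark forms a legitimate ensemble, and the same record of throws is processed in two ways: with all outcomes counted, and with all outcomes “1” discounted.

```python
import random
from collections import Counter

rng = random.Random(12345)                        # fixed seed, so the run is reproducible
throws = rng.choices(range(1, 7), k=600_000)      # one simulated die, thrown many times

# Ensemble A: all throws counted
counts_all = Counter(throws)
W_all = {face: counts_all[face] / len(throws) for face in range(1, 7)}

# Ensemble B: the same throws, but every outcome "1" is discounted
kept = [t for t in throws if t != 1]
counts_kept = Counter(kept)
W_kept = {face: counts_kept[face] / len(kept) for face in range(1, 7)}

print({face: round(w, 3) for face, w in W_all.items()})
print({face: round(w, 3) for face, w in W_kept.items()})
```

The first dictionary should show all frequencies close to 1/6, and the second one a zero for “1” and values close to 1/5 for the other faces.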


Third, the reference to infinite N in Eq. (2) does not strip the notion of probability of its
practical relevance. Indeed, it is easy to prove (see Chapter 5) that, under very general conditions, at finite
but sufficiently large N, the numbers $N_m$ approach their average (or expectation) values³


$$\langle N_m \rangle = W_m N, \qquad (2.3)$$


with the relative deviations decreasing as ~$\langle N_m \rangle^{-1/2}$, i.e. as $1/N^{1/2}$.
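The $N^{-1/2}$ law can be seen in a simple simulation. The sketch below assumes independent experiments with a single outcome of probability $W = 1/6$ (one face of a fair die, for example), repeats the whole series of N experiments many times, and estimates the RMS relative deviation of $N_m$ from $\langle N_m\rangle = WN$.

```python
import math
import random

def rms_relative_deviation(N, trials, rng, W=1 / 6):
    """RMS over `trials` runs of (N_m - <N_m>)/<N_m>, with <N_m> = W*N as in Eq. (2.3)."""
    expected = W * N
    total = 0.0
    for _ in range(trials):
        N_m = sum(rng.random() < W for _ in range(N))   # count of one outcome in N experiments
        total += ((N_m - expected) / expected) ** 2
    return math.sqrt(total / trials)

rng = random.Random(2024)   # fixed seed for reproducibility
for N in (100, 1_000, 10_000):
    d = rms_relative_deviation(N, trials=100, rng=rng)
    print(f"N = {N:6d}: RMS relative deviation = {d:.4f}, times N**0.5 = {d * math.sqrt(N):.2f}")
```

For this binomial case, the exact relative deviation is $[(1 - W)/(WN)]^{1/2}$, so the last column should stay near $5^{1/2} \approx 2.2$ for every N, up to the statistical scatter of the estimate.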


Now let me list those properties of probabilities that we will immediately need. First, dividing
both sides of Eq. (1) by N and taking the limit N → ∞, we get the well-known normalization
condition


$$\sum_{m=1}^{M} W_m = 1; \qquad (2.4)$$


just remember that it is true only if each experiment definitely yields one of the outcomes m = 1, 2, ..., M.


Second, if we have an additive function of the results,
$$f \equiv \frac{1}{N}\sum_{m=1}^{M} N_m f_m, \qquad (2.5)$$


where $f_m$ are some definite (deterministic) coefficients, the statistical average (also called the
expectation value) of the function is naturally defined as


? The most popular counter-example is an energy-conserving system. Consider, for example, a system of particles 
placed in a potential that is a quadratic form of its coordinates. The theory of oscillations tells us (see, e.g., CM 
Sec. 6.2) that this system is equivalent to a set of non-interacting harmonic oscillators. Each of these oscillators 
conserves its own initial energy $E_j$ forever, so that the statistics of N measurements of one such system may differ
from that of N different systems with a random distribution of $E_j$, even if the total energy of the system, $E = \sum_j E_j$,
is the same. Such non-ergodicity, however, is a rather feeble phenomenon and is readily destroyed by any of 
many mechanisms, such as weak interaction with the environment (leading, in particular, to oscillation damping), 
potential anharmonicity (see, e.g., CM Chapter 5), and chaos (CM Chapter 9), all of them strongly enhanced by 
increasing the number of particles in the system, i.e. the number of its degrees of freedom. This is why the
overwhelming majority of real-life systems are ergodic; for the readers interested in non-ergodic exotics, I can
recommend the monograph by V. Arnold and A. Avez, Ergodic Problems of Classical Mechanics, Addison- 
Wesley, 1989. 

3 Here, and everywhere in this series, angle brackets ⟨...⟩ mean averaging over a statistical ensemble, which is
generally different from averaging over time, as will be the case in quite a few examples considered below.
$$\langle f \rangle \equiv \lim_{N \to \infty} \frac{1}{N}\sum_{m=1}^{M} \langle N_m \rangle f_m, \qquad (2.6)$$
so that using Eq. (3) we get
$$\langle f \rangle = \sum_{m=1}^{M} W_m f_m. \qquad (2.7)$$


Notice that Eq. (4) may be considered as the particular form of this general result, when all $f_m = 1$.
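Eq. (2.7) is easy to evaluate directly. The sketch below uses exact rational arithmetic for a fair die ($W_m = 1/6$, an illustrative assumption). With $f_m = 1$ it returns the normalization sum (2.4), with $f_m = m$ the average number of pips, and with $f_m = m^2$ the second moment.

```python
from fractions import Fraction

# Fair die: W_m = 1/6 for m = 1..6, kept as exact fractions
W = {m: Fraction(1, 6) for m in range(1, 7)}

def expectation(W, f):
    """Eq. (2.7): <f> = sum over m of W_m * f_m, with f_m = f(m)."""
    return sum(W_m * f(m) for m, W_m in W.items())

print(expectation(W, lambda m: 1))       # 1
print(expectation(W, lambda m: m))       # 7/2
print(expectation(W, lambda m: m * m))   # 91/6
```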


Next, the spectrum of possible experimental outcomes is frequently continuous for all practical 
purposes. (Think, for example, about the set of positions of the marks left by bullets fired into a target 
from afar.) The above formulas may be readily generalized to this case; let us start from the simplest
situation when all different outcomes may be described by just one continuous scalar variable q, which
replaces the discrete index m in Eqs. (1)-(7). The basic relation for this case is the self-evident fact that
the probability dW of having an outcome within a small interval dq near some point q is proportional to
the magnitude of that interval:

$$dW = w(q)\,dq, \qquad (2.8)$$


where w(q) is some function of q, which does not depend on dq. This function is called the probability
density. Now all the above formulas may be recast by replacing the probabilities $W_m$ with the products
(8), and the summation over m with the integration over q. In particular, instead of Eq. (4) the
normalization condition now becomes


$$\int w(q)\,dq = 1, \qquad (2.9)$$


where the integration should be extended over the whole range of possible values of q. Similarly, instead
of the discrete values $f_m$ participating in Eq. (5), it is natural to consider a function f(q). Then instead of
Eq. (7), the expectation value of the function may be calculated as


$$\langle f \rangle = \int w(q)\,f(q)\,dq. \qquad (2.10)$$
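The continuous formulas (2.9) and (2.10) can be checked by numerical quadrature. The sketch below takes, as an illustrative assumption, a Gaussian probability density of width σ = 2 centered at q = 0, and uses the midpoint rule over a range wide enough for the tails to be negligible; it should return a normalization of 1 and $\langle q^2\rangle = \sigma^2 = 4$.

```python
import math

def w(q, sigma=2.0):
    """Assumed probability density: a Gaussian of width sigma centered at q = 0."""
    return math.exp(-q * q / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def integrate(g, a, b, n=100_000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

a, b = -40.0, 40.0   # wide enough that the Gaussian tails are negligible
norm = integrate(w, a, b)                              # Eq. (2.9)
mean_q2 = integrate(lambda q: w(q) * q * q, a, b)      # Eq. (2.10) with f(q) = q**2
print(norm, mean_q2)
```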


It is also straightforward to generalize these formulas to the case of more variables. For example, 
the state of a classical particle with three degrees of freedom may be fully described by the probability 
density w defined in the 6D space of its generalized radius-vector q and momentum p. As a result, the 
expectation value of a function of these variables may be expressed as a 6D integral 


$$\langle f \rangle = \int w(\mathbf{q},\mathbf{p})\, f(\mathbf{q},\mathbf{p})\, d^3q\, d^3p. \qquad (2.11)$$


Some systems considered in this course consist of components whose quantum properties
cannot be ignored, so let us discuss how ⟨f⟩ should be calculated in this case. If by $f_m$ we mean
measurement results, then Eq. (7) (and its generalizations) remains valid, but since these numbers
themselves may be affected by the intrinsic quantum-mechanical uncertainty, it may make sense to have
a bit deeper look into this situation. Quantum mechanics tells us⁴ that the most general expression for
the expectation value of an observable f in a certain ensemble of macroscopically similar systems is


$$\langle f \rangle = \sum_{m,m'} W_{mm'} f_{m'm} \equiv \text{Tr}(\mathrm{W}\mathrm{f}). \qquad (2.12)$$


4 See, e.g., QM Sec. 7.1. 
Here $f_{mm'}$ are the matrix elements of the quantum-mechanical operator $\hat{f}$ corresponding to the
observable f, in a full basis of orthonormal states m,


$$f_{mm'} = \langle m|\hat{f}|m'\rangle, \qquad (2.13)$$


while the coefficients $W_{mm'}$ are the elements of the so-called density matrix W, which represents, in the
same basis, the density operator $\hat{W}$ describing properties of this ensemble. Eq. (12) is evidently more
general than Eq. (7), and is reduced to it only if the density matrix is diagonal:


$$W_{mm'} = W_m \delta_{mm'} \qquad (2.14)$$


(where $\delta_{mm'}$ is the Kronecker symbol), when the diagonal elements $W_m$ play the role of probabilities of
the corresponding states.


Thus formally, the largest difference between the quantum and classical description is the 
presence, in Eq. (12), of the off-diagonal elements of the density matrix. They have the largest values in 
the pure (also called “coherent”) ensemble, in which the state of the system may be described with state 
vectors, e.g., the ket-vector

$$|\alpha\rangle = \sum_m \alpha_m |m\rangle, \qquad (2.15)$$


where $\alpha_m$ are some (generally, complex) coefficients. In this case, the density matrix elements are
merely


$$W_{mm'} = \alpha_m \alpha_{m'}^*, \qquad (2.16)$$


so that the off-diagonal elements are of the same order as the diagonal elements. For example, in the
very important particular case of a two-level system, the pure-state density matrix is


$$\mathrm{W} = \begin{pmatrix} \alpha_1\alpha_1^* & \alpha_1\alpha_2^* \\ \alpha_2\alpha_1^* & \alpha_2\alpha_2^* \end{pmatrix}, \qquad (2.17)$$
so that the product of its off-diagonal components is as large as that of the diagonal components.
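Eqs. (2.12), (2.16), and (2.17) can be checked numerically for a two-level system. The sketch below uses plain Python complex numbers; the state amplitudes $\alpha_1$, $\alpha_2$ and the Hermitian observable f are arbitrary hypothetical choices. It verifies that Tr W = 1, that the product of the off-diagonal elements equals that of the diagonal ones, and that Tr(Wf) coincides with the direct average $\langle\alpha|\hat{f}|\alpha\rangle$.

```python
import cmath

# Hypothetical pure state of a two-level system, Eq. (2.15): |alpha> = a1|1> + a2|2>
a = [0.6, 0.8 * cmath.exp(0.7j)]            # |a1|^2 + |a2|^2 = 1

# Eq. (2.16): W_mm' = a_m * conj(a_m')
W = [[a[m] * a[mp].conjugate() for mp in range(2)] for m in range(2)]

# A hypothetical Hermitian observable f, written in the same basis
f = [[1.0, 0.5 - 0.2j],
     [0.5 + 0.2j, -1.0]]

# Eq. (2.12): <f> = sum over m, m' of W_mm' f_m'm = Tr(W f)
trace_Wf = sum(W[m][mp] * f[mp][m] for m in range(2) for mp in range(2))
# The same average computed directly as <alpha|f|alpha>
direct = sum(a[m].conjugate() * f[m][mp] * a[mp] for m in range(2) for mp in range(2))

print(W[0][0] + W[1][1])                           # Tr W
print(W[0][1] * W[1][0], W[0][0] * W[1][1])        # off-diagonal vs diagonal products
print(trace_Wf, direct)
```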


In the most important basis of stationary states, i.e. the eigenstates of the system’s time-independent
Hamiltonian, the coefficients $\alpha_m$ oscillate in time as⁵


$$\alpha_m(t) = \alpha_m(0)\exp\left\{-i\frac{E_m}{\hbar}t\right\} \equiv |\alpha_m|\exp\left\{-i\frac{E_m}{\hbar}t + i\varphi_m\right\}, \qquad (2.18)$$
where $E_m$ are the corresponding eigenenergies, and $\varphi_m$ are constant phase shifts. This means that while
the diagonal terms of the density matrix (16) remain constant, its off-diagonal components are
oscillating functions of time:


$$W_{mm'} = \alpha_m\alpha_{m'}^* = |\alpha_m\alpha_{m'}|\exp\left\{i\frac{E_{m'} - E_m}{\hbar}t\right\}\exp\{i(\varphi_m - \varphi_{m'})\}. \qquad (2.19)$$


5 Here I use the Schrödinger picture of quantum dynamics, in which the matrix elements $f_{mm'}$ representing
quantum-mechanical operators do not evolve in time. The final results of this discussion do not depend on the
particular picture; see, e.g., QM Sec. 4.6.
Due to the extreme smallness of the Planck constant (on the human scale of things), minuscule random
perturbations of eigenenergies are equivalent to substantial random changes of the phase multipliers, so
that the time average of any off-diagonal matrix element tends to zero. Moreover, even if our statistical
ensemble consists of systems with exactly the same $E_m$, but different values $\varphi_m$ (which are typically hard
to control at the initial preparation of the system), the average values of all $W_{mm'}$ (with m ≠ m’) vanish
again.
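The vanishing of the ensemble-averaged off-diagonal elements can be illustrated with a sketch in which each member of the ensemble has the same moduli $|\alpha_1|$ and $|\alpha_2|$ but independent, uniformly distributed phases $\varphi_1$ and $\varphi_2$, an assumption modeling uncontrolled initial preparation. Averaging $W_{12}$ from Eq. (2.19) (taken at t = 0) over K such systems gives a value that shrinks roughly as $K^{-1/2}$, while the diagonal element $|\alpha_1|^2$ is unaffected.

```python
import cmath
import math
import random

rng = random.Random(42)       # fixed seed for reproducibility
abs_a1, abs_a2 = 0.6, 0.8     # hypothetical moduli |alpha_1|, |alpha_2|
K = 10_000                    # number of systems in the ensemble

def off_diagonal_element(phi1, phi2):
    # Eq. (2.19) at t = 0: W_12 = |alpha_1 alpha_2| exp{i(phi_1 - phi_2)}
    return abs_a1 * abs_a2 * cmath.exp(1j * (phi1 - phi2))

samples = [off_diagonal_element(rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi))
           for _ in range(K)]
avg_W12 = sum(samples) / K
W11 = abs_a1 ** 2             # the diagonal element does not depend on the phases

print(f"|<W_12>| = {abs(avg_W12):.4f}, single-system |W_12| = {abs_a1 * abs_a2:.2f}, W_11 = {W11:.2f}")
```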


This is why, besides some very special cases, typical statistical ensembles of quantum particles
are far from being pure, and in most cases (certainly including thermodynamic equilibrium), a good
approximation for their description is given by the opposite limit of the so-called classical mixture, in
which all off-diagonal matrix elements of the density matrix equal zero, and its diagonal elements $W_{mm}$
are merely the probabilities $W_m$ of the corresponding eigenstates. In this case, for the observables
compatible with energy, Eq. (12) is reduced to Eq. (7), with $f_m$ being the eigenvalues of the variable $f$, so
that we may base our further discussion on this key relation and its continuous extensions (10)-(11).


2.2. Microcanonical ensemble and distribution 


Now we move on to the now-standard approach to statistical mechanics, based on the three
statistical ensembles introduced in the 1870s by Josiah Willard Gibbs.⁶ The most basic of them is the
so-called microcanonical statistical ensemble,⁷ defined as a set of macroscopically similar closed
(isolated) systems with virtually the same total energy $E$. Since in quantum mechanics the energy of a
closed system is quantized, in order to make the forthcoming discussion suitable for quantum systems as
well, it is convenient to include in the ensemble all systems with energies $E_m$ within a relatively narrow
interval $\Delta E \ll E$ (see Fig. 1) that is nevertheless much larger than the average distance $\delta E$ between the
energy levels, so that the number $M$ of different quantum states within the interval $\Delta E$ is large, $M \gg 1$.
Such a choice of $\Delta E$ is only possible if $\delta E \ll E$; however, the reader should not worry too much about
this condition, because the most important applications of the microcanonical ensemble are for very
large systems (and/or very high energies) when the energy spectrum is very dense.⁸


Fig. 2.1. A very schematic image of the microcanonical ensemble. (Actually, the ensemble deals with quantum
states rather than energy levels. An energy level may be degenerate, i.e. correspond to several states.)


This ensemble serves as the basis for the formulation of the postulate which is most frequently
called the microcanonical distribution (or, more adequately, “the main statistical postulate” or “the main
statistical hypothesis”): in the thermodynamic equilibrium of a microcanonical ensemble, all its states
have equal probabilities,

$$ W_m = \frac{1}{M} = \text{const}. \qquad (2.20) $$

Though in some constructs of statistical mechanics this equality is derived from other axioms, which
look more plausible to their authors, I believe that Eq. (20) may be taken as the starting point of
statistical physics, supported “just” by the compliance of all its corollaries with experimental
observations.

6 Personally, I believe that the genius of J. Gibbs, praised by Albert Einstein as the “greatest mind in American
history”, is still insufficiently recognized, and agree with R. Millikan that Gibbs “did for statistical mechanics and
thermodynamics what [...] Maxwell did for electrodynamics”.

7 The terms “microcanonical”, as well as “canonical” (see Sec. 4 below), are apparently due to Gibbs, and I was
unable to find out his motivation for the former name. (“Canonical” in the sense of “standard” or “common” is
quite appropriate, but why “micro”? Perhaps to reflect the smallness of $\Delta E$?)

8 Formally, the main result of this section, Eq. (20), is valid for any $M$ (including $M = 1$); it is just less informative
for small $M$, and trivial for $M = 1$.


Note that the postulate (20) is closely related to the macroscopic irreversibility of systems
that are microscopically virtually reversible (closed). If such a system was initially in a certain state, its
time evolution with just minuscule interactions with the environment (which are necessary for reaching
thermodynamic equilibrium) eventually leads to the uniform distribution of its probability among all
states with essentially the same energy. Each of these states is not “better” than the initial one; rather, in
a macroscopic system, there are just so many of these states that the chance to find the system in the
initial state is practically nil. Again, think about the ink drop diffusion into a glass of water.⁹


Now let us find a suitable definition of the entropy $S$ of a microcanonical ensemble’s member,
for now in thermodynamic equilibrium only. This was done in 1877 by another giant of statistical
physics, Ludwig Eduard Boltzmann, on the basis of the prior work by James Clerk Maxwell on the
kinetic theory of gases (see Sec. 3.1 below). In present-day terminology, since $S$ is a measure of
disorder, it should be related to the amount of information¹⁰ lost when the system went irreversibly from
full order to full disorder, i.e. from one definite state to the microcanonical distribution (20). In
an even more convenient formulation, this is the amount of information necessary to find the exact state
of your system in a microcanonical ensemble.


In information theory, the amount of information necessary to make a definite choice
between two options with equal probabilities (Fig. 2a) is defined as

$$ I(2) \equiv \log_2 2 = 1. \qquad (2.21) $$

This unit of information is called a bit.


Fig. 2.2. “Logarithmic trees” of binary decisions for choosing between (a) $M = 2$ and (b) $M = 4$
opportunities with equal probabilities; each binary decision requires 1 bit of information.


9 Though I have to move on, let me note that the microcanonical distribution (20) is a very nontrivial postulate, 
and my advice to the reader is to find some time to give additional thought to this keystone of the whole building 
of statistical mechanics. 

10 I will rely on the reader’s common sense and intuitive understanding of what information is, because even in
formal information theory, this notion is essentially postulated; see, e.g., the wonderfully clear text by J.
Pierce, An Introduction to Information Theory, Dover, 1980.
Now, if we need to make a choice between four equally probable opportunities, it can be made in
two similar steps (Fig. 2b), each requiring one bit of information, so that the total amount of information
necessary for the choice is

$$ I(4) = 2I(2) = 2 \equiv \log_2 4. \qquad (2.22) $$

An obvious extension of this process to the choice between $M = 2^m$ states gives

$$ I(M) = mI(2) = m \equiv \log_2 M. \qquad (2.23) $$


This measure, if extended naturally to any integer $M$, is quite suitable for the definition of entropy
at equilibrium, with the only difference that, following tradition, the binary logarithm is replaced with
the natural one:¹¹

$$ S = \ln M. \qquad (2.24a) $$

Using Eq. (20), we may recast this definition in its most frequently used form

$$ S = \ln\frac{1}{W_m} = -\ln W_m. \qquad (2.24b) $$

(Again, please note that Eq. (24) is valid in thermodynamic equilibrium only!)


Note that Eq. (24) satisfies the major properties of the entropy discussed in thermodynamics.
First, it is a unique characteristic of the disorder. Indeed, according to Eq. (20), $M$ (at fixed $\Delta E$) is the
only possible measure characterizing the microcanonical distribution, and so is its unique function $\ln M$.
This function also satisfies another thermodynamic requirement for the entropy: it must be an extensive
variable. Indeed, for several independent systems, the joint probability of a certain state is just a product
of the partial probabilities, and hence, according to Eq. (24), their entropies just add up.
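The following Python sketch illustrates Eqs. (23)-(24) and the additivity argument. The function names are hypothetical and the state counts arbitrary. It compares $\ln(M_1M_2)$ with $\ln M_1 + \ln M_2$ for two independent systems. It also shows that one bit corresponds to $\ln 2 \approx 0.693$ of the natural-logarithm entropy unit.

```python
import math

def entropy_nats(m_states: int) -> float:
    """Eq. (2.24a): equilibrium entropy S = ln M, in natural units (nats)."""
    return math.log(m_states)

def information_bits(m_states: int) -> float:
    """Eq. (2.23): information I(M) = log2 M needed to single out one of M states."""
    return math.log2(m_states)

m1, m2 = 8, 1024   # state counts of two hypothetical independent systems
# Independent systems: joint states are all pairs (m1, m2), so M = M1 * M2,
# and each joint probability is the product W = (1/M1) * (1/M2).
s_joint = entropy_nats(m1 * m2)
print(s_joint, entropy_nats(m1) + entropy_nats(m2))   # entropies add up
print(information_bits(m1), information_bits(m2))     # 3.0 and 10.0 bits
print(s_joint / information_bits(m1 * m2))            # ln 2 nats per bit
```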


11 This is of course just the change of a constant factor: $S(M) = \ln M = \ln 2 \times \log_2 M = \ln 2 \times I(M) \approx 0.693\, I(M)$. A
review of Chapter 1 shows that nothing in thermodynamics prevents us from choosing such a constant coefficient
arbitrarily, with the corresponding change of the temperature scale; see Eq. (1.9). In particular, in SI units,
where Eq. (24b) becomes $S = -k_B \ln W_m$, one bit of information corresponds to the entropy change
$\Delta S = k_B \ln 2 \approx 0.693\, k_B \approx 0.957\times10^{-23}$ J/K. By the way, the formula “$S = k \log W$” is engraved on
L. Boltzmann’s tombstone in Vienna.

Now let us see whether Eqs. (20) and (24) are compatible with the 2nd law of thermodynamics.
For that, we need to generalize Eq. (24) for $S$ to an arbitrary state of the system (generally, out of
thermodynamic equilibrium), with an arbitrary set of state probabilities $W_m$. Let us first recognize that $M$
in Eq. (24) is just the number of possible ways to commit a particular system to a certain state $m$
($m = 1, 2, \ldots, M$), in a statistical ensemble where each state is equally probable. Now let us consider a more
general ensemble, still consisting of a large number $N \gg 1$ of similar systems, but with a certain number
$N_m = W_mN \gg 1$ of systems in each of $M$ states, with the factors $W_m$ not necessarily equal. In this case,
the evident generalization of Eq. (24) is that the entropy $S_N$ of the whole ensemble is

$$ S_N \equiv \ln \mathcal{M}(N_1, N_2, \ldots), \qquad (2.25) $$

where $\mathcal{M}(N_1, N_2, \ldots)$ is the number of ways to commit a particular system to a certain state $m$ while
keeping all numbers $N_m$ fixed. This number $\mathcal{M}(N_1, N_2, \ldots)$ is clearly equal to the number of ways to
distribute $N$ distinct balls between $M$ different boxes, with the fixed number $N_m$ of balls in each box, but
in no particular order within it. Comparing this description with the definition of the so-called
multinomial coefficients,¹² we get

$$ \mathcal{M}(N_1, N_2, \ldots) = \frac{N!}{N_1!\,N_2!\cdots N_M!}, \quad \text{with } N = \sum_{m=1}^{M} N_m. \qquad (2.26) $$
To simplify the resulting expression for $S_N$, we can use the famous Stirling formula, in its
crudest, de Moivre’s form,¹³ whose accuracy is suitable for most purposes of statistical physics:

$$ \ln(N!)\big|_{N\to\infty} \to N(\ln N - 1). \qquad (2.27) $$
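The accuracy of the crude Stirling formula (27) can be checked against the exact $\ln(N!)$, which Python's standard `math.lgamma` evaluates without overflow. This sketch is an illustration only and is not part of the original derivation. The relative error shrinks quickly with $N$, which is why Eq. (27) is adequate for macroscopic $N$.

```python
import math

def ln_factorial(n: int) -> float:
    """Exact ln(n!) through the log-gamma function, ln(n!) = lgamma(n + 1)."""
    return math.lgamma(n + 1)

def stirling_crude(n: int) -> float:
    """De Moivre's crude form (2.27): ln(n!) ~ n (ln n - 1)."""
    return n * (math.log(n) - 1)

for n in (10, 1_000, 10**6, 10**23):
    exact = ln_factorial(n)
    rel_error = (exact - stirling_crude(n)) / exact
    print(f"N = {n:.0e}: relative error of Eq. (2.27) = {rel_error:.1e}")
```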


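When applied to our current problem, this formula gives the following average entropy per system,¹⁴

$$ S \equiv \frac{S_N}{N} = \frac{1}{N}\left[\ln(N!) - \sum_{m=1}^{M}\ln(N_m!)\right] \approx \frac{1}{N}\left[N(\ln N - 1) - \sum_{m=1}^{M}N_m(\ln N_m - 1)\right] \equiv -\sum_{m=1}^{M}\frac{N_m}{N}\ln\frac{N_m}{N}, \qquad (2.28) $$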


and since this result is only valid in the limit $N_m \to \infty$ anyway, we may use Eq. (2) to represent it as

$$ S = -\sum_{m=1}^{M} W_m \ln W_m \equiv \sum_{m=1}^{M} W_m \ln\frac{1}{W_m}. \qquad (2.29) $$

This extremely important result¹⁵ may be interpreted as the average of the entropy values given by Eq.
(24), weighted with specific probabilities $W_m$ per the general formula (7).¹⁶
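The convergence of the exact multinomial expression (25)-(26) to the Gibbs-Shannon form (29) can be checked numerically. The Python sketch below assumes an arbitrary illustrative distribution $W_m = (1/2, 1/4, 1/8, 1/8)$ and sets $N_m = W_mN$. For this distribution $S = 1.75\ln 2$, i.e. 1.75 bits per system in the language of footnote 15. The exact per-system entropy approaches this value as $N$ grows. The small deficit that remains comes from the neglected Stirling corrections.

```python
import math

def ensemble_entropy_per_system(counts):
    """Exact S_N / N = ln[N! / (N_1! N_2! ... N_M!)] / N, Eqs. (2.25)-(2.26),
    computed with lgamma to avoid overflow."""
    n_total = sum(counts)
    ln_multinomial = math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in counts)
    return ln_multinomial / n_total

def gibbs_shannon_entropy(probs):
    """Eq. (2.29): S = -sum_m W_m ln W_m (states with W_m = 0 contribute nothing)."""
    return -sum(w * math.log(w) for w in probs if w > 0)

probs = [0.5, 0.25, 0.125, 0.125]          # hypothetical W_m, summing to 1
for n_total in (8, 800, 80_000, 8_000_000):
    counts = [round(w * n_total) for w in probs]    # N_m = W_m N
    print(n_total, ensemble_entropy_per_system(counts), gibbs_shannon_entropy(probs))
```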


Now let us find what distribution of probabilities $W_m$ provides the largest value of the entropy
(29). The answer is almost evident from a good glance at Eq. (29). For example, suppose that for a subgroup of
$M' \leq M$ states the coefficients $W_m$ are constant and equal to $1/M'$, so that $W_m = 0$ for all other states. Then all $M'$
non-zero terms in the sum (29) are equal to each other, so that

$$ S = M'\frac{1}{M'}\ln M' \equiv \ln M', \qquad (2.30) $$

and the closer $M'$ is to its maximum value $M$, the larger $S$. Hence, the maximum of $S$ is reached at the
uniform distribution (20), giving the value (24).
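Here is a brute-force check of this maximum, under the arbitrary assumption $M = 10$. The Python sketch below first evaluates Eq. (29) for uniform occupation of $M'$ states, reproducing Eq. (30). It then evaluates many random normalized distributions, generated here by normalizing uniform random numbers (an arbitrary choice), and none of them exceeds $\ln M$.

```python
import math
import random

def entropy(probs):
    """Gibbs-Shannon entropy (2.29), in nats; states with W_m = 0 contribute nothing."""
    return -sum(w * math.log(w) for w in probs if w > 0)

M = 10
# Eq. (2.30): uniform occupation of M' <= M states gives S = ln M'
for m_prime in (1, 2, 5, 10):
    probs = [1 / m_prime] * m_prime + [0.0] * (M - m_prime)
    print(m_prime, entropy(probs), math.log(m_prime))

# Randomly drawn (normalized) distributions over M states never exceed ln M
rng = random.Random(1)
best = 0.0
for _ in range(10_000):
    raw = [rng.random() for _ in range(M)]
    total = sum(raw)
    best = max(best, entropy([x / total for x in raw]))
print(best, "<=", math.log(M))
```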


12 See, e.g., MA Eq. (2.3). Despite the intimidating name, Eq. (26) may be very simply derived. Indeed, $N!$ is just
the number of all possible permutations of $N$ balls, i.e. the ways to place them in certain positions, say, inside $M$
boxes. Now, to take into account that the particular order of the balls in each box is not important, that number
should be divided by all numbers $N_m!$ of possible permutations of balls within each box. That’s it.

13 See, e.g., MA Eq. (2.10).

14 Strictly speaking, I should use the notation $\langle S\rangle$ here. However, following the style accepted in thermodynamics,
I will drop the averaging signs until we really need them to avoid confusion. Again, this shorthand is not too
bad because the relative fluctuations of entropy (as those of any macroscopic variable) are very small at $N \gg 1$.

15 With the replacement of $\ln W_m$ with $\log_2 W_m$ (i.e. division of both sides by $\ln 2$), Eq. (29) becomes the famous
Shannon (or “Boltzmann-Shannon”) formula for the average information $I$ per symbol in a long communication
string using $M$ different symbols, with probability $W_m$ each.

16 In some textbooks, this interpretation is even accepted as the derivation of Eq. (29); however, it is evidently
less strict than the one outlined above.
In order to prove this important fact more strictly, let us find the maximum of the function given
by Eq. (29). If its arguments $W_1, W_2, \ldots W_M$ were completely independent, this could be done by finding
the point (in the $M$-dimensional space of the coefficients $W_m$) where all partial derivatives $\partial S/\partial W_m$ equal
zero. However, since the probabilities are constrained by the condition (4), the differentiation has to be
carried out more carefully, taking into account this interdependence:

$$ \frac{\partial}{\partial W_m} S(W_1, W_2, \ldots) = \frac{\partial S}{\partial W_m} + \sum_{m' \neq m} \frac{\partial S}{\partial W_{m'}} \frac{\partial W_{m'}}{\partial W_m}. \qquad (2.31) $$

At the maximum of the function $S$, all such expressions should be equal to zero simultaneously. This
condition yields $\partial S/\partial W_m = \lambda$, where the so-called Lagrange multiplier $\lambda$ is independent of $m$. Indeed, at
such a point Eq. (31) becomes

$$ \frac{\partial}{\partial W_m} S(W_1, W_2, \ldots) = \lambda + \lambda\sum_{m' \neq m} \frac{\partial W_{m'}}{\partial W_m} = \lambda \frac{\partial}{\partial W_m}\Big(W_m + \sum_{m' \neq m} W_{m'}\Big) = \lambda \frac{\partial}{\partial W_m}(1) = 0. \qquad (2.32) $$

For our particular expression (29), the condition $\partial S/\partial W_m = \lambda$ yields

$$ \frac{\partial S}{\partial W_m} \equiv \frac{d}{dW_m}\left[-W_m \ln W_m\right] = -\ln W_m - 1 = \lambda. \qquad (2.33) $$

The last equality holds for all $m$ (and hence the entropy reaches its maximum value) only if $W_m$ is
independent of $m$. Thus the entropy (29) indeed reaches its maximum value (24) at equilibrium.
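The stationarity condition (33) may also be verified numerically. Moving a small amount of probability from state $m'$ to state $m$ keeps $\sum_m W_m = 1$. The derivative of $S$ along that direction must therefore vanish at a constrained maximum. By Eq. (33) it equals $\ln(W_{m'}/W_m)$, which is zero only when all $W_m$ are equal. The Python sketch below estimates this derivative by a central finite difference. The function names and test distributions are hypothetical.

```python
import math

def entropy(probs):
    """Gibbs-Shannon entropy (2.29), in nats."""
    return -sum(w * math.log(w) for w in probs if w > 0)

def constrained_slope(probs, m, m_prime, h=1e-6):
    """Central-difference derivative of S along the direction that moves
    probability from state m_prime to state m. This direction keeps the sum of
    all W equal to 1, i.e. it respects the normalization condition (4)."""
    plus, minus = list(probs), list(probs)
    plus[m] += h
    plus[m_prime] -= h
    minus[m] -= h
    minus[m_prime] += h
    return (entropy(plus) - entropy(minus)) / (2 * h)

uniform = [0.25] * 4
skewed = [0.4, 0.3, 0.2, 0.1]
# Eq. (2.33) gives dS/dW_m = -ln W_m - 1, so the slope along this direction
# is (-ln W_m - 1) - (-ln W_m' - 1) = ln(W_m' / W_m)
print(constrained_slope(uniform, 0, 3))                      # vanishes: stationary point
print(constrained_slope(skewed, 0, 3), math.log(0.1 / 0.4))  # agree: not stationary
```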


To summarize, we see that the statistical definition (24) of entropy does fit all the requirements
imposed on this variable by thermodynamics. In particular, we have been able to prove the 2nd law of
thermodynamics using that definition together with the fundamental postulate (20).


Now let me discuss one possible point of discomfort with that definition: the values of $M$, and
hence $W_m$, depend on the accepted energy interval $\Delta E$ of the microcanonical ensemble, for whose choice
no exact guidance is offered. However, if the interval $\Delta E$ contains many states, $M \gg 1$, as was assumed
before, then with a very small relative error (vanishing in the limit $M \to \infty$), $M$ may be represented as

$$ M = g(E)\,\Delta E, \qquad (2.34) $$

where $g(E)$ is the density of states of the system:

$$ g(E) \equiv \frac{d\Sigma(E)}{dE}, \qquad (2.35) $$

$\Sigma(E)$ being the total number of states with energies below $E$. (Note that the average interval $\delta E$ between
energy levels, mentioned at the beginning of this section, is just $\Delta E/M = 1/g(E)$.) Plugging Eq. (34) into
Eq. (24), we get

$$ S = \ln M = \ln g(E) + \ln \Delta E, \qquad (2.36) $$

so that the only effect of a particular choice of $\Delta E$ is an offset of the entropy by a constant, and in
Chapter 1 we have seen that such a constant shift does not affect any measurable quantity. Of course, Eq.
(34), and hence Eq. (36), are only precise in the limit when the density of states $g(E)$ is so large that the
range available for the appropriate choice of $\Delta E$,

$$ g^{-1}(E) \ll \Delta E \ll E, \qquad (2.37) $$

is sufficiently broad: $g(E)E = E/\delta E \gg 1$.


In order to get some feeling of the functions $g(E)$ and $S(E)$ and the feasibility of the condition
(37), and also to see whether the microcanonical distribution may be directly used for calculations of
thermodynamic variables in particular systems, let us apply it to a microcanonical ensemble of many
sets of $N \gg 1$ independent, similar harmonic oscillators with frequency $\omega$. (Please note that the
requirement of a virtually fixed energy is applied, in this case, to the total energy $E_N$ of each set of
oscillators, rather than to the energy $E$ of a single oscillator, which may be virtually arbitrary, though certainly
much less than $E_N \sim NE \gg E$.) Basic quantum mechanics tells us¹⁷ that the eigenenergies of such an
oscillator form a discrete, equidistant spectrum:

$$ E_m = \hbar\omega\left(m + \frac{1}{2}\right), \quad \text{where } m = 0, 1, 2, \ldots \qquad (2.38) $$

If $\omega$ is kept constant, the ground-state energy $\hbar\omega/2$ does not contribute to any thermodynamic properties
of the system,¹⁸ so that for the sake of simplicity we may take that point as the energy origin, and
replace Eq. (38) with $E_m = m\hbar\omega$. Let us carry out an approximate analysis of the system for the case
when its average energy per oscillator,

$$ E = \frac{E_N}{N}, \qquad (2.39) $$

is much larger than the energy quantum $\hbar\omega$.


For one oscillator, the number of states with energy $\varepsilon_1$ below a certain value $E_1 \gg \hbar\omega$ is
evidently $\Sigma(E_1) \approx E_1/\hbar\omega \equiv (E_1/\hbar\omega)/1!$ (Fig. 3a). For two oscillators, all possible values of the total
energy $(\varepsilon_1 + \varepsilon_2)$ below some level $E_2$ correspond to the points of a 2D square grid within the right
triangle shown in Fig. 3b, giving $\Sigma(E_2) \approx (1/2)(E_2/\hbar\omega)^2 \equiv (E_2/\hbar\omega)^2/2!$. For three oscillators, the possible
values of the total energy $(\varepsilon_1 + \varepsilon_2 + \varepsilon_3)$ correspond to those points of the 3D cubic grid that fit inside the
right pyramid shown in Fig. 3c, giving $\Sigma(E_3) \approx (1/3)[(1/2)(E_3/\hbar\omega)^2](E_3/\hbar\omega) \equiv (E_3/\hbar\omega)^3/3!$, etc.


Fig. 2.3. Calculating functions $\Sigma(E_N)$ for systems of (a) one, (b) two, and (c) three harmonic oscillators.


17 See, e.g., QM Secs. 2.9 and 5.4.

18 Let me hope that the reader knows that the ground-state energy is experimentally measurable, for example,
using the famous Casimir effect; see, e.g., QM Sec. 9.1. (In Sec. 5.5 below I will briefly discuss another method
of experimental observation of that energy.)
An evident generalization of these formulas to arbitrary $N$ gives the number of states¹⁹

$$ \Sigma(E_N) \approx \frac{1}{N!}\left(\frac{E_N}{\hbar\omega}\right)^N. \qquad (2.40) $$

Differentiating this expression over the energy, we get

$$ g(E_N) \equiv \frac{d\Sigma(E_N)}{dE_N} = \frac{1}{(N-1)!}\frac{E_N^{N-1}}{(\hbar\omega)^N}, \qquad (2.41) $$

so that

$$ S_N(E_N) = \ln g(E_N) + \text{const} = -\ln[(N-1)!] + (N-1)\ln E_N - N\ln(\hbar\omega) + \text{const}. \qquad (2.42) $$
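The geometric counting behind Eq. (40) can be compared with an exact count. Take the energy origin at the ground state. A state of $N$ oscillators with total energy not exceeding $n\hbar\omega$ then corresponds to a set of non-negative integers $m_1, \ldots, m_N$ with $m_1 + \cdots + m_N \leq n$. The standard “stars and bars” combinatorial argument gives their number as the binomial coefficient $C(n + N, N)$. The Python sketch below uses hypothetical helper names. It shows that, for a fixed small $N$, the ratio of the exact count to Eq. (40) tends to 1 as $E_N/\hbar\omega$ grows.

```python
import math

def exact_state_count(n_max: int, n_osc: int) -> int:
    """Number of states of n_osc oscillators (energy origin at the ground state,
    E_N = (m_1 + ... + m_N) * hbar*omega) with m_1 + ... + m_N <= n_max.
    Stars and bars: this equals C(n_max + n_osc, n_osc)."""
    return math.comb(n_max + n_osc, n_osc)

def approx_state_count(n_max: int, n_osc: int) -> float:
    """Eq. (2.40): Sigma(E_N) ~ (E_N / hbar*omega)^N / N!, with E_N = n_max * hbar*omega."""
    return n_max ** n_osc / math.factorial(n_osc)

for n_osc in (1, 2, 3):
    for n_max in (10, 100, 10_000):
        ratio = exact_state_count(n_max, n_osc) / approx_state_count(n_max, n_osc)
        print(f"N = {n_osc}, E_N/hbar*omega = {n_max}: exact/approx = {ratio:.4f}")
```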


For $N \gg 1$ we can ignore the difference between $N$ and $(N - 1)$ in both instances, and use the Stirling
formula (27) to simplify this result as

$$ S_N(E_N) \approx N\left[\ln\frac{E_N}{N\hbar\omega} + 1\right] + \text{const} \equiv N\left[\ln\frac{E}{\hbar\omega} + 1\right] + \text{const} \approx N\ln\frac{E}{\hbar\omega} + \text{const}. \qquad (2.43) $$

(The second, approximate step is only valid at very high $E/\hbar\omega$ ratios, when the logarithm in Eq. (43) is
substantially larger than 1.) Returning for a second to the density of states, we see that in the limit $N \to \infty$,
it is exponentially large:

$$ g(E_N) \propto e^{S_N} \propto \left(\frac{E}{\hbar\omega}\right)^N, \qquad (2.44) $$

so that the conditions (37) may be indeed satisfied within a very broad range of $\Delta E$.


Now we can use Eq. (43) to find all thermodynamic properties of the system, though only in the
limit $E \gg \hbar\omega$. Indeed, according to thermodynamics, if the system’s volume and the number of particles
in it are fixed, the derivative $dS/dE$ is nothing else than the reciprocal temperature in thermal
equilibrium; see Eq. (1.9). In our current case, we imply that the harmonic oscillators are distinct, for
example by their spatial positions. Hence, even if we can speak of some volume of the system, it is
certainly fixed.²⁰ Differentiating Eq. (43) over energy $E_N$, we get

$$ \frac{1}{T} \equiv \frac{dS_N}{dE_N} = \frac{N}{E_N} \equiv \frac{1}{E}. \qquad (2.45) $$

Reading this result backward, we see that the average energy $E$ of a harmonic oscillator equals $T$ (i.e.
$k_BT_K$ in SI units). At this point, the first-time student of thermodynamics should be very much relieved to
see that the counter-intuitive thermodynamic definition (1.9) of temperature does indeed correspond to
what we all have known about this notion from our kindergarten physics courses.
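A direct numerical sketch shows both the result $E = T$ and its limitation to $E \gg \hbar\omega$, without the approximations (40)-(43). It takes $S_N$ as the logarithm of the exact number $C(n + N - 1, N - 1)$ of states in which $n = E_N/\hbar\omega$ quanta are shared among $N$ oscillators. This number plays the role of $g(E_N)\Delta E$ with $\Delta E = \hbar\omega$, so by Eq. (36) the choice only shifts $S_N$ by a constant. The constant drops out of $dS_N/dE_N$. The temperature is then estimated from $1/T = dS_N/dE_N$ by a finite difference, in assumed units with $\hbar\omega = k_B = 1$. The function names are hypothetical. In the output, the ratio $E/T$ approaches 1 only when $E \gg \hbar\omega$.

```python
import math

def ln_state_count(n_quanta: int, n_osc: int) -> float:
    """ln C(n + N - 1, N - 1): log of the number of ways to share n_quanta
    quanta among n_osc distinguishable oscillators, computed via lgamma."""
    return (math.lgamma(n_quanta + n_osc) - math.lgamma(n_quanta + 1)
            - math.lgamma(n_osc))

def temperature(n_quanta: int, n_osc: int) -> float:
    """T from 1/T = dS_N/dE_N (Eq. 1.9), using a central difference over
    one quantum; energies and T are in units of hbar*omega, with k_B = 1."""
    ds_dn = (ln_state_count(n_quanta + 1, n_osc)
             - ln_state_count(n_quanta - 1, n_osc)) / 2
    return 1 / ds_dn

n_osc = 1000
for e_per_osc in (0.1, 1, 10, 100):          # E / hbar*omega, Eq. (2.39)
    n = int(e_per_osc * n_osc)
    t = temperature(n, n_osc)
    print(f"E = {e_per_osc:6.1f}   T = {t:8.3f}   E/T = {e_per_osc / t:.3f}")
```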


19 The coefficient $1/N!$ in this formula has the geometrical meaning of the (hyper)volume of the $N$-dimensional
right pyramid with unit sides.

20 For the same reason, the notion of pressure $P$ in such a system is not clearly defined, and neither are any
thermodynamic potentials but $E$ and $F$.

The result (45) may be readily generalized. Indeed, in quantum mechanics, a harmonic oscillator
with eigenfrequency $\omega$ may be described by the Hamiltonian operator

$$ \hat{H} = \frac{\hat{p}^2}{2\mathsf{m}} + \frac{\kappa\hat{q}^2}{2}, \qquad (2.46) $$

where $q$ is some generalized coordinate, $p$ is the corresponding generalized momentum, $\mathsf{m}$ is the oscillator’s
mass,²¹ and $\kappa$ is the spring constant, so that $\omega = (\kappa/\mathsf{m})^{1/2}$. Since in thermodynamic equilibrium the
density matrix is always diagonal in the basis of stationary states $m$ (see Sec. 1 above), the quantum-mechanical
averages of the kinetic and potential energies may be found from Eq. (7):

$$ \left\langle \frac{p^2}{2\mathsf{m}} \right\rangle = \sum_{m=0}^{\infty} W_m \langle m | \frac{\hat{p}^2}{2\mathsf{m}} | m \rangle, \qquad \left\langle \frac{\kappa q^2}{2} \right\rangle = \sum_{m=0}^{\infty} W_m \langle m | \frac{\kappa \hat{q}^2}{2} | m \rangle, \qquad (2.47) $$


where $W_m$ is the probability to occupy the $m$th energy level, and bra- and ket-vectors describe the
stationary state corresponding to that level.²² However, both classical and quantum mechanics teach us
that for any $m$, the bra-ket expressions under the sums in Eqs. (47), which represent the average kinetic
and potential energies of the oscillator on its $m$th energy level, are equal to each other, and hence each
of them is equal to $E_m/2$. Hence, even though we do not know the probability distribution $W_m$ yet (it will
be calculated in Sec. 5 below), we may conclude that in the “classical limit” $T \gg \hbar\omega$,

$$ \left\langle \frac{p^2}{2\mathsf{m}} \right\rangle = \left\langle \frac{\kappa q^2}{2} \right\rangle = \frac{E}{2} = \frac{T}{2}. \qquad (2.48) $$
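The claim that the kinetic and potential energies have equal diagonal matrix elements, each $E_m/2$, can be checked in a truncated Fock basis. The Python sketch below assumes units with $\hbar = \mathsf{m} = \omega = 1$ (so $\kappa = 1$). It uses the standard ladder-operator representation $\hat{q} = (\hat{a} + \hat{a}^\dagger)/\sqrt{2}$, $\hat{p} = i(\hat{a}^\dagger - \hat{a})/\sqrt{2}$. Here $E_m = m + 1/2$ includes the ground-state energy, since Eq. (46) is not shifted. The highest basis state is skipped because truncation distorts its matrix elements. The helper names are hypothetical.

```python
import math

def ladder_matrices(dim):
    """Truncated annihilation operator a (a|m> = sqrt(m)|m-1>) and its adjoint,
    as dense lists of lists in the Fock basis |0>, ..., |dim-1>."""
    a = [[0.0] * dim for _ in range(dim)]
    for m in range(1, dim):
        a[m - 1][m] = math.sqrt(m)
    a_dag = [list(row) for row in zip(*a)]   # real matrix: adjoint = transpose
    return a, a_dag

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

dim = 12
a, a_dag = ladder_matrices(dim)
# q = (a + a^dag)/sqrt(2); p = i(a^dag - a)/sqrt(2). We store the real matrix
# p' = (a^dag - a)/sqrt(2), so that p = i p' and p^2 = -p'^2.
q = [[(a[i][j] + a_dag[i][j]) / math.sqrt(2) for j in range(dim)] for i in range(dim)]
p_prime = [[(a_dag[i][j] - a[i][j]) / math.sqrt(2) for j in range(dim)] for i in range(dim)]
q2, p2 = matmul(q, q), matmul(p_prime, p_prime)

for m in range(dim - 1):      # the top level is distorted by the truncation
    kinetic = -p2[m][m] / 2   # <m| p^2/2 |m>
    potential = q2[m][m] / 2  # <m| q^2/2 |m>
    print(m, kinetic, potential, (m + 0.5) / 2)   # each equals E_m / 2
```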


Now let us consider a system with an arbitrary number of degrees of freedom, described by a
more general Hamiltonian:²³

$$ \hat{H} = \sum_j \hat{H}_j, \quad \text{with } \hat{H}_j = \frac{\hat{p}_j^2}{2\mathsf{m}_j} + \frac{\kappa_j\hat{q}_j^2}{2}, \qquad (2.49) $$

with (generally, different) frequencies $\omega_j = (\kappa_j/\mathsf{m}_j)^{1/2}$. Since the “modes” (effective harmonic oscillators)
contributing to this Hamiltonian are independent, the result (48) is valid for each of the modes. This is
the famous equipartition theorem: at thermal equilibrium with $T \gg \hbar\omega_j$, the average energy of each so-called
half-degree of freedom (which is defined as any variable, either $p_j$ or $q_j$, giving a quadratic
contribution to the system’s Hamiltonian) is equal to $T/2$.²⁴ In particular, for each of three Cartesian
component contributions to the kinetic energy of a free-moving particle, this theorem is valid for any
temperature. Such components may be considered as 1D harmonic oscillators with vanishing
potential energy, i.e. $\omega_j = 0$, so that the condition $T \gg \hbar\omega_j$ is fulfilled at any temperature.


21 I am using this fancy font for the mass to avoid any chance of its confusion with the state number.

22 Note again that while we have committed the energy $E_N$ of $N$ oscillators to be fixed (to apply Eq. (36), valid
only for a microcanonical ensemble at thermodynamic equilibrium), the single oscillator’s energy $E$ in our
analysis may be arbitrary, within the limits $\hbar\omega \ll E < E_N \sim NT$.

23 As a reminder, the Hamiltonian of any system whose classical Lagrangian function is an arbitrary quadratic
form of its generalized coordinates and the corresponding generalized velocities may be brought to the form (49)
by an appropriate choice of “normal coordinates” $q_j$, which are certain linear combinations of the original
coordinates; see, e.g., CM Sec. 6.2.

24 This also means that in the classical limit, the heat capacity of a system is equal to one-half of the number of its
half-degrees of freedom (in SI units, multiplied by $k_B$).
I believe that this case study of harmonic oscillator systems was a fair illustration of both the
strengths and the weaknesses of the microcanonical ensemble approach.²⁵ On one hand, we could
readily calculate virtually everything we wanted in the classical limit $T \gg \hbar\omega$. On the other hand, calculations for an
arbitrary $T \sim \hbar\omega$, though possible, are rather unpleasant, because for that, all vertical steps of the function
$\Sigma(E_N)$ have to be carefully counted. In Sec. 4, we will see that other statistical ensembles are much more
convenient for such calculations.


Let me conclude this section with a short notice on deterministic classical systems with just a
few degrees of freedom (and even simpler mathematical objects called “maps”) that may exhibit
essentially disordered behavior, called deterministic chaos.²⁶ Such a chaotic system may be
approximately characterized by an entropy defined similarly to Eq. (29), where $W_m$ are the probabilities
to find it in different small regions of phase space, at well-separated small time intervals. On the other
hand, one can use an expression slightly more general than Eq. (29) to define the so-called Kolmogorov
(or “Kolmogorov-Sinai”) entropy $K$ that characterizes the speed of loss of the information about the
initial state of the system, and hence what is called the “chaos depth”. In the definition of $K$, the sum
over $m$ is replaced with the summation over all possible permutations $\{m\} = m_0, m_1, \ldots, m_{N-1}$ of small
space regions, and $W_m$ is replaced with $W_{\{m\}}$, the probability of finding the system in the corresponding
regions $m$ at time moments $t_m$, with $t_m = m\tau$, in the limit $\tau \to 0$, with $N\tau = \text{const}$. For chaos in the simplest
objects, 1D maps, $K$ is equal to the Lyapunov exponent $\lambda > 0$.²⁷ For systems of higher dimensionality,
which are characterized by several Lyapunov exponents $\lambda$, the Kolmogorov entropy is equal to the
phase-space average of the sum of all positive $\lambda$. These facts provide a much more practicable way of
(typically, numerical) calculation of the Kolmogorov entropy than the direct use of its definition.²⁸
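The Python sketch below illustrates the relation between the Kolmogorov entropy and the Lyapunov exponent. It estimates $\lambda$ for the logistic map $x \to rx(1 - x)$ at $r = 4$. This map is a standard example of a chaotic 1D map, chosen here as an assumption; it is not discussed in the text. For this map $\lambda = \ln 2$ is known analytically, so $K = \ln 2$ per iteration, i.e. one bit of information about the initial state is lost per step. The estimate averages $\ln|f'(x)|$ along one long trajectory. The starting point and the step counts are arbitrary.

```python
import math

def lyapunov_logistic(r: float, x0: float, n_steps: int, n_transient: int = 1000) -> float:
    """Estimate the Lyapunov exponent of the 1D logistic map x -> r x (1 - x)
    as the trajectory average of ln|f'(x)| = ln|r (1 - 2x)|."""
    x = x0
    for _ in range(n_transient):          # discard the initial transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n_steps):
        total += math.log(abs(r * (1 - 2 * x)))
        x = r * x * (1 - x)
    return total / n_steps

lam = lyapunov_logistic(r=4.0, x0=0.1234, n_steps=200_000)
print(f"lambda ~ {lam:.4f} nats per step, ln 2 = {math.log(2):.4f}")
print(f"i.e. about {lam / math.log(2):.3f} bit of initial-state information lost per step")
```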


2.3. Maxwell’s Demon, information, and computation 


Before proceeding to other statistical distributions, I would like to make a detour to address one
more popular concern about Eq. (24): the direct relation between entropy and information. Some
physicists are still uneasy with entropy being nothing else than the (deficit of) information, though to the
best of my knowledge, nobody has yet been able to suggest any experimentally verifiable difference
between these two notions. Let me give one example of their direct relation.²⁹ Consider a cylinder
containing just one molecule (considered as a point particle), and separated into two halves by a
movable partition with a door that may be opened and closed at will, at no energy cost; see Fig. 4a. If
the door is open and the system is in thermodynamic equilibrium, we do not know on which side of the
partition the molecule is. Here the disorder, i.e. the entropy, has the largest value, and there is no way to
get, from a large ensemble of such systems in equilibrium, any useful mechanical energy.


25 The reader is strongly urged to solve Problem 2, whose task is to do a similar calculation for another key
(“two-level”) physical system, and compare the results.

26 See, e.g., CM Chapter 9 and literature therein.

27 For the definition of $\lambda$, see, e.g., CM Eq. (9.9).

28 For more discussion, see, e.g., either Sec. 6.2 of the monograph H. G. Schuster and W. Just, Deterministic
Chaos, 4th ed., Wiley-VCH, 2005, or the monograph by Arnold and Avez, cited in Sec. 1.

29 This system is frequently called the Szilard engine, after L. Szilard who published its detailed theoretical
discussion in 1929, but is essentially a straightforward extension of the thought experiment suggested by J.
Maxwell as early as 1867.
Fig. 2.4. The Szilard engine: a cylinder with a single molecule and a movable partition: (a) before
and (b) after closing the door, and (c) after opening the door at the end of the expansion stage.


Now, let us consider that we know (as instructed by, in Lord Kelvin’s formulation, an omniscient
Maxwell’s Demon) on which side of the partition the molecule is currently located. Then we may close
the door, trapping the molecule, so that its repeated impacts on the partition create, on average, a
pressure force $\mathcal{F}$ directed toward the empty part of the volume (in Fig. 4b, the right one). Now we can
get from the molecule some mechanical work, say by allowing the force $\mathcal{F}$ to move the partition to the
right, and picking up the resulting mechanical energy by some deterministic (zero-entropy) external
mechanism. After the partition has been moved to the right end of the volume, we can open the door
again (Fig. 4c), equalizing the molecule’s average pressure on both sides of the partition, and then
slowly move the partition back to the middle of the volume without its resistance, i.e. without doing
any substantial work. With the continuing help of Maxwell’s Demon, we can repeat the cycle again
and again, and hence make the system perform unlimited mechanical work, fed “only” by the
molecule’s thermal motion and the information about its position. This would implement the perpetual
motion machine of the 2nd kind (see Sec. 1.6). The fact that such heat engines do not exist means that
getting any new information, at non-zero temperature (i.e. at a substantial thermal agitation of particles),
has a non-zero energy cost.


In order to evaluate this cost, let us calculate the maximum work per cycle that can be made by
the Szilard engine (Fig. 4), assuming that it is constantly in thermal equilibrium with a heat bath of
temperature $T$. Formula (21) tells us that the information supplied by the Demon (on which exactly half
of the volume contains the molecule) is exactly one bit, $I(2) = 1$. According to Eq. (24), this means that
by getting this information we are changing the entropy of our system by

$$ \Delta S_I = -\ln 2. \qquad (2.50) $$


Now, it would be a mistake to plug this (negative) entropy change into Eq. (1.19). First, that relation is
only valid for slow, reversible processes. Moreover (and more importantly), this equation, as well as its
irreversible version (1.41), is only valid for a fixed statistical ensemble. The change $\Delta S_I$ does not belong
to this category. It may be formally described by the change of the statistical ensemble: from the one
consisting of all similar systems (experiments) with an unknown location of the molecule, to a new
ensemble consisting of the systems with the molecule in its certain (in Fig. 4, left) half.³⁰


30 This procedure of the statistical ensemble re-definition is the central point of the connection between physics
and information theory, and is crucial in particular for any (or rather any meaningful :-) discussion of
measurements in quantum mechanics; see, e.g., QM Secs. 2.5 and 10.1.

Now let us consider a slow expansion of the “gas” after the door has been closed. At this stage,
we do not need the Demon’s help any longer (i.e. the statistical ensemble may be fixed), and can indeed
use the relation (1.19). At the assumed isothermal conditions ($T = \text{const}$), this relation may be integrated
over the whole expansion process, getting $\Delta Q = T\Delta S$. At the final position shown in Fig. 4c, the
system’s entropy should be the same as initially, i.e. before the door had been closed, because we again
do not know where in the volume the molecule is. This means that the entropy was replenished, during
the reversible expansion, from the heat bath, by $\Delta S = -\Delta S_I = +\ln 2$, so that $\Delta Q = T\Delta S = T\ln 2$. Since by
the end of the whole cycle the internal energy $E$ of the system is the same as before, all this heat could
have gone into the mechanical energy obtained during the expansion. Thus the maximum obtained work
per cycle (i.e. for each obtained information bit) is $T\ln 2$ ($k_BT_K\ln 2$ in SI units), about $3\times10^{-21}$ J at
room temperature. This is exactly the energy cost of getting one bit of new information about a system at
temperature $T$. The smallness of that amount on the everyday human scale has left the Szilard engine an
academic theoretical exercise for almost a century. However, recently several such devices, of various
physical nature, were implemented experimentally (with the Demon’s role played by an instrument
measuring the position of the particle without a substantial effect on its motion), and the relation
$\Delta Q = T\ln 2$ was proved, with a gradually increasing precision.³¹
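Both numbers in this paragraph can be reproduced with a short Python sketch: the work $T\ln 2$ and its SI value. The sketch assumes that the time-averaged pressure of the single trapped molecule obeys the ideal-gas law $PV = T$, with the number of particles equal to 1. It integrates $P\,dV$ numerically over the isothermal expansion from $V/2$ to $V$. It then evaluates $k_BT_K\ln 2$ at an assumed room temperature of 300 K, with the exact SI value of $k_B$. The function name is hypothetical.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K (exact in the 2019 SI)

def isothermal_work(t_energy: float, v_initial: float, v_final: float, steps: int = 100_000) -> float:
    """Work done by a one-molecule 'gas' with P = T / V (T in energy units)
    during a slow isothermal expansion: W = integral of P dV (midpoint rule)."""
    dv = (v_final - v_initial) / steps
    return sum(t_energy / (v_initial + (i + 0.5) * dv) * dv for i in range(steps))

# Expansion from V/2 to V (with V = 1 and T = 1): the result should be T ln 2
w = isothermal_work(t_energy=1.0, v_initial=0.5, v_final=1.0)
print(w, math.log(2))

# The same bound in SI units at an assumed room temperature of 300 K
t_kelvin = 300.0
print(f"k_B T ln 2 = {K_B * t_kelvin * math.log(2):.3e} J per bit")
```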


Actually, discussion of another issue closely related to Maxwell’s Demon, namely of energy
consumption in numerical calculations, was started earlier, in the 1960s. It was motivated by the
exponential (Moore’s-law) progress of digital integrated circuits, which has led, in particular, to a
fast reduction of the energy $\Delta E$ “spent” (turned into heat) per binary logic operation. In the recent
generations of semiconductor digital integrated circuits, the typical $\Delta E$ is still above $10^{-17}$ J, i.e. it still
exceeds the room-temperature value of $T\ln 2 \approx 3\times10^{-21}$ J by several orders of magnitude. Still, some
engineers believe that thermodynamics imposes this important lower limit on $\Delta E$ and hence presents an
insurmountable obstacle to the future progress of computation. Unfortunately, in the 2000s this delusion
resulted in a substantial and unjustified shift of electron device research resources toward using “non-charge
degrees of freedom” such as spin (as if they do not obey the general laws of statistical physics!),
so that the issue deserves at least a brief discussion.


Let me believe that the reader of these notes understands that, in contrast to naive popular talk, computers do not create any new information; all they can do is reshape (“process”) the input information, losing most of it on the go. Indeed, any digital computation algorithm may be decomposed into simple, binary logical operations, each of them performed by a circuit called the logic gate. Some of these gates (e.g., the logical NOT performed by inverters, as well as memory READ and WRITE operations) do not change the amount of information in the computer. On the other hand, such information-irreversible logic gates as two-input NAND (or NOR, or XOR, etc.) erase one bit at each operation, because they turn two input bits into one output bit; see Fig. 5a.


In 1961, Rolf Landauer argued that each logic operation should turn into heat at least the energy

$$\Delta E_{\min} = T\ln 2 = k_B T_K \ln 2. \tag{2.51}$$
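A quick numerical evaluation of Eq. (51) may be helpful here. The Python sketch below assumes that "room temperature" means T_K = 300 K and uses the exact SI value of k_B; the value 10⁻¹⁷ J it compares against is an assumed present-day energy per operation (the order of magnitude cited above for semiconductor circuits), not a measured input.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)

def landauer_limit(temperature_kelvin: float) -> float:
    """Minimal heat per erased bit, Eq. (2.51): k_B * T_K * ln 2, in joules."""
    return K_B * temperature_kelvin * math.log(2)

t_room = 300.0                  # assumed room temperature, K
e_min = landauer_limit(t_room)  # energy bound per bit, J
e_cmos = 1e-17                  # assumed typical energy per logic operation today, J

print(f"T ln 2 at {t_room:.0f} K: {e_min:.3e} J")
print(f"ratio (1e-17 J) / (T ln 2): {e_cmos / e_min:.1e}")
```

The ratio comes out at a few thousand, i.e. about three and a half orders of magnitude.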


This result may be illustrated with the Szilard engine (Fig. 4), operated in a reversed cycle. At the first stage, with the door closed, it uses external mechanical work ΔE = T ln 2 to reduce the volume in which the molecule is confined, from V to V/2, pumping heat ΔQ = ΔE into the heat bath. To model a logically irreversible logic gate, let us now open the door in the partition, and thus lose one bit of information about the molecule’s position. Then we will never get the work T ln 2 back, because moving the partition back to the right, with the door open, takes place at zero average pressure. Hence, Eq. (51) gives a fundamental limit for energy loss (per bit) in logically irreversible computation.

31 See, for example, A. Bérut et al., Nature 483, 187 (2012); J. Koski et al., PNAS USA 111, 13786 (2014); Y. Jun et al., Phys. Rev. Lett. 113, 190601 (2014); J. Peterson et al., Proc. Roy. Soc. A 472, 20150813 (2016).


Fig. 2.5. Simple examples of (a) irreversible and (b) potentially reversible logic circuits. Each rectangle denotes a circuit storing one bit of information.


However, in 1973 Charles Bennett came up with convincing arguments that it is possible to avoid such energy loss by using only operations that are reversible not only physically, but also logically.32 For that, one has to avoid any loss of information, i.e. any erasure of intermediate results, for example in the way shown in Fig. 5b.33 At the end of all calculations, after the result has been copied into memory, the intermediate results may be “rolled back” through reversible gates to be eventually merged into a copy of the input data, again without erasing a single bit. The minimal energy dissipation in such a reversible calculation tends to zero as the operation speed is decreased, so that the average energy loss per bit may be less than the perceived “fundamental thermodynamic limit” (51). The price to pay for this ultralow dissipation is a very high complexity of the hardware necessary for the storage of all intermediate results. However, by using irreversible gates sparsely, it may be possible to reduce this complexity dramatically, so that in the future such mostly reversible computation may be able to reduce energy consumption in practical digital electronics.34
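The difference between logically irreversible and reversible gates can be made concrete with a small Python sketch. It uses the Toffoli (controlled-controlled-NOT) gate, a standard example of a logically reversible gate that is brought in here only for illustration and is not necessarily the circuit of Fig. 5b. Its three-bit input-output map is a permutation, so nothing is erased, yet with the third input set to 1 it reproduces NAND, at the price of carrying the extra bits along. NAND itself maps four equiprobable input combinations onto two outputs, so for uniformly random inputs the entropy drops from 2 bits to about 0.81 bits.

```python
import math
from collections import Counter
from itertools import product

def nand(a: int, b: int) -> int:
    return 1 - (a & b)

def toffoli(a: int, b: int, c: int) -> tuple:
    """Controlled-controlled-NOT: flips c only when a = b = 1; logically reversible."""
    return a, b, c ^ (a & b)

def shannon_entropy_bits(counts: Counter) -> float:
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

# Irreversible gate: 4 equiprobable inputs (2 bits of entropy) -> only 2 possible outputs.
nand_out = Counter(nand(a, b) for a, b in product((0, 1), repeat=2))
h_in = 2.0
h_out = shannon_entropy_bits(nand_out)

# Reversible gate: 8 inputs -> 8 distinct outputs (a bijection), and with c = 1
# its third output bit equals NAND(a, b).
toffoli_outputs = {toffoli(a, b, c) for a, b, c in product((0, 1), repeat=3)}
is_bijective = len(toffoli_outputs) == 8
computes_nand = all(toffoli(a, b, 1)[2] == nand(a, b) for a, b in product((0, 1), repeat=2))

print(f"NAND: input entropy {h_in:.3f} bits, output entropy {h_out:.3f} bits")
print("Toffoli is a bijection:", is_bijective, "| embeds NAND:", computes_nand)
```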


Before we leave Maxwell’s Demon behind, let me use it to revisit, one more time, the relation between the reversibility of the classical and quantum mechanics of Hamiltonian systems and the irreversibility possible in thermodynamics and statistical physics. In the gedanken experiment shown in Fig. 4, the laws of mechanics governing the motion of the molecule are reversible at all times. Still, during the partition’s motion to the right, driven by molecular impacts, the entropy grows, because the molecule picks up the heat ΔQ > 0, and hence the entropy ΔS = ΔQ/T > 0, from the heat bath. The physical mechanism of this irreversible entropy (read: disorder) growth is the interaction of the molecule with uncontrollable components of the heat bath, and the resulting loss of information about the motion of the molecule. Philosophically, such emergence of irreversibility in large systems is a strong argument against reductionism, a naive belief that, knowing the exact laws of Nature at the lowest, most fundamental level of its complexity, we can readily understand all phenomena on the higher levels of its organization. In reality, the macroscopic irreversibility of large systems is a good example35 of a new law (in this case, the 2nd law of thermodynamics) that becomes relevant on a substantially new, higher level of complexity, without defying the lower-level laws. Without such new laws, very little of the higher-level organization of Nature may be understood.

32 C. Bennett, IBM J. Res. Devel. 17, 525 (1973); see also C. Bennett, Int. J. Theor. Phys. 21, 905 (1982).

33 For that, all gates have to be physically reversible, with no static power consumption. Such logic devices do exist, though they are still not very practicable; see, e.g., K. Likharev, Int. J. Theor. Phys. 21, 311 (1982). (Another reason for citing, rather reluctantly, my own paper is that it also gave a constructive proof that reversible computation may also beat the perceived “fundamental quantum limit”, ΔEΔt > ħ, where Δt is the time of the binary logic operation.)

34 Many currently explored schemes of quantum computing are also reversible; see, e.g., QM Sec. 8.5 and references therein.


2.4. Canonical ensemble and the Gibbs distribution 


As was shown in Sec. 2 (see also a few problems from the list given at the end of this chapter), the microcanonical distribution may be directly used for solving some simple problems. However, its further development, also due to J. Gibbs, turns out to be much more convenient for calculations.


Let us consider a statistical ensemble of macroscopically similar systems, each in thermal 
equilibrium with a heat bath of the same temperature T (Fig. 6a). Such an ensemble is called canonical. 


Fig. 2.6. (a) A system in a heat bath (i.e. a canonical ensemble’s member) and (b) the energy spectrum of the composite system (including the heat bath).


It is intuitively evident that if the heat bath is sufficiently large, any thermodynamic variables characterizing the system under study should not depend on the heat bath’s environment. In particular, we may assume that the heat bath is thermally insulated, so that the total energy E_Σ of the composite system, consisting of the system of our interest plus the heat bath, does not change in time. For example, if the system of our interest is in a certain (say, m-th) quantum state, then the sum

$$E_\Sigma = E_m + E_{\rm HB} \tag{2.52}$$


is time-independent. Now let us partition the considered canonical ensemble of such systems into much smaller sub-ensembles, each being a microcanonical ensemble of composite systems whose total, time-independent energies E_Σ are the same (as was discussed in Sec. 2, within a certain small energy interval ΔE_Σ ≪ E_Σ); see Fig. 6b. Due to the very large size of each heat bath in comparison with that of the system under study, the heat bath’s density of states g_HB is very high, and ΔE_Σ may be selected so that

$$\frac{1}{g_{\rm HB}} \ll \Delta E_\Sigma \ll \left|E_m - E_{m'}\right| \ll E_{\rm HB}, \tag{2.53}$$

where m and m’ are any states of the system of our interest.


35 Another famous example is Charles Darwin’s theory of biological evolution. 
According to the microcanonical distribution, the probabilities to find the composite system, within each of these microcanonical sub-ensembles, in any state are equal. Still, the heat bath energies E_HB = E_Σ − E_m (Fig. 6b) of the members of this sub-ensemble may be different, due to the difference in E_m. The probability W(E_m) to find the system of our interest (within the selected sub-ensemble) in a state with energy E_m is proportional to the number ΔM of the corresponding heat baths in the sub-ensemble. As Fig. 6b shows, in this case we may write ΔM = g_HB(E_HB)ΔE_Σ. As a result, within the microcanonical sub-ensemble with the total energy E_Σ,

$$W_m \propto \Delta M = g_{\rm HB}(E_{\rm HB})\,\Delta E_\Sigma = g_{\rm HB}(E_\Sigma - E_m)\,\Delta E_\Sigma. \tag{2.54}$$


Let us simplify this expression further, using the Taylor expansion with respect to the relatively small E_m ≪ E_Σ. However, here we should be careful. As we have seen in Sec. 2, the density of states of a large system is an extremely fast-growing function of energy, so that if we applied the Taylor expansion directly to Eq. (54), the Taylor series would converge only for very small E_m. A much broader applicability range may be obtained by taking logarithms of both sides of Eq. (54) first:

$$\ln W_m = \text{const} + \ln\left[g_{\rm HB}(E_\Sigma - E_m)\right] + \ln\Delta E_\Sigma = \text{const} + S_{\rm HB}(E_\Sigma - E_m), \tag{2.55}$$

where the last equality results from the application of Eq. (36) to the heat bath, and ln ΔE_Σ has been incorporated into the (inconsequential) constant. Now we can Taylor-expand the (much smoother) function of energy on the right-hand side, and limit ourselves to the two leading terms of the series:


$$\ln W_m \approx \text{const} + S_{\rm HB}\Big|_{E_m = 0} - \frac{dS_{\rm HB}}{dE_{\rm HB}}\Big|_{E_m = 0}E_m. \tag{2.56}$$

But according to Eq. (1.9), the derivative participating in this expression is nothing else than the reciprocal temperature of the heat bath, which (due to the large bath size) does not depend on whether E_m is equal to zero or not. Since our system of interest is in thermal equilibrium with the bath, this is also the temperature T of the system; see Eq. (1.8). Hence Eq. (56) is merely

$$\ln W_m = \text{const} - \frac{E_m}{T}. \tag{2.57}$$


This equality describes a substantial decrease of W_m as E_m is increased by ~T, and hence our linear approximation (56) is virtually exact as soon as E_HB is much larger than T, a condition that is rather easy to satisfy because, as we have seen in Sec. 2, the average energy per degree of freedom of the heat bath is also of the order of T, so that its total energy is much larger because of its much larger size.

Now we should be careful again, because so far Eq. (57) has only been derived for a sub-ensemble with a certain fixed E_Σ. However, since the second term on the right-hand side of Eq. (57) includes only E_m and T, which are independent of E_Σ, this relation, perhaps with different constant terms, is valid for all sub-ensembles of the canonical ensemble, and hence for that ensemble as a whole. Hence for the total probability to find our system of interest in a state with energy E_m, in the canonical ensemble with temperature T, we can write


$$W_m = \text{const}\times\exp\left\{-\frac{E_m}{T}\right\} = \frac{1}{Z}\exp\left\{-\frac{E_m}{T}\right\}. \tag{2.58}$$

This is the famous Gibbs distribution,36 sometimes called the “canonical distribution”, which is arguably the summit of statistical physics,37 because it may be used for a straightforward (or at least conceptually straightforward :-) calculation of all statistical and thermodynamic variables of a vast range of systems.
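The derivation of Eqs. (54)-(58) can be checked numerically on a toy model. In the Python sketch below, which is our own illustration rather than part of the original argument, the heat bath is assumed to be a set of N_BATH identical quantum oscillators sharing Q_TOTAL energy quanta (an "Einstein solid"), so that its number of states is a binomial coefficient, and the system of interest is one more oscillator with E_m = mħω. The probabilities obtained directly from state counting, Eq. (54), are compared with the Gibbs form (58), with the bath temperature taken from Eq. (1.9) as a finite difference of ln g_HB. The bath size and energy are arbitrary choices.

```python
import math

def log_g_bath(q: int, n_osc: int) -> float:
    """ln of the number of ways to distribute q quanta among n_osc oscillators, ln C(q + n_osc - 1, q)."""
    return math.lgamma(q + n_osc) - math.lgamma(q + 1) - math.lgamma(n_osc)

# Hypothetical toy parameters (not from the text).
N_BATH = 10_000    # number of oscillators in the heat bath
Q_TOTAL = 20_000   # quanta shared by system + bath, i.e. E_Sigma / (hbar*omega)
M_MAX = 10         # system levels E_m = m * hbar*omega inspected, m = 0..M_MAX

# Eq. (2.54): W_m is proportional to g_HB(E_Sigma - E_m); normalize over the levels kept.
log_w = [log_g_bath(Q_TOTAL - m, N_BATH) for m in range(M_MAX + 1)]
ref = max(log_w)                                   # subtract the maximum to avoid overflow
unnormalized = [math.exp(lw - ref) for lw in log_w]
total = sum(unnormalized)
w_exact = [u / total for u in unnormalized]

# Bath temperature via Eq. (1.9), 1/T = dS_HB/dE with S_HB = ln g_HB + const,
# taken as a one-quantum finite difference (beta in units of 1/(hbar*omega)).
beta = log_g_bath(Q_TOTAL, N_BATH) - log_g_bath(Q_TOTAL - 1, N_BATH)

# Gibbs distribution (2.58) restricted to the same levels.
boltzmann = [math.exp(-beta * m) for m in range(M_MAX + 1)]
z = sum(boltzmann)
w_gibbs = [b / z for b in boltzmann]

print(f"beta * hbar*omega = {beta:.5f}")
for m in range(4):
    print(f"m={m}: from state counting {w_exact[m]:.5f}   Gibbs {w_gibbs[m]:.5f}")
```

With these parameters the two distributions agree to better than 10⁻³; the small residual mismatch grows with E_m, in line with the requirement E_m ≪ E_Σ used in the derivation.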


Before illustrating this, let us first calculate the coefficient Z participating in Eq. (58) for the general case. Requiring, per Eq. (4), the sum of all W_m to be equal to 1, we get

$$Z = \sum_m \exp\left\{-\frac{E_m}{T}\right\}, \tag{2.59}$$

where the summation is formally extended to all quantum states of the system, though in practical calculations, the sum may be truncated to include only the states that are noticeably occupied. The apparently humble normalization coefficient Z turns out to be so important for applications that it has a special name, or actually two names: either the statistical sum or the partition function of the system. To appreciate the importance of Z, let us use the general expression (29) for entropy to calculate it for the particular case of the canonical ensemble, i.e. the Gibbs distribution (58) of the probabilities W_m:


$$S = -\sum_m W_m\ln W_m = \frac{\ln Z}{Z}\sum_m\exp\left\{-\frac{E_m}{T}\right\} + \frac{1}{ZT}\sum_m E_m\exp\left\{-\frac{E_m}{T}\right\}. \tag{2.60}$$

On the other hand, according to the general rule (7), the thermodynamic (i.e. ensemble-averaged) value E of the internal energy of the system is

$$E = \sum_m W_m E_m = \frac{1}{Z}\sum_m E_m\exp\left\{-\frac{E_m}{T}\right\}, \tag{2.61a}$$

so that the second term on the right-hand side of Eq. (60) is just E/T, while the first term equals ln Z, due to Eq. (59). (By the way, using the notion of reciprocal temperature β ≡ 1/T, with the account of Eq. (59), Eq. (61a) may be also rewritten as

$$E = -\frac{\partial(\ln Z)}{\partial\beta}. \tag{2.61b}$$

This formula is very convenient for calculations if our prime interest is the average internal energy E rather than F or W_m.) With these substitutions, Eq. (60) yields a very simple relation between the statistical sum and the entropy of the system:

$$S = \frac{E}{T} + \ln Z. \tag{2.62}$$


Now using Eq. (1.33), we see that Eq. (62) gives a straightforward way to calculate the free 
energy F of the system from nothing other than its statistical sum (and temperature): 


$$F = E - TS = -T\ln Z. \tag{2.63}$$

36 The temperature dependence of the type exp{−const/T}, especially when showing up in rates of certain events, e.g., chemical reactions, is also frequently called the Arrhenius law, after the chemist S. Arrhenius, who noticed this law in numerous experimental data. In all cases I am aware of, the Gibbs distribution is the underlying reason for the Arrhenius law. (We will see several examples of that later in this course.)

37 This is the opinion of many physicists, including Richard Feynman, who climbs on this “summit” already on the first page of his brilliant book Statistical Mechanics, CRC Press, 1998. (This is a collection of lectures on a few diverse, mostly advanced topics of statistical physics, rather than a systematic course, so that it can hardly be used as the first textbook on the subject. However, I can highly recommend its first chapter to all my readers.)


The relations (61b) and (63) play the key role in the connection of statistics to thermodynamics, 
because they enable the calculation, from Z alone, of the thermodynamic potentials of the system in 
equilibrium, and hence of all other variables of interest, using the general thermodynamic relations — see 
especially the circular diagram shown in Fig. 1.6, and its discussion in Sec. 1.4. Let me only note that to 
calculate the pressure P, e.g., from the second of Eqs. (1.35), we would need to know the explicit 
dependence of F, and hence of the statistical sum Z on the system’s volume V. This would require the 
calculation, by appropriate methods of either classical or quantum mechanics, of the dependence of the 
eigenenergies E_m on the volume. Numerous examples of such calculations will be given later in the
course. 
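The chain of relations (59)-(63) is easy to verify numerically for any spectrum. In the Python sketch below, the five energy levels and the temperature are arbitrary, hypothetical numbers (with k_B = 1, as in the text). The sketch checks that Eq. (61b), taken as a finite-difference derivative, reproduces the direct average (61a), that Eq. (62) reproduces the entropy −Σ W_m ln W_m of Eq. (60), and that −T ln Z coincides with E − TS.

```python
import math

def partition_function(levels, temperature):
    """Statistical sum Z, Eq. (2.59), with k_B = 1 (energies in temperature units)."""
    return sum(math.exp(-e / temperature) for e in levels)

def ln_z_of_beta(levels, beta):
    """ln Z as a function of the reciprocal temperature beta = 1/T."""
    return math.log(sum(math.exp(-beta * e) for e in levels))

levels = [0.0, 0.7, 1.1, 2.5, 3.0]  # hypothetical spectrum E_m, arbitrary units
T = 0.9                             # hypothetical temperature, same units

Z = partition_function(levels, T)
W = [math.exp(-e / T) / Z for e in levels]                     # Eq. (2.58)

E_direct = sum(w * e for w, e in zip(W, levels))               # Eq. (2.61a)
beta, h = 1.0 / T, 1e-6
E_from_lnZ = -(ln_z_of_beta(levels, beta + h)
               - ln_z_of_beta(levels, beta - h)) / (2 * h)     # Eq. (2.61b), central difference

S_direct = -sum(w * math.log(w) for w in W)                    # first form in Eq. (2.60)
S_from_Z = E_direct / T + math.log(Z)                          # Eq. (2.62)
F = -T * math.log(Z)                                           # Eq. (2.63)

print(f"E: {E_direct:.8f} (direct)  vs {E_from_lnZ:.8f} (from ln Z)")
print(f"S: {S_direct:.8f} (direct)  vs {S_from_Z:.8f} (E/T + ln Z)")
print(f"F = -T ln Z = {F:.8f};  E - TS = {E_direct - T * S_direct:.8f}")
```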


Before proceeding to the first such examples, let us notice that Eqs. (59) and (63) may be readily combined to give an elegant equality,

$$\exp\left\{-\frac{F}{T}\right\} = \sum_m\exp\left\{-\frac{E_m}{T}\right\}. \tag{2.64}$$

This equality, together with Eq. (59), enables us to rewrite the Gibbs distribution (58) in another form:

$$W_m = \exp\left\{\frac{F - E_m}{T}\right\}, \tag{2.65}$$

more convenient for some applications. In particular, this expression shows that since all probabilities W_m are below 1, F is always lower than the lowest energy level. Also, Eq. (65) clearly shows that the probabilities W_m do not depend on the energy reference, i.e. on an arbitrary constant added to all E_m, and hence to E and F.
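The two properties of Eq. (65) just mentioned can also be seen numerically. In the following Python sketch (again with a hypothetical spectrum and k_B = 1), adding an arbitrary constant to all levels leaves every W_m unchanged and shifts F by the same constant, while F stays below the lowest level.

```python
import math

def free_energy(levels, temperature):
    """F = -T ln Z, Eqs. (2.59) and (2.63), with k_B = 1."""
    return -temperature * math.log(sum(math.exp(-e / temperature) for e in levels))

def probabilities_via_f(levels, temperature):
    """Gibbs distribution in the form (2.65): W_m = exp{(F - E_m)/T}."""
    f = free_energy(levels, temperature)
    return [math.exp((f - e) / temperature) for e in levels]

levels = [0.0, 0.7, 1.1, 2.5, 3.0]  # hypothetical spectrum
T = 0.9
shift = 5.0                         # arbitrary constant added to every level
shifted = [e + shift for e in levels]

W = probabilities_via_f(levels, T)
W_shifted = probabilities_via_f(shifted, T)
F, F_shifted = free_energy(levels, T), free_energy(shifted, T)

print("sum of W_m:", sum(W))
print("F below the lowest level:", F < min(levels))
print("W_m unchanged by the shift:", all(abs(a - b) < 1e-12 for a, b in zip(W, W_shifted)))
print("F shifted by the same constant:", abs((F_shifted - F) - shift) < 1e-9)
```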


2.5. Harmonic oscillator statistics 


The last property may be immediately used in our first example of the application of the Gibbs distribution to a particular, but very important system: the harmonic oscillator, for a much more general case than was done in Sec. 2, namely for an arbitrary relation between T and ħω.38 Let us consider a canonical ensemble of similar oscillators, each in contact with a heat bath of temperature T. Selecting the ground-state energy ħω/2 for the origin of E, the oscillator eigenenergies (38) become E_m = mħω (with m = 0, 1, ...), so that the Gibbs distribution (58) for probabilities of these states is

$$W_m = \frac{1}{Z}\exp\left\{-\frac{E_m}{T}\right\} = \frac{1}{Z}\exp\left\{-\frac{m\hbar\omega}{T}\right\}, \tag{2.66}$$

with the following statistical sum:

$$Z = \sum_{m=0}^{\infty}\exp\left\{-\frac{m\hbar\omega}{T}\right\} = \sum_{m=0}^{\infty}\lambda^m, \quad\text{where}\quad \lambda \equiv \exp\left\{-\frac{\hbar\omega}{T}\right\} < 1. \tag{2.67}$$

This is just the well-known infinite geometric progression (the “geometric series”),39 with the sum


38 The task of making a similar (and even simpler) calculation for another key quantum-mechanical object, the 
two-level system, is left for the reader’s exercise. 
39 See, e.g., MA Eq. (2.8b). 
$$Z = \frac{1}{1 - \lambda} = \frac{1}{1 - e^{-\hbar\omega/T}}, \tag{2.68}$$

so that Eq. (66) yields

$$W_m = \left(1 - e^{-\hbar\omega/T}\right)e^{-m\hbar\omega/T}. \tag{2.69}$$


Figure 7a shows W_m for several lower energy levels, as functions of temperature, or rather of the T/ħω ratio. The plots show that the probability to find the oscillator in each particular state (except for the ground one, with m = 0) vanishes in both the low- and high-temperature limits, and reaches its maximum value W_m ~ 0.3/m at T ~ mħω, so that the contribution mħωW_m of each excited level to the average oscillator energy E is always smaller than ħω.
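The statement about the maxima of W_m(T) follows from Eq. (69) in closed form. With x ≡ e^{−ħω/T}, maximizing (1 − x)x^m gives x = m/(m + 1), i.e. T/ħω = 1/ln(1 + 1/m) ≈ m + 1/2, and W_max = (m/(m + 1))^m/(m + 1), which equals 0.25 for m = 1 and tends to 1/(e·m) ≈ 0.37/m at m ≫ 1. The short Python sketch below evaluates these peaks, with temperature measured in units of ħω.

```python
import math

def occupation(m: int, t: float) -> float:
    """W_m from Eq. (2.69), with t = T / (hbar*omega)."""
    x = math.exp(-1.0 / t)
    return (1.0 - x) * x**m

for m in (1, 2, 3, 5, 10):
    # d/dx[(1 - x) x^m] = 0 gives x = m/(m + 1), i.e. T/(hbar*omega) = 1/ln(1 + 1/m).
    t_peak = 1.0 / math.log(1.0 + 1.0 / m)
    w_peak = occupation(m, t_peak)
    print(f"m={m:2d}: peak at T/hw = {t_peak:.3f}, W_max = {w_peak:.4f}, "
          f"m*W_max = {m * w_peak:.3f}")
```

The last column, the peak contribution of level m to E in units of ħω, stays below 1/e for all m.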


Fig. 2.7. Statistical and thermodynamic parameters of a harmonic oscillator, as functions of temperature: (a) the probabilities W_m, and (b) the thermodynamic variables (including E/ħω and F/ħω), plotted versus T/ħω.


This average energy may be calculated in either of two ways: either using Eq. (61a) directly:

$$E = \sum_{m=0}^{\infty}E_m W_m = \left(1 - e^{-\hbar\omega/T}\right)\sum_{m=0}^{\infty}m\hbar\omega\,e^{-m\hbar\omega/T}, \tag{2.70}$$

or (more simply) using Eq. (61b), as

$$E = -\frac{\partial}{\partial\beta}\ln Z = \frac{\partial}{\partial\beta}\ln\left(1 - \exp\{-\beta\hbar\omega\}\right), \quad\text{where}\quad \beta = \frac{1}{T}. \tag{2.71}$$

Both methods give (of course) the same result,40

$$E = \frac{\hbar\omega}{e^{\hbar\omega/T} - 1}, \tag{2.72}$$

which is valid for arbitrary temperature and plays a key role in many fundamental problems of physics. The red line in Fig. 7b shows this result as a function of the normalized temperature. At relatively low temperatures, T ≪ ħω, the oscillator is predominantly in its lowest (ground) state, and its energy (on top of the constant zero-point energy ħω/2, which was used in our calculation as the reference) is exponentially small: E ≈ ħω exp{−ħω/T} ≪ T, ħω. On the other hand, in the high-temperature limit, the energy tends to T. This is exactly the result (a particular case of the equipartition theorem) that was obtained in Sec. 2 from the microcanonical distribution. Please note how much simpler the calculation using the Gibbs distribution is, even for an arbitrary ratio T/ħω.

40 It was first obtained in 1924 by S. Bose and is sometimes called the Bose distribution, a particular case of the Bose-Einstein distribution to be discussed in Sec. 8 below.
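Eqs. (70)-(72) can be cross-checked numerically. The Python sketch below works in units of ħω (so that t ≡ T/ħω) and evaluates the truncated direct sum (70), a finite-difference version of the derivative (71), and the closed form (72); the truncation length and step size are arbitrary choices. At small t the results approach e^{−1/t}, and at large t they approach t − 1/2, so that E tends to T, as stated above.

```python
import math

def energy_closed_form(t: float) -> float:
    """Eq. (2.72) in units of hbar*omega, t = T/(hbar*omega); energy counted from hbar*omega/2."""
    return 1.0 / math.expm1(1.0 / t)

def energy_direct_sum(t: float, m_max: int = 20_000) -> float:
    """Eq. (2.70): (1 - x) * sum of m x^m, truncated at m_max (terms decay as exp(-m/t))."""
    x = math.exp(-1.0 / t)
    return (1.0 - x) * sum(m * x**m for m in range(m_max))

def energy_from_ln_z(t: float, h: float = 1e-6) -> float:
    """Eq. (2.71): E = d/db ln(1 - e^{-b}) with b = hbar*omega/T, by central difference."""
    def f(b: float) -> float:
        return math.log(-math.expm1(-b))
    b = 1.0 / t
    return (f(b + h) - f(b - h)) / (2 * h)

for t in (0.1, 0.5, 1.0, 5.0, 50.0):
    print(f"T/hw={t:5.1f}: (2.72) {energy_closed_form(t):.6f}  "
          f"(2.70) {energy_direct_sum(t):.6f}  (2.71) {energy_from_ln_z(t):.6f}")
```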


To complete the discussion of the thermodynamic properties of the harmonic oscillator, we can calculate its free energy using Eq. (63):

$$F = -T\ln Z = T\ln\left(1 - e^{-\hbar\omega/T}\right). \tag{2.73}$$

Now the entropy may be found from thermodynamics: either from the first of Eqs. (1.35), S = −(∂F/∂T)_V, or (even more easily) from Eq. (1.33): S = (E − F)/T. Both relations give, of course, the same result:

$$S = \frac{\hbar\omega}{T}\frac{1}{e^{\hbar\omega/T} - 1} - \ln\left(1 - e^{-\hbar\omega/T}\right). \tag{2.74}$$

Finally, since in the general case the dependence of the oscillator properties (essentially, of ω) on volume V is not specified, such variables as P, μ, G, W, and Ω are not defined, and what remains is to calculate the average heat capacity C per oscillator:

$$C = \frac{\partial E}{\partial T} = \left(\frac{\hbar\omega}{T}\right)^2\frac{e^{\hbar\omega/T}}{\left(e^{\hbar\omega/T} - 1\right)^2} = \left[\frac{\hbar\omega/2T}{\sinh(\hbar\omega/2T)}\right]^2. \tag{2.75}$$


The calculated thermodynamic variables are plotted in Fig. 7b. In the low-temperature limit (T ≪ ħω), they all tend to zero. On the other hand, in the high-temperature limit (T ≫ ħω), F → −T ln(T/ħω) → −∞, S → ln(T/ħω) → +∞, and C → 1 (in SI units, C → k_B). Note that the last limit is a direct corollary of the equipartition theorem: each of the two “half-degrees of freedom” of the oscillator gives, in the classical limit, the same contribution C = 1/2 to its heat capacity.
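The results (72)-(75) can be tabulated, and their internal consistency checked, with a few lines of Python. In the sketch below (units of ħω, k_B = 1), S is obtained from Eq. (1.33) and C from the sinh form of Eq. (75). Comparing them with finite-difference derivatives −∂F/∂T and ∂E/∂T is a simple test of the algebra, and the printed table shows all variables vanishing at T ≪ ħω and C approaching 1 at T ≫ ħω.

```python
import math

def oscillator_thermo(t: float):
    """Return (E, F, S, C) per oscillator from Eqs. (2.72)-(2.75); t = T/(hbar*omega), k_B = 1."""
    e = 1.0 / math.expm1(1.0 / t)              # E / (hbar*omega), Eq. (2.72)
    f = t * math.log(-math.expm1(-1.0 / t))    # F / (hbar*omega), Eq. (2.73)
    s = (e - f) / t                            # S via Eq. (1.33); algebraically equal to Eq. (2.74)
    y = 1.0 / (2.0 * t)
    c = (y / math.sinh(y)) ** 2                # C, Eq. (2.75)
    return e, f, s, c

for t in (0.05, 0.2, 1.0, 10.0, 100.0):
    e, f, s, c = oscillator_thermo(t)
    print(f"T/hw={t:6.2f}: E={e:9.4f}  F={f:9.4f}  S={s:7.4f}  C={c:.4f}")
```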


Now let us use Eq. (69) to discuss the statistics of the quantum oscillator described by Hamiltonian (46), in the coordinate representation. Again using the density matrix’s diagonality in thermodynamic equilibrium, we may use a relation similar to Eqs. (47) to calculate the probability density to find the oscillator at coordinate q:

$$w(q) = \sum_{m=0}^{\infty}W_m w_m(q) = \sum_{m=0}^{\infty}W_m\left|\psi_m(q)\right|^2 = \left(1 - e^{-\hbar\omega/T}\right)\sum_{m=0}^{\infty}e^{-m\hbar\omega/T}\psi_m^*(q)\psi_m(q), \tag{2.76}$$

where ψ_m(q) is the normalized eigenfunction of the m-th stationary state of the oscillator. Since each ψ_m(q) is proportional to the Hermite polynomial41 that requires at least m elementary functions for its representation, working out the sum in Eq. (76) is a bit tricky,42 but the final result is rather simple: w(q) is just a normalized Gaussian distribution (the “bell curve”),

$$w(q) = \frac{1}{(2\pi)^{1/2}\delta q}\exp\left\{-\frac{q^2}{2(\delta q)^2}\right\}, \tag{2.77}$$

with ⟨q⟩ = 0, and

$$(\delta q)^2 \equiv \langle q^2\rangle = \frac{\hbar}{2m\omega}\coth\frac{\hbar\omega}{2T}. \tag{2.78}$$

Since the function coth ξ tends to 1 at ξ → ∞, and diverges as 1/ξ at ξ → 0, Eq. (78) shows that the width δq of the coordinate distribution is nearly constant (and equal to that, (ħ/2mω)^{1/2}, of the ground-state wavefunction ψ_0) at T ≪ ħω, and grows as (T/mω²)^{1/2} ≡ (T/κ)^{1/2} at T/ħω → ∞.

41 See, e.g., QM Sec. 2.10.
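Eq. (78) can be checked without working out the sum (76) explicitly, using the standard matrix element ⟨m|q²|m⟩ = (ħ/2mω)(2m + 1) of the oscillator eigenstates (quoted here as a known quantum-mechanical result, not derived in this section) together with the probabilities (69). In units of ħ/2mω, the sum Σ W_m(2m + 1) should reproduce coth(ħω/2T), and hence the two limits discussed above; the Python sketch below does this numerically, with an arbitrary truncation of the sum.

```python
import math

def q2_from_levels(t: float, m_max: int = 50_000) -> float:
    """<q^2> in units of hbar/(2 m omega): sum over states of W_m * (2m + 1), truncated at m_max."""
    x = math.exp(-1.0 / t)  # t = T/(hbar*omega)
    return (1.0 - x) * sum((2 * m + 1) * x**m for m in range(m_max))

def q2_closed_form(t: float) -> float:
    """Eq. (2.78) in the same units: coth(hbar*omega / 2T)."""
    return 1.0 / math.tanh(1.0 / (2.0 * t))

for t in (0.1, 1.0, 10.0):
    print(f"T/hw={t:5.1f}: sum {q2_from_levels(t):.6f}  coth {q2_closed_form(t):.6f}  "
          f"classical 2T/hw {2 * t:.1f}")
```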


As a sanity check, we may use Eq. (78) to write the following expression,

$$U \equiv \left\langle\frac{\kappa q^2}{2}\right\rangle = \frac{\hbar\omega}{4}\coth\frac{\hbar\omega}{2T} \to \begin{cases}\hbar\omega/4, & \text{for } T \ll \hbar\omega,\\ T/2, & \text{for } \hbar\omega \ll T,\end{cases} \tag{2.79}$$

for the average potential energy of the oscillator. To comprehend this result, let us recall that Eq. (72) for the average full energy E was obtained by counting it from the ground-state energy ħω/2 of the oscillator. If we add this reference energy to that result, we get

$$E = \frac{\hbar\omega}{e^{\hbar\omega/T} - 1} + \frac{\hbar\omega}{2} \equiv \frac{\hbar\omega}{2}\coth\frac{\hbar\omega}{2T}. \tag{2.80}$$

We see that for arbitrary temperature, U = E/2, as was already discussed in Sec. 2. This means that the average kinetic energy, equal to E − U, is also the same:43

$$\left\langle\frac{p^2}{2m}\right\rangle = E - U = \frac{E}{2} = \frac{\hbar\omega}{4}\coth\frac{\hbar\omega}{2T}. \tag{2.81}$$

In the classical limit T ≫ ħω, both energies equal T/2, reproducing the equipartition theorem result (48).


2.6. Two important applications 


The results of the previous section, especially Eq. (72), have innumerable applications in physics 
and related disciplines, but here I have time for a brief discussion of only two of them. 


(i) Blackbody radiation. Let us consider a free-space volume V limited by non-absorbing (i.e. ideally reflecting) walls. Electrodynamics tells us44 that the electromagnetic field in such a “cavity” may be represented as a sum of “modes” with the time evolution similar to that of the usual harmonic oscillator. If the volume V is large enough,45 the number of these modes within a small range dk of the wavevector magnitude k is

$$dN = \frac{gV}{(2\pi)^3}d^3k = \frac{gV}{(2\pi)^3}4\pi k^2\,dk, \tag{2.82}$$

where for electromagnetic waves, the degeneracy factor g is equal to 2, due to the two independent (e.g., linear) polarizations of waves with the same wave vector k. With the linear, isotropic dispersion relation for waves in vacuum, k = ω/c, Eq. (82) yields

$$dN = \frac{2V}{(2\pi)^3}\frac{4\pi\omega^2\,d\omega}{c^3} = V\frac{\omega^2}{\pi^2c^3}\,d\omega. \tag{2.83}$$

42 The calculation may be found, e.g., in QM Sec. 7.2.
43 As a reminder: the equality of these two averages, at arbitrary temperature, was proved already in Sec. 2.
44 See, e.g., EM Sec. 7.8.

On the other hand, quantum mechanics says46 that the energy of such a “field oscillator” is quantized per Eq. (38), so that at thermal equilibrium its average energy is described by Eq. (72). Plugging that result into Eq. (83), we see that the spectral density of the electromagnetic field’s energy, per unit volume, is

$$u(\omega) \equiv \frac{E}{V}\frac{dN}{d\omega} = \frac{\hbar\omega^3}{\pi^2c^3}\frac{1}{e^{\hbar\omega/T} - 1}. \tag{2.84}$$


This is the famous Planck blackbody radiation law.47 To understand why its common name mentions radiation, let us consider a small planar part, of area dA, of a surface that completely absorbs electromagnetic waves incident from any direction. (Such a “perfect black body” approximation may be closely approached using special experimental structures, especially in limited frequency intervals.) Figure 8 shows that if the arriving wave were planar, with the incidence angle θ, then the power d𝒫_θ(ω) absorbed by the surface of small area dA, within a small frequency interval dω, i.e. the energy incident on that area in unit time, would be equal to the radiation energy within the same frequency interval, contained inside an imaginary cylinder (shaded in Fig. 8) of height c, base area dA cos θ, and hence volume dV = c dA cos θ:

$$d\mathcal{P}_\theta(\omega) = u(\omega)\,d\omega\,dV = u(\omega)\,d\omega\,c\,dA\cos\theta. \tag{2.85}$$

Fig. 2.8. Calculating the relation between d𝒫_θ(ω) and u(ω)dω.


45 In our current context, the volume should be much larger than (cħ/T)³, where c ≈ 3×10⁸ m/s is the speed of light. For room temperature (T ≈ k_B×300 K ≈ 4×10⁻²¹ J), this lower bound is of the order of 10⁻¹⁶ m³.

46 See, e.g., QM Sec. 9.1.

47 Let me hope the reader knows that this law was first suggested in 1900 by Max Planck as an empirical fit for the experimental data on blackbody radiation, and this was the historic point at which the Planck constant ħ (or rather h = 2πħ) was introduced; see, e.g., QM Sec. 1.1.
Since the thermally induced field is isotropic, i.e. propagates equally in all directions, this result should be averaged over all solid angles within the polar angle interval 0 ≤ θ ≤ π/2:

$$\frac{d\mathcal{P}(\omega)}{dA\,d\omega} = \frac{1}{4\pi}\oint\frac{d\mathcal{P}_\theta(\omega)}{dA\,d\omega}\,d\Omega = c\,u(\omega)\frac{1}{4\pi}\int_0^{\pi/2}\sin\theta\,d\theta\int_0^{2\pi}d\varphi\,\cos\theta = \frac{c}{4}u(\omega). \tag{2.86}$$

Hence Planck’s expression (84), multiplied by c/4, gives the power absorbed by such a “blackbody” surface. But at thermal equilibrium, this absorption has to be exactly balanced by the surface’s own radiation, due to its non-zero temperature T.


I hope the reader is familiar with the main features of the Planck law (84), including its general shape (Fig. 9), with the low-frequency asymptote u(ω) ∝ ω² (due to its historic significance bearing the special name of the Rayleigh-Jeans law), the exponential drop at high frequencies (the Wien law), and the resulting maximum of the function u(ω), reached at the frequency ω_max with

$$\hbar\omega_{\max} \approx 2.82\,T, \tag{2.87}$$

i.e. at the wavelength λ_max = 2π/k_max = 2πc/ω_max ≈ 2.22 cħ/T.
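The numbers in Eq. (87) follow from a simple transcendental equation. In terms of x ≡ ħω/T, Eq. (84) is proportional to x³/(e^x − 1), whose derivative vanishes where 3(1 − e^{−x}) = x. The Python sketch below solves this equation by plain bisection (our choice of method; any root finder would do), prints the coefficient 2π/x_max that enters λ_max, and is accompanied by a check of the Rayleigh-Jeans asymptote x³/(e^x − 1) ≈ x² at x ≪ 1.

```python
import math

def planck_shape(x: float) -> float:
    """u(omega)/u_0 with x = hbar*omega/T, from Eq. (2.84): x^3 / (e^x - 1)."""
    return x**3 / math.expm1(x)

def peak_condition(x: float) -> float:
    """Zero of d/dx [x^3/(e^x - 1)] for x > 0: 3(1 - e^{-x}) - x."""
    return 3.0 * (-math.expm1(-x)) - x

def bisect(f, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_max = bisect(peak_condition, 1.0, 5.0)  # hbar*omega_max / T
print(f"hbar*omega_max / T = {x_max:.4f}")
print(f"lambda_max = 2*pi*c/omega_max = {2 * math.pi / x_max:.4f} * c*hbar/T")
```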


Fig. 2.9. The frequency dependence of the blackbody radiation density, normalized by u_0 ≡ T³/π²ħ²c³, according to the Planck law (red line) and the Rayleigh-Jeans law (blue line), plotted versus ħω/T.


Still, I cannot help mentioning a few important particular values: one corresponding to visible light (λ_max ~ 500 nm) for the Sun’s effective surface temperature T_K ≈ 6,000 K, and another one corresponding to the mid-infrared range (λ_max ~ 10 μm) for the Earth’s surface temperature T_K ≈ 300 K. The balance of these two radiations, absorbed and emitted by the Earth, determines its surface temperature and hence is of key importance for all life on our planet. This is why it is at the front and center of the current climate change discussions. As one more example, the cosmic microwave background (CMB) radiation, closely following the Planck law with T_K = 2.725 K (and hence having the maximum density at λ_max ≈ 1.9 mm), and in particular its (very small) anisotropy, is a major source of data for modern cosmology.


Now let us calculate the total energy E of the blackbody radiation inside some volume V. It may be found from Eq. (84) by its integration over all frequencies:48,49

$$E = V\int_0^\infty u(\omega)\,d\omega = V\frac{\hbar}{\pi^2c^3}\int_0^\infty\frac{\omega^3\,d\omega}{e^{\hbar\omega/T} - 1} = V\frac{T^4}{\pi^2\hbar^3c^3}\int_0^\infty\frac{\xi^3\,d\xi}{e^\xi - 1} = V\frac{\pi^2}{15\hbar^3c^3}T^4. \tag{2.88}$$

48 The last step in Eq. (88) uses a table integral, equal to Γ(4)ζ(4) = (3!)(π⁴/90) = π⁴/15; see, e.g., MA Eq. (6.8b) with s = 4, and then MA Eqs. (6.7e) and (2.7b).


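Using Eq. (86) to recast Eq. (88) into the total power radiated by a blackbody surface, we get the well-known Stefan (or “Stefan-Boltzmann”) law50

$$\frac{d\mathcal{P}}{dA} = \frac{\pi^2}{60\hbar^3c^2}T^4 \equiv \sigma T_K^4, \tag{2.89a}$$

where σ is the Stefan-Boltzmann constant

$$\sigma \equiv \frac{\pi^2}{60\hbar^3c^2}k_B^4 \approx 5.67\times10^{-8}\ \frac{\rm W}{\rm m^2\,K^4}. \tag{2.89b}$$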
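The two numerical ingredients of Eqs. (88) and (89) can be checked in a few lines. The Python sketch below estimates the table integral in Eq. (88) by a simple midpoint rule (the cutoff x_max = 60 and the step count are arbitrary choices, adequate because the integrand decays as x³e^{−x}) and evaluates σ from the CODATA values of k_B, ħ, and c.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K (exact)
HBAR = 1.054571817e-34   # reduced Planck constant, J*s
C = 299_792_458.0        # speed of light, m/s (exact)

def bose_integral(n_steps: int = 200_000, x_max: float = 60.0) -> float:
    """Midpoint-rule estimate of the integral in Eq. (2.88): int_0^inf x^3/(e^x - 1) dx."""
    dx = x_max / n_steps
    total = 0.0
    for i in range(n_steps):
        x = (i + 0.5) * dx
        total += x**3 / math.expm1(x)
    return total * dx

integral = bose_integral()
sigma = math.pi**2 * K_B**4 / (60.0 * HBAR**3 * C**2)  # Eq. (2.89b)

print(f"integral = {integral:.8f}, pi^4/15 = {math.pi**4 / 15:.8f}")
print(f"sigma = {sigma:.4e} W/(m^2 K^4)")
```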


By this point, the thoughtful reader should have an important concern ready: Eq. (84), and hence Eq. (88), are based on Eq. (72) for the average energy of each oscillator, referred to its ground-state energy ħω/2. However, the radiation power should not depend on the energy origin; why haven’t we included the ground energy of each oscillator in the integration (88), as we have done in Eq. (80)? The answer is that usual radiation detectors only measure the difference between the power 𝒫_in of the incident radiation (say, that of a blackbody surface with temperature T) and their own back-radiation power 𝒫_out, corresponding to some effective temperature T_0 of the detector; see Fig. 10. But however low T_0 is, the temperature-independent contribution ħω/2 of the ground-state energy to the back radiation is always there. Hence, the term ħω/2 drops out from the balance, and cannot be detected, at least in this simple way. This is the reason why we had the right to ignore this contribution in Eq. (88), very fortunately, because it would lead to the integral’s divergence at its upper limit. However, let me repeat that the ground-state energy of the electromagnetic field oscillators is physically real, and important; see Sec. 5.5 below.


Fig. 2.10. The power balance at the electromagnetic radiation power measurement: d𝒫_in(ω) ∝ E(ω, T) + ħω/2, while d𝒫_out(ω) ∝ E(ω, T_0) + ħω/2.


One more interesting result may be deduced from the free energy F of the electromagnetic 
radiation, which may be calculated by integration of Eq. (73) over all the modes, with the appropriate 
weight (83): 


49 Note that the heat capacity C_V = (∂E/∂T)_V, following from Eq. (88), is proportional to T³ at any temperature, and hence does not obey the trend C_V → const at T → ∞. This is the result of the unlimited growth, with temperature, of the number of thermally excited field oscillators with frequencies ω below T/ħ.

50 Its functional part (E ∝ T⁴) was deduced in 1879 by Joseph Stefan from earlier experiments by John Tyndall. Theoretically, it was proved in 1884 by L. Boltzmann, using a result derived earlier by Adolfo Bartoli from the Maxwell equations for the electromagnetic field, all well before Max Planck’s work.
$$F = \sum_{\text{modes}}T\ln\left(1 - e^{-\hbar\omega/T}\right) \to \int_0^\infty T\ln\left(1 - e^{-\hbar\omega/T}\right)dN_\omega = V\frac{T}{\pi^2c^3}\int_0^\infty\ln\left(1 - e^{-\hbar\omega/T}\right)\omega^2\,d\omega. \tag{2.90}$$

Representing ω²dω as d(ω³)/3, we can readily work out this integral by parts, reducing it to a table integral similar to that in Eq. (88), and getting a surprisingly simple result:

$$F = -V\frac{\pi^2}{45\hbar^3c^3}T^4 \equiv -\frac{E}{3}. \tag{2.91}$$
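The integration by parts behind Eq. (91), ∫₀^∞ ξ² ln(1 − e^{−ξ}) dξ = −(1/3)∫₀^∞ ξ³ dξ/(e^ξ − 1) = −π⁴/45, can be confirmed numerically. After the substitution ξ = ħω/T, both Eq. (88) and Eq. (90) carry the same prefactor VT⁴/π²ħ³c³, so the ratio of the two dimensionless integrals in the Python sketch below directly gives F/E; the midpoint rule and the cutoff are our own choices.

```python
import math

def midpoint(f, a: float, b: float, n: int = 200_000) -> float:
    """Simple midpoint-rule quadrature on [a, b] (never evaluates f at the endpoints)."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Dimensionless integrals in x = hbar*omega/T; the upper cutoff 60 is an arbitrary choice.
free_energy_integral = midpoint(lambda x: x**2 * math.log(-math.expm1(-x)), 0.0, 60.0)  # Eq. (2.90)
energy_integral = midpoint(lambda x: x**3 / math.expm1(x), 0.0, 60.0)                   # Eq. (2.88)

print(f"int x^2 ln(1 - e^-x) dx = {free_energy_integral:.8f}  (-pi^4/45 = {-math.pi**4 / 45:.8f})")
print(f"F/E = {free_energy_integral / energy_integral:.8f}  (expected -1/3)")
```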

Now we can use the second of the general thermodynamic relations (1.35) to calculate the pressure 
exerted by the radiation on the walls of the containing volume V:>! 


2 
p-{&) i , (2.92a) 
OV), 45h c 3V 
Rewritten in the form, 
PV = > (2.92b) 


this result may be considered as the equation of state of the electromagnetic field, i.e. (from the quantum-mechanical point of view) of the photon gas. Note that the equation of state (1.44) of the ideal classical gas may be represented in a similar form, but with a coefficient generally different from that in Eq. (92). Indeed, according to the equipartition theorem, for an ideal gas of non-relativistic particles whose internal degrees of freedom are in a fixed (say, ground) state, the temperature-dependent energy is that of the three translational “half-degrees of freedom”, $E = 3N(T/2)$. Expressing from here the product $NT = 2E/3$, and plugging it into Eq. (1.44), we get a relation similar to Eq. (92), but with a twice larger factor before $E$: $PV = 2E/3$. On the other hand, a relativistic treatment of the classical gas shows that Eq. (92) is valid for any gas in the ultra-relativistic limit, $T \gg mc^2$, where $m$ is the rest mass of the gas' particle. Evidently, photons (i.e. particles with $m = 0$) satisfy this condition at any energy.⁵²
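As a quick numerical sanity check of Eqs. (90)-(92), here is a minimal Python sketch, assuming units with $\hbar = c = 1$ and $T$ measured in energy units, as throughout this chapter. It evaluates the two dimensionless integrals by a plain midpoint rule (an arbitrary choice; any quadrature would do), compares them with the closed forms $-\pi^4/45$ and $\pi^4/15$, and confirms that $F = -E/3$ and $PV = E/3$. The function name `photon_gas` and the cutoff of the integrals at $\xi = 60$ are illustrative assumptions.

```python
import math

def midpoint(f, a, b, n=200_000):
    """Composite midpoint rule; never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Dimensionless integrals of Eq. (90) and of the energy (cf. Eq. (88)); the
# upper limit 60 is an assumed cutoff, beyond which both integrands are negligible.
I_F = midpoint(lambda x: x**2 * math.log1p(-math.exp(-x)), 0.0, 60.0)
I_E = midpoint(lambda x: x**3 / math.expm1(x), 0.0, 60.0)

def photon_gas(T, V):
    """Free energy, energy, and pressure of the photon gas in units hbar = c = 1."""
    F = V * T**4 / math.pi**2 * I_F   # Eq. (90)
    E = V * T**4 / math.pi**2 * I_E   # energy, ground-state part omitted as in Eq. (90)
    P = -F / V                        # Eq. (92a): F is proportional to V
    return F, E, P

print(I_F, -math.pi**4 / 45)
F, E, P = photon_gas(T=2.0, V=3.0)
print(F / E, P * 3.0 / E)   # about -1/3 and 1/3
```

Because $F$ is proportional to $V$, the derivative in Eq. (92a) reduces to $-F/V$, so the sketch needs no numerical differentiation.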


Finally, let me note that Eq. (92) allows for the following interesting interpretation. The last of Eqs. (1.60), being applied to Eq. (92), shows that in this particular case the grand thermodynamic potential $\Omega$ equals $(-E/3)$, so that according to Eq. (91), it is equal to $F$. But according to the definition of $\Omega$, i.e. the first of Eqs. (1.60), this means that the chemical potential of the electromagnetic field excitations (photons) vanishes:

$$\mu = \frac{F - \Omega}{N} = 0. \qquad (2.93)$$

In Sec. 8 below, we will see that the same result follows from the comparison of Eq. (72) and the general Bose-Einstein distribution for arbitrary bosons. So, from the statistical point of view, photons may be considered as bosons with zero chemical potential.


51 This formula may also be derived from the expression for the forces exerted by the electromagnetic radiation on the walls (see, e.g., EM Sec. 9.8), but the above calculation is much simpler.

52 Note that according to Eqs. (1.44), (88), and (92), the difference between the equations of state of the photon gas and an ideal gas of non-relativistic particles, expressed in the more usual form $P = P(V, T)$, is much more dramatic: $P \propto T^4V^0$ vs. $P \propto T^1V^{-1}$.

(ii) Specific heat of solids. The heat capacity of solids is readily measurable, and in the early 1900s, its experimentally observed temperature dependence served as an important test for the then-emerging quantum theories. However, the theoretical calculation of $C_V$ is not simple,⁵³ even for insulators, whose specific heat at realistic temperatures is due to thermally-induced vibrations of their crystal lattice alone.⁵⁴ Indeed, at relatively low frequencies, a solid may be treated as an elastic continuum. Such a continuum supports three different modes of mechanical waves with the same frequency $\omega$, which all obey linear dispersion laws, $\omega = vk$, but the velocity $v = v_l$ for one of these modes (the longitudinal sound) is higher than that ($v_t$) of the two other modes (the transverse sound).⁵⁵ At such frequencies, the wave mode density may be described by an evident generalization of Eq. (83):


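$$dN = V\frac{1}{(2\pi)^3}\left(\frac{1}{v_l^3} + \frac{2}{v_t^3}\right)4\pi\omega^2d\omega. \qquad (2.94a)$$

For what follows, it is convenient to rewrite this relation in a form similar to Eq. (83):

$$dN = 3V\frac{1}{(2\pi)^3}\frac{4\pi\omega^2d\omega}{v^3}, \quad \text{with } v \equiv \left[\frac{1}{3}\left(\frac{1}{v_l^3} + \frac{2}{v_t^3}\right)\right]^{-1/3}. \qquad (2.94b)$$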


However, the basic wave theory shows⁵⁶ that as the frequency $\omega$ of a sound wave in a periodic structure is increased so that its half-wavelength $\pi/k$ approaches the crystal period $d$, the dispersion law $\omega(k)$ becomes nonlinear before the frequency reaches its maximum at $k = \pi/d$. To make things even
more complex, 3D crystals are generally anisotropic, so that the dispersion law is different in different 
directions of the wave propagation. As a result, the exact statistics of thermally excited sound waves, 
and hence the heat capacity of crystals, is rather complex and specific for each particular crystal type. 


In 1912, P. Debye suggested an approximate theory of the specific heat's temperature dependence, which is in a surprisingly good agreement with experiment for many insulators, including polycrystalline and amorphous materials. In his model, the linear (acoustic) dispersion law $\omega = vk$, with the effective sound velocity $v$ defined by the second of Eqs. (94b), is assumed to be exact all the way up to some cutoff frequency $\omega_D$, the same for all three wave modes. This Debye frequency may be defined by the requirement that the total number of acoustic modes, calculated within this model from Eq. (94b),

$$N = V\frac{3}{(2\pi)^3v^3}\int_0^{\omega_D}4\pi\omega^2d\omega = \frac{V\omega_D^3}{2\pi^2v^3}, \qquad (2.95)$$

is equal to the universal number $N = 3nV$ of the degrees of freedom (and hence of independent oscillation modes) in a 3D system of $nV$ elastically coupled particles, where $n$ is the atomic density of the crystal, i.e. the number of atoms per unit volume.⁵⁷ For this model, Eq. (72) immediately yields the following expressions for the average energy and specific heat (in thermal equilibrium at temperature $T$):

$$E = V\frac{3}{(2\pi)^3v^3}\int_0^{\omega_D}\frac{\hbar\omega}{e^{\hbar\omega/T} - 1}4\pi\omega^2d\omega = 3nVT\,D(x)\Big|_{x = T_D/T}, \qquad (2.96)$$


53 Due to the rather small thermal expansion of solids, the difference between their $C_V$ and $C_P$ is small.

54 In good conductors (e.g., metals), specific heat is contributed (and at low temperatures, dominated) by free 
electrons — see Sec. 3.3 below. 

55 See, e.g., CM Sec. 7.7. 

56 See, e.g., CM Sec. 6.3, in particular Fig. 6.5 and its discussion. 

57 See, e.g., CM Sec. 6.2. 
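$$c_V \equiv \frac{C_V}{nV} = 3\left[D(x) - x\frac{dD(x)}{dx}\right]_{x = T_D/T}, \qquad (2.97)$$

where $T_D \equiv \hbar\omega_D$ is called the Debye temperature,⁵⁸ and

$$D(x) \equiv \frac{3}{x^3}\int_0^x\frac{\xi^3d\xi}{e^\xi - 1} \to \begin{cases}1, & \text{for } x \to 0,\\ \pi^4/5x^3, & \text{for } x \to \infty,\end{cases} \qquad (2.98)$$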


is the Debye function. Red lines in Fig. 11 show the temperature dependence of the specific heat $c_V$ (per particle) within the Debye model. At high temperatures, it approaches a constant value of three, corresponding to the energy $E = 3nVT$, in agreement with the equipartition theorem for each of the three degrees of freedom (i.e. six half-degrees of freedom) of each particle. (This value of $c_V$ is known as the Dulong-Petit law.) In the opposite limit of low temperatures, the specific heat is much smaller:

$$c_V \approx \frac{12\pi^4}{5}\left(\frac{T}{T_D}\right)^3, \quad \text{for } T \ll T_D, \qquad (2.99)$$

reflecting the reduction of the number of excited phonons with $\hbar\omega < T$ as the temperature is decreased.
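The Debye formulas (94b)-(99) are easy to evaluate numerically. The minimal Python sketch below assumes the dimensionless variable $t = T/T_D$ and a midpoint-rule quadrature for the Debye function (both illustrative choices, as are all function names). It computes $c_V$ from Eq. (97) using the identity $dD/dx = -3D/x + 3/(e^x - 1)$, which follows from differentiating Eq. (98), and also includes the effective velocity of Eq. (94b) and the cutoff frequency implied by Eq. (95) with $N = 3nV$, i.e. $\omega_D^3 = 6\pi^2nv^3$.

```python
import math

def effective_sound_velocity(v_l, v_t):
    """Effective velocity v of Eq. (94b), from the longitudinal and transverse velocities."""
    return ((1.0 / v_l**3 + 2.0 / v_t**3) / 3.0) ** (-1.0 / 3.0)

def debye_frequency(n, v):
    """Cutoff frequency from Eq. (95) with N = 3nV, i.e. omega_D^3 = 6 pi^2 n v^3."""
    return v * (6.0 * math.pi**2 * n) ** (1.0 / 3.0)

def debye_function(x, steps=20_000):
    """Debye function D(x) of Eq. (98), with the integral done by the midpoint rule."""
    h = x / steps
    total = sum(((i + 0.5) * h) ** 3 / math.expm1((i + 0.5) * h) for i in range(steps))
    return 3.0 * total * h / x**3

def c_v_debye(t):
    """Specific heat per atom, Eq. (97), for t = T/T_D.

    With dD/dx = -3D/x + 3/(e^x - 1), Eq. (97) becomes c_V = 3[4D(x) - 3x/(e^x - 1)].
    """
    x = 1.0 / t
    return 3.0 * (4.0 * debye_function(x) - 3.0 * x / math.expm1(x))

for t in (0.02, 0.05, 0.1, 0.5, 1.0, 5.0):
    low_t = 12.0 * math.pi**4 / 5.0 * t**3   # low-temperature asymptote, Eq. (99)
    print(f"T/T_D = {t:4.2f}:  c_V = {c_v_debye(t):.5f}  (Eq. (99) gives {low_t:.5f})")
```

At $t = 5$ the printed $c_V$ is already within about 0.2% of the Dulong-Petit value 3, while at $t \lesssim 0.05$ it practically coincides with the cubic law (99).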




Fig. 2.11. The specific heat as a function of temperature in the Debye (red lines) and Einstein (blue lines) models. 


As a historic curiosity, P. Debye's work followed one by A. Einstein, who had suggested (in 1907) a simpler model of crystal vibrations. In his model, all $3nV$ independent oscillatory modes of $nV$ atoms of the crystal have approximately the same frequency, say $\omega_E$, and Eq. (72) immediately yields

$$E = 3nV\frac{\hbar\omega_E}{e^{\hbar\omega_E/T} - 1}, \qquad (2.100)$$

so that the specific heat is functionally similar to Eq. (75):


58 In the SI units, the Debye temperature $T_D$ is of the order of a few hundred K for most simple solids (e.g., ~430 K for aluminum and ~340 K for copper), with somewhat lower values for crystals with heavy atoms (~105 K for lead), and reaches its highest value, ~2200 K, for diamond, with its relatively light atoms and very stiff lattice.
$$c_V \equiv \frac{1}{nV}\left(\frac{\partial E}{\partial T}\right)_V = 3\left[\frac{\hbar\omega_E/2T}{\sinh(\hbar\omega_E/2T)}\right]^2. \qquad (2.101)$$

This dependence $c_V(T)$ is shown with blue lines in Fig. 11 (assuming, for the sake of simplicity, that $\hbar\omega_E = T_D$). At high temperatures, this result does satisfy the universal Dulong-Petit law ($c_V = 3$), but for $T \ll T_D$, Einstein's model predicts a much faster (exponential) drop of the specific heat as the temperature is reduced. (The difference between the Debye and Einstein models is not too spectacular on the linear scale, but in the log-log plot, shown on the right panel of Fig. 11, it is rather dramatic.⁵⁹) The Debye model is in much better agreement with experimental data for simple, monoatomic crystals, thus confirming the conceptual correctness of his wave-based approach.
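To see how much faster the Einstein specific heat (101) drops at low temperatures than the Debye result (97), one may compare the two numerically. The sketch below is a minimal, self-contained Python illustration, assuming $\hbar\omega_E = T_D$ (the same simplification as in Fig. 11) and re-implementing the Debye function by midpoint quadrature; the function names are hypothetical, and the values of $t = T/T_D$ are kept above about $10^{-3}$ so that `math.sinh` does not overflow.

```python
import math

def c_v_einstein(t):
    """Eq. (101) with hbar*omega_E = T_D (the Fig. 11 assumption); t = T/T_D."""
    y = 1.0 / (2.0 * t)                     # hbar*omega_E / 2T
    return 3.0 * (y / math.sinh(y)) ** 2    # math.sinh overflows for y > ~710, i.e. t < ~7e-4

def c_v_debye(t, steps=20_000):
    """Eqs. (97)-(98) as c_V = 3[4D(x) - 3x/(e^x - 1)], x = T_D/T, D by the midpoint rule."""
    x = 1.0 / t
    h = x / steps
    integral = h * sum(((i + 0.5) * h) ** 3 / math.expm1((i + 0.5) * h) for i in range(steps))
    d = 3.0 * integral / x**3
    return 3.0 * (4.0 * d - 3.0 * x / math.expm1(x))

for t in (0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0):
    print(f"T/T_D = {t:4.2f}:  Debye {c_v_debye(t):.3e}   Einstein {c_v_einstein(t):.3e}")
```

Printing in scientific notation makes the point of footnote 59: on a linear scale both curves look similar, while at $T/T_D = 0.05$ the Einstein value is already about four orders of magnitude below the Debye one.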


Note, however, that when a genius such as Albert Einstein makes an error, there is usually some deep and important background under it. Indeed, crystals with the basic cell consisting of atoms of two or more types (such as NaCl) feature two or more separate branches of the dispersion law $\omega(k)$; see, e.g., Fig. 12. While the lower, “acoustic” branch is virtually similar to those for monoatomic crystals and may be approximated by the Debye model, $\omega = vk$, reasonably well, the upper (“optical”⁶⁰) branch does not approach $\omega = 0$ at any $k$. Moreover, for large values of the atomic mass ratio $r$, the optical branches are almost flat, with virtually $k$-independent frequencies $\omega_0$, which correspond to simple oscillations of each light atom between its heavy neighbors. For thermal excitations of such oscillations, and their contribution to the specific heat, Einstein's model (with $\omega_E = \omega_0$) gives a very good approximation, so that for such solids, the specific heat may be well described by a sum of the Debye and Einstein laws (97) and (101), with appropriate weights.


[Figure: $\omega(k)$ (arbitrary units, linear scale) versus $kd$, showing the lower “acoustic” branch and the upper “optical” branch.]
Fig. 2.12. The dispersion relation for mechanical waves in a simple 1D model of a solid, with similar interparticle distances $d$, but alternating particle masses, plotted for a particular mass ratio $r = 5$ (see CM Chapter 6).


2.7. Grand canonical ensemble and distribution 


As we have seen, the Gibbs distribution is a very convenient way to calculate the statistical and thermodynamic properties of systems with a fixed number $N$ of particles. However, for systems in which $N$ may vary, another distribution is preferable for applications. Several examples of such situations (as well as the basic thermodynamics of such systems) have already been discussed in Sec. 1.5. Perhaps even more importantly, statistical distributions for systems with variable $N$ are also applicable to some ensembles of independent particles in certain single-particle states even if the number of the particles is fixed; see the next section.

59 This is why there is the following general “rule of thumb” in quantitative sciences: if you plot your data on a linear rather than log scale, you had better have a good excuse ready. (An example of a valid excuse: the variable you are plotting changes its sign within the range you want to exhibit.)

60 This term stems from the fact that at $k \to 0$, the mechanical waves corresponding to these branches have phase velocities $v_{ph} \equiv \omega(k)/k$ that are much higher than that of the acoustic waves, and may approach the speed of light. As a result, these waves can strongly interact with electromagnetic (practically, optical) waves of the same frequency, while acoustic waves cannot.


With this motivation, let us consider what is called the grand canonical ensemble (Fig. 13). It is similar to the canonical ensemble discussed in Sec. 4 (see Fig. 6) in all aspects, except that now the system under study and the heat bath (in this case more often called the environment) may exchange not only heat but also particles. In this ensemble, all environments are in both thermal and chemical equilibrium, with their temperatures $T$ and chemical potentials $\mu$ the same for all members.


[Figure: the system under study embedded in its environment, characterized by $T$ and $\mu$.]
Fig. 2.13. A member of the grand canonical ensemble.


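Let us assume that the system of interest is also in chemical and thermal equilibrium with its environment. Then using exactly the same arguments as in Sec. 4 (including the specification of microcanonical sub-ensembles with fixed $E_\Sigma$ and $N_\Sigma$), we may generalize Eq. (55), taking into account that the entropy $S_{env}$ of the environment is now a function of not only its energy $E_{env} = E_\Sigma - E_{m,N}$,⁶¹ but also of the number of particles $N_{env} = N_\Sigma - N$, with $E_\Sigma$ and $N_\Sigma$ fixed:

$$\begin{aligned}\ln W_{m,N} = \ln M + \text{const} &= \ln g_{env}(E_\Sigma - E_{m,N}, N_\Sigma - N) + \ln\Delta E_\Sigma + \text{const} = S_{env}(E_\Sigma - E_{m,N}, N_\Sigma - N) + \text{const} \\ &\approx S_{env}\big|_{E_\Sigma,N_\Sigma} - \frac{\partial S_{env}}{\partial E_{env}}\bigg|_{E_\Sigma,N_\Sigma}E_{m,N} - \frac{\partial S_{env}}{\partial N_{env}}\bigg|_{E_\Sigma,N_\Sigma}N + \text{const}. \qquad (2.102)\end{aligned}$$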


To simplify this relation, let us rewrite Eq. (1.52) in the following equivalent form:

$$dS = \frac{1}{T}dE + \frac{P}{T}dV - \frac{\mu}{T}dN. \qquad (2.103)$$


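Hence, if the entropy $S$ of a system is expressed as a function of $E$, $V$, and $N$, then

$$\left(\frac{\partial S}{\partial E}\right)_{V,N} = \frac{1}{T}, \qquad \left(\frac{\partial S}{\partial V}\right)_{E,N} = \frac{P}{T}, \qquad \left(\frac{\partial S}{\partial N}\right)_{E,V} = -\frac{\mu}{T}. \qquad (2.104)$$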


Applying the first and the last of these relations to the last form of Eq. (102), and using the equality of the temperatures $T$ and chemical potentials $\mu$ in the system under study and its environment at equilibrium (as was discussed in Sec. 1.5), we get


61 The additional index in the new notation $E_{m,N}$ for the energy of the system of interest reflects the fact that its spectrum generally depends on the number $N$ of particles in it.
$$\ln W_{m,N} = S_{env}(E_\Sigma, N_\Sigma) - \frac{1}{T}E_{m,N} + \frac{\mu}{T}N + \text{const}. \qquad (2.105)$$


Again, exactly as in the derivation of the Gibbs distribution in Sec. 4, we may argue that since $E_{m,N}$, $T$, and $\mu$ do not depend on the choice of the environment's size, i.e. on $E_\Sigma$ and $N_\Sigma$, the probability $W_{m,N}$ for a system to have $N$ particles and be in its $m$-th quantum state in the whole grand canonical ensemble should also obey Eq. (105). As a result, we get the so-called grand canonical distribution:

$$W_{m,N} = \frac{1}{Z_G}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}. \qquad (2.106)$$

Just as in the case of the Gibbs distribution, the constant $Z_G$ (most often called the grand statistical sum, but sometimes the “grand partition function”) should be determined from the probability normalization condition, now with the summation of probabilities $W_{m,N}$ over all possible values of both $m$ and $N$:

$$Z_G = \sum_{m,N}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}. \qquad (2.107)$$


Now, using the general Eq. (29) to calculate the entropy for the distribution (106) (exactly as we did for the canonical ensemble), we get the following expression,

$$S = -\sum_{m,N}W_{m,N}\ln W_{m,N} = \frac{\langle E_{m,N}\rangle}{T} - \frac{\mu\langle N\rangle}{T} + \ln Z_G, \qquad (2.108)$$

which is evidently a generalization of Eq. (62). We see that now the grand thermodynamic potential $\Omega$ (rather than the free energy $F$) may be expressed directly via the normalization coefficient $Z_G$:


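$$\Omega \equiv F - \mu\langle N\rangle = E - TS - \mu\langle N\rangle = T\ln\frac{1}{Z_G} = -T\ln\sum_{m,N}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}. \qquad (2.109)$$

Finally, solving the last equality for $Z_G$, and plugging the result back into Eq. (106), we can rewrite the grand canonical distribution in the form

$$W_{m,N} = \exp\left\{\frac{\Omega + \mu N - E_{m,N}}{T}\right\}, \qquad (2.110)$$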


similar to Eq. (65) for the Gibbs distribution. Indeed, in the particular case when the number $N$ of particles is fixed, $N = \langle N\rangle$, so that $\Omega + \mu N = \Omega + \mu\langle N\rangle \equiv F$, and Eq. (110) is reduced to Eq. (65).
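The relations (106)-(110) can be checked by brute-force enumeration on a small model. The Python sketch below assumes a hypothetical toy system of $K$ sites, each either empty or holding one particle of energy $\varepsilon$, so that $E_{m,N} = N\varepsilon$, with $C(K, N)$ distinct states $m$ for each $N$. It builds $W_{m,N}$ from Eq. (106), then verifies the entropy formula (108) and the standard thermodynamic relation $\langle N\rangle = -\partial\Omega/\partial\mu$, with $\Omega$ taken from Eq. (109); all parameter values are arbitrary.

```python
import math

# Hypothetical toy system: K sites, each either empty or holding one particle
# of energy eps, so that E_{m,N} = N*eps, with comb(K, N) states m at given N.
K, eps, T = 6, 1.0, 0.7

def states():
    """Yield (N, E_mN) for every microstate (m, N) of the toy system."""
    for N in range(K + 1):
        for _ in range(math.comb(K, N)):
            yield N, N * eps

def grand_canonical(mu):
    """Return Z_G (Eq. 107), Omega (Eq. 109), <N>, <E>, and S computed as in Eq. (29)."""
    weights = [(N, E, math.exp((mu * N - E) / T)) for N, E in states()]
    Z_G = sum(w for _, _, w in weights)
    probs = [(N, E, w / Z_G) for N, E, w in weights]      # Eq. (106)
    N_avg = sum(N * W for N, _, W in probs)
    E_avg = sum(E * W for _, E, W in probs)
    S = -sum(W * math.log(W) for _, _, W in probs)        # Eq. (29)
    return Z_G, -T * math.log(Z_G), N_avg, E_avg, S

mu, dmu = 0.4, 1e-5
Z_G, Omega, N_avg, E_avg, S = grand_canonical(mu)
print(S, E_avg / T - mu * N_avg / T + math.log(Z_G))     # Eq. (108)
# <N> versus -dOmega/dmu, the latter by a central finite difference
dOmega = (grand_canonical(mu + dmu)[1] - grand_canonical(mu - dmu)[1]) / (2 * dmu)
print(N_avg, -dOmega)
```

For this particular toy model the sum over states factorizes, and $\langle N\rangle$ comes out equal to $K/(e^{(\varepsilon - \mu)/T} + 1)$, a form that anticipates the Fermi-Dirac distribution of Sec. 2.8.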


2.8. Systems of independent particles 


Now let us apply the general statistical distributions discussed above to a simple but very important case when the system we are considering consists of many similar particles whose explicit (“direct”) interaction is negligible. As a result, each particular energy value $E_{m,N}$ of such a system may be represented as a sum of energies $\varepsilon_k$ of the particles, where the index $k$ numbers single-particle states, rather than those of the whole system, as the index $m$ does.

62 The average number of particles $\langle N\rangle$ is exactly what was called $N$ in thermodynamics (see Chapter 1), but I keep this explicit notation here to make a clear distinction between this average value of the variable and its particular values participating in Eqs. (102)-(110).


Let us start with the classical limit. In classical mechanics, the energy quantization effects are negligible, i.e. there is a formally infinite number of quantum states $k$ within each finite energy interval. However, it is convenient to keep, for the time being, the discrete-state language, with the understanding that the average number $\langle N_k\rangle$ of particles in each of these states, usually called the state occupancy, is very small. In this case, we may apply the Gibbs distribution to the canonical ensemble of single particles, and hence use it with the substitution $E_m \to \varepsilon_k$, so that Eq. (58) becomes

$$\langle N_k\rangle = c\exp\left\{-\frac{\varepsilon_k}{T}\right\}, \qquad (2.111)$$

where the constant $c$ should be found from the normalization condition:

$$\sum_k\langle N_k\rangle = 1. \qquad (2.112)$$
This is the famous Boltzmann distribution.⁶³ Despite its formal similarity to the Gibbs distribution (58), let me emphasize the conceptual difference between these two important formulas. The Gibbs distribution describes the probability to find the whole system in one of its states with energy $E_m$, and it is always valid (more exactly, for a canonical ensemble of systems in thermodynamic equilibrium). On the other hand, the Boltzmann distribution describes the occupancy of an energy level of a single particle, and, as we will see in just a minute, is valid for quantum particles only in the classical limit $\langle N_k\rangle \ll 1$, even if they do not interact directly.
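For a concrete feel of Eqs. (111)-(112), the sketch below normalizes the Boltzmann occupancies for a hypothetical, densely spaced ladder of single-particle levels $\varepsilon_k = k\delta$; the choice $\delta \ll T$ is an assumption made to mimic the quasi-continuous classical spectrum, and all names are illustrative. It also shows that each occupancy is indeed small, as the classical limit requires.

```python
import math

def boltzmann_occupancies(levels, T):
    """Eqs. (111)-(112): <N_k> = c exp(-eps_k/T), with c set by sum_k <N_k> = 1."""
    weights = [math.exp(-eps / T) for eps in levels]
    c = 1.0 / sum(weights)
    return [c * w for w in weights]

T, delta = 1.0, 1e-3                              # hypothetical temperature and level spacing
levels = [k * delta for k in range(100_000)]      # quasi-continuous ladder up to 100 T
occ = boltzmann_occupancies(levels, T)

print(sum(occ))                                   # normalization, Eq. (112)
print(max(occ))                                   # about delta/T: every occupancy is << 1
print(occ[1000] / occ[0], math.exp(-levels[1000] / T))   # ratio follows exp(-(eps_i - eps_j)/T)
```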


The last fact may be surprising, because it may seem that as soon as particles of the system are independent, nothing prevents us from using the Gibbs distribution to derive Eq. (111), regardless of the value of $\langle N_k\rangle$. This is indeed true if the particles are distinguishable, i.e. may be distinguished from each other, say by their fixed spatial positions, or by the states of certain internal degrees of freedom (say, spin), or by any other “pencil mark”. However, it is an experimental fact that elementary particles of each particular type (say, electrons) are identical to each other, i.e. cannot be “pencil-marked”.⁶⁴ For such particles we have to be more careful: even if they do not interact explicitly, there is still some implicit dependence in their behavior, which is especially evident for the so-called fermions (elementary particles with semi-integer spin): they obey the Pauli exclusion principle that forbids two identical particles to be in the same quantum state, even if they do not interact explicitly.⁶⁵


63 The distribution was first suggested in 1877 by L. Boltzmann. For the particular case when $\varepsilon$ is the kinetic
energy of a free classical particle (and hence has a continuous spectrum), it is reduced to the Maxwell distribution 
(see Sec. 3.1 below), which was derived earlier — in 1860. 

64 This invites a natural question: what particles are “elementary enough” for their identity? For example, protons 
and neutrons have an internal structure, in some sense consisting of quarks and gluons; can they be considered 
elementary? Next, if protons and neutrons are elementary, are atoms? molecules? What about really large 
molecules (such as proteins)? viruses? The general answer to these questions, given by quantum mechanics (or 
rather experiment :-), is that any particles/systems, no matter how large and complex they are, are identical if they 
not only have the same internal structure but also are exactly in the same internal quantum state — for example, in 
the ground state of all their internal degrees of freedom. 

65 For a more detailed discussion of this issue, see, e.g., QM Sec. 8.1. 
Note that the term “the same quantum state” carries a heavy meaning load here. For example, if
two particles are confined to stay at different spatial positions (say, reliably locked in different boxes), 
they are distinguishable even if they are internally identical. Thus the Pauli principle, as well as other 
particle identity effects such as the Bose-Einstein condensation to be discussed in the next chapter, are 
important only when identical particles may move in the same spatial region. To emphasize this fact, it 
is common to use, instead of “identical”, a more precise (though grammatically rather unpleasant) 
adjective indistinguishable. 


In order to take these effects into account, let us examine statistical properties of a system of 
many non-interacting but indistinguishable particles (at the first stage of calculation, either fermions or 
bosons) in equilibrium, applying the grand canonical distribution (109) to a very unusual grand 
canonical ensemble: a subset of particles in the same quantum state k (Fig. 14). 


[Figure: single-particle energy levels of particles #1, 2, ..., j.]
Fig. 2.14. The grand canonical ensemble of particles in the same quantum state with energy $\varepsilon_k$ (schematically).


In this ensemble, the role of the environment may be played just by the set of particles in all other states $k' \neq k$, because due to infinitesimal interactions, the particles may gradually change their states. In the resulting equilibrium, the chemical potential $\mu$ and temperature $T$ of the system should not depend on the state number $k$, though the grand thermodynamic potential $\Omega_k$ of the chosen particle subset may. Replacing $N$ with $N_k$, the particular (not average!) number of particles in the selected $k$-th state, and the particular energy value $E_{m,N}$ with $\varepsilon_kN_k$, we reduce the final form of Eq. (109) to


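$$\Omega_k = -T\ln\sum_{N_k}\exp\left\{\frac{\mu N_k - \varepsilon_kN_k}{T}\right\} = -T\ln\sum_{N_k}\left(\exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}\right)^{N_k}, \qquad (2.113)$$

where the summation should be carried out over all possible values of $N_k$. For the final calculation of this sum, the elementary particle type is essential.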


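On one hand, for fermions, obeying the Pauli principle, the numbers $N_k$ in Eq. (113) may take only two values, either 0 (the state $k$ is unoccupied) or 1 (the state is occupied), and the summation gives

$$\Omega_k = -T\ln\sum_{N_k=0,1}\left(\exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}\right)^{N_k} = -T\ln\left(1 + \exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}\right). \qquad (2.114)$$

Now the state occupancy may be calculated from the last of Eqs. (1.62), in this case with the (average) $N$ replaced with $\langle N_k\rangle$:

$$\langle N_k\rangle = -\left(\frac{\partial\Omega_k}{\partial\mu}\right)_{T,V} = \frac{1}{e^{(\varepsilon_k - \mu)/T} + 1}. \qquad (2.115)$$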
This is the famous Fermi-Dirac distribution, derived in 1926 independently by Enrico Fermi and Paul
Dirac.
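The step from Eq. (114) to Eq. (115) can be checked numerically. The Python sketch below (function names and parameter values are illustrative assumptions) performs the two-term sum of Eq. (113) for $N_k = 0$ and 1, differentiates $\Omega_k$ with respect to $\mu$ by a central finite difference, and compares $-\partial\Omega_k/\partial\mu$ with the closed-form Fermi-Dirac occupancy.

```python
import math

def omega_fermi(mu, eps, T):
    """Eq. (113) restricted to N_k = 0, 1, i.e. Eq. (114)."""
    lam = math.exp((mu - eps) / T)
    return -T * math.log(sum(lam**n for n in (0, 1)))

def fermi_dirac(eps, mu, T):
    """Eq. (115)."""
    return 1.0 / (math.exp((eps - mu) / T) + 1.0)

def occupancy_from_omega(omega, eps, mu, T, h=1e-6):
    """<N_k> = -dOmega_k/dmu, by a central finite difference with step h."""
    return -(omega(mu + h, eps, T) - omega(mu - h, eps, T)) / (2.0 * h)

T, mu = 0.3, 1.0
for eps in (0.4, 0.9, 1.0, 1.1, 1.6):
    print(eps, occupancy_from_omega(omega_fermi, eps, mu, T), fermi_dirac(eps, mu, T))
```

The two printed columns agree to about ten digits; at $\varepsilon_k = \mu$ both give exactly one half.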


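On the other hand, bosons do not obey the Pauli principle, and for them the numbers $N_k$ can take any non-negative integer values. In this case, Eq. (113) turns into the following equality:

$$\Omega_k = -T\ln\sum_{N_k=0}^{\infty}\left(\exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}\right)^{N_k} = -T\ln\sum_{N_k=0}^{\infty}\lambda^{N_k}, \quad \text{with } \lambda \equiv \exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}. \qquad (2.116)$$

This sum is just the usual geometric series, which converges if $\lambda < 1$, giving

$$\Omega_k = -T\ln\frac{1}{1 - \lambda} = T\ln\left(1 - \exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}\right), \quad \text{for } \mu < \varepsilon_k. \qquad (2.117)$$

In this case, the average occupancy, again calculated using Eq. (1.62) with $N$ replaced with $\langle N_k\rangle$, obeys the Bose-Einstein distribution,

$$\langle N_k\rangle = -\left(\frac{\partial\Omega_k}{\partial\mu}\right)_{T,V} = \frac{1}{e^{(\varepsilon_k - \mu)/T} - 1}, \quad \text{for } \mu < \varepsilon_k, \qquad (2.118)$$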


which was derived in 1924 by Satyendra Nath Bose (for the particular case $\mu = 0$) and generalized in 1925 by Albert Einstein for an arbitrary chemical potential. In particular, comparing Eq. (118) with Eq. (72), we see that harmonic oscillator's excitations,⁶⁶ each with energy $\hbar\omega$, may be considered as bosons, with the chemical potential equal to zero. As a reminder, we have already obtained this equality ($\mu = 0$) in a different way; see Eq. (93). Its physical interpretation is that the oscillator excitations may be created inside the system, so that there is no energy cost $\mu$ of moving them into the system under consideration from its environment.
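Similarly, for bosons, the sketch below (again with hypothetical names and parameter values) truncates the geometric series of Eq. (116) after a finite number of terms, compares it with the closed form (117), and checks that $-\partial\Omega_k/\partial\mu$ reproduces the Bose-Einstein occupancy (118). It uses $\mu = 0$, so that for $\varepsilon_k = \hbar\omega$ the last column is just the average number of excitations of an oscillator, in line with the comparison with Eq. (72).

```python
import math

def omega_bose_series(mu, eps, T, n_terms=2000):
    """Eq. (116) with the geometric series truncated after n_terms terms."""
    lam = math.exp((mu - eps) / T)
    return -T * math.log(sum(lam**n for n in range(n_terms)))

def omega_bose(mu, eps, T):
    """Closed form, Eq. (117); valid only for mu < eps, i.e. lam < 1."""
    if mu >= eps:
        raise ValueError("the series (116) diverges for mu >= eps")
    return T * math.log1p(-math.exp((mu - eps) / T))

def bose_einstein(eps, mu, T):
    """Eq. (118)."""
    return 1.0 / math.expm1((eps - mu) / T)

T, mu, h = 1.0, 0.0, 1e-6
for eps in (0.05, 0.5, 2.0):
    numeric = -(omega_bose(mu + h, eps, T) - omega_bose(mu - h, eps, T)) / (2.0 * h)
    print(eps, omega_bose_series(mu, eps, T), omega_bose(mu, eps, T),
          numeric, bose_einstein(eps, mu, T))
```

Note how the occupancy grows to about 20 for $\varepsilon_k - \mu = 0.05\,T$, far above the Fermi-Dirac ceiling of 1.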


The simple form of Eqs. (115) and (118), and their similarity (besides “only” the difference of the signs before the unity in their denominators), is one of the most beautiful results of physics. This similarity, however, should not disguise the fact that the energy dependences of the occupancies $\langle N_k\rangle$ given by these two formulas are very different; see their linear and semi-log plots in Fig. 15.

In the Fermi-Dirac statistics, the level occupancy is not only finite, but below 1 at any energy, while in the Bose-Einstein statistics it may be above 1, and diverges at $\varepsilon_k \to \mu$. However, as the temperature is increased (at a fixed number of particles), the chemical potential decreases, and the difference $(\varepsilon_k - \mu)$ eventually becomes much larger than $T$. In this limit, $\langle N_k\rangle \ll 1$, both quantum distributions coincide with each other, as well as with the classical Boltzmann distribution (111) with $c = \exp\{\mu/T\}$:


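$$\langle N_k\rangle \to \exp\left\{\frac{\mu - \varepsilon_k}{T}\right\}, \quad \text{for } \langle N_k\rangle \to 0. \qquad (2.119)$$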


This distribution (also shown in Fig. 15) may therefore also be understood as the high-temperature limit for indistinguishable particles of both sorts.
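The convergence of the three distributions in Eq. (119) is easy to see numerically. In the sketch below (names are illustrative), $x \equiv (\varepsilon_k - \mu)/T$; the printout shows that the relative deviations of the Fermi-Dirac and Bose-Einstein occupancies from the Boltzmann value are both of the order of $e^{-x}$, i.e. of the order of $\langle N_k\rangle$ itself, and have opposite signs.

```python
import math

def occupancies(x):
    """(Fermi-Dirac, Bose-Einstein, Boltzmann) occupancies for x = (eps_k - mu)/T > 0."""
    return 1.0 / (math.exp(x) + 1.0), 1.0 / math.expm1(x), math.exp(-x)

for x in (0.1, 1.0, 3.0, 10.0):
    fd, be, boltz = occupancies(x)
    print(f"x = {x:4.1f}: FD {fd:.4e}  BE {be:.4e}  Boltzmann {boltz:.4e}  "
          f"FD/B - 1 = {fd / boltz - 1:+.1e}  BE/B - 1 = {be / boltz - 1:+.1e}")
```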


66 As the reader certainly knows, for the electromagnetic field oscillators, such excitations are called photons; for 
mechanical oscillation modes, phonons. It is important, however, not to confuse these mode excitations with the 
oscillators as such, and be very careful in ascribing to them certain spatial locations (see, e.g., QM Sec. 9.1).
Fig. 2.15. The Fermi-Dirac (blue line), Bose-Einstein (red line), and Boltzmann (dashed line) distributions for indistinguishable quantum particles. (The last distribution is valid only asymptotically, at $\langle N_k\rangle \ll 1$.)


A natural question now is how to find the chemical potential $\mu$ participating in Eqs. (115), (118), and (119). In the grand canonical ensemble as such (Fig. 13), with the number of particles variable, the value of $\mu$ is imposed by the system's environment. However, both the Fermi-Dirac and Bose-Einstein distributions are also approximately applicable (in thermal equilibrium) to systems with a fixed but very large number $N$ of particles. In these conditions, the role of the environment for some subset of $N' \ll N$ particles is essentially played by the remaining $N - N'$ particles. In this case, $\mu$ may be found by the calculation of $\langle N\rangle$ from the corresponding probability distribution, and then requiring it to be equal to the genuine number of particles in the system. In the next section, we will perform such calculations for several particular systems.
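The procedure just described, requiring $\langle N\rangle$ computed from Eq. (115) or (118) to equal the actual number of particles, amounts to solving one equation for $\mu$. Since $\sum_k\langle N_k\rangle$ grows monotonically with $\mu$, simple bisection suffices. The sketch below assumes a hypothetical, evenly spaced set of 1000 levels and arbitrary values of $N$ and $T$; the overflow guard and the bracketing interval are implementation choices, not part of the theory.

```python
import math

def occupancy(x, kind):
    """<N_k> versus x = (eps_k - mu)/T: Eq. (115) for 'fermi', Eq. (118) for 'bose' (x > 0)."""
    if x > 700.0:
        return 0.0                      # exp(x) would overflow; the occupancy is negligible
    return 1.0 / (math.exp(x) + 1.0) if kind == "fermi" else 1.0 / math.expm1(x)

def total_number(mu, levels, T, kind):
    """Sum of the occupancies over all single-particle levels."""
    return sum(occupancy((eps - mu) / T, kind) for eps in levels)

def find_mu(N, levels, T, kind, tol=1e-12):
    """Bisection for the mu at which sum_k <N_k> equals N."""
    lo = min(levels) - 50.0 * (T + 1.0)                # so low that the total is ~ 0
    if kind == "fermi":
        if N >= len(levels):
            raise ValueError("fermions need N < number of levels")
        hi = max(levels) + 50.0 * (T + 1.0)            # almost all levels filled
    else:
        hi = min(levels) - 1e-9 * T                    # bosons: mu must stay below all levels
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total_number(mid, levels, T, kind) < N:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

levels = [0.01 * k for k in range(1000)]   # hypothetical single-particle spectrum
for kind in ("fermi", "bose"):
    mu = find_mu(300, levels, T=1.0, kind=kind)
    print(kind, mu, total_number(mu, levels, 1.0, kind))
```

For bosons the root always lies below the lowest level, as required by the convergence condition $\mu < \varepsilon_k$ of Eq. (117).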


For that and other applications, it will be convenient for us to have ready formulas for the entropy $S$ of a general (i.e. not necessarily equilibrium) state of systems of independent Fermi or Bose particles, expressed not as a function of $W_m$ of the whole system, as in Eq. (29), but via the occupancy numbers $\langle N_k\rangle$. For that, let us consider an ensemble of composite systems, each consisting of $M \gg 1$ similar but distinct component systems, numbered by index $m = 1, 2, \ldots, M$, with independent (i.e. not directly interacting) particles. We will assume that though in each of the $M$ component systems the number $N_k^{(m)}$ of particles in their $k$-th quantum state may be different (Fig. 16), their total number $N_k^{(\Sigma)}$ in the composite system is fixed. As a result, the total energy of the composite system is fixed as well,

$$\sum_{m=1}^{M}N_k^{(m)} = N_k^{(\Sigma)} = \text{const}, \qquad E_k = \sum_{m=1}^{M}N_k^{(m)}\varepsilon_k = N_k^{(\Sigma)}\varepsilon_k = \text{const}, \qquad (2.120)$$


so that an ensemble of many such composite systems (with the same k), in equilibrium, is 
microcanonical. 
Fig. 2.16. A composite system of $N_k^{(\Sigma)}$ particles in the $k$-th quantum state, distributed between $M$ component systems.


According to Eq. (24a), the average entropy $S_k$ per component system in this microcanonical ensemble may be calculated as

$$S_k = \lim_{M\to\infty}\frac{\ln M_k}{M}, \qquad (2.121)$$

where $M_k$ is the number of possible different ways such a composite system (with fixed $N_k^{(\Sigma)}$) may be implemented. Let us start the calculation of $M_k$ for Fermi particles, for which the Pauli principle is valid. Here the level occupancies $N_k^{(m)}$ may be only equal to either 0 or 1, so that the distribution problem is solvable only if $N_k^{(\Sigma)} \leq M$, and is evidently equivalent to the choice of $N_k^{(\Sigma)}$ balls (in arbitrary order) from the total number of $M$ distinct balls. Comparing this formulation with the definition of the binomial coefficient,⁶⁷ we immediately get

$$M_k = C_M^{N_k^{(\Sigma)}} \equiv \frac{M!}{\left(M - N_k^{(\Sigma)}\right)!\,N_k^{(\Sigma)}!}. \qquad (2.122)$$

From here, using the Stirling formula (again, in its simplest form (27)), we get the following expression for the entropy of fermions:

$$S_k = -\langle N_k\rangle\ln\langle N_k\rangle - \left(1 - \langle N_k\rangle\right)\ln\left(1 - \langle N_k\rangle\right), \qquad (2.123)$$

where

$$\langle N_k\rangle \equiv \lim_{M\to\infty}\frac{N_k^{(\Sigma)}}{M} \qquad (2.124)$$


is exactly the average occupancy of the $k$-th single-particle state in each system, which was discussed earlier in this section. Since for a Fermi system, $\langle N_k\rangle$ is always somewhere between 0 and 1, its entropy (123) is always positive.
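The Stirling-formula step from Eq. (122) to Eq. (123) can be checked directly. The sketch below evaluates $\ln M_k$ exactly (via `math.lgamma`, to avoid gigantic integers) for growing $M$ at a fixed ratio $N_k^{(\Sigma)}/M$, and compares $\ln M_k/M$ with Eq. (123). The chosen occupancy 0.3 and the function names are arbitrary; the residual difference decreases roughly as $(\ln M)/M$.

```python
import math

def ln_binomial(n, k):
    """ln C(n, k) computed with lgamma (lgamma(x + 1) = ln x!), usable for large n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def s_fermi(nk):
    """Eq. (123)."""
    return -nk * math.log(nk) - (1.0 - nk) * math.log(1.0 - nk)

nk = 0.3
for M in (10, 100, 10_000, 1_000_000):
    N_sigma = round(nk * M)                            # N_k^(Sigma), fixed in each composite system
    print(M, ln_binomial(M, N_sigma) / M, s_fermi(nk))   # Eqs. (121)-(122) vs. Eq. (123)
```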


In the Bose case, where the Pauli principle is not valid, the number $N_k^{(m)}$ of particles on the $k$-th energy level in each of the systems is an arbitrary (non-negative) integer. Let us consider $N_k^{(\Sigma)}$ particles and $(M - 1)$ partitions (shown by vertical lines in Fig. 16) between $M$ systems as $(M - 1 + N_k^{(\Sigma)})$ mathematical objects ordered along one axis. Each specific location of the partitions evidently fixes all $N_k^{(m)}$. Hence $M_k$ may be calculated as the number of possible ways to distribute the $(M - 1)$ indistinguishable partitions among these $(M - 1 + N_k^{(\Sigma)})$ ordered objects, i.e. as the following binomial coefficient:⁶⁸


67 See, e.g., MA Eq. (2.2). 
68 See also MA Eq. (2.4). 
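$$M_k = C_{M-1+N_k^{(\Sigma)}}^{M-1} \equiv \frac{\left(M - 1 + N_k^{(\Sigma)}\right)!}{(M - 1)!\,N_k^{(\Sigma)}!}. \qquad (2.125)$$

Applying the Stirling formula (27) again, we get the following result,

$$S_k = -\langle N_k\rangle\ln\langle N_k\rangle + \left(1 + \langle N_k\rangle\right)\ln\left(1 + \langle N_k\rangle\right), \qquad (2.126)$$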


which again differs from the Fermi case (123) “only” by the signs in the second term, and is valid for any positive $\langle N_k\rangle$.
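The same kind of check works for bosons. The sketch below (hypothetical names; the occupancy 2.5 is chosen above 1 on purpose, since only bosons allow it) computes $\ln M_k$ from Eq. (125) with `math.lgamma` and compares $\ln M_k/M$ with Eq. (126). As a small sanity case, 3 particles shared between $M = 2$ systems can be arranged in $C_4^1 = 4$ ways.

```python
import math

def ln_bose_count(M, N):
    """ln M_k from Eq. (125): ln[(M - 1 + N)! / ((M - 1)! N!)], using lgamma(x + 1) = ln x!."""
    return math.lgamma(M + N) - math.lgamma(M) - math.lgamma(N + 1)

def s_bose(nk):
    """Eq. (126)."""
    return -nk * math.log(nk) + (1.0 + nk) * math.log1p(nk)

nk = 2.5          # an occupancy above 1, which is allowed for bosons only
for M in (10, 100, 10_000, 1_000_000):
    print(M, ln_bose_count(M, round(nk * M)) / M, s_bose(nk))
```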


Expressions (123) and (126) are valid for an arbitrary (possibly non-equilibrium) case; they may also be used for an alternative derivation of the Fermi-Dirac (115) and Bose-Einstein (118) distributions, which are valid only in equilibrium. For that, we may use the method of Lagrange multipliers, requiring (just as was done in Sec. 2) the total entropy of a system of $N$ independent, similar particles,

$$S = \sum_kS_k, \qquad (2.127)$$

considered as a function of state occupancies $\langle N_k\rangle$, to attain its maximum, under the conditions of the fixed total number of particles $N$ and total energy $E$:

$$\sum_k\langle N_k\rangle = N = \text{const}, \qquad \sum_k\langle N_k\rangle\varepsilon_k = E = \text{const}. \qquad (2.128)$$


The completion of this calculation is left for the reader’s exercise. 


In the classical limit, when the average occupancies $\langle N_k\rangle$ of all states are small, the Fermi and Bose expressions for $S_k$ tend to the same limit

$$S_k = -\langle N_k\rangle\ln\langle N_k\rangle, \quad \text{for } \langle N_k\rangle \ll 1. \qquad (2.129)$$


This expression, frequently referred to as the Boltzmann (or “classical”) entropy, might also be obtained, for arbitrary $\langle N_k\rangle$, directly from the functionally similar Eq. (29), by considering an ensemble of systems, each consisting of just one classical particle, so that $E_m \to \varepsilon_k$ and $W_m \to \langle N_k\rangle$. Let me emphasize again that for indistinguishable particles, such identification is generally (i.e. at $\langle N_k\rangle \sim 1$) illegitimate even if the particles do not interact explicitly. As we will see in the next chapter, indistinguishability may affect the statistical properties of identical particles even in the classical limit.


2.9. Exercise problems 


2.1. A famous example of macroscopic irreversibility was suggested in 1907 by P. Ehrenfest.⁶⁹ Two dogs share $2N \gg 1$ fleas. Each flea may jump onto another dog, and the rate $\Gamma$ of such events (i.e. the probability of jumping per unit time) does not depend either on time or on the location of other fleas.
Find the time evolution of the average number of fleas on a dog, and of the flea-related part of the total 
dogs’ entropy (at arbitrary initial conditions), and prove that the entropy can only grow. 


69 This is essentially a simpler (and funnier :-) version of the particle scattering model used by L. Boltzmann to 
prove his famous H-theorem (1872). Besides the historic significance of that theorem, the model used in it (see 
Sec. 6.2 below) is as cartoonish, and not more general. 
2.2. Use the microcanonical distribution to calculate thermodynamic properties (including the entropy, all relevant thermodynamic potentials, and the heat capacity) of a two-level system in thermodynamic equilibrium with its environment, at temperature $T$ that is comparable with the energy gap $\Delta$. For each variable, sketch its temperature dependence, and find its asymptotic values (or trends) in the low-temperature and high-temperature limits.
Hint: The two-level system is any quantum system with just two different stationary states, whose energies (say, $E_0$ and $E_1$) are separated by a gap $\Delta \equiv E_1 - E_0$. Its most popular (but by no means the only!) example is the spin-½ of a particle, e.g., an electron, in an external magnetic field.⁷⁰


2.3. Solve the previous problem using the Gibbs distribution. Also, calculate the probabilities of 
the energy level occupation, and give physical interpretations of your results, in both temperature limits. 


2.4. Calculate the low-field magnetic susceptibility χ of a quantum spin-½ particle with a
gyromagnetic ratio γ, in thermal equilibrium with an environment at temperature T, neglecting its orbital
motion. Compare the result with that for a classical spontaneous magnetic dipole m of a fixed
magnitude m_0, free to change its direction in space.

Hint: The low-field magnetic susceptibility of a single particle is defined71 as

$$\chi \equiv \left.\frac{\partial \langle m_z \rangle}{\partial \mathcal{H}}\right|_{\mathcal{H}\to 0},$$

where the z-axis is aligned with the direction of the external magnetic field ℋ.


2.5. Calculate the low-field magnetic susceptibility of a particle with an arbitrary (either integer 
or semi-integer) spin s, neglecting its orbital motion. Compare the result with the solution of the 
previous problem. 


Hint: Quantum mechanics72 tells us that the Cartesian component m_z of the magnetic moment of
such a particle, in the direction of the applied field, has (2s + 1) stationary values:

$$m_z = \gamma\hbar m_s, \quad \text{with } m_s = -s, -s+1, \ldots, s-1, s,$$

where γ is the gyromagnetic ratio of the particle, and ħ is Planck’s constant.


2.6. Analyze the possibility of using a system of non-interacting spin-½ particles, placed into a
strong, controllable external magnetic field, for refrigeration.


2.7. The rudimentary “zipper” model of DNA replication is a
chain of N links that may be either open or closed (see the figure on the
right). Opening a link increases the system’s energy by Δ > 0; a link may
change its state (either open or closed) only if all links to the left of it are


70 See, e.g., QM Secs. 4.6 and 5.1, in particular Eq. (4.167).
71 This “atomic” (or “molecular”) susceptibility should be distinguished from the “volumic” susceptibility χ_m ≡
∂M_z/∂ℋ, where M is the magnetization, i.e. the magnetic moment of a unit volume of a system; see, e.g., EM
Eq. (5.111). For a uniform medium with n ≡ N/V non-interacting dipoles per unit volume, χ_m = nχ.
72 See, e.g., QM Sec. 5.7, in particular Eq. (5.169).
open, while those on the right of it are closed. Calculate the average number of open links at thermal
equilibrium, and analyze its temperature dependence, especially for the case N >> 1.


2.8. Use the microcanonical distribution to calculate the average entropy, energy, and pressure of
a classical particle of mass m, with no internal degrees of freedom, free to move in volume V, at
temperature T.


Hint: Try to make a more accurate calculation than has been done in Sec. 2.2 for the system of N
harmonic oscillators. For that, you will need to know the volume V_d of a d-dimensional hypersphere of
the unit radius. To avoid being too cruel, I am giving it to you:

$$V_d = \pi^{d/2}\Big/\Gamma\left(\frac{d}{2}+1\right),$$

where Γ(ξ) is the gamma function.73


2.9. Solve the previous problem starting from the Gibbs distribution.


2.10. Calculate the average energy, entropy, free energy, and the equation of state of a classical
2D particle (without internal degrees of freedom), free to move within area A, at temperature T, starting
from:


(i) the microcanonical distribution, and 
(ii) the Gibbs distribution. 


Hint: For the equation of state, make the appropriate modification of the notion of pressure. 


2.11. A quantum particle of mass m is confined to free motion along a 1D segment of length a. 
Using any approach you like, calculate the average force the particle exerts on the “walls” (ends) of such 
“1D potential well” in thermal equilibrium, and analyze its temperature dependence, focusing on the 
low-temperature and high-temperature limits. 


Hint: You may consider the series

$$\theta(\xi) \equiv \sum_{n=1}^{\infty}\exp\{-\xi n^2\}$$

a known function of ξ.74
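For orientation in Problem 2.11, the function θ(ξ) from the hint may be evaluated numerically and compared with its two leading asymptotes: θ(ξ) ≈ (π/ξ)^{1/2}/2 − 1/2 at ξ → 0 (this follows from the Poisson summation formula) and θ(ξ) ≈ exp{−ξ} at ξ → ∞. A minimal sketch; the function names are hypothetical.

```python
import math

def theta(xi: float, n_max: int = 100_000) -> float:
    """Partial sum of theta(xi) = sum_{n>=1} exp(-xi n^2), valid for xi > 0."""
    if xi <= 0:
        raise ValueError("the series converges only for xi > 0")
    total = 0.0
    for n in range(1, n_max + 1):
        term = math.exp(-xi * n * n)
        total += term
        if term < 1e-17 * total:  # the remaining tail is negligible: stop summing
            break
    return total

def theta_small_xi(xi: float) -> float:
    """Leading asymptote at xi -> 0 (from the Poisson summation formula)."""
    return 0.5 * math.sqrt(math.pi / xi) - 0.5

def theta_large_xi(xi: float) -> float:
    """Leading asymptote at xi -> infinity: only the n = 1 term survives."""
    return math.exp(-xi)

for xi in (0.01, 0.1, 1.0, 5.0):
    print(f"xi = {xi:5}: theta = {theta(xi):.6g}, "
          f"small-xi form = {theta_small_xi(xi):.6g}, large-xi form = {theta_large_xi(xi):.6g}")
```

Which of the two limits corresponds to low or high temperature is part of the problem itself.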


2.12.* Rotational properties of diatomic molecules (such as N2, CO, etc.) may be reasonably well
described by the so-called dumbbell model: two point particles, of masses m_1 and m_2, with a fixed
distance d between them. Ignoring the translational motion of the molecule as a whole, use this model
to calculate its heat capacity, and spell out the result in the limits of low and high temperatures. Discuss
whether your solution is valid for the so-called homonuclear molecules, consisting of two similar atoms,
such as H2, O2, N2, etc.


2.13. Calculate the heat capacity of a heteronuclear diatomic molecule, using the simple model
described in the previous problem, but now assuming that the rotation is confined to one plane.75


73 For its definition and main properties, see, e.g., MA Eqs. (6.6)-(6.9). 

74 It may be reduced to the so-called elliptic theta-function θ3(z, τ) for a particular case z = 0; see, e.g., Sec. 16.27
in the Abramowitz-Stegun handbook cited in MA Sec. 16(ii). However, you do not need that (or any other)
handbook to solve this problem. 

75 This is a reasonable model of the constraints imposed on small atomic groups (e.g., ligands) by their atomic 
environment inside some large molecules. 
2.14. A classical, rigid, strongly elongated body (such as a thin needle) is free to rotate about its
center of mass, and is in thermal equilibrium with its environment. Are the angular velocity vector ω
and the angular momentum vector L, on average, directed along the elongation axis of the body, or
normal to it?


2.15. Two similar classical electric dipoles, of a fixed magnitude d, are separated by a fixed
distance r. Assuming that each dipole moment d may take any spatial direction and that the system is in
thermal equilibrium, write the general expressions for its statistical sum Z, average interaction energy E,
heat capacity C, and entropy S, and calculate them explicitly in the high-temperature limit.


2.16. A classical 1D particle of mass m, residing in the potential well

$$U(x) = \alpha|x|^{\gamma}, \quad \text{with } \gamma > 0,$$

is in thermal equilibrium with its environment, at temperature T. Calculate the average values of its
potential energy U and the full energy E, using two approaches:

(i) directly from the Gibbs distribution, and
(ii) using the virial theorem of classical mechanics.76


2.17. For a thermally-equilibrium ensemble of slightly anharmonic classical 1D oscillators, with
mass m and potential energy

$$U(x) = \frac{\kappa}{2}x^2 + \alpha x^3,$$

with a small coefficient α, calculate ⟨x⟩ in the first approximation in low temperature T.


2.18.* A small conductor (in this context, usually called the
single-electron island) is placed between two conducting
electrodes, with voltage V applied between them. The gap between
one of the electrodes and the island is so narrow that electrons may
tunnel quantum-mechanically through this gap (the “weak tunnel
junction”); see the figure on the right. Calculate the average
charge of the island as a function of V at temperature T.

[Figure: voltage V; the “island” with charge Q = −ne; capacitance C0; tunnel junction.]

Hint: The quantum-mechanical tunneling of an electron
through a weak junction77 between two macroscopic conductors, and its subsequent energy relaxation,
may be considered as a single inelastic (energy-dissipating) event, so that the only energy relevant for
the thermal equilibrium of the system is its electrostatic potential energy.


2.19. An LC circuit (see the figure on the right) is in thermodynamic
equilibrium with its environment. Calculate the r.m.s. fluctuation δV ≡ ⟨Ṽ²⟩^{1/2}


76 See, e.g., CM Problem 1.12. 

77 In this particular context, the adjective “weak” denotes a junction with the tunneling transparency so low that
the tunneling electron’s wavefunction loses its quantum-mechanical coherence before the electron has a chance to
tunnel back. In a typical junction of a macroscopic area this condition is fulfilled if its effective resistance is much
higher than the quantum unit of resistance (see, e.g., QM Sec. 3.2), R_Q ≡ πħ/2e² ≈ 6.5 kΩ.
of the voltage across it, for an arbitrary ratio T/ħω, where ω = (LC)^{−1/2} is the resonance frequency of this
“tank circuit”.
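The value of the quantum unit of resistance quoted in footnote 77 follows directly from the SI values of ħ and e; a minimal check:

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J*s
E_CHARGE = 1.602176634e-19  # elementary charge, C

r_q = math.pi * HBAR / (2 * E_CHARGE ** 2)   # R_Q = pi*hbar/2e^2, i.e. h/4e^2
print(f"R_Q = {r_q / 1e3:.2f} kOhm")
```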


2.20. Derive Eq. (92) from simplistic arguments, representing the blackbody radiation as an ideal 
gas of photons treated as classical ultra-relativistic particles. What do similar arguments give for an 
ideal gas of classical but non-relativistic particles? 


2.21. Calculate the enthalpy, the entropy, and the Gibbs energy of blackbody electromagnetic
radiation with temperature T inside volume V, and then use these results to find the law of temperature
and pressure drop at an adiabatic expansion.


2.22. As was mentioned in Sec. 6(i), the relation between the temperatures T_⊙ of the visible
Sun’s surface and that (T_⊕) of the Earth’s surface follows from the balance of the thermal radiation they
emit. Prove that the experimentally observed relation indeed follows, with good precision, from a simple
model in which the surfaces radiate as perfect black bodies with constant temperatures.


Hint: You may pick up the experimental values you need from any (reliable :-) source.


2.23. If a surface is not perfectly radiation-absorbing (“black”), the electromagnetic power of its
thermal radiation differs from the Planck radiation law by a frequency-dependent factor ε < 1, called the
emissivity. Prove that such a surface reflects the (1 − ε) fraction of the incident radiation.


2.24. If two black surfaces, facing each other, have different
temperatures (see the figure on the right), then according to the Stefan
radiation law (89), there is a net flow of thermal radiation, from a warmer
surface to the colder one:

$$\frac{\mathcal{P}_\text{net}}{A} = \sigma\left(T_1^4 - T_2^4\right).$$


For many applications, notably including most low-temperature experiments, this flow is detrimental.
One way to suppress it is to reduce the emissivity ε (for its definition, see the previous problem) of both
surfaces, say by covering them with shiny metallic films. An alternative way toward the same goal is to
place, between the surfaces, a thin layer (usually called the thermal shield), with a low emissivity of
both surfaces (see the dashed line in the figure above). Assuming that the emissivity is the same in both cases,
find out which way is more efficient.


2.25. Two parallel, well-conducting plates of area A are separated by a free-space gap of a
constant thickness t << A^{1/2}. Calculate the energy of the thermally-induced electromagnetic field inside
the gap at thermal equilibrium with temperature T in the range

$$\frac{\hbar c}{A^{1/2}} \ll T \ll \frac{\hbar c}{t}.$$

Does the field push the plates apart?


2.26. Use the Debye theory to estimate the specific heat of aluminum at room temperature (say, 
300 K), and express the result in the following popular units: 
(i) eV/K per atom,
(ii) J/K per mole, and
(iii) J/K per gram.


Compare the last number with the experimental value (from a reliable book or online source). 


2.27. Low-temperature specific heat of some solids has a considerable contribution from thermal
excitation of spin waves, whose dispersion law scales as ω ∝ k² at ω → 0.78 Neglecting anisotropy,
calculate the temperature dependence of this contribution to C_V at low temperatures, and discuss
conditions of its experimental observation.


Hint: Just as the photons and phonons discussed in Sec. 2.6, the quantum excitations of spin
waves (called magnons) may be considered as non-interacting bosonic quasiparticles with zero chemical
potential, whose statistics obeys Eq. (2.72).


2.28. Derive a general expression for the specific heat of a very
long, straight chain of similar particles of mass m, confined to move only in
the direction of the chain, and elastically interacting with effective spring
constants κ (see the figure on the right). Spell out the result in the limits of very low and very high
temperatures.


Hint: You may like to use the following integral:79

$$\int_0^{+\infty}\frac{\xi^2\,d\xi}{\sinh^2\xi} = \frac{\pi^2}{6}.$$
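The integral in the hint to Problem 2.28 may be checked numerically. The sketch below uses a hand-written composite Simpson rule (to stay dependency-free; the helper names are hypothetical) and truncates the integration at ξ = 40, where the integrand, decaying as 4ξ²e^{−2ξ}, is negligible.

```python
import math

def integrand(xi: float) -> float:
    """xi^2 / sinh^2(xi), continued by its limit 1 at xi = 0."""
    if xi == 0.0:
        return 1.0
    return (xi / math.sinh(xi)) ** 2

def simpson(f, a: float, b: float, n: int) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

value = simpson(integrand, 0.0, 40.0, 4000)   # the tail beyond xi = 40 is ~e^-80
print(value, math.pi ** 2 / 6)
```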


2.29. Calculate the r.m.s. thermal fluctuation of the middle point of a uniform guitar string of
length l, stretched by force 𝒯, at temperature T. Evaluate your result for l = 0.7 m, 𝒯 = 10³ N, and room
temperature.
Hint: You may like to use the following series:

$$1 + \frac{1}{3^2} + \frac{1}{5^2} + \ldots \equiv \sum_{m=0}^{\infty}\frac{1}{(2m+1)^2} = \frac{\pi^2}{8}.$$
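The series in the hint to Problem 2.29 converges slowly: the tail after M terms is close to 1/4M. The following sketch (helper name hypothetical) sums the first M terms and shows how much the simple tail estimate improves the result.

```python
import math

M = 100_000   # number of terms kept

def odd_inverse_squares(m_terms: int) -> float:
    """Partial sum 1 + 1/3^2 + 1/5^2 + ... with m_terms terms (m = 0 .. m_terms-1)."""
    return sum(1.0 / (2 * m + 1) ** 2 for m in range(m_terms))

partial = odd_inverse_squares(M)
corrected = partial + 1.0 / (4 * M)   # tail estimate: sum_{m>=M} 1/(2m+1)^2 ~ 1/(4M)
print(partial, corrected, math.pi ** 2 / 8)
```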


2.30. Use the general Eq. (123) to re-derive the Fermi-Dirac distribution (115) for a system in 
equilibrium. 


2.31. Each of two identical particles, not interacting directly, may be in any of two quantum
states, with single-particle energies ε equal to 0 and Δ. Write down the statistical sum Z of the system,
and use it to calculate its average total energy E at temperature T, for the cases when the particles are:


(i) distinguishable (say, by their positions);
(ii) indistinguishable fermions;
(iii) indistinguishable bosons.


Analyze and interpret the temperature dependence of ⟨E⟩ for each case, assuming that Δ > 0.


2.32. Calculate the chemical potential of a system of N >> 1 independent fermions, kept at a
fixed temperature T, if each particle has two non-degenerate energy levels separated by gap Δ.


78 Note that the same dispersion law is typical for bending waves in thin elastic rods — see, e.g., CM Sec. 7.8. 
79 It may be reduced, via integration by parts, to the table integral MA Eq. (6.8d) with n= 1. 
Chapter 3. Ideal and Not-So-Ideal Gases


In this chapter, the general principles of thermodynamics and statistics, discussed in the previous two 
chapters, are applied to examine the basic physical properties of gases, i.e. collections of identical 
particles (for example, atoms or molecules) that are free to move inside a certain volume, either not 
interacting or weakly interacting with each other. We will see that due to the quantum statistics, 
properties of even the simplest, so-called ideal gases, with negligible direct interactions between 
particles, may be highly nontrivial. 


3.1. Ideal classical gas 


Direct interactions of typical atoms and molecules are well localized, i.e. rapidly decreasing with
distance r between them and becoming negligible at a certain distance r_0. In a gas of N particles inside
volume V, the average distance r_ave between the particles is (V/N)^{1/3}. As a result, if the gas density
n ≡ N/V = (r_ave)^{−3} is much lower than r_0^{−3}, i.e. if nr_0³ << 1, the chance for its particles to approach
each other and interact is rather small. The model in which such direct interactions are completely ignored is called
the ideal gas.
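To put numbers on the condition nr_0³ << 1, here is a rough estimate for air-like conditions. It assumes P = 10⁵ Pa, T_K = 300 K, and a hypothetical interaction range r_0 ≈ 0.3 nm; the density is taken from the ideal-gas equation of state P = nT, which is derived later in this section.

```python
K_B = 1.380649e-23   # Boltzmann constant, J/K
T = K_B * 300.0      # room temperature in energy units, J
P = 1.0e5            # pressure, Pa (about 1 atm; illustrative)
r0 = 3e-10           # hypothetical interaction range r_0 ~ 0.3 nm (assumed)

n_density = P / T                  # n = P/T, ideal-gas equation of state
r_ave = n_density ** (-1.0 / 3.0)  # average interparticle distance (V/N)^(1/3)
print(f"n = {n_density:.2e} m^-3, r_ave = {r_ave * 1e9:.2f} nm, "
      f"n*r0^3 = {n_density * r0 ** 3:.1e}")
```

With these assumptions the average distance is roughly ten times larger than r_0, so nr_0³ is of the order of 10⁻³.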


Let us start with a classical ideal gas, which may be defined as the ideal gas in whose behavior
the quantum effects are also negligible. As was discussed in Sec. 2.8, the condition of that is to have the
average occupancy of each quantum state low:

$$\langle N_k \rangle \ll 1. \tag{3.1}$$


It may seem that we have already found all properties of such a system, in particular the equilibrium
occupancy of its states; see Eq. (2.111):

$$\langle N_k \rangle = \text{const}\times\exp\left\{-\frac{\varepsilon_k}{T}\right\}. \tag{3.2}$$


In some sense this is true, but we still need, first, to see what exactly Eq. (2) means for the gas, a system 
with an essentially continuous energy spectrum, and, second, to show that, rather surprisingly, the 
particles’ indistinguishability affects some properties of even classical gases. 


The first of these tasks is evidently easiest for a gas out of any external fields, and with no internal
degrees of freedom.1 In this case, ε_k is just the kinetic energy of the particle, which is an isotropic and
parabolic function of p:

$$\varepsilon_k = \frac{p^2}{2m} = \frac{p_1^2 + p_2^2 + p_3^2}{2m}. \tag{3.3}$$


Now we have to use two facts from other fields of physics, hopefully well known to the reader. First, in 
quantum mechanics, the linear momentum p is associated with the wavevector k of the de Broglie wave, 


1 In more realistic cases when particles do have internal degrees of freedom, but they are all in a certain (say,
ground) quantum state, Eq. (3) is valid as well, with ε_k referred to the internal ground-state energy. The effect of
thermal excitation of the internal degrees of freedom will be briefly discussed at the end of this section.
p = ħk. Second, the eigenvalues of k for any waves (including the de Broglie waves) in free space are
uniformly distributed in the momentum space, with a constant density of states, given by Eq. (2.82):

$$\frac{dN_\text{states}}{d^3k} = \frac{gV}{(2\pi)^3}, \qquad \frac{dN_\text{states}}{d^3p} = \frac{gV}{(2\pi\hbar)^3}, \tag{3.4}$$

where g is the degeneracy of the particle’s internal states (for example, for all spin-½ particles, the spin
degeneracy g = 2s + 1 = 2). Even regardless of the exact proportionality coefficient between dN_states and
d³p, the very fact that this coefficient does not depend on p means that the probability dW to find the
particle in a small region d³p = dp_1dp_2dp_3 of the momentum space is proportional to the right-hand side
of Eq. (2), with ε_k given by Eq. (3):


$$dW = C\exp\left\{-\frac{p^2}{2mT}\right\}d^3p \equiv C\exp\left\{-\frac{p_1^2 + p_2^2 + p_3^2}{2mT}\right\}dp_1\,dp_2\,dp_3. \tag{3.5}$$


This is the famous Maxwell distribution.2 The normalization constant C may be readily found
from the last form of Eq. (5), by requiring the integral of dW over all the momentum space to equal 1.
Indeed, the integral is evidently a product of three similar 1D integrals over each Cartesian component p_j
of the momentum (j = 1, 2, 3), which may be readily reduced to the well-known dimensionless Gaussian
integral,3 so that we get

$$C = \left[\int_{-\infty}^{+\infty}\exp\left\{-\frac{p_j^2}{2mT}\right\}dp_j\right]^{-3} = \left[(2mT)^{1/2}\int_{-\infty}^{+\infty}\exp\{-\xi^2\}\,d\xi\right]^{-3} = (2\pi mT)^{-3/2}. \tag{3.6}$$
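Eq. (6) is easy to check numerically. The sketch below evaluates the 1D Gaussian integral with the trapezoidal rule (very accurate for such smooth, fast-decaying integrands), using arbitrary illustrative values m = 2 and T = 0.5 in units with k_B = 1, and compares the resulting C with (2πmT)^{−3/2}. The helper names are hypothetical.

```python
import math

def trapezoid(f, a: float, b: float, n: int) -> float:
    """Composite trapezoidal rule with n subintervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, n)))

m, T = 2.0, 0.5                      # arbitrary illustrative values (k_B = 1)
p_max = 12 * math.sqrt(2 * m * T)    # exp{-p^2/2mT} ~ e^-144 at the cutoff

def one_d_factor(p: float) -> float:
    """The 1D factor exp{-p_j^2/2mT} of the Maxwell distribution (5)."""
    return math.exp(-p * p / (2 * m * T))

integral_1d = trapezoid(one_d_factor, -p_max, p_max, 2000)
C_numeric = integral_1d ** -3        # Eq. (3.6): inverse cube of the 1D integral
C_exact = (2 * math.pi * m * T) ** -1.5
print(C_numeric, C_exact)
```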


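As a sanity check, let us use the Maxwell distribution to calculate the average energy
corresponding to each half-degree of freedom:

$$\left\langle \frac{p_j^2}{2m} \right\rangle = \int \frac{p_j^2}{2m}\,dW = C^{1/3}\int_{-\infty}^{+\infty}\frac{p_j^2}{2m}\exp\left\{-\frac{p_j^2}{2mT}\right\}dp_j = \frac{T}{\pi^{1/2}}\int_{-\infty}^{+\infty}\xi^2\exp\{-\xi^2\}\,d\xi. \tag{3.7}$$

The last, dimensionless integral equals √π/2,4 so that, finally,

$$\left\langle \frac{p_j^2}{2m} \right\rangle \equiv \left\langle \frac{mv_j^2}{2} \right\rangle = \frac{T}{2}. \tag{3.8}$$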


2 This formula had been suggested by J. C. Maxwell as early as 1860, i.e. well before the Boltzmann and Gibbs
distributions were developed. Note also that the term “Maxwell distribution” is often associated with the
distribution of the particle momentum (or velocity) magnitude,

$$dW = 4\pi C p^2\exp\left\{-\frac{p^2}{2mT}\right\}dp \equiv 4\pi C m^3 v^2\exp\left\{-\frac{mv^2}{2T}\right\}dv, \quad \text{with } 0 \le p, v < \infty,$$

which immediately follows from the first form of Eq. (5), combined with the expression d³p = 4πp²dp due to the
spherical symmetry of the distribution in the momentum/velocity space.

3 See, e.g., MA Eq. (6.9b).

4 See, e.g., MA Eq. (6.9c).
This result is (fortunately :-) in agreement with the equipartition theorem (2.48). It also means
that the r.m.s. velocity of each particle is

$$\delta v \equiv \langle v^2 \rangle^{1/2} = \left\langle \sum_{j=1}^{3} v_j^2 \right\rangle^{1/2} = \left(3\langle v_j^2 \rangle\right)^{1/2} = \left(\frac{3T}{m}\right)^{1/2}. \tag{3.9}$$

For a typical gas (say, for N2, the air’s main component), with m ≈ 28m_p ≈ 4.7×10⁻²⁶ kg, this velocity, at
room temperature (T = k_B T_K ≈ k_B × 300 K ≈ 4.1×10⁻²¹ J), is about 500 m/s, comparable with the sound
velocity in the same gas, and even with the muzzle velocity of a typical handgun bullet. Still, it is
measurable using even the simple table-top equipment (say, a set of two concentric, rapidly rotating
cylinders with a thin slit collimating an atomic beam emitted at the axis) that was available at the end of
the 19th century. Experiments using such equipment gave convincing early confirmations of the
Maxwell distribution.
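Eq. (9) for N2, with the same estimates as used in the text (m ≈ 28m_p, T_K = 300 K), can be evaluated in a few lines:

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
M_P = 1.67262192e-27  # proton mass, kg

m_n2 = 28 * M_P                  # the text's estimate m ~ 28 m_p for N2
T = K_B * 300.0                  # room temperature in energy units, J
v_rms = math.sqrt(3 * T / m_n2)  # Eq. (3.9)
print(f"m = {m_n2:.1e} kg, T = {T:.1e} J, v_rms = {v_rms:.0f} m/s")
```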


This is all very simple (isn’t it?), but actually the thermodynamic properties of a classical gas,
especially its entropy, are more intricate. To show that, let us apply the Gibbs distribution to a gas
portion consisting of N particles, rather than just one of them. If the particles are exactly similar, the
eigenenergy spectrum {ε_k} of each of them is also exactly the same, and each value E_m of the total
energy is just the sum of particular energies ε_{k(l)} of the particles, where k(l), with l = 1, 2, ... N, is the
number of the energy level on which the l-th particle resides. Moreover, since the gas is classical, ⟨N_k⟩
<< 1, the probability of having two or more particles in any state may be ignored. As a result, we can
use Eq. (2.59) to write

$$Z \equiv \sum_m \exp\left\{-\frac{E_m}{T}\right\} = \sum_{k(1),\,k(2),\ldots,\,k(N)} \exp\left\{-\frac{\varepsilon_{k(1)} + \varepsilon_{k(2)} + \ldots + \varepsilon_{k(N)}}{T}\right\}, \tag{3.10}$$


where the summation has to be carried over all possible states of each particle. Since the summation 
over each set {A(/)} concerns only one of the operands of the product of exponents under the sum, it is 
tempting to complete the calculation as follows: 


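$$Z \to Z_\text{dist} = \sum_{k(1)} \exp\left\{-\frac{\varepsilon_{k(1)}}{T}\right\} \sum_{k(2)} \exp\left\{-\frac{\varepsilon_{k(2)}}{T}\right\} \ldots \sum_{k(N)} \exp\left\{-\frac{\varepsilon_{k(N)}}{T}\right\} \equiv \left[\sum_k \exp\left\{-\frac{\varepsilon_k}{T}\right\}\right]^N, \tag{3.11}$$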


where the final summation is over all states of one particle. This formula is indeed valid for
distinguishable particles.5 However, if the particles are indistinguishable (again, meaning that they are
internally identical and free to move within the same spatial region), Eq. (11) has to be modified by
what is called the correct Boltzmann counting:

$$Z = \frac{1}{N!}\left[\sum_k \exp\left\{-\frac{\varepsilon_k}{T}\right\}\right]^N, \tag{3.12}$$

which considers all quantum states that differ only by particle permutations as the same state.
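The correct Boltzmann counting (12) may be tested in a toy setting. The sketch below assumes a hypothetical equidistant one-particle spectrum ε_k = 10⁻³kT and N = 3 particles, and compares Eq. (12) with the exact sum over all sets of distinct one-particle states (no state occupied twice, as assumed above for a classical gas). As the number of accessible states grows, i.e. as the average occupancy of each state drops, the ratio of the two approaches 1. The function names are hypothetical.

```python
import math

def z_no_double_occupancy(weights, n):
    """Sum over all sets of n distinct one-particle states of the product of
    their Boltzmann factors (the elementary symmetric polynomial e_n)."""
    e = [1.0] + [0.0] * n          # e[j] accumulates e_j over the states seen so far
    for w in weights:
        for j in range(n, 0, -1):  # descending j: each state enters a set at most once
            e[j] += w * e[j - 1]
    return e[n]

def z_boltzmann(weights, n):
    """Correct Boltzmann counting, Eq. (3.12): (sum_k exp{-eps_k/T})^n / n!."""
    return sum(weights) ** n / math.factorial(n)

n_particles = 3
T = 1.0
ratios = {}
for n_states in (10, 100, 1_000, 10_000):
    # hypothetical equidistant spectrum eps_k = 1e-3 * k (in units of T)
    weights = [math.exp(-1e-3 * k / T) for k in range(n_states)]
    ratios[n_states] = (z_no_double_occupancy(weights, n_particles)
                        / z_boltzmann(weights, n_particles))
    print(n_states, round(ratios[n_states], 4))
```

The ratio stays below 1 because (Σ_k w_k)^N also counts the configurations with several particles in one state, which the exact sum excludes; their relative weight vanishes in the dilute limit ⟨N_k⟩ << 1.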


5 Since, by our initial assumption, each particle belongs to the same portion of gas, i.e. cannot be distinguished 
from others by its spatial position, this requires some internal “pencil mark” for each particle — for example, a 
specific structure or a specific quantum state of its internal degrees of freedom. 
Eq. (12) is valid for any set {ε_k} of eigenenergies. Now let us use it for the translational
3D motion of free particles, taking into account that the fundamental relation (4) implies the following
rule for the replacement of a sum over quantum states of such motion with an integral:6

$$\sum_k (\ldots) \to \int (\ldots)\,dN_\text{states} = \frac{gV}{(2\pi)^3}\int (\ldots)\,d^3k = \frac{gV}{(2\pi\hbar)^3}\int (\ldots)\,d^3p. \tag{3.13}$$


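In application to Eq. (12), this rule yields

$$Z = \frac{1}{N!}\left[\frac{gV}{(2\pi\hbar)^3}\int \exp\left\{-\frac{p^2}{2mT}\right\}d^3p\right]^N \equiv \frac{1}{N!}\left[\frac{gV}{(2\pi\hbar)^3}\left(\int_{-\infty}^{+\infty}\exp\left\{-\frac{p_j^2}{2mT}\right\}dp_j\right)^3\right]^N. \tag{3.14}$$

The integral in the parentheses is the same one as in Eq. (6), i.e. is equal to (2πmT)^{1/2}, so that finally

$$Z = \frac{1}{N!}\left[\frac{gV}{(2\pi\hbar)^3}(2\pi mT)^{3/2}\right]^N \equiv \frac{1}{N!}\left[gV\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}\right]^N. \tag{3.15}$$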


Now, assuming that N >> 1,7 and applying the Stirling formula, we can calculate the gas’ free energy:

$$F = T\ln\frac{1}{Z} = -NT\ln\frac{V}{N} + Nf(T), \tag{3.16a}$$

with

$$f(T) \equiv -T\left\{\ln\left[g\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}\right] + 1\right\}. \tag{3.16b}$$


The first of these relations exactly coincides with Eq. (1.45), which was derived in Sec. 1.4 from
the equation of state PV = NT, using thermodynamic identities. At that stage, this equation of state was
just postulated, but now we can derive it by calculating the pressure from the second of Eqs. (1.35), and
Eq. (16a):

$$P = -\left(\frac{\partial F}{\partial V}\right)_T = \frac{NT}{V}. \tag{3.17}$$

So, the equation of state of the ideal classical gas, with density n ≡ N/V, is indeed given by Eq. (1.44):

$$P = \frac{NT}{V} \equiv nT. \tag{3.18}$$
Hence we may use Eqs. (1.46)-(1.51), derived from this equation of state, to calculate all other
thermodynamic variables of the gas. For example, using Eq. (1.47) with f(T) given by Eq. (16b), for the
internal energy and the specific heat of the gas we immediately get


6 As a reminder, we have already used this rule (twice) in Sec. 2.6, with particular values of g. 

7 For the opposite limit when N = g = 1, Eq. (15) yields the results obtained, by two alternative methods, in the
solutions of Problems 2.8 and 2.9. Indeed, for N = 1, the “correct Boltzmann counting” factor N! equals 1, so that
the particle distinguishability effects vanish, naturally.
$$E = N\left[f(T) - T\frac{df(T)}{dT}\right] = \frac{3}{2}NT, \qquad c_V \equiv \frac{C_V}{N} = \frac{1}{N}\left(\frac{\partial E}{\partial T}\right)_V = \frac{3}{2}, \tag{3.19}$$

in full agreement with Eq. (8) and hence with the equipartition theorem.


Much less trivial is the result for entropy, which may be obtained by combining Eqs. (1.46) and
(16a):

$$S = -\left(\frac{\partial F}{\partial T}\right)_V = N\left[\ln\frac{V}{N} - \frac{df(T)}{dT}\right]. \tag{3.20}$$


This formula,8 in particular, provides the means to resolve the following gas mixing paradox (sometimes
called the “Gibbs paradox”). Consider two volumes, V_1 and V_2, separated by a partition, each filled with
the same gas, with the same density n, at the same temperature T, and hence with the same pressure P.
Now let us remove the partition and let the gas portions mix; would the total entropy change? According
to Eq. (20), it would not, because the ratio V/N = 1/n, and hence the expression in the square brackets, is
the same in the initial and the final state, so that the entropy is additive, as any extensive variable should
be. This makes full sense if the gas particles in both parts of the volume are truly identical, i.e. the
partition’s removal does not change our information about the system. However, let us assume that all
particles are distinguishable; then the entropy should clearly increase because the mixing would
decrease our information about the system, i.e. increase its disorder. A quantitative description of this
effect may be obtained using Eq. (11). Repeating for Z_dist the calculations made above for Z, we readily
get a different formula for entropy:

$$S_\text{dist} = N\left[\ln V - \frac{df_\text{dist}(T)}{dT}\right], \qquad f_\text{dist}(T) \equiv -T\ln\left[g\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}\right]. \tag{3.21}$$

Please notice that in contrast to the S given by Eq. (20), this entropy includes the term ln V
instead of ln(V/N), so that S_dist is not proportional to N (at fixed temperature T and density N/V). While
for distinguishable particles this fact does not present any conceptual problem, for indistinguishable
particles it would mean that entropy was not an extensive variable, i.e. would contradict the basic
assumptions of thermodynamics. This fact emphasizes again the necessity of the correct Boltzmann
counting in the latter case.
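Combining Eqs. (16b) and (20) gives S/N = ln[g(V/N)(mT/2πħ²)^{3/2}] + 5/2 (in units of k_B), the Sackur-Tetrode result mentioned in footnote 8. A minimal numerical sketch, assuming argon (a monatomic gas with g = 1) at T_K = 298.15 K and P = 1 bar, with V/N taken from the equation of state (18); the helper name is hypothetical.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
HBAR = 1.054571817e-34   # reduced Planck constant, J*s
N_A = 6.02214076e23      # Avogadro constant, 1/mol
AMU = 1.66053906660e-27  # atomic mass unit, kg

def entropy_per_particle(m: float, T: float, v_per_n: float, g: int = 1) -> float:
    """S/N in units of k_B, from Eqs. (3.16b) and (3.20):
    ln[g (V/N) (mT / 2 pi hbar^2)^(3/2)] + 5/2, with T in joules."""
    return math.log(g * v_per_n * (m * T / (2 * math.pi * HBAR ** 2)) ** 1.5) + 2.5

T = K_B * 298.15       # temperature in energy units, J
P = 1.0e5              # pressure, Pa (1 bar; assumed standard state)
v_per_n = T / P        # V/N from the equation of state (3.18)
s = entropy_per_particle(39.948 * AMU, T, v_per_n)   # argon, g = 1
molar_s = s * K_B * N_A                               # J/(mol K)
print(f"S/N = {s:.2f} k_B, molar entropy = {molar_s:.1f} J/(mol K)")
```

The result, about 155 J/(mol·K), is close to the tabulated standard molar entropy of argon. Without the 1/N! factor (i.e. with ln V instead of ln(V/N)), the same calculation would give a meaningless, size-dependent number.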


Using Eq. (21), we can calculate the change of entropy due to mixing two gas portions, with N_1
and N_2 distinguishable particles, at a fixed temperature T (and hence at unchanged function f_dist):

$$\Delta S_\text{dist} = (N_1+N_2)\ln(V_1+V_2) - (N_1\ln V_1 + N_2\ln V_2) \equiv N_1\ln\frac{V_1+V_2}{V_1} + N_2\ln\frac{V_1+V_2}{V_2} > 0. \tag{3.22}$$

Note that for a particular case, V_1 = V_2 = V/2, Eq. (22) reduces to the simple result, ΔS_dist = (N_1 + N_2) ln 2,
which may be readily understood in terms of the information theory. Indeed, allowing each particle of
the total number N = N_1 + N_2 to spread to a twice larger volume, we lose one bit of information per
particle, i.e. ΔI = (N_1 + N_2) bits for the whole system. Let me leave it for the reader to show that Eq. (22)
is also valid if particles in each sub-volume are indistinguishable from each other, but different from


8 The result represented by Eq. (20), with the function f given by Eq. (16b), was obtained independently by O. 
Sackur and H. Tetrode as early as in 1911, i.e. well before the final formulation of quantum mechanics in the late 
1920s. 
those in another sub-volume, i.e. for mixing of two different gases.9 However, it is certainly not
applicable to the system where all particles are identical, stressing again that the correct Boltzmann
counting (12) does indeed affect the gas entropy, even though it may not be as consequential as the
Maxwell distribution (5), the equation of state (18), and the average energy (19).
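Eq. (22) is easy to evaluate directly. The helper below (a hypothetical name) returns ΔS_dist in units of k_B, i.e. in nats, and converts it to bits, reproducing the one-bit-per-particle result for V_1 = V_2.

```python
import math

def mixing_entropy_dist(n1: float, v1: float, n2: float, v2: float) -> float:
    """Eq. (3.22): entropy change, in units of k_B, after mixing two portions of
    distinguishable particles at fixed temperature."""
    v = v1 + v2
    return n1 * math.log(v / v1) + n2 * math.log(v / v2)

n1 = n2 = 1000
ds = mixing_entropy_dist(n1, 0.5, n2, 0.5)   # equal sub-volumes, V1 = V2 = V/2
bits = ds / math.log(2)                       # nats -> bits
print(f"dS = {ds:.3f} k_B = {bits:.1f} bits for {n1 + n2} particles")
```

Only the volume ratios enter, so the units of V_1 and V_2 are irrelevant.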


In this context, one may wonder whether the change (22) (called the mixing entropy) is
experimentally observable. The answer is yes. For example, after free mixing of two different gases, and
hence boosting their total entropy by ΔS_dist, one can use a thin movable membrane that is
semipermeable, i.e. whose pores are penetrable for particles of one type only, to separate them again,
thus reducing the entropy back to the initial value, and measure either the necessary mechanical work
ΔW = TΔS_dist, or the corresponding heat discharge into the heat bath. Practically, measurements of this
type are easier in weak solutions,10 i.e. systems with a small concentration c << 1 of particles of one sort
(solute) within much more abundant particles of another sort (solvent). The mixing entropy also affects
the thermodynamics of chemical reactions in gases and liquids.11 Note that besides purely thermal-mechanical
measurements, the mixing entropy in some conducting solutions (electrolytes) is also
measurable by a purely electrical method, called cyclic voltammetry, in which a low-frequency ac
voltage, applied between two solid-state electrodes embedded in the solution, is used to periodically
separate different ions, and then mix them again.12


Now let us briefly discuss two generalizations of our results for ideal classical gases. First, let us
consider such gas in an external field of potential forces. It may be described by replacing Eq. (3) with

$$\varepsilon_k = \frac{p_k^2}{2m} + U(\mathbf{r}_k), \tag{3.23}$$

where r_k is the position of the k-th particle, and U(r) is the potential energy of the particle. If
the potential U(r) is changing in space sufficiently slowly,13 Eq. (4) is still applicable, but only to small
volumes, V → dV = d³r, whose linear size is much smaller than the spatial scale of substantial variations
of the function U(r). Hence, instead of Eq. (5), we may only write the probability dW of finding the
particle in a small volume d³r d³p of the 6-dimensional phase space:


9 By the way, if an ideal classical gas consists of particles of several different sorts, its full pressure is a sum of
independent partial pressures exerted by each component (the so-called Dalton law). While this fact was an
important experimental discovery in the early 1800s, for statistical physics this is just a straightforward corollary
of Eq. (18), because in an ideal gas, the component particles do not interact.

10 Interestingly, the statistical mechanics of weak solutions is very similar to that of ideal gases, with Eq. (18)
recast into the following formula (derived in 1885 by J. van ’t Hoff), PV = cNT, for the partial pressure of the
solute. One of its corollaries is that the net force (called the osmotic pressure) exerted on a semipermeable
membrane is proportional to the difference of the solute concentrations it is supporting.

11 Unfortunately, I do not have time for even a brief introduction to this important field, and have to refer the
interested reader to specialized textbooks, for example, P. A. Rock, Chemical Thermodynamics, University
Science Books, 1983; or P. Atkins, Physical Chemistry, 5th ed., Freeman, 1994; or G. M. Barrow, Physical
Chemistry, 6th ed., McGraw-Hill, 1996.

12 See, e.g., either Chapter 6 in A. Bard and L. Faulkner, Electrochemical Methods, 2nd ed., Wiley, 2000 (which is a
good introduction to electrochemistry as a whole); or Sec. II.8.3.1 in F. Scholz (ed.), Electroanalytical Methods,
2nd ed., Springer, 2010.

13 Quantitatively, the effective distance of substantial variations of the potential, T/|∇U(r)|, has to be much larger
than the mean free path l of the gas particles, i.e. the average distance a particle passes between its successive collisions
with its counterparts. (For more on this notion, see Chapter 6 below.)
$$dW = w(\mathbf{r},\mathbf{p})\,d^3r\,d^3p, \qquad w(\mathbf{r},\mathbf{p}) = \text{const}\times\exp\left\{-\frac{p^2}{2mT} - \frac{U(\mathbf{r})}{T}\right\}. \tag{3.24}$$

Hence, the Maxwell distribution of particle velocities is still valid at each point r, so that the equation of
state (18) is also valid locally. A new issue here is the spatial distribution of the total density,

$$n(\mathbf{r}) \equiv N\int w(\mathbf{r},\mathbf{p})\,d^3p, \tag{3.25}$$

of all gas particles, regardless of their momentum/velocity. For this variable, Eq. (24) yields14

$$n(\mathbf{r}) = n(0)\exp\left\{-\frac{U(\mathbf{r})}{T}\right\}, \tag{3.26}$$

where the potential energy at the origin (r = 0) is used as the reference of U, and the local gas pressure
may be still calculated from the local form of Eq. (18):

$$P(\mathbf{r}) = n(\mathbf{r})T = P(0)\exp\left\{-\frac{U(\mathbf{r})}{T}\right\}. \tag{3.27}$$


A simple example of numerous applications of Eq. (27) is an approximate description of the 
Earth’s atmosphere. At all heights h << Rg ~ 6x10° m above the Earth’s surface (say, above the sea 
level), we may describe the Earth gravity effect by the potential U = mgh, and Eq. (27) yields the so- 
called barometric formula 


P(h) = P(0) ex + with h, = sees : (3.28) 
hy mg mg 


For N₂, the main component of the atmosphere, at T_K = 300 K, h_0 ≈ 9 km. This gives the correct order of magnitude of the atmosphere’s thickness, though the exact law of the pressure change differs somewhat from Eq. (28), because the flow of radiation from the Sun and the Earth causes a relatively small deviation of the atmospheric air from thermal equilibrium: a drop of its temperature T with height, with the so-called lapse rate of about 2% (≈6.5 K) per km.
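As a quick numerical check of Eq. (28), the sketch below evaluates h_0 = T/mg for N₂ at T_K = 300 K, together with the resulting pressure ratios at a few heights. It is a minimal illustration assuming standard gravity g = 9.81 m/s² and the N₂ molar mass 28.0134 g/mol; the function names are illustrative only, and the isothermal model deliberately ignores the lapse rate mentioned above.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
AMU = 1.66053906660e-27  # atomic mass unit, kg
G_STD = 9.81             # standard gravity, m/s^2 (rounded)

def scale_height(mass_kg, temperature_K, g=G_STD):
    """h_0 = T/(m g) of Eq. (28), with T = k_B * T_K in energy units."""
    return K_B * temperature_K / (mass_kg * g)

def pressure_ratio(h, h0):
    """Barometric formula (28): P(h)/P(0) = exp(-h/h_0) for an isothermal atmosphere."""
    return math.exp(-h / h0)

m_N2 = 28.0134 * AMU     # mass of one N2 molecule, kg
h0 = scale_height(m_N2, 300.0)
print(f"h_0 = {h0 / 1e3:.2f} km")
for h_km in (1, 5, 10):
    print(f"P({h_km} km)/P(0) = {pressure_ratio(h_km * 1e3, h0):.3f}")
```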


The second generalization I need to discuss is to particles with internal degrees of freedom. Now 
ignoring the potential energy U(r), we may describe them by replacing Eq. (3) with 


$$ \varepsilon_k = \frac{p_k^2}{2m} + \varepsilon'_k, \qquad (3.29) $$


where ε'_k describes the internal energy spectrum of the k-th particle. If the particles are similar, we may repeat all the above calculations, and see that all their results (including the Maxwell distribution, and the equation of state) are still valid, with the only exception of Eq. (16), which now becomes


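$$ f = -T\ln\left[g\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}\right] - T\ln\left[\sum_{k'}\exp\left\{-\frac{\varepsilon'_{k'}}{T}\right\}\right]. \qquad (3.30) $$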


14 In some textbooks, Eq. (26) is also called the Boltzmann distribution, though it certainly should be 
distinguished from Eq. (2.111). 
As we already know from Eqs. (1.50)-(1.51), this change may affect both specific heats of the ideal gas, though not their difference, c_P − c_V = 1. They may be readily calculated for usual atoms and molecules at not very high temperatures (say, at room temperature, T ≈ 25 meV), because in these conditions ε' ≫ T for most of their internal degrees of freedom, including the electronic and vibrational ones. (The typical energy of the lowest electronic excitations is of the order of a few eV, and that of the lowest vibrational excitations is only an order of magnitude lower.) As a result, these degrees of freedom are “frozen out”: they are in their ground states, so that their contributions exp{−ε'/T} to the sum in Eq. (30), and hence to the heat capacity, are negligible. In monoatomic gases, this is true for all degrees of freedom besides those of the translational motion, already taken into account by the first term in Eq. (30), i.e. by Eq. (16b), so that their specific heat is typically well described by Eq. (19).


The most important exception is the rotational degrees of freedom of diatomic and polyatomic molecules. As quantum mechanics shows,¹⁵ the excitation energy of these degrees of freedom scales as ħ²/2I, where I is the molecule’s relevant moment of inertia. In the most important molecules, this energy is rather low (e.g., for N₂ it is close to 0.25 meV, i.e. ~1% of the room temperature), so that at usual conditions these degrees of freedom are well excited and, moreover, behave virtually as classical degrees of freedom, each giving a quadratic contribution to the molecule’s energy, and hence obeying the equipartition theorem, i.e. giving an extra contribution of T/2 to the energy, i.e. 1/2 to the specific heat.¹⁶ In polyatomic molecules, there are three such classical degrees of freedom (corresponding to their rotations about three principal axes¹⁷), but in diatomic molecules, only two.¹⁸ Hence, these contributions may be described by the following generalization of Eq. (19):


$$ c_V = \begin{cases} 3/2, & \text{for monoatomic gases},\\ 5/2, & \text{for gases of diatomic molecules},\\ 3, & \text{for gases of polyatomic molecules}. \end{cases} \qquad (3.31) $$


Please keep in mind, however, that as the above discussion shows, this simple result is invalid at very low and very high temperatures; its most notable violation is the thermal activation of vibrational degrees of freedom of many important molecules at temperatures of a few thousand K.


3.2. Calculating μ


Now let us discuss properties of ideal gases of free, indistinguishable particles in more detail, paying special attention to the chemical potential μ, which, for some readers, may still be a somewhat mysterious aspect of the Fermi and Bose distributions. Note again that particle indistinguishability requires the absence of thermal excitations of their internal degrees of freedom, so that in the balance of this chapter such excitations will be ignored, and the particle’s energy ε_k will be associated with its “external” energy alone: for a free particle in an ideal gas, with its kinetic energy (3).


15 See, e.g., either the model solution of Problem 2.12 (and references therein), or QM Secs. 3.6 and 5.6.

16 This result may be readily obtained again from the last term of Eq. (30) by treating it exactly like the first one 
was and then applying the general Eq. (1.50). 

17 See, e.g., CM Sec. 4.1.

18 This conclusion of the quantum theory may be interpreted as the indistinguishability of the rotations about the molecule’s symmetry axis.
Let us start from the classical gas, and recall the conclusion of thermodynamics that μ is just the Gibbs potential per unit particle; see Eq. (1.56). Hence we can calculate μ = G/N from Eqs. (1.49) and (16b). The result,

$$ \mu = \frac{G}{N} = T\ln\frac{N}{eV} + f(T) + T \equiv T\ln\left[\frac{N}{gV}\left(\frac{2\pi\hbar^2}{mT}\right)^{3/2}\right], \qquad (3.32a) $$

which may be rewritten as

$$ \exp\left\{\frac{\mu}{T}\right\} = \frac{N}{gV}\left(\frac{2\pi\hbar^2}{mT}\right)^{3/2}, \qquad (3.32b) $$

gives us some information about μ not only for a classical gas but for quantum (Fermi and Bose) gases as well. Indeed, we already know that for indistinguishable particles, the Boltzmann distribution (2.111) is valid only if ⟨N_k⟩ ≪ 1. Comparing this condition with the quantum statistics (2.115) and (2.118), we see again that the condition of the gas behaving classically may be expressed as


$$ \exp\left\{\frac{\varepsilon_k - \mu}{T}\right\} \gg 1 \qquad (3.33) $$


for all k. Since the lowest value of ε_k given by Eq. (3) is zero, Eq. (33) may be satisfied only if exp{μ/T} ≪ 1. This means that the chemical potential of a classical gas has to be not just negative, but also “strongly negative” in the sense

$$ -\mu \gg T. \qquad (3.34a) $$

According to Eq. (32), this important condition may be represented as

$$ T \gg T_0, \qquad (3.34b) $$

with T_0 defined as

$$ T_0 \equiv \frac{\hbar^2 n^{2/3}}{g^{2/3} m} \equiv \frac{\hbar^2}{g^{2/3} m\, r_{\rm ave}^2}, \qquad (3.35) $$

where r_ave is the average distance between the gas particles:

$$ r_{\rm ave} \equiv \frac{1}{n^{1/3}} = \left(\frac{V}{N}\right)^{1/3}. \qquad (3.36) $$


In this form, the condition (34) is very transparent physically: disregarding the factor g^{2/3} (which is typically of the order of 1), it means that the average thermal energy of a particle, which is always of the order of T, has to be much larger than the energy of quantization of the particle’s motion at the length r_ave. An alternative form of the same condition is¹⁹

$$ r_{\rm ave} \gg g^{-1/3} r_c, \qquad \text{where } r_c \equiv \frac{\hbar}{(mT)^{1/2}}. \qquad (3.37) $$


For a typical gas (say, N₂, with m = 28m_p ≈ 4.7×10⁻²⁶ kg) at the standard room temperature (T = k_B×300 K ≈ 4.1×10⁻²¹ J), the correlation length r_c is close to 10⁻¹¹ m, i.e. is significantly smaller than the


19 In quantum mechanics, the parameter r_c so defined is frequently called the correlation length; see, e.g., QM Sec. 7.2 and in particular Eq. (7.37).
physical size a ≈ 3×10⁻¹⁰ m of the molecule. This estimate shows that at room temperature, as soon as any practical gas is rare enough to be ideal (r_ave ≫ a), it is classical, i.e. the only way to observe quantum effects in the translational motion of molecules is very deep refrigeration. According to Eq. (37), for the same nitrogen molecule, taking r_ave ~ 10°a ~ 10° m (to ensure that direct interaction effects are negligible), the temperature should be well below 1 mK.
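The estimates of this paragraph may be reproduced with a few lines of Python. This is a sketch assuming CODATA constants, m = 28m_p for N₂, and g = 1; the inter-particle distance r_ave = 100a used below is a hypothetical value chosen only for illustration, for which T_0 from Eq. (35) gives the temperature scale below which quantum statistics would start to matter.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s
K_B = 1.380649e-23      # Boltzmann constant, J/K
M_P = 1.67262192e-27    # proton mass, kg

m = 28 * M_P            # N2 molecule, m = 28 m_p
T = K_B * 300.0         # room temperature in energy units, J
a = 3e-10               # molecular size quoted in the text, m

r_c = HBAR / math.sqrt(m * T)   # correlation length of Eq. (37)

# Hypothetical illustration: a gas rare enough that r_ave = 100 a
r_ave = 100 * a
g = 1                            # assumed degeneracy factor
T0 = HBAR**2 / (g ** (2 / 3) * m * r_ave**2)   # Eq. (35), in J

print(f"r_c = {r_c:.2e} m, r_c/a = {r_c / a:.3f}")
print(f"T_0 = {T0 / K_B:.1e} K at r_ave = {r_ave:.0e} m")
```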


In order to analyze quantitatively what happens with gases when T is reduced to such low values, we need to calculate μ for an arbitrary ideal gas of indistinguishable particles. Let us use the lucky fact that the Fermi-Dirac and the Bose-Einstein statistics may be represented with one formula:


$$ \langle N(\varepsilon)\rangle = \frac{1}{e^{(\varepsilon-\mu)/T} \pm 1}, \qquad (3.38) $$


where (and everywhere in the balance of this section) the top sign stands for fermions and the lower one 
for bosons, to discuss fermionic and bosonic ideal gases in one shot. 


If we deal with a member of the grand canonical ensemble (Fig. 2.13), in which not only T but also μ is externally fixed, we may use Eq. (38) to calculate the average number N of particles in volume V. If the volume is so large that N ≫ 1, we may use the general state counting rule (13) to get


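$$ N = \sum_k \langle N(\varepsilon_k)\rangle = \frac{gV}{(2\pi\hbar)^3}\int \langle N(\varepsilon_p)\rangle\, d^3p = \frac{gV}{(2\pi\hbar)^3}\int_0^\infty \frac{4\pi p^2\,dp}{e^{(\varepsilon_p-\mu)/T}\pm 1}. \qquad (3.39) $$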
In most practical cases, however, the number N of gas particles is fixed by particle confinement (i.e. the gas portion under study is a member of a canonical ensemble; see Fig. 2.6), and hence μ rather than N should be calculated. Let us use the trick already mentioned in Sec. 2.8: if N is very large, the relative fluctuation of the particle number, at fixed μ, is negligibly small (δN/N ~ 1/N^{1/2} ≪ 1), and the relation between the average values of N and μ should not depend on which of these variables is exactly fixed. Hence, Eq. (39), with μ having the sense of the average chemical potential, should be valid even if N is exactly fixed, so that the small fluctuations of N are replaced with (equally small) fluctuations of μ. Physically, in this case the role of the μ-fixing environment for any sub-portion of the gas is played by the rest of it, and Eq. (39) expresses the condition of self-consistency of such chemical equilibrium.


So, at N ≫ 1, Eq. (39) may be used for calculating the average μ as a function of two independent parameters: N (i.e. the gas density n = N/V) and temperature T. For carrying out this calculation, it is convenient to convert the right-hand side of Eq. (39) to an integral over the particle’s energy ε(p) = p²/2m, so that p = (2mε)^{1/2} and dp = (m/2ε)^{1/2}dε, getting


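$$ N = \frac{gVm^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \frac{\varepsilon^{1/2}\,d\varepsilon}{e^{(\varepsilon-\mu)/T}\pm 1}. \qquad (3.40) $$

This key result may be represented in two other, more convenient forms. First, Eq. (40), derived for our current (3D, isotropic and parabolic-dispersion) approximation (3), is just a particular case of the following self-evident state-counting relation

$$ N = \int_0^\infty g(\varepsilon)\langle N(\varepsilon)\rangle\, d\varepsilon, \qquad (3.41) $$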


where

$$ g(\varepsilon) \equiv \frac{dN_{\rm states}}{d\varepsilon} \qquad (3.42) $$


is the temperature-independent density of all quantum states of a particle — regardless of whether they 
are occupied or not. Indeed, according to the general Eq. (4), for our simple model (3), 


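$$ g(\varepsilon) \equiv g_3(\varepsilon) = \frac{dN_{\rm states}}{d\varepsilon} = \frac{d}{d\varepsilon}\left[\frac{gV}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\,p^3(\varepsilon)\right] = \frac{gVm^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\,\varepsilon^{1/2}, \qquad (3.43) $$

so that we return to Eq. (39).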
On the other hand, for some calculations, it is convenient to introduce the following dimensionless energy variable: ξ ≡ ε/T, to express Eq. (40) via a dimensionless integral:

$$ N = \frac{gV(mT)^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \frac{\xi^{1/2}\,d\xi}{e^{\xi-\mu/T}\pm 1}. \qquad (3.44) $$

As a sanity check, in the classical limit (34), the exponent in the denominator of the fraction under the integral is much larger than 1, and Eq. (44) reduces to

$$ N \approx \frac{gV(mT)^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \xi^{1/2}e^{\mu/T-\xi}\,d\xi \equiv \frac{gV(mT)^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\,e^{\mu/T}\int_0^\infty \xi^{1/2}e^{-\xi}\,d\xi, \qquad \text{at } -\mu \gg T. \qquad (3.45) $$

By the definition of the gamma function Γ(ξ),²⁰ the last integral is just Γ(3/2) = π^{1/2}/2, and we get

$$ \exp\left\{\frac{\mu}{T}\right\} \approx N\,\frac{\sqrt{2}\,\pi^2\hbar^3}{gV(mT)^{3/2}}\,\frac{2}{\pi^{1/2}} \equiv \left(2\pi\frac{T_0}{T}\right)^{3/2}, \qquad (3.46) $$

which is exactly the same result as given by Eq. (32), obtained earlier in a rather different way: from the Boltzmann distribution and thermodynamic identities.


Unfortunately, in the general case of arbitrary μ, the integral in Eq. (44) cannot be worked out analytically.²¹ The best we can do is to use T_0, defined by Eq. (35), to rewrite Eq. (44) in the following convenient, fully dimensionless form:

$$ \frac{T}{T_0} = \left[\frac{1}{\sqrt{2}\,\pi^2}\int_0^\infty \frac{\xi^{1/2}\,d\xi}{e^{\xi-\mu/T}\pm 1}\right]^{-2/3}, \qquad (3.47) $$


and then use this relation to calculate the ratios T/T_0 and μ/T_0 ≡ (μ/T)×(T/T_0), as functions of μ/T, numerically. After that, we may plot the results versus each other, now considering the first ratio as the argument. Figure 1 below shows the resulting plots, for both particle types. They show that at high temperatures, T ≫ T_0, the chemical potential is negative and approaches the classical behavior given by Eq. (46) for both fermions and bosons, just as we could expect. However, at temperatures T ~ T_0 the type of statistics becomes crucial. For fermions, the reduction of temperature leads to μ changing its


20 See, e.g., MA Eq. (6.7a). 

21 For the reader’s reference only: for the upper sign, the integral in Eq. (40) is a particular form (for s = 1/2) of a special function called the complete Fermi-Dirac integral F_s, while for the lower sign, it is a particular case (for s = 3/2) of another special function called the polylogarithm Li_s. (In what follows, I will not use these notations.)
sign from negative to positive, and then approaching a constant positive value called the Fermi energy, ε_F ≈ 7.595 T_0 at T → 0. On the contrary, the chemical potential of a bosonic gas stays negative, and then turns into zero at a certain critical temperature T_c ≈ 3.313 T_0. Both these limits, which are very important for applications, may (and will :-) be explored analytically, separately for each statistics.
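Eq. (47) is straightforward to evaluate numerically. The sketch below (pure Python, with composite Simpson integration after the substitution ξ = t², which removes the square-root cusp at ξ = 0) computes T/T_0 for a given μ/T, reproduces the two limits quoted above, and prints a few points of the fermionic curve of Fig. 1. The integration cutoff, the grid size, and the choice μ/T = 200 as a stand-in for T → 0 are ad hoc assumptions, not taken from the text.

```python
import math

SQRT2_PI2 = math.sqrt(2) * math.pi**2   # the factor sqrt(2)*pi^2 of Eqs. (44) and (47)

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def occupancy_integral(eta, sign):
    """Integral of xi^(1/2)/(exp(xi - eta) + sign) over xi > 0, with eta = mu/T.

    sign = +1 for fermions, -1 for bosons (bosons need eta <= 0).
    With xi = t^2 the integrand becomes 2 t^2 / (exp(t^2 - eta) + sign).
    """
    t_max = math.sqrt(max(eta, 0.0) + 60.0)   # the integrand is ~exp(-60) beyond this point

    def f(t):
        if t == 0.0:
            # the limit of 2t^2/(exp(t^2) - 1) is 2 for bosons at eta = 0; otherwise zero
            return 2.0 if (sign < 0 and eta == 0.0) else 0.0
        x = t * t - eta
        denom = math.expm1(x) if sign < 0 else math.exp(x) + 1.0
        return 2.0 * t * t / denom

    return simpson(f, 0.0, t_max)

def t_over_t0(eta, sign):
    """Eq. (47): T/T_0 as a function of eta = mu/T."""
    return (occupancy_integral(eta, sign) / SQRT2_PI2) ** (-2.0 / 3.0)

Tc_over_T0 = t_over_t0(0.0, -1)                  # bosons with mu -> 0: critical temperature
eta_big = 200.0                                   # mu/T >> 1: deep in the degenerate Fermi regime
eF_over_T0 = eta_big * t_over_t0(eta_big, +1)     # mu/T_0 = (mu/T)(T/T_0) -> eps_F/T_0

print(f"T_c/T_0 = {Tc_over_T0:.4f}, eps_F/T_0 = {eF_over_T0:.4f}")
for eta in (-3.0, -1.0, 0.0, 1.0, 5.0):           # a few points of the fermionic curve
    tt = t_over_t0(eta, +1)
    print(f"T/T_0 = {tt:7.3f}   mu/T_0 = {eta * tt:7.3f}")
```

At strongly negative μ/T both branches should approach the classical asymptote T/T_0 = 2π exp{−2μ/3T} that follows from Eq. (46).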


Fig. 3.1. The chemical potential of an ideal gas of N ≫ 1 indistinguishable quantum particles, as a function of temperature at a fixed gas density n = N/V (i.e. fixed T_0 ∝ n^{2/3}), for two different particle types. The dashed line shows the classical approximation (46), valid only at T ≫ T_0. (Horizontal axis: T/T_0.)


Before carrying out such studies (in the next two sections), let me show that, rather surprisingly, for any non-relativistic, ideal quantum gas, the relation between the product PV and the energy,

$$ PV = \frac{2}{3}E, \qquad (3.48) $$


is exactly the same as follows from Eqs. (18) and (19) for the classical gas, and hence does not depend 
on the particle statistics. To prove this, it is sufficient to use Eqs. (2.114) and (2.117) for the grand 
thermodynamic potential of each quantum state, which may be conveniently represented by a single 
formula, 


$$ \Omega_k = \mp T\ln\left(1 \pm e^{(\mu-\varepsilon_k)/T}\right), \qquad (3.49) $$


and sum them over all states k, using the general summation formula (13). The result for the total grand 
potential of a 3D gas with the dispersion law (3) is 


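$$ \Omega = \mp T\,\frac{gV}{(2\pi\hbar)^3}\int_0^\infty \ln\left(1 \pm e^{(\mu-\varepsilon_p)/T}\right)4\pi p^2\,dp = \mp\frac{gVm^{3/2}T}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \ln\left(1 \pm e^{(\mu-\varepsilon)/T}\right)\varepsilon^{1/2}\,d\varepsilon. \qquad (3.50) $$

Working out this integral by parts, exactly as we did it with the one in Eq. (2.90), we get

$$ \Omega = -\frac{2}{3}\,\frac{gVm^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \frac{\varepsilon^{3/2}\,d\varepsilon}{e^{(\varepsilon-\mu)/T}\pm 1} = -\frac{2}{3}\int_0^\infty \varepsilon\, g_3(\varepsilon)\langle N(\varepsilon)\rangle\, d\varepsilon. \qquad (3.51) $$

But the last integral is just the total energy E of the gas:

$$ E = \frac{gV}{(2\pi\hbar)^3}\int_0^\infty \frac{p^2}{2m}\,\frac{4\pi p^2\,dp}{e^{(\varepsilon_p-\mu)/T}\pm 1} = \frac{gVm^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \frac{\varepsilon^{3/2}\,d\varepsilon}{e^{(\varepsilon-\mu)/T}\pm 1} \equiv \int_0^\infty \varepsilon\, g_3(\varepsilon)\langle N(\varepsilon)\rangle\, d\varepsilon, \qquad (3.52) $$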
so that for any temperature and any particle type, Ω = −(2/3)E. But since, from thermodynamics, Ω = −PV, we have Eq. (48) proved. This universal relation²² will be repeatedly used below.
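The integration by parts leading from Eq. (50) to Eqs. (51)-(52) can be cross-checked numerically. The sketch below works in units T = 1, drops the common prefactor gVm^{3/2}T^{5/2}/(√2π²ħ³), and compares the dimensionless Ω of Eq. (50) with −(2/3)E for a few values of μ/T; the test values, the cutoff t_max, and the grid are arbitrary illustrative choices.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def omega_and_energy(eta, sign, t_max=12.0):
    """Dimensionless Omega (Eq. 50) and E (Eq. 52) for eta = mu/T, using xi = t^2.

    sign = +1 for fermions, -1 for bosons (bosons require eta < 0 here).
    """
    # Omega ~ -sign * integral of ln(1 + sign*exp(eta - xi)) * xi^(1/2) d xi
    omega = -sign * simpson(
        lambda t: math.log1p(sign * math.exp(eta - t * t)) * 2 * t * t, 0.0, t_max)
    # E ~ integral of xi^(3/2) / (exp(xi - eta) + sign) d xi
    energy = simpson(
        lambda t: 2 * t**4 / (math.exp(t * t - eta) + sign), 0.0, t_max)
    return omega, energy

for eta, sign in [(2.0, +1), (-3.0, +1), (-0.5, -1)]:
    om, en = omega_and_energy(eta, sign)
    print(f"eta = {eta:5.1f}, sign = {sign:+d}:  Omega = {om:.8f},  -(2/3)E = {-2 * en / 3:.8f}")
```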


3.3. Degenerate Fermi gas 


Analysis of low-temperature properties of a Fermi gas is very simple in the limit T = 0. Indeed, in this limit, the Fermi-Dirac distribution (2.115) is just the step function:

$$ \langle N(\varepsilon)\rangle = \begin{cases} 1, & \text{for } \varepsilon < \mu,\\ 0, & \text{for } \mu < \varepsilon, \end{cases} \qquad (3.53) $$

see the bold line in Fig. 2a. Since ε = p²/2m is isotropic in the momentum space, in that space the particles, at T = 0, fully occupy all possible quantum states inside a sphere (frequently called either the Fermi sphere or the Fermi sea) with some radius p_F (Fig. 2b), while all states above the sea surface are empty. Such a degenerate Fermi gas is a striking manifestation of the Pauli principle: though in thermodynamic equilibrium at T = 0 all particles try to lower their energies as much as possible, only g of them may occupy each translational (“orbital”) quantum state. As a result, the sphere’s volume is proportional to the particle number N, or rather to their density n = N/V.


Fig. 3.2. Representations of the 
Fermi sea: (a) on the Fermi 
distribution plot, and (b) in the 
momentum space. 


Indeed, the radius p_F may be readily related to the number of particles N using Eq. (39), with the upper sign, whose integral in this limit is just the Fermi sphere’s volume:

$$ N = \frac{gV}{(2\pi\hbar)^3}\int_0^{p_F} 4\pi p^2\,dp = \frac{gV}{6\pi^2\hbar^3}\,p_F^3. \qquad (3.54) $$


Now we can use Eq. (3) to express via N the chemical potential μ (which, in the limit T = 0, bears the special name of the Fermi energy²³ ε_F):

$$ \varepsilon_F \equiv \mu\big|_{T=0} = \frac{p_F^2}{2m} = \frac{\hbar^2}{2m}\left(6\pi^2\frac{N}{gV}\right)^{2/3} \equiv \left(\frac{9\pi^4}{2}\right)^{1/3} T_0 \approx 7.595\,T_0, \qquad (3.55a) $$

where T_0 is the quantum temperature scale defined by Eq. (35). This formula quantifies the low-temperature trend of the function μ(T), clearly visible in Fig. 1, and in particular, explains the ratio ε_F/T_0 mentioned in Sec. 2. Note also a simple and very useful relation,


22 For gases of diatomic and polyatomic molecules at relatively high temperatures, when some of their internal 
degrees of freedom are thermally excited, Eq. (48) is valid only for the translational-motion energy. 
23 Note that in the electronic engineering literature, μ is usually called the Fermi level, for any temperature.
$$ g_3(\varepsilon_F) = \frac{3}{2}\,\frac{N}{\varepsilon_F}, \qquad (3.55b) $$


that may be obtained immediately from the comparison of Eqs. (43) and (54). 
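The total energy of the degenerate Fermi gas may be (equally easily) calculated from Eq. (52):

$$ E = \frac{gV}{(2\pi\hbar)^3}\int_0^{p_F}\frac{p^2}{2m}\,4\pi p^2\,dp = \frac{gV}{(2\pi\hbar)^3}\,\frac{4\pi}{2m}\,\frac{p_F^5}{5} = \frac{3}{5}\,\varepsilon_F N, \qquad (3.56) $$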


showing that the average energy, ⟨ε⟩ ≡ E/N, of a particle inside the Fermi sea is equal to 3/5 of that (ε_F) of the particles in the most energetic occupied states, on the Fermi surface. Since, according to the formulas of Chapter 1, at zero temperature H = G = Nμ and F = E, the only thermodynamic variable still to be calculated is the gas pressure P. For it, we could use any of the thermodynamic relations P = (H − E)/V or P = −(∂F/∂V)_T, but it is even easier to use our recent result (48). Together with Eq. (56), it yields

$$ P(0) = \frac{2}{3}\,\frac{E}{V} = \frac{2}{5}\,\varepsilon_F\frac{N}{V} = \left(\frac{36\pi^4}{125}\right)^{1/3}\frac{\hbar^2}{g^{2/3}m}\left(\frac{N}{V}\right)^{5/3}. \qquad (3.57) $$
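From here, it is straightforward to calculate the bulk modulus (reciprocal compressibility),²⁴

$$ K \equiv -V\left(\frac{\partial P}{\partial V}\right)_T = \frac{5}{3}P(0) = \frac{2}{3}\,\varepsilon_F\frac{N}{V}, \qquad (3.58) $$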


which may be simpler to measure experimentally than P. 


Perhaps the most important example²⁵ of the degenerate Fermi gas is the conduction electrons in metals: the electrons that belong to outer shells of the isolated atoms but become shared in solid metals, and as a result, can move through the crystal lattice almost freely. Though the electrons (which are fermions with spin s = 1/2 and hence with the spin degeneracy g = 2s + 1 = 2) are negatively charged, the Coulomb interaction of the conduction electrons with each other is substantially compensated by the positively charged ions of the atomic lattice, so that they follow the simple model discussed above, in which the interaction is disregarded, reasonably well. This is especially true for alkali metals (forming Group 1 of the periodic table of elements), whose experimentally measured Fermi surfaces are spherical within 1%, and even within 0.1% for Na.


Table 1 lists, in particular, the experimental values of the bulk modulus for such metals, together with the values given by Eq. (58) using the ε_F calculated from Eq. (55) with the experimental density of the conduction electrons. The agreement is pretty impressive, taking into account that the simple theory


24 For a general discussion of this notion, see, e.g., CM Eqs. (7.32) and (7.36). 

25 Recently, nearly degenerate gases (with ε_F ~ 5T) have been formed of weakly interacting Fermi atoms as well; see, e.g., K. Aikawa et al., Phys. Rev. Lett. 112, 010404 (2014), and references therein. Another interesting example of the system that may be approximately treated as a degenerate Fermi gas is the set of Z ≫ 1 electrons in a heavy atom. However, in this system the account of electron interaction via the electrostatic field they create is important. Since for this Thomas-Fermi model of atoms the thermal effects are unimportant, it was discussed already in the quantum-mechanical part of this series (see QM Chapter 8). However, its analysis may be streamlined using the notion of the chemical potential, introduced only in this course; this problem is left for the reader’s exercise.
described above completely ignores the Coulomb and exchange interactions of the electrons. This agreement implies that, surprisingly, the experimentally observed rigidity of solids (or at least metals) is predominantly due to the kinetic energy (3) of the conduction electrons, rather than any electrostatic interactions, though, to be fair, these interactions are the crucial factor defining the equilibrium value of n. Numerical calculations using more accurate approximations (e.g., the Density Functional Theory²⁶), which agree with experiment with a few-percent accuracy, confirm this conclusion.²⁷


Table 3.1. Experimental and theoretical parameters of electrons’ Fermi sea in some alkali metals²⁸

Metal   ε_F (eV)   K (GPa)    K (GPa)      γ (mcal/mole·K²)   γ (mcal/mole·K²)
        Eq. (55)   Eq. (58)   experiment   Eq. (69)           experiment
Na      3.24       9.23       6.42         0.26               0.35
K       2.12       3.19       2.81         0.40               0.47
Rb      1.85       2.30       1.92         0.46               0.58
Cs      1.59       1.54       1.43         0.53               0.77
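The free-electron (theoretical) entries of Table 1 follow from Eqs. (55a), (58), and (70). The sketch below reproduces the Na row, assuming one conduction electron per atom and the electron density n ≈ 2.65×10²⁸ m⁻³ commonly quoted for sodium (this density is an input assumption, not a value given in the table); γ is expressed per mole of atoms using the thermochemical calorie, 4.184 J.

```python
import math

HBAR = 1.054571817e-34   # J*s
M_E = 9.1093837015e-31   # electron mass, kg
EV = 1.602176634e-19     # J per eV
K_B = 1.380649e-23       # J/K
R_GAS = 8.314462618      # J/(mol*K)
CAL = 4.184              # J per thermochemical calorie

def fermi_energy(n, g=2, m=M_E):
    """Eq. (55a): eps_F = (hbar^2/2m) (6 pi^2 n/g)^(2/3)."""
    return HBAR**2 / (2 * m) * (6 * math.pi**2 * n / g) ** (2 / 3)

def bulk_modulus(n, eps_F):
    """Eq. (58): K = (2/3) n eps_F."""
    return 2 / 3 * n * eps_F

def gamma_molar(eps_F):
    """Eq. (70) per mole of electrons: gamma = (pi^2/2) R k_B / eps_F, in J/(mol K^2)."""
    return math.pi**2 / 2 * R_GAS * K_B / eps_F

n_Na = 2.65e28           # assumed conduction-electron density of Na, m^-3
eF = fermi_energy(n_Na)
print(f"eps_F = {eF / EV:.2f} eV")
print(f"K     = {bulk_modulus(n_Na, eF) / 1e9:.2f} GPa")
print(f"gamma = {gamma_molar(eF) / CAL * 1e3:.3f} mcal/(mole K^2)")
```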


Looking at the values of ε_F listed in this table, note that room temperatures (T_K ≈ 300 K) correspond to T ≈ 25 meV. As a result, virtually all experiments with metals, at least in their solid or liquid form, are performed in the limit T ≪ ε_F. According to Eq. (39), at such temperatures, the occupancy step described by the Fermi-Dirac distribution has a non-zero but relatively small width of the order of T; see the dashed line in Fig. 2a. Calculations for this case are much facilitated by the so-called Sommerfeld expansion formula²⁹ for the integrals like those in Eqs. (41) and (52):


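$$ I(T) \equiv \int_0^\infty \varphi(\varepsilon)\langle N(\varepsilon)\rangle\, d\varepsilon \approx \int_0^\mu \varphi(\varepsilon)\, d\varepsilon + \frac{\pi^2}{6}T^2\frac{d\varphi(\mu)}{d\mu}, \qquad \text{for } T \ll \mu, \qquad (3.59) $$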


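where φ(ε) is an arbitrary function that is sufficiently smooth at ε = μ and integrable at ε = 0. To prove this formula, let us introduce another function,

$$ f(\varepsilon) \equiv \int_0^\varepsilon \varphi(\varepsilon')\, d\varepsilon', \qquad \text{so that } \varphi(\varepsilon) = \frac{df(\varepsilon)}{d\varepsilon}, \qquad (3.60) $$

and work out the integral I(T) by parts: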


$$ I(T) \equiv \int_0^\infty \frac{df}{d\varepsilon}\langle N(\varepsilon)\rangle\, d\varepsilon = \int_{\varepsilon=0}^{\varepsilon=\infty} \langle N(\varepsilon)\rangle\, df = \Big[\langle N(\varepsilon)\rangle f(\varepsilon)\Big]_{\varepsilon=0}^{\varepsilon=\infty} - \int_{\varepsilon=0}^{\varepsilon=\infty} f(\varepsilon)\, d\langle N(\varepsilon)\rangle = \int_0^\infty f(\varepsilon)\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right) d\varepsilon. \qquad (3.61) $$

26 See, e.g., QM Sec. 8.4.

27 Note also a huge difference between the very high bulk modulus of metals (K ~ 10¹¹ Pa) and its very low values in usual, atomic gases (for them, at ambient conditions, K ~ 10⁵ Pa). About four orders of magnitude of this difference is due to that in the particle density N/V, but the balance is due to the electron gas’ degeneracy. Indeed, in an ideal classical gas, K = P = T(N/V), so that the factor (2/3)ε_F in Eq. (58), of the order of a few eV in metals, should be compared with the factor T ≈ 25 meV in the classical gas at room temperature.

28 Data from N. Ashcroft and N. D. Mermin, Solid State Physics, W. B. Saunders, 1976.

29 Named after Arnold Sommerfeld, who was the first (in 1927) to apply quantum mechanics to degenerate Fermi gases, in particular to electrons in metals, and may be credited for most of the results discussed in this section.


As evident from Eq. (2.115) and/or Fig. 2a, at T ≪ μ the function −∂⟨N(ε)⟩/∂ε is close to zero for all energies, besides a narrow peak of unit area at ε = μ. Hence, if we expand the function f(ε) in the Taylor series near this point, just a few leading terms of the expansion should give us a good approximation:


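$$ I(T) \approx \int_0^\infty \left[f(\mu) + \frac{df(\mu)}{d\mu}(\varepsilon-\mu) + \frac{1}{2}\frac{d^2f(\mu)}{d\mu^2}(\varepsilon-\mu)^2\right]\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right) d\varepsilon $$
$$ = f(\mu)\int_0^\infty \left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right) d\varepsilon + \frac{df(\mu)}{d\mu}\int_0^\infty (\varepsilon-\mu)\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right) d\varepsilon + \frac{1}{2}\frac{d^2f(\mu)}{d\mu^2}\int_0^\infty (\varepsilon-\mu)^2\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right) d\varepsilon. \qquad (3.62) $$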


In the last form of this relation, the first integral over ε equals ⟨N(ε = 0)⟩ − ⟨N(ε = ∞)⟩ = 1, the second one vanishes (because the function under it is antisymmetric with respect to the point ε = μ), and only the last one needs to be dealt with explicitly, by working it out by parts and then using a table integral:³⁰


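$$ \int_0^\infty (\varepsilon-\mu)^2\left(-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right) d\varepsilon \approx T^2\int_{-\infty}^{+\infty}\xi^2\left(-\frac{d}{d\xi}\frac{1}{e^\xi+1}\right) d\xi = 4T^2\int_0^\infty \frac{\xi\, d\xi}{e^\xi+1} = \frac{\pi^2}{3}T^2. \qquad (3.63) $$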


Being plugged into Eq. (62), this result proves the Sommerfeld formula (59). 
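The Sommerfeld formula (59) is easy to test numerically for a specific φ(ε). The sketch below takes φ(ε) = ε^{1/2} (the 3D density of states up to a constant), μ = 1, and T = 0.05 in arbitrary energy units, and compares the exact integral with the two-term approximation (59); the residual should be of the order of T⁴, i.e. much smaller than the T² correction itself. The choices of φ, μ, T, and the grid are illustrative only.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def fermi(eps, mu, T):
    """Fermi-Dirac occupancy <N(eps)>, Eq. (38) with the upper sign, written overflow-safe."""
    x = (eps - mu) / T
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

mu, T = 1.0, 0.05
t_max = math.sqrt(mu + 60 * T)   # the occupancy is below ~exp(-60) beyond eps = mu + 60T

# Exact I(T) for phi(eps) = eps^(1/2); with eps = t^2, phi(eps) d eps = 2 t^2 dt
exact = simpson(lambda t: 2 * t * t * fermi(t * t, mu, T), 0.0, t_max)

zeroth = 2 / 3 * mu**1.5                                 # integral of phi from 0 to mu
dphi_dmu = 0.5 / math.sqrt(mu)                           # d phi / d mu at eps = mu
sommerfeld = zeroth + math.pi**2 / 6 * T**2 * dphi_dmu   # Eq. (59)

print(f"exact = {exact:.8f}, Sommerfeld = {sommerfeld:.8f}, zeroth order = {zeroth:.8f}")
```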


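The last preparatory step we need to make is to account for a possible small difference (as we will see below, also proportional to T²) between the temperature-dependent chemical potential μ(T) and the Fermi energy defined as ε_F ≡ μ(0), in the largest (first) term on the right-hand side of Eq. (59), to write

$$ I(T) \approx \int_0^{\varepsilon_F}\varphi(\varepsilon)\, d\varepsilon + (\mu-\varepsilon_F)\,\varphi(\mu) + \frac{\pi^2}{6}T^2\frac{d\varphi(\mu)}{d\mu} \equiv I(0) + (\mu-\varepsilon_F)\,\varphi(\mu) + \frac{\pi^2}{6}T^2\frac{d\varphi(\mu)}{d\mu}. \qquad (3.64) $$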


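Now, applying this formula to Eq. (41) and the last form of Eq. (52), we get the following results (which are valid for any dispersion law ε(p) and even any dimensionality of the gas):

$$ N(T) = N(0) + (\mu-\varepsilon_F)\,g(\mu) + \frac{\pi^2}{6}T^2\frac{dg(\mu)}{d\mu}, \qquad (3.65) $$

$$ E(T) = E(0) + (\mu-\varepsilon_F)\,g(\mu)\,\mu + \frac{\pi^2}{6}T^2\frac{d}{d\mu}\big[g(\mu)\,\mu\big]. \qquad (3.66) $$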


If the number of particles does not change with temperature, N(T) = N(0), as in most experiments, Eq. (65) gives the following formula for finding the temperature-induced change of μ:


30 See, e.g., MA Eqs. (6.8c) and (2.12b), with n = 1. 
$$ \mu - \varepsilon_F = -\frac{\pi^2}{6}T^2\,\frac{1}{g(\mu)}\frac{dg(\mu)}{d\mu}. \qquad (3.67) $$
Note that for a density of states growing with energy, such as the 3D one (43), the change is quadratic in T and negative, in agreement with the numerical results shown with the red line in Fig. 1. Plugging this expression (which is only valid when the magnitude of the change is much smaller than ε_F) into Eq. (66), we get the following temperature correction to the energy:


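$$ E(T) - E(0) = \frac{\pi^2}{6}\,g(\mu)\,T^2, \qquad (3.68) $$

where within the accuracy of our approximation, μ may be replaced with ε_F. (Due to the universal relation (48), this result also gives the temperature correction to the Fermi gas’ pressure.) Now we may use Eq. (68) to calculate the heat capacity of the degenerate Fermi gas:

$$ C_V \equiv \left(\frac{\partial E}{\partial T}\right)_V = \gamma T, \qquad \gamma = \frac{\pi^2}{3}\,g(\varepsilon_F). \qquad (3.69) $$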


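According to Eq. (55b), in the particular case of a 3D gas with the isotropic and parabolic dispersion law (3), Eq. (69) reduces to

$$ \gamma = \frac{\pi^2}{2}\,\frac{N}{\varepsilon_F}, \qquad \text{i.e.} \qquad c_V \equiv \frac{C_V}{N} = \frac{\pi^2}{2}\,\frac{T}{\varepsilon_F} \ll 1. \qquad (3.70) $$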


This important result deserves a discussion. First, note that within the range of validity of the Sommerfeld approximation (T ≪ ε_F), the specific heat of the degenerate gas is much smaller than that of the classical gas, even without internal degrees of freedom: c_V = 3/2; see Eq. (19). The physical reason for such a low heat capacity is that the particles deep inside the Fermi sea cannot pick up thermal excitations with available energies of the order of T ≪ ε_F, because the states immediately above them are already occupied. The only particles (or rather quantum states, due to the particle indistinguishability) that may be excited with such small energies are those at the Fermi surface, more exactly within a surface layer of thickness Δε ~ T ≪ ε_F, and Eq. (70) presents a very vivid manifestation of this fact.
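Eqs. (67) and (68) can be checked in the same way. The sketch below works in units ε_F = 1 and takes the 3D density of states as g(ε) = ε^{1/2} (the prefactor of Eq. (43) dropped, so that N(0) = 2/3 and E(0) = 2/5). It finds μ(T) at fixed N by bisection and compares it, and the resulting energy increase, with the predictions μ ≈ ε_F[1 − (π²/12)(T/ε_F)²], which follows from Eq. (67) for g ∝ ε^{1/2}, and E(T) − E(0) ≈ (π²/6)g(ε_F)T². The temperature, the bisection bracket, and the grid are arbitrary illustrative choices.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def fermi(eps, mu, T):
    """Fermi-Dirac occupancy, overflow-safe."""
    x = (eps - mu) / T
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def particle_number(mu, T):
    """N of Eq. (41) with g(eps) = eps^(1/2); eps = t^2 substitution."""
    t_max = math.sqrt(mu + 60 * T)
    return simpson(lambda t: 2 * t * t * fermi(t * t, mu, T), 0.0, t_max)

def energy(mu, T):
    """E of Eq. (52) with the same conventions: integral of eps * g(eps) * <N(eps)>."""
    t_max = math.sqrt(mu + 60 * T)
    return simpson(lambda t: 2 * t**4 * fermi(t * t, mu, T), 0.0, t_max)

def mu_at_fixed_N(T, N0=2 / 3, lo=0.5, hi=1.5, iters=50):
    """Bisection for mu(T) such that N(T) = N(0); N grows monotonically with mu."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if particle_number(mid, T) < N0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

T = 0.05
mu = mu_at_fixed_N(T)
dE = energy(mu, T) - 0.4
print(f"mu(T)     = {mu:.6f}   Eq. (67): {1 - math.pi**2 / 12 * T**2:.6f}")
print(f"E(T)-E(0) = {dE:.6f}   Eq. (68): {math.pi**2 / 6 * T**2:.6f}")
```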


The second important feature of Eqs. (69)-(70) is the linear dependence of the heat capacity on temperature, which decreases with a reduction of T much slower than that of crystal vibrations; see Eq. (2.99). This means that in metals the specific heat at temperatures T ≪ T_D is dominated by the conduction electrons. Indeed, experiments confirm not only the linear dependence (70) of the specific heat,³¹ but also the values of the proportionality coefficient γ ≡ C_V/T for cases when ε_F can be calculated independently, for example for alkali metals; see the two rightmost columns of Table 1 above. More typically, Eq. (69) is used for the experimental measurement of the density of states on the Fermi surface, g(ε_F), the factor which participates in many theoretical results, in particular in transport properties of degenerate Fermi gases (see Chapter 6 below).


31 Solids, with their low thermal expansion coefficients, provide a virtually-fixed-volume confinement for the electron gas, so that the specific heat measured at ambient conditions may be legitimately compared with the calculated c_V.
3.4. Bose-Einstein condensation


Now let us explore what happens at the cooling of an ideal gas of bosons. Figure 3a shows the same plot as Fig. 1b, i.e. the result of a numerical solution of Eq. (47) with the appropriate (lower) sign in the denominator, on a more appropriate, log-log scale. One can see that the chemical potential indeed tends to zero at some finite “critical temperature” T_c. This temperature may be found by taking μ = 0 in Eq. (47), reducing it to a table integral:³²

$$ T_c = T_0\left[\frac{1}{\sqrt{2}\,\pi^2}\int_0^\infty \frac{\xi^{1/2}\,d\xi}{e^\xi-1}\right]^{-2/3} = T_0\left[\frac{1}{\sqrt{2}\,\pi^2}\,\Gamma\!\left(\frac{3}{2}\right)\zeta\!\left(\frac{3}{2}\right)\right]^{-2/3} \approx 3.313\,T_0, \qquad (3.71) $$

the result explaining the T_c/T_0 ratio mentioned in Sec. 2 and indicated in Fig. 1.


Fig. 3.3. The Bose-Einstein condensation: (a) the chemical potential of the gas and (b) its pressure, as functions of temperature. The dashed line corresponds to the classical gas. (Horizontal axes: T/T_0; the critical point T_c/T_0 ≈ 3.313 is marked.)


Let us have a good look at the temperature interval 0 < T < T_c, which cannot be directly described by Eq. (40) (with the appropriate negative sign in the denominator), and hence may look rather mysterious. Indeed, within this range, the chemical potential μ can be neither negative nor equal to zero, because according to Eq. (71), in this case, Eq. (40) would give a value of N smaller than the number of particles we actually have. On the other hand, μ cannot be positive either, because the integral (40) would diverge at ε → μ due to the divergence of ⟨N(ε)⟩; see, e.g., Fig. 2.15. The only possible resolution of the paradox, suggested by A. Einstein in 1925, is as follows: at T < T_c, the chemical potential of each particle of the system still equals exactly zero, but a certain number (N_0 of N) of them are in the ground state (with ε ≡ p²/2m = 0), forming the so-called Bose-Einstein condensate, usually referred to as the BEC. Since the condensate particles do not contribute to Eq. (40) (because of


32 See, e.g., MA Eq. (6.8b) with s = 3/2, and then Eqs. (2.7b) and (6.7e). 
the factor ε^{1/2} = 0), their number N_0 may be calculated by using that formula (or, equivalently, Eq. (44)), with μ = 0, to find the number (N − N_0) of particles still remaining in the gas, i.e. having energy ε > 0:

$$ N - N_0 = \frac{gV(mT)^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \frac{\xi^{1/2}\,d\xi}{e^\xi-1}. \qquad (3.72) $$


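This result is even simpler than it may look. Indeed, let us write it for the case T = T_c, when N_0 = 0:³³

$$ N = \frac{gV(mT_c)^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \frac{\xi^{1/2}\,d\xi}{e^\xi-1}. \qquad (3.73) $$

Dividing Eq. (72) by Eq. (73), we get an extremely simple and elegant result:

$$ \frac{N-N_0}{N} = \left(\frac{T}{T_c}\right)^{3/2}, \qquad \text{so that} \qquad N_0 = N\left[1-\left(\frac{T}{T_c}\right)^{3/2}\right], \qquad \text{for } T < T_c. \qquad (3.74a) $$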


Please note that this result is only valid for the particles whose motion, within the volume V, is free; in other words, for a system of free particles confined within a rigid-wall box of volume V. In most experiments with the Bose-Einstein condensation of dilute gases of neutral (and hence very weakly interacting) atoms, they are held not in such a box, but at the bottom of a “soft” potential well, which may be well approximated by a 3D quadratic parabola: U(r) = mω²r²/2. It is straightforward (and hence left for the reader’s exercise) to show that in this case, the dependence N_0(T) is somewhat different:

$$ N_0 = N\left[1-\left(\frac{T}{T_c^*}\right)^3\right], \qquad \text{for } T < T_c^*, \qquad (3.74b) $$


where T_c* is a different critical temperature, which now depends on ħω, i.e. on the confining potential’s “steepness”. (In this case, V is not exactly fixed; however, the effective volume occupied by the particles at T = T_c* is related to this temperature by a formula close to Eq. (71), so that all estimates given above are still valid.) Figure 4 shows one of the first sets of experimental data for the Bose-Einstein condensation of a dilute gas of neutral atoms. Taking into account the finite number of particles in the experiment, the agreement with the simple theory is surprisingly good.


Returning to the spatially-uniform Bose system, let us explore what happens below the critical temperature with its other parameters. Formula (52) with the appropriate (lower) sign shows that, approaching T_c from higher temperatures, the gas energy and hence its pressure do not vanish; see the red line in Fig. 3b. Indeed, at T = T_c (where μ = 0), that formula yields³⁴


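$$ E(T_c) = gV\frac{m^{3/2}T_c^{5/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \frac{\xi^{3/2}\,d\xi}{e^\xi-1} = gV\frac{m^{3/2}T_c^{5/2}}{\sqrt{2}\,\pi^2\hbar^3}\,\Gamma\!\left(\frac{5}{2}\right)\zeta\!\left(\frac{5}{2}\right) \approx 0.7703\,NT_c, \qquad (3.75) $$

so that using the universal relation (48), we get the pressure value,

$$ P(T_c) = \frac{2}{3}\,\frac{E(T_c)}{V} = \frac{\zeta(5/2)}{\zeta(3/2)}\,\frac{N}{V}\,T_c \approx 0.5135\,\frac{N}{V}\,T_c \approx 1.701\,P_0, \qquad \text{where } P_0 \equiv \frac{N}{V}\,T_0, \qquad (3.76) $$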


33 This is, of course, just another form of Eq. (71). 
34 For the involved dimensionless integral see, e.g., MA Eqs. (6.8b) with s = 5/2, and then (2.7b) and (6.7c). 
which is somewhat lower than, but comparable to $P(0)$ for the fermions; cf. Eq. (57).
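The numerical factors in Eqs. (75) and (76) reduce to ratios of Riemann zeta values, because $\int_0^\infty \xi^{s-1}d\xi/(e^\xi - 1) = \Gamma(s)\zeta(s)$. The Python sketch below evaluates them with a self-contained Euler-Maclaurin approximation of $\zeta(s)$ (our choice, to avoid a SciPy dependency); the helper names are illustrative, and the last factor assumes that Eq. (71) has the standard form $T_c = (2\pi\hbar^2/m)[n/g\zeta(3/2)]^{2/3}$. It gives about 0.770, 0.5135, and 1.701, matching the quoted values to within rounding of the last digit.

```python
import math


def zeta(s, K=1000):
    """Riemann zeta(s) for real s > 1: direct sum up to K-1 plus an
    Euler-Maclaurin estimate of the tail sum from K to infinity."""
    head = sum(k ** -s for k in range(1, K))
    tail = K ** (1 - s) / (s - 1) + 0.5 * K ** -s + s * K ** (-s - 1) / 12
    return head + tail


# Ideal Bose gas at T = T_c (mu = 0); Gamma(5/2)/Gamma(3/2) = 3/2.
energy_factor = 1.5 * zeta(2.5) / zeta(1.5)     # E(T_c) / (N T_c), Eq. (75)
pressure_factor = zeta(2.5) / zeta(1.5)         # P(T_c) V / (N T_c), Eq. (76)
# Coefficient of g^(-2/3) (hbar^2/m) (N/V)^(5/3) in Eq. (76), with the assumed
# form of Eq. (71) for T_c:
pressure_coefficient = 2 * math.pi * zeta(2.5) * zeta(1.5) ** (-5 / 3)

print(f"E(Tc)/(N Tc)         = {energy_factor:.4f}")
print(f"P(Tc) V/(N Tc)       = {pressure_factor:.4f}")
print(f"P(Tc) density coeff. = {pressure_coefficient:.3f}")
```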


Fig. 3.4. The total number $N$ of trapped $^{87}$Rb atoms (inset) and their ground-state fraction $N_0/N$, as functions of the ratio $T/T_c$, as measured in one of the pioneering experiments; see J. Ensher et al., Phys. Rev. Lett. 77, 4984 (1996). In this experiment, $T_c$ was as low as $0.28\times10^{-6}$ K. The solid line shows the simple theoretical dependence $N_0(T)$ given by Eq. (74b), while other lines correspond to more detailed theories taking into account the finite number $N$ of trapped atoms. © 1996 APS, reproduced with permission. [Plot: $N_0/N$ versus $T/T_c$, from 0.0 to 1.5.]


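Now we can use the same Eq. (52), also with $\mu = 0$, to calculate the energy of the gas at $T < T_c$:

$$E(T) = gV\frac{m^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty\frac{\varepsilon^{3/2}\,d\varepsilon}{e^{\varepsilon/T} - 1} = gV\frac{(mT)^{3/2}\,T}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty\frac{\xi^{3/2}\,d\xi}{e^{\xi} - 1}. \qquad (3.77)$$

Comparing the last form of this relation with the second form of Eq. (75), which features the same dimensionless integral, we immediately get one more simple temperature dependence:

$$E(T) = E(T_c)\left(\frac{T}{T_c}\right)^{5/2}, \quad \text{for } T < T_c, \qquad (3.78)$$

and hence, via the universal relation (48), for the pressure:

$$P(T) = P(T_c)\left(\frac{T}{T_c}\right)^{5/2}, \quad \text{for } T < T_c. \qquad (3.79)$$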

This temperature dependence of pressure is shown with the blue line in Fig. 3b. The plot shows that for all temperatures (both below and above $T_c$) the pressure is lower than that of the classical gas of the same density. Now note also that since, according to Eqs. (57) and (76), $P(T_c) \propto P_0 \propto V^{-5/3}$, while according to Eqs. (35) and (71), $T_c \propto T_0 \propto V^{-2/3}$, the pressure (79) is proportional to $V^{-5/3}/(V^{-2/3})^{5/2} = V^0$, i.e. does not depend on the volume at all! The physics of this result (which is valid at $T < T_c$ only) is that as we decrease the volume at a fixed total number $N$ of particles, more and more of them go to the condensate, decreasing the number $(N - N_0)$ of particles in the gas phase, but changing neither its spatial density nor its pressure. Such behavior is very typical for the coexistence of two different phases of the same matter; see, in particular, the next chapter.
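The volume independence can be checked numerically. The sketch below works in assumed units $\hbar = m = g = 1$, takes Eq. (71) in the standard form $T_c = 2\pi[n/\zeta(3/2)]^{2/3}$ (an assumption about its notation), and evaluates Eq. (79), with $P(T_c)$ from Eq. (76), for several volumes at fixed $N$ and $T$. Every volume small enough to keep $T < T_c$ should give the same pressure, $\zeta(5/2)\,T^{5/2}/(2\pi)^{3/2}$; the function names are illustrative.

```python
import math


def zeta(s, K=1000):
    """Riemann zeta(s), s > 1, via a partial sum plus Euler-Maclaurin tail."""
    head = sum(k ** -s for k in range(1, K))
    return head + K ** (1 - s) / (s - 1) + 0.5 * K ** -s + s * K ** (-s - 1) / 12


Z32, Z52 = zeta(1.5), zeta(2.5)


def critical_temperature(n):
    """Assumed form of Eq. (71) in units hbar = m = g = 1."""
    return 2 * math.pi * (n / Z32) ** (2 / 3)


def bec_pressure(N, V, T):
    """Pressure below T_c from Eqs. (76) and (79)."""
    n = N / V
    Tc = critical_temperature(n)
    if T >= Tc:
        raise ValueError("Eq. (79) applies only for T < T_c")
    p_at_tc = (Z52 / Z32) * n * Tc          # Eq. (76)
    return p_at_tc * (T / Tc) ** 2.5        # Eq. (79)


N, T = 1000.0, 1.0
for V in (50.0, 100.0, 200.0, 400.0):
    print(f"V = {V:6.1f}  T_c = {critical_temperature(N / V):7.3f}  "
          f"P = {bec_pressure(N, V, T):.6f}")
print("zeta(5/2) T^(5/2) / (2 pi)^(3/2) =", Z52 * T ** 2.5 / (2 * math.pi) ** 1.5)
```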


The last thermodynamic variable of major interest is heat capacity, because it may be most readily measured. For temperatures $T < T_c$, it may be easily calculated from Eq. (78):

$$C_V(T) \equiv \left(\frac{\partial E}{\partial T}\right)_{N,V} = E(T_c)\,\frac{5}{2}\,\frac{T^{3/2}}{T_c^{5/2}}, \qquad (3.80)$$
so that below $T_c$, the capacity increases with temperature, at the critical temperature reaching the value


$$C_V(T_c) = \frac{5}{2}\,\frac{E(T_c)}{T_c} \approx 1.925\,N, \qquad (3.81)$$

which is approximately 28% above the value ($3N/2$) for the classical gas. (As a reminder, in both cases we ignore possible contributions from the internal degrees of freedom.) The analysis for $T > T_c$ is a little bit more cumbersome because, when differentiating $E$ over temperature (say, using Eq. (52)), one should also take into account the temperature dependence of $\mu$ that follows from Eq. (40); see also Fig. 1. However, the most important feature of the result may be predicted without the calculation (which is left for the reader's exercise). Namely, since at $T \gg T_c$ the heat capacity has to approach the classical value $1.5N$, starting from the value (81), it must decrease with temperature at $T > T_c$, thus forming a sharp maximum (a "cusp") at the critical point $T = T_c$; see Fig. 5.


Fig. 3.5. Temperature dependences of the heat capacity of an ideal Bose-Einstein gas, numerically calculated from Eqs. (52) and (40) for $T \geq T_c$, and given by Eq. (80) for $T < T_c$. [Plot: $C_V/N$ versus $T/T_c$, with the peak value $\approx 1.925$ at $T = T_c$ marked.]
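A curve like the one in Fig. 5 can be reproduced with the minimal Python sketch below. Below $T_c$ it uses Eq. (80). Above $T_c$ it relies on the standard fixed-$N$ ideal-Bose-gas relations, which we assume are equivalent to Eqs. (40) and (52): the fugacity $z = e^{\mu/T}$ solves $g_{3/2}(z) = \zeta(3/2)(T_c/T)^{3/2}$, and $E = \frac{3}{2}NT\,g_{5/2}(z)/g_{3/2}(z)$, where $g_s(z) = \sum_{k\geq1}z^k/k^s$ are the Bose functions. $C_V$ is then taken as a central finite difference of $E$ in $T$. The function names are illustrative, and the plain series for $g_s(z)$ is only meant for $T/T_c$ noticeably above 1, where $z$ is not too close to 1.

```python
import math


def zeta(s, K=1000):
    """Riemann zeta(s), s > 1: partial sum plus Euler-Maclaurin tail."""
    head = sum(k ** -s for k in range(1, K))
    return head + K ** (1 - s) / (s - 1) + 0.5 * K ** -s + s * K ** (-s - 1) / 12


def bose_g(s, z, rtol=1e-15, kmax=200_000):
    """Bose function g_s(z) = sum_{k>=1} z**k / k**s for 0 <= z < 1."""
    total, zk = 0.0, 1.0
    for k in range(1, kmax + 1):
        zk *= z
        term = zk / k ** s
        total += term
        if term <= rtol * total:
            break
    return total


Z32, Z52 = zeta(1.5), zeta(2.5)


def fugacity(t):
    """Solve g_{3/2}(z) = zeta(3/2) * t**-1.5 for z by bisection (t = T/T_c > 1)."""
    target = Z32 * t ** -1.5
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if bose_g(1.5, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def energy_per_particle(t):
    """E / (N T_c) as a function of t = T/T_c."""
    if t <= 1.0:
        return 1.5 * (Z52 / Z32) * t ** 2.5          # Eqs. (75) and (78)
    z = fugacity(t)
    return 1.5 * t * bose_g(2.5, z) / bose_g(1.5, z)


def heat_capacity_per_particle(t, h=1e-4):
    """C_V / N: Eq. (80) below T_c, central difference of E(T) above it."""
    if t <= 1.0:
        return 3.75 * (Z52 / Z32) * t ** 1.5          # Eq. (80)
    return (energy_per_particle(t + h) - energy_per_particle(t - h)) / (2 * h)


for t in (0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 4.0, 10.0):
    print(f"T/Tc = {t:5.2f}   C_V/N = {heat_capacity_per_particle(t):.4f}")
```

The printed values rise as $T^{3/2}$ up to the cusp near $1.93N$ at $T = T_c$ and then fall toward the classical $1.5N$.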


Such a cusp is a good indication of the Bose-Einstein condensation in virtually any experimental system, especially because inter-particle interactions (unaccounted for in our simple discussion) typically make this feature even more substantial, frequently turning it into a weak (logarithmic) singularity. Historically, such a singularity was the first noticed, though not immediately understood, sign of the Bose-Einstein condensation, observed in 1931 by W. Keesom and K. Clusius in liquid $^4$He at its λ-point (called so exactly because of the characteristic shape of the $C_V(T)$ dependence) $T = T_\lambda \approx 2.17$ K. Other milestones of the Bose-Einstein condensation studies include:


- the experimental discovery of superconductivity (which was later explained as the result of the 
Bose-Einstein condensation of electron pairs) by H. Kamerlingh-Onnes in 1911; 


- the development of the Bose-Einstein statistics, and the prediction of the condensation, by S. Bose and A. Einstein, in 1924-1925;


- the discovery of superfluidity in liquid $^4$He by P. Kapitza and (independently) by J. Allen and D. Misener in 1937, and its explanation as a result of the Bose-Einstein condensation by F. and H. London and L. Tisza, with further significant elaborations by L. Landau, all in 1938;


- the explanation of superconductivity as a result of electron binding into Cooper pairs, with a 
simultaneous Bose-Einstein condensation of the resulting bosons, by J. Bardeen, L. Cooper, and J. 
Schrieffer in 1957; 
- the discovery of superfluidity of two different phases of $^3$He, due to the similar Bose-Einstein condensation of pairs of its fermion atoms, by D. Lee, D. Osheroff, and R. Richardson in 1972;


- the first observation of the Bose-Einstein condensation in dilute gases ($^{87}$Rb by E. Cornell, C. Wieman, et al., and $^{23}$Na by W. Ketterle et al.) in 1995.


The importance of the last achievement stems from the fact that in contrast to other Bose-Einstein condensates, in dilute gases (with the typical density $n$ as low as $\sim10^{14}$ cm$^{-3}$) the particles interact very weakly, and hence many experimental results are very close to the simple theory described above and its straightforward elaborations; see, e.g., Fig. 4.³⁵ On the other hand, the importance of other Bose-Einstein condensates, which involve more complex and challenging physics, should not be underestimated, as it sometimes is.


Perhaps the most important feature of any Bose-Einstein condensate is that all $N_0$ condensed particles are in the same quantum state, and hence are described by exactly the same wavefunction. This wavefunction is substantially less "feeble" than that of a single particle, in the following sense. In the second quantization language,³⁶ the well-known Heisenberg uncertainty relation may be rewritten for the creation/annihilation operators; in particular, for bosons,


$$\delta\hat a\,\delta\hat a^\dagger \geq \frac{1}{2}. \qquad (3.82)$$


Since $\hat a$ and $\hat a^\dagger$ are the quantum-mechanical operators of the complex amplitude $a = A\exp\{i\varphi\}$ and its complex conjugate $a^* = A\exp\{-i\varphi\}$, where $A$ and $\varphi$ are the real amplitude and phase of the wavefunction, Eq. (82) yields the following approximate uncertainty relation (strict in the limit $\delta\varphi \ll 1$) between the number of particles $N = AA^*$ and the phase $\varphi$:


$$\delta N\,\delta\varphi \geq \frac{1}{2}. \qquad (3.83)$$


This means that a condensate of $N \gg 1$ bosons may be in a state with both phase and amplitude of the wavefunction behaving virtually as c-numbers, with very small relative uncertainties: $\delta N \ll N$, $\delta\varphi \ll 1$. Moreover, such states are much less susceptible to perturbations by experimental instruments. For example, the electric current carried along a superconducting wire by a coherent Bose-Einstein condensate of Cooper pairs may be as high as hundreds of amperes. As a result, the "strange" behaviors predicted by quantum mechanics are not averaged out as in the usual particle ensembles (see, e.g., the discussion of the density matrix in Sec. 2.1), but may be directly revealed in macroscopic, measurable dynamics of the condensate.


For example, the density $\mathbf{j}$ of the electric "supercurrent" of the Cooper pairs may be described by the same formula as the well-known usual probability current density of a single quantum particle,³⁷ just multiplied by the electric charge $q = -2e$ of a single pair, and the pair density $n$:


35 Such controllability of theoretical description has motivated the use of dilute-gas BECs for modeling of renowned problems of many-body physics; see, e.g., the review by I. Bloch et al., Rev. Mod. Phys. 80, 885 (2008). These efforts are assisted by the development of better techniques for reaching the necessary sub-μK temperatures; see, e.g., the recent work by J. Hu et al., Science 358, 1078 (2017). For a more general, detailed discussion see, e.g., C. Pethick and H. Smith, Bose-Einstein Condensation in Dilute Gases, 2nd ed., Cambridge U. Press, 2008.

36 See, e.g., QM Sec. 8.3. 

37 See, e.g., QM Eq. (3.28). 
$$\mathbf{j} = qn\frac{\hbar}{m}\left(\nabla\varphi - \frac{q}{\hbar}\mathbf{A}\right), \qquad (3.84)$$


where $\mathbf{A}$ is the vector potential of the (electro)magnetic field. If a superconducting wire is not extremely thin, the supercurrent does not penetrate into its interior.³⁸ As a result, the integral of Eq. (84), taken along a closed contour $C$ running inside the interior of a superconducting loop (where $\mathbf{j} = 0$), yields


$$\oint_C\mathbf{A}\cdot d\mathbf{r} = \frac{\hbar}{q}\,\Delta\varphi = \frac{2\pi\hbar}{q}\,M, \qquad (3.85)$$


where $M$ is an integer. But, according to basic electrodynamics, the integral on the left-hand side of this relation is nothing more than the flux $\Phi$ of the magnetic field $\mathbf{B}$ piercing the wire loop area $A$. Thus we immediately arrive at the famous magnetic flux quantization effect:

$$\Phi \equiv \int_A B_n\,d^2r = M\Phi', \quad \text{where } \Phi' \equiv \frac{2\pi\hbar}{|q|} = \frac{\pi\hbar}{e} \approx 2.07\times10^{-15}\ \text{Wb}, \qquad (3.86)$$


which was theoretically predicted in 1950 and experimentally observed in 1961. Amazingly, this effect holds even "over miles of dirty lead wire", to quote H. Casimir's famous expression, sustained by the coherence of the Bose-Einstein condensate of Cooper pairs.
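As an arithmetic check of Eq. (86), the short Python snippet below evaluates $\Phi' = 2\pi\hbar/|q|$ for $|q| = 2e$, using the exact SI values of the Planck constant and the elementary charge. It also prints the field that would thread a single flux quantum through a loop of area 1 mm² (a hypothetical example, not from the text), to show how small one quantum is on a laboratory scale.

```python
import math

H_PLANCK = 6.62607015e-34      # J s (exact in the 2019 SI)
E_CHARGE = 1.602176634e-19     # C (exact in the 2019 SI)
HBAR = H_PLANCK / (2 * math.pi)

q_pair = 2 * E_CHARGE                        # |q| of a Cooper pair
flux_quantum = 2 * math.pi * HBAR / q_pair   # Eq. (86): Phi' = 2 pi hbar/|q| = pi hbar/e

loop_area = 1e-6                             # m^2 (1 mm^2), illustrative only
b_per_quantum = flux_quantum / loop_area     # field giving M = 1 in this loop

print(f"Phi' = {flux_quantum:.4e} Wb")
print(f"B for one flux quantum through 1 mm^2: {b_per_quantum:.3e} T")
```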


Other prominent examples of such macroscopic quantum effects in Bose-Einstein condensates include not only the superfluidity and superconductivity as such, but also the Josephson effect, quantized Abrikosov vortices, etc. Some of these effects are briefly discussed in other parts of this series.³⁹


3.5. Gases of weakly interacting particles 


Now let us discuss the effects of weak particle interactions on the properties of their gas. (Unfortunately, I will have time to do that only very briefly, and only for classical gases.⁴⁰) In most cases of interest, particle interaction may be well described by a certain potential energy $U$, so that in the simplest model, the total energy is


$$E = \sum_{k=1}^{N}\frac{p_k^2}{2m} + U(\mathbf{r}_1, \ldots, \mathbf{r}_k, \ldots, \mathbf{r}_N), \qquad (3.87)$$


where $\mathbf{r}_k$ is the radius-vector of the $k$-th particle's center.⁴¹ First, let us see how far statistical physics would allow us to proceed for an arbitrary potential $U$. For $N \gg 1$, at the calculation of the Gibbs statistical sum (2.59), we may perform the usual transfer from the summation over all quantum states of the system to the integration over the $6N$-dimensional space, with the correct Boltzmann counting:


38 This is the Meissner-Ochsenfeld (or just “Meissner”) effect which may be also readily explained using Eq. (84) 
combined with the Maxwell equations — see, e.g., EM Sec. 6.4. 

39 See EM Secs. 6.4-6.5, and QM Secs. 1.6 and 3.1. 

40 A concise discussion of the effects of weak interactions on the properties of quantum gases may be found, for example, in Chapter 10 of the textbook by K. Huang, Statistical Mechanics, 2nd ed., Wiley, 2003.

41 One of the most significant effects neglected by Eq. (87) is the influence of atomic/molecular angular 
orientations on their interactions. 


Chapter 3 Page 23 of 34 


Essential Graduate Physics SM: Statistical Mechanics 


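$$Z = \sum_m e^{-E_m/T} \to \frac{1}{N!}\frac{g^N}{(2\pi\hbar)^{3N}}\int\exp\left\{-\sum_{k=1}^{N}\frac{p_k^2}{2mT}\right\}d^3p_1\ldots d^3p_N\int\exp\left\{-\frac{U(\mathbf{r}_1,\ldots,\mathbf{r}_N)}{T}\right\}d^3r_1\ldots d^3r_N$$
$$= \left[\frac{1}{N!}\frac{(gV)^N}{(2\pi\hbar)^{3N}}\int\exp\left\{-\sum_{k=1}^{N}\frac{p_k^2}{2mT}\right\}d^3p_1\ldots d^3p_N\right]\times\left[\frac{1}{V^N}\int\exp\left\{-\frac{U(\mathbf{r}_1,\ldots,\mathbf{r}_N)}{T}\right\}d^3r_1\ldots d^3r_N\right]. \qquad (3.88)$$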


But according to Eq. (14), the first operand in the last product is just the statistical sum of an ideal gas (with the same $g$, $N$, $V$, and $T$), so that we may use Eq. (2.63) to write


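$$F = F_{\text{ideal}} - T\ln\left[\frac{1}{V^N}\int e^{-U/T}\,d^3r_1\ldots d^3r_N\right] \equiv F_{\text{ideal}} - T\ln\left[1 + \frac{1}{V^N}\int\left(e^{-U/T} - 1\right)d^3r_1\ldots d^3r_N\right], \qquad (3.89)$$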


where $F_{\text{ideal}}$ is the free energy of the ideal gas (i.e. the same gas but with $U = 0$), given by Eq. (16).


I believe that Eq. (89) is a very convincing demonstration of the enormous power of statistical physics methods. Instead of trying to solve an impossibly complex problem of classical dynamics of $N \gg 1$ (think of $N \sim 10^{23}$) interacting particles, and only then calculating appropriate ensemble averages, the Gibbs approach reduces finding the free energy (and then, from thermodynamic relations, all other thermodynamic variables) to the calculation of just one integral on the right-hand side of Eq. (89). Still, this integral is $3N$-dimensional and may be worked out analytically only if the particle interactions are weak in some sense. Indeed, the last form of Eq. (89) makes it especially evident that if $U \to 0$ everywhere, the term in the parentheses under the integral vanishes, and so does the integral itself, and hence the addition to $F_{\text{ideal}}$.


Now let us see what this integral would yield for the simplest, short-range interactions, in which the potential $U$ is substantial only when the mutual distance $r_{kk'} = |\mathbf{r}_k - \mathbf{r}_{k'}|$ between the centers of two particles is smaller than a certain value $2r_0$, where $r_0$ may be interpreted as the particle's radius. If the gas is sufficiently dilute, so that the radius $r_0$ is much smaller than the average distance $r_{\text{ave}}$ between the particles, the integral in the last form of Eq. (89) is of the order of $(2r_0)^{3N}$, i.e. much smaller than $V^N$. Then we may expand the logarithm in that expression into the Taylor series with respect to the small second term in the square brackets, and keep only its first non-zero term:

$$F \approx F_{\text{ideal}} - \frac{T}{V^N}\int\left(e^{-U/T} - 1\right)d^3r_1\ldots d^3r_N. \qquad (3.90)$$

Moreover, if the gas density is so low, the chances for three or more particles to come close to each other and interact (collide) simultaneously are typically very small, so that pair collisions are the most important. In this case, we may recast the integral in Eq. (90) as a sum of $N(N-1)/2 \approx N^2/2$ similar terms describing such pair interactions, each of the type


$$V^{N-2}\int\left(e^{-U(\mathbf{r}_{kk'})/T} - 1\right)d^3r_k\,d^3r_{k'}. \qquad (3.91)$$


It is convenient to think about $\mathbf{r}_{kk'} \equiv \mathbf{r}_k - \mathbf{r}_{k'}$ as the radius-vector of the particle number $k$ in the reference frame with the origin placed at the center of the particle number $k'$; see Fig. 6a.
Fig. 3.6. The definition of the interparticle distance vectors at their (a) pair and (b) triple interactions. [Panel (a) shows particle $k$ positioned relative to particle $k'$.]


Then in Eq. (91), we may first calculate the integral over $\mathbf{r}_{k'}$, while keeping the distance vector $\mathbf{r}_{kk'}$, and hence $U(\mathbf{r}_{kk'})$, constant, getting one more factor $V$. Moreover, since all particle pairs are similar, in the remaining integral over $\mathbf{r}_{kk'}$ we may drop the radius-vector's index, so that Eq. (90) becomes

$$F = F_{\text{ideal}} - \frac{T}{V}\frac{N^2}{2}\int\left(e^{-U(\mathbf{r})/T} - 1\right)d^3r \equiv F_{\text{ideal}} + \frac{TN^2}{V}B(T), \qquad (3.92)$$
where the function $B(T)$, called the second virial coefficient,⁴² has an especially simple form for spherically-symmetric interactions:

$$B(T) = \frac{1}{2}\int\left(1 - e^{-U(\mathbf{r})/T}\right)d^3r \to \frac{1}{2}\int_0^\infty 4\pi r^2\,dr\left(1 - e^{-U(r)/T}\right). \qquad (3.93)$$


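From Eq. (92), and the second of the thermodynamic relations (1.35), we already know something particular about the equation of state $P(V, T)$:

$$P = -\left(\frac{\partial F}{\partial V}\right)_{T,N} = P_{\text{ideal}} + \frac{N^2T}{V^2}B(T) = T\left[\frac{N}{V} + B(T)\left(\frac{N}{V}\right)^2\right]. \qquad (3.94)$$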

We see that at a fixed gas density $n = N/V$, the pair interaction creates additional pressure, proportional to $(N/V)^2 = n^2$ and a function of temperature, $B(T)T$.


Let us calculate $B(T)$ for a few simple models of particle interactions. The solid curve in Fig. 7 shows (schematically) a typical form of the interaction potential between electrically neutral atoms/molecules. At large distances the interaction of particles that do not have their own permanent electric dipole moments $\mathbf{p}$ is dominated by the attraction (the so-called London dispersion force) between the correlated components of the spontaneously induced dipole moments, giving $U(r) \propto -r^{-6}$ at $r \to \infty$.⁴³ At closer distances the potential is repulsive, growing very fast at $r \to 0$, but its quantitative


42 The term "virial", from Latin viris (meaning "force"), was introduced to molecular physics by R. Clausius. The motivation for the adjective "second" for $B(T)$ is evident from the last form of Eq. (94), with the "first virial coefficient", standing before the $N/V$ ratio and sometimes denoted $A(T)$, equal to 1; see also Eq. (100) below.

43 Indeed, independent fluctuation-induced components $\mathbf{p}(t)$ and $\mathbf{p}'(t)$ of dipole moments of two particles have random mutual orientation, so that the time average of their interaction energy, proportional to $\mathbf{p}(t)\cdot\mathbf{p}'(t)/r^3$, vanishes. However, the electric field $\boldsymbol{\mathcal{E}}$ of each dipole $\mathbf{p}$, proportional to $r^{-3}$, induces a correlated component of $\mathbf{p}'$, also proportional to $r^{-3}$, giving interaction energy proportional to $\mathbf{p}'\cdot\boldsymbol{\mathcal{E}} \propto r^{-6}$, with a non-zero statistical average. Quantitative discussions of this effect, within several models, may be found, for example, in QM Chapters 3, 5, and 6.
form is specific for particular atoms/molecules.⁴⁴ The crudest description of such repulsion is given by the so-called hardball model:


$$U(r) = \begin{cases} +\infty, & \text{for } 0 < r < 2r_0, \\ 0, & \text{for } 2r_0 < r < \infty, \end{cases} \qquad (3.95)$$
see the dashed line and the inset in Fig. 7.

Fig. 3.7. Pair interactions of particles. Solid line: a typical interaction potential; dashed line: its hardball model (95); dash-dotted line: the improved model (97); all schematically. The inset illustrates the hardball model's physics. [Plot: $U(r)$ versus $r$, with the potential minimum $U_{\min}$ marked.]
As Eq. (93) shows, in this model the second virial coefficient is temperature-independent:

$$B(T) = b \equiv \frac{1}{2}\int_0^{2r_0}4\pi r^2\,dr = \frac{2\pi}{3}(2r_0)^3 \equiv 4V_0, \quad \text{where } V_0 \equiv \frac{4\pi}{3}r_0^3, \qquad (3.96)$$


so that the equation of state (94) still gives a linear dependence of pressure on temperature. 
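The step from Eq. (91) to Eqs. (92)-(93), where integrating over one particle's position gives a factor $V$, can be mimicked by a Monte Carlo estimate. The Python sketch below samples random pairs of particle centers in a periodic cubic box (the box size, sample count, and seed are arbitrary assumptions), counts hardball overlaps, for which $1 - e^{-U/T}$ equals 1, and converts the overlap fraction into $B = \frac{1}{2}V\,P_{\text{overlap}}$. The result should scatter around $4V_0$, Eq. (96), with a statistical error of about one percent.

```python
import math
import random


def hardball_b_monte_carlo(r0=1.0, box=8.0, samples=100_000, seed=1):
    """Monte Carlo estimate of B(T) for the hardball model (95).

    Pairs of particle centers are drawn uniformly in a periodic cube of side
    `box` (assumed > 4*r0).  For hard balls, 1 - exp(-U/T) is 1 when the two
    balls overlap (distance < 2*r0) and 0 otherwise, so Eq. (93) reduces to
    B = (1/2) * V * (probability of overlap).
    """
    rng = random.Random(seed)
    overlaps = 0
    for _ in range(samples):
        d2 = 0.0
        for _axis in range(3):
            dx = rng.uniform(0.0, box) - rng.uniform(0.0, box)
            dx -= box * round(dx / box)      # minimum-image separation
            d2 += dx * dx
        if d2 < (2.0 * r0) ** 2:
            overlaps += 1
    return 0.5 * box ** 3 * overlaps / samples


r0 = 1.0
V0 = 4.0 * math.pi * r0 ** 3 / 3.0
estimate = hardball_b_monte_carlo(r0=r0)
print(f"Monte Carlo: B/V0 = {estimate / V0:.3f}   (Eq. (96) gives 4)")
```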


A correction to this result may be obtained by the following approximate account of the long-range attraction (see the dash-dotted line in Fig. 7):⁴⁵


$$U(r) = \begin{cases} +\infty, & \text{for } 0 < r < 2r_0, \\ U(r), \text{ with } |U| \ll T, & \text{for } 2r_0 < r < \infty. \end{cases} \qquad (3.97)$$
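For this improved model, Eq. (93) yields:

$$B(T) = b + \frac{1}{2}\int_{2r_0}^{\infty}4\pi r^2\,dr\,\frac{U(r)}{T} \equiv b - \frac{a}{T}, \quad \text{with } a \equiv 2\pi\int_{2r_0}^{\infty}r^2\,dr\,|U(r)|. \qquad (3.98)$$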


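In this model, the equation of state (94) acquires a temperature-independent term:

$$P = T\left[\frac{N}{V} + b\left(\frac{N}{V}\right)^2\right] - a\left(\frac{N}{V}\right)^2. \qquad (3.99)$$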


44 Note that the particular form of the first term in the approximation $U(r) = a/r^{12} - b/r^6$ (called either the Lennard-Jones potential or the "12-6 potential"), which had been suggested in 1924, lacks physical justification, and in professional physics was soon replaced with other approximations, including the so-called exp-6 model, which fits most experimental data much better. However, the Lennard-Jones potential still keeps creeping from one undergraduate textbook to another, apparently for no better reason than enabling a simple analytical calculation of the equilibrium distance between the particles at $T \to 0$.

45 The strong inequality $|U| \ll T$ in this model is necessary not only to make the calculations simpler. A deeper reason is that if $(-U_{\min})$ becomes comparable with $T$, particles may become trapped in this potential well, forming a different phase, a liquid or a solid. In such phases, the probability of finding more than two particles interacting simultaneously is high, so that Eq. (92), on which Eqs. (93)-(94) and Eqs. (98)-(99) are based, becomes invalid.
Still, the correction to the ideal-gas pressure is proportional to $(N/V)^2$ and has to be relatively small for this result to be valid.

Generally, the right-hand side of Eq. (99) may be considered as the sum of two leading terms in the general expansion of $P$ into the Taylor series in the density $n = N/V$ of the gas:

$$P = T\left[\frac{N}{V} + B(T)\left(\frac{N}{V}\right)^2 + C(T)\left(\frac{N}{V}\right)^3 + \ldots\right], \qquad (3.100)$$


where $C(T)$ is called the third virial coefficient. It is natural to ask how we can calculate $C(T)$ and the higher virial coefficients. This may be done, first of all, just by a careful direct analysis of Eq. (90),⁴⁶ but I would like to use this occasion to demonstrate a different, very interesting and counter-intuitive approach, called the cluster expansion method,⁴⁷ which allows streamlining such calculations.


Let us apply to our system, with the energy given by Eq. (87), the grand canonical distribution. (Just as in Sec. 2, we may argue that if the average number $\langle N\rangle$ of particles in a member of a grand canonical ensemble, with fixed $\mu$ and $T$, is much larger than 1, the relative fluctuations of $N$ are small, so that all its thermodynamic properties should be similar to those when $N$ is exactly fixed.) For our current case, Eq. (2.109) takes the form


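$$\Omega = -T\ln\sum_{N=0}^{\infty}Z_N, \quad \text{with } Z_N = e^{\mu N/T}\sum_m\exp\left\{-\frac{E_{m,N}}{T}\right\}, \quad E_{m,N} = \sum_{k=1}^{N}\frac{p_k^2}{2m} + U(\mathbf{r}_1, \ldots, \mathbf{r}_N). \qquad (3.101)$$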


(Notice that here, as in all discussions of the grand canonical distribution, $N$ means a particular rather than the average number of particles.) Now let us try to forget for a minute that in real systems of interest the number of particles is extremely large, and start to calculate, one by one, the first terms $Z_N$.


In the term with $N = 0$, both contributions to $E_{m,0}$ vanish, and so does the factor $\mu N/T$, so that $Z_0 = 1$. In the next term, with $N = 1$, the interaction term vanishes, so that $E_{m,1}$ is reduced to the kinetic energy of one particle, giving


$$Z_1 = e^{\mu/T}\sum_k\exp\left\{-\frac{p_k^2}{2mT}\right\}. \qquad (3.102)$$
Making the usual transition from the summation to integration, we may write

$$Z_1 = ZI_1, \quad \text{where } Z \equiv e^{\mu/T}\frac{gV}{(2\pi\hbar)^3}\int\exp\left\{-\frac{p^2}{2mT}\right\}d^3p, \quad \text{and } I_1 \equiv 1. \qquad (3.103)$$


This is the same simple (Gaussian) integral as in Eq. (6), giving 


46 L. Boltzmann used this approach to calculate the 3rd and 4th virial coefficients for the hardball model, as far as this can be done analytically.

47 This method was developed in 1937-38 by J. Mayer and collaborators for the classical gas, and generalized to quantum systems in 1938 by B. Kahn and G. Uhlenbeck.
$$Z = e^{\mu/T}\frac{gV}{(2\pi\hbar)^3}(2\pi mT)^{3/2} \equiv e^{\mu/T}gV\left(\frac{mT}{2\pi\hbar^2}\right)^{3/2}. \qquad (3.104)$$
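The Gaussian integral behind Eq. (104), $\int\exp\{-p^2/2mT\}\,d^3p = (2\pi mT)^{3/2}$, factorizes into three identical 1D integrals. A quick numerical check in Python, with illustrative values of $m$ and $T$ in arbitrary units (our assumption), uses the trapezoidal rule, which is extremely accurate for such rapidly decaying smooth integrands:

```python
import math


def gaussian_1d(m, T, n=4000, cutoff=12.0):
    """Trapezoidal estimate of the integral of exp(-p^2/(2 m T)) over the real line,
    truncated at |p| = cutoff * sqrt(m T), where the integrand is negligible."""
    p_max = cutoff * math.sqrt(m * T)
    h = 2 * p_max / n
    total = 0.0
    for i in range(n + 1):
        p = -p_max + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(-p * p / (2 * m * T))
    return total * h


m, T = 2.0, 0.7
numeric_3d = gaussian_1d(m, T) ** 3            # the 3D integral factorizes
analytic_3d = (2 * math.pi * m * T) ** 1.5     # value used in Eq. (104)
print(numeric_3d, analytic_3d)
```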


Now let us explore the next term, with $N = 2$, which describes, in particular, pair interactions $U = U(\mathbf{r})$, with $\mathbf{r}$ being the radius-vector of one particle relative to the other. Due to the assumed particle indistinguishability, this term needs the "correct Boltzmann counting" factor $1/2!$; cf. Eqs. (12) and (88):


$$Z_2 = \frac{e^{2\mu/T}}{2!}\sum_{k,k'}\exp\left\{-\frac{p_k^2}{2mT} - \frac{p_{k'}^2}{2mT}\right\}e^{-U(\mathbf{r})/T}. \qquad (3.105)$$


Since $U$ is coordinate-dependent, here the transfer from the summation to integration should be done more carefully than in the first term; cf. Eqs. (24) and (88):


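$$Z_2 = \frac{e^{2\mu/T}}{2!}\frac{(gV)^2}{(2\pi\hbar)^6}\int\exp\left\{-\frac{p^2}{2mT}\right\}d^3p\int\exp\left\{-\frac{p'^2}{2mT}\right\}d^3p'\times\frac{1}{V}\int e^{-U(\mathbf{r})/T}d^3r. \qquad (3.106)$$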


Comparing this expression with Eq. (104) for the parameter $Z$, we get


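$$Z_2 = \frac{Z^2}{2!}I_2, \quad \text{where } I_2 \equiv \frac{1}{V}\int e^{-U(\mathbf{r})/T}d^3r. \qquad (3.107)$$

Acting absolutely similarly, for the third term of the grand canonical sum we may get

$$Z_3 = \frac{Z^3}{3!}I_3, \quad \text{where } I_3 \equiv \frac{1}{V^2}\int e^{-U(\mathbf{r}',\mathbf{r}'')/T}d^3r'\,d^3r'', \qquad (3.108)$$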


where r’ and r” are the vectors characterizing the mutual positions of 3 particles — see Fig. 6b. 


These results may be extended by induction to an arbitrary $N$. Plugging the expression for $Z_N$ into the first of Eqs. (101) and recalling that $\Omega = -PV$, we get the equation of state of the gas in the form

$$P = \frac{T}{V}\ln\left(1 + ZI_1 + \frac{Z^2}{2!}I_2 + \frac{Z^3}{3!}I_3 + \ldots\right). \qquad (3.109)$$

As a sanity check: at $U = 0$, all integrals $I_N$ are equal to 1, and the expression under the logarithm is just the Taylor expansion of the function $e^Z$, giving $P = TZ/V$, and $\Omega = -PV = -TZ$. In this case, according to the last of Eqs. (1.62), the average number of particles in the system is $\langle N\rangle = -(\partial\Omega/\partial\mu)_{T,V} = Z$, because since $Z \propto \exp\{\mu/T\}$, $\partial Z/\partial\mu = Z/T$.⁴⁸ Thus, in this limit, we have happily recovered the equation of state of the ideal gas.


Returning to the general case of non-zero interactions, let us assume that the logarithm in Eq. (109) may be also represented as a direct Taylor expansion in $Z$:

$$P = \frac{T}{V}\sum_{l=1}^{\infty}\frac{J_l}{l!}Z^l. \qquad (3.110)$$


48 Actually, the fact that in that case $Z = \langle N\rangle$ could have been noted earlier, just by comparing Eq. (104) with Eq. (32).




(The lower limit of the sum reflects the fact that according to Eq. (109), at $Z = 0$, $P = (T/V)\ln 1 = 0$, so that the coefficient $J_0$ in a more complete version of Eq. (110) would equal 0 anyway.) According to Eq. (1.60), this expansion corresponds to the grand potential


$$\Omega = -PV = -T\sum_{l=1}^{\infty}\frac{J_l}{l!}Z^l. \qquad (3.111)$$


Again using the last of Eqs. (1.62), and the definition (104) of the parameter $Z$, we get

$$\langle N\rangle = \sum_{l=1}^{\infty}\frac{J_l}{(l-1)!}Z^l. \qquad (3.112)$$


This equation may be used for finding $Z$ for the given $\langle N\rangle$, and hence for the calculation of the equation of state from Eq. (110). The only remaining conceptual action item is to express the coefficients $J_l$ via the integrals $I_N$ participating in the expansion (109). This may be done using the well-known Taylor expansion of the logarithm function,⁴⁹


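$$\ln(1 + \xi) = \sum_{l=1}^{\infty}(-1)^{l+1}\frac{\xi^l}{l}. \qquad (3.113)$$

Using it together with Eq. (109), we get a Taylor series in $Z$, starting as

$$P = \frac{T}{V}\left[Z + \frac{Z^2}{2!}(I_2 - 1) + \frac{Z^3}{3!}\left\{(I_3 - 1) - 3(I_2 - 1)\right\} + \ldots\right]. \qquad (3.114)$$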


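Comparing this expression with Eq. (110), we see that

$$J_1 = I_1 = 1,$$
$$J_2 = I_2 - 1 = \frac{1}{V}\int\left(e^{-U(\mathbf{r})/T} - 1\right)d^3r,$$
$$J_3 = (I_3 - 1) - 3(I_2 - 1) = \frac{1}{V^2}\int\left(e^{-U(\mathbf{r}',\mathbf{r}'')/T} - e^{-U(\mathbf{r}')/T} - e^{-U(\mathbf{r}'')/T} - e^{-U(\mathbf{r}''')/T} + 2\right)d^3r'\,d^3r'', \qquad (3.115)$$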


where $\mathbf{r}''' \equiv \mathbf{r}' - \mathbf{r}''$; see Fig. 6b. The expression for $J_2$, describing the pair interactions of particles, is (besides a different numerical factor) equal to the second virial coefficient $B(T)$; see Eq. (93). As a reminder, the subtraction of 1 from the integral $I_2$ in the second of Eqs. (115) makes the contribution of each elementary 3D volume $d^3r$ into the integral $J_2$ different from zero only if at this $\mathbf{r}$ two particles interact ($U \neq 0$). Very similarly, in the last of Eqs. (115), the subtraction of three pair-interaction terms from $(I_3 - 1)$ makes the contribution from an elementary 6D volume $d^3r'\,d^3r''$ into the integral $J_3$ different from zero only if at that mutual location of particles, all three of them interact simultaneously, etc.
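The matching of Eq. (109) with Eq. (110) through Eq. (113) can be checked numerically without choosing any particular potential: pick arbitrary illustrative values of $I_2$ and $I_3$ (the numbers below are hypothetical), evaluate the exact logarithm of Eq. (109) truncated at $N = 3$, and compare it with the series $J_1Z + J_2Z^2/2! + J_3Z^3/3!$ built from Eqs. (115). Since the first neglected order is $Z^4$, halving $Z$ should reduce the residual about 16-fold.

```python
import math


def log_sum(Z, I2, I3):
    """ln(1 + Z I1 + Z^2 I2/2! + Z^3 I3/3!) with I1 = 1: Eq. (109) truncated at N = 3."""
    return math.log1p(Z + Z ** 2 * I2 / 2 + Z ** 3 * I3 / 6)


def cluster_series(Z, I2, I3):
    """J1 Z + J2 Z^2/2! + J3 Z^3/3! with the J's from Eqs. (115)."""
    J1 = 1.0
    J2 = I2 - 1.0
    J3 = (I3 - 1.0) - 3.0 * (I2 - 1.0)
    return J1 * Z + J2 * Z ** 2 / 2 + J3 * Z ** 3 / 6


I2, I3 = 0.8, 0.5            # hypothetical values of the integrals
for Z in (0.04, 0.02, 0.01):
    residual = log_sum(Z, I2, I3) - cluster_series(Z, I2, I3)
    print(f"Z = {Z:<5} residual = {residual:+.3e}   residual/Z^4 = {residual / Z ** 4:+.4f}")
```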


49 Looking at Eq. (109), one may think that since $\xi = Z + Z^2I_2/2 + \ldots$ is of the order of at least $Z \sim \langle N\rangle \gg 1$, the expansion (113), which converges only if $|\xi| < 1$, is illegitimate. However, the expansion is justified by its result (114), in which the $n$-th term is of the order of $\langle N\rangle^n(V_0/V)^{n-1}/n!$, so that the series does converge if the gas density is sufficiently low: $\langle N\rangle/V \ll 1/V_0$, i.e. $r_{\text{ave}} \gg r_0$. This is the very beauty of the cluster expansion, whose few first terms, rather unexpectedly, give a good approximation even for a gas with $\langle N\rangle \gg 1$ particles.
In order to illustrate the cluster expansion method at work, let us eliminate the factor $Z$ from the system of equations (110) and (112), with accuracy to terms $O(Z^2)$. For that, let us spell out these equations up to the terms $O(Z^3)$:


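$$\frac{PV}{T} = J_1Z + \frac{J_2}{2}Z^2 + \frac{J_3}{6}Z^3 + \ldots, \qquad (3.116)$$

$$\langle N\rangle = J_1Z + J_2Z^2 + \frac{J_3}{2}Z^3 + \ldots, \qquad (3.117)$$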


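and then divide these two expressions. We get the following result:

$$\frac{PV}{\langle N\rangle T} = \frac{1 + (J_2/2J_1)Z + (J_3/6J_1)Z^2 + \ldots}{1 + (J_2/J_1)Z + (J_3/2J_1)Z^2 + \ldots} \approx 1 - \frac{J_2}{2J_1}Z + \left(\frac{J_2^2}{2J_1^2} - \frac{J_3}{3J_1}\right)Z^2, \qquad (3.118)$$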


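whose final form is accurate to terms $O(Z^2)$. In this approximation, we may again use Eq. (117), now solved for $Z$ with the same accuracy:

$$Z \approx \frac{\langle N\rangle}{J_1} - \frac{J_2}{J_1^3}\langle N\rangle^2. \qquad (3.119)$$

Plugging this expression into Eq. (118), and comparing the result with Eq. (100), we get

$$B(T) = -\frac{V}{2}\frac{J_2}{J_1^2}, \qquad C(T) = V^2\left(\frac{J_2^2}{J_1^4} - \frac{J_3}{3J_1^3}\right). \qquad (3.120)$$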


The first of these relations, combined with the first two of Eqs. (115), yields for the 2nd virial coefficient the same Eq. (93) that was obtained from the Gibbs distribution (giving $B(T) = 4V_0$ for the hardball model; see Eq. (96)). The second of these relations enables the calculation of the 3rd virial coefficient $C(T)$. (Let me leave the calculation of $J_3$ and $C(T)$, for the hardball model, for the reader's exercise.) Evidently, a more complete solution of Eqs. (114), (116), and (117) may be used to calculate an arbitrary virial coefficient, though starting from the 5th coefficient, such calculations may be completed only numerically even in the simplest hardball model.
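The elimination of $Z$ in Eqs. (116)-(120) can also be checked numerically. The Python sketch below takes hypothetical values of $J_2$ and $J_3$ (with $J_1 = 1$, as in Eqs. (115)) and a unit volume, computes $PV/T$ and $\langle N\rangle$ from Eqs. (116)-(117) truncated after the $Z^3$ terms, and compares $PV/\langle N\rangle T$ with the virial form $1 + Bn + Cn^2$, where $n = \langle N\rangle/V$ and $B$, $C$ are given by Eq. (120). Because only terms up to $n^2$ are kept, the mismatch should scale as $n^3$, i.e. drop about 8-fold when $Z$ is halved.

```python
def pv_and_n(Z, J1, J2, J3):
    """Eqs. (116)-(117), truncated after the Z^3 terms: returns (PV/T, <N>)."""
    pv_over_t = J1 * Z + J2 * Z ** 2 / 2 + J3 * Z ** 3 / 6
    n_avg = J1 * Z + J2 * Z ** 2 + J3 * Z ** 3 / 2
    return pv_over_t, n_avg


def virial_from_cluster(V, J1, J2, J3):
    """Second and third virial coefficients, Eq. (120)."""
    B = -V * J2 / (2 * J1 ** 2)
    C = V ** 2 * (J2 ** 2 / J1 ** 4 - J3 / (3 * J1 ** 3))
    return B, C


V, J1 = 1.0, 1.0
J2, J3 = -0.4, -0.3          # hypothetical cluster integrals
B, C = virial_from_cluster(V, J1, J2, J3)


def virial_mismatch(Z):
    """PV/(<N>T) from Eqs. (116)-(117) minus the virial form 1 + B n + C n^2."""
    pv_over_t, n_avg = pv_and_n(Z, J1, J2, J3)
    n = n_avg / V
    return pv_over_t / n_avg - (1 + B * n + C * n ** 2)


for Z in (0.04, 0.02, 0.01):
    print(f"Z = {Z:<5} mismatch = {virial_mismatch(Z):+.3e}")
```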


3.6. Exercise problems 


3.1. Use the Maxwell distribution for an alternative (statistical) calculation of the mechanical 
work performed by the Szilard engine discussed in Sec. 2.3. 


Hint: You may assume the simplest geometry of the engine — see Fig. 2.4. 


3.2. Use the Maxwell distribution to calculate the drag coefficient $\eta \equiv -\partial\langle F\rangle/\partial u$, where $F$ is the force exerted by an ideal classical gas on a piston moving with a low velocity $u$, in the simplest geometry shown in the figure on the right, assuming that collisions of gas particles with the piston are elastic.


3.3. Derive the equation of state of the ideal classical gas from the grand canonical distribution. 


3.4. Prove that Eq. (22),

$$\Delta S = N_1\ln\frac{V}{V_1} + N_2\ln\frac{V}{V_2}, \quad \text{where } V = V_1 + V_2,$$

derived for the change of entropy at mixing of two ideal classical gases of completely distinguishable particles (that initially had equal densities $N/V$ and temperatures $T$), is also valid if particles in each of the initial volumes are indistinguishable from each other but different from those in the counterpart volume. For simplicity, you may assume that masses and internal degeneracy factors of all particles are equal.


3.5. A round cylinder of radius $R$ and length $L$, containing an ideal classical gas of $N \gg 1$ particles of mass $m$ each, is rotated about its symmetry axis with angular velocity $\omega$. Assuming that the gas as a whole rotates with the cylinder, and is in thermal equilibrium at temperature $T$,


(i) calculate the gas pressure distribution along its radius, and analyze its temperature 
dependence, and 

(ii) neglecting the internal degrees of freedom of the particles, calculate the total energy of the 
gas and its heat capacity in the high- and low-temperature limits. 


3.6. $N \gg 1$ classical, non-interacting, indistinguishable particles of mass $m$ are confined in a parabolic, spherically-symmetric 3D potential well $U(\mathbf{r}) = \kappa r^2/2$. Use two different approaches to calculate all major thermodynamic characteristics of the system, in thermal equilibrium at temperature $T$, including its heat capacity. Which of the results should be changed if the particles are distinguishable, and how?


Hint: Suggest a replacement of the notions of volume and pressure, appropriate for this system. 


3.7. In the simplest model of thermodynamic equilibrium between the liquid and gas phases of the same molecules, temperature and pressure do not affect the molecule's condensation energy $\Delta$. Calculate the concentration and pressure of such saturated vapor, assuming that it behaves as an ideal gas of classical particles.


3.8. An ideal classical gas of $N \gg 1$ particles is confined in a container of volume $V$ and wall surface area $A$. The particles may condense on container walls, releasing energy $\Delta$ per particle, and forming an ideal 2D gas. Calculate the equilibrium number of condensed particles and the gas pressure, and discuss their temperature dependences.


3.9. The inner surfaces of the walls of a closed container of volume $V$, filled with $N \gg 1$ particles, have $N_s \gg 1$ similar traps (small potential wells). Each trap can hold only one particle, at potential energy $-\Delta < 0$. Assuming that the gas of the particles in the volume is ideal and classical, derive an equation for the chemical potential $\mu$ of the system in equilibrium, and use it to calculate the potential and the gas pressure in the limits of small and large values of the $N/N_s$ ratio.


3.10. Calculate the magnetic response (the Pauli paramagnetism) of a degenerate ideal gas of spin-½ particles to a weak external magnetic field, due to a partial spin alignment with the field.
3.11. Calculate the magnetic response (the Landau diamagnetism) of a degenerate ideal gas of
electrically charged fermions to a weak external magnetic field, due to their orbital motion. 


3.12.* Explore the Thomas-Fermi model of a heavy atom, with nuclear charge $Q = Ze \gg e$, in which the electrons are treated as a degenerate Fermi gas, interacting with each other only via their contribution to the common electrostatic potential $\varphi(\mathbf{r})$. In particular, derive the ordinary differential equation obeyed by the radial distribution of the potential, and use it to estimate the effective radius of the atom.⁵⁰


3.13.* Use the Thomas-Fermi model, explored in the previous problem, to calculate the total binding energy of a heavy atom. Compare the result with that for a simpler model, in which the Coulomb electron-electron interaction is completely ignored.


3.14. Calculate the characteristic Thomas-Fermi length λ_TF of weak electric field's screening by
conduction electrons in a metal, modeling their ensemble as an ideal, degenerate, isotropic Fermi gas.

Hint: Assume that λ_TF is much larger than the Bohr radius r_B.

3.15. For a degenerate ideal 3D Fermi gas of N particles, confined in a rigid-wall box of volume
V, calculate the temperature dependencies of its pressure P and the heat capacity difference (C_P − C_V), in
the leading approximation in T << ε_F. Compare the results with those for the ideal classical gas.


Hint: You may like to use the solution of Problem 1.9.


3.16. How would the Fermi statistics of an ideal gas affect the barometric formula (28)? 


3.17. Derive general expressions for the energy E and the chemical potential μ of a uniform
Fermi gas of N >> 1 non-interacting, indistinguishable, ultra-relativistic particles.⁵¹ Calculate E, and
also the gas pressure P explicitly in the degenerate gas limit T → 0. In particular, is Eq. (48) valid in this
case?


3.18. Use Eq. (49) to calculate the pressure of an ideal gas of ultra-relativistic, indistinguishable
quantum particles, for an arbitrary temperature, as a function of the total energy E of the gas, and its
volume V. Compare the result with the corresponding relations for the electromagnetic blackbody
radiation and for an ideal gas of non-relativistic particles.


3.19.* Calculate the speed of sound in an ideal gas of ultra-relativistic fermions of density n at
negligible temperature.


50 Since this problem, and the next one, are important for atomic physics, and in their solution thermal effects
may be ignored, they were given in Chapter 8 of the QM part of the series as well, for the benefit of readers who
would not take this SM part. Note, however, that the argumentation in their solutions may be streamlined by using
the notion of the chemical potential μ, which was introduced only in this course.

51 This is, for example, an approximate but reasonable model for electrons in white dwarf stars, whose Coulomb
interaction is mostly compensated by the charge of nuclei of fully ionized helium atoms. 
3.20. Calculate basic thermodynamic characteristics, including all relevant thermodynamic
potentials, specific heat, and the surface tension of a uniform, non-relativistic 2D electron gas with given
areal density n = N/A:


(i) at T = 0, and
(ii) at low temperatures (in the lowest order in T/ε_F << 1, giving a nonzero result),


neglecting the Coulomb interaction effects.⁵²


3.21. Calculate the effective latent heat Λ_ef ≡ −N(∂Q/∂N_0)_{N,V} of evaporation of the spatially-
uniform Bose-Einstein condensate as a function of temperature T. Here Q is the heat absorbed by the
(condensate + gas) system of N >> 1 particles as a whole, while N_0 is the number of particles in the
condensate alone.


3.22.* For an ideal, spatially-uniform Bose gas, calculate the law of the chemical potential's
disappearance at T → T_c, and use the result to prove that the heat capacity C_V is a continuous function of
temperature at the critical point T = T_c.


3.23. In Chapter 1 of these notes, several thermodynamic relations involving entropy have been
discussed, including the first of Eqs. (1.39):
$$S = -\left(\frac{\partial G}{\partial T}\right)_P.$$
If we combine this expression with Eq. (1.56), G = μN, it looks as if, for the Bose-Einstein
condensate, whose chemical potential μ equals zero at temperatures below the critical point T_c, the
entropy should vanish as well. On the other hand, dividing both parts of Eq. (1.19) by dT, and assuming
that at this temperature change the volume is kept constant, we get
$$C_V = T\left(\frac{\partial S}{\partial T}\right)_V.$$
(This equality was also mentioned in Chapter 1.) If C_V is known as a function of temperature, the last
relation may be integrated over T to calculate S:
$$S = \int_{V=\text{const}} \frac{C_V(T)}{T}\,dT + \text{const}.$$
According to Eq. (80), the specific heat for the Bose-Einstein condensate is proportional to T^{3/2}, so that
the integration gives a non-zero entropy S ∝ T^{3/2}. Resolve this apparent contradiction, and calculate the
genuine entropy at T = T_c.


3.24. The standard analysis of the Bose-Einstein condensation, outlined in Sec. 4, may seem to 
ignore the energy quantization of the particles confined in volume V. Use the particular case of a cubic 
confining volume V = a×a×a with rigid walls to analyze whether the main conclusions of the standard
theory, in particular Eq. (71) for the critical temperature of the system of N >> 1 particles, are affected 
by such quantization. 


52 This condition may be approached reasonably well, for example, in 2D electron gases formed in semiconductor 
heterostructures (see, e.g., the discussion in QM Sec. 1.6, and the solution of Problem 3.2 of that course), due to 
the electron field’s compensation by background ionized atoms, and its screening by highly doped semiconductor 
bulk. 
3.25.* N >> 1 non-interacting bosons are confined in a soft, spherically-symmetric potential well
U(r) = mω²r²/2. Develop the theory of the Bose-Einstein condensation in this system; in particular,
prove Eq. (74b), and calculate the critical temperature T_c. Looking at the solution, what is the most
straightforward way to detect the condensation in experiment?


3.26. Calculate the chemical potential of an ideal, uniform 2D gas of spin-0 Bose particles as a
function of its areal density n (the number of particles per unit area), and find out whether such gas can
condense at low temperatures. Review your result for the case of a large (N >> 1) but finite number of
particles.


3.27. Can the Bose-Einstein condensation be achieved in a 2D system of N >> 1 non-interacting
bosons placed into a soft, axially-symmetric potential well, whose potential may be approximated as
U(ρ) = mω²ρ²/2, where ρ² = x² + y², and {x, y} are the Cartesian coordinates in the particle confinement
plane? If yes, calculate the critical temperature of the condensation.


3.28. Use Eqs. (115) and (120) to calculate the third virial coefficient C(T) for the hardball
model of particle interactions. 


3.29. Assuming the hardball model, with volume V_0 per molecule, for the liquid phase, describe
how the results of Problem 3.7 change if the liquid forms spherical drops of radius R >> V_0^{1/3}. Briefly
discuss the implications of the result for water cloud formation.


Hint: Surface effects in macroscopic volumes of liquids may be well described by attributing an
additional energy γ (equal to the surface tension) to the unit surface area.⁵³


53 See, e.g., CM Sec. 8.2. 
Chapter 4. Phase Transitions


This chapter gives a rather brief discussion of coexistence between different states (“phases”) of 
collections of many similar particles, and transitions between these phases. Due to the complexity of 
these phenomena, which involve particle interactions, quantitative analytical results in this field have 
been obtained only for a few very simple models, typically giving only a very approximate description 
of real systems. 


4.1. First-order phase transitions 


From our everyday experience, say with water ice, liquid water, and water vapor, we know that 
one chemical substance (i.e. a set of many similar particles) may exist in different stable states — phases. 
A typical substance may have: 


(i) a dense solid phase, in which interatomic forces keep all atoms/molecules in virtually fixed 
relative positions, with just small thermal fluctuations about them; 


(ii) a liquid phase, of comparable density, in which the relative distances between atoms or 
molecules are almost constant, but the particles are virtually free to move around each other, and 


(iii) a gas phase, typically of a much lower density, in which the molecules are virtually free to
move all around the containing volume.¹


Experience also tells us that at certain conditions, two different phases may be in thermal and
chemical equilibrium — say, ice floating on water at the freezing-point temperature. Actually, in Sec.
3.4 we already discussed a qualitative theory of one such equilibrium: the Bose-Einstein condensate’s 
coexistence with the uncondensed “vapor” of similar particles. However, this is a rather exceptional case 
when the phase coexistence is due to the quantum nature of the particles (bosons) that may not interact 
directly. Much more frequently, the formation of different phases, and transitions between them, are due 
to particle repulsive and attractive interactions, briefly discussed in Sec. 3.5. 


Phase transitions are sometimes classified by their order.² I will start their discussion with the
so-called first-order phase transitions that feature non-zero latent heat Λ — the amount of heat that is
necessary to turn one phase into another phase completely, even if temperature and pressure are kept
constant.³ Unfortunately, even the simplest “microscopic” models of particle interaction, such as those
discussed in Sec. 3.5, give rather complex equations of state. (As a reminder, even the simplest hardball 
model leads to the series (3.100), whose higher virial coefficients defy analytical calculation.) This is 


1 The plasma phase, in which atoms are partly or completely ionized, is frequently mentioned as one more phase,
on equal footing with the three phases listed above, but one has to remember that in contrast to them, a typical 
electroneutral plasma consists of particles of two very different sorts — positive ions and electrons. 

2 Such classification schemes, started by Paul Ehrenfest in the early 1930s, have been repeatedly modified to 
accommodate new results for particular systems, and by now only the “first-order phase transition” is still a 
generally accepted term, but with a definition different from the original one. 

3 For example, for water the latent heat of vaporization at the ambient pressure is as high as ~2.2×10⁶ J/kg, i.e. ~
0.4 eV per molecule, making this ubiquitous liquid indispensable for effective fire fighting. (The latent heat of 
water ice’s melting is an order of magnitude lower.) 
why I will follow the tradition to discuss the first-order phase transitions using a simple
phenomenological model suggested in 1873 by Johannes Diderik van der Waals. 


For its introduction, it is useful to recall that in Sec. 3.5 we have derived Eq. (3.99) — the
equation of state for a classical gas of weakly interacting particles, which takes into account (albeit
approximately) both interaction components necessary for a realistic description of gas
condensation/liquefaction: the long-range attraction of the particles and their short-range repulsion. Let
us rewrite that result as follows:
$$P + a\frac{N^2}{V^2} = \frac{NT}{V}\left(1 + \frac{Nb}{V}\right). \tag{4.1}$$


As we saw in the derivation of this formula, the physical meaning of the constant b is the effective
volume of space taken by a particle pair collision — see Eq. (3.96). The relation (1) is quantitatively valid
only if the second term in the parentheses is small, Nb << V, i.e. if the total volume excluded from
particles' free motion because of their collisions is much smaller than the whole volume V. In order to
describe the condensed phase (which I will call “liquid”⁴), we need to generalize this relation to the case
Nb ~ V. Since the effective volume left for particles' motion is V − Nb, it is very natural to make the
following replacement: V → V − Nb, in the equation of state of the ideal gas. If we also keep on the
left-hand side the term aN²/V², which describes the long-range attraction of particles, we get the van der
Waals equation of state:
$$\left(P + a\frac{N^2}{V^2}\right)(V - Nb) = NT. \tag{4.2}$$


One advantage of this simple model is that in the rare gas limit, Nb << V, it reduces back to the 
microscopically-justified Eq. (1). (To verify this, it is sufficient to Taylor-expand the right-hand side of 
Eq. (2) in small Nb/V << 1, and retain only two leading terms.) Let us explore the basic properties of this 
model. 
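Written out explicitly, that expansion reads as follows (solving Eq. (2) for P and using the geometric series for 1/(1 − Nb/V)):
$$P = \frac{NT}{V - Nb} - a\frac{N^2}{V^2} = \frac{NT}{V}\left[1 + \frac{Nb}{V} + \left(\frac{Nb}{V}\right)^2 + \ldots\right] - a\frac{N^2}{V^2},$$
so that dropping the terms of order (Nb/V)² and higher, and moving the last term to the left-hand side, reproduces Eq. (1).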


It is frequently convenient to discuss any equation of state in terms of its isotherms, i.e. the P(V)
curves plotted at constant T. As Eq. (2) shows, in the van der Waals model such a plot depends on four
parameters: a, b, N, and T, complicating general analysis of the model. To simplify the task, it is
convenient to introduce dimensionless variables: pressure p = P/P_c, volume v = V/V_c, and temperature
t = T/T_c, normalized to their so-called critical values,
$$P_c \equiv \frac{1}{27}\frac{a}{b^2}, \qquad V_c \equiv 3Nb, \qquad T_c \equiv \frac{8}{27}\frac{a}{b}, \tag{4.3}$$
whose meaning will be clear in a minute. In this notation, Eq. (2) acquires the following form,
$$p + \frac{3}{v^2} = \frac{8t}{3v - 1}, \tag{4.4}$$
so that the normalized isotherms p(v) depend on only one parameter, the normalized temperature t — see
Fig. 1.
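A quick numerical check of Eq. (4) may be helpful here. The sketch below is plain Python with illustrative function names (they are not from the text); it evaluates the reduced pressure p(v, t) together with its first two analytical derivatives over v, and confirms that at t = 1 the isotherm has a zero-slope inflection point at v = p = 1, which is what makes the values (3) critical.

```python
# Reduced van der Waals isotherm, Eq. (4.4): p = 8t/(3v - 1) - 3/v**2,
# with p = P/P_c, v = V/V_c, t = T/T_c as defined in Eq. (4.3).

def p_vdw(v: float, t: float) -> float:
    """Reduced pressure at reduced volume v (> 1/3) and reduced temperature t."""
    return 8.0 * t / (3.0 * v - 1.0) - 3.0 / v**2

def dp_dv(v: float, t: float) -> float:
    """Slope (dp/dv) of the isotherm t, differentiated analytically."""
    return -24.0 * t / (3.0 * v - 1.0) ** 2 + 6.0 / v**3

def d2p_dv2(v: float, t: float) -> float:
    """Curvature (d^2p/dv^2) of the isotherm t."""
    return 144.0 * t / (3.0 * v - 1.0) ** 3 - 18.0 / v**4

if __name__ == "__main__":
    # At the critical point, p = 1 and both derivatives vanish.
    print(p_vdw(1.0, 1.0), dp_dv(1.0, 1.0), d2p_dv2(1.0, 1.0))  # prints: 1.0 0.0 0.0
```

For t > 1 the slope dp/dv stays negative for all v > 1/3, while for t < 1 it becomes positive near v = 1, which is the instability discussed below.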


4 Due to the phenomenological character of the van der Waals model, one cannot say for sure whether the 
condensed phase it predicts corresponds to a liquid or a solid. However, in most real substances at ambient 
conditions, gas coexists with liquid, hence the name. 
[Figure: isotherms p(v); horizontal axis v = V/V_c.]

Fig. 4.1. The van der Waals equation of
state, plotted on the [p, v] plane for several
values of the reduced temperature t = T/T_c.
Shading shows the single-phase instability
range in which (∂P/∂V)_T > 0.


The most important property of these plots is that the isotherms have qualitatively different
shapes in two temperature regions. At t > 1, i.e. T > T_c, pressure increases monotonically at gas
compression (qualitatively, as in an ideal classical gas, with P = NT/V, to which the van der Waals
system tends at T >> T_c), i.e. with (∂P/∂V)_T < 0 at all points of the isotherm.⁵ However, below the
critical temperature T_c, any isotherm features a segment with (∂P/∂V)_T > 0. It is easy to understand that,
at least in a constant-pressure experiment (see, for example, Fig. 1.5),⁶ these segments describe a
mechanically unstable equilibrium. Indeed, if due to a random fluctuation, the volume deviated upward
from the equilibrium value, the pressure would also increase, forcing the environment (say, the heavy
piston in Fig. 1.5) to allow further expansion of the system, leading to even higher pressure, etc. A
similar deviation of volume downward would lead to a similar avalanche-like decrease of the volume.
Such avalanche instability would develop further and further until the system has reached one of the
stable branches with a negative slope (∂P/∂V)_T. In the range where the single-phase equilibrium state is
unstable, the system as a whole may be stable only if it consists of the two phases (one with a smaller,
and another with a higher density n = N/V) that are described by the two stable branches — see Fig. 2.
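The boundaries of the shaded instability range in Fig. 1 follow from Eq. (4): differentiating it shows that dp/dv > 0 exactly where t < (3v − 1)²/4v³. The sketch below (plain Python; the helper names and the use of bisection are illustrative assumptions, not the text's method) finds, for a given t < 1, the two reduced volumes v_a < 1 < v_b bounding this range.

```python
def spinodal_t(v):
    """Reduced temperature at which the isotherm (4.4) has dp/dv = 0 at reduced volume v."""
    return (3.0 * v - 1.0) ** 2 / (4.0 * v**3)

def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def unstable_range(t):
    """Return (v_a, v_b) such that (dP/dV)_T > 0 on the isotherm t for v_a < v < v_b."""
    if not 0.0 < t < 1.0:
        raise ValueError("an unstable range exists only for 0 < t < 1")
    g = lambda v: spinodal_t(v) - t
    # spinodal_t grows from 0 at v = 1/3 to its maximum, 1, at v = 1, then decays to 0,
    # so each of the intervals (1/3, 1) and (1, infinity) holds exactly one root of g.
    v_a = bisect(g, 1.0 / 3.0, 1.0)
    v_hi = 2.0
    while g(v_hi) > 0:  # push the outer bracket out until spinodal_t falls below t
        v_hi *= 2.0
    v_b = bisect(g, 1.0, v_hi)
    return v_a, v_b
```

For example, unstable_range(0.9) should return v_a ≈ 0.72 and v_b ≈ 1.53; as t → 1, both points merge at v = 1, the critical volume.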


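[Figure: P versus V isotherm showing the stable liquid branch, the unstable branch, the stable gaseous phase, and the horizontal line P = P_0(T) along which liquid and gas are in equilibrium.]

Fig. 4.2. Phase equilibrium
at T < T_c (schematically).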


5 The special choice of numerical coefficients in Eq. (3) places the border between these two regions
exactly at t = 1, i.e. at the temperature equal to T_c, with the critical point's coordinates equal to P_c and V_c.

6 Actually, this assumption is not crucial for our analysis of mechanical stability, because if a fluctuation takes 
place in a small part of the total volume V, its other parts play the role of pressure-fixing environment. 
In order to understand the basic properties of this two-phase system, let us recall the general
conditions of the thermodynamic equilibrium of two systems, which have been discussed in Chapter 1: 


$$T_1 = T_2 \qquad \text{(thermal equilibrium)}, \tag{4.5}$$

$$\mu_1 = \mu_2 \qquad \text{(``chemical'' equilibrium)}, \tag{4.6}$$

the latter condition meaning that the average energy of a single (“probe”) particle in both systems has to
be the same. To those, we should add the evident condition of mechanical equilibrium,

$$P_1 = P_2 \qquad \text{(mechanical equilibrium)}, \tag{4.7}$$


which immediately follows from the balance of normal forces exerted on an inter-phase boundary. 


If we discuss isotherms, Eq. (5) is fulfilled automatically, while Eq. (7) means that the effective 
isotherm P(V) describing a two-phase system should be a horizontal line — see Fig. 2: 


$$P = P_0(T). \tag{4.8}$$


Along this line,⁷ internal properties of each phase do not change; only the particle distribution does: it
evolves gradually from all particles being in the liquid phase at point 1 to all particles being in the gas
phase at point 2.⁸ In particular, according to Eq. (6), the chemical potentials μ of the phases should be
equal at each point of the horizontal line (8). This fact enables us to find the line's position: it has to
connect points 1 and 2 at which the chemical potentials of the two phases are equal to each other. Let us
recast this condition as
$$\int_1^2 d\mu = 0, \quad \text{i.e.} \quad \int_1^2 dG = 0, \tag{4.9}$$
where the integral may be taken along the single-phase isotherm. (For this mathematical calculation, the
mechanical instability of states on some part of this curve is not important.) By its construction, along
that curve, N = const and T = const, so that according to Eq. (1.53c), dG = −SdT + VdP + μdN, for a slow
(reversible) change, dG = VdP. Hence Eq. (9) yields
$$\int_1^2 V\,dP = 0. \tag{4.10}$$
This equality means that in Fig. 2, the two shaded areas should be equal.⁹


7 Frequently, P_0(T) is called the saturated vapor pressure.

8 A natural question: is the two-phase state with P = P_0(T) the only state existing between points 1 and 2? Indeed,
the branches 1-1' and 2-2' of the single-phase isotherm also have negative derivatives (∂P/∂V)_T, and hence are
mechanically stable with respect to small perturbations. However, these branches are actually metastable, i.e.
have larger Gibbs energy per particle (i.e. μ) than the counterpart phase, and are hence unstable to larger
perturbations — such as foreign microparticles (say, dust), protrusions on the confining walls, etc. In very
controlled conditions, these single-phase “superheated” and “supercooled” states can survive almost all the way to
the zero-derivative points 1' and 2', leading to sudden jumps of the system into the counterpart phase. (At fixed
pressure, such jumps go as shown by dashed lines in Fig. 2.) In particular, at the atmospheric pressure, purified
water may be supercooled to almost −50°C, and superheated to nearly +270°C. However, at more realistic
conditions, perturbations result in the two-phase coexistence formation close to points 1 and 2.

9 This Maxwell equal-area rule (also called “Maxwell’s construct”) was suggested by J. C. Maxwell in 1875 
using more complex reasoning. 
As the same Fig. 2 shows, the Maxwell rule may be rewritten in a different form,


$$\int_1^2 \left[P - P_0(T)\right] dV = 0, \tag{4.11}$$


which is more convenient for analytical calculations than Eq. (10) if the equation of state may be
explicitly solved for P — as it is in the van der Waals model (2). Such calculation (left for the reader's
exercise) shows that for that model, the temperature dependence of the saturated vapor pressure at low T
is exponential,¹⁰
$$P_0(T) \approx 27P_c \exp\left\{-\frac{\Delta}{T}\right\}, \quad \text{with} \quad \Delta \equiv \frac{a}{b} = \frac{27}{8}T_c, \quad \text{for } T \ll T_c, \tag{4.12}$$
corresponding very well to the physical picture of a particle's thermal activation from a potential well of
depth Δ.
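The equal-area rule (11) is also easy to apply numerically. The sketch below assumes the reduced units of Eqs. (3)-(4), in which Eq. (12) reads p_0 ≈ 27exp{−27/8t}; its helper names are illustrative, and plain bisection is used only for simplicity. It adjusts p_0 until the integral (11), taken analytically between the outer roots of p(v) = p_0, vanishes.

```python
import math

# Reduced units of Eqs. (4.3)-(4.4): p = P/P_c, v = V/V_c, t = T/T_c.

def p_vdw(v, t):
    """Reduced van der Waals pressure, Eq. (4.4)."""
    return 8.0 * t / (3.0 * v - 1.0) - 3.0 / v**2

def bisect(f, lo, hi, iters=200):
    """Bisection root finder; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def extremum_volumes(t):
    """Volumes v_a < 1 < v_b of the local minimum and maximum of p(v) at t < 1."""
    g = lambda v: (3.0 * v - 1.0) ** 2 / (4.0 * v**3) - t  # zero where dp/dv = 0
    v_hi = 2.0
    while g(v_hi) > 0:
        v_hi *= 2.0
    return bisect(g, 1.0 / 3.0, 1.0), bisect(g, 1.0, v_hi)

def outer_roots(p0, t, v_a, v_b):
    """Liquid root v1 < v_a and gas root v2 > v_b of p(v, t) = p0."""
    f = lambda v: p_vdw(v, t) - p0
    v1 = bisect(f, 1.0 / 3.0 + 1e-12, v_a)
    v_hi = 2.0 * v_b
    while f(v_hi) > 0:
        v_hi *= 2.0
    return v1, bisect(f, v_b, v_hi)

def area_mismatch(p0, t, v_a, v_b):
    """Left-hand side of Eq. (4.11): integral of [p(v) - p0] dv from v1 to v2."""
    v1, v2 = outer_roots(p0, t, v_a, v_b)
    prim = lambda v: (8.0 * t / 3.0) * math.log(3.0 * v - 1.0) + 3.0 / v  # antiderivative of p
    return prim(v2) - prim(v1) - p0 * (v2 - v1)

def saturated_pressure(t):
    """Return (p0, (v1, v2)) for 0 < t < 1 from Maxwell's equal-area rule."""
    v_a, v_b = extremum_volumes(t)
    p_lo = p_vdw(v_a, t)
    if p_lo <= 0.0:
        p_lo = 1e-100  # at low t the local minimum of p(v) is negative; start from a tiny p0
    p_hi = p_vdw(v_b, t)
    # The mismatch decreases monotonically as p0 grows, so bisect on ln(p0).
    x = bisect(lambda x: area_mismatch(math.exp(x), t, v_a, v_b),
               math.log(p_lo), math.log(p_hi))
    p0 = math.exp(x)
    return p0, outer_roots(p0, t, v_a, v_b)

if __name__ == "__main__":
    for t in (0.9, 0.5, 0.2):
        p0, (v1, v2) = saturated_pressure(t)
        ratio = p0 / (27.0 * math.exp(-27.0 / (8.0 * t)))  # compare with Eq. (4.12)
        print(f"t={t}: p0={p0:.5g}, v1={v1:.4f}, v2={v2:.4g}, ratio to Eq. (4.12): {ratio:.3f}")
```

With these conventions, saturated_pressure(0.9) should give p_0 ≈ 0.647, with the liquid and gas volumes near v ≈ 0.60 and v ≈ 2.35, while at t = 0.2 the ratio of the numerical p_0 to 27exp{−27/8t} comes out somewhat below 1, approaching 1 as t decreases, in agreement with the asymptotic form (12).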


The signature parameter of a first-order phase transition, the latent heat of evaporation
$$\Lambda \equiv \int_1^2 dQ, \tag{4.13}$$
may also be found by a similar integration along the single-phase isotherm. Indeed, using Eq. (1.19), dQ
= TdS, we get
$$\Lambda = \int_1^2 T\,dS = T(S_2 - S_1). \tag{4.14}$$


Let us express the right-hand side of Eq. (14) via the equation of state. For that, let us take the full
derivative of both sides of Eq. (6) over temperature, considering the value of G = Nμ for each phase as a
function of P and T, and taking into account that according to Eq. (7), P_1 = P_2 = P_0(T):
$$\left(\frac{\partial G_1}{\partial T}\right)_P + \left(\frac{\partial G_1}{\partial P}\right)_T \frac{dP_0}{dT} = \left(\frac{\partial G_2}{\partial T}\right)_P + \left(\frac{\partial G_2}{\partial P}\right)_T \frac{dP_0}{dT}. \tag{4.15}$$
According to the first of Eqs. (1.39), the partial derivative (∂G/∂T)_P is just minus the entropy, while
according to the second of those equalities, (∂G/∂P)_T is the volume. Thus Eq. (15) becomes
$$-S_1 + V_1\frac{dP_0}{dT} = -S_2 + V_2\frac{dP_0}{dT}. \tag{4.16}$$
Solving this equation for (S_2 − S_1), and plugging the result into Eq. (14), we get the following
Clapeyron-Clausius formula:
$$\Lambda = T(V_2 - V_1)\frac{dP_0}{dT}. \tag{4.17}$$


For the van der Waals model, this formula may be readily used for the analytical calculation of Λ in two
limits: T << T_c and (T_c − T) << T_c — the exercises left for the reader. In the latter limit, Λ ∝ (T_c − T)^{1/2},
naturally vanishing at the critical temperature.
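As an illustration of how Eq. (17) is used, here is a sketch of the first limit, T << T_c, assuming that the vapor is dilute enough to obey the ideal-gas law, so that V_2 ≈ NT/P_0 >> V_1, and using the asymptotic form (12), which gives dP_0/dT ≈ (Δ/T²)P_0:
$$\Lambda \approx T\,\frac{NT}{P_0}\,\frac{dP_0}{dT} \approx T\,\frac{NT}{P_0}\,\frac{\Delta}{T^2}\,P_0 = N\Delta = \frac{27}{8}\,NT_c,$$
i.e. at low temperatures, the latent heat per particle tends to the activation energy Δ of Eq. (12).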




10 It is fascinating how well this Arrhenius exponent is hidden in the polynomial van der Waals equation (2)!
Finally, some important properties of the van der Waals model may be revealed more easily by
looking at the set of its isochores P = P(T) for V = const, rather than at the isotherms. Indeed, as Eq. (2)
shows, all single-phase isochores are straight lines. However, if we interrupt these lines at the points
where the single phase becomes metastable, and complement them with the (very nonlinear!)
dependence P_0(T), we get the pattern (called the phase diagram) shown schematically in Fig. 3a.


[Figure: panels (a) and (b), P versus T, with the critical points marked.]

Fig. 4.3. (a) Van der Waals model's isochores, the saturated gas pressure diagram, and the
critical point, and (b) the phase diagram of a typical three-phase system (all schematically).


On this plot, one more meaning of the critical point {P_c, T_c} becomes very vivid. At fixed
pressure P < P_c, the liquid and gaseous phases are clearly separated by the saturated pressure line P_0(T),
so if we achieve the transition between the phases just by changing temperature (see the red horizontal
line in Fig. 3a), we have to pass through the phase equilibrium point, being delayed there to either give
to the system the latent heat or take it out. However, if we perform the transition between the same
initial and final points by changing both the pressure and temperature, going around the critical point
(see the blue line in Fig. 3a), no definite point of transition may be observed: the substance stays in a
single phase, and it is a subjective judgment of the observer in which region that phase should be called
the liquid, and in which region — the gas. For water, the critical point corresponds to the temperature of
647 K (374°C), and P_c = 22.1 MPa (i.e. ~200 bars), so that a lecture demonstration of its critical
behavior would require substantial safety precautions. This is why such demonstrations are typically
carried out with other substances such as either diethyl ether,¹¹ with its much lower T_c (194°C) and P_c
(3.6 MPa), or the now-infamous carbon dioxide CO2, with even lower T_c (31.1°C), though higher P_c (7.4
MPa). Though these substances are colorless and clear in both gas and liquid phases, their separation
(by gravity) is still visible, due to small differences in the optical refraction coefficient, at P < P_c, but not
above P_c.¹²


Thus, in the van der Waals model, two phases may coexist, though only at certain conditions — in 
particular, T < T_c. Now a natural, more general question is whether the coexistence of more than two


11 (CH3-CH2)-O-(CH2-CH3), historically the first popular general anesthetic.

12 It is interesting that very close to the critical point the substance suddenly becomes opaque — in the case of
ether, whitish. The qualitative explanation of this effect, called the critical opalescence, is simple: at this point,
the difference of the Gibbs energies per particle (i.e. the chemical potentials) of the two phases becomes so small
that unavoidable thermal fluctuations lead to spontaneous appearance and disappearance of relatively large
(a-few-μm-scale) single-phase regions in all the volume. A large concentration of boundaries of such
randomly-shaped regions leads to strong light scattering.
phases of the same substance is possible. For example, can the water ice, the liquid water, and the water
vapor (steam) all be in thermodynamic equilibrium? The answer is essentially given by Eq. (6). From
thermodynamics, we know that for a uniform system (i.e. a single phase), pressure and temperature
completely define the chemical potential μ(P, T). Hence, dealing with two phases, we had to satisfy just
one chemical equilibrium condition (6) for two common arguments P and T. Evidently, this leaves us
with one extra degree of freedom, so that the two-phase equilibrium is possible along a whole line,
P = P_0(T), rather than at an isolated point — see again the horizontal line in Fig. 2 and the bold line in Fig. 3a. Now, if
we want three phases to be in equilibrium, we need to satisfy two equations for these variables:


$$\mu_1(P,T) = \mu_2(P,T) = \mu_3(P,T). \tag{4.18}$$


Typically, the functions μ(P, T) are monotonic, so that the two equations (18) have just one solution, the
so-called triple point {P_t, T_t}. Of course, the triple point {P_t, T_t} of equilibrium between three phases
should not be confused with the critical points {P_c, T_c} of transitions between each of two-phase pairs.
Fig. 3b shows, very schematically, their relation for a typical three-phase system solid-liquid-gas. For
example, water, ice, and water vapor are at equilibrium at a triple point corresponding to P_t ≈ 0.612
kPa¹³ and T_t = 273.16 K. The practical importance of this particular temperature point is that by an
international agreement it has been accepted for the definition of not only the Kelvin temperature scale,
but also of the Celsius scale's reference, as 0.01°C, so that the absolute temperature zero corresponds to
exactly −273.15°C.¹⁴ More generally, triple points of purified simple substances (such as H2, N2, O2, Ar,
Hg, and H2O) are broadly used for thermometer calibration, defining the so-called international
temperature scales including the currently accepted scale ITS-90.


This analysis may be readily generalized to multi-component systems consisting of particles of
several (say, L) sorts.¹⁵ If such a mixed system is in a single phase, i.e. is macroscopically uniform, its
chemical potential may be defined by a natural generalization of Eq. (1.53c):
$$dG = -S\,dT + V\,dP + \sum_{l=1}^{L} \mu_l\,dN_l. \tag{4.19}$$
The last term reflects the fact that usually, each single phase is not a pure chemical substance, but has
certain concentrations of all other components, so that μ_l may depend not only on P and T but also on
the concentrations c_l ≡ N_l/N of particles of each sort. If the total number N of particles is fixed, the
number of independent concentrations is (L − 1). For the chemical equilibrium of R phases, all R values
of μ_l^(r) (r = 1, 2, ..., R) have to be equal for particles of each sort: μ_l^(1) = μ_l^(2) = ... = μ_l^(R), with each μ_l^(r)
depending on (L − 1) concentrations c_l^(r), and also on P and T. This requirement gives L(R − 1) equations
for (L − 1)R concentrations c_l^(r), plus two common arguments P and T, i.e. for [(L − 1)R + 2] independent
variables. This means that the number of phases has to satisfy the limitation
$$R \le L + 2, \tag{4.20}$$


13 Please note that for water, P_t is much lower than the normal atmospheric pressure (101.325 kPa).

14 Note the recent (2018) re-definition of the “legal” kelvin via joule (see appendix CA: Selected Physical
Constants); however, the new definition is compatible, within experimental accuracy, with that mentioned above.

15 Perhaps the most practically important example is the air/water system. For its detailed discussion, based on Eq.
(19), the reader may be referred, e.g., to Sec. 3.9 in F. Schwabl, Statistical Mechanics, Springer (2000). Other
important applications include liquid solutions, and metallic alloys — solid solutions of metal elements.
where the equality sign may be reached in just one point in the whole parameter space. This is the Gibbs
phase rule. As a sanity check, for a single-component system, L = 1, the rule yields R ≤ 3 — exactly the
result we have already discussed.
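The counting that leads to Eq. (20) is easy to mechanize. The sketch below (plain Python; the function names are illustrative) subtracts the L(R − 1) equilibrium equations from the [(L − 1)R + 2] unknowns; the resulting number of free variables, L − R + 2, must be non-negative.

```python
def degrees_of_freedom(n_sorts, n_phases):
    """Free variables left by R-phase coexistence of L particle sorts: L - R + 2."""
    unknowns = (n_sorts - 1) * n_phases + 2   # concentrations c_l^(r) plus the common P and T
    equations = n_sorts * (n_phases - 1)      # mu_l^(1) = ... = mu_l^(R) for each of the L sorts
    return unknowns - equations

def max_phases(n_sorts):
    """Largest number of coexisting phases with non-negative freedom, i.e. L + 2 (Eq. 4.20)."""
    r = 1
    while degrees_of_freedom(n_sorts, r + 1) >= 0:
        r += 1
    return r

if __name__ == "__main__":
    # L = 1: one phase leaves P and T free, two phases leave one, three phases leave none.
    print([degrees_of_freedom(1, r) for r in (1, 2, 3)], max_phases(1), max_phases(2))
    # prints: [2, 1, 0] 3 4
```

For a single-component substance such as water, this gives 2 free variables for one phase (P and T), 1 for two phases (the line P_0(T)), and 0 at the triple point.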


4.2. Continuous phase transitions 


As Fig. 2 illustrates, if we fix pressure P in a system with a first-order phase transition, and start 
changing its temperature, then the complete crossing of the transition-point line, defined by the equation 
P_0(T) = P, requires the insertion (or extraction) of some non-zero latent heat Λ. Eqs. (14) and (17) show
that Λ is directly related to non-zero differences between the entropies and volumes of the two phases
(at the same pressure). As we know from Chapter 1, both S and V may be represented as the first 
derivatives of appropriate thermodynamic potentials. This is why P. Ehrenfest called such transitions, 
involving jumps of potentials’ first derivatives, the first-order phase transitions. 


On the other hand, there are phase transitions that have no first derivative jumps at the transition
temperature T_c, so that the temperature point may be clearly marked, for example, by a jump of the
second derivative of a thermodynamic potential — for example, the derivative ∂C/∂T which, according to
Eq. (1.24), equals ∂²E/∂T². In the initial Ehrenfest classification, this was an example of a
second-order phase transition. However, most features of such phase transitions are also pertinent to some
systems in which the second derivatives of potentials are continuous as well. Due to this reason, I will
use a more recent terminology (suggested in 1967 by M. Fisher), in which all phase transitions with Λ =
0 are called continuous.


Most (though not all) continuous phase transitions result from particle interactions. Here are 
some representative examples: 


(i) At temperatures above ~400 K, the crystal lattice of barium titanate (BaTiO3) is cubic, with a
Ba ion in the center of each Ti-cornered cube (or vice versa) — see Fig. 4a. However, as the temperature
is being lowered below that critical value, the sublattice of Ba ions starts moving along one of six sides
of the TiO3 sublattice, leading to a small deformation of both lattices — which become tetragonal. This is
a typical example of a structural transition, in this particular case combined with a ferroelectric
transition, because (due to the positive electric charge of the Ba ions) below the critical temperature the
BaTiO3 crystal acquires a spontaneous electric polarization even in the absence of an external electric field.


Fig. 4.4. Single cells of crystal lattices of (a) BaTiO3 and (b) CuZn.


(ii) A different kind of phase transition happens, for example, in Cu_xZn_(1−x) alloys — so-called
brasses. Their crystal lattice is always cubic, but above a certain critical temperature T_c (which depends
on x) any of its nodes may be occupied by either a copper or a zinc atom, at random. At T < T_c, a trend
toward ordered atom alternation arises, and at low temperatures, the atoms are fully ordered, as shown
in Fig. 4b for the stoichiometric case x = 0.5. This is a good example of an order-disorder transition.
(iii) At ferromagnetic transitions (such as the one taking place, for example, in Fe at 1,043 K)
and antiferromagnetic transitions (e.g., in MnO at 116 K), lowering of temperature below the critical
value¹⁶ does not change atom positions substantially, but results in a partial ordering of atomic spins,
eventually leading to their full ordering (Fig. 5).


[Fig. 4.5. Classical images of fully ordered phases: (a) a ferromagnet, and (b) an antiferromagnet.]


Note that, as it follows from Eqs. (1.1)-(1.3), at ferroelectric transitions the role of pressure is played by the external electric field $\mathscr{E}$, and at the ferromagnetic transitions, by the external magnetic field $\mathscr{H}$. As we will see very soon, even in systems with continuous phase transitions, a gradual change of such an external field, at a fixed temperature, may induce jumps between metastable states, similar to those in systems with first-order phase transitions (see, e.g., the dashed arrows in Fig. 2), with non-zero decreases of the appropriate free energy.


Besides these standard examples, some other threshold phenomena, such as the formation of a coherent optical field in a laser, and even the self-excitation of oscillators with negative damping (see, e.g., CM Sec. 5.4), may be treated, under certain conditions, as continuous phase transitions.17


The general feature of all these transitions is the gradual formation, at $T < T_c$, of a certain ordering, which may be characterized by some order parameter $\eta \neq 0$. The simplest example of such an order parameter is the magnetization at the ferromagnetic transitions, and this is why the continuous phase transitions are usually discussed using certain models of ferromagnetism. (I will follow this tradition, while mentioning in passing other important cases that require a substantial modification of the theory.) Most of such models are defined on an infinite 3D cubic lattice (see, e.g., Fig. 5), with evident generalizations to lower dimensions. For example, the Heisenberg model of a ferromagnet (suggested in 1928) is defined by the following Hamiltonian:


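$$\hat H = -J\sum_{\{j,j'\}}\hat{\boldsymbol\sigma}_j\cdot\hat{\boldsymbol\sigma}_{j'} - \sum_j \mathbf{h}\cdot\hat{\boldsymbol\sigma}_j, \qquad (4.21)$$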


where $\hat{\boldsymbol\sigma}_j$ is the Pauli vector operator18 acting on the $j$th spin, and $\mathbf{h}$ is the normalized external magnetic field:


16 For ferromagnets, this point is usually referred to as the Curie temperature, and for antiferromagnets, as the Néel temperature.

17 Unfortunately, I will have no time/space for these interesting (and practically important) generalizations, and have to refer the interested reader to the famous monograph by R. Stratonovich, Topics in the Theory of Random Noise, in 2 vols., Gordon and Breach, 1963 and 1967, and/or the influential review by H. Haken, Festkörperprobleme 10, 351 (1970).




$$\mathbf{h} \equiv \mu_0 m \boldsymbol{\mathscr{H}}. \qquad (4.22)$$


(Here $m$ is the magnitude of the spin’s magnetic moment; for the Heisenberg model to be realistic, it should be of the order of the Bohr magneton $\mu_B \equiv e\hbar/2m_e \approx 0.927\times10^{-23}$ J/T.) The figure brackets $\{j, j'\}$ in Eq. (21) denote the summation over the pairs of adjacent lattice sites, so that the magnitude of the constant $J$ may be interpreted as the maximum coupling energy per “bond” between two adjacent particles. At $J > 0$, the coupling tries to keep spins aligned, i.e. to establish the ferromagnetic ordering.19 The second term in Eq. (21) describes the effect of the external magnetic field, which tries to orient all spin magnetic moments along its direction.20
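To see concretely what the sign of $J$ does in Eq. (21), here is a minimal numerical sketch (an illustration added here, not part of the original notes) for the smallest possible "lattice": a single bond between two spins-½. It builds the $4\times4$ matrix of Eq. (21) from Kronecker products of Pauli matrices, with $\mathbf{h}$ in the normalized units of Eq. (22); the function name `heisenberg_pair` is hypothetical.

```python
import numpy as np

# Pauli matrices sigma_x, sigma_y, sigma_z and the 2x2 identity
PAULI = (
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
)
I2 = np.eye(2, dtype=complex)


def heisenberg_pair(J, h=(0.0, 0.0, 0.0)):
    """Eq. (4.21) for one bond {1, 2}: H = -J sigma_1 . sigma_2 - h . (sigma_1 + sigma_2)."""
    coupling = sum(np.kron(s, s) for s in PAULI)                   # sigma_1 . sigma_2
    total_spin = [np.kron(s, I2) + np.kron(I2, s) for s in PAULI]  # sigma_1 + sigma_2
    field = sum(hk * sk for hk, sk in zip(h, total_spin))
    return -J * coupling - field


# J > 0 (ferromagnetic): the three aligned (triplet) states lie lowest, at -J; the singlet is at +3J
levels = np.linalg.eigvalsh(heisenberg_pair(J=1.0))
print(np.round(levels, 6))

# A field along z selects the fully aligned state out of the triplet: E = -J - 2*h_z
levels_h = np.linalg.eigvalsh(heisenberg_pair(J=1.0, h=(0.0, 0.0, 0.5)))
print(levels_h.min())
```

Flipping the sign of `J` puts the singlet lowest, which is the antiferromagnetic case mentioned in footnote 19.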


However, even the Heisenberg model, while being rather approximate (in particular because its standard form (21) is only valid for spins-½), is still rather complex for analysis. This is why most theoretical results have been obtained for its classical twin, the Ising model:21


$$E_m = -J\sum_{\{j,j'\}} s_j s_{j'} - h\sum_j s_j. \qquad (4.23)$$


Here $E_m$ are the particular values of the system’s energy in each of its $2^N$ possible states with all possible combinations of the binary classical variables $s_j = \pm1$, while $h$ is the normalized external magnetic field’s magnitude — see Eq. (22). (Despite its classical character, the variable $s_j$, modeling the field-oriented Cartesian component of the real spin, is usually called “spin” for brevity, and I will follow this tradition.) Somewhat shockingly, even for this toy model, no exact analytical 3D solution valid at arbitrary temperature has been found yet, and the solution of its 2D version by L. Onsager in 1944 (see Sec. 5 below) is still considered one of the top intellectual achievements of statistical physics. Still, Eq. (23) is very useful for the introduction of basic notions of continuous phase transitions and methods of their analysis, so for my brief discussion I will mostly use this model.22


Evidently, if $T = 0$ and $h = 0$, the lowest possible energy,

$$E_{\min} = -JNd, \qquad (4.24)$$


where $d$ is the lattice dimensionality, is achieved in the “ferromagnetic” phase in which all spins $s_j$ are equal to either $+1$ or $-1$, so that $\langle s_j\rangle = \pm1$ as well. On the other hand, at $J = 0$, the spins are independent, and if $h = 0$ as well, all $s_j$ are completely random, with the 50% probability to take either of the values $\pm1$, so that $\langle s_j\rangle = 0$. Hence in the general case (with arbitrary $J$ and $h$), we may use the average

$$\eta \equiv \langle s_j\rangle \qquad (4.25)$$


18 See, e.g., QM Sec. 4.4. 

19 At J < 0, the first term of Eq. (21) gives a reasonable model of an antiferromagnet, but in this case, the external
magnetic field effects are more subtle; I will not have time to discuss them. 

20 See, e.g., QM Eq. (4.163). 

21 Named after Ernst Ising who explored the 1D version of the model in detail in 1925, though a similar model 
was discussed earlier (in 1920) by Wilhelm Lenz. 

22 For more detailed discussions of phase transition theories (including other popular models of the ferromagnetic 
phase transition, e.g., the Potts model), see, e.g., either H. Stanley, Introduction to Phase Transitions and Critical 
Phenomena, Oxford U. Press, 1971; or A. Patashinskii and V. Pokrovskii, Fluctuation Theory of Phase 
Transitions, Pergamon, 1979; or B. McCoy, Advanced Statistical Mechanics, Oxford U. Press, 2010. For a very 
concise text, I can recommend J. Yeomans, Statistical Mechanics of Phase Transitions, Clarendon, 1992. 
as a good measure of spin ordering, i.e. as the order parameter. Since in a real ferromagnet, each spin carries a magnetic moment, the order parameter $\eta$ is proportional to the Cartesian component of the system’s magnetization in the direction of the applied magnetic field.
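As a small check of Eqs. (23)-(25), the sketch below enumerates all $2^N$ states of a tiny $3\times3$ square lattice ($d = 2$). The periodic boundary conditions, which give every site exactly $d$ "forward" bonds so that the bond count is $Nd$, are an assumption of this illustration, as are the helper names.

```python
import itertools

import numpy as np


def ising_energy(s, J=1.0, h=0.0):
    """E_m of Eq. (4.23) for an array s of spins +/-1, assuming periodic boundaries.

    Each np.roll pairs every site with its neighbor along one axis, so each bond
    {j, j'} is counted once (for lattice sizes L >= 3).
    """
    bond_sum = sum(np.sum(s * np.roll(s, 1, axis=ax)) for ax in range(s.ndim))
    return -J * bond_sum - h * np.sum(s)


def ground_states(L=3, d=2, J=1.0):
    """Brute-force search over all 2**(L**d) spin configurations."""
    states = [np.array(c).reshape((L,) * d) for c in itertools.product((-1, 1), repeat=L**d)]
    energies = np.array([ising_energy(s, J) for s in states])
    e_min = energies.min()
    minimizers = [s for s, e in zip(states, energies) if np.isclose(e, e_min)]
    return e_min, minimizers


e_min, minimizers = ground_states()
print(e_min)                                         # compare with Eq. (4.24): -J*N*d = -18 here
print(sorted(float(s.mean()) for s in minimizers))   # order parameter (4.25) of each ground state
```

Only the two uniformly magnetized states reach $E_{\min}$, with $\eta = \pm1$; brute force of this kind becomes hopeless very quickly, since the number of states grows as $2^N$.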


Now that the Ising model gave us a very clear illustration of the order parameter, let me use this notion for quantitative characterization of continuous phase transitions. Due to the difficulty of theoretical analyses of most models of the transitions at arbitrary temperatures, their theoretical discussions are focused mostly on a close vicinity of the critical point $T_c$. Both experiment and theory show that in the absence of an external field, the function $\eta(T)$ is close to a certain power,


$$\eta \propto \tau^\beta, \quad \text{for } \tau > 0, \text{ i.e. } T < T_c, \qquad (4.26)$$

of the small deviation from the critical temperature — which is conveniently normalized as

$$\tau \equiv \frac{T_c - T}{T_c}. \qquad (4.27)$$


Remarkably, most other key variables follow a similar temperature behavior, with critical exponents being the same for both signs of $\tau$. In particular, the heat capacity at a fixed magnetic field behaves as23


$$c_h \propto |\tau|^{-\alpha}. \qquad (4.28)$$

Similarly, the (normalized) low-field susceptibility24

$$\chi \equiv \frac{\partial\eta}{\partial h}\bigg|_{h=0} \propto |\tau|^{-\gamma}. \qquad (4.29)$$
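Two other important critical exponents, $\zeta$ and $\nu$, describe the temperature behavior of the correlation function $\langle s_j s_{j'}\rangle$, whose dependence on the distance $r_{jj'}$ between two spins may be well fitted by the following law,

$$\langle s_j s_{j'}\rangle \propto \frac{1}{r_{jj'}^{\,d-2+\zeta}}\exp\left\{-\frac{r_{jj'}}{r_c}\right\}, \qquad (4.30)$$

with the correlation radius

$$r_c \propto |\tau|^{-\nu}. \qquad (4.31)$$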


Finally, three more critical exponents, usually denoted $\varepsilon$, $\delta$, and $\mu$, describe the external field dependences of, respectively, $c_h$, $\eta$, and $r_c$ at $\tau = 0$. For example, $\delta$ is defined as

$$\eta \propto h^{1/\delta}. \qquad (4.32)$$


(Other field exponents are used less frequently, and for their discussion, the interested reader is referred 
to the special literature that was cited above.) 
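In practice, an exponent such as $\beta$ in Eq. (26) is extracted as the slope of a log-log plot taken close to $T_c$. The sketch below does this with a least-squares straight-line fit (`numpy.polyfit`). The "data" are synthetic: they are generated with a hypothetical exponent 0.33 and a weak correction term, added here only to mimic deviations from the pure power law away from $\tau = 0$.

```python
import numpy as np


def fit_exponent(tau, y):
    """Slope p of log|y| versus log|tau|, i.e. the exponent in y ∝ |tau|**p."""
    slope, _intercept = np.polyfit(np.log(np.abs(tau)), np.log(np.abs(y)), 1)
    return slope


tau = np.logspace(-4, -2, 20)    # small reduced temperatures, Eq. (4.27)
BETA_TRUE = 0.33                 # hypothetical exponent used to make the synthetic data
eta = 1.2 * tau**BETA_TRUE * (1.0 + 0.1 * np.sqrt(tau))   # synthetic eta(tau) with a correction term
beta_fit = fit_exponent(tau, eta)
print(round(beta_fit, 3))

# The same routine returns -gamma for a susceptibility obeying Eq. (4.29)
chi = 0.7 * tau**-1.25
gamma_fit = -fit_exponent(tau, chi)
```

The correction term biases the fitted slope slightly, which is one reason why experimental exponents are quoted as ranges.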


The second column of Table 1 shows the ranges of experimental values of the critical exponents for various 3D physical systems featuring continuous phase transitions. One can see that their values vary from system to system, leaving no hope for a universal theory that would describe them all


23 The forms of this and other functions of $\tau$ are selected to make all critical exponents non-negative.
24 In most models of ferromagnetic phase transitions, this variable is proportional to the genuine low-field magnetic susceptibility $\chi_m$ of the material — see, e.g., EM Eq. (5.111).
exactly. However, certain combinations of the exponents are much more reproducible — see the four bottom lines of the table.


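Table 4.1. Major critical exponents of continuous phase transitions

| Exponents and combinations | Experimental range (3D) (a) | Landau’s theory | 2D Ising model | 3D Ising model | 3D Heisenberg model (d) |
|---|---|---|---|---|---|
| $\alpha$ | 0 – 0.14 | 0 (b) | (c) | 0.12 | −0.14 |
| $\beta$ | 0.32 – 0.39 | 1/2 | 1/8 | 0.31 | 0.3 |
| $\gamma$ | 1.3 – 1.4 | 1 | 7/4 | 1.25 | 1.4 |
| $\delta$ | 4 – 5 | 3 | 15 | 5 | ? |
| $\nu$ | 0.6 – 0.7 | 1/2 | 1 | 0.64 | 0.7 |
| $\zeta$ | 0.05 | 0 | 1/4 | 0.05 | 0.04 |
| $(\alpha + 2\beta + \gamma)/2$ | 1.00 ± 0.005 | 1 | 1 | 1 | 1 |
| $\delta - \gamma/\beta$ | 0.93 ± 0.08 | 1 | 1 | 1 | ? |
| $(2 - \zeta)\nu/\gamma$ | 1.02 ± 0.05 | 1 | 1 | 1 | 1 |
| $(2 - \alpha)/\nu d$ | ? | 4/d | 1 | 1 | 1 |

(a) Experimental data are from the monograph by A. Patashinskii and V. Pokrovskii, cited above.
(b) Discontinuity at $\tau = 0$ — see below.
(c) Instead of following Eq. (28), in this case $c_h$ diverges as $\ln|\tau|$.
(d) With the order parameter $\eta$ defined as $\langle\hat{\boldsymbol\sigma}_j\cdot\mathbf{B}\rangle/B$.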
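The Landau and 2D Ising columns of Table 1 can be checked against the four bottom rows of the table with exact rational arithmetic. In this sketch, following footnotes (b) and (c), $\alpha = 0$ is used both for the discontinuity and for the logarithmic divergence of $c_h$; that substitution is an assumption made only for the purpose of the check.

```python
from fractions import Fraction as Fr

# Exponents from Table 4.1; alpha = 0 stands for a jump or a log divergence (assumption)
EXPONENTS = {
    "Landau": dict(alpha=Fr(0), beta=Fr(1, 2), gamma=Fr(1), delta=Fr(3), nu=Fr(1, 2), zeta=Fr(0)),
    "2D Ising": dict(alpha=Fr(0), beta=Fr(1, 8), gamma=Fr(7, 4), delta=Fr(15), nu=Fr(1), zeta=Fr(1, 4)),
}


def table_combinations(e, d):
    """The four bottom rows of Table 4.1 for the exponent set e in dimensionality d."""
    return {
        "(alpha + 2 beta + gamma)/2": (e["alpha"] + 2 * e["beta"] + e["gamma"]) / 2,
        "delta - gamma/beta": e["delta"] - e["gamma"] / e["beta"],
        "(2 - zeta) nu/gamma": (2 - e["zeta"]) * e["nu"] / e["gamma"],
        "(2 - alpha)/(nu d)": (2 - e["alpha"]) / (e["nu"] * d),
    }


for name, d in (("Landau", 3), ("2D Ising", 2)):
    for label, value in table_combinations(EXPONENTS[name], d).items():
        print(f"{name:8s} d={d}  {label:28s} {value}")
```

Every combination equals 1 except the last one for Landau’s theory, which gives $4/d$ (here $4/3$), reproducing the corresponding table entry.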


Historically the first (and perhaps the most fundamental) of these universal relations was derived in 1963 by J. Essam and M. Fisher:

$$\alpha + 2\beta + \gamma = 2. \qquad (4.33)$$


It may be proved, for example, by finding the temperature dependence of the magnetic field value, $h_\tau$, that changes the order parameter by the same amount as a finite temperature deviation $\tau > 0$ gives at $h = 0$. Comparing Eqs. (26) and (29), we get

$$h_\tau \propto \tau^{\beta+\gamma}. \qquad (4.34)$$


By the physical sense of $h_\tau$, we may expect that such a field has to affect the system’s free energy $F$ by an amount comparable to the effect of a bare temperature change $\tau$. Ensemble-averaging the last term of Eq. (23) and using the definition (25) of the order parameter $\eta$, we see that the change of $F$ (per particle) due to the field equals $-h_\tau\eta$ and, according to Eq. (26), scales as $h_\tau\tau^\beta \propto \tau^{2\beta+\gamma}$.25


25 As was already discussed in Secs. 1.4 and 2.4, there is some dichotomy of terminology for free energies in literature. In models (21) and (23), the magnetic field effects are accounted for at the microscopic level, by the inclusion of the corresponding term into each particular value $E_m$. From this point of view, the list of macroscopic variables in these systems does not include either $P$ and $V$ or their magnetic analogs, so that we may take $G \equiv F + PV = F + \text{const}$, and the equilibrium (at fixed $h$, $T$, and $N$) corresponds to the minimum of the Helmholtz free energy $F$.


In order to estimate the thermal effect on $F$, let me first elaborate a bit more on the useful thermodynamic formula already mentioned in Sec. 1.3:

$$C_X = T\left(\frac{\partial S}{\partial T}\right)_X, \qquad (4.35)$$

where $X$ means the variable(s) maintained constant at the temperature variation. In the standard “P-V” thermodynamics, we may use Eqs. (1.35) for $X = V$, and Eqs. (1.39) for $X = P$, to write

$$C_V = T\left(\frac{\partial S}{\partial T}\right)_{V,N} = -T\left(\frac{\partial^2 F}{\partial T^2}\right)_{V,N}, \qquad C_P = T\left(\frac{\partial S}{\partial T}\right)_{P,N} = -T\left(\frac{\partial^2 G}{\partial T^2}\right)_{P,N}. \qquad (4.36)$$

As was just discussed, in the ferromagnetic models of the type (21) or (23), at a constant field $h$, the role of $G$ is played by $F$, so that Eq. (35) yields

$$C_h = T\left(\frac{\partial S}{\partial T}\right)_{h,N} = -T\left(\frac{\partial^2 F}{\partial T^2}\right)_{h,N}. \qquad (4.37)$$

The last form of this relation means that $F$ may be found by double integration of $(-C_h/T)$ over temperature. With Eq. (28) for $c_h \propto C_h$, this means that near $T_c$, the free energy scales as the double integral of $c_h \propto \tau^{-\alpha}$ over $\tau$. In the limit $\tau \ll 1$, the factor $T$ may be treated as a constant; as a result, the change of $F$ due to $\tau > 0$ alone scales as $\tau^{2-\alpha}$. Requiring this change to be proportional to the same power of $\tau$ as the field-induced part of the energy, $\tau^{2\beta+\gamma}$, we finally get the Essam-Fisher relation (33).


Using similar reasoning, it is straightforward to derive a few other universal relations of critical exponents, including the Widom relation,

$$\delta - \frac{\gamma}{\beta} = 1, \qquad (4.38)$$

very similar relations for the other field exponents, $\varepsilon$ and $\mu$ (which I do not have time to discuss), and the Fisher relation

$$\nu(2 - \zeta) = \gamma. \qquad (4.39)$$

A slightly more complex reasoning, involving the so-called scaling hypothesis, yields the following dimensionality-dependent Josephson relation

$$\nu d = 2 - \alpha. \qquad (4.40)$$

The second column of Table 1 shows that at least three of these relations are in very reasonable agreement with experiment, so that we may use their set as a testbed for various theoretical approaches to continuous phase transitions.


4.3. Landau’s mean-field theory

The highest-level approach to continuous phase transitions, formally not based on any particular microscopic model (though in fact implying either the Ising model (23) or one of its siblings), is the mean-field theory developed in 1937 by L. Landau, on the basis of prior ideas by P. Weiss — to be discussed in the next section. The main idea of this phenomenological approach is to represent the free energy’s change $\Delta F$ at the phase transition as an explicit function of the order parameter $\eta$ (25). Since at $T \to T_c$ the order parameter has to tend to zero, this change,


$$\Delta F \equiv F(\eta) - F(0), \qquad (4.41)$$


may be expanded into the Taylor series in $\eta$, and only a few, most important first terms of that expansion retained. In order to keep the symmetry between two possible signs of the order parameter (i.e. between two possible spin directions in the Ising model) in the absence of external field, at $h = 0$ this expansion should include only even powers of $\eta$:


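$$\Delta f\big|_{h=0} \equiv \frac{\Delta F}{V}\bigg|_{h=0} = A(T)\eta^2 + \frac{1}{2}B(T)\eta^4 + \ldots, \quad \text{at } T \approx T_c. \qquad (4.42)$$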


As Fig. 6 shows, at $A(T) < 0$ and $B(T) > 0$, these two terms are sufficient to describe the minimum of the free energy at $\eta^2 > 0$, i.e. to calculate stationary values of the order parameter; this is why Landau’s theory ignores higher terms of the Taylor expansion — which are much smaller at $\eta \to 0$.


[Fig. 4.6. The Landau free energy (42) as a function of (a) $\eta$ and (b) $\eta^2$, for two signs of the coefficient $A(T)$, both for $B(T) > 0$.]


Now let us discuss the temperature dependences of the coefficients $A$ and $B$. As Eq. (42) shows, first of all, the coefficient $B(T)$ has to be positive for any sign of $\tau \propto (T_c - T)$, to ensure the equilibrium at a finite value of $\eta^2$. Thus, it is reasonable to ignore the temperature dependence of $B$ near the critical temperature altogether, i.e. to use the approximation


$$B(T) = b > 0. \qquad (4.43)$$


On the other hand, as Fig. 6 shows, the coefficient $A(T)$ has to change sign at $T = T_c$, to be positive at $T > T_c$ and negative at $T < T_c$, to ensure the transition from $\eta = 0$ at $T > T_c$ to a certain non-zero value of the order parameter at $T < T_c$. Assuming that $A$ is a smooth function of temperature, we may approximate it by the leading term of its Taylor expansion in $\tau$:

$$A(T) = -a\tau, \quad \text{with } a > 0, \qquad (4.44)$$

so that Eq. (42) becomes

$$\Delta f\big|_{h=0} = -a\tau\eta^2 + \frac{1}{2}b\eta^4. \qquad (4.45)$$
In this rudimentary form, the Landau theory may look almost trivial, and its main strength is the possibility of its straightforward extension to the effects of the external field and of spatial variations of the order parameter. First, as the field terms in Eqs. (21) or (23) show, the applied field gives such systems, on average, the energy addition of $-h\eta$ per particle, i.e. $-nh\eta$ per unit volume, where $n$ is the particle density. Second, since according to Eq. (31) (with $\nu > 0$, see Table 1) the correlation radius diverges at $\tau \to 0$, in this limit the spatial variations of the order parameter should be slow, $|\nabla\eta| \to 0$. Hence, the effects of the gradient on $\Delta F$ may be approximated by the first non-zero term of its expansion into the Taylor series in $(\nabla\eta)^2$.26 As a result, Eq. (45) may be generalized as


$$\Delta F = \int \Delta f\, d^3r, \quad \text{with } \Delta f = -a\tau\eta^2 + \frac{1}{2}b\eta^4 - nh\eta + c(\nabla\eta)^2, \qquad (4.46)$$


where $c$ is a coefficient independent of $\eta$. To avoid the unphysical effect of spontaneous formation of spatial variations of the order parameter, that factor has to be positive at all temperatures and hence may be taken for a constant in a small vicinity of $T_c$ — the only region where Eq. (46) may be expected to provide quantitatively correct results.

Let us find out what critical exponents are predicted by this phenomenological approach. First of all, we may find the equilibrium values of the order parameter from the condition of $F$ having a minimum, $\partial F/\partial\eta = 0$. At $h = 0$, it is easier to use the equivalent equation $\partial F/\partial(\eta^2) = 0$, where $F$ is given by Eq. (45) — see Fig. 6b. This immediately yields

$$|\eta| = \begin{cases} (a\tau/b)^{1/2}, & \text{for } \tau > 0, \\ 0, & \text{for } \tau < 0. \end{cases} \qquad (4.47)$$
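Eq. (47) can be cross-checked by brute-force minimization of Eq. (45) on a fine grid of $\eta$. This is only an illustrative sketch; the values of $a$ and $b$ are arbitrary, and the function names are hypothetical.

```python
import numpy as np


def landau_f(eta, tau, a=1.0, b=1.0):
    """Landau free-energy density (4.45) at h = 0."""
    return -a * tau * eta**2 + 0.5 * b * eta**4


def eta_equilibrium(tau, a=1.0, b=1.0, eta_max=3.0, points=300_001):
    """Grid minimizer over eta >= 0 (the eta -> -eta symmetry gives the other branch)."""
    grid = np.linspace(0.0, eta_max, points)
    return grid[np.argmin(landau_f(grid, tau, a, b))]


for tau in (-0.1, 0.01, 0.04, 0.25):
    # grid result versus Eq. (4.47) with a = b = 1
    print(tau, eta_equilibrium(tau), np.sqrt(max(tau, 0.0)))
```

The grid minimizer stays at $\eta = 0$ for $\tau < 0$ and follows $(a\tau/b)^{1/2}$ for $\tau > 0$, i.e. $\beta = 1/2$.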


Comparing this result with Eq. (26), we see that in the Landau theory, $\beta = 1/2$. Next, plugging the result (47) back into Eq. (45), for the equilibrium (minimal) value of the free energy, we get

$$\Delta f = \begin{cases} -a^2\tau^2/2b, & \text{for } \tau > 0, \\ 0, & \text{for } \tau < 0. \end{cases} \qquad (4.48)$$

From here and Eq. (37), the specific heat,

$$\frac{C_h}{V} = \begin{cases} a^2/bT_c, & \text{for } \tau > 0, \\ 0, & \text{for } \tau < 0, \end{cases} \qquad (4.49)$$

has, at the critical point, a discontinuity rather than a singularity, so that we need to prescribe the zero value to the critical exponent $\alpha$.
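The jump (49) can also be seen numerically. The sketch below applies Eq. (37), $C_h = -T\,\partial^2F/\partial T^2$, to the equilibrium free energy (48) by central finite differences. It keeps the exact factor $T$ rather than replacing it with $T_c$, so just below $T_c$ it returns $a^2T/bT_c^2$, which tends to Eq. (49) as $\tau \to 0$. Parameter values and names are arbitrary choices for this illustration.

```python
import numpy as np


def delta_f_min(T, Tc=1.0, a=1.0, b=1.0):
    """Equilibrium Landau free-energy density, Eq. (4.48)."""
    tau = (Tc - T) / Tc
    return np.where(tau > 0, -a**2 * tau**2 / (2 * b), 0.0)


def specific_heat(T, dT=1e-4, **params):
    """C_h / V = -T d^2(delta f)/dT^2, Eq. (4.37), by a central second difference."""
    f_plus = delta_f_min(T + dT, **params)
    f_0 = delta_f_min(T, **params)
    f_minus = delta_f_min(T - dT, **params)
    return -T * (f_plus - 2 * f_0 + f_minus) / dT**2


for T in (0.99, 0.999, 1.001, 1.01):
    print(T, float(specific_heat(T)))   # close to a**2/(b*Tc) = 1 below Tc, zero above
```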


In the presence of a uniform field, the equilibrium order parameter should be found from the condition $\partial f/\partial\eta = 0$ applied to Eq. (46) with $\nabla\eta = 0$, giving

$$\frac{\partial f}{\partial\eta} \equiv -2a\tau\eta + 2b\eta^3 - nh = 0. \qquad (4.50)$$

In the limit of a small order parameter, $\eta \to 0$, the term with $\eta^3$ is negligible, and Eq. (50) gives

$$\eta = -\frac{nh}{2a\tau}, \qquad (4.51)$$


26 Historically, the last term belongs to the later (1950) extension of the theory by V. Ginzburg and L. Landau — 
see below. 
so that according to Eq. (29), $\gamma = 1$. On the other hand, at $\tau = 0$ (or at relatively high fields at other temperatures), the cubic term in Eq. (50) is much larger than the linear one, and this equation yields

$$\eta = \left(\frac{nh}{2b}\right)^{1/3}, \qquad (4.52)$$


so that comparison with Eq. (32) yields $\delta = 3$. Finally, according to Eq. (30), the last term in Eq. (46) scales as $c\eta^2/r_c^2$. (If $r_c \neq \infty$, the effects of the pre-exponential factor in Eq. (30) are negligible.) As a result, the gradient term’s contribution is comparable27 with the two leading terms in $\Delta f$ (which, according to Eq. (47), are of the same order), if

$$r_c \approx \left(\frac{c}{a|\tau|}\right)^{1/2}, \qquad (4.53)$$

so that according to the definition (31) of the critical exponent $\nu$, in the Landau theory it is equal to $1/2$.
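To see the exponents $\gamma = 1$ and $\delta = 3$ emerge without dropping any term by hand, one may solve the full cubic (50) numerically and pick the real root that minimizes $\Delta f$ of Eq. (46) with $\nabla\eta = 0$. A sketch using `numpy.roots`, with arbitrary parameter values:

```python
import numpy as np


def eta_in_field(h, tau, a=1.0, b=1.0, n=1.0):
    """Equilibrium eta: the real root of Eq. (4.50) giving the lowest Landau f (4.46), grad eta = 0."""
    roots = np.roots([2 * b, 0.0, -2 * a * tau, -n * h])   # 2b eta^3 - 2a tau eta - n h = 0
    real = roots[np.abs(roots.imag) < 1e-9].real
    f = -a * tau * real**2 + 0.5 * b * real**4 - n * h * real
    return real[np.argmin(f)]


# Above Tc (tau < 0), weak field: linear response (4.51), eta = -n h / (2 a tau), hence gamma = 1
print(eta_in_field(1e-4, -0.1), 1e-4 / 0.2)

# At Tc (tau = 0): eta = (n h / 2b)**(1/3), Eq. (4.52), hence delta = 3
print(eta_in_field(0.016, 0.0), (0.016 / 2) ** (1 / 3))
```

Below $T_c$ the cubic has three real roots; selecting the one with the lowest $\Delta f$ automatically picks the branch aligned with the field.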


The third column in Table 1 summarizes the critical exponents and their combinations in Landau’s theory. It shows that these values are somewhat out of the experimental ranges, and while some of their “universal” relations are correct, some are not; for example, the Josephson relation would be only correct at $d = 4$ (not the most realistic spatial dimensionality :-) The main reason for this disappointing result is that in describing the spin interaction with the field, the Landau mean-field theory neglects spin randomness, i.e. fluctuations. Though a quantitative theory of fluctuations will be discussed only in the next chapter, we can readily perform their crude estimate. Looking at Eq. (46), we see that its first term is a quadratic function of the effective “half-degree of freedom”, $\eta$. Hence per the equipartition theorem (2.28), we may expect that the average square of its thermal fluctuations, within a $d$-dimensional volume with a linear size of the order of $r_c$, should be of the order of $T/2$ (close to the critical temperature, $T_c/2$ is a good enough approximation):

$$a\tau\langle\tilde\eta^2\rangle r_c^d \sim \frac{T_c}{2}. \qquad (4.54)$$
In order to be negligible, the variance has to be small in comparison with the average $\eta^2 \sim a\tau/b$ — see Eq. (47). Plugging in the $\tau$-dependences of the operands of this relation, and the values of the critical exponents in the Landau theory, for $\tau > 0$ we get the so-called Levanyuk-Ginzburg criterion of its validity:

$$\frac{T_c}{2a\tau}\left(\frac{a\tau}{c}\right)^{d/2} \ll \frac{a\tau}{b}. \qquad (4.55)$$


We see that for any realistic dimensionality, $d < 4$, at $\tau \to 0$ the order parameter’s fluctuations grow faster than its average value, and hence the theory becomes invalid.
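The dimensional argument behind Eq. (55) is easy to check numerically. The ratio of its left-hand side to its right-hand side scales as $\tau^{d/2-2}$, so it grows as $\tau \to 0$ for $d < 4$, stays constant at $d = 4$, and decreases for $d > 4$. A minimal sketch, with all coefficients set to 1 purely for illustration:

```python
def levanyuk_ginzburg_ratio(tau, d, Tc=1.0, a=1.0, b=1.0, c=1.0):
    """Left-hand side of Eq. (4.55) divided by its right-hand side (<< 1 means mean field is valid)."""
    fluctuations = Tc / (2 * a * tau) * (a * tau / c) ** (d / 2)   # variance estimate, Eqs. (4.53)-(4.54)
    mean_square = a * tau / b                                       # eta**2 from Eq. (4.47)
    return fluctuations / mean_square


for d in (3, 4, 5):
    print(d, [levanyuk_ginzburg_ratio(tau, d) for tau in (1e-2, 1e-4, 1e-6)])
```

For $d = 3$ the ratio grows by a factor of 10 for every factor of 100 decrease of $\tau$, so the mean-field description necessarily fails close enough to $T_c$.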


Thus the Landau mean-field theory is not a perfect approach to finding critical indices at continuous phase transitions in Ising-type systems with their nearest-neighbor interactions between the particles. Despite that fact, this theory is very much valued for the following reason. Any long-range interactions between particles increase the correlation radius $r_c$, and hence suppress the order


27 According to Eq. (30), the correlation radius may be interpreted as the distance at which the order parameter $\eta$ relaxes to its equilibrium value if it is deflected from that value at some point. Since the law of such spatial change may be obtained by a variational differentiation of $F$, for the actual relaxation law, all major terms of (46) have to be comparable.
parameter fluctuations. As one example, at laser self-excitation, the emerging coherent optical field couples essentially all photon-emitting particles in the electromagnetic cavity (resonator). As another example, in superconductors the role of the correlation radius is played by the Cooper-pair size $\xi_0$, which is typically of the order of 10° m, i.e. much larger than the average distance between the pairs (~10° m). As a result, the mean-field theory remains valid at all temperatures besides an extremely small temperature interval near $T_c$ — for bulk superconductors, of the order of 10° K.


Another strength of Landau’s classical mean-field theory (46) is that it may be readily generalized for a description of Bose-Einstein condensates, i.e. quantum fluids. Of those generalizations, the most famous is the Ginzburg-Landau theory of superconductivity. It was developed in 1950, i.e. even before the microscopic-level explanation of this phenomenon by J. Bardeen, L. Cooper, and R. Schrieffer in 1956-57. In this theory, the real order parameter $\eta$ is replaced with the modulus of a complex function $\psi$, physically the wavefunction of the coherent Bose-Einstein condensate of Cooper pairs. Since each pair carries the electric charge $q = -2e$ and has zero spin, it interacts with the magnetic field in a way different from that described by the Heisenberg or Ising models. Namely, as was already discussed in Sec. 3.4, in the magnetic field, the del operator $\nabla$ in Eq. (46) has to be complemented with the term $-i(q/\hbar)\mathbf{A}$, where $\mathbf{A}$ is the vector potential of the total magnetic field $\mathbf{B} = \nabla\times\mathbf{A}$, including not only the external magnetic field $\boldsymbol{\mathscr{H}}$ but also the field induced by the supercurrent itself. With the account for the well-known formula for the magnetic field energy, Eq. (46) is now replaced with


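$$\Delta f = -a\tau|\psi|^2 + \frac{1}{2}b|\psi|^4 + \frac{\hbar^2}{2m}\left|\left(\nabla - i\frac{q}{\hbar}\mathbf{A}\right)\psi\right|^2 + \frac{B^2}{2\mu_0}, \qquad (4.56)$$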


where m is a phenomenological coefficient rather than the actual particle’s mass. 


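The variational minimization of the resulting Gibbs energy density $\Delta g \equiv \Delta f - \mu_0\mathbf{M}\cdot\boldsymbol{\mathscr{H}} \equiv \Delta f - \mathbf{B}\cdot\boldsymbol{\mathscr{H}} + \text{const}$,28 over the variables $\psi$ and $\mathbf{B}$ (which is suggested for the reader’s exercise) yields two differential equations:

$$\frac{\nabla\times\mathbf{B}}{\mu_0} = q\left[\frac{i\hbar}{2m}\left(\psi\nabla\psi^* - \text{c.c.}\right) - \frac{q}{m}\mathbf{A}\psi\psi^*\right], \qquad (4.57a)$$

$$a\tau\psi = b|\psi|^2\psi - \frac{\hbar^2}{2m}\left(\nabla - i\frac{q}{\hbar}\mathbf{A}\right)^2\psi. \qquad (4.57b)$$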
2m h 


The first of these Ginzburg-Landau equations (57a) should be no big surprise for the reader, because according to the Maxwell equations, in magnetostatics the left-hand side of Eq. (57a) has to be equal to the electric current density, while its right-hand side is the usual quantum-mechanical probability current density multiplied by $q$, i.e. the density $\mathbf{j}$ of the electric current of the Cooper pair condensate. (Indeed, after plugging $\psi = n^{1/2}\exp\{i\varphi\}$ into that expression, we come back to Eq. (3.84) which, as we already know, explains such macroscopic quantum phenomena as the magnetic flux quantization and the Meissner-Ochsenfeld effect.)


28 As an immediate elementary sanity check of this relation, resulting from the analogy of Eqs. (1.1) and (1.3), the minimization of $\Delta g$ in the absence of superconductivity ($\psi = 0$) gives the correct result $\mathbf{B} = \mu_0\boldsymbol{\mathscr{H}}$. Note that this account of the difference between $\Delta f$ and $\Delta g$ is necessary here because (unlike Eqs. (21) and (23)) the Ginzburg-Landau free energy (56) does not take into account the effect of the field on each particle directly.
However, Eq. (57b) is new for us — at least for this course.29 Since the last term on its right-hand side is the standard wave-mechanical expression for the kinetic energy of a particle in the presence of a magnetic field,30 if this term dominates that side of the equation, Eq. (57b) is reduced to the stationary Schrödinger equation $E\psi = \hat H\psi$ for the ground state of free Cooper pairs, with the total energy $E = a\tau$.

However, in contrast to the usual (single-particle) Schrödinger equation, in which $|\psi|$ is determined by the normalization condition, the Cooper pair condensate density $n = |\psi|^2$ is determined by the thermodynamic balance of the condensate with the ensemble of “normal” (unpaired) electrons, which plays the role of the uncondensed part of the particles in the usual Bose-Einstein condensate — see Sec. 3.4. In Eq. (57b), such balance is enforced by the first term, $b|\psi|^2\psi$, on the right-hand side. As we have already seen, in the absence of magnetic field and spatial gradients, such a term yields $|\psi| \propto \tau^{1/2} \propto (T_c - T)^{1/2}$ — see Eq. (47).
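The competition between the condensation terms and the gradient ("kinetic") term of Eq. (57b) can be illustrated by a simple numerical experiment, which is an addition here rather than part of the original text. For $\mathbf{A} = 0$ and a real $\psi$ depending on one coordinate only, measuring $\psi$ in units of its bulk value $(a\tau/b)^{1/2}$ and $x$ in units of the length $\hbar/(2ma\tau)^{1/2}$ (a scaling chosen for this sketch) reduces Eq. (57b) to $f'' + f - f^3 = 0$. The code relaxes this equation from a crude initial guess, with $f = 0$ at one boundary to mimic a point where the condensate is fully suppressed, and compares the result with the exact solution $\tanh(x/\sqrt{2})$, which may be verified by direct substitution.

```python
import numpy as np


def relax_gl_1d(length=10.0, n=201, dt=1e-3, steps=20000):
    """Relax f(x) toward a solution of f'' + f - f**3 = 0 with f(0) = 0, f(length) = 1.

    This is Eq. (4.57b) with A = 0 and a real psi, in dimensionless units
    (x in units of the length defined above, f = psi / psi_bulk); the scaling is an assumption.
    """
    x = np.linspace(0.0, length, n)
    dx = x[1] - x[0]
    assert dt < dx**2 / 2, "explicit Euler step too large for stability"
    f = x / length                       # crude initial guess obeying the boundary values
    for _ in range(steps):
        lap = np.zeros_like(f)
        lap[1:-1] = (f[2:] - 2.0 * f[1:-1] + f[:-2]) / dx**2
        f = f + dt * (lap + f - f**3)    # gradient flow that lowers the dimensionless free energy
        f[0], f[-1] = 0.0, 1.0
    return x, f


x, f = relax_gl_1d()
print(np.max(np.abs(f - np.tanh(x / np.sqrt(2.0)))))   # small: only the discretization error remains
```

The condensate recovers its bulk value within a few units of the length scale, which is the same physics that sets the size of the suppressed core of an Abrikosov vortex.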


As a parenthetic remark, from the mathematical standpoint, the term $b|\psi|^2\psi$, which is nonlinear in $\psi$, makes Eq. (57b) a member of the family of the so-called nonlinear Schrödinger equations. Another member of this family, important for physics, is the Gross-Pitaevskii equation,

$$a\tau\psi = b|\psi|^2\psi - \frac{\hbar^2}{2m}\nabla^2\psi + U(\mathbf{r})\psi, \qquad (4.58)$$

which gives a reasonable (albeit approximate) description of gradient and field effects on Bose-Einstein condensates of electrically neutral atoms at $T \approx T_c$. The differences between Eqs. (58) and (57) reflect, first, the zero electric charge $q$ of the atoms (so that Eq. (57a) becomes trivial) and, second, the fact that the atoms forming the condensates may be readily placed in external potentials $U(\mathbf{r}) \neq \text{const}$ (including the time-averaged potentials of optical traps — see EM Chapter 7), while in superconductors such potential profiles are much harder to create due to the screening of external electric and optical fields by conductors — see, e.g., EM Sec. 2.1.


Returning to the discussion of Eq. (57b), it is easy to see that its last term increases as either the external magnetic field or the density of current passed through a superconductor is increased, increasing the vector potential. In the Ginzburg-Landau equation, this increase is matched by a corresponding decrease of $|\psi|^2$, i.e. of the condensate density $n$, until it is completely suppressed. This balance describes the well-documented effect of superconductivity suppression by an external magnetic field and/or the supercurrent passed through the sample. Moreover, together with Eq. (57a), naturally describing the flux quantization (see Sec. 3.4), Eq. (57b) explains the existence of the so-called Abrikosov vortices — thin magnetic-field tubes, each carrying one quantum $\Phi_0$ of magnetic flux — see Eq. (3.86). At the core part of the vortex, $|\psi|^2$ is suppressed (down to zero at its central line) by the persistent, dissipation-free current of the superconducting condensate, which circulates around the core and screens the rest of the superconductor from the magnetic field carried by the vortex.31 The penetration of such vortices into the so-called type-II superconductors enables them to sustain zero dc resistance up to very high magnetic fields of the order of 20 T, and as a result, to be used in very compact magnets — including those used for beam bending in particle accelerators.


Moreover, generalizing Eqs. (57) to the time-dependent case, just as it is done with the usual Schrödinger equation, one can describe other fascinating quantum macroscopic phenomena such as the

29 It is discussed in EM Sec. 6.5. 
30 See, e.g., QM Sec. 3.1. 
31 See, e.g., EM Sec. 6.5.
Josephson effects, including the generation of oscillations with frequency $\omega_J = (q/\hbar)V$ by weak links between two superconductors, biased by dc voltage $V$. Unfortunately, time/space restrictions do not allow me to discuss these effects in any detail in this course, and I have to refer the reader to special literature.32 Let me only note that in the limit $T \to T_c$, and for not extremely pure superconductor crystals (in which the so-called non-local transport phenomena may be important), the Ginzburg-Landau equations are exact, and may be derived (and their parameters $T_c$, $a$, $b$, $q$, and $m$ determined) from the standard “microscopic” theory of superconductivity, based on the initial work by Bardeen, Cooper, and Schrieffer.33 Most importantly, such derivation proves that $q = -2e$ — the electric charge of a single Cooper pair.


4.4. Ising model: The Weiss molecular-field theory 


The Landau mean-field theory is phenomenological in the sense that even within the range of its validity, it tells us nothing about the value of the critical temperature $T_c$ and other parameters (in Eq. (46), the coefficients $a$, $b$, and $c$), so that they have to be found from a particular “microscopic” model of the system under analysis. In this course, we will have time to discuss only the Ising model (23) for various dimensionalities $d$.


The most simplistic way to map this model on a mean-field theory is to assume that all spins are exactly equal, $s_j = \eta$, with an additional condition $|\eta| \le 1$, ignoring for a minute the fact that in the genuine Ising model, $s_j$ may equal only $+1$ or $-1$. Plugging this relation into Eq. (23), we get34

$$F = -(NJd)\eta^2 - Nh\eta. \qquad (4.59)$$

This energy is plotted in Fig. 7a as a function of $\eta$, for several values of $h$.


[Fig. 4.7. Field dependences of (a) the free energy profile and (b) the order parameter (i.e. magnetization) in the crudest mean-field approach to the Ising model.]


The plots show that at $h = 0$, the system may be in either of two stable states, with $\eta = \pm1$, corresponding to two different spin directions (i.e. two different directions of magnetization), with equal


32 See, e.g., M. Tinkham, Introduction to Superconductivity, 2nd ed., McGraw-Hill, 1996. A short discussion of
the Josephson effects and Abrikosov vortices may be found in QM Sec. 1.6 and EM Sec. 6.5 of this series. 

33 See, e.g., Sec. 45 in E. Lifshitz and L. Pitaevskii, Statistical Physics, Part 2, Pergamon, 1980. 

34 Since in this naive approach we neglect the fluctuations of spin, i.e. their disorder, the assumption of full 
ordering implies $S = 0$, so that $F \equiv E - TS = E$, and we may use either notation for the system’s energy.
energy.35 (Formally, the state with $\eta = 0$ is also stationary, because at this point $\partial F/\partial\eta = 0$, but it is unstable, because for the ferromagnetic interaction, $J > 0$, the second derivative $\partial^2F/\partial\eta^2$ is always negative.)


As the external field is increased, it tilts the potential profile, and finally at the critical field,

$$h = h_c \equiv 2Jd, \qquad (4.60)$$

the state with $\eta = -1$ becomes unstable, leading to the system’s jump into the only remaining state with opposite magnetization, $\eta = +1$ — see the arrow in Fig. 7a. Application of a similar external field of the opposite polarity leads to a similar switching, back to $\eta = -1$, at the field $h = -h_c$, so that the full field dependence of $\eta$ follows the hysteretic pattern shown in Fig. 7b.36
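The hysteresis loop of Fig. 7b follows directly from Eqs. (59)-(60) if the system is assumed to stay in its current local minimum until that minimum disappears, i.e. with thermal activation ignored altogether, as in the crude theory above. A sketch with hypothetical names and $J = 1$, $d = 2$, so that $h_c = 4$:

```python
def hysteresis_sweep(h_values, J=1.0, d=2, eta=-1):
    """Track eta along a field sweep in the crude mean-field model (4.59), per particle.

    F/N = -J*d*eta**2 - h*eta is concave in eta on [-1, 1], so its only local minima
    are the end points eta = +/-1; the current one is kept until it stops being a minimum.
    """
    trace = []
    for h in h_values:
        slope = -2 * J * d * eta - h      # d(F/N)/d(eta) at the current state
        if eta == -1 and slope <= 0:       # eta = -1 is lost at h >= h_c = 2Jd, Eq. (4.60)
            eta = 1
        elif eta == 1 and slope >= 0:      # eta = +1 is lost at h <= -h_c
            eta = -1
        trace.append(eta)
    return trace


h_up = list(range(-6, 7))          # sweep up from h = -6 to +6
up = hysteresis_sweep(h_up)
h_down = list(range(6, -7, -1))    # then back down
down = hysteresis_sweep(h_down, eta=up[-1])
print(list(zip(h_up, up)))
print(list(zip(h_down, down)))
```

At $h = 0$ the two sweeps end up in opposite states, which is the memory effect discussed next.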


Such a pattern is the most visible experimental feature of actual ferromagnetic materials, with the coercive magnetic field $\mathscr{H}_c$ of the order of 10° A/m, and the saturated (or “remnant”) magnetization corresponding to fields $B$ of the order of a few teslas. The most important property of these materials, also called permanent magnets, is their stability, i.e. the ability to retain the history-determined direction of magnetization in the absence of an external field, for a very long time. In particular, this property is the basis of all magnetic systems for data recording, including the now-ubiquitous hard disk drives with their incredible information density, currently approaching 1 Terabit per square inch.37


So, this simplest mean-field theory (59) does give a (crude) description of the ferromagnetic 
ordering. However, this theory grossly overestimates the stability of these states with respect to thermal 
fluctuations. Indeed, in this theory, there is no thermally-induced randomness at all, until $T$ becomes 
comparable with the height of the energy barrier separating two stable states, 

$$\Delta F \equiv F(\eta = 0) - F(\eta = \pm 1) = NJd, \qquad (4.61)$$

which is proportional to the number of particles. At $N \to \infty$, this value diverges, and in this sense, the 
critical temperature is infinite, while numerical experiments and more refined theories of the Ising 
model show that actually its ferromagnetic phase is suppressed at $T > T_c \sim Jd$; see below. 


The accuracy of this theory may be dramatically improved by even an approximate account for 
thermally-induced randomness. In this approach (suggested in 1907 by Pierre-Ernest Weiss), called the 
molecular-field theory,38 random deviations of individual spin values from the lattice average, 


35 The fact that the stable states always correspond to $\eta = \pm 1$ partly justifies the treatment, in this crude 
approximation, of the order parameter $\eta$ as a continuous variable. 

36 Since these magnetization jumps are accompanied by (negative) jumps of the free energy F, they are sometimes 
called the first-order phase transitions. Note, however, that in this simple theory, these transitions are between two 
physically similar fully-ordered phases. 

37 For me, it was always shocking how little my graduate students knew about this fascinating (and very 
important) field of modern engineering, which involves so much interesting physics and fantastic 
electromechanical technology. For getting acquainted with it, I may recommend, for example, the monograph by 
C. Mee and E. Daniel, Magnetic Recording Technology, 2nd ed., McGraw-Hill, 1996. 

38 In some texts, this approximation is called the “mean-field theory”. This terminology may lead to confusion, 
because the molecular-field theory belongs to a different, deeper level of the theoretical hierarchy than, say, the 
(more phenomenological) Landau-style mean-field theories. For example, for a given microscopic model, the 
molecular-field approach may be used for the (approximate) calculation of the parameters $a$, $b$, and $T_c$, 
participating in Eq. (46), the starting point of the Landau theory. 
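$$\tilde s_k \equiv s_k - \eta, \quad\text{with}\quad \eta \equiv \langle s_k \rangle, \qquad (4.62)$$

are assumed to be small, $|\tilde s_k| \ll \eta$. This assumption allows us, after plugging the resulting 
expression $s_k = \eta + \tilde s_k$ into the first term on the right-hand side of Eq. (23), 

$$E_m = -J\sum_{\{k,k'\}}\left(\eta + \tilde s_k\right)\left(\eta + \tilde s_{k'}\right) - h\sum_k s_k = -J\sum_{\{k,k'\}}\left[\eta^2 + \eta\left(\tilde s_k + \tilde s_{k'}\right) + \tilde s_k \tilde s_{k'}\right] - h\sum_k s_k, \qquad (4.63)$$

to ignore the last term in the square brackets. Making the replacement (62) in the terms proportional to $\tilde s_k$, 
we may rewrite the result as 

$$E_m \approx E'_m \equiv (NJd)\,\eta^2 - h_{\rm ef}\sum_k s_k, \qquad (4.64)$$

where $h_{\rm ef}$ is defined as the sum 

$$h_{\rm ef} \equiv h + (2Jd)\,\eta. \qquad (4.65)$$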


This sum may be interpreted as the effective external field, which takes into account (besides the 
genuine external field $h$) the effect that would be exerted on spin $s_k$ by its $2d$ next neighbors if they all 
had non-fluctuating (but possibly continuous) spin values $s_{k'} = \eta$. Such an addition to the external field, 

$$h_{\rm mol} \equiv h_{\rm ef} - h = (2Jd)\,\eta, \qquad (4.66)$$

is called the molecular field, giving its name to the Weiss theory. 
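The step from Eq. (63) to Eqs. (64)-(65) relies on two counting facts for a $d$-dimensional cubic lattice of $N$ spins, assuming for simplicity that boundary effects are negligible: there are $Nd$ nearest-neighbor pairs $\{k,k'\}$, and each spin belongs to $2d$ of them. Spelled out, the retained terms of Eq. (63) give:

```latex
\begin{aligned}
-J\sum_{\{k,k'\}}\eta^2 &= -NJd\,\eta^2,\\
-J\eta\sum_{\{k,k'\}}\left(\tilde s_k+\tilde s_{k'}\right) &= -2Jd\,\eta\sum_k \tilde s_k
  = -2Jd\,\eta\sum_k s_k + 2NJd\,\eta^2,\\
E_m \approx -NJd\,\eta^2 + 2NJd\,\eta^2 - \left(h + 2Jd\,\eta\right)\sum_k s_k
  &= (NJd)\,\eta^2 - h_{\rm ef}\sum_k s_k .
\end{aligned}
```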


From the point of view of statistical physics, at fixed parameters of the system (including the 
order parameter $\eta$), the first term on the right-hand side of Eq. (64) is merely a constant energy offset, 
and $h_{\rm ef}$ is just another constant, so that 

$$E'_m = \text{const} + \sum_k \varepsilon_k, \quad\text{with}\quad \varepsilon_k = -h_{\rm ef}\,s_k = \begin{cases} -h_{\rm ef}, & \text{for } s_k = +1, \\ +h_{\rm ef}, & \text{for } s_k = -1. \end{cases} \qquad (4.67)$$
k 


Such separability of the energy means that in the molecular-field approximation the fluctuations of 
different spins are independent of each other, and their statistics may be examined individually, using 
the energy spectrum $\varepsilon_k$. But this is exactly the two-level system that was the subject of Problems 2.2-2.4. 
Actually, its statistics is so simple that it is easier to redo this fundamental problem starting from 
scratch, rather than to use the results of those exercises (which would require changing notation). 


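Indeed, according to the Gibbs distribution (2.58)-(2.59), the equilibrium probabilities of the 
states $s_k = \pm 1$ may be found as 

$$W_\pm = \frac{1}{Z}\exp\left\{\pm\frac{h_{\rm ef}}{T}\right\}, \quad\text{with}\quad Z = \exp\left\{+\frac{h_{\rm ef}}{T}\right\} + \exp\left\{-\frac{h_{\rm ef}}{T}\right\} \equiv 2\cosh\frac{h_{\rm ef}}{T}. \qquad (4.68)$$

From here, we may readily calculate $F = -T\ln Z$ and all other thermodynamic variables, but let us 
immediately use Eq. (68) to calculate the statistical average of $s_k$, i.e. the order parameter: 

$$\eta \equiv \langle s_k \rangle = (+1)W_+ + (-1)W_- = \frac{e^{h_{\rm ef}/T} - e^{-h_{\rm ef}/T}}{2\cosh(h_{\rm ef}/T)} \equiv \tanh\frac{h_{\rm ef}}{T}. \qquad (4.69)$$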


Chapter 4 Page 21 of 36 


Essential Graduate Physics SM: Statistical Mechanics 


Now comes the punch line of Weiss' approach: plugging this result back into Eq. (65), we may write 
the condition of self-consistency of the molecular-field theory: 

$$h_{\rm ef} - h = 2Jd\tanh\frac{h_{\rm ef}}{T}. \qquad (4.70)$$


This is a transcendental equation, which evades an explicit analytical solution, but whose properties may 
be readily analyzed by plotting both its sides as functions of the same argument, so that the stationary 
state(s) of the system correspond to the intersection point(s) of these plots. 


First of all, let us explore the field-free case ($h = 0$), when $h_{\rm ef} = h_{\rm mol} = 2Jd\,\eta$, so that Eq. (70) is 
reduced to 

$$\eta = \tanh\frac{2Jd\,\eta}{T}, \qquad (4.71)$$

giving one of the patterns sketched in Fig. 8, depending on the dimensionless parameter $2Jd/T$. 


Fig. 4.8. The ferromagnetic phase transition in Weiss' molecular-field theory: two sides of Eq. (71) 
sketched as functions of $\eta$ for three different temperatures: above $T_c$ (red), below $T_c$ (blue), 
and equal to $T_c$ (green). 


If this parameter is small, the right-hand side of Eq. (71) grows slowly with $\eta$ (see the red line in 
Fig. 8), and there is only one intersection point with the left-hand side plot, at $\eta = 0$. This means that the 
spin system has no spontaneous magnetization; this is the so-called paramagnetic phase. However, if 
the parameter $2Jd/T$ exceeds 1, i.e. if $T$ is decreased below the following critical value, 

$$T_c = 2Jd, \qquad (4.72)$$

the right-hand side of Eq. (71) grows, at small $\eta$, faster than its left-hand side, so that the two plots 
intersect in 3 points: $\eta = 0$ and $\eta = \pm\eta_0$; see the blue line in Fig. 8. It is almost evident that the former 
stationary point is unstable, while the two latter points are stable. (This fact may be readily verified by 
using Eq. (68) to calculate $F$. Now the condition $\partial F/\partial\eta|_{h=0} = 0$ returns us to Eq. (71), while calculating 
the second derivative, for $T < T_c$ we get $\partial^2 F/\partial\eta^2 > 0$ at $\eta = \pm\eta_0$, and $\partial^2 F/\partial\eta^2 < 0$ at $\eta = 0$.) Thus, below 
$T_c$ the system is in the ferromagnetic phase, with one of two possible directions of the average 
spontaneous magnetization, so that the critical (Curie39) temperature, given by Eq. (72), marks the 
transition between the paramagnetic and ferromagnetic phases. (Since the stable minimum value of the 
free energy $F$ is a continuous function of temperature at $T = T_c$, this phase transition is continuous.) 
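The graphical solution of Eq. (71) can be reproduced numerically. The sketch below is an illustration added here, not part of the original notes. It assumes units in which $Jd = 1$ (so $T_c = 2$), and the function name `weiss_order_parameter` is hypothetical. It uses plain fixed-point iteration of Eq. (70), rewritten for $\eta$ via Eq. (65). This method was chosen because the iteration is attracted only to roots where the slope of the right-hand side is below 1, and that slope condition is the same as $\partial^2F/\partial\eta^2 > 0$. The iteration therefore lands on the stable intersections of Fig. 8, with the sign of the starting value selecting the branch.

```python
import math


def weiss_order_parameter(T, h=0.0, Jd=1.0, eta0=1.0, tol=1e-12, max_iter=100_000):
    """Solve Eq. (70), rewritten as eta = tanh((h + 2*Jd*eta) / T), by fixed-point
    iteration from eta0.  Only stable roots (slope of the tanh map below 1)
    attract the iteration; the unstable root eta = 0 below T_c is reached
    only if eta0 == 0 exactly."""
    eta = eta0
    for _ in range(max_iter):
        new = math.tanh((h + 2.0 * Jd * eta) / T)
        converged = abs(new - eta) < tol
        eta = new
        if converged:
            break
    return eta


Jd = 1.0
Tc = 2.0 * Jd  # Eq. (72)
for T in (0.5, 1.0, 1.5, 1.9, 2.5):
    up = weiss_order_parameter(T, Jd=Jd, eta0=+1.0)
    down = weiss_order_parameter(T, Jd=Jd, eta0=-1.0)
    print(f"T/T_c = {T / Tc:.2f}:  eta = {up:+.6f}  or  {down:+.6f}")
```

Below $T_c$ the two starting values give the two spontaneous magnetizations $\pm\eta_0$. Above $T_c$ both runs collapse to $\eta = 0$, the paramagnetic phase.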


Now let us repeat this graphical analysis to examine how each of these phases responds to an 
external magnetic field $h \neq 0$. According to Eq. (70), the effect of $h$ is just a horizontal shift of the 
straight-line plot of its left-hand side; see Fig. 9. (Note a different, here more convenient, normalization 
of both axes.) 

39 Named after Pierre Curie, rather than his (more famous) wife Marie Skłodowska-Curie. 


Fig. 4.9. External field effects on: (a) a paramagnet ($T > T_c$), and (b) a ferromagnet ($T < T_c$). 


In the paramagnetic case (Fig. 9a) the resulting dependence $h_{\rm ef}(h)$ is evidently continuous, but 
the coupling effect ($J > 0$) makes it steeper than it would be without spin interaction. This effect may be 
quantified by the calculation of the low-field susceptibility defined by Eq. (29). To calculate it, let us 
notice that for small $h$, and hence small $h_{\rm ef}$, the function tanh in Eq. (70) is approximately equal to its 
argument, so that Eq. (70) is reduced to 

$$h_{\rm ef} - h = \frac{2Jd}{T}\,h_{\rm ef}, \quad\text{for}\quad |h_{\rm ef}| \ll T. \qquad (4.73)$$


Solving this equation for $h_{\rm ef}$, and then using Eq. (72), we get 

$$h_{\rm ef} = \frac{h}{1 - 2Jd/T} \equiv \frac{h}{1 - T_c/T}. \qquad (4.74)$$

Recalling Eq. (66), we can rewrite this result for the order parameter: 

$$\eta = \frac{h_{\rm ef} - h}{2Jd} = \frac{h}{T - T_c}, \qquad (4.75)$$

so that the low-field susceptibility 

$$\chi \equiv \left.\frac{\partial\eta}{\partial h}\right|_{h \to 0} = \frac{1}{T - T_c}, \quad\text{for } T > T_c. \qquad (4.76)$$

This is the famous Curie-Weiss law, which shows that the susceptibility diverges at the approach to the 
Curie temperature $T_c$. 
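As a numerical sanity check of Eqs. (73)-(76), the sketch below solves the full nonlinear Eq. (70) at tiny fields $\pm dh$ and compares the finite-difference slope with the Curie-Weiss law. It is an illustration only. It assumes units with $Jd = 1$ (so $T_c = 2$), and the helper names are hypothetical. Setting `Jd=0.0` switches the interaction off and should give the Curie law $\chi = 1/T$ instead.

```python
import math


def weiss_eta(T, h, Jd=1.0, tol=1e-15, max_iter=1_000_000):
    """Order parameter from Eq. (70), iterated as eta = tanh((h + 2*Jd*eta)/T).
    Started from eta = 0, which lies on the single (paramagnetic) branch for T > T_c."""
    eta = 0.0
    for _ in range(max_iter):
        new = math.tanh((h + 2.0 * Jd * eta) / T)
        converged = abs(new - eta) < tol
        eta = new
        if converged:
            break
    return eta


def chi_numeric(T, Jd=1.0, dh=1e-6):
    """Low-field susceptibility as a symmetric finite difference of eta(h) at h = 0."""
    return (weiss_eta(T, dh, Jd) - weiss_eta(T, -dh, Jd)) / (2.0 * dh)


Jd = 1.0
Tc = 2.0 * Jd  # Eq. (72)
for T in (2.1, 2.5, 3.0, 5.0):
    print(f"T = {T}:  numeric chi = {chi_numeric(T, Jd):.6f},  "
          f"1/(T - T_c) = {1.0 / (T - Tc):.6f}")
```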


In the ferromagnetic case, the graphical solution (Fig. 9b) of Eq. (70) gives a qualitatively 
different result. A field increase leads, depending on the spontaneous magnetization, either to the further 
saturation of $h_{\rm mol}$ (with the order parameter $\eta$ gradually approaching 1), or, if the initial $\eta$ was negative, 
to a jump to positive $\eta$ at some critical (coercive) field $h_c$. In contrast with the crude approximation (59), 
at $T > 0$ the coercive field is smaller than that given by Eq. (60), and the magnetization saturation is 
gradual, in a good (semi-qualitative) accordance with experiment. 
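The statement that the coercive field at $T > 0$ falls below Eq. (60) can be made quantitative within the same theory. Combining Eq. (70) with Eq. (65) gives the field as a function of the order parameter, $h(\eta) = T\,\mathrm{artanh}\,\eta - 2Jd\,\eta$. The negative-$\eta$ branch disappears at the maximum of $h(\eta)$ on $-1 < \eta < 0$, and setting $dh/d\eta = 0$ puts that maximum at $\eta = -(1 - T/T_c)^{1/2}$. This short derivation and the code are an illustration added here, not taken from the notes. Units $Jd = 1$ are assumed, and the function names are hypothetical.

```python
import math


def coercive_field(T, Jd=1.0, n_grid=100_000):
    """Coercive field h_c of the Weiss theory at 0 < T < T_c = 2*Jd, found numerically.

    Eq. (70) with h_ef = h + 2*Jd*eta, Eq. (65), inverted:
    h(eta) = T*atanh(eta) - 2*Jd*eta.  On -1 < eta < 0 this function has a single
    maximum; for larger fields no negative-eta solution exists.
    """
    Tc = 2.0 * Jd
    if not 0.0 < T < Tc:
        raise ValueError("coercivity exists only for 0 < T < T_c")
    best = -math.inf
    for i in range(1, n_grid):
        eta = -i / n_grid                      # grid over the open interval (-1, 0)
        best = max(best, T * math.atanh(eta) - Tc * eta)
    return best


def coercive_field_closed_form(T, Jd=1.0):
    """Same maximum located analytically: dh/deta = 0 at eta = -sqrt(1 - T/T_c)."""
    Tc = 2.0 * Jd
    r = math.sqrt(1.0 - T / Tc)
    return Tc * r - T * math.atanh(r)


for T in (0.01, 0.5, 1.0, 1.5, 1.9):   # units: Jd = 1, so T_c = 2 and Eq. (60) gives h_c = 2
    print(f"T = {T:4.2f}:  h_c = {coercive_field(T):.5f}  "
          f"(closed form {coercive_field_closed_form(T):.5f})")
```

At $T \to 0$ the result tends to $2Jd$, recovering Eq. (60), and it drops continuously to zero at $T = T_c$.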


To summarize, the Weiss molecular-field theory gives an approximate but realistic description of 
the ferromagnetic and paramagnetic phases in the Ising model, and a very simple prediction (72) of the 
temperature of the phase transition between them, for an arbitrary dimensionality $d$ of the cubic lattice. 
It also enables calculation of other parameters of Landau's mean-field theory for this model, an easy 
exercise left for the reader. Moreover, the molecular-field approach allows one to obtain analytical (if 
approximate) results for other models of phase transitions; see, e.g., Problem 18. 


4.5. Ising model: Exact and numerical results 


In order to evaluate the main prediction (72) of the Weiss theory, let us now discuss the exact 
(analytical) and quasi-exact (numerical) results obtained for the Ising model, going from the lowest 
value of dimensionality, $d = 0$, to its higher values. Zero dimensionality means that the spin has no 
nearest neighbors at all, so that the first term of Eq. (23) vanishes. Hence Eq. (64) is exact, with $h_{\rm ef} = h$, 
and so is its solution (69). Now we can simply use Eq. (76), with $J = 0$, i.e. $T_c = 0$, reducing this result to 
the so-called Curie law: 

$$\chi = \frac{1}{T}. \qquad (4.77)$$

It shows that the system is paramagnetic at any temperature. One may say that for $d = 0$ the Weiss 
molecular-field theory is exact, or even trivial. (However, in some sense it is more general than the 
Ising model, because, as we know from Chapter 2, it gives the exact result for a fully quantum-mechanical 
treatment of any two-level system, including spin-½.) Experimentally, the Curie law is 
approximately valid for many so-called paramagnetic materials, i.e. 3D systems with sufficiently weak 
interaction between particle spins. 


The case $d = 1$ is more complex but has an exact analytical solution. A simple (though not the 
simplest!) way to obtain it is to use the so-called transfer matrix approach.40 For this, first of all, we 
may argue that most properties of a 1D system of $N \gg 1$ spins (say, placed at equal distances on a straight 
line) should not change noticeably if we bend that line gently into a closed ring (Fig. 10), assuming that 
spins $s_N$ and $s_1$ interact exactly as all other next-neighbor pairs. Then the energy (23) becomes 

$$E_m = -\left(Js_1s_2 + Js_2s_3 + \ldots + Js_Ns_1\right) - \left(hs_1 + hs_2 + \ldots + hs_N\right). \qquad (4.78)$$


Fig. 4.10. The closed-ring version of the 1D Ising system. 


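Let us regroup the terms of this sum in the following way: 

$$E_m = -\left[\left(\frac{h}{2}s_1 + Js_1s_2 + \frac{h}{2}s_2\right) + \left(\frac{h}{2}s_2 + Js_2s_3 + \frac{h}{2}s_3\right) + \ldots + \left(\frac{h}{2}s_N + Js_Ns_1 + \frac{h}{2}s_1\right)\right], \qquad (4.79)$$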


40 It was developed in 1941 by H. Kramers and G. Wannier. I am following this method here because it is very 
close to the one used in quantum mechanics (see, e.g., QM Sec. 2.5), and may be applied to other problems as 
well. For a simpler approach to the 1D Ising problem, which gives an explicit solution even for an “open-end” 
system with a finite number of spins, see the model solution of Problem 5.5. 
so that the group inside each pair of parentheses depends only on the state of two adjacent spins. The 
corresponding statistical sum, 

$$Z = \sum_{\substack{s_k = \pm 1,\\ k = 1,2,\ldots,N}} \exp\left\{\frac{h}{2T}s_1 + \frac{J}{T}s_1s_2 + \frac{h}{2T}s_2\right\}\exp\left\{\frac{h}{2T}s_2 + \frac{J}{T}s_2s_3 + \frac{h}{2T}s_3\right\}\cdots\exp\left\{\frac{h}{2T}s_N + \frac{J}{T}s_Ns_1 + \frac{h}{2T}s_1\right\}, \qquad (4.80)$$

still has $2^N$ terms, each corresponding to a certain combination of signs of $N$ spins. However, each 
operand of the product under the sum may take only four values, corresponding to four different 
combinations of its two arguments: 

$$\exp\left\{\frac{h}{2T}s_k + \frac{J}{T}s_ks_{k+1} + \frac{h}{2T}s_{k+1}\right\} = \begin{cases} \exp\{(J+h)/T\}, & \text{for } s_k = s_{k+1} = +1, \\ \exp\{(J-h)/T\}, & \text{for } s_k = s_{k+1} = -1, \\ \exp\{-J/T\}, & \text{for } s_k = -s_{k+1} = \pm 1. \end{cases} \qquad (4.81)$$


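These values do not depend on the site number $k$,41 and may be represented as the elements $M_{j,j'}$ (with $j$, 
$j' = 1, 2$) of the so-called transfer matrix 

$$M = \begin{pmatrix} \exp\{(J+h)/T\} & \exp\{-J/T\} \\ \exp\{-J/T\} & \exp\{(J-h)/T\} \end{pmatrix}, \qquad (4.82)$$

so that the whole statistical sum (80) may be recast as a sum of products of its elements: 

$$Z = \sum_{j_k = 1,2} M_{j_1 j_2} M_{j_2 j_3} \cdots M_{j_{N-1} j_N} M_{j_N j_1}. \qquad (4.83)$$

According to the basic rule of matrix multiplication, this sum is just 

$$Z = \mathrm{Tr}\left(M^N\right). \qquad (4.84)$$

Linear algebra tells us that this trace may be represented just as 

$$Z = \lambda_+^N + \lambda_-^N, \qquad (4.85)$$

where $\lambda_\pm$ are the eigenvalues of the transfer matrix $M$, i.e. the roots of its characteristic equation, 

$$\begin{vmatrix} \exp\{(J+h)/T\} - \lambda & \exp\{-J/T\} \\ \exp\{-J/T\} & \exp\{(J-h)/T\} - \lambda \end{vmatrix} = 0. \qquad (4.86)$$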


A straightforward calculation yields 

$$\lambda_\pm = \exp\left\{\frac{J}{T}\right\}\left[\cosh\frac{h}{T} \pm \left(\sinh^2\frac{h}{T} + \exp\left\{-\frac{4J}{T}\right\}\right)^{1/2}\right]. \qquad (4.87)$$

A further simplification comes from the condition $N \gg 1$, which we need anyway, to make the 
ring model sufficiently close to the infinite linear 1D system. In this limit, even a small difference between the 
eigenvalues, $\lambda_+ > \lambda_-$, makes the second term in Eq. (85) negligible, so that we finally get 


41 This is a result of the “translational” (or rather rotational) symmetry of the system, i.e. its invariance to the 
index replacement $k \to k + 1$ in all terms of Eq. (78). 
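$$Z \approx \lambda_+^N = \exp\left\{\frac{NJ}{T}\right\}\left[\cosh\frac{h}{T} + \left(\sinh^2\frac{h}{T} + \exp\left\{-\frac{4J}{T}\right\}\right)^{1/2}\right]^N. \qquad (4.88)$$

From here, we can find the free energy per particle: 

$$\frac{F}{N} = -\frac{T}{N}\ln Z = -J - T\ln\left[\cosh\frac{h}{T} + \left(\sinh^2\frac{h}{T} + \exp\left\{-\frac{4J}{T}\right\}\right)^{1/2}\right], \qquad (4.89)$$

and then use thermodynamics to calculate such variables as entropy; see the first of Eqs. (1.35). 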
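Eqs. (80)-(87) are easy to check against brute force for a small ring. The sketch below is an illustration added here, written in standard-library Python; the helper names, including `matmul2`, are hypothetical. It evaluates $Z$ in three ways:

- by direct summation of $\exp\{-E_m/T\}$ over all $2^N$ configurations of Eq. (78);
- as $\mathrm{Tr}(M^N)$, with $M$ from Eq. (82), where index 1 stands for $s = +1$ and index 2 for $s = -1$;
- as $\lambda_+^N + \lambda_-^N$, with the eigenvalues (87).

```python
import itertools
import math


def z_brute_force(N, J, h, T):
    """Sum of exp(-E_m/T) over all 2**N configurations of the ring, Eq. (78)."""
    Z = 0.0
    for s in itertools.product((+1, -1), repeat=N):
        E = -sum(J * s[k] * s[(k + 1) % N] + h * s[k] for k in range(N))
        Z += math.exp(-E / T)
    return Z


def matmul2(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
            [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]]]


def z_transfer_matrix(N, J, h, T):
    """Z = Tr(M**N), Eqs. (82) and (84)."""
    M = [[math.exp((J + h) / T), math.exp(-J / T)],
         [math.exp(-J / T), math.exp((J - h) / T)]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(N):
        P = matmul2(P, M)
    return P[0][0] + P[1][1]


def z_eigenvalues(N, J, h, T):
    """Z = lambda_+**N + lambda_-**N, Eqs. (85) and (87)."""
    root = math.sqrt(math.sinh(h / T) ** 2 + math.exp(-4.0 * J / T))
    lam_p = math.exp(J / T) * (math.cosh(h / T) + root)
    lam_m = math.exp(J / T) * (math.cosh(h / T) - root)
    return lam_p ** N + lam_m ** N


N, J, h, T = 10, 1.0, 0.3, 1.5
print(z_brute_force(N, J, h, T), z_transfer_matrix(N, J, h, T), z_eigenvalues(N, J, h, T))
```

The brute-force cost grows as $2^N$, while the other two stay trivial for any $N$. That contrast is the whole point of the transfer-matrix method.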


However, we are mostly interested in the order parameter defined by Eq. (25): $\eta \equiv \langle s_k \rangle$. The 
conceptually simplest approach to the calculation of this statistical average would be to use the sum 
(2.7), with the Gibbs probabilities $W_m = Z^{-1}\exp\{-E_m/T\}$. However, the number of terms in this sum is $2^N$, 
so that for $N \gg 1$ this approach is completely impracticable. Here the analogy between the canonical 
pair $\{-P, V\}$ and other generalized force-coordinate pairs $\{\mathcal{F}, q\}$, in particular $\{\mu_0\mathcal{H}(\mathbf{r}_k), \mathbf{m}_k\}$ for the 
magnetic field, discussed in Secs. 1.1 and 1.4, becomes invaluable; see in particular Eq. (1.3b). (In our 
normalization (22), and for a uniform field, the pair $\{\mu_0\mathcal{H}(\mathbf{r}_k), \mathbf{m}_k\}$ becomes $\{h, s_k\}$.) Indeed, in this 
analogy the last term of Eq. (23), i.e. the sum of $N$ products $(-hs_k)$ for all spins, with the statistical 
average $(-Nh\eta)$, is similar to the product $PV$, i.e. the difference between the thermodynamic potentials $F$ 
and $G \equiv F + PV$ in the usual “P-V thermodynamics”. Hence, the free energy $F$ given by Eq. (89) may be 
understood as the Gibbs energy of the Ising system in the external field, and the equilibrium value of the 
order parameter may be found from the last of Eqs. (1.39) with the replacements $-P \to h$, $V \to N\eta$: 

$$N\eta = -\left(\frac{\partial F}{\partial h}\right)_T, \quad\text{i.e.}\quad \eta = -\frac{\partial (F/N)}{\partial h}\bigg|_T. \qquad (4.90)$$

Note that this formula is valid for any model of ferromagnetism, of any dimensionality, if it has the same 
form of interaction with the external field as the Ising model. 


For the 1D Ising ring with $N \gg 1$, Eqs. (89) and (90) yield 

$$\eta = \frac{\sinh(h/T)}{\left[\sinh^2(h/T) + \exp\{-4J/T\}\right]^{1/2}}, \quad\text{giving}\quad \chi \equiv \left.\frac{\partial\eta}{\partial h}\right|_{h \to 0} = \frac{1}{T}\exp\left\{\frac{2J}{T}\right\}. \qquad (4.91)$$

This result means that the 1D Ising model does not exhibit a phase transition, i.e., in this model $T_c = 0$. 
However, its susceptibility grows, at $T \to 0$, much faster than the Curie law (77). This gives us a hint 
that at low temperatures the system is “virtually ferromagnetic”, i.e. has the ferromagnetic order with 
some rare random violations. (Such violations are commonly called low-temperature excitations.) This 
interpretation may be confirmed by the following approximate calculation. It is almost evident that the 
lowest-energy excitation of the ferromagnetic state of an open-end 1D Ising chain at $h = 0$ is the reversal 
of signs of all spins in one of its parts; see Fig. 11. 


Fig. 4.11. A Bloch wall in an open-end 1D Ising system. 
Indeed, such an excitation (called the Bloch wall42) involves the change of sign of just one 
product $s_ks_{k'}$, so that according to Eq. (23), its energy $E_W$ (defined as the difference between the values 
of $E_m$ with and without the excitation) equals $2J$, regardless of the wall's position.43 Since in the 
ferromagnetic Ising model the parameter $J$ is positive, $E_W > 0$. If the system “tried” to minimize its 
internal energy, having any wall in the system would be energy-disadvantageous. However, 
thermodynamics tells us that at $T \neq 0$, the system's thermal equilibrium corresponds to the minimum of 
the free energy $F \equiv E - TS$, rather than just the energy $E$.44 Hence, we have to calculate the Bloch wall's 
contribution $F_W$ to the free energy. Since in an open-end linear chain of $N \gg 1$ spins, the wall can take 
$(N - 1) \approx N$ positions with the same energy $E_W$, we may claim that the entropy $S_W$ associated with this 
excitation is $\ln N$, so that 

$$F_W \equiv E_W - TS_W \approx 2J - T\ln N. \qquad (4.92)$$


This result tells us that in the limit $N \to \infty$, and at $T \neq 0$, walls are always free-energy-beneficial, 
thus explaining the absence of the perfect ferromagnetic order in the 1D Ising system. Note, however, 
that since the logarithmic function changes extremely slowly at large values of its argument, one may 
argue that a large but finite 1D system should still feature a quasi-critical temperature 

$$T_c \approx \frac{2J}{\ln N}, \qquad (4.93)$$

below which it would be in a virtually complete ferromagnetic order. (The exponentially large 
susceptibility (91) is another manifestation of this fact.) 
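The sketch below is a short numerical illustration of Eqs. (89)-(93), under the same assumptions as the text ($N \gg 1$ for Eqs. (89)-(91)); the function names are hypothetical. It checks that the numerical $h$-derivative of Eq. (89), taken as prescribed by Eq. (90), reproduces Eq. (91). It also shows the exponential growth of $\chi$ at low $T$ and evaluates the order-of-magnitude estimate (93) for a few chain lengths.

```python
import math


def eta_1d(h, J, T):
    """Eq. (91): order parameter of the 1D Ising ring at N >> 1."""
    sh = math.sinh(h / T)
    return sh / math.sqrt(sh * sh + math.exp(-4.0 * J / T))


def free_energy_per_spin(h, J, T):
    """Eq. (89): F/N of the 1D Ising ring at N >> 1."""
    sh = math.sinh(h / T)
    return -J - T * math.log(math.cosh(h / T) + math.sqrt(sh * sh + math.exp(-4.0 * J / T)))


def eta_from_free_energy(h, J, T, dh=1e-6):
    """Eq. (90): eta = -d(F/N)/dh, by a symmetric finite difference."""
    return -(free_energy_per_spin(h + dh, J, T) - free_energy_per_spin(h - dh, J, T)) / (2.0 * dh)


def quasi_critical_temperature(N, J):
    """Eq. (93): order-of-magnitude estimate for a finite open chain of N spins."""
    return 2.0 * J / math.log(N)


J = 1.0
for T in (2.0, 1.0, 0.5, 0.25):
    chi_formula = math.exp(2.0 * J / T) / T     # right part of Eq. (91)
    chi_small_h = eta_1d(1e-10, J, T) / 1e-10   # slope of eta(h) at tiny h
    print(f"T = {T:4.2f} J:  chi = {chi_formula:10.2f}  (from eta/h: {chi_small_h:10.2f})")

for N in (10**3, 10**6, 10**23):
    print(f"N = {N:.0e}:  quasi-critical T ~ {quasi_critical_temperature(N, J):.3f} J")
```

Even for a macroscopic chain, $N \sim 10^{23}$, the estimate (93) gives only a few hundredths of $J$. This illustrates how slowly the logarithm suppresses the quasi-ordered range.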


Now let us apply a similar approach to estimate $T_c$ of a 2D Ising model, with open borders. Here 
the Bloch wall is a line of a certain total length $L$; see Fig. 12. (For the example presented in that 
figure, counting from left to right, $L = 2 + 1 + 4 + 2 + 3 = 12$ lattice periods.) Evidently, the 
additional energy associated with such a wall is $E_W = 2JL$, while the wall's entropy $S_W$ may be estimated 
using the following reasoning. Let the wall be formed along the path of a “Manhattan pedestrian” 
traveling between its nodes. (The dashed line in Fig. 12 is an example of such a path.) At each junction, 
the pedestrian may choose any of 3 of the 4 possible directions (all except the one that leads backward), so that 
there are approximately $3^{(L-1)} \approx 3^L$ options for a walk starting from a certain point. Now taking into 
account that the open borders of a square-shaped lattice with $N$ spins have a length of the order of $N^{1/2}$, 
and the Bloch wall may start from any point of them, there are approximately $M \sim N^{1/2}3^L$ different walks 
between two borders. Again estimating $S_W$ as $\ln M$, we get 

$$F_W = E_W - TS_W \approx 2JL - T\ln\left(N^{1/2}3^L\right) \equiv L\left(2J - T\ln 3\right) - \frac{T}{2}\ln N. \qquad (4.94)$$

(Actually, since $L$ scales as $N^{1/2}$ or higher, at $N \to \infty$ the last term in Eq. (94) is negligible.) We see that 
the sign of the derivative $\partial F_W/\partial L$ depends on whether the temperature is higher or lower than the 
following critical value: 


42 Named after Felix Bloch who was the first one to discuss such excitations in ferromagnetism. 

43 For the closed-ring model (Fig. 10) such analysis gives an almost similar prediction, with the difference that in 
that system, the Bloch walls may appear only in pairs, so that $E_W = 4J$, and $S_W = \ln[N(N - 1)] \approx 2\ln N$. 

44 This is a very vivid application of one of the core results of thermodynamics. If the reader is still uncomfortable 
with it, they are strongly encouraged to revisit Eq. (1.42) and its discussion. 
$$T_c = \frac{2J}{\ln 3} \approx 1.82\,J. \qquad (4.95)$$

At $T < T_c$, the free energy's minimum corresponds to $L \to 0$, i.e. the Bloch walls are free-energy-detrimental, 
and the system is in the purely ferromagnetic phase. 


Fig. 4.12. A Bloch wall in a 2D Ising system. 


So, for $d = 2$ the estimates predict a non-zero critical temperature of the same order as the Weiss 
theory (according to Eq. (72), in this case $T_c = 4J$). The major approximation implied in our calculation 
leading to Eq. (95) is disregarding possible self-crossings of the “Manhattan walk”. The accurate 
counting of such self-crossings is rather difficult. It was carried out in 1944 by L. Onsager; since 
then his calculations have been redone in several easier ways, but even they are rather cumbersome, and 
I will not have time to discuss them.45 The final result, however, is surprisingly simple: 

$$T_c = \frac{2J}{\ln\left(1 + \sqrt{2}\right)} \approx 2.269\,J, \qquad (4.96)$$

showing that the simple estimate (95) is off the mark by only ~20%. 
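The numbers quoted for $d = 2$ are easy to reproduce. This tiny script, an illustration only, evaluates Eqs. (72), (95), and (96) in units of $J$.

```python
import math

J, d = 1.0, 2
tc_weiss = 2 * J * d                               # Eq. (72): molecular-field theory
tc_bloch = 2 * J / math.log(3)                     # Eq. (95): Bloch-wall estimate
tc_onsager = 2 * J / math.log(1 + math.sqrt(2))    # Eq. (96): Onsager's exact result

print(f"Weiss:   {tc_weiss:.3f} J")
print(f"Bloch:   {tc_bloch:.3f} J")
print(f"Onsager: {tc_onsager:.3f} J")
print(f"Bloch/Onsager - 1 = {tc_bloch / tc_onsager - 1:+.1%}")
print(f"Weiss/Onsager - 1 = {tc_weiss / tc_onsager - 1:+.1%}")
```

The Bloch-wall estimate undershoots by about 20%. The molecular-field value overshoots by more than 70%.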


The Onsager solution, as well as all alternative solutions of the problem that were found later, 
are so “artificial” (2D-specific) that they do not give a clear way towards their generalization to other 
(higher) dimensions. As a result, the 3D Ising problem is still unsolved analytically. Nevertheless, we do 
know $T_c$ for it with extremely high precision, at least to the 6th decimal place. This has been achieved 
by numerical methods; they deserve a thorough discussion because of their importance for the solution 
of other similar problems as well. 


Conceptually, this task is rather simple: just compute, to the desired precision, the statistical sum 
of the system (23): 

$$Z = \sum_{\substack{s_k = \pm 1,\\ k = 1,2,\ldots,N}} \exp\left\{\frac{J}{T}\sum_{\{k,k'\}} s_ks_{k'} + \frac{h}{T}\sum_k s_k\right\}. \qquad (4.97)$$

As soon as this has been done for a sufficient number of values of the dimensionless parameters $J/T$ and 
$h/T$, everything becomes easy; in particular, we can compute the dimensionless function 

$$F/T = -\ln Z, \qquad (4.98)$$


45 For that, the interested reader may be referred to either Sec. 151 in the textbook by Landau and Lifshitz, or 
Chapter 15 in the text by Huang, both cited above. 
and then find the ratio $J/T_c$ as the smallest value of the parameter $J/T$ at which the ratio $F/T$ (as a function 
of $h/T$) has a minimum at zero field. However, for any system of a reasonable size $N$, the “exact” 
computation of the statistical sum (97) is impossible, because it contains too many terms for any 
supercomputer to handle. For example, let us take a relatively small 3D lattice with $N = 10\times10\times10 = 10^3$ 
spins, which still features substantial boundary artifacts even using periodic boundary conditions, so 
that its phase transition is smeared about $T_c$ by ~3%. Still, even for such a crude model, $Z$ would include 
$2^{1{,}000} \equiv (2^{10})^{100} \approx (10^3)^{100} \equiv 10^{300}$ terms. Let us suppose we are using a modern exaflops-scale 
supercomputer performing $10^{18}$ floating-point operations per second, i.e. ~$10^{26}$ such operations per year. 
With those resources, the computation of just one statistical sum would require ~$10^{(300-26)} = 10^{274}$ years. 
To call such a number “astronomic” would be a strong understatement. (As a reminder, the age of our 
Universe is close to $1.3\times10^{10}$ years, a very humble number in comparison.) 
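The arithmetic above can be redone without rounding. Like the text, this sketch assumes roughly one floating-point operation per term of $Z$ and $10^{18}$ operations per second; it also assumes a 365.25-day year. The text's rounded inputs ($10^{300}$ terms, $10^{26}$ operations per year) give $10^{274}$ years. The unrounded inputs give about $10^{275.5}$, which changes nothing in the conclusion.

```python
import math

n_spins = 10 * 10 * 10
log10_terms = n_spins * math.log10(2)              # log10 of 2**1000
flops = 1e18                                       # assumed exaflops-scale machine
seconds_per_year = 365.25 * 24 * 3600
log10_ops_per_year = math.log10(flops * seconds_per_year)
log10_years = log10_terms - log10_ops_per_year     # assuming ~1 operation per term

print(f"terms in Z:      10^{log10_terms:.2f}")
print(f"operations/year: 10^{log10_ops_per_year:.2f}")
print(f"years needed:    10^{log10_years:.1f}")
print(len(str(2 ** n_spins)), "decimal digits in 2**1000")
```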


This situation may be improved dramatically by noticing that any statistical sum, 

$$Z = \sum_m \exp\left\{-\frac{E_m}{T}\right\}, \qquad (4.99)$$

is dominated by terms with lower values of $E_m$. To find those lowest-energy states, we may use the 
following powerful approach (belonging to a broad class of numerical Monte-Carlo techniques), which 
essentially mimics one (randomly selected) path of the system's evolution in time. One could argue that 
for that we would need to know the exact laws of evolution of statistical systems,46 which may differ from 
one system to another, even if their energy spectra $E_m$ are the same. This is true, but since the genuine 
value of $Z$ should be independent of these details, it may be evaluated using any reasonable kinetic 
model that satisfies certain general rules. In order to reveal these rules, let us start from a system with 
just two states, with energies $E_m$ and $E_{m'} = E_m + \Delta$; see Fig. 13. 


Fig. 4.13. Deriving the detailed balance relation (two levels, $E_m$ and $E_{m'} = E_m + \Delta$). 


In the absence of quantum coherence between the states (see Sec. 2.1), the equations for the time 
evolution of the corresponding probabilities $W_m$ and $W_{m'}$ should depend only on the probabilities (plus 
certain constant coefficients). Moreover, since the equations of quantum mechanics are linear, these 
master equations should also be linear. Hence, it is natural to expect them to have the following form, 

$$\frac{dW_m}{dt} = -\Gamma_\uparrow W_m + \Gamma_\downarrow W_{m'}, \qquad \frac{dW_{m'}}{dt} = \Gamma_\uparrow W_m - \Gamma_\downarrow W_{m'}, \qquad (4.100)$$

where the coefficients $\Gamma_\uparrow$ and $\Gamma_\downarrow$ have the physical sense of the rates of the corresponding transitions 
(see Fig. 13); for example, $\Gamma_\uparrow dt$ is the probability of the system's transition into the state $m'$ during an 
infinitesimal time interval $dt$, provided that at the beginning of that interval it was in the state $m$ with full 
certainty: $W_m = 1$, $W_{m'} = 0$.47 Since for the system with just two energy levels, the time derivatives of the 


46 Discussion of such laws is the task of physical kinetics, which will be briefly reviewed in Chapter 6. 
47 The calculation of these rates for several particular cases is described in QM Secs. 6.6, 6.7, and 7.6 — see, e.g., 
QM Eq. (7.196), which is valid for a very general model of a quantum system. 
probabilities have to be equal and opposite, Eqs. (100) describe an (irreversible) redistribution of the 
probabilities while keeping their sum $W = W_m + W_{m'}$ constant. According to Eqs. (100), at $t \to \infty$ the 
probabilities settle to their stationary values related as 

$$\frac{W_{m'}}{W_m} = \frac{\Gamma_\uparrow}{\Gamma_\downarrow}. \qquad (4.101)$$

Now let us require these stationary values to obey the Gibbs distribution (2.58); from it 

$$\frac{W_{m'}}{W_m} = \frac{\exp\{-E_{m'}/T\}}{\exp\{-E_m/T\}} = \exp\left\{-\frac{E_{m'} - E_m}{T}\right\} = \exp\left\{-\frac{\Delta}{T}\right\}. \qquad (4.102)$$

Comparing these two expressions, we see that the rates have to satisfy the following detailed balance 
relation: 

$$\frac{\Gamma_\uparrow}{\Gamma_\downarrow} = \exp\left\{-\frac{\Delta}{T}\right\}. \qquad (4.103)$$


Now comes the final step: since the rates of transition between two particular states should not depend 
on other states and their occupation, Eq. (103) has to be valid for each pair of states of any multi-state 
system. (By the way, this relation may serve as an important sanity check: the rates calculated using any 
reasonable model of a quantum system have to satisfy it.) 
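Eqs. (100)-(103) can be illustrated by integrating the two-level master equations numerically. The sketch below uses two different rate models, both obeying the detailed balance relation (103). The first is the pair $\Gamma_\uparrow = \exp\{-\Delta/T\}$, $\Gamma_\downarrow = 1$. The second is the so-called Glauber (heat-bath) rates, $\Gamma_{\uparrow,\downarrow} = 1/(1 + \exp\{\pm\Delta/T\})$, a common alternative that the text does not discuss. The two models relax at different speeds but reach the same Gibbs ratio (102). The values of $\Delta$ and $T$ and the function name are illustrative assumptions.

```python
import math


def relax_two_level(gamma_up, gamma_down, t_max=50.0, dt=1e-3):
    """Integrate the master equations (100) by the explicit Euler method,
    starting from the lower state with certainty: W_m = 1, W_m' = 0."""
    w_m, w_mp = 1.0, 0.0
    for _ in range(int(round(t_max / dt))):
        flow = (gamma_up * w_m - gamma_down * w_mp) * dt   # net transfer m -> m' in dt
        w_m -= flow
        w_mp += flow
    return w_m, w_mp


delta, T = 1.0, 0.5
target = math.exp(-delta / T)   # Gibbs ratio, Eq. (102)

# Two illustrative rate models (gamma_up, gamma_down), both satisfying Eq. (103):
models = {
    "exp(-Delta/T) and 1": (math.exp(-delta / T), 1.0),
    "Glauber (heat-bath)": (1.0 / (1.0 + math.exp(delta / T)),
                            1.0 / (1.0 + math.exp(-delta / T))),
}
for name, (g_up, g_down) in models.items():
    w_m, w_mp = relax_two_level(g_up, g_down)
    print(f"{name}: W_m'/W_m = {w_mp / w_m:.10f}, exp(-Delta/T) = {target:.10f}, "
          f"sum = {w_m + w_mp:.12f}")
```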


Detailed balance yields only one equation for the two rates $\Gamma_\uparrow$ and $\Gamma_\downarrow$; if our only goal is the 
calculation of $Z$, the choice of the other equation is not too important. A very simple choice is 

$$\Gamma(\Delta) \propto \nu(\Delta) \equiv \begin{cases} 1, & \text{if } \Delta < 0, \\ \exp\{-\Delta/T\}, & \text{otherwise}, \end{cases} \qquad (4.104)$$

where $\Delta$ is the energy change resulting from the transition. This model, which evidently satisfies the 
detailed balance relation (103), is very popular (despite the unphysical cusp this function has at $\Delta = 0$), 
because it enables the following simple Metropolis algorithm (Fig. 14). 


[Flowchart: set up an initial state → flip a random spin, calculate $\Delta$, calculate $\nu(\Delta)$ → generate random $\xi$ ($0 < \xi < 1$) → compare $\xi$ with $\nu(\Delta)$ → accept or reject the spin flip] 
Fig. 4.14. A crude scheme of the Metropolis algorithm for the Ising model simulation. 
The calculation starts by setting a certain initial state of the system. At relatively high 
temperatures, the state may be generated randomly; for example, in the Ising system, the initial state of 
each spin $s_k$ may be selected independently, with a 50% probability. At low temperatures, starting the 
calculations from the lowest-energy state (in particular, for the Ising model, from the ferromagnetic state 
$s_k = \mathrm{sgn}(h) = \text{const}$) may give the fastest convergence. Now one spin is flipped at random, the 
corresponding change $\Delta$ of the energy is calculated,48 and plugged into Eq. (104) to calculate $\nu(\Delta)$. Next, 
a pseudo-random number generator is used to generate a random number $\xi$, with the probability density 
being constant on the segment [0, 1]. (Such functions are available in virtually any numerical library.) If 
the resulting $\xi$ is less than $\nu(\Delta)$, the transition is accepted, while if $\xi > \nu(\Delta)$, it is rejected. Physically, 
this means that any transition down the energy spectrum ($\Delta < 0$) is always accepted, while those up the 
energy profile ($\Delta > 0$) are accepted with the probability $\exp\{-\Delta/T\}$.49 After sufficiently 
many such steps, the statistical sum (99) may be calculated approximately as a partial sum over the 
states passed by the system. (It may be better to discard the contributions from the first few steps, to avoid 
the effects of the initial state choice.) 
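Below is a minimal sketch of the scheme of Fig. 14 for the 2D Ising model (23) on an $L \times L$ square lattice with periodic borders, in standard-library Python. Several choices are assumptions made here rather than details taken from the text:

- the lattice size and run lengths are merely illustrative;
- instead of summing $Z$ over the visited states, the chain is used to average the observable $|\eta|$, which is how the Metropolis algorithm is most commonly applied;
- the function name is hypothetical.

```python
import math
import random


def metropolis_ising_2d(L, T, J=1.0, h=0.0, sweeps=600, burn_in=100,
                        ordered_start=False, seed=0):
    """Metropolis sampling of the Ising model (23) on an L x L lattice with periodic
    borders.  Returns <|eta|>, eta = (1/N) sum_k s_k, averaged over the sweeps after
    burn_in (one sweep = N attempted single-spin flips)."""
    rng = random.Random(seed)
    N = L * L
    if ordered_start:   # low T: start from s_k = sgn(h), taking +1 at h = 0
        s = [[1 if h >= 0 else -1] * L for _ in range(L)]
    else:               # high T: independent random spins, 50% each
        s = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    M = sum(map(sum, s))    # running value of sum_k s_k
    total = 0.0
    for sweep in range(sweeps):
        for _ in range(N):
            i, j = rng.randrange(L), rng.randrange(L)
            # sum of the 4 nearest neighbours; Python's index -1 wraps around
            nn = s[(i + 1) % L][j] + s[i - 1][j] + s[i][(j + 1) % L] + s[i][j - 1]
            delta = 2.0 * s[i][j] * (J * nn + h)   # energy change of this flip
            # Eq. (104) with uniform xi in [0, 1): accept if xi < nu(delta)
            if delta <= 0.0 or rng.random() < math.exp(-delta / T):
                s[i][j] = -s[i][j]
                M += 2 * s[i][j]
        if sweep >= burn_in:
            total += abs(M) / N
    return total / (sweeps - burn_in)


T_c = 2.0 / math.log(1.0 + math.sqrt(2.0))   # Onsager's Eq. (96), J = 1
for T in (1.5, 3.5):
    m = metropolis_ising_2d(16, T, ordered_start=(T < T_c))
    print(f"T = {T} J (T/T_c = {T / T_c:.2f}):  <|eta|> = {m:.3f}")
```

Only the $2d + 1 = 5$ terms touching the flipped spin enter $\Delta$, as footnote 48 notes. That is why each step costs a handful of operations regardless of $N$.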


This algorithm is extremely efficient. Even with the modest computers available in the 1980s, it 
allowed simulating a 3D Ising system of $(128)^3$ spins to get the following result: $J/T_c \approx 0.221650 \pm 
0.000005$. For all practical purposes, this result is exact, so that perhaps the largest benefit of the 
possible future analytical solution of the infinite 3D Ising problem will be a virtually certain Nobel Prize 
for its author. Table 2 summarizes the values of $T_c$ for the Ising model. Very visible is the fast 
improvement of the prediction accuracy of the molecular-field theory, which is asymptotically correct 
at $d \to \infty$. 


Table 4.2. The critical temperature $T_c$ (in the units of $J$) of the Ising model 
of a ferromagnet ($J > 0$), for several values of dimensionality $d$ 

d | Molecular-field theory, Eq. (72) | Exact value | Exact value's source 
0 | 0 | 0 | Gibbs distribution 
1 | 2 | 0 | Transfer matrix theory 
2 | 4 | 2.269... | Onsager's solution 
3 | 6 | 4.512... | Numerical simulation 


Finally, I need to mention the renormalization-group (“RG”) approach,50 despite its low 
efficiency for Ising-type problems. The basic idea of this approach stems from the scaling law (30)-(31): 
at $T = T_c$, the correlation radius $r_c$ diverges. Hence, the critical temperature may be found from the 
requirement for the system to be spatially self-similar. Namely, let us form larger and larger groups 
(“blocks”) of adjacent spins, and require that all properties of the resulting system of the blocks 
approach those of the initial system, as $T$ approaches $T_c$. 


48 Note that a flip of a single spin changes the signs of only $(2d + 1)$ terms in the sum (23), i.e. does not require 
the re-calculation of all $(d + 1)N$ terms of the sum, so that the computation of $\Delta$ takes just a few multiply-and-accumulate 
operations even at $N \gg 1$. 

49 The latter step is necessary to avoid the system's trapping in local minima of its multidimensional energy 
profile $E_m(s_1, s_2, \ldots, s_N)$. 

50 Initially developed in quantum field theory in the 1950s, it was adapted to statistics by L. Kadanoff in 1966, 
with a spectacular solution of the so-called Kondo problem by K. Wilson in 1972, later awarded a Nobel Prize. 




Let us see how this idea works for the simplest nontrivial (1D) case, described by the statistical 
sum (80). Assuming $N$ to be even (which does not matter at $N \to \infty$), and adding an inconsequential 
constant $C$ to each exponent (for a purpose that will be clear soon), we may rewrite this expression as 

$$Z = \sum_{s_k = \pm 1}\ \prod_{k = 1,2,\ldots,N}\exp\left\{\frac{h}{2T}s_k + \frac{J}{T}s_ks_{k+1} + \frac{h}{2T}s_{k+1} + C\right\}. \qquad (4.105)$$


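Let us group each pair of adjacent exponents to recast this expression as a product over only even numbers k,

$$Z = \sum_{s_k = \pm 1} \prod_{k=2,4,\ldots,N} \exp\left\{ \frac{h}{2T}s_{k-1} + s_k\left[\frac{J}{T}(s_{k-1} + s_{k+1}) + \frac{h}{T}\right] + \frac{h}{2T}s_{k+1} + 2C \right\}, \tag{4.106}$$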


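and carry out the summation over two possible states of the internal spin $s_k$ explicitly:

$$\begin{aligned} Z &= \sum_{\substack{s_k = \pm 1 \\ k = 1,3,\ldots,N-1}} \prod_{k=2,4,\ldots,N} \left\{ \exp\left[\frac{h}{2T}s_{k-1} + \frac{J}{T}(s_{k-1} + s_{k+1}) + \frac{h}{T} + \frac{h}{2T}s_{k+1} + 2C\right] + \exp\left[\frac{h}{2T}s_{k-1} - \frac{J}{T}(s_{k-1} + s_{k+1}) - \frac{h}{T} + \frac{h}{2T}s_{k+1} + 2C\right] \right\} \\ &\equiv \sum_{\substack{s_k = \pm 1 \\ k = 1,3,\ldots,N-1}} \prod_{k=2,4,\ldots,N} 2\cosh\left[\frac{J}{T}(s_{k-1} + s_{k+1}) + \frac{h}{T}\right] \exp\left[\frac{h}{2T}(s_{k-1} + s_{k+1}) + 2C\right]. \end{aligned} \tag{4.107}$$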


Now let us require this statistical sum (and hence all statistical properties of the system of 2-spin blocks) to be identical to that of the Ising system of N/2 spins, numbered by odd k:

$$Z = \sum_{\substack{s_k = \pm 1 \\ k = 1,3,\ldots,N-1}} \prod_{k=2,4,\ldots,N} \exp\left\{ \frac{J'}{T}s_{k-1}s_{k+1} + \frac{h'}{2T}(s_{k-1} + s_{k+1}) + C' \right\}, \tag{4.108}$$

with some different parameters h', J', and C', for all four possible values of $s_{k-1} = \pm 1$ and $s_{k+1} = \pm 1$. Since the right-hand side of Eq. (107) depends only on the sum $(s_{k-1} + s_{k+1})$, this requirement yields only three (rather than four) independent equations for finding h', J', and C'. Of them, the equations for h' and J' depend only on h and J (but not on C),51 and may be represented in an especially simple form,


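$$x' = \frac{x(1+y)^2}{(x+y)(1+xy)}, \qquad y' = \frac{y(x+y)}{1+xy}, \tag{4.109}$$

if the following notation is used:

$$x \equiv \exp\left\{-\frac{4J}{T}\right\}, \qquad y \equiv \exp\left\{-\frac{2h}{T}\right\}. \tag{4.110}$$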


Now the grouping procedure may be repeated, with the same result (109)-(110). Hence these equations may be considered as recurrence relations describing repeated doubling of the spin block size. Figure 15 shows (schematically) the trajectories of this dynamic system on the phase plane [x, y]. (Each trajectory is defined by the following property: for each of its points {x, y}, the point {x', y'} defined by


51 This might be expected because physically C is just a certain constant addition to the system's energy. However, the introduction of that constant is mathematically necessary, because Eqs. (107) and (108) may be reconciled only if $C' \neq C$.

the "mapping" Eq. (109) is also on the same trajectory.) For ferromagnetic coupling ($J > 0$) and $h > 0$, we may limit the analysis to the unit square $0 \le x, y \le 1$. If this flow diagram had a stable fixed point with $x' = x = x_c \neq 0$ (i.e. $T/J < \infty$) and $y' = y = 1$ (i.e. $h = 0$), then the first of Eqs. (110) would immediately give us the critical temperature of the phase transition in the field-free system:


$$T_c = \frac{4J}{\ln(1/x_c)}. \tag{4.111}$$

However, Fig. 15 shows that the only fixed point of the 1D system is $x = y = 0$, which (at a finite coupling J) should be interpreted as $T_c = 0$. This is of course in agreement with the exact result of the transfer-matrix analysis, but does not provide any additional information.
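The recurrence relations (109) are easy to iterate numerically. The minimal Python sketch below (the names `rg_step` and `flow` are illustrative, not from the text) applies them repeatedly in zero field, $y = 1$, starting from the illustrative coupling $J/T = 0.5$. At $y = 1$, Eq. (109) reduces to $x' = 4x/(1+x)^2$, which has no fixed points strictly between 0 and 1, so the trajectory moves away from $x = 0$ toward $x = 1$, i.e. toward a vanishing effective coupling $J'/T$ of ever larger blocks. The same zero-field rule may be rewritten as $\tanh(J'/T) = \tanh^2(J/T)$, which gives a convenient check of the code.

```python
import math


def rg_step(x: float, y: float) -> tuple[float, float]:
    """One block-doubling step, Eq. (4.109), with x = exp(-4J/T), y = exp(-2h/T)."""
    x_new = x * (1 + y) ** 2 / ((x + y) * (1 + x * y))
    y_new = y * (x + y) / (1 + x * y)
    return x_new, y_new


def flow(x: float, y: float, steps: int) -> list[tuple[float, float]]:
    """Return the trajectory {x, y} -> {x', y'} -> ... of repeated doublings."""
    points = [(x, y)]
    for _ in range(steps):
        x, y = rg_step(x, y)
        points.append((x, y))
    return points


# Zero field (y = 1) and an illustrative starting coupling J/T = 0.5
trajectory = flow(math.exp(-4 * 0.5), 1.0, steps=4)
for x, y in trajectory:
    # effective coupling of the current blocks, J'/T = ln(1/x)/4
    print(f"x = {x:.6f}, y = {y:.1f}, J'/T = {math.log(1 / x) / 4:.6f}")
```

Starting from any other $0 < x < 1$ gives the same qualitative behavior, consistent with the conclusion $T_c = 0$ for the 1D system.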


[Figure: RG flow trajectories in the [x, y] plane, within the unit square 0 ≤ x, y ≤ 1]

Fig. 4.15. The RG flow diagram of the 1D Ising system (schematically).


Unfortunately, for higher dimensionalities, the renormalization-group approach rapidly becomes rather cumbersome and requires certain approximations, whose accuracy cannot be easily controlled. For the 2D Ising system, such approximations lead to the prediction $T_c \approx 2.55\,J$, i.e. to a substantial difference from the exact result (96).


4.6. Exercise problems 


4.1. Compare the third virial coefficient C(T) that follows from the van der Waals equation, with its value for the hardball model of particle interactions (whose calculation was the subject of Problem 3.28), and comment.


4.2. Calculate the entropy and the internal energy of the van der Waals gas, and discuss the 
results. 


4.3. Use two different approaches to calculate the so-called Joule-Thomson coefficient $(\partial E/\partial V)_T$ for the van der Waals gas, and the change of temperature of such a gas, with a temperature-independent $C_V$, at its fast expansion.


4.4. Calculate the difference $C_P - C_V$ for the van der Waals gas, and compare the result with that for an ideal classical gas.


4.5. Calculate the temperature dependence of the phase-equilibrium pressure $P_0(T)$ and the latent heat $\Lambda(T)$, for the van der Waals model, in the low-temperature limit $T \ll T_c$.

4.6. Perform the same tasks as in the previous problem in the opposite limit, in close vicinity of the critical point $T_c$.


4.7. Calculate the critical values $P_c$, $V_c$, and $T_c$ for the so-called Redlich-Kwong model of the real gas, with the following equation of state:52

$$P + \frac{a}{V(V + Nb)T^{1/2}} = \frac{NT}{V - Nb},$$

with constant parameters a and b.

Hint: Be prepared to solve a cubic equation with particular (numerical) coefficients.

4.8. Calculate the critical values $P_c$, $V_c$, and $T_c$ for the phenomenological Dieterici model, with the following equation of state:53

$$P = \frac{NT}{V - b}\exp\left\{-\frac{a}{NTV}\right\},$$

with constant parameters a and b. Compare the value of the dimensionless factor $P_cV_c/NT_c$ with those given by the van der Waals and Redlich-Kwong models.


4.9. In the crude sketch shown in Fig. 3b, the derivatives dP/dT of the phase transitions liquid-gas ("vaporization") and solid-gas ("sublimation"), at the triple point, are different, with

$$\left(\frac{dP_v}{dT}\right)_{T=T_t} < \left(\frac{dP_s}{dT}\right)_{T=T_t}.$$

Is this accidental? What relation between these derivatives can be obtained from thermodynamics?

4.10. Use the Clapeyron-Clausius formula (17) to calculate the latent heat $\Lambda$ of the Bose-Einstein condensation, and compare the result with that obtained in the solution of Problem 3.21.


4.11.

(i) Write the effective Hamiltonian for which the usual single-particle stationary Schrödinger equation coincides with the Gross-Pitaevskii equation (58).

(ii) Use this Gross-Pitaevskii Hamiltonian, with the trapping potential $U(\mathbf r) = m\omega^2 r^2/2$, to calculate the energy E of $N \gg 1$ trapped particles, assuming the trial solution $\psi \propto \exp\{-r^2/2r_0^2\}$, as a function of the parameter $r_0$.54


52 This equation of state, suggested in 1948, describes most real gases better than not only the original van der 
Waals model, but also other two-parameter alternatives, such as the Berthelot, modified-Berthelot, and Dieterici 
models, though some approximations with more fitting parameters (such as the Soave-Redlich-Kwong model) 
work even better. 

53 This model is currently less popular than the Redlich-Kwong one (also with two fitting parameters), whose 
analysis was the task of the previous problem. 

54 This task is essentially the first step of the variational method of quantum mechanics — see, e.g., QM Sec. 2.9. 
(iii) Explore the function $E(r_0)$ for positive and negative values of the constant b, and interpret the results.

(iv) For small $b < 0$, estimate the largest number N of particles that may form a metastable Bose-Einstein condensate.


4.12. Superconductivity may be suppressed by a sufficiently strong magnetic field. In the simplest case of a bulk, long cylindrical sample of a type-I superconductor, placed into an external magnetic field $\mathbf H_{\rm ext}$ parallel to its surface, this suppression takes a simple form of a simultaneous transition of the whole sample from the superconducting state to the "normal" (non-superconducting) state at a certain value $H_c(T)$ of the field's magnitude. This critical field gradually decreases with temperature from its maximum value $H_c(0)$ at $T \to 0$ to zero at the critical temperature $T_c$. Assuming that the function $H_c(T)$ is known, calculate the latent heat of this phase transition as a function of temperature, and spell out its values at $T \to 0$ and $T = T_c$.

Hint: In this context, "bulk sample" means a sample much larger than the intrinsic length scales of the superconductor (such as the London penetration depth $\delta_L$ and the coherence length $\xi$).55 For such bulk superconductors, magnetic properties of the superconducting phase may be well described just as the perfect diamagnetism, with $\mathbf B = 0$ inside it.


4.13. In some textbooks, the discussion of thermodynamics of superconductivity is started with displaying, as self-evident, the following formula:

$$F_n(T) - F_s(T) = \frac{\mu_0 H_c^2(T)}{2}V,$$

where $F_s$ and $F_n$ are the free energy values in the superconducting and non-superconducting ("normal") phases, and $H_c(T)$ is the critical value of the magnetic external field. Is this formula correct, and if not, what qualification is necessary to make it valid? Assume that all conditions of the simultaneous field-induced phase transition in the whole sample, spelled out in the previous problem, are satisfied.


4.14. In Sec. 4, we have discussed Weiss' molecular-field approach to the Ising model, in which the average $\langle s_j \rangle$ plays the role of the order parameter $\eta$. Use the results of that analysis to calculate the coefficients a and b in the corresponding Landau expansion (46) of the free energy. List the critical exponents $\alpha$ and $\beta$, defined by Eqs. (26) and (28), within this approach.


4.15. Consider a ring of N = 3 Ising "spins" ($s_k = \pm 1$), with similar ferromagnetic coupling J between all sites, in thermal equilibrium.

(i) Calculate the order parameter $\eta$ and the low-field susceptibility $\chi \equiv \partial\eta/\partial h|_{h=0}$.

(ii) Use the low-temperature limit of the result for $\chi$ to predict it for a ring with an arbitrary N, and verify your prediction by a direct calculation (in this limit).

(iii) Discuss the relation between the last result, in the limit $N \to \infty$, and Eq. (91).


55 A discussion of these parameters, as well as of the difference between the type-I and type-II superconductivity, may be found in EM Secs. 6.4-6.5. However, those details are not needed for the solution of this problem.
4.16. Calculate the average energy, entropy, and heat capacity of a three-site ring of Ising-type "spins" ($s_k = \pm 1$), with anti-ferromagnetic coupling (of magnitude J) between the sites, in thermal equilibrium at temperature T, with no external magnetic field. Find the asymptotic behavior of its heat capacity for low and high temperatures, and give an interpretation of the results.

4.17. Using the results discussed in Sec. 5, calculate the average energy, free energy, entropy, and heat capacity (all per spin) as functions of temperature T and external field h, for the infinite 1D Ising model. Sketch the temperature dependence of the heat capacity for various values of ratio h/J, and give a physical interpretation of the result.


4.18. Use the molecular-field theory to calculate the critical temperature and the low-field susceptibility of a d-dimensional cubic lattice of spins, described by the so-called classical Heisenberg model:56

$$E_m = -J\sum_{\{k,k'\}}\mathbf s_k\cdot\mathbf s_{k'} - \sum_k \mathbf h\cdot\mathbf s_k.$$

Here, in contrast to the (otherwise, very similar) Ising model (23), the spin of each site is modeled by a classical 3D vector $\mathbf s_k = \{s_{xk}, s_{yk}, s_{zk}\}$ of unit length: $s_k^2 = 1$.


56 This classical model is formally similar to the generalization of the genuine (quantum) Heisenberg model (21) 
to arbitrary spin s, and serves as its infinite-spin limit. 
Chapter 5. Fluctuations


This chapter discusses fluctuations of macroscopic variables, mostly at thermodynamic equilibrium. In 
particular, it describes the intimate connection between fluctuations and dissipation (damping) in 
dynamic systems weakly coupled to multi-particle environments, which culminates in the Einstein 
relation between the diffusion coefficient and mobility, the Nyquist formula, and its quantum- 
mechanical generalization — the fluctuation-dissipation theorem. An alternative approach to the same 
problem, based on the Smoluchowski and Fokker-Planck equations, is also discussed in brief. 


5.1. Characterization of fluctuations 


At the beginning of Chapter 2, we have discussed the notion of averaging, $\langle f \rangle$, of a variable f over a statistical ensemble, see Eqs. (2.7) and (2.10). Now, the fluctuation of the variable is defined simply as its deviation from such average:

$$\tilde f \equiv f - \langle f \rangle; \tag{5.1}$$


this deviation is, generally, also a random variable. The most important property of any fluctuation is 
that its average (over the same statistical ensemble) equals zero: 


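$$\langle \tilde f \rangle = \langle f - \langle f \rangle \rangle = \langle f \rangle - \langle \langle f \rangle \rangle = \langle f \rangle - \langle f \rangle = 0. \tag{5.2}$$

As a result, such an average cannot characterize fluctuations' intensity, and the simplest characteristic of the intensity is the variance (sometimes called "dispersion"):

$$\langle \tilde f^2 \rangle = \left\langle (f - \langle f \rangle)^2 \right\rangle. \tag{5.3}$$

The following simple property of the variance is frequently convenient for its calculation:

$$\langle \tilde f^2 \rangle = \left\langle (f - \langle f \rangle)^2 \right\rangle = \left\langle f^2 - 2f\langle f \rangle + \langle f \rangle^2 \right\rangle = \langle f^2 \rangle - 2\langle f \rangle^2 + \langle f \rangle^2, \tag{5.4a}$$

so that, finally:

$$\langle \tilde f^2 \rangle = \langle f^2 \rangle - \langle f \rangle^2. \tag{5.4b}$$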


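As the simplest example, consider a variable that takes only two values, $\pm 1$, with equal probabilities $W_j = 1/2$. For such a variable, the basic Eq. (2.7) yields

$$\langle f \rangle = \sum_j W_j f_j = \frac{1}{2}(+1) + \frac{1}{2}(-1) = 0, \qquad \langle f^2 \rangle = \sum_j W_j f_j^2 = \frac{1}{2}(+1)^2 + \frac{1}{2}(-1)^2 = 1 \neq 0, \tag{5.5}$$

so that $\langle \tilde f^2 \rangle = \langle f^2 \rangle - \langle f \rangle^2 = 1$.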
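A few lines of Python make the equivalence of the definition (3) and the shortcut (4) concrete. This is a minimal sketch; the helper name `mean_and_variance` is hypothetical, and the second distribution is an arbitrary illustration, not taken from the text.

```python
def mean_and_variance(values, probs):
    """Return <f>, the variance computed by Eq. (3), and the variance computed by Eq. (4)."""
    mean = sum(w * f for f, w in zip(values, probs))                          # Eq. (2.7)
    var_direct = sum(w * (f - mean) ** 2 for f, w in zip(values, probs))      # <(f - <f>)^2>
    var_shortcut = sum(w * f * f for f, w in zip(values, probs)) - mean ** 2  # <f^2> - <f>^2
    return mean, var_direct, var_shortcut


print(mean_and_variance([+1, -1], [0.5, 0.5]))       # two-valued example: (0.0, 1.0, 1.0)
print(mean_and_variance([0, 1, 2], [0.2, 0.5, 0.3]))  # both variances equal 0.49, up to rounding
```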


The square root of the variance,

$$\delta f \equiv \langle \tilde f^2 \rangle^{1/2}, \tag{5.6}$$

is called the root-mean-square (r.m.s.) fluctuation. An advantage of this measure is that it has the same dimensionality as the variable itself, so that the ratio $\delta f/\langle f \rangle$ is dimensionless, and may be used to characterize the relative intensity of fluctuations.


As has been mentioned in Chapter 1, all results of thermodynamics are valid only if the fluctuations of thermodynamic variables (internal energy E, entropy S, etc.) are relatively small.1 Let us make a simple estimate of the relative intensity of fluctuations for an example of a system of N independent, similar particles, and an extensive variable

$$\mathcal F \equiv \sum_{k=1}^{N} f_k, \tag{5.7}$$

where all single-particle functions $f_k$ are similar, besides that each of them depends on the state of only "its own" (kth) particle. The statistical average of such $\mathcal F$ is evidently

$$\langle \mathcal F \rangle = \sum_{k=1}^{N} \langle f_k \rangle = N\langle f \rangle, \tag{5.8}$$


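while its fluctuation variance is

$$\langle \tilde{\mathcal F}^2 \rangle \equiv \langle \tilde{\mathcal F}\tilde{\mathcal F} \rangle = \left\langle \sum_{k=1}^{N}\tilde f_k \sum_{k'=1}^{N}\tilde f_{k'} \right\rangle = \sum_{k,k'=1}^{N}\langle \tilde f_k \tilde f_{k'} \rangle. \tag{5.9}$$

Now we may use the fact that for two independent variables

$$\langle \tilde f_k \tilde f_{k'} \rangle = 0, \quad \text{for } k' \neq k; \tag{5.10}$$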


indeed, this relation may be considered as the mathematical definition of their independence. Hence, 
only the terms with k’ = k make substantial contributions to the sum (9): 


$$\langle \tilde{\mathcal F}^2 \rangle = \sum_{k,k'=1}^{N}\langle \tilde f_k^2 \rangle\,\delta_{k,k'} = N\langle \tilde f^2 \rangle. \tag{5.11}$$

Comparing Eqs. (8) and (11), we see that the relative intensity of fluctuations of the variable $\mathcal F$,

$$\frac{\delta \mathcal F}{\langle \mathcal F \rangle} = \frac{1}{N^{1/2}}\frac{\delta f}{\langle f \rangle}, \tag{5.12}$$

tends to zero as the system size grows ($N \to \infty$). It is this fact that justifies the thermodynamic approach to typical physical systems, with the number N of particles of the order of the Avogadro number $N_A \sim 10^{24}$. Nevertheless, in many situations even small fluctuations of variables are important, and in this chapter we will calculate their basic properties, starting with the variance.
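The $N^{-1/2}$ scaling (12) is easy to observe in a simple Monte Carlo experiment. In this minimal sketch, each single-particle variable $f_k$ is assumed, purely for illustration, to be uniformly distributed on $[0, 1)$, so that $\langle f \rangle = 1/2$ and $\delta f = 12^{-1/2}$, and Eq. (12) predicts $\delta\mathcal F/\langle \mathcal F \rangle = (3N)^{-1/2}$. The function name `relative_fluctuation` is hypothetical.

```python
import math
import random
import statistics


def relative_fluctuation(n_particles: int, n_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of dF/<F> for F = f_1 + ... + f_N, Eq. (7), with uniform f_k."""
    samples = [sum(rng.random() for _ in range(n_particles)) for _ in range(n_samples)]
    mean = statistics.fmean(samples)
    return statistics.pstdev(samples, mean) / mean


rng = random.Random(1)
for n in (1, 10, 100, 1000):
    estimate = relative_fluctuation(n, 2000, rng)
    # Eq. (12) with df/<f> = (1/12)**0.5 / (1/2) = 3**-0.5 for f_k uniform on [0, 1)
    prediction = 1 / math.sqrt(3 * n)
    print(f"N = {n:5d}: estimate {estimate:.4f}, Eq. (12) {prediction:.4f}")
```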


It should be comforting for the reader to notice that for some simple (but very important) cases, such calculation has already been done in our course. In particular, for any generalized coordinate q and generalized momentum p that give quadratic contributions of the type (2.46) to the system's


1 Let me remind the reader that up to this point, the averaging signs $\langle \ldots \rangle$ were dropped in most formulas, for the sake of notation simplicity. In this chapter, I have to restore these signs to avoid confusion. The only exception will be temperature, whose average, following (probably, bad :-) tradition, will be still called just T everywhere, besides the last part of Sec. 3, where temperature fluctuations are discussed explicitly.




Hamiltonian (as in a harmonic oscillator), we have derived the equipartition theorem (2.48), valid in the classical limit. Since the average values of these variables, in the thermodynamic equilibrium, equal zero, Eq. (6) immediately yields their r.m.s. fluctuations:

$$\delta p = (mT)^{1/2}, \qquad \delta q = \left(\frac{T}{\kappa}\right)^{1/2} = \left(\frac{T}{m\omega^2}\right)^{1/2}, \qquad \text{where } \omega \equiv \left(\frac{\kappa}{m}\right)^{1/2}. \tag{5.13}$$


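The generalization of these classical relations to the quantum-mechanical case ($T \sim \hbar\omega$) is provided by Eqs. (2.78) and (2.81):

$$\delta p = \left[\frac{\hbar m\omega}{2}\coth\frac{\hbar\omega}{2T}\right]^{1/2}, \qquad \delta q = \left[\frac{\hbar}{2m\omega}\coth\frac{\hbar\omega}{2T}\right]^{1/2}. \tag{5.14}$$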
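The crossover described by Eq. (14), between zero-point and thermal fluctuations, can be tabulated directly. A minimal sketch, assuming units with $\hbar = m = \omega = 1$ and $k_B = 1$ (so that T is measured in units of $\hbar\omega$); the function names are illustrative.

```python
import math


def delta_q_quantum(T: float, hbar: float = 1.0, m: float = 1.0, omega: float = 1.0) -> float:
    """Eq. (14): r.m.s. coordinate fluctuation of an oscillator at temperature T > 0."""
    coth = 1.0 / math.tanh(hbar * omega / (2.0 * T))
    return math.sqrt(hbar / (2.0 * m * omega) * coth)


def delta_q_classical(T: float, m: float = 1.0, omega: float = 1.0) -> float:
    """Eq. (13): the classical (equipartition) value (T / m omega^2)**0.5."""
    return math.sqrt(T / (m * omega ** 2))


for T in (0.01, 0.1, 1.0, 10.0, 100.0):  # T in units of hbar*omega
    print(f"T = {T:6.2f}: quantum {delta_q_quantum(T):.4f}, classical {delta_q_classical(T):.4f}")
```

At $T \ll \hbar\omega$ the quantum value saturates at the zero-point level $(\hbar/2m\omega)^{1/2} \approx 0.7071$ in these units, while at $T \gg \hbar\omega$ the two columns merge, because $\coth\xi \approx 1/\xi$ at $\xi \ll 1$.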


However, the intensity of fluctuations in other systems requires special calculations. Moreover, 
only a few cases allow for general, model-independent results. Let us review some of them. 


5.2. Energy and the number of particles 


First of all, note that fluctuations of macroscopic variables depend on particular conditions.2 For example, in a mechanically- and thermally-insulated system with a fixed number of particles, i.e. a member of a microcanonical ensemble, the internal energy does not fluctuate: $\delta E = 0$. However, if such a system is in thermal contact with the environment, i.e. is a member of a canonical ensemble (Fig. 2.6), the situation is different. Indeed, for such a system we may apply the general Eq. (2.7), with $W_m$ given by the Gibbs distribution (2.58)-(2.59), not only to E but also to $E^2$. As we already know from Sec. 2.4, the first average,


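$$\langle E \rangle = \sum_m W_m E_m, \qquad W_m = \frac{1}{Z}\exp\left\{-\frac{E_m}{T}\right\}, \qquad Z = \sum_m \exp\left\{-\frac{E_m}{T}\right\}, \tag{5.15}$$

yields Eq. (2.61b), which may be rewritten in the form

$$\langle E \rangle = \frac{1}{Z}\sum_m E_m \exp\{-\beta E_m\} \equiv \frac{1}{Z}\frac{\partial Z}{\partial(-\beta)}, \qquad \text{with } \beta \equiv \frac{1}{T}, \tag{5.16}$$

more convenient for our current purposes. Let us carry out a similar calculation for $E^2$:

$$\langle E^2 \rangle = \sum_m W_m E_m^2 = \frac{1}{Z}\sum_m E_m^2 \exp\{-\beta E_m\}. \tag{5.17}$$

It is straightforward to verify, by double differentiation, that the last expression may be rewritten in a form similar to Eq. (16):

$$\langle E^2 \rangle = \frac{1}{Z}\frac{\partial^2 Z}{\partial(-\beta)^2}. \tag{5.18}$$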


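Now it is easy to use Eqs. (4) to calculate the variance of energy fluctuations:

$$\langle \tilde E^2 \rangle = \langle E^2 \rangle - \langle E \rangle^2 = \frac{1}{Z}\frac{\partial^2 Z}{\partial(-\beta)^2} - \left(\frac{1}{Z}\frac{\partial Z}{\partial(-\beta)}\right)^2 \equiv \frac{\partial}{\partial(-\beta)}\left(\frac{1}{Z}\frac{\partial Z}{\partial(-\beta)}\right) = \frac{\partial \langle E \rangle}{\partial(-\beta)}. \tag{5.19}$$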


2 Unfortunately, even in some popular textbooks, certain formulas pertaining to fluctuations are either incorrect or 
given without specifying the conditions of their applicability, so that the reader’s caution is advised. 
Since Eqs. (15)-(19) are valid only if the system's volume V is fixed (because its change may affect the energy spectrum $E_m$), it is customary to rewrite this important result as follows:

$$\langle \tilde E^2 \rangle = \frac{\partial \langle E \rangle}{\partial(-1/T)} = T^2\left(\frac{\partial \langle E \rangle}{\partial T}\right)_V \equiv C_V T^2. \tag{5.20}$$

This is a remarkably simple, fundamental result. As a sanity check, for a system of N similar, independent particles, $\langle E \rangle$ and hence $C_V$ are proportional to N, so that $\delta E \propto N^{1/2}$ and $\delta E/\langle E \rangle \propto N^{-1/2}$, in agreement with Eq. (12). Let me emphasize that the classically-looking Eq. (20) is based on the general Gibbs distribution, and hence is valid for any system (either classical or quantum) in thermal equilibrium.
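Eq. (20) may be checked numerically for any finite spectrum. The sketch below is a minimal illustration, assuming $k_B = 1$ and a hypothetical four-level spectrum (a doubly degenerate level is represented by repeating its energy). It computes the variance $\langle E^2 \rangle - \langle E \rangle^2$ directly from the Gibbs distribution and compares it with $T^2 C_V$, where $C_V$ is obtained by numerical differentiation of $\langle E \rangle$ at a fixed spectrum.

```python
import math


def gibbs_averages(levels: list[float], T: float) -> tuple[float, float]:
    """Return <E> and <E^2> over the Gibbs distribution, Eqs. (15) and (17), with k_B = 1."""
    e0 = min(levels)  # shifting all E_m by a constant leaves W_m unchanged
    weights = [math.exp(-(e - e0) / T) for e in levels]
    Z = sum(weights)
    mean = sum(w * e for w, e in zip(weights, levels)) / Z
    mean_sq = sum(w * e * e for w, e in zip(weights, levels)) / Z
    return mean, mean_sq


def heat_capacity(levels: list[float], T: float, dT: float = 1e-5) -> float:
    """C_V = d<E>/dT at a fixed spectrum, estimated by a central finite difference."""
    return (gibbs_averages(levels, T + dT)[0] - gibbs_averages(levels, T - dT)[0]) / (2 * dT)


levels = [0.0, 1.0, 1.0, 2.5]  # hypothetical spectrum; the repeated level is a degenerate one
T = 0.7
mean, mean_sq = gibbs_averages(levels, T)
print(mean_sq - mean ** 2, T ** 2 * heat_capacity(levels, T))  # Eq. (20): the two numbers agree
```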


Some corollaries of this result will be discussed in the next section, and now let us carry out a very similar calculation for a system whose number N of particles is not fixed, because they may go to, and come from, its environment at will. If the chemical potential $\mu$ of the environment and its temperature T are fixed, i.e. we are dealing with the grand canonical ensemble (Fig. 2.13), we may use the grand canonical distribution (2.106)-(2.107):

$$W_{m,N} = \frac{1}{Z_G}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}, \qquad Z_G = \sum_{N,m}\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\}. \tag{5.21}$$

Acting exactly as we did above for the internal energy, we get

$$\langle N \rangle = \frac{1}{Z_G}\sum_{m,N} N\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\} = \frac{T}{Z_G}\frac{\partial Z_G}{\partial \mu}, \tag{5.22}$$

$$\langle N^2 \rangle = \frac{1}{Z_G}\sum_{m,N} N^2\exp\left\{\frac{\mu N - E_{m,N}}{T}\right\} = \frac{T^2}{Z_G}\frac{\partial^2 Z_G}{\partial \mu^2}, \tag{5.23}$$


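so that the particle number's variance is

$$\langle \tilde N^2 \rangle = \langle N^2 \rangle - \langle N \rangle^2 = \frac{T^2}{Z_G}\frac{\partial^2 Z_G}{\partial \mu^2} - \frac{T^2}{Z_G^2}\left(\frac{\partial Z_G}{\partial \mu}\right)^2 = T\frac{\partial}{\partial \mu}\left(\frac{T}{Z_G}\frac{\partial Z_G}{\partial \mu}\right) = T\frac{\partial \langle N \rangle}{\partial \mu}, \tag{5.24}$$

in full analogy with Eq. (19).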


In particular, for an ideal classical gas, we may combine the last result with Eq. (3.32b). (As was already emphasized in Sec. 3.2, though that result has been obtained for the canonical ensemble, in which the number of particles N is fixed, at $N \gg 1$ the fluctuations of N in the grand canonical ensemble should be relatively small, so that the same relation should be valid for the average $\langle N \rangle$ in that ensemble.) Easily solving Eq. (3.32b) for $\langle N \rangle$, we get

$$\langle N \rangle = \text{const} \times \exp\left\{\frac{\mu}{T}\right\}, \tag{5.25}$$

where "const" means a factor constant at the partial differentiation of $\langle N \rangle$ over $\mu$, required by Eq. (24). Performing the differentiation and then using Eq. (25) again,

$$\frac{\partial \langle N \rangle}{\partial \mu} = \text{const} \times \frac{1}{T}\exp\left\{\frac{\mu}{T}\right\} = \frac{\langle N \rangle}{T}, \tag{5.26}$$

we get from Eq. (24) a very simple result:

$$\langle \tilde N^2 \rangle = \langle N \rangle, \qquad \text{i.e.} \quad \delta N = \langle N \rangle^{1/2}. \tag{5.27}$$


This relation is so important that I will also show how it may be derived differently. As a by-product of this new derivation, we will prove that this result is valid for systems with an arbitrary (say, small) N, and also get more detailed information about the statistics of fluctuations of that number. Let us consider an ideal classical gas of $N_0$ particles in a volume $V_0$, and calculate the probability $W_N$ to have exactly $N \le N_0$ of these particles in its part of volume $V \le V_0$, see Fig. 1.


Fig. 5.1. Deriving the binomial 
and Poisson distributions. 


For one particle such probability is $W = V/V_0 = \langle N \rangle/N_0 \le 1$, while the probability to have that particle in the remaining part of the volume is $W' = 1 - W = 1 - \langle N \rangle/N_0$. If all particles were distinguishable, the probability of having $N \le N_0$ specific particles in volume V and $(N_0 - N)$ specific particles in volume $(V_0 - V)$ would be $W^N W'^{(N_0 - N)}$. However, if we do not want to distinguish the particles, we should multiply this probability by the number of possible particle combinations keeping the numbers N and $N_0$ constant, i.e. by the binomial coefficient $N_0!/N!(N_0 - N)!$.3 As the result, the required probability is

$$W_N = W^N W'^{(N_0 - N)}\frac{N_0!}{N!(N_0 - N)!} = \left(\frac{\langle N \rangle}{N_0}\right)^N\left(1 - \frac{\langle N \rangle}{N_0}\right)^{N_0 - N}\frac{N_0!}{N!(N_0 - N)!}. \tag{5.28}$$


This is the so-called binomial probability distribution, valid for any $\langle N \rangle$ and $N_0$.4


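Still keeping $\langle N \rangle$ arbitrary, we can simplify the binomial distribution by assuming that the whole volume $V_0$, and hence $N_0$, are very large:

$$N_0 \gg N, \tag{5.29}$$

where N means all values of interest, including $\langle N \rangle$. Indeed, in this limit we can neglect N in comparison with $N_0$ in the second exponent of Eq. (28), and also approximate the fraction $N_0!/(N_0 - N)!$, i.e. the product of N terms, $(N_0 - N + 1)(N_0 - N + 2)\ldots(N_0 - 1)N_0$, by just $N_0^N$. As a result, we get

$$W_N \approx \left(\frac{\langle N \rangle}{N_0}\right)^N\left(1 - \frac{\langle N \rangle}{N_0}\right)^{N_0}\frac{N_0^N}{N!} = \frac{\langle N \rangle^N}{N!}(1 - W)^{N_0} = \frac{\langle N \rangle^N}{N!}\left[(1 - W)^{1/W}\right]^{\langle N \rangle}, \tag{5.30}$$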


3 See, e.g., MA Eq. (2.2). 
4 It was derived by Jacob Bernoulli (1655-1705). 
where, as before, $W = \langle N \rangle/N_0$. In the limit (29), $W \to 0$, so that the factor inside the square brackets tends to $1/e$, the reciprocal of the natural logarithm base.5 Thus, we get an expression independent of $N_0$:

$$W_N = \frac{\langle N \rangle^N}{N!}e^{-\langle N \rangle}. \tag{5.31}$$


This is the much-celebrated Poisson distribution6 which describes a very broad family of random phenomena. Figure 2 shows this distribution for several values of $\langle N \rangle$, which, in contrast to N, are not necessarily integer.
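The convergence of the binomial distribution (28) to the Poisson distribution (31) at $N_0 \to \infty$ may be seen by direct evaluation. In this minimal sketch (the function names are illustrative), $\langle N \rangle = 2.5$ is kept fixed while $N_0$ grows, and the largest difference between the two distributions over $N = 0, 1, \ldots, 10$ is printed; it shrinks as $N_0$ increases.

```python
import math


def binomial_pmf(n: int, n0: int, mean: float) -> float:
    """Eq. (28): probability of exactly n of n0 particles in V, with W = <N>/N0."""
    w = mean / n0
    return math.comb(n0, n) * w ** n * (1 - w) ** (n0 - n)


def poisson_pmf(n: int, mean: float) -> float:
    """Eq. (31): the N0 -> infinity limit of Eq. (28)."""
    return mean ** n * math.exp(-mean) / math.factorial(n)


mean = 2.5  # <N> need not be an integer
max_diffs = []
for n0 in (10, 100, 10_000):
    diff = max(abs(binomial_pmf(n, n0, mean) - poisson_pmf(n, mean)) for n in range(11))
    max_diffs.append(diff)
    print(f"N0 = {n0:6d}: max |binomial - Poisson| = {diff:.2e}")
```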




Fig. 5.2. The Poisson distribution for 
several values of (N). In contrast to 
that average, the argument N may take 
only integer values, so that the lines in 
these plots are only guides for the eye. 


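In the limit of very small $\langle N \rangle$, the function $W_N(N)$ is close to an exponent, $W_N \approx W^N \propto \langle N \rangle^N$, while in the opposite limit, $\langle N \rangle \gg 1$, it rapidly approaches the Gaussian (or "normal") distribution7

$$W_N = \frac{1}{(2\pi)^{1/2}\delta N}\exp\left\{-\frac{(N - \langle N \rangle)^2}{2(\delta N)^2}\right\}. \tag{5.32}$$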


(Note that the Gaussian distribution is also valid if both N and $N_0$ are large, regardless of the relation between them, see Fig. 3.)

[Figure: Binomial distribution, Eq. (28), goes over into the Poisson distribution, Eq. (31), at ⟨N⟩ << N_0, and into the Gaussian distribution, Eq. (32), at 1 << ⟨N⟩, N_0; the Poisson distribution goes over into the Gaussian one at 1 << ⟨N⟩]

Fig. 5.3. The hierarchy of three major probability distributions.
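Similarly, the approach of the Poisson distribution to the Gaussian form (32), with $\delta N = \langle N \rangle^{1/2}$, may be checked numerically. The sketch below uses illustrative function names, and computes the Poisson probabilities via `math.lgamma` to avoid overflow at large $\langle N \rangle$; it prints the largest absolute difference between Eqs. (31) and (32) for several values of $\langle N \rangle$.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """Eq. (31), evaluated via logarithms to avoid overflow at large <N>."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def gaussian_pmf(n: int, mean: float) -> float:
    """Eq. (32), with the Poisson variance (dN)^2 = <N> of Eq. (27)."""
    dn = math.sqrt(mean)
    return math.exp(-((n - mean) ** 2) / (2 * dn ** 2)) / (math.sqrt(2 * math.pi) * dn)


max_diffs = []
for mean in (1.0, 10.0, 100.0, 1000.0):
    n_max = int(mean + 10 * math.sqrt(mean))
    diff = max(abs(poisson_pmf(n, mean) - gaussian_pmf(n, mean)) for n in range(n_max + 1))
    max_diffs.append(diff)
    print(f"<N> = {mean:7.1f}: max |Poisson - Gaussian| = {diff:.2e}")
```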


5 Indeed, this is just the most popular definition of that major mathematical constant — see, e.g., MA Eq. (1.2a) 
with n =—1/W. 

6 Named after the same Siméon Denis Poisson (1781-1840) who is also responsible for other mathematical tools 
and results used in this series, including the Poisson equation — see Sec. 6.4 below. 

7 Named after Carl Friedrich Gauss (1777-1855), though Pierre-Simon Laplace (1749-1827) is credited for substantial contributions to its development.
A major property of the Poisson (and hence of the Gaussian) distribution is that it has the same variance as given by Eq. (27):

$$\langle \tilde N^2 \rangle \equiv \left\langle (N - \langle N \rangle)^2 \right\rangle = \langle N \rangle. \tag{5.33}$$

(This is not true for the general binomial distribution.) For our current purposes, this means that for the ideal classical gas, Eq. (27) is valid for any number of particles.
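Eq. (33), and the remark that it fails for the general binomial distribution, can be verified by computing the first two moments explicitly. In this sketch (the illustrative values are $\langle N \rangle = 3.2$ and $N_0 = 8$, and the helper name is hypothetical), the Poisson variance equals $\langle N \rangle$, while the binomial variance equals $N_0W(1 - W) = \langle N \rangle(1 - \langle N \rangle/N_0)$, which approaches $\langle N \rangle$ only at $N_0 \to \infty$.

```python
import math


def mean_and_variance(pmf: list[float]) -> tuple[float, float]:
    """Mean and variance <N^2> - <N>^2 of a distribution given as W_N for N = 0, 1, 2, ..."""
    mean = sum(n * w for n, w in enumerate(pmf))
    mean_sq = sum(n * n * w for n, w in enumerate(pmf))
    return mean, mean_sq - mean ** 2


avg = 3.2
# Eq. (31); the tail beyond N = 59 is negligible for this <N>
poisson = [avg ** n * math.exp(-avg) / math.factorial(n) for n in range(60)]
n0 = 8
w = avg / n0
binomial = [math.comb(n0, n) * w ** n * (1 - w) ** (n0 - n) for n in range(n0 + 1)]  # Eq. (28)

print(mean_and_variance(poisson))   # variance equals <N>, Eq. (33)
print(mean_and_variance(binomial))  # variance is <N>(1 - <N>/N0) = 1.92, not <N>
```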


5.3. Volume and temperature 


What are the r.m.s. fluctuations of other thermodynamic variables, like V, T, etc.? Again, the answer depends on specific conditions. For example, if the volume V occupied by a gas is externally fixed (say, by rigid walls), it evidently does not fluctuate at all: $\delta V = 0$. On the other hand, the volume may fluctuate in the situation when the average pressure is fixed, see, e.g., Fig. 1.5. A formal calculation of these fluctuations, using the approach applied in the last section, is complicated by the fact that it is physically impracticable to fix its conjugate variable, P, i.e. suppress its fluctuations. For example, the force $\mathcal F(t)$ exerted by an ideal classical gas on a container's wall (whose measure the pressure is) is the result of individual, independent hits of the wall by particles (Fig. 4), with the time scale $\tau_c \sim r_B/\langle v^2 \rangle^{1/2} \sim r_B/(T/m)^{1/2} \sim 10^{-13}$ s, so that its frequency spectrum extends to very high frequencies, virtually impossible to control.


Fig. 5.4. The force exerted by gas 
particles on a container’s wall, as a 
function of time (schematically). 


However, we can use the following trick, very typical for the theory of fluctuations. It is almost evident that the r.m.s. fluctuations of the gas volume are independent of the shape of the container. Let us consider a particular situation similar to that shown in Fig. 1.5, with the container of a cylindrical shape, with the base area A.8 Then the coordinate of the piston is just $q = V/A$, while the average force exerted by the gas on the piston is $\langle \mathcal F \rangle = \langle P \rangle A$, see Fig. 5. Now if the piston is sufficiently massive, its free oscillation frequency $\omega$ near the equilibrium position is small enough to satisfy the following three conditions.


First, besides balancing the average force $\langle \mathcal F \rangle$ and thus sustaining the average pressure $\langle P \rangle = \langle \mathcal F \rangle/A$ of the gas, the interaction between the heavy piston and the relatively light particles of the gas is weak, because of a relatively short duration of the particle hits (Fig. 4). As a result, the full energy of the system may be represented as a sum of those of the particles and the piston, with a quadratic contribution to the piston's potential energy by small deviations from the equilibrium:


8 As a math reminder, the term "cylinder" does not necessarily mean the "circular cylinder"; the shape of its base may be arbitrary; it just should not change with height.
$$U_p = \frac{\kappa}{2}\tilde q^2, \qquad \text{where } \tilde q \equiv q - \langle q \rangle = \frac{\tilde V}{A}, \tag{5.34}$$

and $\kappa$ is the effective spring constant arising from the finite compressibility of the gas.


Fig. 5.5. Deriving Eq. (37). 


Second, at $\omega \to 0$, this spring constant may be calculated just as for constant variations of the volume, with the gas remaining in quasi-equilibrium at all times:

$$\kappa = -\frac{\partial \langle \mathcal F \rangle}{\partial q} = A^2\left(-\frac{\partial \langle P \rangle}{\partial \langle V \rangle}\right). \tag{5.35}$$

This partial derivative9 should be calculated at whatever the given thermal conditions are, e.g., with S = const for adiabatic conditions (i.e., a thermally insulated gas), or with T = const for isothermal conditions (including a good thermal contact between the gas and a heat bath), etc. With that constant denoted as X, Eqs. (34)-(35) give


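$$U_p = -\frac{1}{2}A^2\left(\frac{\partial \langle P \rangle}{\partial \langle V \rangle}\right)_X\left(\frac{\tilde V}{A}\right)^2 = -\frac{1}{2}\left(\frac{\partial \langle P \rangle}{\partial \langle V \rangle}\right)_X\tilde V^2. \tag{5.36}$$

Finally, assuming that $\omega = (\kappa/M)^{1/2}$ is sufficiently small (namely, $\hbar\omega \ll T$) because of a sufficiently large piston mass M, we may apply, to the piston's fluctuations, the classical equipartition theorem: $\langle U_p \rangle = T/2$, giving10

$$\langle \tilde V^2 \rangle_X = -T\left(\frac{\partial \langle V \rangle}{\partial \langle P \rangle}\right)_X. \tag{5.37a}$$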


Since this result is valid for any A and @, it should not depend on the system’s geometry and 
piston’s mass, provided that it is large in comparison with the effective mass of a single system 
component (say, a gas molecule) — the condition that is naturally fulfilled in most experiments. For the 


9 As was already discussed in Sec. 4.1 in the context of the van der Waals equation, for the mechanical stability of a gas (or liquid), the derivative $\partial P/\partial V$ has to be negative, so that $\kappa$ is positive.
10 One may meet statements that a similar formula,

$$\langle \tilde P^2 \rangle_X = -T\left(\frac{\partial \langle P \rangle}{\partial \langle V \rangle}\right)_X, \qquad \text{(WRONG!)}$$

is valid for pressure fluctuations. However, such a statement does not take into account a different physical nature of pressure (Fig. 4), with its very broad frequency spectrum. This issue will be discussed later in this chapter.
particular case of fluctuations at constant temperature (X = T),11 we may use the definition (3.58) of the isothermal bulk compressibility $K_T$ of the gas to rewrite Eq. (37a) as

$$\frac{\langle \tilde V^2 \rangle_T}{\langle V \rangle^2} = \frac{T}{\langle V \rangle K_T}. \tag{5.37b}$$


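For an ideal classical gas of N particles, with the equation of state $\langle V \rangle = NT/\langle P \rangle$, it is easier to use directly Eq. (37a), again with X = T, to get

$$\langle \tilde V^2 \rangle_T = -T\left(-\frac{NT}{\langle P \rangle^2}\right) = \frac{\langle V \rangle^2}{N}, \qquad \text{i.e.} \quad \frac{\delta V_T}{\langle V \rangle} = \frac{1}{N^{1/2}}, \tag{5.38}$$

in full agreement with the general trend given by Eq. (12).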
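Eq. (37a) needs only the slope of the equation of state along the relevant process X. The sketch below, assuming $k_B = 1$ and hypothetical values of N, T, and $\langle V \rangle$, evaluates that slope by a central finite difference for the isothermal ideal-gas equation of state and recovers Eq. (38), $\delta V_T/\langle V \rangle = N^{-1/2}$. The function names are illustrative; any other callable `pressure(V)` along the chosen process could be passed in the same way.

```python
import math


def volume_variance(pressure, V: float, T: float, rel_step: float = 1e-5) -> float:
    """Eq. (37a): <V~^2>_X = -T / (dP/dV)_X, with the derivative taken by central differences.

    `pressure` is any callable P(V) along the chosen process X (here, an isotherm).
    """
    dV = rel_step * V
    dP_dV = (pressure(V + dV) - pressure(V - dV)) / (2 * dV)
    return -T / dP_dV


N, T = 1.0e4, 2.0  # hypothetical particle number and temperature (k_B = 1)


def ideal_gas_isotherm(V: float) -> float:
    """Equation of state P = NT/V of the ideal classical gas at fixed T."""
    return N * T / V


V = 5.0e3
rel = math.sqrt(volume_variance(ideal_gas_isotherm, V, T)) / V
print(rel, 1 / math.sqrt(N))  # Eq. (38): both are close to 0.01
```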


Now let us proceed to fluctuations of temperature, for simplicity focusing on the case V = const. Let us again assume that the system we are considering is weakly coupled to a heat bath of temperature $T_0$, in the sense that the time $\tau$ of temperature equilibration between the two is much larger than the time of internal equilibration, called thermalization. Then we may assume that, on the former time scale, T changes virtually simultaneously in the whole system, and consider it a function of time alone:

$$T = \langle T \rangle + \tilde T(t). \tag{5.39}$$

Moreover, due to the (relatively) large $\tau$, we may use the stationary relation between small fluctuations of temperature and the internal energy of the system:

$$\tilde T(t) = \frac{\tilde E(t)}{C_V}, \qquad \text{so that} \quad \delta T = \frac{\delta E}{C_V}. \tag{5.40}$$

With those assumptions, Eq. (20) immediately yields the famous expression for the so-called thermodynamic fluctuations of temperature:

$$\delta T = \frac{\delta E}{C_V} = \frac{\langle T \rangle}{C_V^{1/2}}. \tag{5.41}$$


The most straightforward application of this result is to analyses of so-called bolometers, broadband detectors of electromagnetic radiation in microwave and infrared frequency bands. (In particular, they are used for measurements of the CMB radiation, which was discussed in Sec. 2.6.) In such a detector (Fig. 6), the incoming radiation is focused on a small sensor (e.g., either a small piece of a germanium crystal or a superconductor thin film at temperature $T \approx T_c$, etc.), which is well isolated thermally from the environment. As a result, the absorption of even a small radiation power $\mathcal P$ leads to a noticeable change $\Delta T$ of the sensor's average temperature $\langle T \rangle$ and hence of its electric resistance R, which is probed by low-noise external electronics.12 If the power does not change in time too fast, $\Delta T$ is a certain function of $\mathcal P$, turning to 0 at $\mathcal P = 0$. Hence, if $\Delta T$ is much lower than the environment temperature $T_0$, we may keep only the main, linear term in its Taylor expansion in $\mathcal P$:


11 In this case, we may also use the second of Eqs. (1.39) to rewrite Eq. (37) via the second derivative $(\partial^2 G/\partial P^2)_T$.
12 Besides low internal electric noise, a good sensor should have a sufficiently large temperature responsivity dR/dT, making the noise contribution by the readout electronics insignificant, see below.
$$\Delta T \equiv \langle T \rangle - T_0 = \frac{\mathcal P}{\mathcal G}, \tag{5.42}$$

where the coefficient $\mathcal G \equiv \partial\mathcal P/\partial T$ is called the thermal conductance of the (perhaps small but unavoidable) thermal coupling between the sensor and the heat bath, see Fig. 6.


Fig. 5.6. The conceptual scheme of a bolometer. 




The power may be detected if the electric signal from the sensor, which results from the change $\Delta T$, is not drowned in spontaneous fluctuations. In practical systems, these fluctuations are contributed by several sources, including electronic amplifiers. However, in modern systems, these "technical" contributions to noise are successfully suppressed,¹³ and the dominating noise source is the fundamental sensor temperature fluctuations described by Eq. (41). In this case, the so-called noise-equivalent power ("NEP"), defined as the level of $\mathcal P$ that produces a signal equal to the r.m.s. value of noise, may be calculated by equating the expressions (41) (with $\langle T\rangle = T_0$) and (42):
$$ \text{NEP} \equiv \mathcal P\big|_{\Delta T = \delta T} = \frac{\mathcal G T_0}{C_V^{1/2}}. \qquad (5.43) $$
This expression shows that to decrease the NEP, i.e. to improve the detector's sensitivity, both the environment temperature $T_0$ and the thermal conductance $\mathcal G$ should be reduced. In modern radiation receivers, their typical values are of the order of 0.1 K and $10^{-10}$ W/K, respectively.


On the other hand, Eq. (43) implies that to increase the bolometer's sensitivity, i.e. to reduce the NEP, the heat capacity $C_V$ of the sensor, and hence its mass, should be increased. This conclusion is valid only to a certain extent, because for technical reasons (parameter drifts and the so-called 1/f noise of the sensor and external electronics), the incoming power has to be modulated with as high a frequency $\omega$ as technically possible (in practical receivers, the cyclic frequency $\nu = \omega/2\pi$ of the modulation is between 10 and 1,000 Hz), so that the electrical signal may be picked up from the sensor at that frequency. As a result, $C_V$ may be increased only until the thermal time constant of the sensor,


$$ \tau = \frac{C_V}{\mathcal G}, \qquad (5.44) $$


becomes close to $1/\omega$, because at $\omega\tau \gg 1$ the useful signal drops faster than noise. So, the lowest (i.e. the best) values of the NEP,


13 An important modern trend in this progress [see, e.g., P. Day et al., Nature 425, 817 (2003)] is the replacement of the resistive temperature sensors $R(T)$ with thin and narrow superconducting strips with temperature-sensitive kinetic inductance $L_k(T)$ (see the model solution of EM Problem 6.19). Such inductive sensors have zero dc resistance, and hence vanishing Johnson-Nyquist noise at typical signal pickup frequencies of a few kHz; see Eq. (81) and its discussion below.
$$ (\text{NEP})_{\min} = \alpha\, \mathcal G^{1/2} T_0\, \nu^{1/2}, \qquad \text{with } \alpha \sim 1, \qquad (5.45) $$


are reached at $\nu\tau \sim 1$. (The exact values of the optimal product $\omega\tau$ and of the numerical constant $\alpha \sim 1$ in Eq. (45) depend on the exact law of the power modulation in time and on the readout signal processing procedure.) With the parameters cited above, this estimate yields $(\text{NEP})_{\min}/\nu^{1/2} \sim 3\times 10^{-17}$ W/Hz$^{1/2}$, a very low power indeed.


However, perhaps counter-intuitively, the power modulation allows bolometric (and other broadband) receivers to register radiation with power much lower than this NEP! Indeed, picking up the sensor signal at the modulation frequency $\omega$, we can use the subsequent electronics stages to filter out all the noise except its components within a very narrow band, of width $\Delta\nu \ll \nu$, around the modulation frequency (Fig. 7). This is the idea of a microwave radiometer,¹⁴ currently used in all sensitive broadband receivers of radiation.


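Fig. 5.7. The basic idea of the Dicke radiometer. (Sketch: the noise spectral density versus frequency, with the input power modulation at frequency $\nu$ and a narrow pick-up band $\Delta\nu \ll \nu$ around it, which is passed to the output.)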


In order to analyze this opportunity, we need to develop theoretical tools for a quantitative description of the spectral distribution of fluctuations. Another motivation for such a description is the need to analyze variables dominated by fast (high-frequency) components, such as pressure (please have one more look at Fig. 4). Finally, during such an analysis, we will run into the fundamental relation between fluctuations and dissipation, which is one of the main results of statistical physics as a whole.


5.4. Fluctuations as functions of time 


In the previous sections, the averaging ⟨...⟩ of any function was assumed to be over an appropriate statistical ensemble of many similar systems. However, as was discussed in Sec. 2.1, most physical systems of interest are ergodic. If such a system is also stationary, i.e. the statistical averages of its variables do not change with time, the averaging may also be understood as that over a sufficiently long time interval. In this case, we may think about fluctuations of any variable $f$ as a random process taking place in just one system, but developing in time: $\tilde f = \tilde f(t)$.


There are two mathematically equivalent approaches to the description of such random functions of time, called the time-domain picture and the frequency-domain picture, their relative convenience depending on the particular problem to be solved. In the time domain, we need to characterize a random fluctuation $\tilde f(t)$ by some deterministic function of time. Evidently, the average $\langle\tilde f(t)\rangle$ cannot be used for this purpose, because it equals zero (see Eq. (2)). Of course, the variance (3) does not equal zero, but if the system is stationary, that average cannot depend on time either. Because of that, let us consider the following average:

14 It was pioneered in the 1940s by Robert Henry Dicke, so that the device is frequently called the Dicke radiometer. Note that the optimal strategy of using similar devices for time- and energy-resolved detection of single high-energy photons is different, although it, too, is essentially based on Eq. (41). For a recent brief review of such detectors see, e.g., K. Morgan, Phys. Today 71, 29 (Aug. 2018), and references therein.


$$ \langle\tilde f(t)\,\tilde f(t')\rangle. \qquad (5.46) $$


Generally, this is a function of two arguments. However, in a stationary system, an average like (46) may depend only on the difference,
$$ \tau \equiv t' - t, \qquad (5.47) $$


between the two observation times. In this case, the average (46) is called the correlation function of the 
variable f: 


$$ K_f(\tau) \equiv \langle\tilde f(t)\,\tilde f(t+\tau)\rangle. \qquad (5.48) $$


Again, here the averaging may be understood as that either over a statistical ensemble of macroscopically similar systems or over a sufficiently long interval of the time argument $t$, with the argument $\tau$ kept constant. The correlation function's name¹⁵ captures the idea of this notion very well: $K_f(\tau)$ characterizes the mutual relation between the fluctuations of the variable $f$ at two times separated by the given interval $\tau$. Let us list the basic properties of this function.¹⁶


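First of all, $K_f(\tau)$ has to be an even function of the time delay $\tau$. Indeed, we may write
$$ K_f(-\tau) = \langle\tilde f(t)\tilde f(t-\tau)\rangle = \langle\tilde f(t-\tau)\tilde f(t)\rangle = \langle\tilde f(t')\tilde f(t'+\tau)\rangle, \qquad (5.49) $$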


with $t' \equiv t - \tau$. For stationary processes, this average cannot depend on the common shift of the two observation times, so that the averages (48) and (49) have to be equal:


$$ K_f(-\tau) = K_f(\tau). \qquad (5.50) $$


Second, at $\tau \to 0$ the correlation function tends to the variance:


$$ K_f(0) = \langle\tilde f(t)\tilde f(t)\rangle = \langle\tilde f^2\rangle \geq 0. \qquad (5.51) $$


In the opposite limit, when $\tau$ is much larger than a certain characteristic correlation time $\tau_c$ of the system,¹⁷ the correlation function has to tend to zero, because fluctuations separated by such a time interval are virtually independent (uncorrelated). As a result, the correlation function typically looks like one of the plots sketched in Fig. 8.


15 Another term, the autocorrelation function, is sometimes used for the average (48) to distinguish it from the mutual correlation function, $\langle\tilde f_1(t)\tilde f_2(t+\tau)\rangle$, of two different stationary processes.

16 Please notice that this correlation function is the direct temporal analog of the spatial correlation function 
briefly discussed in Sec. 4.2 — see Eq. (4.30). 

17 Note that the correlation time $\tau_c$ is the direct temporal analog of the correlation radius $r_c$ that was discussed in Sec. 4.2 (see the same Eq. (4.30)).
Fig. 5.8. The correlation function $K_f(\tau)$ of fluctuations: two typical examples.


Note that on a time scale much longer than $\tau_c$, any physically realistic correlation function may be well approximated by a delta function of $\tau$. (For example, for a process which is a sum of independent, very short pulses, e.g., the gas pressure force exerted on the container wall (Fig. 4), such an approximation is legitimate on time scales much longer than the single pulse duration, e.g., the time of a particle's interaction with the wall at the impact.)


In the reciprocal, frequency domain, the same process $\tilde f(t)$ is represented as a Fourier integral,¹⁸


$$ \tilde f(t) = \int_{-\infty}^{+\infty} f_\omega e^{-i\omega t}\, d\omega, \qquad (5.52) $$
with the reciprocal transform being 


$$ f_\omega = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\tilde f(t)\, e^{i\omega t}\, dt. \qquad (5.53) $$


If the function $\tilde f(t)$ is random (as it is in the case of fluctuations), with zero average, its Fourier transform $f_\omega$ is also a random function (now of frequency), also with a vanishing statistical average. Indeed, now thinking of the operation ⟨...⟩ as ensemble averaging, we may write


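$$ \langle f_\omega\rangle = \left\langle \frac{1}{2\pi}\int_{-\infty}^{+\infty}\tilde f(t)\, e^{i\omega t}dt \right\rangle = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\langle\tilde f(t)\rangle e^{i\omega t}dt = 0. \qquad (5.54) $$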


The simplest non-zero average may be formed similarly to Eq. (46), but with due respect to the 
complex-variable character of the Fourier images: 


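$$ \langle f_\omega f^*_{\omega'}\rangle = \frac{1}{(2\pi)^2}\int_{-\infty}^{+\infty}dt\int_{-\infty}^{+\infty}dt'\,\langle\tilde f(t)\tilde f(t')\rangle\, e^{i(\omega t - \omega' t')}. \qquad (5.55) $$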


It turns out that for a stationary process, the averages (46) and (55) are directly related. Indeed, since the integration over $t'$ in Eq. (55) is in infinite limits, we may replace it with the integration over $\tau \equiv t' - t$ (at fixed $t$), also in infinite limits. Replacing $t'$ with $t + \tau$ in the expressions under the integral, we see that the average is just the correlation function $K_f(\tau)$, while the time exponent is equal to $\exp\{i(\omega - \omega')t\}\exp\{-i\omega'\tau\}$. As a result, changing the order of integration, we get


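$$ \langle f_\omega f^*_{\omega'}\rangle = \frac{1}{(2\pi)^2}\int_{-\infty}^{+\infty}dt\int_{-\infty}^{+\infty}d\tau\, K_f(\tau)\, e^{i(\omega-\omega')t} e^{-i\omega'\tau} = \frac{1}{(2\pi)^2}\int_{-\infty}^{+\infty}K_f(\tau)\, e^{-i\omega'\tau}d\tau\int_{-\infty}^{+\infty}e^{i(\omega-\omega')t}dt. \qquad (5.56) $$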


But the last integral is just $2\pi\delta(\omega - \omega')$,¹⁹ so that we finally get


18 The argument of the function $f_\omega$ is represented as its index with the purpose of emphasizing that this function is different from $\tilde f(t)$, while (very conveniently) still using the same letter for the same variable.
$$ \langle f_\omega f^*_{\omega'}\rangle = S_f(\omega)\,\delta(\omega - \omega'), \qquad (5.57) $$


where the real function of frequency, 


$$ S_f(\omega) \equiv \frac{1}{2\pi}\int_{-\infty}^{+\infty}K_f(\tau)\, e^{i\omega\tau}d\tau = \frac{1}{\pi}\int_0^{+\infty}K_f(\tau)\cos\omega\tau\, d\tau, \qquad (5.58) $$
is called the spectral density of fluctuations at frequency $\omega$. According to Eq. (58), the spectral density is just the Fourier image of the correlation function, and hence the reciprocal Fourier transform is:²⁰˒²¹
$$ K_f(\tau) = \int_{-\infty}^{+\infty}S_f(\omega)\, e^{-i\omega\tau}d\omega = 2\int_0^{+\infty}S_f(\omega)\cos\omega\tau\, d\omega. \qquad (5.59) $$


In particular, for the fluctuation variance, Eq. (59) yields
$$ \langle\tilde f^2\rangle = K_f(0) = \int_{-\infty}^{+\infty}S_f(\omega)\, d\omega = 2\int_0^{+\infty}S_f(\omega)\, d\omega. \qquad (5.60) $$


The last relation shows that the term "spectral density" describes the physical sense of the function $S_f(\omega)$ very well. Indeed, if a random signal $\tilde f(t)$ had been passed through a frequency filter with a small bandwidth $\Delta\nu \ll \nu$ of positive cyclic frequencies, the integral in the last form of Eq. (60) could be limited to the interval $\Delta\omega = 2\pi\Delta\nu$, i.e. the variance of the filtered signal would become


$$ \langle\tilde f^2\rangle_{\Delta\nu} = 2S_f(\omega)\Delta\omega = 4\pi S_f(\omega)\Delta\nu. \qquad (5.61) $$


(A popular alternative definition of the spectral density is $S_f(\nu) \equiv 4\pi S_f(\omega)$, making the average (61) equal to just $S_f(\nu)\Delta\nu$.)


To conclude this introductory (mostly mathematical) section, let me note an important particular case. If the spectral density of some process is nearly constant within the whole frequency range of interest, $S_f(\omega) = \text{const} = S_f(0)$,²² Eq. (59) shows that its correlation function may be well approximated by a delta function:


$$ K_f(\tau) = S_f(0)\int_{-\infty}^{+\infty}e^{-i\omega\tau}d\omega = 2\pi S_f(0)\,\delta(\tau). \qquad (5.62) $$


From this relation stems another popular name of white noise: the delta-correlated process. We have already seen that this is a very reasonable approximation, for example, for the gas pressure force fluctuations (Fig. 4). Of course, for a realistic, limited physical variable, the approximation of constant spectral density cannot be true for all frequencies (otherwise, for example, the integral (60) would diverge, giving an unphysical, infinite value of its variance), and may be valid only at frequencies much lower than $1/\tau_c$.

19 See, e.g., MA Eq. (14.4).

20 The second form of Eq. (59) uses the fact that, according to Eq. (58), $S_f(\omega)$ is an even function of frequency, just as $K_f(\tau)$ is an even function of time.

21 Although Eqs. (58) and (59) look like little more than straightforward corollaries of the Fourier transform, they bear a special name, the Wiener-Khinchin theorem, after the mathematicians N. Wiener and A. Khinchin, who proved that these relations are valid even for functions $\tilde f(t)$ that are not square-integrable, so that, from the point of view of standard mathematics, their Fourier transforms are not well defined.

22 Such a process is frequently called white noise, because it consists of all frequency components with equal amplitudes, reminiscent of white light, which consists of many monochromatic components with close amplitudes.


5.5. Fluctuations and dissipation 


Now we are equipped mathematically to address one of the most important issues of statistical physics, the relation between fluctuations and dissipation. This relation is especially simple for the following hierarchical situation: a relatively "heavy", slowly moving system, weakly interacting with an environment consisting of rapidly moving, "light" components. A popular theoretical term for such a system is the Brownian particle, named after the botanist Robert Brown, who was the first to notice (in 1827), under a microscope, the random motion of small particles (in his case, pollen grains) caused by their random hits by the fluid's molecules. However, the family of such systems is much broader than that of small mechanical particles. To give just a few examples, such a description is valid for an atom interacting with the electromagnetic field modes of the surrounding space, a clock pendulum interacting with molecules of the air around it, current and voltage in electric circuits, etc.²³


One more important assumption of this theory is that the system's motion does not violate the thermal equilibrium of the environment, which is well fulfilled in many cases. (Think, for example, about a typical mechanical pendulum: its motion does not overheat the air around it to any noticeable extent.) In this case, the averaging over a statistical ensemble of similar environments, at a fixed, specific motion of the system of interest, may be performed assuming their thermal equilibrium.²⁴ I will denote such a "primary" averaging by the usual angle brackets ⟨...⟩. At a later stage, we may carry out an additional, "secondary" averaging over an ensemble of many similar systems of interest, coupled to similar environments. When we do, such double averaging will be denoted by double angle brackets ⟨⟨...⟩⟩.


Let me start from a simple classical system, a 1D harmonic oscillator whose equation of 
evolution may be represented as 


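$$ m\ddot q + \kappa q = F_{\text{det}}(t) + F_{\text{env}}(t) = F_{\text{det}}(t) + \langle F\rangle + \tilde F(t), \qquad \text{with } \langle\tilde F(t)\rangle = 0, \qquad (5.63) $$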


where $q$ is the (generalized) coordinate of the oscillator, $F_{\text{det}}(t)$ is the deterministic external force, while both components of the force $F_{\text{env}}(t)$ represent the impact of the environment on the oscillator's motion. Again, on the time scale of the fast-moving environmental components, the oscillator's motion is slow. The average component $\langle F\rangle$ of the force exerted by the environment on such a slowly moving object is frequently independent of its coordinate $q$ but does depend on its velocity $\dot q$. For most such systems, the Taylor expansion of the force in small velocity has a non-zero linear term:


$$ \langle F\rangle = -\eta\dot q, \qquad (5.64) $$


where the constant $\eta$ is usually called the drag (or "kinematic friction", or "damping") coefficient, so that Eq. (63) may be rewritten as


23 To emphasize this generality, in the forthcoming discussion of the 1D case, I will use the letter $q$ rather than $x$ for the system's displacement.

24 For a usual (ergodic) environment, the primary averaging may be interpreted as that over relatively short time intervals, $\tau_c \ll \Delta t \ll \tau$, where $\tau_c$ is the correlation time of the environment, while $\tau$ is the characteristic time scale of motion of our "heavy" system of interest.
$$ m\ddot q + \eta\dot q + \kappa q = F_{\text{det}}(t) + \tilde F(t), \qquad \text{with } \langle\tilde F(t)\rangle = 0. \qquad (5.65) $$


This method of describing the environmental effects on an otherwise Hamiltonian system is called the Langevin equation.²⁵ Due to the linearity of the differential equation (65), its general solution may be represented as a sum of two independent parts: the deterministic motion of the damped linear oscillator due to the external force $F_{\text{det}}(t)$, and its random fluctuations due to the random force $\tilde F(t)$ exerted by the environment. The former effects are well known from classical dynamics,²⁶ so let us focus on the latter part by taking $F_{\text{det}}(t) = 0$. The remaining term on the right-hand side of Eq. (65) describes the fluctuating part of the environmental force; in contrast to the average component (64), its intensity (read: its spectral density at relevant frequencies $\omega \sim \omega_0 \equiv (\kappa/m)^{1/2}$) does not vanish at $\dot q(t) = 0$, and hence may be evaluated ignoring the system's motion.²⁷


Plugging into Eq. (65) the representation of both variables in the Fourier form similar to Eq. (52), and requiring the coefficients before the same $\exp\{-i\omega t\}$ to be equal on both sides of the equation, for their Fourier images we get the following relation:


$$ -m\omega^2 q_\omega - i\eta\omega q_\omega + \kappa q_\omega = F_\omega, \qquad (5.66) $$


which immediately gives us $q_\omega$, i.e. the (random) complex amplitude of the coordinate fluctuations:


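$$ q_\omega = \frac{F_\omega}{(\kappa - m\omega^2) - i\eta\omega} = \frac{F_\omega}{m(\omega_0^2 - \omega^2) - i\eta\omega}. \qquad (5.67) $$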
Now multiplying Eq. (67) by its complex conjugate for another frequency (say, $\omega'$), averaging both parts of the resulting equation, and using formulas similar to Eq. (57) for each of them,²⁸ we get the following relation between the spectral densities of the oscillations and the random force:²⁹


$$ S_q(\omega) = \frac{1}{m^2(\omega_0^2 - \omega^2)^2 + \eta^2\omega^2}\, S_F(\omega). \qquad (5.68) $$


In the so-called low-damping limit ($\eta \ll m\omega_0$), the fraction on the right-hand side of Eq. (68) has a sharp peak near the oscillator's own frequency $\omega_0$ (describing the well-known effect of high-Q resonance), and may be well approximated in that vicinity as


25 Named after Paul Langevin, whose 1908 work was the first systematic development of A. Einstein’s ideas on 
Brownian motion (see below) using this formalism. A detailed discussion of this approach, with numerical 
examples of its application, may be found, e.g., in the monograph by W. Coffey, Yu. Kalmykov, and J. Waldron, 
The Langevin Equation, World Scientific, 1996. 

26 See, e.g., CM Sec. 5.1. Here I assume that the variable $q(t)$ is classical, with the discussion of the quantum case postponed until the end of the section.

27 Note that the direct secondary statistical averaging of Eq. (65) with $F_{\text{det}} = 0$ yields $\langle\langle q\rangle\rangle = 0$! This perhaps somewhat counter-intuitive result becomes less puzzling if we recognize that this is the averaging over a large statistical ensemble of random sinusoidal oscillations with all values of their phase, and that the (equally probable) oscillations with opposite phases give mutually canceling contributions to the sum in Eq. (2.6).

28 At this stage, we restrict our analysis to random, stationary processes $q(t)$, so that Eq. (57) is valid for this variable as well, if the averaging in it is understood in the ⟨⟨...⟩⟩ sense.

29 Regardless of the physical sense of such a function of $\omega$, and of whether its maximum is situated at a finite frequency $\omega_0$, as in Eq. (68), or at $\omega = 0$, it is often referred to as the Lorentzian (or "Breit-Wigner") line.
$$ \frac{1}{m^2(\omega_0^2 - \omega^2)^2 + \eta^2\omega^2} \approx \frac{1}{\eta^2\omega_0^2}\,\frac{1}{\xi^2 + 1}, \qquad \text{with } \xi \equiv \frac{2m(\omega - \omega_0)}{\eta}. \qquad (5.69) $$


In contrast, the spectral density $S_F(\omega)$ of fluctuations of a typical environment changes relatively slowly, so that for the purpose of integration over frequencies near $\omega_0$ we may replace $S_F(\omega)$ with $S_F(\omega_0)$. As a result, the variance of the environment-imposed random oscillations may be calculated, using Eq. (60), as³⁰


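$$ \langle\langle\tilde q^2\rangle\rangle = 2\int_0^{+\infty}S_q(\omega)\, d\omega \approx 2\int_{\omega \approx \omega_0}S_q(\omega)\, d\omega = 2S_F(\omega_0)\,\frac{1}{\eta^2\omega_0^2}\,\frac{\eta}{2m}\int_{-\infty}^{+\infty}\frac{d\xi}{\xi^2 + 1}. \qquad (5.70) $$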
This is a well-known table integral,³¹ equal to $\pi$, so that, finally:


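$$ \langle\langle\tilde q^2\rangle\rangle = 2S_F(\omega_0)\,\frac{1}{\eta^2\omega_0^2}\,\frac{\eta}{2m}\,\pi = \frac{\pi}{m\eta\omega_0^2}S_F(\omega_0) \equiv \frac{\pi}{\kappa\eta}S_F(\omega_0). \qquad (5.71) $$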


But on the other hand, the weak interaction with the environment should keep the oscillator in thermodynamic equilibrium at the same temperature $T$. Since our analysis has been based on the classical Langevin equation (65), we may only use it in the classical limit $\hbar\omega_0 \ll T$, in which we may use the equipartition theorem (2.48). In our current notation, it yields


$$ \frac{\kappa}{2}\langle\langle\tilde q^2\rangle\rangle = \frac{T}{2}. \qquad (5.72) $$
Comparing Eqs. (71) and (72), we see that the spectral density of the random force exerted by the 
environment has to be fundamentally related to the damping it provides: 


$$ S_F(\omega_0) = \frac{\eta}{\pi}T. \qquad (5.73a) $$
Now we may argue (rather convincingly :-) that since this relation does not depend on the oscillator's parameters $m$ and $\kappa$, and hence on its eigenfrequency $\omega_0 = (\kappa/m)^{1/2}$, it should be valid at any relatively low frequency ($\omega\tau_c \ll 1$). Using Eq. (58) with $\omega \to 0$, it may also be rewritten as a formula for the effective low-frequency drag coefficient:


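$$ \eta = \frac{\pi S_F(0)}{T} = \frac{1}{T}\int_0^{+\infty}K_F(\tau)\, d\tau \equiv \frac{1}{T}\int_0^{+\infty}\langle\tilde F(0)\tilde F(\tau)\rangle\, d\tau. \qquad (5.73b) $$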
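Eq. (73a) also tells how to model the random force in a time-domain simulation: by Eq. (62), white noise with $S_F = \eta T/\pi$ has $K_F(\tau) = 2\eta T\delta(\tau)$, so the force impulse accumulated over a time step $\Delta t$ is Gaussian with variance $2\eta T\Delta t$. The sketch below integrates Eq. (65) with $F_{\text{det}} = 0$ using a simple semi-implicit Euler scheme (a numerical choice made for this illustration, not taken from the text) and checks that the time-averaged $\langle q^2\rangle$ settles near the equipartition value $T/\kappa$. All parameter values and names are illustrative.

```python
import math
import random

def langevin_oscillator(m, kappa, eta, T, dt, n_steps, seed=0):
    """Integrate Eq. (65) with F_det = 0 by semi-implicit Euler steps and return
    the time average of q**2. The random force is white noise with S_F = eta*T/pi
    (Eq. 73a), so its impulse over one step has variance 2*eta*T*dt (Eq. 62)."""
    rng = random.Random(seed)
    kick = math.sqrt(2.0 * eta * T * dt) / m   # r.m.s. velocity kick per step
    q, v = 0.0, 0.0
    burn_in = n_steps // 10                    # discard the initial transient
    acc = 0.0
    for step in range(n_steps):
        v += (-(kappa * q + eta * v) / m) * dt + kick * rng.gauss(0.0, 1.0)
        q += v * dt                            # uses the already-updated velocity
        if step >= burn_in:
            acc += q * q
    return acc / (n_steps - burn_in)

T, kappa = 1.0, 1.0
q2 = langevin_oscillator(m=1.0, kappa=kappa, eta=1.0, T=T, dt=0.02, n_steps=500_000, seed=3)
print(q2, T / kappa)  # should agree within a few percent of statistical scatter
```

The residual deviation is dominated by the finite averaging time; the time-step error of this scheme is much smaller at the chosen $\Delta t$.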


Formulas (73) reveal an intimate, fundamental relation between the fluctuations and the dissipation provided by a thermally equilibrium environment. Parroting the famous political slogan, there is "no dissipation without fluctuation", and vice versa. This means, in particular, that the phenomenological description of dissipation by the drag force alone in classical mechanics³² is (approximately) valid only when the energy scale of the process is much larger than $T$. To the best of my knowledge, this fact was first recognized in 1905 by A. Einstein,³³ for the following particular case.

30 Since in this case the process in the oscillator is entirely due to its environment, its variance should be obtained by statistical averaging over an ensemble of many similar (oscillator + environment) systems, and hence, following our convention, it is denoted by double angle brackets.

31 See, e.g., MA Eq. (6.5a).

32 See, e.g., CM Sec. 5.1.


Let us apply our result (73) to a free 1D Brownian particle, by taking $\kappa = 0$ and $F_{\text{det}}(t) = 0$. In this case, both relations (71) and (72) give infinities. To understand the reason for that divergence, let us go back to the Langevin equation (65) with not only $\kappa = 0$ and $F_{\text{det}}(t) = 0$, but also $m \to 0$, just for the sake of simplicity. (The latter approximation, frequently called the overdamping limit, is quite appropriate, for example, for the motion of small particles in viscous fluids, such as in R. Brown's experiments.) In this approximation, Eq. (65) is reduced to a simple equation,


$$ \eta\dot q = \tilde F(t), \qquad \text{with } \langle\tilde F(t)\rangle = 0, \qquad (5.74) $$


which may be readily integrated to give the particle's displacement during a finite time interval $t$:
$$ \Delta q(t) \equiv q(t) - q(0) = \frac{1}{\eta}\int_0^t \tilde F(t')\, dt'. \qquad (5.75) $$


Evidently, at the full statistical averaging of the displacement, the fluctuation effects vanish, but this does not mean that the particle does not move, just that it has equal probabilities to be shifted in either of the two possible directions. To see that, let us calculate the variance of the displacement:


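$$ \langle\langle\Delta\tilde q^2(t)\rangle\rangle = \frac{1}{\eta^2}\int_0^t dt'\int_0^t dt''\,\langle\langle\tilde F(t')\tilde F(t'')\rangle\rangle = \frac{1}{\eta^2}\int_0^t dt'\int_0^t dt''\, K_F(t' - t''). \qquad (5.76) $$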


As we already know, at times $t \gg \tau_c$, the correlation function may be well approximated by the delta function (see Eq. (62)). In this approximation, with $S_F(0)$ expressed by Eq. (73a), we get


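$$ \langle\langle\Delta\tilde q^2(t)\rangle\rangle = \frac{2\pi}{\eta^2}S_F(0)\int_0^t dt'\int_0^t dt''\,\delta(t'' - t') = \frac{2\pi}{\eta^2}\,\frac{\eta T}{\pi}\int_0^t dt' = \frac{2T}{\eta}t \equiv 2Dt, \qquad (5.77) $$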


with
$$ D = \frac{T}{\eta}. \qquad (5.78) $$
The final form of Eq. (77) describes the well-known law of diffusion ("random walk") of a 1D system, with the r.m.s. deviation from the point of origin growing as $(2Dt)^{1/2}$. The coefficient $D$ in this relation is called the coefficient of diffusion, and Eq. (78) describes the extremely simple and important³⁴ Einstein's relation between that coefficient and the drag coefficient. Often this relation is rewritten, in the SI units of temperature, as $D = \mu k_B T_K$, where $\mu \equiv 1/\eta$ is the mobility of the particle. The physical sense of $\mu$ becomes clear from the expression for the deterministic velocity (the particle's "drift"), which follows from the averaging of both sides of Eq. (74) after the restoration of the term $F_{\text{det}}(t)$ in it:


33 It was published in one of the three papers of Einstein's celebrated 1905 "triad". As a reminder, another paper started the (special) relativity theory, and one more was the quantum description of the photoelectric effect, essentially starting quantum mechanics. Not too bad for one year, one young scientist!

34 In particular, in 1908, i.e. very soon after Einstein's publication, it was used by J. Perrin for an accurate determination of the Avogadro number $N_A$. (It was Perrin who graciously suggested naming this constant after A. Avogadro, honoring his pioneering studies of gases in the 1810s.)
$$ v_{\text{drift}} \equiv \langle\langle\dot q(t)\rangle\rangle = \frac{1}{\eta}F_{\text{det}}(t) \equiv \mu F_{\text{det}}(t), \qquad (5.79) $$


so that the mobility is just the drift velocity given to the particle by a unit force.³⁵
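Eqs. (77)-(79) can be illustrated by simulating an ensemble of overdamped particles. In the sketch below, each time step $\Delta t$ adds the Gaussian displacement $(1/\eta)\int\tilde F\,dt$, whose variance $2T\Delta t/\eta = 2D\Delta t$ follows from Eqs. (62) and (73a), plus the deterministic drift $F_{\text{det}}\Delta t/\eta$. The ensemble then shows $\langle\langle\Delta q^2\rangle\rangle \approx 2Dt$ for a free particle and a mean shift $\mu F_{\text{det}}t$ under a constant force. The parameter values, ensemble size, and names are illustrative assumptions.

```python
import math
import random

def brownian_displacements(eta, T, F_det, t, n_steps, n_walkers, seed=0):
    """Overdamped Langevin equation (74) with a constant F_det restored:
    eta * dq/dt = F_det + F(t). Over each step dt, the white random force
    (S_F = eta*T/pi, Eq. 73a) gives a Gaussian displacement with variance
    2*T*dt/eta = 2*D*dt, while F_det gives the drift F_det*dt/eta."""
    rng = random.Random(seed)
    dt = t / n_steps
    step_rms = math.sqrt(2.0 * T * dt / eta)
    drift = F_det * dt / eta
    displacements = []
    for _ in range(n_walkers):
        q = 0.0
        for _ in range(n_steps):
            q += drift + step_rms * rng.gauss(0.0, 1.0)
        displacements.append(q)
    return displacements

eta, T, t = 2.0, 1.0, 1.0
D = T / eta  # Einstein's relation, Eq. (78)

free = brownian_displacements(eta, T, 0.0, t, 100, 4000, seed=1)
msd = sum(x * x for x in free) / len(free)
print(msd, 2 * D * t)  # Eq. (77)

pushed = brownian_displacements(eta, T, 1.0, t, 100, 4000, seed=2)
mean_shift = sum(pushed) / len(pushed)
print(mean_shift, t / eta)  # Eq. (79): drift velocity mu * F_det with mu = 1/eta
```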


Another famous embodiment of the general Eq. (73) is the thermal (or "Johnson", or "Johnson-Nyquist", or just "Nyquist") noise in resistive electron devices. Let us consider a two-terminal, dissipation-free "probe" circuit, playing the role of the harmonic oscillator in our analysis carried out above, connected to a resistive device (Fig. 9), playing the role of the probe circuit's environment. (The noise is generated by the thermal motion of numerous electrons randomly moving inside the resistive device.) For this system, one convenient choice of the conjugate variables (the generalized coordinate and generalized force) is, respectively, the electric charge $Q \equiv \int I(t)\, dt$ that has passed through the "probe" circuit by time $t$, and the voltage $\mathcal V$ across its terminals, with the polarity shown in Fig. 9. (Indeed, the product $\mathcal V dQ$ is the elementary work $dW$ done by the environment on the probe circuit.)


Fig. 5.9. A resistive device as a dissipative environment of a two-terminal probe circuit.


Making the corresponding replacements, $q \to Q$ and $F \to \mathcal V$ in Eq. (64), we see that it becomes


$$ \langle\mathcal V\rangle = -\eta\dot Q \equiv -\eta I. \qquad (5.80) $$
Comparing this relation with Ohm's law, $\mathcal V = R(-I)$,³⁶ we see that in this case, the coefficient $\eta$ has the physical sense of the usual Ohmic resistance $R$ of our dissipative device,³⁷ so that Eq. (73a) becomes


$$ S_{\mathcal V}(\omega) = \frac{R}{\pi}T. \qquad (5.81a) $$


Using the last equality in Eq. (61), and transferring to the SI units of temperature ($T = k_B T_K$), we may bring this famous Nyquist formula³⁸ to its most popular form:


$$ \langle\tilde{\mathcal V}^2\rangle_{\Delta\nu} = 4k_B T_K R\,\Delta\nu. \qquad (5.81b) $$


35 Note that in solid-state physics and electronics, the charge carrier mobility is usually defined as $|v_{\text{drift}}/\mathcal E| = e|v_{\text{drift}}|/|F_{\text{det}}| \equiv e|\mu|$ (where $\mathcal E$ is the applied electric field), and is traditionally measured in cm²/(V·s).

36 The minus sign is due to the fact that in our notation, the current flowing in the resistor, from the positive terminal to the negative one, is $(-I)$; see Fig. 9.

37 Due to this fact, Eq. (64) is often called the Ohmic model of the environment's response, even if the physical nature of the variables $q$ and $F$ is completely different from the electric charge and voltage.

38 It is named after Harry Nyquist, who derived this formula in 1928 (independently of the prior work by A. Einstein, M. Smoluchowski, and P. Langevin) to describe the noise that had just been discovered experimentally by his Bell Labs colleague John Bertrand Johnson. The derivation of Eq. (73), and hence of Eq. (81), in these notes is essentially a twist of the derivation used by H. Nyquist.
Note that according to Eq. (65), this result is only valid at a negligible speed of change of the coordinate $q$ (in our current case, a negligible current $I$), i.e. Eq. (81) expresses the voltage fluctuations as would be measured by a virtually ideal voltmeter, with its input resistance much higher than $R$.


On the other hand, using a different choice of generalized coordinate and force, $q \to \Phi$, $F \to I$ (where $\Phi \equiv \int\mathcal V(t)\, dt$ is the generalized magnetic flux, so that $dW = I\mathcal V(t)\, dt \equiv I\, d\Phi$), we get $\eta \to 1/R$, and Eq. (73) yields the thermal fluctuations of the current through the resistive device, as measured by a virtually ideal ammeter, i.e. at $\mathcal V \to 0$:

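$$ S_I(\omega) = \frac{1}{R}\,\frac{T}{\pi}, \quad \text{i.e.} \quad \langle\tilde I^2\rangle_{\Delta\nu} = \frac{4k_B T_K}{R}\Delta\nu. \qquad (5.81c) $$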
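A quick numerical sense of the scale of Eqs. (81b) and (81c), and of the inversion used in Johnson noise thermometry, is given by the sketch below. The resistor value, temperature, and bandwidth are example inputs, and the function names are illustrative; only the Boltzmann constant is a fixed SI value.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)

def johnson_voltage_rms(R, T_K, dnu):
    """Eq. (81b): r.m.s. open-circuit noise voltage in the band dnu, in volts."""
    return math.sqrt(4.0 * K_B * T_K * R * dnu)

def johnson_current_rms(R, T_K, dnu):
    """Eq. (81c): r.m.s. short-circuit noise current in the band dnu, in amperes."""
    return math.sqrt(4.0 * K_B * T_K * dnu / R)

def johnson_temperature_K(v_rms, R, dnu):
    """Eq. (81b) solved for the temperature, the basic idea of noise thermometry."""
    return v_rms**2 / (4.0 * K_B * R * dnu)

# Example inputs: a 1 kOhm resistor at room temperature in a 1 MHz band
v = johnson_voltage_rms(R=1e3, T_K=300.0, dnu=1e6)  # about 4.07e-6 V
i = johnson_current_rms(R=1e3, T_K=300.0, dnu=1e6)  # about 4.07e-9 A
print(v, i, johnson_temperature_K(v, 1e3, 1e6))
```

Note that $\langle\tilde{\mathcal V}^2\rangle^{1/2}/\langle\tilde I^2\rangle^{1/2} = R$, as expected for the open-circuit voltage and short-circuit current of the same thermal source.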

The nature of Eqs. (81) is so fundamental that they may be used, in particular, for the so-called Johnson noise thermometry.³⁹ Note, however, that these relations are valid for noise in thermal equilibrium only. In electric circuits that may be readily driven out of equilibrium by an applied voltage $\mathcal V$, other types of noise are frequently important, notably the shot noise, which arises in short conductors, e.g., tunnel junctions, at applied voltages with $|\overline{\mathcal V}| \gg T/q$, due to the discreteness of charge carriers.⁴⁰ A straightforward analysis (left for the reader's exercise) shows that this noise may be characterized by current fluctuations with the following low-frequency spectral density:


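$$ S_I(\omega) = \frac{q|\bar I|}{2\pi}, \quad \text{i.e.} \quad \langle\tilde I^2\rangle_{\Delta\nu} = 2q|\bar I|\,\Delta\nu, \qquad (5.82) $$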


where $q$ is the electric charge of a single current carrier. This is the Schottky formula,⁴¹ valid for any relation between the averages $\bar I$ and $\overline{\mathcal V}$. The comparison of Eqs. (81c) and (82) for a device that obeys Ohm's law shows that the shot noise has the same intensity as the thermal noise with the effective temperature


$$ T_{\text{ef}} = \frac{q|\overline{\mathcal V}|}{2} \gg T. \qquad (5.83) $$


This relation may be interpreted as a result of charge carrier overheating by the applied electric field, and explains why the Schottky formula (82) is only valid in conductors much shorter than the energy relaxation length $l_e$ of the charge carriers.⁴² (Another mechanism of shot noise suppression, which may become noticeable in highly conductive nanoscale devices, is the Fermi-Dirac statistics of electrons.⁴³)


Now let us return for a minute to the bolometric Dicke radiometer (see Figs. 6-7 and their discussion in Sec. 3), and use the Langevin formalism to finalize its analysis. For this system, the Langevin equation is an extension of the usual equation of heat balance:


39 See, e.g., J. Crossno et al., Appl. Phys. Lett. 106, 023121 (2015), and references therein. 

40 Another practically important type of fluctuations in electronic devices is the low-frequency 1/f noise that was already mentioned in Sec. 3 above. I will briefly discuss it in Sec. 8.

41 It was derived by Walter Hans Schottky as early as 1918, i.e. even before Nyquist’s work. 

42 See, e.g., Y. Naveh et al., Phys. Rev. B 58, 15371 (1998). In practically used metals, $l_e$ is of the order of 30 nm even at liquid-helium temperatures (and much shorter at room temperatures), so that the usual "macroscopic" resistors do not exhibit the shot noise.

43 For a review of this effect see, e.g., Ya. Blanter and M. Büttiker, Phys. Repts. 336, 1 (2000).
$$ C_V\frac{dT}{dt} + \mathcal G(T - T_0) = \mathcal P_{\text{det}}(t) + \tilde{\mathcal P}(t), \qquad (5.84) $$


where $\mathcal P_{\text{det}} \equiv \langle\mathcal P\rangle$ describes the (deterministic) power of the absorbed radiation, and $\tilde{\mathcal P}$ represents the effective source of temperature fluctuations. Now we can use Eq. (84) to calculate the spectral density $S_T(\omega)$ of temperature fluctuations in exactly the same way as was done with Eq. (65), assuming that the frequency spectrum of the fluctuation source is much broader than the intrinsic bandwidth $1/\tau = \mathcal G/C_V$ of the bolometer, so that its spectral density at frequencies $\omega\tau \sim 1$ may be well approximated by its low-frequency value $S_{\mathcal P}(0)$:


$$ S_T(\omega) = \left|\frac{1}{-i\omega C_V + \mathcal G}\right|^2 S_{\mathcal P}(0). \qquad (5.85) $$


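Then, requiring the variance of temperature fluctuations, calculated from this formula and Eq. (60),
$$ (\delta T)^2 \equiv \langle\tilde T^2\rangle = 2\int_0^{+\infty}S_T(\omega)\, d\omega = 2S_{\mathcal P}(0)\int_0^{+\infty}\frac{d\omega}{\omega^2 C_V^2 + \mathcal G^2} = 2S_{\mathcal P}(0)\,\frac{1}{C_V^2}\int_0^{+\infty}\frac{d\omega}{\omega^2 + (\mathcal G/C_V)^2} = \frac{\pi S_{\mathcal P}(0)}{\mathcal G C_V}, \qquad (5.86) $$
to coincide with our earlier "thermodynamic fluctuation" result (41), we get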
G 3 

a A ees ; (5.87) 


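The r.m.s. value of the “power noise” within a bandwidth Δν << 1/τ (see Fig. 7) becomes equal to the deterministic signal power P_det (or, more exactly, the main harmonic of its modulation law) at

$$\mathcal{P}_{\rm det} = \mathcal{P}_{\min} \equiv \langle\widetilde{\mathcal P}^2\rangle_{\Delta\nu}^{1/2} = \left[2S_{\mathcal P}(0)\,\Delta\omega\right]^{1/2} = 2(\mathcal{G}\Delta\nu)^{1/2}\,T. \tag{5.88}$$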

This result shows that our earlier prediction (45) may be improved by a substantial factor of the order of (Δν/ν)^{1/2}, where the reduction of the output bandwidth is limited only by the signal accumulation time Δt ~ 1/Δν, while the increase of ν is limited by the speed of (typically, mechanical) devices performing the power modulation. In practical systems this factor may improve the sensitivity by a couple of orders of magnitude, enabling the observation of extremely weak radiation. Perhaps the most spectacular example is the recent measurements of the cosmic microwave background (CMB) radiation, which corresponds to the blackbody temperature T_B ≈ 2.726 K, with an accuracy δT_B ~ 10⁻⁶ K, using microwave receivers with the physical temperature of all their components much higher than δT_B. The observed weak (~10⁻⁵ K) anisotropy of the CMB radiation is a major experimental basis of all modern cosmology.⁴⁴
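A rough numerical illustration of Eqs. (85)-(88) may be helpful here. The sketch below is written in SI units (so that the text's T², measured in energy units, becomes k_B T² where needed); the helper name `bolometer_noise` and all parameter values are hypothetical. It integrates the Lorentzian spectrum (85) over frequency to confirm that the variance (86) reproduces the thermodynamic value k_B T²/C_V, and then evaluates the minimal detectable power (88).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def bolometer_noise(G, C_V, T, delta_nu, n=1000):
    """Evaluate Eqs. (5.85)-(5.88) in SI units (hypothetical helper).

    G        -- thermal conductance to the bath, W/K
    C_V      -- heat capacity, J/K
    T        -- bath temperature, K
    delta_nu -- output bandwidth, Hz
    """
    tau = C_V / G                          # intrinsic time constant: 1/tau = G/C_V
    S_P0 = G * K_B * T**2 / math.pi        # Eq. (5.87) in SI units: G T^2/pi -> G k_B T^2/pi
    # Eq. (5.86): <T~^2> = 2 * int_0^inf S_P0 / (G^2 + w^2 C_V^2) dw, evaluated by a
    # midpoint rule after the substitution w = (G/C_V) tan(theta), theta in [0, pi/2).
    h = (math.pi / 2) / n
    integral = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        w = (G / C_V) * math.tan(theta)
        dw = (G / C_V) / math.cos(theta) ** 2 * h
        integral += S_P0 / (G**2 + (w * C_V) ** 2) * dw
    var_T = 2.0 * integral                 # should equal k_B T^2 / C_V, cf. Eq. (5.41)
    # Eq. (5.88), with delta_omega = 2 pi delta_nu
    P_min = math.sqrt(2.0 * S_P0 * 2.0 * math.pi * delta_nu)
    return tau, S_P0, var_T, P_min


if __name__ == "__main__":
    tau, S_P0, var_T, P_min = bolometer_noise(G=1e-10, C_V=1e-12, T=0.3, delta_nu=1.0)
    print(f"tau = {tau:.3g} s, delta_T = {math.sqrt(var_T):.3g} K, P_min = {P_min:.3g} W")
```

The closed form of the last line, P_min = 2T(G k_B Δν)^{1/2}, shows directly why lowering the bath temperature and the thermal conductance pays off.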


Returning to the discussion of our main result, Eq. (73), let me note that it may be readily 
generalized to the case when the environment’s response is different from the Ohmic form (64). This 
opportunity is virtually evident from Eq. (66): by its derivation, the second term on its left-hand side is 
just the Fourier component of the average response of the environment to the system’s displacement: 


44 See, e.g., a concise book by A. Balbi, The Music of the Big Bang, Springer, 2008. 
$$\langle F_\omega\rangle = i\omega\eta\, q_\omega. \tag{5.89}$$

Now let the response be still linear, but have an arbitrary frequency dispersion,

$$\langle F_\omega\rangle = \chi(\omega)\, q_\omega, \tag{5.90}$$

where the function χ(ω), called the generalized susceptibility (in our case, of the environment), may be complex, i.e. have both real and imaginary parts:

$$\chi(\omega) = \chi'(\omega) + i\chi''(\omega). \tag{5.91}$$


Then Eq. (73) remains valid with the replacement η → χ''(ω)/ω:⁴⁵

$$S_F(\omega) = \frac{\chi''(\omega)}{\pi\omega}\,T. \tag{5.92}$$

This fundamental relation⁴⁶ may be used not only to calculate the fluctuation intensity from the known generalized susceptibility (i.e. the deterministic response of the system to a small perturbation), but also in reverse: to calculate such linear response from the known fluctuations. The latter use is especially attractive in numerical simulations of complex systems, e.g., those based on molecular-dynamics approaches, because it circumvents the need to extract a weak response to a small perturbation from a noisy background.


Now let us discuss what generalization of Eq. (92) is necessary to make that fundamental result suitable for arbitrary temperatures, T ~ ħω. The calculations we have performed were based on the apparently classical equation of motion, Eq. (63). However, quantum mechanics shows⁴⁷ that a similar equation is valid for the corresponding Heisenberg-picture operators, so that repeating all the arguments leading to the Langevin equation (65), we may write its quantum-mechanical version

$$m\ddot{\hat q} + \eta\dot{\hat q} + \kappa\hat q = \hat{\tilde F}(t). \tag{5.93}$$

This is the so-called Heisenberg-Langevin (or “quantum Langevin”) equation — in this particular case, 
for a harmonic oscillator. 


The further operations, however, require certain caution, because the right-hand side of the equation is now an operator, with some nontrivial properties. For example, the “values” of the Heisenberg operator representing the same variable F̃(t) at different times do not necessarily commute:

$$\hat{\tilde F}(t)\,\hat{\tilde F}(t') \neq \hat{\tilde F}(t')\,\hat{\tilde F}(t), \quad \text{if } t' \neq t. \tag{5.94}$$


45 Reviewing the calculations leading to Eq. (73), we may see that the possible real part χ'(ω) of the susceptibility just adds up to (κ − mω²) in the denominator of Eq. (67), resulting in a change of the oscillator’s frequency ω₀. This renormalization is insignificant if the oscillator-to-environment coupling is weak, i.e. if the susceptibility χ(ω) is small, as was assumed in the derivation of Eq. (69) and hence Eq. (73).

46 It is sometimes called the Green-Kubo (or just the Kubo) formula. This is hardly fair, because, as the reader could see, Eq. (92) is just an elementary generalization of the Nyquist formula (81). Moreover, the corresponding works of M. Green and R. Kubo were published, respectively, in 1954 and 1957, i.e. after the 1951 paper by H. Callen and T. Welton, where a more general result (98) had been derived. More appropriately, the Green/Kubo names are associated with Eq. (102) below.

47 See, e.g., QM Sec. 4.6. 
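As a result, the function defined by Eq. (46) may not be a symmetric function of the time delay τ ≡ t' − t even for a stationary process, making it inadequate for the representation of the actual correlation function, which has to obey Eq. (50). This technical difficulty may be overcome by the introduction of the following symmetrized correlation function⁴⁸

$$K_F(\tau) \equiv \frac{1}{2}\left\langle \hat{\tilde F}(t)\hat{\tilde F}(t+\tau) + \hat{\tilde F}(t+\tau)\hat{\tilde F}(t)\right\rangle \equiv \frac{1}{2}\left\langle\left\{\hat{\tilde F}(t), \hat{\tilde F}(t+\tau)\right\}\right\rangle \tag{5.95}$$

(where {...,...} denotes the anticommutator of the two operators), and, similarly, the symmetrized spectral density S_F(ω), defined by the following relation:

$$S_F(\omega)\,\delta(\omega-\omega') \equiv \frac{1}{2}\left\langle \hat{\tilde F}_\omega\hat{\tilde F}^\dagger_{\omega'} + \hat{\tilde F}^\dagger_{\omega'}\hat{\tilde F}_\omega\right\rangle \equiv \frac{1}{2}\left\langle\left\{\hat{\tilde F}_\omega, \hat{\tilde F}^\dagger_{\omega'}\right\}\right\rangle, \tag{5.96}$$

with K_F(τ) and S_F(ω) still related by the Fourier transform (59).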


Now we may repeat all the analysis that was carried out for the classical case, and get Eq. (71) again, but now this expression has to be compared not with the equipartition theorem, but with its quantum-mechanical generalization (14), which, in our current notation, reads

$$\langle\tilde q^2\rangle = \frac{\hbar\omega_0}{2\kappa}\coth\frac{\hbar\omega_0}{2T}. \tag{5.97}$$

As a result, we get the following quantum-mechanical generalization of Eq. (92):

$$S_F(\omega) = \frac{\hbar\chi''(\omega)}{2\pi}\coth\frac{\hbar\omega}{2T}. \tag{5.98}$$


This is the much-celebrated fluctuation-dissipation theorem, usually referred to just as the FDT, first 
derived in 1951 by Herbert Bernard Callen and Theodore A. Welton — in a somewhat different way. 
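As a minimal numerical sketch of Eq. (98), assuming ħ = 1 and temperature in energy units, the code below applies the theorem to the Ohmic environment (89), for which χ(ω) = iωη, so that χ''(ω) = ηω. At T >> ħω it should reproduce the classical result ηT/π of Eq. (73), while at T << ħω it should approach the temperature-independent value ħχ''(ω)/2π. The value of η is arbitrary.

```python
import math


def fdt_spectral_density(chi2, omega, T, hbar=1.0):
    """Symmetrized spectral density S_F(omega) of Eq. (5.98).

    chi2  -- imaginary part chi''(omega) of the environment's susceptibility
    omega -- angular frequency (> 0)
    T     -- temperature in energy units (k_B = 1)
    """
    x = hbar * omega / (2.0 * T)
    return hbar * chi2 / (2.0 * math.pi) / math.tanh(x)   # coth(x) = 1 / tanh(x)


eta = 0.7                   # illustrative drag coefficient
omega = 1.0
chi2 = eta * omega          # Ohmic environment: chi(omega) = i omega eta, Eq. (5.89)

for T in (1e3, 1.0, 1e-3):  # T >> hbar*omega, T ~ hbar*omega, T << hbar*omega
    S = fdt_spectral_density(chi2, omega, T)
    print(f"T = {T:g}: S_F = {S:.6g}, "
          f"classical eta*T/pi = {eta * T / math.pi:.6g}, "
          f"quantum chi''/(2 pi) = {chi2 / (2 * math.pi):.6g}")
```

At the intermediate temperature the spectral density exceeds both the ground-state value and (slightly) the classical one, since coth x > max(1, 1/x) for any x > 0.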


As natural as it seems, this generalization of the relation between fluctuations and dissipation poses a very interesting conceptual dilemma. Let, for the sake of clarity, the temperature be relatively low, T << ħω; then Eq. (98) gives a temperature-independent result

$$S_F(\omega) = \frac{\hbar\chi''(\omega)}{2\pi}, \tag{5.99}$$

which describes what is frequently called quantum noise. According to the quantum Langevin equation (93), nothing but the random force exerted by the environment, with the spectral density (99) proportional to the imaginary part of the susceptibility (i.e. damping), is the source of the ground-state “fluctuations” of the coordinate and momentum of a quantum harmonic oscillator, with the r.m.s. values

$$\delta q \equiv \langle\tilde q^2\rangle^{1/2} = \left(\frac{\hbar}{2m\omega_0}\right)^{1/2}, \quad \delta p \equiv \langle\tilde p^2\rangle^{1/2} = \left(\frac{m\hbar\omega_0}{2}\right)^{1/2}, \quad \delta q\,\delta p = \frac{\hbar}{2}, \tag{5.100}$$


48 Here (and to the end of this section) the averaging ⟨...⟩ should be understood in the general quantum-statistical sense; see Eq. (2.12). As was discussed in Sec. 2.1, for the classical-mixture state of the system, this does not create any difference in either the mathematical treatment of the averages or their physical interpretation.
and the total energy ħω₀/2. On the other hand, basic quantum mechanics tells us that exactly these formulas describe the ground state of a dissipation-free oscillator, not coupled to any environment, and are a direct corollary of the basic commutation relation

$$[\hat q, \hat p] = i\hbar. \tag{5.101}$$

So, what is the genuine source of the uncertainty described by Eqs. (100)?


The best resolution of this paradox I can offer is that either interpretation of Eqs. (100) is legitimate, with their relative convenience depending on the particular application. One may say that since the right-hand side of the quantum Langevin equation (93) is a quantum-mechanical operator, rather than a classical force, it “carries the uncertainty relation within itself”. However, this (admittedly, opportunistic :-) resolution leaves the following question open: is the quantum noise (99) of the environment’s observable F measurable directly, without any probe oscillator subjected to it? An experimental resolution of this dilemma is not quite simple, because usual scientific instruments have their own ground-state uncertainty, i.e. their own quantum fluctuations, which may be readily confused with those of the system under study. Fortunately, this difficulty may be overcome, for example, using the unique frequency-mixing (“down-conversion”) properties of Josephson junctions. Special low-temperature experiments using such down-conversion⁴⁹ have confirmed that the noise (99) is real and measurable.


Finally, let me mention an alternative derivation⁵⁰ of the fluctuation-dissipation theorem (98) from the general quantum mechanics of open systems. This derivation is substantially longer than the one presented above, but gives an interesting by-product, the Green-Kubo formula

$$\left\langle\left[\hat{\tilde F}(t), \hat{\tilde F}(t+\tau)\right]\right\rangle = i\hbar\, G(\tau), \tag{5.102}$$

where G(τ) is the temporal Green’s function of the environment, defined by the following relation:

$$\langle F(t)\rangle = \int_0^\infty G(\tau)\, q(t-\tau)\, d\tau = \int_{-\infty}^{t} G(t-t')\, q(t')\, dt'. \tag{5.103}$$

Plugging the Fourier transforms of all three functions of time participating in Eq. (103) into this relation, it is straightforward to check that this Green’s function is just the Fourier image of the complex susceptibility χ(ω) defined by Eq. (90):

$$\int_0^\infty G(\tau)\, e^{i\omega\tau}\, d\tau = \chi(\omega); \tag{5.104}$$

here 0 is used as the lower limit instead of (−∞) just to emphasize that, due to the causality principle, the Green’s function has to be equal to zero for τ < 0.⁵¹
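Eq. (104) is easy to check numerically for a simple causal Green's function. The sketch below assumes, purely for illustration, a hypothetical relaxational environment with G(τ) = (a/τ_c)exp{−τ/τ_c} at τ ≥ 0 (and zero at τ < 0); for this form the integral (104) may be done analytically, giving χ(ω) = a/(1 − iωτ_c), whose imaginary part is positive at ω > 0. The numerical integral is truncated at a time τ_max by which G(τ) has decayed.

```python
import cmath
import math


def susceptibility_from_green(G, omega, tau_max, n=100_000):
    """Trapezoidal estimate of Eq. (5.104), chi(omega) = int_0^inf G(tau) exp(i omega tau) d tau,
    with the upper limit truncated at tau_max. The lower limit is 0 because a causal
    Green's function vanishes at tau < 0."""
    h = tau_max / n
    f = lambda tau: G(tau) * cmath.exp(1j * omega * tau)
    total = 0.5 * (f(0.0) + f(tau_max))
    for k in range(1, n):
        total += f(k * h)
    return total * h


# Hypothetical relaxational environment: G(tau) = (a / tau_c) exp(-tau / tau_c) for tau >= 0.
a, tau_c = 2.0, 1.0
G = lambda tau: (a / tau_c) * math.exp(-tau / tau_c)

for omega in (0.5, 2.0):
    chi_num = susceptibility_from_green(G, omega, tau_max=40.0 * tau_c)
    chi_exact = a / (1.0 - 1j * omega * tau_c)   # closed form of the same integral
    print(omega, chi_num, chi_exact)             # chi'' = Im(chi) > 0 at omega > 0
```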


In order to reveal the real beauty of Eq. (102), we may use the Wiener-Khinchin theorem (59) to 
rewrite the fluctuation-dissipation theorem (98) in a similar time-domain form: 


49 R. Koch et al., Phys. Rev. B 26, 74 (1982), and references therein. 
50 See, e.g., QM Sec. 7.4. 
51 See, e.g., CM Sec. 5.1.
$$\left\langle\left\{\hat{\tilde F}(t), \hat{\tilde F}(t+\tau)\right\}\right\rangle = 2K_F(\tau), \tag{5.105}$$

where the symmetrized correlation function K_F(τ) is most simply described by its Fourier transform, which is, according to Eq. (58), equal to πS_F(ω), so that using the FDT, we get

$$\int_0^\infty K_F(\tau)\cos\omega\tau\, d\tau = \frac{\hbar\chi''(\omega)}{2}\coth\frac{\hbar\omega}{2T}. \tag{5.106}$$


The comparison of Eqs. (102) and (104), on one hand, and Eqs. (105)-(106), on the other hand, shows that both the commutation and anticommutation properties of the Heisenberg-Langevin force operator at different moments of time are determined by the same generalized susceptibility χ(ω) of the environment. However, the averaged anticommutator also depends on temperature, while the averaged commutator does not, at least explicitly, because the complex susceptibility of an environment may be temperature-dependent as well.


5.6. The Kramers problem and the Smoluchowski equation 


Returning to the classical case, it is evident that Langevin equations of the type (65) provide 
means not only for the analysis of stationary fluctuations, but also for the description of arbitrary time 
evolution of (classical) systems coupled to their environments — which, again, may provide both 
dissipation and fluctuations. However, this approach to evolution analysis suffers from two major 
handicaps. 


First, the Langevin equation does enable a straightforward calculation of the statistical average of the variable q and its fluctuation variance (i.e., in the common mathematical terminology, the first and second moments of the probability density w(q, t)) as functions of time, but not of the probability distribution as such. Admittedly, this is rarely a big problem, because in most cases the distribution is Gaussian; see, e.g., Eq. (2.77).


The second, more painful drawback of the Langevin approach is that it is instrumental only for “linear” systems, i.e., the systems whose dynamics may be described by linear differential equations, such as Eq. (65). However, as we know from classical dynamics, many important problems (for example, the Kepler problem of planetary motion⁵²) are reduced to motion in substantially non-harmonic potentials U_ef(q), leading to nonlinear equations of motion. If the energy of interaction between the system and its random environment is factorable, i.e. is a product of variables belonging to these subsystems (as is very frequently the case), we may repeat all arguments of the last section to derive the following generalized version of the 1D Langevin equation:⁵³

$$m\ddot q + \eta\dot q + \frac{\partial U(q,t)}{\partial q} = \tilde F(t), \tag{5.107}$$


52 See, e.g., CM Secs. 3.4-3.6. 

53 The generalization of Eq. (107) to higher spatial dimensionality is also straightforward, with the scalar variable q replaced with a multi-dimensional vector q, and the scalar derivative dU/dq replaced with the vector ∇U, where ∇ is the del vector-operator in the q-space.
valid for an arbitrary, possibly time-dependent potential U(q, t). Unfortunately, the solution of this equation may be very hard. Indeed, its Fourier analysis carried out in the last section was essentially based on the linear superposition principle, which is invalid for nonlinear equations.
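Though Fourier analysis fails for nonlinear U(q), Eq. (107) may still be integrated numerically, one trajectory at a time. Below is a minimal sketch using a semi-implicit Euler-Maruyama step (a standard numerical scheme, not one discussed in the text). It assumes a white random force with K_F(τ) = 2ηTδ(τ), i.e. the Ohmic spectral density S_F = ηT/π of Eq. (73), so that the force impulse accumulated over a step dt is Gaussian with variance 2ηT dt. The function accepts any dU/dq; the usage example takes the harmonic case only because its answer (equipartition) is known in advance.

```python
import math
import random


def langevin_trajectory(dUdq, m, eta, T, q0, p0, dt, n_steps, seed=0):
    """Yield (q, p) along one trajectory of Eq. (5.107), m q'' + eta q' + dU/dq = F~(t).

    Semi-implicit Euler-Maruyama step (an illustrative choice): the momentum is updated
    first, then the coordinate with the new momentum. The random force is assumed white,
    K_F(tau) = 2 eta T delta(tau), so its impulse over one step has variance 2 eta T dt."""
    rng = random.Random(seed)
    q, p = q0, p0
    kick = math.sqrt(2.0 * eta * T * dt)
    for _ in range(n_steps):
        p += (-dUdq(q) - eta * p / m) * dt + kick * rng.gauss(0.0, 1.0)
        q += p / m * dt
        yield q, p


# Usage: harmonic potential U = kappa q^2 / 2, where equipartition gives kappa<q^2> = <p^2>/m = T.
kappa, m, eta, T = 1.0, 1.0, 1.0, 1.0
n_steps, n_skip = 300_000, 2_000
sum_q2 = sum_p2 = 0.0
for i, (q, p) in enumerate(langevin_trajectory(lambda x: kappa * x, m, eta, T,
                                                q0=0.0, p0=0.0, dt=0.01, n_steps=n_steps)):
    if i >= n_skip:                       # discard the initial transient
        sum_q2 += q * q
        sum_p2 += p * p
n_avg = n_steps - n_skip
mean_q2, mean_p2 = sum_q2 / n_avg, sum_p2 / n_avg
print(f"kappa<q^2>/T = {kappa * mean_q2 / T:.3f}, <p^2>/(mT) = {mean_p2 / (m * T):.3f}")
```

For a nonlinear potential, only the `dUdq` argument changes; the statistical averages then have to be accumulated over many such trajectories (or a long one), with statistical errors decreasing only as the inverse square root of the averaging time.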


If the fluctuation intensity is low, |δq| << ⟨q⟩, where ⟨q⟩(t) is the deterministic solution of Eq. (107) in the absence of fluctuations, this equation may be linearized⁵⁴ with respect to small fluctuations q̃ ≡ q − ⟨q⟩ to get a linear equation,

$$m\ddot{\tilde q} + \eta\dot{\tilde q} + \kappa(t)\,\tilde q = \tilde F(t), \quad \text{with } \kappa(t) \equiv \frac{\partial^2}{\partial q^2}U\big(\langle q\rangle(t), t\big). \tag{5.108}$$


This equation differs from Eq. (65) only by the time dependence of the effective spring constant κ(t), and may be solved by the Fourier expansion of both the fluctuations and the function κ(t). Such calculations may be more cumbersome than those performed above, but are still doable (especially if the unperturbed motion ⟨q⟩(t) is periodic), and sometimes give useful analytical results.⁵⁵


However, some important problems cannot be solved by linearization. Perhaps the most apparent (and practically very important) example is the so-called Kramers problem⁵⁶ of finding the lifetime of a metastable state of a 1D classical system in a potential well separated from the region of unlimited motion by a potential barrier; see Fig. 10.


Fig. 5.10. The Kramers problem. 


In the absence of fluctuations, the system, initially placed close to the well’s bottom (in Fig. 10, at q ≈ q₁), would stay there forever. Fluctuations result not only in a finite spread of the probability density w(q, t) around that point but also in a gradual decrease of the total probability

$$W(t) = \int_{\text{well's bottom}} w(q,t)\, dq \tag{5.109}$$

to find the system in the well, because of a non-zero rate of its escape from it, over the potential barrier, due to thermal activation. What may be immediately expected of the situation is that if the barrier height,

$$U_0 \equiv U(q_2) - U(q_1), \tag{5.110}$$


54 See, e.g., CM Secs. 3.2, 5.2, and beyond. 

55 See, e.g., QM Problem 7.8, and also Chapters 5 and 6 in the monograph by W. Coffey et al., cited above. 

56 It was named after Hendrik Anthony (“Hans”) Kramers who, besides solving this conceptually important problem in 1940, made several other seminal contributions to physics, including the famous Kramers-Kronig dispersion relations (see, e.g., EM Sec. 7.4) and the WKB (Wentzel-Kramers-Brillouin) approximation in quantum mechanics; see, e.g., QM Sec. 2.4.
is much larger than the temperature T,⁵⁷ the Boltzmann distribution w ∝ exp{−U(q)/T} should still be approximately valid in most of the well, so that the probability for the system to overcome the barrier should scale as exp{−U₀/T}. From these handwaving arguments, one may reasonably expect that if the probability W(t) of the system’s still residing in the well by time t obeys the usual “decay law”

$$\dot W = -\frac{W}{\tau}, \tag{5.111a}$$

then the lifetime τ has to obey the general Arrhenius law:

$$\tau = \tau_A\exp\left\{\frac{U_0}{T}\right\}. \tag{5.111b}$$


However, these relations need to be proved, and the pre-exponential coefficient τ_A (usually called the attempt time) needs to be calculated. This cannot be done by the linearization of Eq. (107), because this approximation is equivalent to a quadratic approximation of the potential U(q), which evidently cannot describe the potential well and the potential barrier simultaneously; see Fig. 10 again.


This and other essentially nonlinear problems may be addressed using an alternative approach to fluctuations, dealing directly with the time evolution of the probability density w(q, t). Due to the shortage of time/space, I will review this approach using mostly handwaving arguments, and refer the interested reader to special literature⁵⁸ for strict mathematical proofs. Let us start from the diffusion of a free classical 1D particle with inertial effects negligible in comparison with damping. It is described by the Langevin equation (74) with F_det = 0. Let us assume that at all times the probability distribution stays Gaussian:

$$w(q,t) = \frac{1}{(2\pi)^{1/2}\,\delta q(t)}\exp\left\{-\frac{(q-q_0)^2}{2\,\delta q^2(t)}\right\}, \tag{5.112}$$

where q₀ is the initial position of the particle, and δq(t) is the time-dependent distribution width, whose growth in time is described, as we already know, by Eq. (77):

$$\delta q(t) = (2Dt)^{1/2}. \tag{5.113}$$

Then it is straightforward to verify, by substitution, that this solution satisfies the following simple partial differential equation,⁵⁹

$$\frac{\partial w}{\partial t} = D\frac{\partial^2 w}{\partial q^2}, \tag{5.114}$$

with the delta-functional initial condition

$$w(q,0) = \delta(q - q_0). \tag{5.115}$$
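The verification by substitution mentioned above may also be done numerically. The sketch below evaluates the Gaussian (112), with δq² = 2Dt from Eq. (113), and estimates ∂w/∂t − D∂²w/∂q² with central finite differences; by Eq. (114) this residual should vanish up to discretization and round-off errors. The step sizes are arbitrary illustrative choices.

```python
import math


def gaussian_packet(q, t, D=1.0, q0=0.0):
    """Probability density (5.112), with the width delta_q(t) = (2 D t)^(1/2) of Eq. (5.113)."""
    dq2 = 2.0 * D * t                     # delta_q^2
    return math.exp(-(q - q0) ** 2 / (2.0 * dq2)) / math.sqrt(2.0 * math.pi * dq2)


def diffusion_residual(q, t, D=1.0, dq=1e-4, dt=1e-4):
    """Central-difference estimate of dw/dt - D d^2w/dq^2, which Eq. (5.114) says is zero."""
    dw_dt = (gaussian_packet(q, t + dt, D) - gaussian_packet(q, t - dt, D)) / (2.0 * dt)
    d2w_dq2 = (gaussian_packet(q + dq, t, D) - 2.0 * gaussian_packet(q, t, D)
               + gaussian_packet(q - dq, t, D)) / dq ** 2
    return dw_dt - D * d2w_dq2


for q in (-1.0, 0.0, 0.5, 2.0):
    # residuals are tiny compared with dw/dt itself (about 0.14 in magnitude at q = 0, t = 1)
    print(q, diffusion_residual(q, t=1.0))
```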


57 If U₀ is comparable with T, the system’s behavior also depends substantially on the initial probability distribution, i.e., does not follow the simple law (111).

58 See, e.g., either R. Stratonovich, Topics in the Theory of Random Noise, vol. 1, Gordon and Breach, 1963, or Chapter 1 in the monograph by W. Coffey et al., cited above.

59 By the way, the goal of the traditional definition (78) of the diffusion coefficient, leading to the front 
coefficient 2 in Eq. (77), is exactly to have the fundamental equations (114) and (116) free of numerical 
coefficients. 
The simple and important equation of diffusion (114) may be naturally generalized to 3D motion:⁶⁰

$$\frac{\partial w}{\partial t} = D\nabla^2 w. \tag{5.116}$$

Now let us compare this equation with the probability conservation law,⁶¹

$$\frac{\partial w}{\partial t} + \nabla\cdot\mathbf{j}_w = 0, \tag{5.117a}$$

where the vector j_w has the physical sense of the probability current density. (The validity of this relation is evident from its integral form,

$$\frac{d}{dt}\int_V w\, d^3q + \oint_S \mathbf{j}_w\cdot d^2\mathbf{q} = 0, \tag{5.117b}$$

which results from the integration of Eq. (117a) over an arbitrary time-independent volume V limited by surface S, and applying the divergence theorem⁶² to the second term.) The continuity relation (117a) coincides with Eq. (116), with D given by Eq. (78), only if we take

$$\mathbf{j}_w = -D\nabla w = -\frac{T}{\eta}\nabla w. \tag{5.118}$$

The first form of this relation allows a simple interpretation: the probability flow is proportional to the spatial gradient of the probability density (i.e., in application to N >> 1 similar and independent particles, just to the gradient of their concentration n = Nw), with the sign corresponding to the flow from higher to lower concentrations. This flow is the very essence of the effect of diffusion. The second form of Eq. (118) is also not very surprising: the diffusion speed scales as temperature and is inversely proportional to the viscous drag.


The fundamental Eq. (117) has to be satisfied also in the case of a force-driven particle at negligible diffusion (D → 0); in this case

$$\mathbf{j}_w = w\mathbf{v}, \tag{5.119}$$

where v is the deterministic velocity of the particle. In the high-damping limit we are considering right now, v has to be just the drift velocity:

$$\mathbf{v} = \frac{1}{\eta}\mathbf{F}_{\rm det} = -\frac{1}{\eta}\nabla U(\mathbf{q}), \tag{5.120}$$

where F_det is the deterministic force described by the potential energy U(q).


Now that we have descriptions of j_w due to both the drift and the diffusion separately, we may rationally assume that in the general case when both effects are present, the corresponding components (118) and (119) of the probability current just add up, so that


60 As will be discussed in Chapter 6, the equation of diffusion also describes several other physical phenomena — 
in particular, the heat propagation in a uniform, isotropic solid, and in this context is called the heat conduction 
equation or (rather inappropriately) just the “heat equation”. 

61 Both forms of Eq. (117) are similar to the mass conservation law in classical dynamics (see, e.g., CM Sec. 8.2), the electric charge conservation law in electrodynamics (see, e.g., EM Sec. 4.1), and the probability conservation law in quantum mechanics (see, e.g., QM Sec. 1.4).

62 See, e.g., MA Eq. (12.2).
$$\mathbf{j}_w = \frac{1}{\eta}\left[w(-\nabla U) - T\nabla w\right], \tag{5.121}$$

and Eq. (117a) takes the form

$$\eta\frac{\partial w}{\partial t} = \nabla\cdot(w\nabla U) + T\nabla^2 w. \tag{5.122}$$

This is the Smoluchowski equation,⁶³ which is closely related to the drift-diffusion equation in multi-particle kinetics, to be discussed in the next chapter.


As a sanity check, let us see what the Smoluchowski equation gives in the stationary limit, ∂w/∂t → 0 (which evidently may be eventually achieved only if the deterministic potential U is time-independent). Then Eq. (117a) yields j_w = const, where the constant describes the deterministic motion of the system as a whole. If such a motion is absent, j_w = 0, then according to Eq. (121),

$$w\nabla U + T\nabla w = 0, \quad \text{i.e.} \quad \frac{\nabla w}{w} = -\frac{\nabla U}{T}. \tag{5.123}$$

Since the left-hand side of the last relation is just ∇(ln w), it may be easily integrated over q, giving

$$\ln w = -\frac{U}{T} + \ln C, \quad \text{i.e.} \quad w(\mathbf{q}) = C\exp\left\{-\frac{U(\mathbf{q})}{T}\right\}, \tag{5.124}$$

where C is a normalization constant. With both sides multiplied by the number N of similar, independent systems, with the spatial density n(q) = Nw(q), this equality becomes the Boltzmann distribution (3.26).
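The approach of a distribution to the Boltzmann form (124) may be followed directly by integrating Eq. (122) numerically. The sketch below uses a simple explicit finite-volume scheme (an illustrative choice) in 1D, for a hypothetical harmonic potential U = κq²/2 on a finite interval with zero probability current at both ends; the currents through the cell faces follow Eq. (121). All parameter values are arbitrary; the time step respects the usual explicit-diffusion stability bound dt < ηh²/2T, where h is the cell size.

```python
import math


def smoluchowski_1d(dUdq, T, eta, x_min, x_max, w0, dt, n_steps):
    """Explicit finite-volume integration of the 1D Smoluchowski equation (5.122),
    eta dw/dt = d/dq (w dU/dq) + T d^2w/dq^2, on [x_min, x_max] with closed walls.

    w0 holds the initial cell-averaged density; the probability current through each
    interior cell face is taken from Eq. (5.121): I_w = [-w dU/dq - T dw/dq] / eta.
    """
    n = len(w0)
    h = (x_max - x_min) / n
    w = list(w0)
    faces = [x_min + (i + 1) * h for i in range(n - 1)]   # interior faces only
    force = [dUdq(xf) for xf in faces]                     # U is time-independent here
    flux = [0.0] * (n + 1)                                 # flux[0], flux[n]: zero at the walls
    for _ in range(n_steps):
        for i in range(n - 1):
            w_face = 0.5 * (w[i] + w[i + 1])
            flux[i + 1] = (-w_face * force[i] - T * (w[i + 1] - w[i]) / h) / eta
        for i in range(n):
            w[i] -= dt / h * (flux[i + 1] - flux[i])
    return w


# Hypothetical test case: harmonic well U = kappa q^2 / 2, all probability initially near q = 2.
kappa, T, eta = 1.0, 1.0, 1.0
x_min, x_max, n = -5.0, 5.0, 100
h = (x_max - x_min) / n
centers = [x_min + (i + 0.5) * h for i in range(n)]
w0 = [0.0] * n
w0[70] = 1.0 / h                          # the cell centered at q = 2.05
w = smoluchowski_1d(lambda q: kappa * q, T, eta, x_min, x_max, w0, dt=2e-3, n_steps=7500)

boltz = [math.exp(-kappa * x * x / (2 * T)) for x in centers]   # Eq. (5.124), then normalized
norm = sum(boltz) * h
boltz = [b / norm for b in boltz]
print("total probability:", sum(w) * h)
print("max |w - w_Boltzmann|:", max(abs(x - y) for x, y in zip(w, boltz)))
```

Because the scheme is written in terms of face currents, the total probability is conserved up to round-off, mirroring the continuity equation (117); the run time (here, 15 units of η/κ) covers many relaxation times of the slowest mode.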


As a less trivial example of the Smoluchowski equation’s applications, let us use it to solve the 1D Kramers problem (Fig. 10) in the corresponding high-damping limit, m << ητ_A, where τ_A (still to be calculated) is some time scale of the particle’s motion inside the well. It is straightforward to verify that the 1D version of Eq. (121),

$$I_w = \frac{1}{\eta}\left[w\left(-\frac{\partial U}{\partial q}\right) - T\frac{\partial w}{\partial q}\right] \tag{5.125a}$$

(where I_w is the probability current at a certain point q, rather than its density) is mathematically equivalent to

$$I_w = -\frac{T}{\eta}\exp\left\{-\frac{U(q)}{T}\right\}\frac{\partial}{\partial q}\left[w\exp\left\{\frac{U(q)}{T}\right\}\right], \tag{5.125b}$$

so that we may write

$$I_w\exp\left\{\frac{U(q)}{T}\right\} = -\frac{T}{\eta}\frac{\partial}{\partial q}\left[w\exp\left\{\frac{U(q)}{T}\right\}\right]. \tag{5.126}$$

As was discussed above, the notion of the metastable state’s lifetime is well defined only for sufficiently low temperatures


63 It is named after Marian Smoluchowski, who developed this formalism in 1906, apparently independently of Einstein’s slightly earlier work, but in much more detail. This equation has important applications in many fields of science, including such surprising topics as statistics of spikes in neural networks. (Note, however, that in some non-physical fields, Eq. (122) is referred to as the Fokker-Planck equation, while actually, the latter equation is much more general; see the next section.)




$$T \ll U_0, \tag{5.127}$$

when the lifetime is relatively long: τ >> τ_A. Since, according to Eq. (111a), the first term of the continuity equation (117b) has to be of the order of W/τ, in this limit the term, and hence the gradient of I_w, are exponentially small, so the probability current virtually does not depend on q in the potential barrier region. Let us use this fact in the integration of both sides of Eq. (126) over that region:

$$I_w\int_{q'}^{q''}\exp\left\{\frac{U(q)}{T}\right\}dq = -\frac{T}{\eta}\left[w\exp\left\{\frac{U(q)}{T}\right\}\right]_{q'}^{q''}, \tag{5.128}$$

where the integration limits q' and q'' (see Fig. 10) are selected so that

$$T \ll U(q') - U(q_1),\; U(q_2) - U(q'') \ll U_0. \tag{5.129}$$


(Obviously, such a selection is only possible if the condition (127) is satisfied.) In this limit, the contribution from the point q'' to the right-hand side of Eq. (128) is negligible because the probability density behind the barrier is exponentially small. On the other hand, the probability at the point q' has to be close to the value given by its quasi-stationary Boltzmann distribution (124), so that

$$w(q')\exp\left\{\frac{U(q')}{T}\right\} = w(q_1)\exp\left\{\frac{U(q_1)}{T}\right\}, \tag{5.130}$$

and Eq. (128) yields

$$I_w = \frac{T}{\eta}\, w(q_1)\exp\left\{\frac{U(q_1)}{T}\right\}\Big/\int_{q'}^{q''}\exp\left\{\frac{U(q)}{T}\right\}dq. \tag{5.131}$$

Patience, my reader, we are almost done. The probability density w(q₁) at the well’s bottom may be expressed in terms of the total probability W of the particle being in the well by using the normalization condition

$$W = \int_{\text{well's bottom}} w(q)\,dq = w(q_1)\int_{\text{well's bottom}}\exp\left\{-\frac{U(q)-U(q_1)}{T}\right\}dq; \tag{5.132}$$

the integration here may be limited to the region where the difference U(q) − U(q₁) is much smaller than U₀; cf. Eq. (129). According to the Taylor expansion, the shape of virtually any smooth potential U(q) near the point q₁ of its minimum may be well approximated with a quadratic parabola:

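$$U(q) \approx U(q_1) + \frac{\kappa_1}{2}(q-q_1)^2, \quad \text{where } \kappa_1 \equiv \left.\frac{d^2U}{dq^2}\right|_{q=q_1} > 0. \tag{5.133}$$

With this approximation, Eq. (132) is reduced to the standard Gaussian integral:⁶⁴

$$W = w(q_1)\int_{\text{well's bottom}}\exp\left\{-\frac{\kappa_1(q-q_1)^2}{2T}\right\}dq \approx w(q_1)\int_{-\infty}^{+\infty}\exp\left\{-\frac{\kappa_1\tilde q^2}{2T}\right\}d\tilde q = w(q_1)\left(\frac{2\pi T}{\kappa_1}\right)^{1/2}. \tag{5.134}$$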


To complete the calculation, we may use a similar approximation for the barrier top: 


64 If necessary, see MA Eq. (6.9b) again. 
$$U(q) \approx U(q_2) - \frac{\kappa_2}{2}(q-q_2)^2 = U(q_1) + U_0 - \frac{\kappa_2}{2}(q-q_2)^2, \quad \text{where } \kappa_2 \equiv -\left.\frac{d^2U}{dq^2}\right|_{q=q_2} > 0, \tag{5.135}$$

and work out the remaining integral in Eq. (131), because in the limit (129) it is dominated by the contribution from a region very close to the barrier top, where the approximation (135) is asymptotically exact. As a result, we get

$$\int_{q'}^{q''}\exp\left\{\frac{U(q)}{T}\right\}dq \approx \exp\left\{\frac{U(q_1)+U_0}{T}\right\}\left(\frac{2\pi T}{\kappa_2}\right)^{1/2}. \tag{5.136}$$

Plugging Eq. (136), and the w(q₁) expressed from Eq. (134), into Eq. (131), we finally get

$$I_w = W\,\frac{(\kappa_1\kappa_2)^{1/2}}{2\pi\eta}\exp\left\{-\frac{U_0}{T}\right\}. \tag{5.137}$$


This expression should be compared with the 1D version of Eq. (117b) for the segment [−∞, q']. Since this interval covers the region near q₁ where most of the probability density resides, and I_w(−∞) = 0, this equation is merely

$$\frac{dW}{dt} + I_w(q') = 0. \tag{5.138}$$

In our approximation, I_w(q') does not depend on the exact position of the point q', and is given by Eq. (137), so that plugging it into Eq. (138), we recover the exponential decay law (111a), with the lifetime τ obeying the Arrhenius law (111b), and the following attempt time:

$$\tau_A = \frac{2\pi\eta}{(\kappa_1\kappa_2)^{1/2}} \equiv 2\pi(\tau_1\tau_2)^{1/2}, \quad \text{where } \tau_{1,2} \equiv \frac{\eta}{\kappa_{1,2}}. \tag{5.139}$$


Thus the metastable state’s lifetime is indeed described by the Arrhenius law, with the attempt time scaling as the geometric mean of the system’s “relaxation times” near the potential well bottom (τ₁) and the potential barrier top (τ₂).⁶⁵ Let me leave it for the reader’s exercise to prove that if the potential profile near the well’s bottom and/or top is sharp, the expression for the attempt time should be modified, but the Arrhenius decay law (111) is not affected.
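The accuracy of the Gaussian approximations (134) and (136) may be checked numerically. The sketch below takes a hypothetical cubic potential U(q) = κq²/2 − bq³/3 (well bottom at q₁ = 0, barrier top at q₂ = κ/b, U₀ = κ³/6b², and κ₁ = κ₂ = κ), evaluates the integrals in Eqs. (131) and (132) by quadrature, with q' taken at the midpoint between q₁ and q₂, and compares the resulting τ = W/I_w with the Arrhenius law (111b) with the attempt time (139). The ratio should approach 1 as U₀/T grows.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return s * h / 3.0


# Hypothetical metastable potential: U(q) = kappa q^2/2 - b q^3/3,
# with the well bottom at q1 = 0 and the barrier top at q2 = kappa/b.
kappa, b, eta = 1.0, 1.0, 1.0
U = lambda q: 0.5 * kappa * q * q - b * q ** 3 / 3.0
q1, q2 = 0.0, kappa / b
U0 = U(q2) - U(q1)          # Eq. (5.110): kappa^3 / (6 b^2)
kappa1 = kappa              # U''(q1), Eq. (5.133)
kappa2 = kappa              # -U''(q2) = 2 b q2 - kappa, Eq. (5.135)


def lifetime_quadrature(T):
    """tau = W / I_w, with W from Eq. (5.132) and I_w from Eq. (5.131),
    both integrals evaluated numerically instead of by the Gaussian approximations."""
    q_split = 0.5 * (q1 + q2)    # plays the role of q' in Eqs. (5.128)-(5.131)
    well = simpson(lambda q: math.exp(-(U(q) - U(q1)) / T), q1 - q2, q_split)         # W / w(q1)
    barrier = simpson(lambda q: math.exp((U(q) - U(q1)) / T), q_split, q2 + q_split)  # q' to q''
    return (eta / T) * well * barrier


def lifetime_kramers(T):
    """Arrhenius law (5.111b) with the high-damping attempt time (5.139)."""
    tau_A = 2.0 * math.pi * eta / math.sqrt(kappa1 * kappa2)
    return tau_A * math.exp(U0 / T)


for ratio_U0_T in (5, 10, 20):
    T = U0 / ratio_U0_T
    print(ratio_U0_T, lifetime_quadrature(T) / lifetime_kramers(T))   # tends to 1 as U0/T grows
```

The residual deviation from unity comes from the anharmonic (cubic) corrections to the Gaussian integrals, which are of the relative order T/U₀.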


5.7. The Fokker-Planck equation 


Formula (139) is just a particular, high-damping limit of a more general result obtained by Kramers. In order to get all of it (and much more), we need to generalize the Smoluchowski equation to arbitrary values of damping η. In this case, the probability density w is a function of not only the particle’s position q (and time t) but also of its momentum p; see Eq. (2.11). Thus the continuity equation (117) needs to be generalized to the 6D phase space {q, p}. Such a generalization is natural:


65 Actually, τ₂ describes the characteristic time of the exponential growth of small deviations from the unstable fixed point q₂ at the barrier top, rather than their decay, as near the stable point q₁.
$$\frac{\partial w}{\partial t} + \nabla_q\cdot\mathbf{j}_q + \nabla_p\cdot\mathbf{j}_p = 0, \tag{5.140}$$

where j_q (which was called j_w in the last section) is the probability current density in the coordinate space, and ∇_q (which was denoted as ∇ in that section) is the usual vector operator in this space, while j_p is the current density in the momentum space, and ∇_p is the similar vector operator in that space:

$$\nabla_q \equiv \sum_{j=1}^{3}\mathbf{n}_j\frac{\partial}{\partial q_j}, \qquad \nabla_p \equiv \sum_{j=1}^{3}\mathbf{n}_j\frac{\partial}{\partial p_j}. \tag{5.141}$$

At negligible fluctuations (T → 0), j_p may be composed using the natural analogy with j_q; see Eq. (119). In our new notation, that relation reads

$$\mathbf{j}_q = w\dot{\mathbf{q}} = w\frac{\mathbf{p}}{m}, \tag{5.142}$$

so it is natural to take

$$\mathbf{j}_p = w\dot{\mathbf{p}} = w\langle\mathbf{F}\rangle, \tag{5.143a}$$


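where the (statistical-ensemble) averaged force ⟨F⟩ includes not only the contribution due to the potential’s gradient, but also the drag force −ηv provided by the environment; see Eq. (64) and its discussion:

$$\mathbf{j}_p = w(-\nabla_q U - \eta\mathbf{v}) = -w\left(\nabla_q U + \eta\frac{\mathbf{p}}{m}\right). \tag{5.143b}$$

As a sanity check, it is straightforward to verify that the diffusion-free equation resulting from the combination of Eqs. (140), (142), and (143),

$$\frac{\partial w}{\partial t} = -\nabla_q\cdot\left(w\frac{\mathbf{p}}{m}\right) + \nabla_p\cdot\left[w\left(\nabla_q U + \eta\frac{\mathbf{p}}{m}\right)\right], \tag{5.144}$$

allows the following particular solution:

$$w(\mathbf{q},\mathbf{p},t) = \delta\left[\mathbf{q} - \langle\mathbf{q}\rangle(t)\right]\,\delta\left[\mathbf{p} - \langle\mathbf{p}\rangle(t)\right], \tag{5.145}$$

where the statistical-averaged coordinate and momentum satisfy the deterministic equations of motion,

$$\langle\dot{\mathbf{q}}\rangle = \frac{\langle\mathbf{p}\rangle}{m}, \qquad \langle\dot{\mathbf{p}}\rangle = -\nabla_q U - \eta\frac{\langle\mathbf{p}\rangle}{m}, \tag{5.146}$$

describing the particle’s drift, with the usual deterministic initial conditions.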


In order to understand how the diffusion should be accounted for, let us consider a statistical ensemble of free (∇_q U = 0, η → 0) particles that are uniformly distributed in the direct space q (so that ∇_q w = 0), but possibly localized in the momentum space. For this case, the right-hand side of Eq. (144) vanishes, i.e. the time evolution of the probability density w may only be due to diffusion. In the corresponding limit ⟨F⟩ → 0, the Langevin equation (107) for each Cartesian coordinate is reduced to

$$m\ddot q_j = \tilde F_j(t), \quad \text{i.e.} \quad \dot p_j = \tilde F_j(t). \tag{5.147}$$
The last equation is identical to the high-damping 1D equation (74) (with F_det = 0), with the replacement q → p_j/η, and hence the corresponding contribution to ∂w/∂t may be described by the last term of Eq. (122), with that replacement:

$$\left.\frac{\partial w}{\partial t}\right|_{\rm diffusion} = D\nabla^2_{p/\eta}w = \eta^2 D\nabla_p^2 w = \eta T\nabla_p^2 w. \tag{5.148}$$

Now the reasonable assumption that in the arbitrary case the drift and diffusion contributions to ∂w/∂t just add up immediately leads us to the full Fokker-Planck equation:⁶⁶

$$\frac{\partial w}{\partial t} = -\nabla_q\cdot\left(w\frac{\mathbf{p}}{m}\right) + \nabla_p\cdot\left[w\left(\nabla_q U + \eta\frac{\mathbf{p}}{m}\right)\right] + \eta T\nabla_p^2 w. \tag{5.149}$$


As a sanity check, let us use this equation to calculate the stationary probability distribution of the momentum of particles with an arbitrary damping η but otherwise free, in the momentum space, assuming (just for simplicity) their uniform distribution in the direct space, ∇_q w = 0. In this case, Eq. (149) is reduced to

$$\nabla_p\cdot\left[w\left(\eta\frac{\mathbf{p}}{m}\right)\right] + \eta T\nabla_p^2 w = 0, \quad \text{i.e.} \quad \nabla_p\cdot\left(\frac{\mathbf{p}}{m}w + T\nabla_p w\right) = 0. \tag{5.150}$$

The first integration over the momentum space yields

$$\frac{\mathbf{p}}{m}w + T\nabla_p w = \mathbf{j}_w, \quad \text{i.e.} \quad w\nabla_p\left(\frac{p^2}{2m}\right) + T\nabla_p w = \mathbf{j}_w, \tag{5.151}$$

where j_w is a vector constant describing a possible general probability flow in the system. In the absence of such flow, j_w = 0, we get

$$\nabla_p\left(\frac{p^2}{2m}\right) + T\frac{\nabla_p w}{w} = \nabla_p\left(\frac{p^2}{2m} + T\ln w\right) = 0, \quad \text{giving} \quad w = {\rm const}\times\exp\left\{-\frac{p^2}{2mT}\right\}, \tag{5.152}$$

i.e. the Maxwell distribution (3.5). However, the result (152) is more general than that obtained in Sec. 3.1, because it shows that the distribution stays the same even at non-zero damping. It is easy to verify that in the more general case of an arbitrary stationary potential U(q), Eq. (149) is satisfied with the stationary solution (3.24), also giving j_w = 0.
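The independence of the stationary distribution (152) from damping may be illustrated by a simple Monte Carlo sketch. For free particles, the momentum-space part of Eq. (149) corresponds to the linear Langevin equation dp/dt = −ηp/m + F̃(t) for each Cartesian component, which, assuming white noise with K_F(τ) = 2ηTδ(τ), is an Ornstein-Uhlenbeck process with an exactly known Gaussian transition law. The code below (with hypothetical names and parameters) uses that law to evolve particles from rest, and compares ⟨p²⟩ with the Maxwell value mT for damping values spanning four orders of magnitude.

```python
import math
import random


def damped_free_momenta(m, eta, T, t, n_particles, seed=1):
    """One Cartesian momentum component of free particles at time t, all starting from p = 0.

    Each obeys dp/dt = -eta p / m + F~(t), with white noise K_F(tau) = 2 eta T delta(tau).
    This linear equation is an Ornstein-Uhlenbeck process, whose transition law is Gaussian,
    so every particle is advanced in a single exact step:
    mean p0 exp(-eta t/m), variance m T [1 - exp(-2 eta t/m)]."""
    rng = random.Random(seed)
    decay = math.exp(-eta * t / m)
    spread = math.sqrt(m * T * (1.0 - decay ** 2))
    p0 = 0.0                                   # a far-from-Maxwellian initial state
    return [p0 * decay + spread * rng.gauss(0.0, 1.0) for _ in range(n_particles)]


m, T = 1.0, 1.0
for eta in (0.01, 1.0, 100.0):                 # damping varied by four orders of magnitude
    ps = damped_free_momenta(m, eta, T, t=20.0 * m / eta, n_particles=40_000)
    mean_p2 = sum(p * p for p in ps) / len(ps)
    print(f"eta = {eta:g}: <p^2>/(mT) = {mean_p2 / (m * T):.3f}")   # close to 1, as Eq. (5.152) implies
```

Damping sets only how fast the Maxwell distribution is reached (the time scale m/η), not the distribution itself.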


It is also straightforward to show that if the damping is large (in the sense assumed in the last section), the solution of the Fokker-Planck equation tends to the following product

$$w(\mathbf{q},\mathbf{p},t) \to {\rm const}\times\exp\left\{-\frac{p^2}{2mT}\right\}\times w(\mathbf{q},t), \tag{5.153}$$

where the direct-space distribution w(q, t) obeys the Smoluchowski equation (122).


Another important particular case is that of a quasi-periodic motion of a particle, with low 
damping, in a soft potential well. In this case, the Fokker-Planck equation describes both diffusion of the 
effective phase © of such (generally nonlinear, “anharmonic”) oscillator, and slow relaxation of its 


66 It was first derived by Adriaan Fokker in 1913 in his PhD thesis, and further elaborated by Max Planck in 1917. 
(Curiously, A. Fokker is more famous for his work on music theory, and the invention and construction of several 
new keyboard instruments, than for this and several other important contributions to theoretical physics.) 


Chapter 5 Page 33 of 44 


Essential Graduate Physics SM: Statistical Mechanics 


energy. If we are only interested in the latter process, Eq. (149) may be reduced to the so-called energy diffusion equation,67 which is easier to solve.


However, in most practically interesting cases, solutions of Eq. (149) are rather complicated. (Indeed, the reader should remember that these solutions embody, in the particular case T = 0, all classical dynamics of a particle.) Because of this, I will present (rather than derive) only one more of them: the solution of the Kramers problem (Fig. 10). Acting almost exactly as in Sec. 6, one can show68 that at virtually arbitrary damping (but still in the limit $T \ll U_0$), the metastable state’s lifetime is again given by the Arrhenius formula (111b), with the attempt time again expressed by the first of Eqs. (139), but with the reciprocal time constants $1/\tau_{1,2}$ replaced with


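$$\omega'_{1,2} = \left[\omega_{1,2}^2 + \left(\frac{\eta}{2m}\right)^2\right]^{1/2} - \frac{\eta}{2m} \to \begin{cases}\omega_{1,2}, & \text{for } \eta \ll m\omega_{1,2},\\ 1/\tau_{1,2}, & \text{for } m\omega_{1,2} \ll \eta,\end{cases} $$ (5.154)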


where $\omega_{1,2} \equiv (\kappa_{1,2}/m)^{1/2}$, and $\kappa_{1,2}$ are the effective spring constants defined by Eqs. (133) and (135). Thus, in the important particular limit of low damping, Eqs. (111b) and (154) give the famous formula


Kramers formula for low damping:

$$\tau = \frac{2\pi}{(\omega_1\omega_2)^{1/2}}\exp\left\{\frac{U_0}{T}\right\}. $$ (5.155)
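Eqs. (154) and (155) are easy to evaluate numerically. The Python sketch below is an illustration and not the text's own procedure. The function names are hypothetical. The attempt time is taken as $\tau_A = 2\pi/(\omega'_1\omega'_2)^{1/2}$, which is an assumption about the form the first of Eqs. (139) takes once $1/\tau_{1,2}$ is replaced with $\omega'_{1,2}$. The code computes $\omega'$ in the algebraically equivalent form $\omega^2/\{[\omega^2 + (\eta/2m)^2]^{1/2} + \eta/2m\}$, which avoids subtracting two nearly equal numbers at high damping.

```python
import math

def omega_prime(omega, eta, m):
    """Eq. (154): [omega^2 + (eta/2m)^2]^(1/2) - eta/2m, written in the equivalent form
    omega^2 / ([omega^2 + (eta/2m)^2]^(1/2) + eta/2m) to avoid cancellation at large eta."""
    d = eta / (2.0 * m)
    return omega**2 / (math.sqrt(omega**2 + d**2) + d)

def kramers_lifetime(kappa1, kappa2, eta, m, U0, T):
    """Arrhenius lifetime tau_A * exp(U0/T), with the assumed attempt time
    tau_A = 2*pi / (omega1' * omega2')^(1/2), where omega_{1,2} = (kappa_{1,2}/m)^(1/2)."""
    w1 = omega_prime(math.sqrt(kappa1 / m), eta, m)
    w2 = omega_prime(math.sqrt(kappa2 / m), eta, m)
    return 2.0 * math.pi / math.sqrt(w1 * w2) * math.exp(U0 / T)

# The two limits of Eq. (154), for m = kappa = 1 (so that omega = 1):
for eta in (1e-3, 1.0, 1e3):
    print(eta, omega_prime(1.0, eta, 1.0), 1.0 / eta)  # near 1 at low damping, near kappa/eta at high
```

At low damping the sketch reproduces the Kramers formula (155). At high damping $\omega'_{1,2}$ approaches $\kappa_{1,2}/\eta = 1/\tau_{1,2}$.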


This Kramers’ result for the classical thermal activation of the dissipation-free system over a potential barrier may be compared with that for its quantum-mechanical tunneling through the barrier.69 The WKB approximation for the latter effect gives the expression


$$\tau_Q = \tau_A\exp\left\{2\int_{\kappa^2(q)>0}\kappa(q)\,dq\right\}, \quad\text{with}\quad \frac{\hbar^2\kappa^2(q)}{2m} = U(q) - E, $$ (5.156)


showing that generally, the classical and quantum lifetimes of a metastable state have different 
dependences on the barrier shape. For example, for a nearly-rectangular potential barrier, the exponent 
that determines the classical lifetime (155) depends (linearly) only on the barrier height $U_0$, while that defining the quantum lifetime (156) is proportional to the barrier width and to the square root of $U_0$.
However, in the important case of “soft” potential profiles, which are typical for the case of barely 
emerging (or nearly disappearing) quantum wells (Fig. 11), the classical and quantum results are closely 
related. 


Fig. 5.11. Cubic-parabolic potential profile U(q) and its parameters.


67 An example of such an equation, for the particular case of a harmonic oscillator, is given by QM Eq. (7.214). 
The Fokker-Planck equation, of course, can give only its classical limit, with $n, \langle n\rangle \gg 1$.

68 A detailed description of this calculation (first performed by H. Kramers in 1940) may be found, for example, 
in Sec. III.7 of the review paper by S. Chandrasekhar, Rev. Mod. Phys. 15, 1 (1943). 

69 See, e.g., QM Secs. 2.4-2.6. 




Indeed, such a potential profile U(q) may be well approximated by the four leading terms of its Taylor expansion, with the highest term proportional to $(q - q_0)^3$, near any point $q_0$ in the vicinity of the well. In this approximation, the second derivative $d^2U/dq^2$ vanishes at the inflection point $q_0 = (q_1 + q_2)/2$, exactly between the well’s bottom and the barrier’s top (in Fig. 11, $q_1$ and $q_2$). Selecting the origin at this point, as is done in Fig. 11, we may reduce the approximation to just two terms:70


$$U(q) = aq - \frac{b}{3}q^3. $$ (5.157)
(For the particle’s escape into the positive direction of the q-axis, we should have a, b > 0.) An easy
calculation gives all essential parameters of this cubic parabola: the positions of its minimum and 
maximum: 


$$q_2 = -q_1 = (a/b)^{1/2}, $$ (5.158)
the barrier height over the well’s bottom: 
$$U_0 \equiv U(q_2) - U(q_1) = \frac{4}{3}a\left(\frac{a}{b}\right)^{1/2}, $$ (5.159)

and the effective spring constants at these points: 

$$\kappa_1 = \kappa_2 \equiv \left|\frac{d^2U}{dq^2}\right|_{q=q_{1,2}} = 2(ab)^{1/2}. $$ (5.160)


The last expression shows that for this potential profile, the frequencies $\omega_{1,2}$ participating in Eq. (155) are equal to each other, so that this result may be rewritten as


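$$\tau_T = \frac{2\pi}{\omega_0}\exp\left\{\frac{U_0}{T}\right\}, \quad\text{with}\quad \omega_0^2 \equiv \frac{2(ab)^{1/2}}{m}. $$ (5.161)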


On the other hand, for the same profile, the WKB approximation (156) (which is accurate when the height of the metastable state’s energy over the well’s bottom, $E - U(q_1) \approx \hbar\omega_0/2$, is much lower than the barrier height $U_0$) yields71


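$$\tau_Q = \frac{2\pi}{\omega_0}\left(\frac{\hbar\omega_0}{864\pi U_0}\right)^{1/2}\exp\left\{\frac{36}{5}\frac{U_0}{\hbar\omega_0}\right\}. $$ (5.162)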


The comparison of the dominating, exponential factors in these two results shows that the 
thermal activation yields a lower lifetime (i.e., dominates the metastable state decay) if the temperature
is above the crossover value 


$$T_c = \frac{5}{36}\hbar\omega_0 \approx \frac{\hbar\omega_0}{7.2}. $$ (5.163)


70 As a reminder, a similar approximation arises for the P(V) function, at the analysis of the van der Waals model 
near the critical temperature — see Problem 4.6. 

71 The main, exponential factor in this result may be obtained simply by ignoring the difference between E and $U(q_1)$, but the correct calculation of the pre-exponential factor requires taking this difference, $\hbar\omega_0/2$, into account
— see, e.g., the model solution of QM Problem 2.43. 
This expression for the cubic-parabolic barrier may be compared with the similar crossover for a quadratic-parabolic barrier,72 for which $T_c = \hbar\omega/2\pi \approx \hbar\omega/6.28$. We see that the numerical factors in the denominators of the quantum-to-classical crossover temperatures for these two different soft potential profiles (7.2 and 6.28) are close to each other, and much larger than 1, the value that could result from a naive estimate $T_c \sim \hbar\omega$.
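The numerical factor 36/5 = 7.2 behind Eq. (163) can be checked with a short Python sketch, which uses a hypothetical helper and arbitrary units. The sketch evaluates the parameters (158)-(160) of the cubic parabola (157). It then computes the WKB exponent $2\int\kappa(q)\,dq$ of Eq. (156) by the midpoint rule at $E = U(q_1)$, the simplification mentioned in footnote 71. The crossover temperature is defined as the one at which the thermal exponent $U_0/T$ of Eq. (161) equals that WKB exponent.

```python
import numpy as np

def cubic_barrier(a, b, m=1.0, hbar=1.0, n=400_000):
    """Parameters of U(q) = a q - (b/3) q^3 (Eqs. 157-160) and a numerical WKB exponent."""
    U = lambda q: a * q - b * q**3 / 3.0
    s = np.sqrt(a / b)
    q1, q2 = -s, s                        # well bottom and barrier top, Eq. (158)
    U0 = U(q2) - U(q1)                    # barrier height, Eq. (159)
    kappa = 2.0 * np.sqrt(a * b)          # |d2U/dq2| at q1 and q2, Eq. (160)
    omega0 = np.sqrt(kappa / m)
    # At E = U(q1) the classically forbidden region runs from q1 to q = 2s,
    # where U(q) returns to U(q1); integrate kappa(q) there by the midpoint rule.
    h = 3.0 * s / n
    q = q1 + h * (np.arange(n) + 0.5)
    k = np.sqrt(np.clip(2.0 * m * (U(q) - U(q1)), 0.0, None)) / hbar
    wkb_exponent = 2.0 * np.sum(k) * h
    return {"U0": U0, "omega0": omega0, "wkb_exponent": wkb_exponent,
            "T_c": U0 / wkb_exponent}     # temperature at which U0/T equals the WKB exponent

r = cubic_barrier(a=1.0, b=1.0)
print(r["wkb_exponent"] / (r["U0"] / r["omega0"]))   # close to 36/5 = 7.2 (hbar = 1)
print(r["omega0"] / r["T_c"])                        # close to 7.2, i.e. T_c = hbar*omega0/7.2
```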


5.8. Back to the correlation function 


Unfortunately, I will not have time/space to either derive or even review solutions of other 
problems using the Smoluchowski and Fokker-Planck equations, but have to mention one conceptual 
issue. Since it is intuitively clear that the solution w(q, p, t) of the Fokker-Planck equation for a system
provides the complete statistical information about it, one may wonder how it may be used to find its 
temporal characteristics that were discussed in Secs. 4-5, using the Langevin formalism. For any 
statistical average of a function taken at the same time instant, the answer is clear — cf. Eq. (2.11): 


$$\langle f[\mathbf{q}(t),\mathbf{p}(t)]\rangle = \int f(\mathbf{q},\mathbf{p})\,w(\mathbf{q},\mathbf{p},t)\,d^3q\,d^3p, $$ (5.164)


but what if the function depends on variables taken at different times, for example as in the correlation 
function $K_f(\tau)$ defined by Eq. (48)?


To answer this question, let us start from the discrete-variable case when Eq. (164) takes the 
form (2.7), which, for our current purposes, may be rewritten as 


$$\langle f(t)\rangle = \sum_m f_m W_m(t). $$ (5.165)


In plain English, this is a sum of all possible values of the function, each multiplied by its probability as a function of time. But this implies that the average $\langle f(t)f(t')\rangle$ may be calculated as the sum of all possible products $f_mf_{m'}$, multiplied by the joint probability to measure outcome m at moment t, and outcome m' at moment t'. The joint probability may be represented as a product of $W_m(t)$ by the conditional probability $W(m', t'|m, t)$. Since the correlation function is well defined only for stationary systems, in the last expression we may take t = 0, i.e. look for the conditional probability as the solution, $W_{m'}(\tau)$, of the equation describing the system’s probability evolution, at time $\tau = t' - t$ (rather than t'), with the special initial condition


$$W_{m'}(0) = \delta_{m',m}. $$ (5.166)


On the other hand, since the average $\langle f(t)f(t+\tau)\rangle$ of a stationary process should not depend on t, instead of $W_m(0)$ we may take the stationary probability distribution $W_m(\infty)$, independent of the initial conditions, which may be found as the same special solution, but at time $\tau \to \infty$. As a result, we get


$$\langle f(t)f(t+\tau)\rangle = \sum_{m,m'} f_mW_m(\infty)\,f_{m'}W_{m'}(\tau)\Big|_{W_{m'}(0)=\delta_{m',m}}. $$ (5.167)

This expression looks simple, but note that this recipe requires solving the time evolution 
equations for each $W_{m'}(\tau)$ for all possible initial conditions (166). To see how this recipe works in
practice, let us revisit the simplest two-level system (see, e.g., Fig. 4.13, which is reproduced in Fig. 12 


72 See, e.g., QM Sec. 2.4. 
below in a notation more convenient for our current purposes), and calculate the correlation function of
its energy fluctuations. 


Fig. 5.12. Dynamics of a two-level system (states with energies $E_0 = 0$ and $E_1 = \Delta$, and probabilities $W_0(t)$ and $W_1(t)$).


The stationary probabilities of the system’s states (i.e. their probabilities for $\tau \to \infty$) have been
calculated in problems of Chapter 2, and then again in Sec. 4.4 — see Eq. (4.68). In our current notation 
(Fig. 12), 


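$$W_0(\infty) = \frac{1}{1 + e^{-\Delta/T}}, \quad W_1(\infty) = \frac{1}{e^{\Delta/T} + 1}, \quad\text{so that}\quad \langle E(\infty)\rangle = W_0(\infty)\times 0 + W_1(\infty)\times\Delta = \frac{\Delta}{e^{\Delta/T} + 1}. $$ (5.168)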


To calculate the conditional probabilities $W_{m'}(\tau)$ with the initial conditions (166) (according to Eq. (167), we need all four of them, for {m, m'} = {0, 1}), we may use the master equations (4.100), in our current notation reading
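$$\frac{dW_1}{dt} = -\frac{dW_0}{dt} = \Gamma_\uparrow W_0 - \Gamma_\downarrow W_1. $$ (5.169)

Since Eq. (169) conserves the total probability, $W_0 + W_1 = 1$, only one probability (say, $W_1$) is an independent variable, and for it, Eq. (169) gives a simple, linear differential equation

$$\frac{dW_1}{dt} = \Gamma_\uparrow(1 - W_1) - \Gamma_\downarrow W_1 \equiv \Gamma_\uparrow - \Gamma_\Sigma W_1, \quad\text{where}\quad \Gamma_\Sigma \equiv \Gamma_\uparrow + \Gamma_\downarrow, $$ (5.170)

which may be readily integrated for an arbitrary initial condition:

$$W_1(\tau) = W_1(0)\,e^{-\Gamma_\Sigma\tau} + W_1(\infty)\left(1 - e^{-\Gamma_\Sigma\tau}\right), $$ (5.171)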


where $W_1(\infty)$ is given by the second of Eqs. (168). (It is straightforward to verify that the solution for $W_0(\tau)$ may be represented in a similar form, with the corresponding change of the state index.)


Now everything is ready to calculate the average $\langle E(t)E(t+\tau)\rangle$ using Eq. (167), with $f_{m,m'} = E_{m,m'}$. Thanks to our (smart :-) choice of the energy reference, of the four terms in the double sum (167), all three terms that include at least one factor $E_0 = 0$ vanish, and we have only one term left to calculate:


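$$\langle E(t)E(t+\tau)\rangle = E_1W_1(\infty)\,E_1W_1(\tau)\Big|_{W_1(0)=1} = \frac{\Delta^2}{e^{\Delta/T}+1}\left[e^{-\Gamma_\Sigma\tau} + \frac{1}{e^{\Delta/T}+1}\left(1 - e^{-\Gamma_\Sigma\tau}\right)\right] = \frac{\Delta^2}{\left(e^{\Delta/T}+1\right)^2}\left(1 + e^{\Delta/T}e^{-\Gamma_\Sigma\tau}\right). $$ (5.172)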
From here and the last of Eqs. (168), the correlation function of energy fluctuations is73


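$$\begin{aligned} K_E(\tau) &\equiv \langle\tilde E(t)\tilde E(t+\tau)\rangle = \left\langle\left[E(t) - \langle E\rangle\right]\left[E(t+\tau) - \langle E\rangle\right]\right\rangle \\ &= \langle E(t)E(t+\tau)\rangle - \langle E(\infty)\rangle^2 = \Delta^2\frac{e^{\Delta/T}}{\left(e^{\Delta/T}+1\right)^2}e^{-\Gamma_\Sigma\tau}, \end{aligned}$$ (5.173)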
so that its variance, equal to $K_E(0)$, does not depend on the transition rates $\Gamma_\uparrow$ and $\Gamma_\downarrow$. However, since the rates have to obey the detailed balance relation (4.103), $\Gamma_\downarrow/\Gamma_\uparrow = \exp\{\Delta/T\}$, for this variance we may formally write
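$$\frac{K_E(0)}{\Delta^2} = \frac{e^{\Delta/T}}{\left(e^{\Delta/T}+1\right)^2} = \frac{\Gamma_\downarrow/\Gamma_\uparrow}{\left(\Gamma_\downarrow/\Gamma_\uparrow + 1\right)^2} = \frac{\Gamma_\uparrow\Gamma_\downarrow}{\left(\Gamma_\uparrow + \Gamma_\downarrow\right)^2} \equiv \frac{\Gamma_\uparrow\Gamma_\downarrow}{\Gamma_\Sigma^2}, $$ (5.174)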


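so that Eq. (173) may be represented in a simpler form:

Energy fluctuations, two-level system:

$$K_E(\tau) = \Delta^2\frac{\Gamma_\uparrow\Gamma_\downarrow}{\Gamma_\Sigma^2}e^{-\Gamma_\Sigma|\tau|}. $$ (5.175)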


We see that the correlation function of energy fluctuations decays exponentially with time, with the net rate $\Gamma_\Sigma$. Now using the Wiener-Khinchin theorem (58) to calculate its spectral density, we get


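$$S_E(\omega) = \frac{1}{\pi}\int_0^\infty\Delta^2\frac{\Gamma_\uparrow\Gamma_\downarrow}{\Gamma_\Sigma^2}e^{-\Gamma_\Sigma\tau}\cos\omega\tau\,d\tau = \frac{\Delta^2}{\pi}\frac{\Gamma_\uparrow\Gamma_\downarrow}{\Gamma_\Sigma}\frac{1}{\Gamma_\Sigma^2 + \omega^2}. $$ (5.176)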
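The recipe (167) is straightforward to automate for any number of discrete states. The following Python sketch is an illustration, and its function name, matrix convention, and numbers are assumptions rather than the text's notation. The master equations are written as $dW/dt = GW$, where $G_{ij}$ ($i \neq j$) is the jump rate from state j to state i and each column of G sums to zero. The conditional probabilities $W_{m'}(\tau)$ are then the columns of the propagator $\exp\{G\tau\}$, computed here from the eigen-decomposition of G. For the two-level system of Fig. 12 the sketch should reproduce Eq. (175).

```python
import numpy as np

def correlation_function(G, f, tau):
    """K_f(tau) = <f(t) f(t+tau)> - <f>^2 for a stationary discrete-state system.

    Master equations: dW/dt = G @ W, where G[i, j] (i != j) is the jump rate
    from state j to state i, and every column of G sums to zero."""
    evals, evecs = np.linalg.eig(G)
    # stationary distribution W(inf): the eigenvector with the eigenvalue closest to 0
    w_inf = np.real(evecs[:, np.argmin(np.abs(evals))])
    w_inf = w_inf / w_inf.sum()
    # propagator P = exp(G tau); its column m holds W_m'(tau) for W_m'(0) = delta_{m',m}
    P = np.real(evecs @ np.diag(np.exp(evals * tau)) @ np.linalg.inv(evecs))
    mean_f = f @ w_inf
    # double sum (167): sum over m, m' of f_m W_m(inf) f_m' P[m', m], minus <f>^2
    return f @ P @ (f * w_inf) - mean_f**2

# Two-level system of Fig. 12: E_0 = 0, E_1 = Delta (illustrative numbers)
Delta, T, g_up = 1.0, 0.5, 0.3
g_down = g_up * np.exp(Delta / T)        # detailed balance (4.103)
G = np.array([[-g_up,  g_down],
              [ g_up, -g_down]])
E = np.array([0.0, Delta])
g_sum = g_up + g_down
for tau in (0.0, 0.5, 2.0):
    analytic = Delta**2 * g_up * g_down / g_sum**2 * np.exp(-g_sum * tau)  # Eq. (175)
    print(tau, correlation_function(G, E, tau), analytic)
```

Scaling both rates by the same factor leaves $K_E(0)$ unchanged, consistent with the remark after Eq. (173) that the variance does not depend on the transition rates themselves.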


Such Lorentzian dependence on frequency is very typical for discrete-state systems described by master equations. It is interesting that the most widely accepted explanation of the 1/f noise (also called the “flicker” or “excess” noise), which was mentioned in Sec. 5, is that it is a result of thermally-activated jumps between states of two-level systems with an exponentially-broad statistical distribution of the transition rates $\Gamma_{\uparrow\downarrow}$. Such a broad distribution follows from the Kramers formula (155), which is approximately valid for the lifetimes of both states of systems with double-well potential profiles (Fig. 13), for a statistical ensemble with a smooth statistical distribution of the energy barrier heights $U_0$. Such profiles are typical, in particular, for electrons in disordered (amorphous) solid-state materials, which indeed feature high 1/f noise.
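Here is a rough numerical illustration of this mechanism, which rests on assumptions that are not in the text. Each two-level system is taken to be symmetric, $\Gamma_\uparrow = \Gamma_\downarrow = \Gamma_\Sigma/2$, and the prefactor $\Delta^2$ in Eq. (176) is set to 1 as a normalization. Each system's net rate follows an Arrhenius law $\Gamma_\Sigma = \Gamma_0\exp\{-U_0/T\}$, with $U_0/T$ spread uniformly over a broad interval, so that $\ln\Gamma_\Sigma$ is uniformly distributed. The Python sketch averages the Lorentzians (176) over such an ensemble and fits the log-log slope of the result at frequencies well inside the range of rates.

```python
import numpy as np

def ensemble_spectrum(omegas, u_max=30.0, n_u=3001, gamma0=1.0):
    """Ensemble-averaged spectral density of two-level systems (assumptions as above):
    Gamma_up = Gamma_down = Gamma/2, prefactor Delta^2 = 1, and Arrhenius rates
    Gamma = gamma0 * exp(-U0/T) with U0/T uniformly spread over [0, u_max]."""
    u = np.linspace(0.0, u_max, n_u)                  # U0/T values in the ensemble
    gammas = gamma0 * np.exp(-u)                      # rates, uniform in ln(Gamma)
    # Eq. (176) with Gamma_up*Gamma_down/Gamma_sum = Gamma/4
    lorentzians = (gammas[None, :] / 4.0) / (gammas[None, :]**2 + omegas[:, None]**2)
    return lorentzians.mean(axis=1) / np.pi

omegas = np.logspace(-9, -4, 51)        # well inside the range [gamma0*e^-30, gamma0]
slope = np.polyfit(np.log(omegas), np.log(ensemble_spectrum(omegas)), 1)[0]
print(slope)                            # close to -1, i.e. S(omega) ~ 1/omega
```

Outside the range of rates in the ensemble, the sum reverts to a single Lorentzian's behavior: flat for $\omega \ll \Gamma_{\min}$ and $\propto 1/\omega^2$ for $\omega \gg \Gamma_{\max}$.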


Fig. 5.13. Typical double-well potential profile.


Returning to the Fokker-Planck equation, we may use the following evident generalization of 
Eq. (167) to the continuous-variable case: 


73 The step from the first line of Eq. (173) to its second line utilizes the fact that our system is stationary, so that 
$\langle E(t+\tau)\rangle = \langle E(t)\rangle = \langle E(\infty)\rangle = \text{const}$.
$$\langle f(t)f(t+\tau)\rangle = \int d^3q\,d^3p\int d^3q'\,d^3p'\,f(\mathbf{q},\mathbf{p})\,w(\mathbf{q},\mathbf{p},\infty)\,f(\mathbf{q}',\mathbf{p}')\,w(\mathbf{q}',\mathbf{p}',\tau), $$ (5.177)


where both probability densities are particular values of the equation’s solution with the delta-functional initial condition

$$w(\mathbf{q}',\mathbf{p}',0) = \delta(\mathbf{q}'-\mathbf{q})\,\delta(\mathbf{p}'-\mathbf{p}). $$ (5.178)


For the Smoluchowski equation valid in the high-damping limit, the expressions are similar, albeit with 
a lower dimensionality: 


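$$\langle f(t)f(t+\tau)\rangle = \int d^3q\int d^3q'\,f(\mathbf{q})\,w(\mathbf{q},\infty)\,f(\mathbf{q}')\,w(\mathbf{q}',\tau), $$ (5.179)

$$w(\mathbf{q}',0) = \delta(\mathbf{q}'-\mathbf{q}). $$ (5.180)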


To see this formalism in action, let us use it to calculate the correlation function $K_q(\tau)$ of a linear relaxator, i.e. an overdamped 1D harmonic oscillator with $m\omega_0 \ll \eta$. In this limit, as Eq. (65) shows, the oscillator’s coordinate, averaged over the ensemble of environments, obeys a linear equation,


$$\eta\langle\dot q\rangle + \kappa\langle q\rangle = 0, $$ (5.181)


which describes its exponential relaxation from the initial position $q_0$ to the equilibrium position $q = 0$, with the reciprocal time constant $\Gamma = \kappa/\eta$:


$$\langle q\rangle(t) = q_0e^{-\Gamma t}. $$ (5.182)


The deterministic equation (181) corresponds to the quadratic potential energy $U(q) = \kappa q^2/2$, so that the 1D version of the corresponding Smoluchowski equation (122) takes the form


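$$\eta\frac{\partial w}{\partial t} = \kappa\frac{\partial}{\partial q}(wq) + T\frac{\partial^2w}{\partial q^2}. $$ (5.183)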


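It is straightforward to check, by substitution, that this equation, rewritten for the function $w(q',\tau)$, with the 1D version of the delta-functional initial condition (180), $w(q',0) = \delta(q'-q)$, is satisfied with a Gaussian function:

$$w(q',\tau) = \frac{1}{(2\pi)^{1/2}\delta q(\tau)}\exp\left\{-\frac{\left[q' - \langle q\rangle(\tau)\right]^2}{2\delta q^2(\tau)}\right\}, $$ (5.184)

with its center $\langle q\rangle(\tau)$ moving in accordance with Eq. (182), and a time-dependent variance

$$\delta q^2(\tau) = \delta q^2(\infty)\left(1 - e^{-2\Gamma\tau}\right), \quad\text{where}\quad \delta q^2(\infty) = \langle\tilde q^2\rangle = \frac{T}{\kappa}. $$ (5.185)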


(As a sanity check, the last equality coincides with the equipartition theorem’s result.) Finally, the first probability under the integral in Eq. (179) may be found from Eq. (184) in the limit $\tau \to \infty$ (in which $\langle q\rangle(\tau) \to 0$), by replacing $q'$ with $q$:

$$w(q,\infty) = \frac{1}{(2\pi)^{1/2}\delta q(\infty)}\exp\left\{-\frac{q^2}{2\delta q^2(\infty)}\right\}. $$ (5.186)
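The claim that the Gaussian (184), with the parameters (182) and (185), satisfies Eq. (183) can also be verified numerically. The Python sketch below uses hypothetical helper names and arbitrary parameter values. It evaluates the residual $\eta\,\partial w/\partial\tau - \kappa\,\partial(q'w)/\partial q' - T\,\partial^2w/\partial q'^2$ by central finite differences on a grid of $q'$, for a fixed initial position q. A residual at the level of the finite-difference error indicates that the substitution works.

```python
import numpy as np

def w_gauss(qp, tau, q, kappa, eta, T):
    """Gaussian (184) with the center (182) and the variance (185); requires tau > 0."""
    Gamma = kappa / eta
    center = q * np.exp(-Gamma * tau)
    var = (T / kappa) * (1.0 - np.exp(-2.0 * Gamma * tau))
    return np.exp(-(qp - center)**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def smoluchowski_residual(tau=0.7, q=1.3, kappa=2.0, eta=1.5, T=0.8, h=1e-4):
    """max |eta dw/dtau - kappa d(q'w)/dq' - T d2w/dq'2|, relative to max |eta dw/dtau|."""
    qp = np.linspace(-3.0, 4.0, 701)
    w = lambda x, t: w_gauss(x, t, q, kappa, eta, T)
    dw_dtau = (w(qp, tau + h) - w(qp, tau - h)) / (2.0 * h)
    d_drift = (kappa * (qp + h) * w(qp + h, tau) - kappa * (qp - h) * w(qp - h, tau)) / (2.0 * h)
    d2w = (w(qp + h, tau) - 2.0 * w(qp, tau) + w(qp - h, tau)) / h**2
    residual = eta * dw_dtau - d_drift - T * d2w
    return np.max(np.abs(residual)) / np.max(np.abs(eta * dw_dtau))

print(smoluchowski_residual())   # small, at the level of the finite-difference error
```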


Now all ingredients of the recipe (179) are ready, and we can spell it out, for f(q) = q, as 
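$$\langle q(t)q(t+\tau)\rangle = \frac{1}{2\pi\,\delta q(\infty)\,\delta q(\tau)}\int dq\,q\exp\left\{-\frac{q^2}{2\delta q^2(\infty)}\right\}\int dq'\,q'\exp\left\{-\frac{\left(q' - qe^{-\Gamma\tau}\right)^2}{2\delta q^2(\tau)}\right\}. $$ (5.187)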


The integral over $q'$ may be worked out first, by replacing this integration variable with $(q'' + qe^{-\Gamma\tau})$ and hence $dq'$ with $dq''$:


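$$\langle q(t)q(t+\tau)\rangle = \frac{1}{2\pi\,\delta q(\infty)\,\delta q(\tau)}\int q\exp\left\{-\frac{q^2}{2\delta q^2(\infty)}\right\}dq\int\left(q'' + qe^{-\Gamma\tau}\right)\exp\left\{-\frac{q''^2}{2\delta q^2(\tau)}\right\}dq''. $$ (5.188)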


The internal integral of the first term in the parentheses equals zero (as that of an odd function in 
symmetric integration limits), while that with the second term is the standard Gaussian integral, so that 


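$$\langle q(t)q(t+\tau)\rangle = \frac{e^{-\Gamma\tau}}{(2\pi)^{1/2}\delta q(\infty)}\int q^2\exp\left\{-\frac{q^2}{2\delta q^2(\infty)}\right\}dq = \frac{2}{\pi^{1/2}}\frac{T}{\kappa}e^{-\Gamma\tau}\int_{-\infty}^{+\infty}\xi^2\exp\{-\xi^2\}\,d\xi, \quad\text{with}\quad \xi \equiv \frac{q}{2^{1/2}\delta q(\infty)}. $$ (5.189)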


The last integral74 equals $\pi^{1/2}/2$, so that taking into account that for this stationary system centered at the coordinate origin, $\langle q(\infty)\rangle = 0$, we finally get a very simple result,


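$$K_q(\tau) \equiv \langle\tilde q(t)\tilde q(t+\tau)\rangle = \langle q(t)q(t+\tau)\rangle - \langle q(\infty)\rangle^2 = \langle q(t)q(t+\tau)\rangle = \frac{T}{\kappa}e^{-\Gamma\tau}. $$ (5.190)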


As a sanity check, for τ = 0 it yields $K_q(0) = \langle q^2\rangle = T/\kappa$, in accordance with Eq. (185). As τ is increased, the correlation function decreases monotonically; see the solid-line sketch in Fig. 8.
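The double integral (187) can also be evaluated numerically, which gives an independent check of the recipe (179) and of the final result (190). The Python sketch below is an illustration with a hypothetical function name, and its grid size and range are arbitrary choices. Both Gaussians (184) and (186) are tabulated on a uniform grid, and the integral is replaced by a Riemann sum. Such a sum converges very fast for smooth, rapidly decaying integrands like these.

```python
import numpy as np

def kq_from_smoluchowski(tau, kappa=1.0, eta=1.0, T=1.0, n=801, half_width=10.0):
    """Riemann-sum evaluation of the double integral (187) for f(q) = q; requires tau > 0."""
    Gamma = kappa / eta
    var_inf = T / kappa                                     # Eq. (185), tau -> infinity
    var_tau = var_inf * (1.0 - np.exp(-2.0 * Gamma * tau))  # Eq. (185)
    L = half_width * np.sqrt(var_inf)
    grid = np.linspace(-L, L, n)
    dq = grid[1] - grid[0]
    q = grid[:, None]                                       # initial coordinate
    qp = grid[None, :]                                      # coordinate after time tau
    w_inf = np.exp(-q**2 / (2 * var_inf)) / np.sqrt(2 * np.pi * var_inf)      # Eq. (186)
    w_tau = (np.exp(-(qp - q * np.exp(-Gamma * tau))**2 / (2 * var_tau))
             / np.sqrt(2 * np.pi * var_tau))                                  # Eq. (184)
    return float(np.sum(q * w_inf * qp * w_tau) * dq * dq)

for tau in (0.5, 1.0, 2.0):
    print(tau, kq_from_smoluchowski(tau), np.exp(-tau))  # Eq. (190) with T/kappa = Gamma = 1
```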


So, the solution of this very simple problem has required straightforward but somewhat bulky 
calculations. On the other hand, the same result may be obtained literally in one line using the Langevin 
formalism, namely as the Fourier transform (59) of the spectral density (68) in the corresponding limit $m\omega \ll \eta$, with $S_F(\omega)$ given by Eq. (73a):75


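$$K_q(\tau) = 2\int_0^\infty S_q(\omega)\cos\omega\tau\,d\omega = \frac{2\eta T}{\pi}\int_0^\infty\frac{\cos\omega\tau\,d\omega}{\kappa^2 + (\eta\omega)^2} = \frac{2T\tau}{\pi\eta}\int_0^\infty\frac{\cos\xi\,d\xi}{(\Gamma\tau)^2 + \xi^2} = \frac{T}{\kappa}e^{-\Gamma\tau}, \quad\text{with}\quad \xi \equiv \omega\tau.$$

This example illustrates the fact that for linear systems (and small fluctuations in nonlinear systems) the Langevin approach is usually much simpler than the one based on the Fokker-Planck or Smoluchowski equations. However, again, the latter approach is indispensable for the analysis of fluctuations of arbitrary intensity in nonlinear systems.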


To conclude this chapter, I have to emphasize again that the Fokker-Planck and Smoluchowski 
equations give a quantitative description of the time evolution of nonlinear Brownian systems with 
dissipation in the classical limit. The description of the corresponding properties of such dissipative 
(“open”) and nonlinear quantum systems is more complex,76 and only a few simple problems of their theory have been solved analytically so far,77 typically using a particular model of the environment, e.g.,


74 See, e.g., MA Eq. (6.9c).

75 The involved table integral may be found, e.g., in MA Eq. (6.11). 

76 See, e.g., QM Sec. 7.6. 

77 See, e.g., the solutions of the 1D Kramers problem for quantum systems with low damping by A. Caldeira and 
A. Leggett, Phys. Rev. Lett. 46, 211 (1981), and with high damping by A. Larkin and Yu. Ovchinnikov, JETP 
Lett. 37, 382 (1983). 
as a large set of harmonic oscillators with different statistical distributions of their parameters, leading to different frequency dependences of the generalized susceptibility $\chi(\omega)$.


5.10. Exercise problems 


5.1. Treating the first 30 digits of number $\pi = 3.1415...$ as a statistical ensemble of integers k (equal to 3, 1, 4, 1, 5, ...), calculate the average $\langle k\rangle$ and the r.m.s. fluctuation $\delta k$. Compare the results with those for the ensemble of completely random decimal integers 0, 1, 2, ..., 9, and comment.


5.2. Calculate the variance of fluctuations of a magnetic moment m placed into an external magnetic field $\mathcal{H}$ within the same two models as in Problem 2.4:

(i) a spin-½ with a gyromagnetic ratio $\gamma$, and
(ii) a classical magnetic moment m, of a fixed magnitude $m_0$, but an arbitrary orientation,

both in thermal equilibrium at temperature T. Discuss and compare the results.78


Hint: Mind all three Cartesian components of the vector m. 


5.3. For a field-free, two-site Ising system with energy values $E_m = -Js_1s_2$, in thermal equilibrium at temperature T, calculate the variance of energy fluctuations. Explore the low-temperature and high-temperature limits of the result.


5.4. For a uniform, three-site Ising ring with ferromagnetic coupling (and no external field), 
calculate the correlation coefficients $K_s \equiv \langle s_ks_{k'}\rangle$ for both $k = k'$ and $k \neq k'$.


5.5. For a field-free 1D Ising system of $N \gg 1$ “spins”, in thermal equilibrium at temperature T, calculate the correlation coefficient $K_s \equiv \langle s_ls_{l+n}\rangle$, where l and (l + n) are the numbers of two specific spins in the chain.


Hint: You may like to start with the calculation of the statistical sum for an open-ended chain 
with arbitrary N > 1 and arbitrary coupling coefficients $J_k$, and then consider its mixed partial derivative
over a part of these parameters. 


5.6. Within the framework of the Weiss molecular-field theory, calculate the variance of spin 
fluctuations in the d-dimensional Ising model. Use the result to derive the conditions of its validity. 


5.7. Calculate the variance of energy fluctuations in a quantum harmonic oscillator with 
frequency $\omega$, in thermal equilibrium at temperature T, and express it via the average value of the energy.


5.8. The spontaneous electromagnetic field inside a closed volume V is in thermal equilibrium at 
temperature T. Assuming that V is sufficiently large, calculate the variance of fluctuations of the total


78 Note that these two cases may be considered as the non-interacting limits of, respectively, the Ising model 
(4.23) and the classical limit of the Heisenberg model (4.21), whose analysis within the Weiss approximation was 
the subject of Problem 4.18. 
energy of the field, and express the result via its average energy and temperature. How large should the
volume V be for your results to be quantitatively valid? Evaluate this limitation for room temperature. 


5.9. Express the r.m.s. uncertainty of the occupancy $N_k$ of a certain energy level $\varepsilon_k$ by non-interacting:

(i) classical particles,
(ii) fermions, and
(iii) bosons,

in thermodynamic equilibrium, via the level’s average occupancy $\langle N_k\rangle$, and compare the results.


5.10. Express the variance of the number of particles, $\langle\tilde N^2\rangle_{V,T,\mu}$, of a single-phase system in equilibrium, via its isothermal compressibility $\kappa_T \equiv -(1/V)(\partial V/\partial P)_{T,N}$.


5.11.* Starting from the Maxwell distribution of velocities, calculate the low-frequency spectral density of fluctuations of the pressure P(t) of an ideal gas of N classical particles, in thermal equilibrium at temperature T, and estimate their variance. Compare the former result with the solution of Problem
2: 


Hints: You may consider a cylindrically-shaped container of volume V = LA (see the figure on the right), calculate fluctuations of the force $\mathcal{F}(t)$ exerted by the confined particles on its plane lid of area A, approximating it as a delta-correlated process (62), and then re-calculate the fluctuations into those of pressure $P \equiv \mathcal{F}/A$.


5.12. Calculate the low-frequency spectral density of fluctuations of the electric current I(t) due to the random passage of charged particles between two conducting electrodes (see the figure on the right). Assume that the particles are emitted, at random times, by one of the electrodes, and are fully absorbed by the counterpart electrode. Can your result be mapped on some aspect of the electromagnetic blackbody radiation?


Hint: For the current I(t), use the same delta-correlated-process approximation as for the force $\mathcal{F}(t)$ in the previous problem.


5.13.*79 A very long, uniform string, of mass $\mu$ per unit length, is attached to a firm support, and stretched with a constant force (“tension”) $\mathcal{T}$ (see the figure on the right). Calculate the spectral density of the random force $\mathcal{F}(t)$ exerted by the string on the support point, within the plane normal to its length, in thermal equilibrium at temperature T.


Hint: You may assume that the string is so long that a transverse wave, propagating along it from 
the support point, never comes back. 


79 This problem, conceptually important for the quantum mechanics of open systems, was given in Chapter 7 of 
the QM part of this series, and is repeated here for the benefit of the readers who, for any reason, skipped that course.
5.14.80 Each of two 3D harmonic oscillators, with mass m, resonance frequency $\omega_0$, and damping $\delta > 0$, has electric dipole moment d = qs, where s is the vector of the oscillator’s displacement from its equilibrium position. Use the Langevin formalism to calculate the average potential of electrostatic interaction of these two oscillators (a particular case of the so-called London dispersion force), separated by distance $r \gg (T/m)^{1/2}/\omega_0$, in thermal equilibrium at temperature $T \gg \hbar\omega_0$. Also, explain why the approach used to solve a very similar Problem 2.15 is not directly applicable to this case.

Hint: You may like to use the following integral:

$$\int_0^\infty\frac{(1-\xi^2)\,d\xi}{\left[(1-\xi^2)^2 + (a\xi)^2\right]^2} = \frac{\pi}{4a}.$$


5.15. Within the van der Pol approximation,81 calculate major statistical properties of fluctuations of classical self-oscillations, at:

(i) the free (“autonomous”) run of the oscillator, and
(ii) their phase being locked by an external sinusoidal force,

assuming that the fluctuations are caused by a weak external noise with a smooth spectral density $S_f(\omega)$. In particular, calculate the self-oscillation linewidth.


5.16. Calculate the correlation function of the coordinate of a 1D harmonic oscillator with small 
Ohmic damping at thermal equilibrium. Compare the result with that for the autonomous self-oscillator 
(the subject of the previous problem). 


5.17. Consider a very long, uniform, two-wire transmission line (see the figure on the right) with wave impedance $\mathcal{Z}$, which allows propagation of TEM electromagnetic waves with negligible attenuation, in thermal equilibrium at temperature T. Calculate the variance $\langle\mathcal{V}^2\rangle_{\Delta\nu}$ of the voltage $\mathcal{V}$ between the wires within a small interval $\Delta\nu$ of cyclic frequencies.


Hint: As an E&M reminder,82 in the absence of dispersive materials, TEM waves propagate with a frequency-independent velocity (equal to the speed c of light, if the wires are in free space), with the voltage $\mathcal{V}$ and the current I (see Fig. above) related as $\mathcal{V}(x,t)/I(x,t) = \pm\mathcal{Z}$, where $\mathcal{Z}$ is the line’s wave impedance.


5.18. Now consider a similar long transmission line but terminated, at one end, with an impedance-matching Ohmic resistor $R = \mathcal{Z}$. Calculate the variance $\langle\mathcal{V}^2\rangle_{\Delta\nu}$ of the voltage across the


80 This problem, for the case of arbitrary temperature, was the subject of QM Problem 7.6, with Problem 5.15 of 
that course serving as the background. However, the method used in the model solutions of those problems 
requires one to prescribe, to the oscillators, different frequencies $\omega_1$ and $\omega_2$ at first, and only after this more general problem has been solved, pursue the limit $\omega_1 \to \omega_2$, while neglecting dissipation altogether. The goal of
this problem is to show that the result of that solution is valid even at non-zero damping. 

81 See, e.g., CM Secs. 5.2-5.5. Note that in quantum mechanics, a similar approach is called the rotating-wave
approximation (RWA) — see, e.g., QM Secs. 6.5, 7.6, 9.2, and 9.4. 

82 See, e.g., EM Sec. 7.6. 




resistor, and discuss the relation between the result and the Nyquist formula (81b), including numerical 
factors. 


Hint: A termination with resistance $R = \mathcal{Z}$ absorbs incident TEM waves without reflection.


5.19. An overdamped classical 1D particle escapes from a potential well 
with a smooth bottom, but a sharp top of the barrier — see the figure on the right. 
Perform the necessary modification of the Kramers formula (139). 




5.20. Perhaps the simplest model of diffusion is the 1D discrete random walk: each time interval τ, a particle leaps, with equal probability, to any of two adjacent sites of a 1D lattice with spatial period a. Prove that the particle’s displacement during a time interval $t \gg \tau$ obeys Eq. (77), and calculate the corresponding diffusion coefficient D.


5.21. A classical particle may occupy any of N similar sites. Its weak interaction with the 
environment induces random, incoherent jumps from the occupied site to any other site, with the same 
time-independent rate $\Gamma$. Calculate the correlation function and the spectral density of fluctuations of the instant occupancy n(t) (equal to either 1 or 0) of a site.
Chapter 6. Elements of Kinetics


This chapter gives a brief introduction to the basic notions of physical kinetics. Its main focus is on the 
Boltzmann transport equation, especially within the simple relaxation-time approximation (RTA), which 
allows an approximate but reasonable and simple description of transport phenomena (such as the 
electric current and thermoelectric effects) in gases, including electron gases in metals and 
semiconductors. 


6.1. The Liouville theorem and the Boltzmann equation 


Physical kinetics (not to be confused with “kinematics”!) is the branch of statistical physics that 
deals with systems out of thermodynamic equilibrium. Major effects addressed by kinetics include: 


(i) for autonomous systems (those out of external fields): the transient processes (relaxation), 
that lead from an arbitrary initial state of a system to its thermodynamic equilibrium; 


(ii) for systems in time-dependent (say, sinusoidal) external fields: the field-induced periodic 
oscillations of the system’s variables; and 


(iii) for systems in time-independent (“dc”) external fields: dc transport.


In the last case, we are dealing with stationary ($\partial/\partial t = 0$ everywhere), but non-equilibrium
situations, in which the effect of an external field, continuously driving the system out of equilibrium, is 
balanced by the simultaneous relaxation — the trend back to equilibrium. Perhaps the most important 
effect of this class is the dc current in conductors and semiconductors,1 which alone justifies the
inclusion of the basic notions of kinetics into any set of core physics courses. 


The reader who has reached this point of the notes already has some taste of physical kinetics, 
because the subject of the last part of Chapter 5 was the kinetics of a “Brownian particle”, i.e. of a 
“heavy” system interacting with an environment consisting of many “lighter” components. Indeed, the 
equations discussed in that part — whether the Smoluchowski equation (5.122) or the Fokker-Planck 
equation (5.149) — are valid if the environment is in thermodynamic equilibrium, but the system of our 
interest is not necessarily so. As a result, we could use those equations to discuss such non-equilibrium 
phenomena as the Kramers problem of the metastable state’s lifetime. 


In contrast, this chapter is devoted to the more traditional subject of kinetics: systems of many 
similar particles — generally, interacting with each other, but not too strongly, so that the energy of the 
system still may be partitioned into a sum of single-particle components, with the interparticle 
interactions considered as a weak perturbation. Actually, we have already started the job of describing 
such a system at the beginning of Sec. 5.7. Indeed, in the absence of particle interactions (i.e. when it is 
unimportant whether the particle of our interest is “light” or “heavy”), the probability current densities 
in the coordinate and momentum spaces are given, respectively, by Eq. (5.142) and the first form of Eq. 
(5.143a), so that the continuity equation (5.140) takes the form 

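$$\frac{\partial w}{\partial t} + \nabla_{\mathbf{q}}\cdot(w\dot{\mathbf{q}}) + \nabla_{\mathbf{p}}\cdot(w\dot{\mathbf{p}}) = 0. $$ (6.1)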


1 This topic was briefly addressed in EM Chapter 4, carefully avoiding the aspects related to the thermal effects.


© K. Likharev 




If similar particles do not interact, this equation for the single-particle probability density w(q, p, t) is
valid for each of them, and the result of its solution may be used to calculate any ensemble-average 
characteristic of the system as a whole. 


Let us rewrite Eq. (1) in the Cartesian-component form,

$$\frac{\partial w}{\partial t} + \sum_j\left[\frac{\partial}{\partial q_j}(w\dot q_j) + \frac{\partial}{\partial p_j}(w\dot p_j)\right] = 0, $$ (6.2)

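where the index j lists all degrees of freedom of the particle under consideration, and assume that its motion (perhaps in an external, time-dependent field) may be described by a Hamiltonian function $H(q_j, p_j, t)$. Plugging into Eq. (2) the Hamiltonian equations of motion:2

$$\dot q_j = \frac{\partial H}{\partial p_j}, \qquad \dot p_j = -\frac{\partial H}{\partial q_j}, $$ (6.3)

we get

$$\frac{\partial w}{\partial t} + \sum_j\left[\frac{\partial}{\partial q_j}\left(w\frac{\partial H}{\partial p_j}\right) - \frac{\partial}{\partial p_j}\left(w\frac{\partial H}{\partial q_j}\right)\right] = 0. $$ (6.4)

After differentiation of both parentheses by parts, the equal mixed terms $w\,\partial^2H/\partial q_j\partial p_j$ and $w\,\partial^2H/\partial p_j\partial q_j$ cancel, and using Eq. (3) again, we get the so-called Liouville theorem:3

$$\frac{\partial w}{\partial t} + \sum_j\left(\frac{\partial w}{\partial q_j}\dot q_j + \frac{\partial w}{\partial p_j}\dot p_j\right) = 0. $$ (6.5)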


Since the left-hand side of this equation is just the full derivative of the probability density w considered as a function of the generalized coordinates $q_j(t)$ of a particle, its generalized momentum components $p_j(t)$, and (possibly) time t,4 the Liouville theorem (5) may be represented in a surprisingly simple form:

$$\frac{dw}{dt} = 0. $$ (6.6)


Physically this means that the elementary probability $dW = w\,d^3q\,d^3p$ to find a Hamiltonian particle in a small volume of the coordinate-momentum space [q, p], with its center moving in accordance with the deterministic law (3), does not change with time; see Fig. 1.


Fig. 6.1. The Liouville theorem’s interpretation: probability’s conservation at its flow through the [q, p] space (an elementary volume $d^3q\,d^3p$ moving along the trajectory q(t), p(t)).


2 See, e.g., CM Sec. 10.1. 
3 Actually, this is just one of several theorems bearing the name of Joseph Liouville (1809-1882). 
4 See, e.g., MA Eq. (4.2). 
At first glance, this may not look surprising because according to the fundamental Einstein relation (5.78), one needs non-Hamiltonian forces (such as the kinematic friction) to have diffusion. On the other hand, it is striking that the Liouville theorem is valid even for (Hamiltonian) systems with deterministic chaos,5 in which the deterministic trajectories corresponding to slightly different initial conditions become increasingly mixed with time.


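For an ideal gas of 3D particles, we may use the ordinary Cartesian coordinates r_j (with j = 1, 2, 3) for the generalized coordinates q_j, so that p_j become the Cartesian components mv_j of the usual (linear) momentum, and the elementary volume is just d³r d³p; see Fig. 1. In this case, Eqs. (3) are just 

$$ \dot r_j = \frac{p_j}{m} \equiv v_j, \qquad \dot p_j = F_j, \qquad (6.7) $$

where F is the force exerted on the particle, so that the Liouville theorem (5) may be rewritten as 

$$ \frac{\partial w}{\partial t} + \sum_{j=1}^{3}\left(v_j\frac{\partial w}{\partial r_j} + F_j\frac{\partial w}{\partial p_j}\right) = 0, \qquad (6.8) $$

and conveniently represented in the vector form 

$$ \frac{\partial w}{\partial t} + \mathbf{v}\cdot\nabla_r w + \mathbf{F}\cdot\nabla_p w = 0. \qquad (6.9) $$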


Of course, the situation becomes much more complex if the particles interact. Generally, a 
system of N similar particles in 3D space has to be described by the probability density being a function 
of 6N + 1 arguments (3N Cartesian coordinates, plus 3N momentum components, plus time). An 
analytical or numerical solution of any equation describing the time evolution of such a function for a 
typical system of N ~ 10²³ particles is evidently a hopeless task. Hence, any theory of realistic systems’ 
kinetics has to rely on making reasonable approximations that would simplify the situation. 


One of the most useful approximations (sometimes called Stosszahlansatz, German for the "collision-number assumption") was suggested by Ludwig Boltzmann for a gas of particles that move freely most of the time but interact during short time intervals, when a particle comes close to either an immobile scattering center (say, an impurity in a conductor's crystal lattice) or to another particle of the gas. Such brief scattering events may change the particle's momentum. Boltzmann argued that they may still be approximately described by Eq. (9), with the addition of a special term (called the scattering integral) to its right-hand side: 


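$$ \frac{\partial w}{\partial t} + \mathbf{v}\cdot\nabla_r w + \mathbf{F}\cdot\nabla_p w = \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}}. \qquad (6.10) $$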


This is the Boltzmann equation, also called the “Boltzmann transport equation”. As will be discussed 
below, it may give a very reasonable description of not only classical but also quantum particles, though 
it evidently neglects the quantum-mechanical coherence/entanglement effects,⁶ besides those that may be hidden inside the scattering integral. 


5 See, e.g., CM Sec. 9.3. 

6 Indeed, the quantum state coherence is described by off-diagonal elements of the density matrix, while the 
classical probability w represents only the diagonal elements of that matrix. However, at least for the ensembles 
close to thermal equilibrium, this is a reasonable approximation — see the discussion in Sec. 2.1. 
The concrete form of the scattering integral depends on the type of particle scattering. If the 
scattering centers do not belong to the ensemble under consideration (an example is given, again, by 
impurity atoms in a conductor), then the scattering integral may be expressed as an evident 
generalization of the master equation (4.100): 


$$ \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = \int d^3p' \left[\Gamma_{\mathbf{p}'\to\mathbf{p}}\, w(\mathbf{r},\mathbf{p}',t) - \Gamma_{\mathbf{p}\to\mathbf{p}'}\, w(\mathbf{r},\mathbf{p},t)\right], \qquad (6.11) $$

where the physical sense of Γ_{p→p'} is the rate (i.e. the probability per unit time) for the particle to be scattered from the state with the momentum p into the state with the momentum p'; see Fig. 2. 

Fig. 6.2. A single-particle scattering event. 


Most elastic interactions are reciprocal, i.e. obey the following relation (closely related to the reversibility of time in Hamiltonian systems): Γ_{p→p'} = Γ_{p'→p}, so that Eq. (11) may be rewritten as⁷ 

$$ \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = \int d^3p'\, \Gamma_{\mathbf{p}\to\mathbf{p}'} \left[w(\mathbf{r},\mathbf{p}',t) - w(\mathbf{r},\mathbf{p},t)\right]. \qquad (6.12) $$

With such a scattering integral, Eq. (10) stays linear in w but becomes an integro-differential equation, typically harder to solve analytically than differential equations. 
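To see what the reciprocal scattering integral (6.12) does by itself, one may replace the momentum integral with a sum over a few discrete states. The sketch below assumes a hypothetical set of six momentum states (think of points on one constant-energy surface, connected by elastic scattering) with randomly chosen symmetric rates Γ_ij = Γ_ji, and ignores the left-hand side of Eq. (6.10). It illustrates that the total probability is conserved and that w relaxes to the uniform distribution over the connected states.

```python
import random


def scattering_rate_of_change(w, gamma):
    """Discrete analog of Eq. (6.12): (dw/dt)_i = sum_j gamma[i][j] * (w[j] - w[i])."""
    n = len(w)
    return [sum(gamma[i][j] * (w[j] - w[i]) for j in range(n)) for i in range(n)]


random.seed(0)
n = 6
# Hypothetical reciprocal rates between n momentum states: gamma[i][j] == gamma[j][i] > 0
gamma = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        gamma[i][j] = gamma[j][i] = random.uniform(0.1, 1.0)

w = [1.0] + [0.0] * (n - 1)     # initially, all probability sits in one state
dt, steps = 0.01, 2000          # explicit Euler steps up to t = 20
for _ in range(steps):
    dw = scattering_rate_of_change(w, gamma)
    w = [wi + dt * dwi for wi, dwi in zip(w, dw)]

print(sum(w))   # stays equal to 1 (up to rounding): scattering only redistributes probability
print(w)        # every entry approaches 1/n, the stationary distribution
```

The symmetry Γ_ij = Γ_ji is what makes the sum over i of the right-hand side vanish identically, and any distribution that is uniform over the connected states is stationary.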


The equation becomes even more complex if the scattering is due to the mutual interaction of the 
particle members of the system — see Fig. 3. 


Fig. 6.3. A particle-particle scattering event. 


In this case, the probability of a scattering event scales as a product of two single-particle 
probabilities, and the simplest reasonable form of the scattering integral is⁸ 


7 One may wonder whether this approximation may work for Fermi particles, such as electrons, for whom the Pauli principle forbids scattering into an already occupied state, so that for the scattering p → p', the term w(r, p, t) in Eq. (12) has to be multiplied by the probability [1 − w(r, p', t)] that the final state is available. This is a valid argument, but one should notice that if this modification has been done with both terms of Eq. (12), it becomes 

$$ \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = \int d^3p'\,\Gamma_{\mathbf{p}\to\mathbf{p}'}\left\{w(\mathbf{r},\mathbf{p}',t)\left[1 - w(\mathbf{r},\mathbf{p},t)\right] - w(\mathbf{r},\mathbf{p},t)\left[1 - w(\mathbf{r},\mathbf{p}',t)\right]\right\}. $$

Opening both square brackets, we see that the probability density products cancel, bringing us back to Eq. (12). 
8 This was the approximation used by L. Boltzmann to prove the famous H-theorem, stating that entropy of the 
gas described by Eq. (13) may only grow (or stay constant) in time, dS/dt ≥ 0. Since the model is very 
approximate, that result does not seem too fundamental nowadays, despite all its historic significance. 
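$$ \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = \int d^3p' \int d^3p_1 \left[\Gamma_{\mathbf{p}'\mathbf{p}'_1\to\mathbf{p}\mathbf{p}_1}\, w(\mathbf{r},\mathbf{p}',t)\, w(\mathbf{r},\mathbf{p}'_1,t) - \Gamma_{\mathbf{p}\mathbf{p}_1\to\mathbf{p}'\mathbf{p}'_1}\, w(\mathbf{r},\mathbf{p},t)\, w(\mathbf{r},\mathbf{p}_1,t)\right]. \qquad (6.13) $$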


The integration dimensionality in Eq. (13) takes into account the fact that due to the conservation of the 
total momentum at scattering, 


$$ \mathbf{p} + \mathbf{p}_1 = \mathbf{p}' + \mathbf{p}'_1, \qquad (6.14) $$


one of the momenta is not an independent argument, so that the integration in Eq. (13) may be restricted 
to a 6D p-space rather than the 9D one. For the reciprocal interaction, Eq. (13) may also be a bit 
simplified, but it still keeps Eq. (10) a nonlinear integro-differential transport equation, excluding such 
powerful solution methods as the Fourier expansion — which hinges on the linear superposition 
principle. 

This is why most useful results based on the Boltzmann transport equation depend on its further simplifications, most notably the relaxation-time approximation (RTA for short).⁹ This approximation is based on the fact that in the absence of spatial gradients (∇ = 0) and external forces (F = 0), at thermal equilibrium, Eq. (10) yields 

$$ \frac{\partial w}{\partial t} = \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}}, \qquad (6.15) $$


so that the equilibrium probability distribution w₀(r, p, t) has to turn any scattering integral to zero. Hence at a small deviation from the equilibrium, 

$$ \tilde w(\mathbf{r},\mathbf{p},t) \equiv w(\mathbf{r},\mathbf{p},t) - w_0(\mathbf{r},\mathbf{p},t) \to 0, \qquad (6.16) $$


the scattering integral should be proportional to the deviation w̃, and its simplest reasonable model is 

$$ \left.\frac{\partial w}{\partial t}\right|_{\text{scattering}} = -\frac{\tilde w}{\tau}, \qquad (6.17) $$


where τ is a phenomenological constant (which, according to Eq. (15), has to be positive for the system's stability) called the relaxation time. Its physical meaning will become clearer in the next section. 


The relaxation-time approximation is quite reasonable if the angular distribution of the scattering rate is dominated by small angles between vectors p and p', as it is, for example, for the Rutherford scattering by a Coulomb center.¹⁰ Indeed, in this case the two values of the function w participating in Eq. (12) are close to each other for most scattering events, so that the loss of the second momentum argument (p') is not too essential. However, using the Boltzmann-RTA equation that results from combining Eqs. (10) and (17), 

$$ \frac{\partial w}{\partial t} + \mathbf{v}\cdot\nabla_r w + \mathbf{F}\cdot\nabla_p w = -\frac{\tilde w}{\tau}, \qquad (6.18) $$

we should always remember that this is just a phenomenological model, sometimes giving completely wrong results. For example, it prescribes the same time scale (τ) to the relaxation of the net momentum 


9 Sometimes this approximation is called the "BGK model", after P. Bhatnagar, E. Gross, and M. Krook who suggested it in 1954. (The same year, a similar model was considered by P. Welander.) 
10 See, e.g., CM Sec. 3.7. 
of the system, and to its energy relaxation, while in many real systems the latter process (that results 
from inelastic collisions) may be substantially longer. Naturally, in the following sections, I will 
describe only those applications of the Boltzmann-RTA equation that give a reasonable description of 
physical reality. 
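The meaning of τ as the momentum relaxation time may be illustrated by solving the Boltzmann-RTA equation (6.18) numerically in the simplest setting. The sketch assumes a spatially uniform gas with one momentum component, a Maxwell equilibrium distribution w₀, and a constant force F switched on at t = 0 (all parameter values are hypothetical, in units m = T = τ = 1). Taking the first moment of the equation gives d⟨p⟩/dt = F − ⟨p⟩/τ, so that ⟨p⟩ should approach Fτ.

```python
import math


def rta_mean_momentum(force=0.1, tau=1.0, m=1.0, T=1.0, p_max=10.0, n=801, dt=0.01, t_end=15.0):
    """Mean momentum <p> at t_end of a uniform 1D gas obeying dw/dt + F dw/dp = -(w - w0)/tau.

    w0 is a normalized 1D Maxwell distribution; the force F > 0 is switched on at t = 0.
    Upwind differences in p (valid for F > 0) and explicit Euler steps in time are used.
    """
    dp = 2 * p_max / (n - 1)
    p = [-p_max + i * dp for i in range(n)]
    norm = 1.0 / math.sqrt(2 * math.pi * m * T)
    w0 = [norm * math.exp(-pk ** 2 / (2 * m * T)) for pk in p]
    w = list(w0)                                   # start from thermal equilibrium
    for _ in range(round(t_end / dt)):
        # the right-hand side uses the old list w; no inflow from beyond the left edge of the grid
        w = [w[i] + dt * (-force * (w[i] - (w[i - 1] if i > 0 else 0.0)) / dp
                          - (w[i] - w0[i]) / tau)
             for i in range(n)]
    return sum(pk * wk for pk, wk in zip(p, w)) / sum(w)


mean_p = rta_mean_momentum()
print(mean_p)   # approaches F * tau = 0.1
```

In this model both the mean momentum and the distribution shape relax with the same single time τ, which is exactly the limitation of the RTA mentioned above.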


6.2. The Ohm law and the Drude formula 


Despite its shortcomings, Eq. (18) is adequate for quite a few applications. Perhaps the most 
important of them is deriving the Ohm law for dc current in a “nearly-ideal” gas of charged particles, 
whose only important deviation from ideality is the rare scattering effects described by Eq. (17). As a 
result, in equilibrium it is described by the stationary probability w₀ of an ideal gas (see Sec. 3.1): 


$$ w_0(\mathbf{r},\mathbf{p},t) = \frac{g}{(2\pi\hbar)^3}\langle N(\varepsilon)\rangle, \qquad (6.19) $$
where g is the internal degeneracy factor (say, g = 2 for electrons due to their spin), and ⟨N(ε)⟩ is the average occupancy of a quantum state with momentum p, that obeys either the Fermi-Dirac or the Bose-Einstein distribution: 

$$ \langle N(\varepsilon)\rangle = \frac{1}{\exp\{(\varepsilon-\mu)/T\} \pm 1}, \qquad \varepsilon = \varepsilon(\mathbf{p}). \qquad (6.20) $$

(The following calculations will be valid, up to a point, for both statistics and hence, in the limit μ/T → −∞, for a classical gas as well.) 


Now let a uniform dc electric field ℰ be applied to the gas of particles with electric charge q, exerting force F = qℰ on each of them. Then the stationary solution of Eq. (18), with ∂/∂t = 0, should also be spatially uniform (∇_r = 0), so that this equation is reduced to 

$$ q\mathscr{E}\cdot\nabla_p w = -\frac{\tilde w}{\tau}. \qquad (6.21) $$


Let us require the electric field to be relatively low, so that the perturbation w̃ it produces is relatively small, as required by our basic assumption (16).¹¹ Then on the left-hand side of Eq. (21), we can neglect that perturbation, by replacing w with w₀, because that side already has a small factor (ℰ). As a result, this equation yields 

$$ \tilde w = -\tau q\mathscr{E}\cdot\nabla_p w_0 = -\tau q\mathscr{E}\cdot(\nabla_p\varepsilon)\frac{\partial w_0}{\partial\varepsilon}, \qquad (6.22) $$

where the second step implies isotropy of the parameters μ and T, i.e. their independence of the direction of the particle's momentum p. But the gradient ∇_p ε is nothing else than the particle's velocity 


11 Since the scale of the fastest change of w₀ in the momentum space is of the order of ∂w₀/∂p = (∂w₀/∂ε)(dε/dp) ~ (1/T)v, where v is the scale of the particle's speed, the necessary condition of the linear approximation (22) is eℰτ << T/v, i.e. eℰl << T, where l ≡ vτ has the meaning of the effective mean free path. Since the left-hand side of the last inequality is just the average energy given to the particle by the electric field between two scattering events, the condition may be interpreted as the smallness of the gas' "overheating" by the applied field. However, another condition is also necessary; see the last paragraph of this section. 
v (for a quantum particle, its group velocity).¹² (This fact is easy to verify for the isotropic and parabolic dispersion law, pertinent to classical particles moving in free space, 

$$ \varepsilon(\mathbf{p}) = \frac{p^2}{2m} \equiv \frac{p_1^2 + p_2^2 + p_3^2}{2m}. \qquad (6.23) $$

Indeed, in this case, the jth Cartesian component of the vector ∇_p ε is 

$$ (\nabla_p\varepsilon)_j \equiv \frac{\partial\varepsilon}{\partial p_j} = \frac{p_j}{m} = v_j, \qquad (6.24) $$

so that ∇_p ε = v.) Hence, Eq. (22) may be rewritten as 

$$ \tilde w = -\tau q\mathscr{E}\cdot\mathbf{v}\,\frac{\partial w_0}{\partial\varepsilon}. \qquad (6.25) $$


Let us use this result to calculate the electric current density j. The contribution of each particle 
to the current density is qv, so that the total density is 

$$ \mathbf{j} = \int q\mathbf{v}\,w\,d^3p = q\int \mathbf{v}(w_0 + \tilde w)\,d^3p. \qquad (6.26) $$


Since in the equilibrium state (with w = wo), the current has to be zero, the integral of the first term in 
the parentheses has to vanish. For the integral of the second term, plugging in Eq. (25), and then using 
Eq. (19), we get 


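$$ \mathbf{j} = \frac{gq^2\tau}{(2\pi\hbar)^3}\int \mathbf{v}(\mathscr{E}\cdot\mathbf{v})\left[-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right]d^3p \equiv \frac{gq^2\tau}{(2\pi\hbar)^3}\int \mathbf{v}(\mathscr{E}\cdot\mathbf{v})\left[-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right]d^2p_\varepsilon\, dp_\perp, \qquad (6.27) $$

where d²p_ε is the elementary area of the constant-energy surface in the momentum space, while dp_⊥ is the momentum differential's component normal to that surface. The real power of this result¹³ is that it is valid even for particles with an arbitrary dispersion law ε(p) (which may be rather complicated, for example, for particles moving in space-periodic potentials¹⁴), and gives, in particular, a fair description of conductivity's anisotropy in crystals. 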


For free particles whose dispersion law is isotropic and parabolic, as in Eq. (23), the constant-energy surface is a sphere of radius p, so that d²p_ε = p²dΩ = p² sinθ dθ dφ, while dp_⊥ = dp. In spherical coordinates with the polar axis directed along the electric field vector ℰ, we get (ℰ·v) = ℰv cosθ. Now separating the vector v outside the parentheses into the component v cosθ directed along the vector ℰ, and two perpendicular components, v sinθ cosφ and v sinθ sinφ, we see that the integrals of the last two components over the angle φ give zero. Hence, as we could expect, in the isotropic case the net current is directed along the electric field and obeys the linear Ohm law, 


12 See, e.g., QM Sec. 2.1. 

13 It was obtained by Arnold Sommerfeld in 1927. 

14 See, e.g., QM Secs. 2.7, 2.8, and 3.4. (In this case, p should be understood as the quasimomentum rather than 
the genuine momentum.) 
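$$ \mathbf{j} = \sigma\mathscr{E}, \qquad (6.28) $$

with a field-independent, scalar¹⁵ electric conductivity 

$$ \sigma = \frac{gq^2\tau}{(2\pi\hbar)^3}\int_0^{2\pi}d\varphi\int_0^{\pi}\sin\theta\,d\theta\,\cos^2\theta\int_0^{\infty}p^2\,dp\,v^2\left[-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right]. \qquad (6.29) $$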


(Note that σ is proportional to q² and hence does not depend on the particle charge sign.¹⁶) 

Since sinθ dθ is just −d(cosθ), the integral over θ equals (2/3). The integral over φ is of course just 2π, while that over p may be readily transformed to one over the particle's energy ε(p) = p²/2m: p² = 2mε, v² = 2ε/m, p = (2mε)^{1/2}, so that dp = (m/2ε)^{1/2}dε, and p²dp v² = (2mε)(m/2ε)^{1/2}dε (2ε/m) = (8mε³)^{1/2}dε. As a result, the conductivity equals 


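$$ \sigma = \frac{gq^2\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\int_0^{\infty}(8m\varepsilon^3)^{1/2}\left[-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right]d\varepsilon. \qquad (6.30) $$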


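Now we may work out the integral in Eq. (30) by parts, first rewriting [−∂⟨N(ε)⟩/∂ε]dε as −d[⟨N(ε)⟩]. Due to the fast (exponential) decay of the factor ⟨N(ε)⟩ at ε → ∞, its product by the factor (8mε³)^{1/2} vanishes at both integration limits, and we get 

$$ \sigma = \frac{gq^2\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}\int_0^{\infty}\langle N(\varepsilon)\rangle\, d\left[(8m\varepsilon^3)^{1/2}\right] = \frac{gq^2\tau}{(2\pi\hbar)^3}\,\frac{4\pi}{3}(8m)^{1/2}\,\frac{3}{2}\int_0^{\infty}\langle N(\varepsilon)\rangle\,\varepsilon^{1/2}\,d\varepsilon \equiv \frac{q^2\tau}{m}\times\frac{gm^{3/2}}{2^{1/2}\pi^2\hbar^3}\int_0^{\infty}\langle N(\varepsilon)\rangle\,\varepsilon^{1/2}\,d\varepsilon. \qquad (6.31) $$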


But according to Eq. (3.40), the last factor in this expression (after the × sign) is just the particle density n ≡ N/V, so that Sommerfeld's result is reduced, for arbitrary temperature and any particle statistics, to the very simple Drude formula,¹⁷ 

$$ \sigma = \frac{q^2\tau n}{m}, \qquad (6.32) $$

which should be well familiar to the reader from an undergraduate physics course. 
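The integration by parts leading from Eq. (6.30) to Eqs. (6.31)-(6.32) can be verified numerically for all three statistics. The sketch below assumes units with ħ = m = q = τ = 1 and g = 2, and hypothetical values of μ and T; it evaluates the energy integral of Eq. (6.30) directly, computes n from the last factor of Eq. (6.31), and compares the result with the Drude formula (6.32). A hand-written Simpson rule is used only to keep the example dependency-free.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3


def occupancy(eps, mu, T, s):
    """<N(eps)> = 1/(exp((eps - mu)/T) + s): s = +1 Fermi-Dirac, -1 Bose-Einstein, 0 Boltzmann."""
    return 1.0 / (math.exp((eps - mu) / T) + s)


def minus_dN_deps(eps, mu, T, s):
    """-d<N>/d(eps) for the same distribution."""
    e = math.exp((eps - mu) / T)
    return e / (T * (e + s) ** 2)


def conductivities(mu, T, s, g=2, q=1.0, tau=1.0, m=1.0, hbar=1.0):
    """Return (sigma from the integral of Eq. (6.30), sigma = q^2 tau n / m of Eq. (6.32))."""
    u_max = math.sqrt(max(mu, 0.0) + 40 * T)     # <N> ~ exp(-40) beyond this energy
    # Substituting eps = u**2 (d eps = 2u du) keeps both integrands smooth at the lower limit.
    i30 = simpson(lambda u: math.sqrt(8 * m * u ** 6) * minus_dN_deps(u * u, mu, T, s) * 2 * u,
                  0.0, u_max)
    i_n = simpson(lambda u: occupancy(u * u, mu, T, s) * u * 2 * u, 0.0, u_max)
    sigma_30 = g * q ** 2 * tau / (2 * math.pi * hbar) ** 3 * (4 * math.pi / 3) * i30
    n = g * m ** 1.5 / (math.sqrt(2) * math.pi ** 2 * hbar ** 3) * i_n   # last factor of Eq. (6.31)
    return sigma_30, q ** 2 * tau * n / m


# (mu, T, s) for each statistics; Bose-Einstein and Boltzmann need mu < 0
cases = {"Fermi-Dirac": (1.0, 0.05, 1), "Bose-Einstein": (-0.5, 1.0, -1), "Boltzmann": (-2.0, 1.0, 0)}
results = {name: conductivities(*args) for name, args in cases.items()}
for name, (s30, s32) in results.items():
    print(f"{name:13s} Eq. (6.30): {s30:.8f}   Eq. (6.32): {s32:.8f}")
```

The two columns should agree to many digits, even though for the degenerate Fermi gas the first integrand is concentrated near ε ≈ μ, while the second one involves all occupied states; this is the numerical face of the paradox discussed below.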


As a reminder, here is its simple classical derivation.¹⁸ Let 2τ be the average time interval between two sequential scattering events that cause a particle to lose the deterministic component of its velocity, v_drift, provided by the electric field ℰ on top of the particle's random thermal motion (which does not contribute to the net current). Using the 2nd Newton law to describe the particle's acceleration by 


15 As Eq. (27) shows, if the dispersion law ε(p) is anisotropic, the current density direction may be different from that of the electric field. In this case, conductivity should be described by a tensor σ_{jj'}, rather than a scalar. However, in most important conducting materials, the anisotropy is rather small; see, e.g., EM Table 4.1. 

16 This is why, to determine the dominating type of charge carriers in semiconductors (electrons or holes, see Sec. 4 below), the Hall effect, which lacks such ambivalence (see, e.g., QM Sec. 3.2), is frequently used. 

17 It was derived in 1900 by Paul Drude. Note that Drude also used the same arguments to derive a very simple (and very reasonable) approximation for the complex electric conductivity in the ac field of frequency ω: σ(ω) = σ(0)/(1 − iωτ), with σ(0) given by Eq. (32); sometimes the name "Drude formula" is used for this expression rather than for Eq. (32). Let me leave its derivation, from the Boltzmann-RTA equation, for the reader's exercise. 
18 See also EM Sec. 4.2. 
the field, dv_drift/dt = qℰ/m, we get ⟨v_drift⟩ = τqℰ/m. Multiplying this result by the particle's charge q and density n ≡ N/V, we get the Ohm law j = σℰ, with σ given by Eq. (32). 
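The classical argument can also be checked with a short Monte Carlo sketch. The code assumes that each scattering event resets the drift velocity to zero, and it time-averages v_drift for two hypothetical models of free flights: all intervals equal to 2τ, as in the argument above, and exponentially distributed intervals with mean τ (a Poisson scattering process, which is an additional assumption not made in the text).

```python
import random


def mean_drift_velocity(a, tau, samples=200_000, exponential=True, seed=1):
    """Time average of v_drift for a particle accelerated at the rate a = qE/m, whose drift
    velocity is reset to zero by each scattering event.

    exponential=False: all free-flight times equal 2*tau, as in the simple argument of the text.
    exponential=True:  free-flight times are exponentially distributed with mean tau
                       (scattering as a Poisson process, an assumption of this sketch).
    """
    rng = random.Random(seed)
    distance = 0.0        # sum over flights of the integral of v_drift dt = a*t**2/2
    total_time = 0.0
    for _ in range(samples):
        t = rng.expovariate(1.0 / tau) if exponential else 2.0 * tau
        distance += 0.5 * a * t * t
        total_time += t
    return distance / total_time


a, tau = 2.0, 0.5                     # hypothetical values of qE/m and tau, so that a*tau = 1
v_equal = mean_drift_velocity(a, tau, exponential=False)
v_poisson = mean_drift_velocity(a, tau)
print(v_equal)     # a*tau = 1.0
print(v_poisson)   # close to a*tau = 1.0
```

Both models give ⟨v_drift⟩ = τqℰ/m. In the second one, long flights (during which the particle gains more speed) carry more weight in the time average, so that a mean free time τ, rather than 2τ, suffices.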


Sommerfeld's derivation of the Drude formula poses an important conceptual question. The structure of Eq. (30) implies that the only quantum states contributing to the electric conductivity are those whose derivative [−∂⟨N(ε)⟩/∂ε] is significant. For Fermi particles such as electrons, in the limit T << ε_F, these are the states at the very surface of the Fermi sphere. On the other hand, Eq. (32) and the whole Drude reasoning involve the density n of all electrons. So, which electrons exactly are responsible for the conductivity: all of them, or only those at the Fermi surface? To resolve this paradox, let us return to Eq. (22) and analyze the physical meaning of that result. Let us compare it with the following model distribution: 


$$ w_{\text{model}} \equiv w_0(\mathbf{r},\mathbf{p}-\tilde{\mathbf{p}},t), \qquad (6.33) $$

where p̃ is some constant, small vector, which describes a small shift of the unperturbed distribution w₀ as a whole, in the momentum space. Performing the Taylor expansion of Eq. (33) in this small parameter, and keeping only two leading terms, we get 

$$ w_{\text{model}} \approx w_0(\mathbf{r},\mathbf{p},t) + \tilde w_{\text{model}}, \qquad \text{with } \tilde w_{\text{model}} \equiv -\tilde{\mathbf{p}}\cdot\nabla_p w_0(\mathbf{r},\mathbf{p},t). \qquad (6.34) $$

Comparing the last expression with the first form of Eq. (22), we see that they coincide if 

$$ \tilde{\mathbf{p}} = q\mathscr{E}\tau \equiv \mathbf{F}\tau. \qquad (6.35) $$

This means that Eq. (22) describes a small shift of the equilibrium distribution of all particles (in the momentum space) by qℰτ along the electric field's direction, justifying the cartoon shown in Fig. 4. 


Fig. 6.4. Filling of momentum states by a degenerate electron gas: (a) in the absence and (b) in the presence of an external electric field ℰ. Arrows show representative scattering events. 


At ℰ = 0, the system is in equilibrium, so that the quantum states inside the Fermi sphere (p < p_F) are occupied, while those outside of it are empty; see Fig. 4a. Electron scattering events may happen only between states within a very thin layer (|p²/2m − ε_F| ~ T) at the Fermi surface, because only in this layer are the states partially occupied, so that both components of the product w(r, p, t)[1 − w(r, p', t)], mentioned in Sec. 1, do not vanish. These scattering events, on average, do not change the equilibrium probability distribution, because they are uniformly spread over the Fermi surface. 


Now let the electric field be turned on instantly. Immediately it starts accelerating all electrons in its direction, i.e. the whole Fermi sphere starts moving in the momentum space, along the field's direction in the real space. For elastic scattering events (with |p'| = |p|), this creates an addition of occupied states at the leading edge of the accelerating sphere and an addition of free states on its trailing edge (Fig. 4b). As a result, now there are more scattering events bringing electrons from the leading edge to the trailing edge of the sphere than in the opposite direction. This creates an average backflow of the state occupancy in the momentum space. These two trends eventually cancel each other, and the Fermi sphere approaches a stationary (though not a thermal-equilibrium!) state, with the shift (35) relative to its thermal-equilibrium position. 


Now Fig. 4b may be used to answer the question of which of the two different interpretations of the Drude formula is correct, and the answer is: either. On one hand, we can look at the electric current as a result of the shift (35) of all electrons in the momentum space. On the other hand, each filled quantum state deep inside the sphere gives exactly the same contribution to the net current density as it did without the field. All these internal contributions to the net current cancel each other, so that the applied field changes the situation only at the Fermi surface. Thus it is equally legitimate to say that only the surface states are responsible for the non-zero net current.¹⁹ 
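The dual interpretation can be made concrete with a few numbers. The sketch below assumes a degenerate Fermi gas with m = 1, ε_F = μ = 1, and T = 0.05 (hypothetical units), and a small shift p̃ that plays the role of qℰτ in Eq. (6.35). Along a line through the origin parallel to the field, it compares the exact difference w₀(p − p̃) − w₀(p) (without the constant factor g/(2πħ)³) with the linear term of Eq. (6.34).

```python
import math


def w0(eps, mu=1.0, T=0.05):
    """Fermi-Dirac occupancy <N(eps)>; the constant factor g/(2*pi*hbar)**3 is dropped."""
    return 1.0 / (math.exp((eps - mu) / T) + 1.0)


def perturbation(p, p_shift, m=1.0, mu=1.0, T=0.05):
    """Exact w0(p - p_shift) - w0(p) on a line through the origin parallel to the field."""
    return w0((p - p_shift) ** 2 / (2 * m), mu, T) - w0(p * p / (2 * m), mu, T)


def linear_perturbation(p, p_shift, m=1.0, mu=1.0, T=0.05):
    """Linear term of Eq. (6.34): -p_shift * v * dw0/deps, with v = p/m."""
    x = (p * p / (2 * m) - mu) / T
    dw0_deps = -1.0 / (4 * T * math.cosh(x / 2) ** 2)
    return -p_shift * (p / m) * dw0_deps


p_shift = 1e-3                 # plays the role of q*E*tau in Eq. (6.35); hypothetical value
p_fermi = math.sqrt(2 * 1.0)   # p_F for eps_F = mu = 1 and m = 1
for p in (0.5, 1.0, p_fermi, 1.6):
    print(f"p = {p:.3f}: exact {perturbation(p, p_shift):+.3e}, "
          f"linear {linear_perturbation(p, p_shift):+.3e}")
```

The perturbation is a rigid shift of the whole distribution, yet it is exponentially small deep inside the Fermi sphere and noticeable only within a layer of width ~T around ε_F.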


Let me also mention another paradox related to the Drude formula, which is often misunderstood (not only by students :-). As was emphasized above, τ is finite even at elastic scattering, which by itself does not change the total energy of the gas. The question is how such scattering can be responsible for the Ohmic resistivity ρ = 1/σ, and hence for the Joule heat production, with the power density 𝒫 = j·ℰ = σℰ².²⁰ The answer is that the Drude/Sommerfeld formulas describe just the "bottleneck" of the Joule heat formation. In the scattering picture (Fig. 4b), the states filled by elastically scattered electrons are located above the (shifted) Fermi surface, and these electrons eventually need to relax onto it via some inelastic process, which releases their excessive energy in the form of heat (in solid state, described by phonons; see Sec. 2.6). The rate and other features of these inelastic phenomena do not participate in the Drude formula directly, but for keeping the theory valid (in particular, keeping the probability distribution w close to its equilibrium value w₀), their intensity has to be sufficient to avoid gas overheating by the applied field. In some poorly conducting materials, charge carrier overheating effects, resulting in deviations from the Ohm law, i.e. from the linear relation (28) between j and ℰ, may be observed already at rather practicable electric fields. 


One final comment is that the Sommerfeld theory of the Ohmic conductivity works very well for the electron gas in most conductors. The scheme shown in Fig. 4 helps to understand why: for degenerate Fermi gases, the energies of all particles whose scattering contributes to transport properties are close (ε ≈ ε_F), and prescribing them all the same relaxation time τ is very reasonable. In contrast, in classical gases, with their relatively broad distribution of ε, some results given by the Boltzmann-RTA equation (18) are valid only by the order of magnitude. 


6.3. Electrochemical potential and the drift-diffusion equation 


Now let us generalize our calculation to the case when the particle transport takes place in the presence of a time-independent spatial gradient of the probability distribution, ∇_r w ≠ 0, caused, for example, by that of the particle concentration n = N/V (and hence, according to Eq. (3.40), of the 


19 So here, as it frequently happens in physics, formulas (or graphical sketches, such as Fig. 4b) give a clearer and more unambiguous description of reality than words, a privilege lacked by many "scientific" disciplines rich with unending, shallow verbal debates. Note also that, as frequently happens in physics, the dual interpretation of σ is expressed by two different but equal integrals (30) and (31), related by the integration-by-parts rule. 

20 This formula is probably self-evident, but if needed, you may revisit EM Sec. 4.4. 
chemical potential μ), while still assuming that temperature T is constant. For this generalization, we should keep the second term on the left-hand side of Eq. (18). If the gradient of w is sufficiently small, we can repeat the arguments of the last section and replace w with w₀ in this term as well. With the applied electric field ℰ represented as (−∇φ),²¹ where φ is the electrostatic potential, Eq. (25) becomes 

$$ \tilde w = \tau\mathbf{v}\cdot\left(q\nabla\phi\,\frac{\partial w_0}{\partial\varepsilon} - \nabla w_0\right). \qquad (6.36) $$


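Since in any of the equilibrium distributions (20), ⟨N(ε)⟩ is a function of ε and μ only in the combination (ε − μ), it obeys the following relation: 

$$ \frac{\partial\langle N(\varepsilon)\rangle}{\partial\mu} = -\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}. \qquad (6.37) $$

Using this relation, the gradient of w₀ ∝ ⟨N(ε)⟩ may be represented as²² 

$$ \nabla w_0 = -\frac{\partial w_0}{\partial\varepsilon}\nabla\mu, \qquad \text{for } T = \text{const}, \qquad (6.38) $$

so that Eq. (36) becomes 

$$ \tilde w = \tau\frac{\partial w_0}{\partial\varepsilon}\,\mathbf{v}\cdot(q\nabla\phi + \nabla\mu) \equiv \tau\frac{\partial w_0}{\partial\varepsilon}\,\mathbf{v}\cdot\nabla\mu', \qquad (6.39) $$

where the following sum, 

$$ \mu' \equiv \mu + q\phi, \qquad (6.40) $$

is called the electrochemical potential. Now repeating the calculation of the electric current carried out in the last section, we get the following generalization of the Ohm law (28): 

$$ \mathbf{j} = \sigma\left(-\frac{\nabla\mu'}{q}\right) \equiv \sigma\tilde{\mathscr{E}}, \qquad (6.41) $$

where the effective electric field ℰ̃ is proportional to the gradient of the electrochemical potential, rather than of the electrostatic potential: 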


$$ \tilde{\mathscr{E}} \equiv -\frac{\nabla\mu'}{q} = \mathscr{E} - \frac{\nabla\mu}{q}. \qquad (6.42) $$


The physics of this extremely important and general result²³ may be explained in two ways. First, let us have a look at the energy spectrum of a degenerate Fermi gas confined in a volume of finite size, but otherwise free. To ensure such a confinement, we need a piecewise-constant potential U(r), a "hard-wall, flat-bottom potential well"; see Fig. 5a. (For conduction electrons in a metal, such a profile is 


21 Since we will not encounter ∇_p in the balance of this chapter, from this point on, the subscript r of the operator ∇_r is dropped for notational brevity. 

22 Since we consider w₀ as a function of two independent arguments r and p, taking its gradient, i.e. the differentiation of this function over r, does not involve its differentiation over the kinetic energy ε, which is a function of p only. 

23 Note that Eq. (42) does not include the phenomenological parameter τ of the relaxation-time approximation, signaling that it is much more general than the RTA. Indeed, this equality is based entirely on the relation between the second and third terms on the left-hand side of the general Boltzmann equation (10), rather than on any details of the scattering integral on its right-hand side. 
provided by the positively charged ions of the crystal lattice.) The well should be of a sufficient depth U₀ > ε_F ≡ μ|_{T=0} to provide the confinement of the overwhelming majority of the particles, with energies below and somewhat above the Fermi level ε_F. This means that there should be a substantial energy gap, 

$$ \psi \equiv U_0 - \mu \gg T, \qquad (6.43) $$

between the Fermi energy of a particle inside the well and its potential energy U₀ outside the well. (The latter value of energy is usually called the vacuum level.) The difference defined by Eq. (43) is called the workfunction;²⁴ for most metals, it is between 4 and 5 eV, so that the relation ψ >> T is well fulfilled for room temperatures (T ~ 0.025 eV), and actually for all temperatures up to the metal's evaporation point. 


Fig. 6.5. Potential profiles of (a) a single conductor and (b, c) a system of two closely located conductors, for two different biasing situations: (b) zero electrostatic field (the "flat-band condition"), and (c) zero voltage Δμ'. 


Now let us consider two conductors, with different values of ψ, separated by a small spatial gap d; see Figs. 5b,c. Panel (b) shows the case when the electric field ℰ = −∇φ in the free-space gap between the conductors equals zero, i.e. their electrostatic potentials φ are equal.²⁵ If there is an opportunity for particles to cross the gap (e.g., by either the thermally-activated hopping over the potential barrier, discussed in Secs. 5.6-5.7, or the quantum-mechanical tunneling through it), there will be an average flow of particles from the conductor with the higher Fermi level to that with the lower Fermi level,²⁶ because the chemical equilibrium requires their equality; see Secs. 1.5 and 2.7. If the particles have an electric charge (as electrons do), the equilibrium will be automatically achieved by them recharging the effective capacitor formed by the conductors, until the electrostatic energy difference qΔφ reaches the value reproducing that of the workfunctions (Fig. 5c). So for the equilibrium potential difference²⁷ we may write 

$$ q\Delta\phi = \Delta\psi = -\Delta\mu. \qquad (6.44) $$


At this equilibrium, the electric field in the gap between the conductors is 


24 Sometimes it is also called the “electron affinity”, though this term is mostly used for atoms and molecules. 

25 In semiconductor physics and engineering, the situation shown in Fig. 5b is called the flat-band condition, 
because any electric field applied normally to a surface of a semiconductor leads to the so-called energy band 
bending — see the next section. 

26 As measured from a common reference value, for example from the vacuum level — rather than from the bottom 
of an individual potential well as in Fig. 5a. 

27 In physics literature, it is usually called the contact potential difference, while in electrochemistry (for which it 
is one of the key notions), the term Volta potential is more common. 
$$ \mathscr{E} = -\frac{\Delta\phi}{d} = -\frac{\Delta\psi}{qd} = \frac{\Delta\mu}{qd}; \qquad (6.45) $$

in Fig. 5c, this field is clearly visible as the tilt of the electric potential profile. Comparing Eq. (45) with the definition (42) of the effective electric field ℰ̃, we see that the equilibrium, i.e. the absence of current through the potential barrier, is achieved exactly when ℰ̃ = 0, in accordance with Eq. (41). 
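To get a feeling for the magnitudes in Eqs. (6.44)-(6.45), here is a back-of-the-envelope calculation for two hypothetical metals with workfunctions of 4.3 and 4.9 eV separated by a 1 µm gap (illustrative values only). Signs are dropped, so only the magnitudes of Δφ and ℰ are computed.

```python
# Hypothetical numbers: workfunctions (in eV) of two metals and the width of the gap between them
psi_1_eV, psi_2_eV = 4.3, 4.9      # illustrative values in the 4-5 eV range quoted in the text
d = 1e-6                           # gap width, m (assumed)

# Eq. (6.44): q*|Delta phi| = |Delta psi|.  For electrons |q| = e, so an energy difference of
# X eV corresponds to a potential difference of X volts.
delta_phi_V = abs(psi_2_eV - psi_1_eV)
field_V_per_m = delta_phi_V / d    # magnitude of Eq. (6.45): |E| = |Delta phi| / d
print(f"|Delta phi| = {delta_phi_V:.2f} V, |E| = {field_V_per_m:.1e} V/m")
```

A contact potential difference of a fraction of a volt is thus present between two different metals even without any battery, and an electrodynamic voltmeter connected between them still reads zero, because Δμ' = 0.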


The electric field dichotomy, ℰ ↔ ℰ̃, raises a natural question: which of these fields are we speaking about in everyday and laboratory practice? Upon some contemplation, the reader should agree that most of our electric field measurements are done indirectly, by measuring the corresponding voltages with voltmeters. A vast majority of these instruments belong to the so-called electrodynamic variety, which is based on the measurement of a small current flowing through the voltmeter.²⁸ As Eq. (41) shows, such electrodynamic voltmeters measure the electrochemical potential difference Δμ'/q. However, there exists a rare breed of electrostatic voltmeters (also called "electrometers") that measure the electrostatic potential difference Δφ between two conductors. One way to implement such an instrument is to use an ordinary, electrodynamic voltmeter, but with the reference point set at the flat-band condition (Fig. 5b) between the conductors. (This condition may be detected by vanishing electric charge on the adjacent surfaces of the conductors, and hence by the absence of its modulation in time if the distance between the surfaces is periodically modulated.) 


Now let me return to Eq. (41) and make two very important remarks. First, it says that in the presence of an electric field, the current vanishes only if ∇μ' = 0, i.e. that the electrochemical potential μ', rather than the chemical potential μ, has to be position-independent in a conducting system in thermodynamic (thermal, chemical, and electric) equilibrium. This result by no means contradicts the fundamental thermodynamic relations for μ discussed in Sec. 1.5, or the statistical relations involving μ, which were discussed in Sec. 2.7 and beyond. Indeed, according to Eq. (40), μ'(r) is "merely" the chemical potential referred to the local value of the electrostatic energy qφ(r), and in all previous parts of the course, this energy was assumed to be constant throughout the system. 


Second, note another interpretation of Eq. (41), which may be achieved by modifying Eq. (38) for the particular case of the classical gas. Indeed, the local density n ≡ N/V of the gas obeys Eq. (3.32), which may be rewritten as 

$$ n(\mathbf{r}) = \text{const}\times\exp\left\{\frac{\mu(\mathbf{r})}{T}\right\}. \qquad (6.46) $$

Taking the spatial gradient of both sides of this relation (still at constant T), we get 

$$ \nabla n = \text{const}\times\frac{1}{T}\exp\left\{\frac{\mu}{T}\right\}\nabla\mu = \frac{n}{T}\nabla\mu, \qquad (6.47) $$

so that ∇μ = (T/n)∇n, and Eq. (41), with σ given by Eq. (32), may be recast as 

$$ \mathbf{j} = \sigma\left(-\frac{\nabla\mu'}{q}\right) = \frac{q^2\tau}{m}\,n\left(-\nabla\phi - \frac{1}{q}\nabla\mu\right) = q\,\frac{\tau}{m}\left(nq\mathscr{E} - T\nabla n\right). \qquad (6.48) $$


28 The devices for such measurement may be based on the interaction between the measured current and a 
permanent magnet, as pioneered by A.-M. Ampere in the 1820s — see, e.g., EM Chapter 5. Such devices are 
sometimes called galvanometers, honoring another pioneer of electricity, Luigi Galvani. 
Hence the current density may be viewed as consisting of two independent parts: one due to particle drift induced by the “usual” electric field ℰ = −∇φ, and another due to their diffusion; see Eq. (5.118) and its discussion. This is exactly the physics of the “mysterious” term ∇μ in Eq. (42), though its simple form (48) is valid only in the classical limit.
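A minimal numerical sketch of this decomposition, assuming arbitrary dimensionless units q = T = τ/m = 1 and a made-up potential profile φ(x) (neither is taken from the text): for the density n ∝ exp{−qφ/T}, which follows from Eq. (46) when μ′ = μ + qφ is constant, the drift and diffusion parts of Eq. (48) should cancel point by point, up to finite-difference error.

```python
import numpy as np

# Dimensionless units, assumed for illustration only: q = T = tau/m = 1.
q, T, tau_over_m = 1.0, 1.0, 1.0

x = np.linspace(0.0, 1.0, 2001)
phi = 0.5 * np.sin(2 * np.pi * x) + 0.3 * x    # arbitrary smooth potential profile

# Constant mu' = mu + q*phi together with Eq. (6.46) gives n proportional to exp(-q*phi/T).
n = 2.0 * np.exp(-q * phi / T)

E = -np.gradient(phi, x, edge_order=2)                              # field E = -dphi/dx
drift = q * tau_over_m * n * q * E                                  # first term of Eq. (6.48)
diffusion = -q * tau_over_m * T * np.gradient(n, x, edge_order=2)   # second term of Eq. (6.48)
j = drift + diffusion

# Any residual current comes only from the finite-difference truncation error.
rel_residual = np.max(np.abs(j)) / np.max(np.abs(drift))
print(f"max|j| / max|drift| = {rel_residual:.1e}")
```

Replacing n by any profile not of the Boltzmann form (for example, a constant) leaves a finite current, as Eq. (41) requires whenever ∇μ′ ≠ 0.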


Besides being very useful for applications, Eq. (48) also gives us a pleasant surprise. Namely, plugging it into the continuity equation for electric charge,29

$$ \frac{\partial (qn)}{\partial t} + \nabla\cdot\mathbf{j} = 0, \qquad (6.49) $$

we get (after the division of all terms by qτ/m) the so-called drift-diffusion equation:30

$$ \frac{m}{\tau}\frac{\partial n}{\partial t} = \nabla\cdot(n\nabla U) + T\nabla^2 n, \qquad \text{with } U \equiv q\phi. \qquad (6.50) $$


Comparing it with Eq. (5.122), we see that the drift-diffusion equation is identical to the Smoluchowski equation,31 provided that we parallel the ratio τ/m with the mobility μ_m ≡ 1/η of the Brownian particle. Now using the Einstein relation (5.78), we see that the effective diffusion constant D of the classical gas of similar particles is

$$ D = \frac{T\tau}{m}. \qquad (6.51a) $$

This important relation is more frequently represented in either of two other forms. First, since the rare scattering events we are considering do not change the statistics of the gas in thermal equilibrium, we may still use the Maxwell-distribution result (3.9) for the average-square velocity ⟨v²⟩, to recast Eq. (51a) as

$$ D = \frac{1}{3}\langle v^2\rangle\tau. \qquad (6.51b) $$


One more popular form of the same relation uses the notion of the mean free path l, which may be defined as the average distance passed by the particle between two sequential scattering events:

$$ D = \frac{1}{3}\,l\,\langle v^2\rangle^{1/2}, \qquad \text{with } l \equiv \langle v^2\rangle^{1/2}\tau. \qquad (6.51c) $$


In the forms (51b)-(51c), the result for D makes more physical sense, because it may be readily derived (admittedly, with some uncertainty of the numerical coefficient) from simple kinematic arguments, a task left for the reader’s exercise. Note that since the definition of τ in Eq. (17) is phenomenological, so is the above definition of l; this is why several definitions of this parameter, which may differ by a numerical factor of the order of 1, are possible.
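A short arithmetic check that the three forms (51a)-(51c) are the same relation once ⟨v²⟩ = 3T/m is used. The electron mass, T = 300 K, and the scattering time τ = 10⁻¹³ s are illustrative assumptions, not values from the text.

```python
import math

k_B = 1.380649e-23     # Boltzmann constant, J/K
m = 9.1093837e-31      # kg; the free-electron mass, used only as an example
T = k_B * 300.0        # temperature in energy units (J)
tau = 1e-13            # s; a hypothetical scattering time

D_a = T * tau / m                 # Eq. (6.51a)
v2 = 3 * T / m                    # Maxwell-distribution result <v^2> = 3T/m
D_b = v2 * tau / 3                # Eq. (6.51b)
l = math.sqrt(v2) * tau           # mean free path as defined in Eq. (6.51c)
D_c = l * math.sqrt(v2) / 3       # Eq. (6.51c)

print(f"D = {D_a:.2e} m^2/s, l = {l * 1e9:.1f} nm")
```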


Note also that using Eq. (51a), Eq. (48) may be rewritten as an expression for the particle flow density j_n ≡ n⟨v⟩ = j/q:

$$ \mathbf{j}_n = n\mu_m q\boldsymbol{\mathcal{E}} - D\nabla n, \qquad (6.52) $$


29 If this relation is not evident, please revisit EM Sec. 4.1.

30 Sometimes this term is associated with Eq. (52). One may also run into the term “convection-diffusion 
equation” for Eq. (50) with the replacement (51a). 

31 And hence, at negligible ∇U, identical to the diffusion equation (5.116).
with the first term on the right-hand side describing the particles’ drift, and the second one, their diffusion. I will discuss the application of this equation to the most important case of non-degenerate (‘quasi-classical’) gases of electrons and holes in semiconductors in the next section.
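To see the drift and diffusion fluxes of Eq. (52) balance dynamically, here is a rough sketch that integrates the 1D version of Eq. (50) with an explicit finite-volume scheme. It assumes dimensionless units with m/τ = 1 (so that μ_m = 1 and D = T), a linear potential U = Fx, and closed walls; all parameter values are hypothetical. Starting from a uniform density, the profile should relax to the Boltzmann distribution n ∝ exp{−U/T} while the total particle number stays fixed.

```python
import numpy as np

def relax_drift_diffusion(F=1.0, T=1.0, L=1.0, cells=40, t_end=2.0):
    """Explicit finite-volume integration of the 1D form of Eq. (6.50),
    (m/tau) dn/dt = d/dx (n dU/dx) + T d2n/dx2, for U = F*x between closed walls.
    Units with m/tau = 1 (hence D = T) are assumed."""
    dx = L / cells
    x = (np.arange(cells) + 0.5) * dx          # cell centres
    n = np.ones(cells)                          # uniform initial density
    dt = 0.25 * dx**2 / T                       # inside the explicit stability limit dx^2/(2T)
    for _ in range(int(round(t_end / dt))):
        n_face = 0.5 * (n[:-1] + n[1:])         # density interpolated to interior faces
        j_n = -F * n_face - T * np.diff(n) / dx     # Eq. (6.52): drift flux + diffusion flux
        j_n = np.concatenate(([0.0], j_n, [0.0]))   # zero flux through the walls
        n = n - dt * np.diff(j_n) / dx              # particle continuity equation
    return x, n, dx

x, n, dx = relax_drift_diffusion()
n_eq = np.exp(-x)                       # Boltzmann profile exp(-U/T) for F = T = 1
n_eq *= n.sum() / n_eq.sum()            # same total number of particles
deviation = np.max(np.abs(n / n_eq - 1.0))
total = n.sum() * dx
print(f"total N = {total:.12f}, max deviation from Boltzmann = {deviation:.1e}")
```

The simple central averaging of n on cell faces is chosen here only for brevity; its steady state differs from the exact exponential by a small discretization error that shrinks as the cell size decreases.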


To complete this section, let me emphasize again that the mathematically similar drift-diffusion equation (50) and the Smoluchowski equation (5.122) describe different physical situations. Indeed, our (or rather Einstein and Smoluchowski’s :-) treatment of the Brownian motion in Chapter 5 was based on a strong hierarchy of the system, consisting of a large “Brownian particle” in an environment of many smaller particles (“molecules”). On the other hand, in this chapter we are considering a gas of similar particles. Nevertheless, the equations describing the dynamics of their probability distributions are the same, at least within the framework of the Boltzmann transport equation with the relaxation-time approximation (17) of the scattering integral. The origin of this similarity is the fact that Eq. (12) is clearly applicable to a Brownian particle as well, with each “scattering” event being a hit of the particle by a random molecule of its environment. Since, due to the mass hierarchy, the particle’s momentum change at each such event is very small, the scattering integral has to be local, i.e. depend only on w at the same momentum p as the left-hand side of the Boltzmann equation, so that the relaxation-time approximation (17) is absolutely natural; indeed, it is more natural than for our current case of similar particles.


6.4. Charge carriers in semiconductors 


Now let me demonstrate the application of the concepts discussed in the last section to 
understanding the basic kinetic properties of semiconductors and a few key semiconductor structures — 
which are the basis of most modern electronic and optoelectronic devices, and hence of all our IT 
civilization. For that, I will need to take a detour to discuss their equilibrium properties first. 


I will use an approximate but reasonable picture in which the energy of the electron subsystem in a solid may be partitioned into the sum of effective energies ε of independent electrons. Quantum mechanics says32 that in such periodic structures as crystals, the stationary-state energy ε of a particle interacting with the atomic lattice follows one of the periodic functions ε_n(q) of the quasimomentum q, oscillating between two extreme values ε_n|min and ε_n|max. These allowed energy bands are separated by bandgaps, of widths Δ_n = ε_n|min − ε_{n−1}|max, with no allowed states inside them. Semiconductors and insulators (dielectrics) are defined as such crystals that in equilibrium at T = 0, all electron states in several energy bands (with the highest of them called the valence band) are completely filled, ⟨N(ε)⟩ = 1, while those in the upper bands, starting from the lowest, conduction band, are completely empty (⟨N(ε)⟩ = 0).33,34 Since the electrons follow the Fermi-Dirac statistics (2.115), this means that at T → 0,


32 See, e.g., QM Secs. 2.7 and 3.4, but a thorough knowledge of this material is not necessary for following the discussions of this section. If the reader is not familiar with the notion of quasimomentum (alternatively called the
“crystal momentum”), its following semi-quantitative interpretation may be useful: q is the result of quantum 
averaging of the genuine electron momentum p over the crystal lattice period. In contrast to p, which is not 
conserved because of the electron’s interaction with the atomic lattice, q is an integral of motion — in the absence 
of other forces. 

33 This mapping of electrical properties of crystals on their band structure was pioneered in 1931-32 by Alan H. 
Wilson. 

34 In insulators, the bandgap Δ is so large (e.g., ~9 eV in SiO₂) that the conduction band remains unpopulated in all practical situations, so that the following discussion is only relevant for semiconductors, with their moderate bandgaps, such as 1.14 eV in the most important case of silicon at room temperature.




the Fermi energy ε_F ≡ μ(0) is located somewhere between the valence band’s maximum ε_n|max (usually called simply ε_V) and the conduction band’s minimum ε_{n+1}|min (called ε_C); see Fig. 6.


Fig. 6.6. Calculating μ in an intrinsic semiconductor.


Let us calculate the population of both branches ε_n(q), and the chemical potential μ in equilibrium at T > 0. Since the functions ε_n(q) are typically smooth, near the bandgap edges the dispersion laws ε_C(q) and ε_V(q) may be well approximated with quadratic parabolas. For our analysis, let us take the parabolas in the simplest, isotropic form, with origins at the same quasimomentum, taking it for the reference point:35

$$ \varepsilon = \begin{cases} \varepsilon_C + q^2/2m_C, & \text{for } \varepsilon > \varepsilon_C, \\ \varepsilon_V - q^2/2m_V, & \text{for } \varepsilon < \varepsilon_V, \end{cases} \qquad \text{with } \varepsilon_C - \varepsilon_V \equiv \Delta. \qquad (6.53) $$


The positive constants m_C and m_V are usually called the effective masses of, respectively, electrons and holes. (In a typical semiconductor, m_C is a few times smaller than the free-electron mass m_e, while m_V is closer to m_e.)


Due to the similarity between the top line of Eq. (53) and the dispersion law (3.3) of free particles, we may re-use Eq. (3.40), with the appropriate particle mass m, the degeneracy factor g, and the energy origin, to calculate the full spatial density of populated states (in semiconductor physics, called electrons in the narrow sense of the word):

$$ n \equiv \frac{N}{V} = \frac{g_C m_C^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \langle N(\tilde\varepsilon + \varepsilon_C)\rangle\,\tilde\varepsilon^{1/2}\,d\tilde\varepsilon, \qquad (6.54) $$

where ε̃ ≡ ε − ε_C ≥ 0. Similarly, the density p of “no-electron” excitations (called holes) in the valence band is the number of unfilled states in the band, and hence may be calculated as

$$ p = \frac{g_V m_V^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \left[1 - \langle N(\varepsilon_V - \tilde\varepsilon)\rangle\right]\tilde\varepsilon^{1/2}\,d\tilde\varepsilon, \qquad (6.55) $$

where in this case, ε̃ ≥ 0 is defined as (ε_V − ε). If the electrons and holes36 are in thermal and chemical equilibrium, the functions ⟨N(ε)⟩ in these two relations should follow the Fermi-Dirac


35 It is easy (and hence is left for the reader’s exercise) to verify that all equilibrium properties of charge carriers remain the same (with some effective values of m_C and m_V) if ε_C(q) and ε_V(q) are arbitrary quadratic forms of the Cartesian components of the quasimomentum. A mutual displacement of the branches ε_C(q) and ε_V(q) in the quasimomentum space is also unimportant for statistical and most transport properties of the semiconductors, though it is very important for their optical properties, which I will not have time to discuss in any detail.

36 The collective name for them in semiconductor physics is charge carriers — or just “carriers”. 
distribution (2.115) with the same temperature T and the same chemical potential μ. Moreover, in our current case of an undoped (intrinsic) semiconductor, these densities have to be equal,

$$ n = p \equiv n_i, \qquad (6.56) $$

because if this electroneutrality condition were violated, the volume would acquire a non-zero electric charge density ρ = e(p − n), which would result, in a bulk sample, in an extremely high electric field energy. From this condition, we get a system of two equations,

$$ n_i = \frac{g_C m_C^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \frac{\tilde\varepsilon^{1/2}\,d\tilde\varepsilon}{\exp\{(\tilde\varepsilon + \varepsilon_C - \mu)/T\} + 1} = \frac{g_V m_V^{3/2}}{\sqrt{2}\,\pi^2\hbar^3}\int_0^\infty \frac{\tilde\varepsilon^{1/2}\,d\tilde\varepsilon}{\exp\{(\tilde\varepsilon - \varepsilon_V + \mu)/T\} + 1}, \qquad (6.57) $$

whose solution gives both the requested charge carrier density n_i and the Fermi level μ.
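For an arbitrary ratio Δ/T, Eq. (57) has to be solved numerically. A minimal sketch using SciPy’s `quad` and `brentq`, assuming energies in eV with ε_V = 0 and ε_C = Δ, spin-only degeneracy g_C = g_V = 2, and hypothetical effective masses m_C = 0.3 m_e and m_V = 0.6 m_e; after the substitution u = ε̃/T, the common factor T^{3/2}/(√2 π²ℏ³) cancels, so only g m^{3/2} enters. For Δ ≫ T the result should coincide with the first of Eqs. (60).

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.special import expit

def fd_half(eta):
    """Integral over u from 0 to infinity of sqrt(u) / (exp(u - eta) + 1).
    expit(eta - u) equals 1 / (1 + exp(u - eta)) but cannot overflow at large u."""
    val, _ = quad(lambda u: np.sqrt(u) * expit(eta - u), 0.0, np.inf,
                  epsabs=0.0, epsrel=1e-9)
    return val

def intrinsic_mu(delta, T, g_c=2.0, g_v=2.0, m_c=0.3, m_v=0.6):
    """Fermi level mu solving Eq. (6.57), with eps_V = 0 and eps_C = delta.
    Energies in eV; masses in units of m_e (the default values are hypothetical)."""
    def log_mismatch(mu):
        electrons = g_c * m_c**1.5 * fd_half((mu - delta) / T)
        holes = g_v * m_v**1.5 * fd_half(-mu / T)
        return np.log(electrons) - np.log(holes)   # increases monotonically with mu
    return brentq(log_mismatch, 0.0, delta)

delta, T = 1.12, 0.02585          # eV: a silicon-like gap at room temperature
mu_num = intrinsic_mu(delta, T)
mu_60 = delta / 2 + (T / 2) * (np.log(2.0 / 2.0) + 1.5 * np.log(0.6 / 0.3))   # Eq. (6.60)
print(f"mu (numerical) = {mu_num:.6f} eV, mu (Eq. 6.60) = {mu_60:.6f} eV")
```

Working with the logarithm of each side keeps the root search well conditioned even though both carrier densities are exponentially small.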


For an arbitrary ratio Δ/T, this solution may be found only numerically, but in most practical cases, this ratio is very large. (Again, for Si at room temperature, Δ ≈ 1.14 eV, while T ≈ 0.025 eV.) In this case, we may use the same classical approximation as in Eq. (3.45), to reduce Eqs. (54) and (55) to simple expressions

$$ n = n_C \exp\left\{\frac{\mu - \varepsilon_C}{T}\right\}, \qquad p = n_V \exp\left\{\frac{\varepsilon_V - \mu}{T}\right\}, \qquad \text{for } T \ll \Delta, \qquad (6.58) $$

where the temperature-dependent parameters

$$ n_C = \frac{g_C}{\hbar^3}\left(\frac{m_C T}{2\pi}\right)^{3/2}, \qquad n_V = \frac{g_V}{\hbar^3}\left(\frac{m_V T}{2\pi}\right)^{3/2}, \qquad (6.59) $$

may be interpreted as the effective numbers of states (per unit volume) available for occupation in, respectively, the conduction and valence bands, in thermal equilibrium. For usual semiconductors (with g_C ~ g_V ~ 1, and m_C ~ m_V ~ m_e), at room temperature, these numbers are of the order of 3×10^25 m^−3 ≡ 3×10^19 cm^−3. (Note that all results based on Eqs. (58) are only valid if both n and p are much lower than, respectively, n_C and n_V.)
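A quick evaluation of Eq. (59) with SciPy’s physical constants, assuming m_C = m_e and spin-only degeneracy g_C = 2 (illustrative choices). The result, about 2.5×10^19 cm^−3 at 300 K, is consistent with the order-of-magnitude estimate above.

```python
import math
from scipy.constants import hbar, k, m_e

def effective_density(g, m_eff, T_K):
    """Eq. (6.59): (g / hbar^3) * (m_eff * T / (2*pi))^(3/2), in m^-3, with T = k_B * T_K."""
    T = k * T_K
    return g / hbar**3 * (m_eff * T / (2 * math.pi))**1.5

n_C = effective_density(g=2, m_eff=m_e, T_K=300.0)
print(f"n_C = {n_C:.2e} m^-3 = {n_C * 1e-6:.2e} cm^-3")
```

The T^{3/2} scaling means that quadrupling the temperature raises n_C eightfold.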


With the substitution of Eqs. (58), the system of equations (56) allows a straightforward solution:

$$ \mu = \frac{\varepsilon_V + \varepsilon_C}{2} + \frac{T}{2}\left(\ln\frac{g_V}{g_C} + \frac{3}{2}\ln\frac{m_V}{m_C}\right), \qquad n_i = (n_C n_V)^{1/2}\exp\left\{-\frac{\Delta}{2T}\right\}. \qquad (6.60) $$

Since in all practical materials the logarithms in the first of these expressions are never much larger than 1,37 it shows that the Fermi level in intrinsic semiconductors never deviates substantially from the so-called midgap value (ε_V + ε_C)/2; see the (schematic) Fig. 6. In the result for n_i, the last (exponential) factor is very small, so that the equilibrium number of charge carriers is much lower than that of the atoms: for the most important case of silicon at room temperature, n_i ~ 10^10 cm^−3. The exponential temperature dependence of n_i (and hence of the electric conductivity σ ∝ n_i) of intrinsic semiconductors is the basis of several applications, for example simple germanium resistance thermometers, efficient in the whole range from ~0.5 K to ~100 K. Another useful application of the same fact is the extraction of


37 Note that in the case of simple electron spin degeneracy (g_V = g_C = 2), the first logarithm vanishes altogether. However, in many semiconductors, the degeneracy is multiplied by the number of similar energy bands (e.g., six similar conduction bands in silicon), and the factor ln(g_V/g_C) may slightly affect quantitative results.
the bandgap of a semiconductor from the experimental measurement of the temperature dependence of σ ∝ n_i, frequently in just two well-separated temperature points.
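A minimal sketch of this two-point extraction, assuming (as the text does) that σ ∝ n_i, so that by Eqs. (59)-(60) σ ∝ T^{3/2} exp{−Δ/2T}; any temperature dependence of the carrier mobility is neglected here, which is an assumption. Taking the ratio of the two measurements eliminates the unknown prefactor. The “data” below are synthetic, generated from the same model with a hypothetical gap of 0.67 eV, so the function should simply recover that value.

```python
import math

K_B_EV = 8.617333262e-5   # Boltzmann constant in eV/K

def gap_from_two_points(T1, sigma1, T2, sigma2):
    """Bandgap estimate (eV) from conductivities at two temperatures (K), assuming
    sigma proportional to T^(3/2) * exp(-Delta / 2T), i.e. Eqs. (6.59)-(6.60)
    with a temperature-independent mobility (an assumption)."""
    num = 1.5 * math.log(T1 / T2) - math.log(sigma1 / sigma2)
    return 2 * K_B_EV * num / (1 / T1 - 1 / T2)

def sigma_model(T, gap=0.67, prefactor=1.0):
    """Synthetic conductivity from the same model, with a hypothetical gap (eV)."""
    return prefactor * T**1.5 * math.exp(-gap / (2 * K_B_EV * T))

gap = gap_from_two_points(350.0, sigma_model(350.0), 300.0, sigma_model(300.0))
print(f"extracted gap = {gap:.4f} eV")
```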


However, most applications require a much higher concentration of carriers. It may be increased quite dramatically by planting into a semiconductor a relatively small number of slightly different atoms, either donors (e.g., phosphorus atoms for Si) or acceptors (e.g., boron atoms for Si). Let us analyze the first opportunity, called n-doping, using the same simple energy band model (53). If the donor atom is only slightly different from those in the crystal lattice, it may be easily ionized, giving an additional electron to the conduction band and hence becoming a positive ion. This means that the effective ground-state energy ε_D of the additional electrons is just slightly below the conduction band edge ε_C; see Fig. 7a.38


Fig. 6.7. The Fermi levels μ in (a) n-doped and (b) p-doped semiconductors. Hatching shows the ranges of unlocalized states.


Reviewing the arguments that have led us to Eqs. (58), we see that at relatively low doping, when the strong inequalities n ≪ n_C and p ≪ n_V still hold, these relations are not affected by the doping, so that the concentrations of electrons and holes given by these equalities still obey a universal (doping-independent) relation following from Eqs. (58) and (60):39

$$ np = n_i^2. \qquad (6.61) $$


However, for a doped semiconductor, the electroneutrality condition looks different from Eq. (56), because the total density of positive charges in a unit volume is not p, but rather (p + n_+), where n_+ is the density of positively ionized (“activated”) donor atoms, so that the electroneutrality condition becomes

$$ n = p + n_+. \qquad (6.62) $$

If virtually all dopants are activated, as is the case in most practical situations,40 then we may take n_+ = n_D, where n_D is the total concentration of donor atoms, i.e. their number per unit volume, and Eq. (62) becomes

$$ n = p + n_D. \qquad (6.63) $$

Plugging in the expression p = n_i²/n, following from Eq. (61), we get a simple quadratic equation for n, with the following physically acceptable (positive) solution:


38 Note that in comparison with Fig. 6, here the (for most purposes, redundant) information on the q-dependence of the energies is collapsed, leaving the horizontal axis of such a band-edge diagram free for showing their possible spatial dependences; see Figs. 8, 10, and 11 below.

39 Very similar relations may be met in the theory of chemical reactions (where it is called the law of mass action), and in other disciplines, including such exotic examples as theoretical ecology.

40 Let me leave it for the reader’s exercise to prove that this assumption is always valid unless the doping density n_D becomes comparable to n_C, and as a result, the Fermi energy μ moves into a ~T-wide vicinity of ε_D.
$$ n = \frac{n_D}{2} + \left(\frac{n_D^2}{4} + n_i^2\right)^{1/2}. \qquad (6.64) $$

This result shows that the doping affects n (and hence μ = ε_C − T ln(n_C/n) and p = n_i²/n) only if the dopant concentration n_D is comparable with, or higher than, the intrinsic carrier density n_i given by Eq. (60). For most applications, n_D is made much higher than n_i; in this case Eq. (64) yields

$$ n \approx n_D \gg n_i, \qquad p = \frac{n_i^2}{n} \approx \frac{n_i^2}{n_D} \ll n, \qquad \mu \approx \mu_D \equiv \varepsilon_C - T\ln\frac{n_C}{n_D}. \qquad (6.65) $$
D D 


For reasons to be discussed very soon, modern electron devices require doping densities above 10^18 cm^−3, so that the logarithm in Eq. (65) is not much larger than 1. This means that the Fermi level rises from the midgap to a position only slightly below the conduction band edge ε_C; see Fig. 7a.
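A small sketch of Eqs. (61), (64), and (65) using the round numbers quoted in the text (n_i ≈ 10^10 cm^−3, n_C ≈ 3×10^19 cm^−3, T ≈ 0.026 eV) and a hypothetical donor density n_D = 10^18 cm^−3. It returns the positive root of the quadratic n² − n_D n − n_i² = 0, the hole density from the mass-action law, and the Fermi level position relative to ε_C.

```python
import math

def n_doped(n_D, n_i):
    """Electron density from Eq. (6.64): positive root of n^2 - n_D*n - n_i^2 = 0."""
    return n_D / 2 + math.sqrt(n_D**2 / 4 + n_i**2)

T = 0.02585               # eV, room temperature
n_i, n_C = 1e10, 3e19     # cm^-3; the round values quoted in the text
n_D = 1e18                # cm^-3; a hypothetical donor density

n = n_doped(n_D, n_i)
p = n_i**2 / n                             # mass-action law, Eq. (6.61)
mu_minus_eps_C = -T * math.log(n_C / n)    # mu = eps_C - T ln(n_C / n)
print(f"n = {n:.3e} cm^-3, p = {p:.3e} cm^-3, mu - eps_C = {mu_minus_eps_C * 1e3:.1f} meV")
```

With these inputs the Fermi level sits about 88 meV below ε_C, i.e. only a few T below the band edge, as stated above; setting n_D = 0 recovers the intrinsic result n = n_i.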


The opposite case of purely p-doping, with n_A acceptor atoms per unit volume, and a small activation (negative ionization) energy ε_A − ε_V ≪ Δ,41 may be considered absolutely similarly, using the electroneutrality condition in the form

$$ n + n_- = p, \qquad (6.66) $$

where n_− is the density of activated (and hence negatively charged) acceptors. For the relatively high concentration (n_i ≪ n_A ≪ n_V), virtually all acceptors are activated, so that n_− ≈ n_A, Eq. (66) may be approximated as n + n_A = p, and the analysis gives the results dual to Eq. (65):

$$ p \approx n_A \gg n_i, \qquad n = \frac{n_i^2}{p} \approx \frac{n_i^2}{n_A} \ll p, \qquad \mu \approx \mu_A \equiv \varepsilon_V + T\ln\frac{n_V}{n_A}, \qquad (6.67) $$


so that in this case, the Fermi level is just slightly above the valence band edge (Fig. 7b), and the number of holes far exceeds that of electrons (again, in the narrow sense of the word). Let me leave the analysis of the simultaneous n- and p-doping (which enables, in particular, so-called compensated semiconductors with the sign-variable difference n − p ≈ n_D − n_A) for the reader’s exercise.


Now let us consider how a sample of a doped semiconductor (say, a p-doped one) responds to a static external electrostatic field ℰ applied normally to its surface.42 (In semiconductor integrated circuits, such a field is usually created by the voltage applied to a special highly conducting gate electrode separated from the semiconductor surface by a thin insulating layer.) Assuming that the field penetrates into the sample by a distance λ much larger than the crystal lattice period a (the assumption to be verified a posteriori), we may calculate the distribution of the electrostatic potential φ using the macroscopic version of the Poisson equation.43 Assuming that the semiconductor occupies the semi-space x > 0 and that ℰ = n_x ℰ, the equation reduces to the following 1D form:44


41 For the typical donors (P) and acceptors (B) in silicon, both ionization energies, (ε_C − ε_D) and (ε_A − ε_V), are close to 45 meV, i.e. are indeed much smaller than Δ ≈ 1.14 eV.

42 A simplified version of this analysis was discussed in EM Sec. 2.1. 

43 See, e.g., EM Sec. 3.4. 

44 I am sorry for using, for the SI electric constant ε_0, the same Greek letter as for single-particle energies, but both notations are traditional, and the difference between these uses will be clear from the context.
$$ \frac{d^2\phi}{dx^2} = -\frac{\rho(x)}{\kappa\varepsilon_0}. \qquad (6.68) $$

Here κ is the dielectric constant of the semiconductor matrix, excluding the dopants and charge carriers, which in this approach are treated as explicit (“stand-alone”) charges, with the volumic density

$$ \rho = e(p - n_- - n). \qquad (6.69) $$


(As a sanity check, Eqs. (68)-(69) show that if ℰ = −dφ/dx = 0, then ρ = 0, bringing us back to the electroneutrality condition (66), and hence the “flat” band-edge diagrams shown in Figs. 7b and 8a.)


Fig. 6.8. The band-edge diagrams of the electric field penetration into a uniform p-doped semiconductor: (a) ℰ = 0, (b) ℰ < 0, and (c) ℰ > ℰ_c > 0. Solid red points depict positive charges; solid blue points, negative charges; and hatched blue points, possible electrons in the inversion layer (all very schematically).


In order to get a closed system of equations for the case ℰ ≠ 0, we should take into account that the electrostatic potential φ ≠ 0, penetrating into the sample with the field,45 adds the potential component qφ(x) = −eφ(x) to the energy of each electron, and hence shifts the whole local system of single-electron energy levels “vertically” by this amount: down for φ > 0, and up for φ < 0. As a result, the field penetration leads to what is called band bending; see the band-edge diagrams schematically shown in Figs. 8b,c for two possible polarities of the applied field, which affects the distribution φ(x) via the boundary condition46

$$ \frac{d\phi}{dx}(0) = -\mathcal{E}. \qquad (6.70) $$

Note that the electrochemical potential μ′ (which, in accordance with the discussion in Sec. 3, replaces the chemical potential in the presence of the electric field)47 has to stay constant through the system in equilibrium, keeping the electric current equal to zero; see Eq. (41). For arbitrary doping parameters, the system of equations (58) (with the replacements ε_{C,V} → ε_{C,V} − eφ, and μ → μ′) and (68)-(70), plus the


45 It is common (though not necessary) to select the energy reference so that deep inside the semiconductor, φ = 0; in what follows I will use this convention.

46 Here ℰ is the field just inside the semiconductor. The free-space field necessary to create it is κ times larger; see, e.g., the same EM Sec. 3.4, in particular Eq. (3.56).

47 In semiconductor physics literature, the value of μ′ is usually called the Fermi level, even in the absence of the degenerate Fermi sea typical for metals (cf. Sec. 3.3). In this section, I will follow this common terminology.
relation between n_− and n_A (describing the acceptor activation), does not allow an analytical solution. However, as was discussed above, in the most practical cases n_A ≫ n_i, we may use the approximate relations n_− ≈ n_A and n ≈ 0 at virtually any values of μ′ within the locally shifted bandgap [ε_V − eφ(x), ε_C − eφ(x)], so that the substitution of these relations, and the second of Eqs. (58), with the mentioned replacements, into Eq. (69) yields

$$ \rho \approx e n_V \exp\left\{\frac{\varepsilon_V - e\phi - \mu'}{T}\right\} - e n_A \equiv e n_A\left[\left(\frac{n_V}{n_A}\exp\left\{\frac{\varepsilon_V - \mu'}{T}\right\}\right)\exp\left\{-\frac{e\phi}{T}\right\} - 1\right]. \qquad (6.71) $$

The x-independent electrochemical potential (a.k.a. Fermi level) μ′ in this relation should be equal to the value of the chemical potential μ(x → ∞) in the semiconductor’s bulk, given by the last of Eqs. (67), which turns the expression in the parentheses into 1. With these substitutions, Eq. (68) becomes

$$ \frac{d^2\phi}{dx^2} = \frac{e n_A}{\kappa\varepsilon_0}\left[1 - \exp\left\{-\frac{e\phi}{T}\right\}\right]. \qquad (6.72) $$


This nonlinear differential equation may be solved analytically, but in order to avoid a distraction by this (rather bulky) solution, let me first consider the case when the electrostatic potential is sufficiently small, either because the external field is small, or because we focus on distances sufficiently far from the surface; see Fig. 8 again. In this case, in the Taylor expansion of the exponent in Eq. (72) with respect to small φ, we may keep only the two leading terms, turning it into a linear equation:

$$ \frac{d^2\phi}{dx^2} = \frac{e^2 n_A}{\kappa\varepsilon_0 T}\phi, \quad \text{i.e.} \quad \frac{d^2\phi}{dx^2} - \frac{\phi}{\lambda_D^2} = 0, \qquad \text{where } \lambda_D \equiv \left(\frac{\kappa\varepsilon_0 T}{e^2 n_A}\right)^{1/2}, \qquad (6.73) $$

with the well-known exponential solution, satisfying also the boundary condition φ → 0 at x → ∞:

$$ \phi = C\exp\left\{-\frac{x}{\lambda_D}\right\}, \qquad \text{at } e|\phi| \ll T. \qquad (6.74) $$


The constant λ_D given by the last of Eqs. (73) is called the Debye screening length. It may be rather substantial; for example, at T_K = 300 K, even for the relatively high doping n_A ~ 10^18 cm^−3 typical for modern silicon (κ ≈ 12) integrated circuits, it is close to 4 nm, still much larger than the crystal lattice constant a ~ 0.3 nm, so that the above analysis is indeed quantitatively valid. Note also that λ_D does not depend on the charge’s sign; hence it should be no big surprise that, repeating our analysis for an n-doped semiconductor, we find that Eqs. (73)-(74) are valid for that case as well, with the only replacement n_A → n_D.
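The estimate above may be reproduced directly from Eq. (73). This sketch uses SciPy’s physical constants and the parameters of the example (n_A = 10^18 cm^−3, κ = 12, T_K = 300 K).

```python
import math
from scipy.constants import epsilon_0, e, k

def debye_length(n_dopant_cm3, kappa, T_K):
    """Eq. (6.73): lambda_D = (kappa * eps_0 * T / (e^2 * n))^(1/2), in metres."""
    n = n_dopant_cm3 * 1e6            # cm^-3 -> m^-3
    return math.sqrt(kappa * epsilon_0 * k * T_K / (e**2 * n))

lam = debye_length(1e18, kappa=12.0, T_K=300.0)
print(f"lambda_D = {lam * 1e9:.2f} nm")
```

Since λ_D ∝ n^{−1/2}, quadrupling the doping halves the screening length.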


If the applied field ℰ is weak, Eq. (74) is valid in the whole sample, and the constant C in it may be readily calculated using the boundary condition (70), giving

$$ \phi(0) = C = \lambda_D\mathcal{E} = \left(\frac{\kappa\varepsilon_0 T}{e^2 n_A}\right)^{1/2}\mathcal{E}. \qquad (6.75) $$

This formula allows us to express the condition of validity of the linear approximation leading to Eq. (74), e|φ| ≪ T, in terms of the applied field:




$$ |\mathcal{E}| \ll \mathcal{E}_{\max}, \qquad \text{with } \mathcal{E}_{\max} \equiv \frac{T}{e\lambda_D} = \left(\frac{T n_A}{\kappa\varepsilon_0}\right)^{1/2}; \qquad (6.76) $$

in the above example, ℰ_max ≈ 60 kV/cm. On the lab scale, such a field is not low at all (it is twice as high as the threshold of electric breakdown in air at ambient conditions), but it may be sustained by many solid-state materials that are much less prone to breakdown.48 This is why we should be interested in what happens if the applied field is higher than this value.
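A hedged sketch of what lies beyond the linear regime. Multiplying Eq. (72) by dφ/dx and integrating once from x to ∞ (where φ and dφ/dx vanish) gives, in the dimensionless variables ψ ≡ eφ/T and ξ ≡ x/λ_D, the first integral (dψ/dξ)²/2 = ψ + e^{−ψ} − 1. Since ℰ = −dφ/dx = −ℰ_max dψ/dξ, this expresses the surface field as an explicit function of the surface potential ψ_0 = eφ(0)/T. The numerical part repeats the example above (n_A = 10^18 cm^−3, κ = 12, T_K = 300 K); the function name is illustrative.

```python
import numpy as np
from scipy.constants import epsilon_0, e, k

kappa, n_A, T_K = 12.0, 1e18 * 1e6, 300.0      # the example above; n_A converted to m^-3
T = k * T_K
lam_D = np.sqrt(kappa * epsilon_0 * T / (e**2 * n_A))   # Eq. (6.73)
E_max = T / (e * lam_D)                                  # Eq. (6.76), in V/m
print(f"lambda_D = {lam_D * 1e9:.2f} nm, E_max = {E_max / 1e5:.0f} kV/cm")

def surface_field_ratio(psi0):
    """E(0)/E_max versus psi0 = e*phi(0)/T, from the first integral of Eq. (6.72):
    (dpsi/dxi)^2 / 2 = psi + exp(-psi) - 1, with xi = x/lambda_D and psi -> 0 in the bulk."""
    psi0 = np.asarray(psi0, dtype=float)
    # expm1 avoids cancellation for small psi0; the sign follows E = -E_max * dpsi/dxi.
    return np.sign(psi0) * np.sqrt(2.0 * (psi0 + np.expm1(-psi0)))

for psi0 in (0.01, 1.0, 10.0, 43.0, -5.0):
    print(f"psi0 = {psi0:6.2f}: E(0)/E_max = {float(surface_field_ratio(psi0)):8.3f}")
```

For |ψ_0| ≪ 1 the ratio reduces to ψ_0, reproducing the linear result (75). For ψ_0 ≫ 1 (field pointing into the sample) it grows only as (2ψ_0)^{1/2}, and at ψ_0 = Δ/T it reproduces, up to the small “−1” correction, the value 2Δ/(ew) of Eq. (80). For ψ_0 < 0 it grows exponentially, reflecting the hole accumulation discussed below.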


The semi-quantitative answer is relatively simple if the field is directed out of the p-doped semiconductor (in our nomenclature, ℰ < 0; see Fig. 8b). As the valence band bends up by a few T, the local hole concentration p(x), and hence the charge density ρ(x), grow exponentially; see Eq. (71). Hence the effective local length of the nonlinear field’s penetration, λ_ef(x) ∝ p^{−1/2}(x), shrinks exponentially. A detailed analysis of this effect using Eq. (72) does not make much sense, because as soon as λ_ef(0) decreases to ~a, the macroscopic Poisson equation (68) is no longer valid quantitatively. For typical semiconductors, this happens at the field that raises the edge ε_V − eφ(0) of the bent valence band at the sample’s surface above the Fermi level μ′. In this case, the valence-band electrons near the surface form a degenerate Fermi gas with an “open” Fermi surface, essentially a metal, which has a very small (atomic-size) Thomas-Fermi screening length:49


$$ \lambda_{\rm ef}(0) \sim \lambda_{\rm TF} = \left(\frac{\varepsilon_0}{e^2 g_3(\varepsilon_F)}\right)^{1/2}. \qquad (6.77) $$

The effects taking place at the opposite polarity of the field, ℰ > 0, are much more interesting, and more useful for applications. Indeed, in this case, the band bending down leads to an exponential decrease of p(x) as soon as the valence band edge ε_V − eφ(x) drops by just a few T below its unperturbed value ε_V. If the applied field is large enough, ℰ > ℰ_max (as it is in the situation shown in Fig. 8c), it forms, to the left of such a point x_0, the so-called depletion layer, of a certain width w. Within this layer, not only the electron density n, but the hole density p as well, are negligible, so that the only substantial contribution to the charge density ρ is given by the fully ionized acceptors: ρ ≈ −en_− ≈ −en_A, and Eq. (72) becomes very simple:

$$ \frac{d^2\phi}{dx^2} = \frac{e n_A}{\kappa\varepsilon_0} = \text{const}, \qquad \text{for } x_0 - w < x < x_0. \qquad (6.78) $$
dx” KE 


Let us use this equation to calculate the largest possible width w of the depletion layer, and the critical value ℰ_c of the applied field necessary for this. (By definition, at ℰ = ℰ_c, the left boundary of the layer, where ε_C − eφ(x) = ε_V, i.e. eφ(x) = ε_C − ε_V ≡ Δ, just touches the semiconductor surface: x_0 − w = 0, i.e. x_0 = w. Figure 8c shows the case when ℰ is slightly larger than ℰ_c.) For this, Eq. (78) has to be solved with the following boundary conditions:

$$ \phi(0) = \frac{\Delta}{e}, \qquad \frac{d\phi}{dx}(0) = -\mathcal{E}_c, \qquad \phi(w) = 0, \qquad \frac{d\phi}{dx}(w) = 0. \qquad (6.79) $$


48 Even some amorphous thin-film insulators, such as properly grown silicon and aluminum oxides, can withstand 
fields up to ~10 MV/cm. 
49 As a reminder, the derivation of this formula was the task of Problem 3.14. 
Note that the first of these conditions is strictly valid only if T ≪ Δ, i.e. under the assumption we have made from the very beginning, while the last two conditions are asymptotically correct only if λ_D ≪ w, an assumption we should not forget to check after the solution.


After all the undergraduate experience with projectile motion problems, the reader certainly knows by heart that the solution of Eq. (78) is a quadratic parabola, so let me immediately write its final form satisfying the boundary conditions (79):

$$ \phi(x) = \frac{e n_A}{\kappa\varepsilon_0}\frac{(w - x)^2}{2}, \qquad \text{with } w = \left(\frac{2\kappa\varepsilon_0\Delta}{e^2 n_A}\right)^{1/2}, \qquad \text{at } \mathcal{E}_c = \frac{2\Delta}{ew}. \qquad (6.80) $$

Comparing the result for w with Eq. (73), we see that if our basic condition T ≪ Δ is fulfilled, then λ_D ≪ w, confirming the qualitative validity of the whole solution (80). For the same particular parameters as in the example before (n_A = 10^18 cm^−3, κ = 10), and Δ = 1 eV, Eqs. (80) give w ≈ 40 nm and ℰ_c ≈ 600 kV/cm, still a practicable field. (As Fig. 8c shows, to create it, we need a gate voltage only slightly larger than Δ/e, i.e. close to 1 V for typical semiconductors.)
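A short evaluation of Eq. (80) with the parameters quoted just above (n_A = 10^18 cm^−3, κ = 10, Δ = 1 eV), using SciPy’s constants. The sketch gives w ≈ 33 nm and ℰ_c ≈ 600 kV/cm, the same order of magnitude as the rounded estimates in the text.

```python
import math
from scipy.constants import epsilon_0, e

def depletion_layer(n_A_cm3, kappa, gap_eV):
    """Eq. (6.80): maximal depletion width w (m) and critical field E_c = 2*Delta/(e*w) (V/m)."""
    n_A = n_A_cm3 * 1e6                 # cm^-3 -> m^-3
    delta = gap_eV * e                  # bandgap in joules
    w = math.sqrt(2 * kappa * epsilon_0 * delta / (e**2 * n_A))
    E_c = 2 * delta / (e * w)
    return w, E_c

w, E_c = depletion_layer(1e18, kappa=10.0, gap_eV=1.0)
print(f"w = {w * 1e9:.0f} nm, E_c = {E_c / 1e5:.0f} kV/cm")
```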


Figure 8c also shows that if the applied field exceeds this critical value, near the surface of the semiconductor the conduction band edge drops below the Fermi level. This is the so-called inversion layer, in which electrons with energies below μ′ form a highly conductive degenerate Fermi gas. However, typical rates of electron tunneling from the bulk through the depletion layer are very low, so that after the inversion layer has been created (say, by the gate voltage application), it may be populated only from another source; hence the hatched blue points in Fig. 8c. This is exactly the fact used in the workhorse device of semiconductor integrated circuits, the field-effect transistor (FET); see Fig. 9.50


Fig. 6.9. Two main species of the n-FET: (a) the bulk FET, and (b) the FinFET. While on panel (a), the current flow from the source to the drain is parallel to the plane of the drawing, on panel (b) it is normal to the plane, with the n-doped source and drain contacting the thin “fin” from two sides off this plane.


In the “bulk” variety of this structure (Fig. 9a), a gate electrode overlaps a gap between two similar highly n-doped regions near the surface, called source and drain, formed by n-doping inside a p-doped semiconductor. It is more or less obvious (and will be shown in a moment) that in the absence of gate voltage, the electrons cannot pass through the p-doped region, so that virtually no current flows between the source and the drain, even if a modest voltage is applied between these electrodes. However, if the gate voltage is positive and large enough to induce the electric field ℰ > ℰ_c at the surface of the p-doped semiconductor, it creates the inversion layer as shown in Fig. 8c, and the electron current


50 This device was invented (by Julius E. Lilienfeld) in 1930, but demonstrated only in the mid-1950s. 
between the source and drain electrodes may readily flow through this surface channel. (Very unfortunately, in this course I do not have time/space for a detailed analysis of transport properties of this keystone electron device, and have to refer the reader to special literature.51)


Fig. 9a makes it obvious that another major (and virtually unavoidable) structure of semiconductor integrated circuits is the famous p-n junction, an interface between p- and n-doped regions. Let us analyze its simple model, in which the interface is in the plane x = 0, and the doping profiles n_D(x) and n_A(x) are step-like, making an abrupt jump at the interface:

$$ n_A(x) = \begin{cases} \text{const}, & \text{at } x < 0, \\ 0, & \text{at } x > 0, \end{cases} \qquad n_D(x) = \begin{cases} 0, & \text{at } x < 0, \\ \text{const}, & \text{at } x > 0. \end{cases} \qquad (6.81) $$


(This model is very reasonable for modern integrated circuits, where the doping is performed by implantation, using high-energy ion beams.)


To start with, let us assume that no voltage is applied between the p- and n-regions, so that the system may be in thermodynamic equilibrium. In equilibrium, the Fermi level μ′ should be flat throughout the structure, and at x → −∞ and x → +∞, where ℰ → 0, the level structure has to approach the positions shown, respectively, on panels (b) and (a) of Fig. 7. In addition, the distribution of the electric potential φ(x), shifting the level structure vertically by −eφ(x), has to be continuous to avoid unphysical infinite electric fields. With that, we inevitably arrive at the band-edge diagram that is (schematically) shown in Fig. 10.


Fig. 6.10. The band-edge diagram of a p-n junction in thermodynamic equilibrium (T = const, μ′ = const). The notation is the same as in Figs. 7 and 8.


The diagram shows that the contact of differently doped semiconductors gives rise to a built-in electric potential difference $\Delta\phi$, equal to the difference of their values of $\mu$ in the absence of the contact – see Eqs. (65) and (67):

$$e\Delta\phi \equiv e\phi(+\infty) - e\phi(-\infty) = \mu_n - \mu_p = \Delta - T\ln\frac{n_C n_V}{n_D n_A}, \tag{6.82}$$


which is usually just slightly smaller than the bandgap.⁵² (Qualitatively, this is the same contact potential difference that was discussed, for the case of metals, in Sec. 3 – see Fig. 5.) The arising


51 The classical monograph in this field is S. Sze, Physics of Semiconductor Devices, 2nd ed., Wiley, 1981. (The 3rd edition, circa 2006, co-authored with K. Ng, is more tilted toward technical details.) I can also recommend a detailed textbook by R. Pierret, Semiconductor Device Fundamentals, 2nd ed., Addison Wesley, 1996.
internal electrostatic field $\mathcal{E} = -d\phi/dx$ induces, in both semiconductors, depletion layers similar to that induced by an external field (Fig. 8c). Their widths $w_p$ and $w_n$ may also be calculated similarly, by solving the following boundary problem of electrostatics, mostly similar to that given by Eqs. (78)-(79):

$$\frac{d^2\phi}{dx^2} = \frac{e}{\kappa\varepsilon_0}\times\begin{cases} n_A, & \text{for } -w_p < x < 0,\\ -n_D, & \text{for } 0 < x < +w_n, \end{cases} \tag{6.83}$$

$$\phi(+w_n) = \phi(-w_p) + \Delta\phi, \quad \frac{d\phi}{dx}(+w_n) = \frac{d\phi}{dx}(-w_p) = 0, \quad \phi(-0) = \phi(+0), \quad \frac{d\phi}{dx}(-0) = \frac{d\phi}{dx}(+0), \tag{6.84}$$

also exact only in the limit $T \ll \Delta$, $n_i \ll n_D, n_A$. Its (easy) solution gives a result similar to Eq. (80):
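$$\phi(x) = \phi(-w_p) + \begin{cases} e n_A (w_p + x)^2/2\kappa\varepsilon_0, & \text{for } -w_p < x < 0,\\ \Delta\phi - e n_D (w_n - x)^2/2\kappa\varepsilon_0, & \text{for } 0 < x < +w_n, \end{cases} \tag{6.85}$$

with expressions for $w_p$ and $w_n$ giving the following formula for the full depletion layer width:

$$w \equiv w_p + w_n = \left(\frac{2\kappa\varepsilon_0\Delta\phi}{e\,n_{\text{ef}}}\right)^{1/2}, \quad \text{with } n_{\text{ef}} \equiv \frac{n_A n_D}{n_A + n_D}, \text{ i.e. } \frac{1}{n_{\text{ef}}} = \frac{1}{n_A} + \frac{1}{n_D}. \tag{6.86}$$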


This expression is similar to that given by Eq. (80), so that for typical highly doped semiconductors ($n_{\text{ef}} \sim 10^{18}\ \text{cm}^{-3}$) it gives for $w$ a similar estimate of a few tens nm.⁵³ Returning to Fig. 9a, we see that this scale imposes an essential limit on the reduction of bulk FETs (whose scaling down is at the heart of the well-known Moore's law),⁵⁴ explaining why such high doping is necessary. In the early 2010s, the problems with implementing even higher doping, plus issues with dissipated power management, motivated the transition of advanced silicon integrated circuit technology from the bulk FETs to the FinFET (also called “double-gate”, “tri-gate”, or “wrap-around-gate”) variety of these devices, schematically shown in Fig. 9b, despite their essentially 3D structure and hence a more complex fabrication technology. In FinFETs, the role of p-n junctions is reduced, but these structures remain an important feature of semiconductor integrated circuits.
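As a rough numerical check of Eqs. (82) and (86), here is a minimal Python sketch. It assumes textbook-typical silicon parameters at 300 K ($\Delta \approx 1.12$ eV, $n_C \approx 2.8\times10^{19}\ \text{cm}^{-3}$, $n_V \approx 1.04\times10^{19}\ \text{cm}^{-3}$, $\kappa \approx 11.7$), which are not given in this text, and symmetric doping $n_A = n_D = 10^{18}\ \text{cm}^{-3}$; the helper names `builtin_potential` and `depletion_widths` are illustrative only.

```python
import math

# Physical constants (SI units)
E_CHARGE = 1.602176634e-19   # elementary charge e, C
EPS0 = 8.8541878128e-12      # electric constant epsilon_0, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K


def builtin_potential(delta_ev, n_c, n_v, n_d, n_a, temp_k):
    """Eq. (82): e*dphi = Delta - T ln(n_C n_V / (n_D n_A)); returns dphi in volts."""
    t_ev = K_B * temp_k / E_CHARGE          # temperature in eV
    return delta_ev - t_ev * math.log(n_c * n_v / (n_d * n_a))


def depletion_widths(dphi, n_d, n_a, kappa):
    """Eq. (86) plus field continuity n_A w_p = n_D w_n (densities in m^-3)."""
    n_ef = n_a * n_d / (n_a + n_d)
    w = math.sqrt(2 * kappa * EPS0 * dphi / (E_CHARGE * n_ef))
    w_p = w * n_d / (n_a + n_d)             # width on the p-side (x < 0)
    w_n = w * n_a / (n_a + n_d)             # width on the n-side (x > 0)
    return w, w_p, w_n


# Assumed, typical silicon values at 300 K (not taken from the text)
DELTA_EV, N_C, N_V, KAPPA_SI = 1.12, 2.8e25, 1.04e25, 11.7
n_d = n_a = 1e24                            # 10^18 cm^-3 on each side

dphi = builtin_potential(DELTA_EV, N_C, N_V, n_d, n_a, 300.0)
w, w_p, w_n = depletion_widths(dphi, n_d, n_a, KAPPA_SI)
print(f"built-in potential: {dphi:.3f} V")
print(f"depletion width: {w * 1e9:.1f} nm (w_p = {w_p * 1e9:.1f} nm, w_n = {w_n * 1e9:.1f} nm)")
```

With these assumed values the sketch gives $e\Delta\phi$ of roughly 0.97 eV, slightly below the bandgap, and $w$ of about 50 nm, in line with the "few tens nm" estimate. For strongly asymmetric doping, almost all of $w$ falls on the more lightly doped side, because $n_A w_p = n_D w_n$.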


Now let us have a look at the p-n junction in equilibrium from the point of view of Eq. (52). In the simple model we are considering now (in particular, at $T \ll \Delta$), this equation is applicable separately to the electron and hole subsystems, because in this model the gases of these charge carriers are classical in all parts of the system, and the generation-recombination processes⁵⁵ coupling these subsystems have relatively small rates – see below. Hence, for the electron subsystem, we may rewrite Eq. (52) as


52 Frequently, Eq. (82) is also rewritten in the form $e\Delta\phi = T\ln(n_D n_A/n_i^2)$. In view of the second of Eqs. (60), this equality is formally correct but may be misleading, because the intrinsic carrier density $n_i$ is an exponential function of temperature and is physically irrelevant for this particular problem.

53 Note that such $w$ is again much larger than $\lambda_D$ – the fact that justifies the first two boundary conditions (84).

54 Another important limit is quantum-mechanical tunneling through the gate insulator, whose thickness has to be 
scaled down in parallel with lateral dimensions of a FET, including its channel length. 

55 In the semiconductor physics lingo, the “carrier generation” event is the thermal excitation of an electron from 
the valence band to the conduction band, leaving a hole behind, while the reciprocal event of filling such a hole 
by a conduction-band electron is called the “carrier recombination”. 
$$j_n = \mu_n n q\mathcal{E} - D_n\frac{\partial n}{\partial x}, \tag{6.87}$$

where $q = -e$. Let us discuss how each term on the right-hand side of this equality depends on the system's parameters. Because of the n-doping at $x > 0$, there are many more electrons in this part of the system. According to the Boltzmann distribution (58), some number of them,

$$n' \propto \exp\left\{-\frac{e\Delta\phi}{T}\right\}, \tag{6.88}$$
have energies above the conduction band edge in the p-doped part (see Fig. 11a) and try to diffuse into this part through the depletion layer; this diffusion flow of electrons from the n-side to the p-side of the structure (in Fig. 11, from the right to the left) is described by the second term on the right-hand side of Eq. (87). On the other hand, the built-in electric field $\mathcal{E} = -\partial\phi/\partial x$ inside the depletion layer, directed as Fig. 11a shows, exerts on the electrons the force $\mathbf{F} = q\mathcal{E} = -e\mathcal{E}$, pushing them in the opposite direction (from p to n); this drift flow is described by the first term on the right-hand side of Eq. (87).⁵⁶
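The statement that, in equilibrium, the drift and diffusion terms of Eq. (87) cancel can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses dimensionless units ($e = T = 1$), a smooth tanh-shaped stand-in for $\phi(x)$ rather than the parabolic profile (85), the Boltzmann density $n \propto \exp\{-q\phi/T\}$, and the Einstein relation $D_n = \mu_n T$ between the diffusion coefficient and the mobility; all names in it are hypothetical.

```python
import math

# Dimensionless units (a hypothetical choice): e = T = mobility = 1
E, T, MOBILITY = 1.0, 1.0, 1.0
D = MOBILITY * T      # Einstein relation D = mu*T, assumed to hold here
Q = -E                # electron charge q = -e
DPHI = 20.0           # built-in potential difference, in units of T/e (hypothetical)


def phi(x):
    """Smooth stand-in for the built-in potential profile (hypothetical shape)."""
    return 0.5 * DPHI * (1.0 + math.tanh(x))


def n(x):
    """Boltzmann electron density, n ~ exp(-q*phi/T), normalized to 1 deep in the n-side."""
    return math.exp(-Q * (phi(x) - DPHI) / T)


def flows(x, h=1e-4):
    """Drift and diffusion terms of Eq. (87), with derivatives by central differences."""
    field = -(phi(x + h) - phi(x - h)) / (2 * h)          # E = -dphi/dx
    drift = MOBILITY * n(x) * Q * field
    diffusion = D * (n(x + h) - n(x - h)) / (2 * h)
    return drift, diffusion


worst = 0.0
for k in range(-300, 301):
    drift, diffusion = flows(k / 100)
    worst = max(worst, abs(drift - diffusion) / abs(drift))
print(f"max |j_n| / |drift term| on -3 < x < 3: {worst:.1e}")
```

Each term separately is nonzero wherever the field is, while their difference stays at the level of the finite-difference error.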


Fig. 6.11. Electrons in the conduction band of a p-n junction at: (a) $\mathcal{V} = 0$, and (b) $\mathcal{V} > 0$. For clarity, other charges (of the holes and all ionized dopant atoms) are not shown.


The explicit calculation of these two flows⁵⁷ shows, unsurprisingly, that in equilibrium they are exactly equal and opposite, so that $j_n = 0$, and such an analysis does not give us any new information. However, the picture of two electron counter-flows, given by Eq. (87), enables us to predict the functional dependence of $j_n$ on a modest external voltage $\mathcal{V}$, with $|\mathcal{V}| < \Delta\phi$, applied to the junction. Indeed, since the doped semiconductor regions outside the depletion layer are much more conductive


56 Note that if an external photon with energy $\hbar\omega > \Delta$ generates an electron-hole pair somewhere inside the depletion layer, this electric field immediately drives its electron component to the right, and the hole component to the left, thus generating a pulse of electric current through the junction. This is the physical basis of the whole vast technological field of photovoltaics, currently strongly driven by the demand for renewable electric power. Due to the progress of this technology, the cost of solar power systems has dropped from ~\$300 per watt in the mid-1950s to the current ~\$1 per watt, and its global generation has increased to almost $10^{15}$ watt-hours per year – though this is still below 2% of the whole generated electric power.

57 I will not try to reproduce this calculation (which may be found in any of the semiconductor physics books mentioned above), because getting all its scaling factors right requires using some model of the recombination process, and in this course there is just no time for its quantitative discussion. However, see Eq. (93) below.
than it, virtually all applied voltage (i.e. the difference of values of the electrochemical potential $\mu'$) drops across this layer, changing the total band edge shift – see Fig. 11b:⁵⁸

$$e\Delta\phi \to e\Delta\phi + \Delta\mu' = e\Delta\phi + q\mathcal{V} = e(\Delta\phi - \mathcal{V}). \tag{6.89}$$


This change results in an exponential change of the number of electrons able to diffuse into the p-side of the junction – cf. Eq. (88):

$$n'(\mathcal{V}) = n'(0)\exp\left\{\frac{e\mathcal{V}}{T}\right\}, \tag{6.90}$$

and hence in a proportional change of the diffusion flow $j_n$ of electrons from the n-side to the p-side of the system, i.e. of the oppositely directed density of the electron current $j_e = -e j_n$ – see Fig. 11b.


On the other hand, the drift counter-flow of electrons is not altered much by the applied voltage: though it does change the electrostatic field $\mathcal{E} = -\nabla\phi$ inside the depletion layer, and also the depletion layer width,⁵⁹ these changes are incremental, not exponential. As a result, the net density of the current carried by electrons may be approximately expressed as

$$j_e(\mathcal{V}) = j_{\text{diffusion}} + j_{\text{drift}} \approx j_e(0)\exp\left\{\frac{e\mathcal{V}}{T}\right\} - \text{const}. \tag{6.91a}$$


As was discussed above, at $\mathcal{V} = 0$ the net current has to vanish, so that the constant in Eq. (91a) has to equal $j_e(0)$, and we may rewrite this equality as

$$j_e(\mathcal{V}) = j_e(0)\left(\exp\left\{\frac{e\mathcal{V}}{T}\right\} - 1\right). \tag{6.91b}$$


Now repeating this analysis for the current $j_h$ of the holes (an exercise highly recommended to the reader), we get a similar expression, with the same sign before $e\mathcal{V}$,⁶⁰ though with a different scaling factor, $j_h(0)$ instead of $j_e(0)$. As a result, the total electric current density obeys the famous Shockley law

$$j(\mathcal{V}) \equiv j_e(\mathcal{V}) + j_h(\mathcal{V}) = j(0)\left(\exp\left\{\frac{e\mathcal{V}}{T}\right\} - 1\right), \quad \text{with } j(0) = j_e(0) + j_h(0), \tag{6.92}$$

describing the main p-n junction's property as an electric diode – a two-terminal device passing the current more “readily” in one direction (from the p- to the n-terminal) than in the opposite one.⁶¹
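The rectifying character of the Shockley law (92) is easy to see numerically. This minimal sketch expresses the current in units of the (model-dependent) scaling factor $j(0)$, assumes a temperature of 300 K, and uses `math.expm1` for accuracy near $\mathcal{V} = 0$; the function name is hypothetical.

```python
import math

K_B_OVER_E = 8.617333262e-5   # Boltzmann constant divided by e, V/K


def shockley(v, temp_k=300.0):
    """Eq. (92) normalized to the scaling factor: j(V)/j(0) = exp(eV/T) - 1.

    Only meaningful for modest voltages, |V| < dphi, as assumed in the derivation.
    """
    return math.expm1(v / (K_B_OVER_E * temp_k))


for v in (-0.3, -0.1, 0.0, 0.1, 0.3):
    print(f"V = {v:+.1f} V  ->  j/j(0) = {shockley(v):+.3e}")

rectification = shockley(0.3) / abs(shockley(-0.3))
```

At $\mathcal{V} = \pm 0.3$ V the forward and reverse currents differ by about five orders of magnitude, while the reverse current saturates at $-j(0)$.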


58 In our model, the positive sign of $\mathcal{V} \equiv \Delta\mu'/q = -\Delta\mu'/e$ corresponds to the additional electric field, $-\nabla\mu'/q = \nabla\mu'/e$, directed in the positive direction of the x-axis (in Fig. 11, from the left to the right), i.e. to the positive terminal of the voltage source connected to the p-doped semiconductor – which is the common convention.

59 This change, schematically shown in Fig. 11b, may be readily calculated by making the replacement (89) in the 
first of Eqs. (86). 

60 This sign invariance may look strange, due to the opposite (positive) electric charge of the holes. However, this 
difference in the charge sign is compensated by the opposite direction of the hole diffusion — see Fig. 10. (Note 
also that the actual charge carriers in the valence band are still electrons, and the positive charge of holes is just a 
convenient representation of the specific dispersion law in this energy band, with a negative effective mass — see 
Fig. 6, the second line of Eq. (53), and a more detailed discussion of this issue in QM Sec. 2.8.) 
Besides numerous practical applications in electrical and electronic engineering, such diodes have very interesting statistical properties, in particular performing very non-trivial transformations of the spectra of deterministic and random signals. Very unfortunately, I do not have time for their discussion and have to refer the interested reader to the special literature.⁶²


Still, before proceeding to our next (and last!) topic, let me give, for the reader's reference and without proof, the expression for the scaling factor $j(0)$ in Eq. (92), which follows from a simple but broadly used model of the recombination process:

$$j(0) = e n_i^2\left(\frac{D_e}{l_e n_A} + \frac{D_h}{l_h n_D}\right). \tag{6.93}$$


Here $l_e$ and $l_h$ are the characteristic lengths of diffusion of electrons and holes before their recombination, which may be expressed by Eq. (5.113), $l_e = (2D_e\tau_e)^{1/2}$ and $l_h = (2D_h\tau_h)^{1/2}$, with $\tau_e$ and $\tau_h$ being the characteristic times of recombination of the so-called minority carriers – of electrons in the p-doped part, and of holes in the n-doped part of the structure. Since the recombination is an inelastic process, its times are typically rather long – of the order of $10^{-7}$ s, i.e. much longer than the typical times of elastic scattering of the same carriers, which define their diffusion coefficients – see Eq. (51).


6.5. Heat transfer and thermoelectric effects 


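Now let us return to our analysis of kinetic effects using the Boltzmann-RTA equation, and extend it even further, to the effects of a non-zero (albeit small) temperature gradient. Again, since for any of the statistics (20), the average occupancy $\langle N(\varepsilon)\rangle$ is a function of just one combination of all its arguments, $\xi = (\varepsilon - \mu)/T$, its partial derivatives obey not only Eq. (37), but also the following relation:

$$\frac{\partial\langle N(\varepsilon)\rangle}{\partial T} = -\frac{\varepsilon - \mu}{T}\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon} = \frac{\varepsilon - \mu}{T}\frac{\partial\langle N(\varepsilon)\rangle}{\partial\mu}. \tag{6.94}$$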
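As a result, Eq. (38) is generalized as

$$\nabla w_0 = -\frac{\partial w_0}{\partial\varepsilon}\left(\nabla\mu + \frac{\varepsilon - \mu}{T}\nabla T\right), \tag{6.95}$$

giving the following generalization of Eq. (39):

$$\tilde{w} = \tau\frac{\partial w_0}{\partial\varepsilon}\,\mathbf{v}\cdot\left(\nabla\mu' + \frac{\varepsilon - \mu}{T}\nabla T\right). \tag{6.96}$$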
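Relation (94) rests only on the fact that $\langle N\rangle$ depends on $\varepsilon$, $\mu$, and $T$ through the single combination $\xi = (\varepsilon - \mu)/T$. A short finite-difference check, written as a sketch with hypothetical function names, confirms it for the Fermi-Dirac, Bose-Einstein, and Boltzmann occupancies (the Bose-Einstein case requires $\xi > 0$).

```python
import math

# <N> as functions of xi = (eps - mu)/T only, cf. the statistics (20)
OCCUPANCIES = {
    "Fermi-Dirac": lambda xi: 1.0 / (math.exp(xi) + 1.0),
    "Bose-Einstein": lambda xi: 1.0 / (math.exp(xi) - 1.0),   # requires xi > 0
    "Boltzmann": lambda xi: math.exp(-xi),
}


def eq94_sides(occ, eps, mu, temp, h=1e-6):
    """Return (d<N>/dT, -xi d<N>/d(eps), xi d<N>/d(mu)) via central differences."""
    def n_of(e, m, t):
        return occ((e - m) / t)

    d_temp = (n_of(eps, mu, temp + h) - n_of(eps, mu, temp - h)) / (2 * h)
    d_eps = (n_of(eps + h, mu, temp) - n_of(eps - h, mu, temp)) / (2 * h)
    d_mu = (n_of(eps, mu + h, temp) - n_of(eps, mu - h, temp)) / (2 * h)
    xi = (eps - mu) / temp
    return d_temp, -xi * d_eps, xi * d_mu


for name, occ in OCCUPANCIES.items():
    print(name, eq94_sides(occ, eps=1.3, mu=1.0, temp=0.2))
```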


Now, calculating current density as in Sec. 3, we get the result that is traditionally represented as 


$$\mathbf{j} = \sigma\left(-\frac{\nabla\mu'}{q}\right) + \sigma S(-\nabla T), \tag{6.97}$$


61 Some metal-semiconductor junctions, called Schottky diodes, have similar rectifying properties (and may be 
better fitted for high-power applications than silicon p-n junctions), but their properties are more complex because 
of the rather involved chemistry and physics of interfaces between different materials. 

62 See, e.g., the monograph by R. Stratonovich cited in Sec. 4.2. 
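where the constant $S$, called the Seebeck coefficient⁶³ (or the “thermoelectric power”, or just “thermopower”), is given by the following relation:

$$S = \frac{gq\tau}{\sigma}\frac{4\pi}{(2\pi\hbar)^3}\frac{1}{3}\int_0^\infty (8m\varepsilon^3)^{1/2}\,\frac{\varepsilon - \mu}{T}\left[-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right]d\varepsilon. \tag{6.98}$$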


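Working out this integral for the most important case of a degenerate Fermi gas, with $T \ll \varepsilon_F$, we have to be careful, because the center of the sharp peak of the last factor under the integral coincides with the zero point of the previous factor, $(\varepsilon - \mu)/T$. This uncertainty may be resolved using the Sommerfeld expansion formula (3.59). Indeed, for a smooth function $f(\varepsilon)$ obeying Eq. (3.60), so that $f(0) = 0$, we may use Eq. (3.61) to rewrite Eq. (3.59) as

$$\int_0^\infty f(\varepsilon)\left[-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right]d\varepsilon \approx f(\mu) + \frac{\pi^2}{6}T^2\left.\frac{d^2f}{d\varepsilon^2}\right|_{\varepsilon=\mu}. \tag{6.99}$$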


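In particular, for working out the integral (98), we may take $f(\varepsilon) = (8m\varepsilon^3)^{1/2}(\varepsilon - \mu)/T$. (For this function, the condition $f(0) = 0$ is evidently satisfied.) Then $f(\mu) = 0$, $d^2f/d\varepsilon^2|_{\varepsilon=\mu} = 3(8m\mu)^{1/2}/T \approx 3(8m\varepsilon_F)^{1/2}/T$, and Eq. (98) yields

$$S = \frac{gq\tau}{\sigma}\frac{4\pi}{(2\pi\hbar)^3}\frac{1}{3}\,\frac{\pi^2T^2}{6}\,\frac{3(8m\varepsilon_F)^{1/2}}{T}. \tag{6.100}$$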


Comparing the result with Eqs. (3.54) and (32), for the constant $S$ we get a simple expression independent of $\tau$:⁶⁴

$$S = \frac{\pi^2}{2q}\frac{T}{\varepsilon_F} \equiv \frac{c_V}{q}, \quad \text{for } T \ll \varepsilon_F, \tag{6.101}$$

where $c_V \equiv C_V/N$ is the heat capacity of the gas per particle, in this case given by Eq. (3.70).
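The Sommerfeld step (99)-(100) can be verified by direct numerical integration. The sketch below is an illustration, not the author's method: it works in units where $\mu = 1$, takes $T = 0.01$ (so that $T \ll \mu$), drops the constant factor $(8m)^{1/2}$, approximates $\mu$ by $\varepsilon_F$ as in Eq. (100), and compares the trapezoidal value of the integral in Eq. (98) with $(\pi^2T^2/6)\,d^2f/d\varepsilon^2|_{\varepsilon=\mu}$. Function names are hypothetical.

```python
import math


def minus_dn_deps(eps, mu, temp):
    """-d<N>/d(eps) for the Fermi-Dirac occupancy: <N>(1 - <N>)/T."""
    occ = 1.0 / (math.exp((eps - mu) / temp) + 1.0)
    return occ * (1.0 - occ) / temp


def seebeck_integral(mu=1.0, temp=0.01, steps=200_000):
    """Trapezoidal evaluation of the integral in Eq. (98), without constant prefactors.

    Uses f(eps) = eps^(3/2) (eps - mu)/T, i.e. (8 m eps^3)^(1/2)(eps - mu)/T with
    the constant (8m)^(1/2) dropped; energies are in units of mu.
    """
    upper = mu + 60.0 * temp          # -d<N>/d(eps) ~ exp(-60) beyond this point
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        eps = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * eps**1.5 * (eps - mu) / temp * minus_dn_deps(eps, mu, temp)
    return total * h


def sommerfeld_estimate(mu=1.0, temp=0.01):
    """Eq. (99) with f(mu) = 0 and f''(mu) = 3 mu^(1/2)/T: (pi^2 T^2/6) f''(mu)."""
    return math.pi**2 * temp**2 / 6 * 3 * math.sqrt(mu) / temp


numeric = seebeck_integral()
estimate = sommerfeld_estimate()
print(f"numerical: {numeric:.6e}, Sommerfeld: {estimate:.6e}, ratio: {numeric / estimate:.6f}")
```

The two numbers agree to better than 0.1%; the residual discrepancy comes from the next, fourth-order term of the Sommerfeld expansion.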


In order to understand the physical meaning of the Seebeck coefficient, it is sufficient to consider 
a conductor carrying no current. For this case, Eq. (97) yields 


$$\nabla\left(\frac{\mu'}{q} + ST\right) = 0. \tag{6.102}$$


So, under these conditions, a temperature gradient creates a proportional gradient of the electrochemical potential $\mu'$, and hence of the effective electric field $\mathcal{E}$ defined by Eq. (42). This is the Seebeck effect. Figure 12 shows the standard way of its measurement, using an ordinary (electrodynamic) voltmeter that measures the difference of $\mu'/e$ at its terminals, and a pair of junctions (in this context, called the thermocouple) of two materials with different coefficients $S$:


63 Named after Thomas Johann Seebeck who experimentally discovered, in 1822, the effect described by the 
second term in Eq. (97) — and hence by Eq. (103). 

64 Again, such independence hints that Eq. (101) has a broader validity than in our simple model of an isotropic gas. This is indeed the case: this result turns out to be valid for any form of the Fermi surface, and for any dispersion law $\varepsilon(\mathbf{p})$. Note, however, that all calculations of this section are valid for the simplest RTA model, in which $\tau$ is an energy-independent parameter; for real metals, a more accurate description of experimental results may be obtained by tweaking this model to take this dependence into account – see, e.g., Chapter 13 in the monograph by N. Ashcroft and N. D. Mermin, cited in Sec. 3.5.
Fig. 6.12. The Seebeck effect in a thermocouple.


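Integrating Eq. (102) around the loop from point A to point B, and neglecting the temperature drop across the voltmeter, we get the following simple expression for the thermally-induced difference of the electrochemical potential, usually called either the thermoelectric power or the “thermo e.m.f.”:

$$\begin{aligned} \mathcal{V} \equiv \frac{\mu'(B) - \mu'(A)}{q} &= \int_A^B \nabla\frac{\mu'}{q}\cdot d\mathbf{r} = -\int_A^B S\,\nabla T\cdot d\mathbf{r} = -S_1\left(\int_A^{A'} + \int_{A''}^{B}\right)\nabla T\cdot d\mathbf{r} - S_2\int_{A'}^{A''}\nabla T\cdot d\mathbf{r}\\ &= -S_1(T' - T'') - S_2(T'' - T') \equiv (S_1 - S_2)(T'' - T'). \end{aligned} \tag{6.103}$$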


(Note that according to Eq. (103), any attempt to measure such voltage across any two points of a 
uniform conductor would give results depending on the voltmeter wire materials, due to an unintentional 
gradient of temperature in them.) 
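Eq. (103) says that only the junction temperatures $T'$ and $T''$ matter, not the temperature distribution along the wires. The sketch below discretizes $-\int S\nabla T\cdot d\mathbf{r}$ around the loop A → A' → A'' → B with deliberately wiggly, hypothetical temperature profiles and hypothetical coefficients $S_1$ and $S_2$ (chosen so that $S_1 - S_2 = 70\ \mu$V/K); the voltmeter terminals A and B are taken at the same room temperature, as in the derivation. Function names are illustrative.

```python
import math


def leg_profile(t_start, t_end, wiggle, points=1001):
    """Hypothetical, deliberately non-linear temperature distribution along one wire."""
    temps = []
    for k in range(points):
        u = k / (points - 1)
        temps.append(t_start + (t_end - t_start) * u
                     + wiggle * math.sin(math.pi * u) * math.sin(5 * math.pi * u))
    return temps


def loop_voltage(legs):
    """Discretized -integral of S grad(T) . dr along the loop A -> A' -> A'' -> B, cf. Eq. (103).

    legs: sequence of (S, temperatures sampled along that leg).
    """
    v = 0.0
    for s_coeff, temps in legs:
        for t_a, t_b in zip(temps, temps[1:]):
            v -= s_coeff * (t_b - t_a)
    return v


# Hypothetical coefficients with S1 - S2 = 70 uV/K
S1, S2 = 20e-6, -50e-6
T_ROOM, T_COLD, T_HOT = 295.0, 273.15, 373.15    # T at A and B, T', T''

legs = [
    (S1, leg_profile(T_ROOM, T_COLD, wiggle=15.0)),   # material 1: A -> A'
    (S2, leg_profile(T_COLD, T_HOT, wiggle=-30.0)),   # material 2: A' -> A''
    (S1, leg_profile(T_HOT, T_ROOM, wiggle=40.0)),    # material 1: A'' -> B
]
v = loop_voltage(legs)
print(f"V = {v * 1e3:.4f} mV, (S1 - S2)(T'' - T') = {(S1 - S2) * (T_HOT - T_COLD) * 1e3:.4f} mV")
```

Because $S$ is constant within each material, the sum telescopes and the wiggles drop out; the result is 7 mV for $T'' - T' = 100$ K.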


Using thermocouples is a very popular, inexpensive method of temperature measurement – especially in the few-hundred-°C range, where gas- and fluid-based thermometers are not very practicable, if a 1°C-scale accuracy is sufficient. The temperature responsivity $(S_1 - S_2)$ of a typical popular thermocouple, chromel-constantan,⁶⁵ is about 70 μV/°C. To understand why the typical values of $S$ are so small, let us discuss the Seebeck effect's physics. Superficially, it is very simple: particles, heated by an external source, diffuse from it toward the colder parts of the conductor, carrying electrical current with them if they are electrically charged. However, this naive argument neglects the fact that at $j = 0$, there is no total flow of particles. For a more accurate interpretation, note that inside the integral (98), the Seebeck effect is described by the factor $(\varepsilon - \mu)/T$, which changes its sign at the Fermi surface, i.e. at the same energy where the term $[-\partial\langle N(\varepsilon)\rangle/\partial\varepsilon]$, describing the availability of quantum states for transport (due to their intermediate occupancy $0 < \langle N(\varepsilon)\rangle < 1$), reaches its peak. The only reason why that integral does not vanish completely, and hence $S \neq 0$, is the growth of the first factor under the integral (which describes the density of available quantum states on the energy scale) with $\varepsilon$, so that the hotter particles (with $\varepsilon > \mu$) are more numerous and hence carry more heat than the colder ones.
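To see how small the factor $T/\varepsilon_F$ makes the Seebeck coefficient of a degenerate electron gas, here is Eq. (101) restored to SI units, $|S| = (\pi^2/2)(k_B/e)(k_BT/\varepsilon_F)$, evaluated for an assumed, copper-like Fermi energy of 7 eV at room temperature. This is a free-electron, energy-independent-$\tau$ estimate only (see footnote 64); the function name is hypothetical.

```python
import math

K_B_OVER_E = 8.617333262e-5   # k_B/e, V/K


def free_electron_seebeck(temp_k, fermi_energy_ev):
    """|S| from Eq. (101) in SI units: (pi^2/2)(k_B/e)(k_B T/eps_F), in V/K.

    For electrons q = -e, so in this model S itself is negative.
    """
    t_ev = K_B_OVER_E * temp_k          # k_B T expressed in eV
    return 0.5 * math.pi**2 * K_B_OVER_E * t_ev / fermi_energy_ev


s_metal = free_electron_seebeck(300.0, 7.0)   # eps_F ~ 7 eV: an assumed, copper-like value
print(f"|S| ~ {s_metal * 1e6:.2f} uV/K")
```

The result, about 1.6 μV/K, is set by $k_B/e \approx 86$ μV/K times the small ratio $k_BT/\varepsilon_F \approx 4\times10^{-3}$.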


The Seebeck effect is not the only result of a temperature gradient; the same diffusion of particles also causes the less subtle effect of heat flow from the region of higher $T$ to that with lower $T$, i.e. the effect of thermal conductivity, well known from our everyday practice. The density of this flow (i.e. that of thermal energy) may be calculated similarly to that of the electric current – see Eq. (26), with the natural replacement of the electric charge $q$ of each particle with its thermal energy $(\varepsilon - \mu)$:


65 Both these materials are alloys, i.e. solid solutions: chromel is 10% chromium in 90% nickel, while constantan 
is 45% nickel and 55% copper. 


Chapter 6 Page 30 of 38 




Essential Graduate Physics SM: Statistical Mechanics 


$$\mathbf{j}_h = \int(\varepsilon - \mu)\,\mathbf{v}\,w\,d^3p. \tag{6.104}$$


(Indeed, we may look at this expression as the difference between the total energy flow density, $\mathbf{j}_\varepsilon = \int\varepsilon\,\mathbf{v}w\,d^3p$, and the product of the average energy needed to add a particle to the system ($\mu$) by the particle flow density, $\mathbf{j}_n = \int\mathbf{v}w\,d^3p \equiv \mathbf{j}/q$.)⁶⁶ Again, at equilibrium ($w = w_0$) the heat flow vanishes, so that $w$ in Eq. (104) may be replaced with its perturbation $\tilde{w}$, which has already been calculated – see Eq. (96). The substitution of that expression into Eq. (104), and its transformation exactly similar to the one performed above for the electric current $\mathbf{j}$, yields


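$$\mathbf{j}_h = \Pi\sigma\left(-\frac{\nabla\mu'}{q}\right) + \kappa(-\nabla T), \tag{6.105}$$

with the coefficients $\Pi$ and $\kappa$ given, in our approximation, by the following formulas:

$$\Pi = \frac{gq\tau}{\sigma}\frac{4\pi}{(2\pi\hbar)^3}\frac{1}{3}\int_0^\infty(8m\varepsilon^3)^{1/2}(\varepsilon - \mu)\left[-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right]d\varepsilon, \tag{6.106}$$

$$\kappa = g\tau\frac{4\pi}{(2\pi\hbar)^3}\frac{1}{3}\int_0^\infty(8m\varepsilon^3)^{1/2}\frac{(\varepsilon - \mu)^2}{T}\left[-\frac{\partial\langle N(\varepsilon)\rangle}{\partial\varepsilon}\right]d\varepsilon. \tag{6.107}$$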


Besides the missing factor $T$ in the denominator, the integral in Eq. (106) is the same as the one in Eq. (98), so that the constant $\Pi$ (called the Peltier coefficient⁶⁷) is simply and fundamentally related to the Seebeck coefficient:

$$\Pi = ST. \tag{6.108}$$


The simplicity of this relation (first discovered experimentally in 1854 by W. Thomson, a.k.a. Lord Kelvin) is not accidental. This is one of the so-called Onsager reciprocal relations between kinetic coefficients (suggested by L. Onsager in 1931), which are model-independent, i.e. valid within very general assumptions. Unfortunately, I have no time/space left for a discussion of this interesting topic (closely related to the fluctuation-dissipation theorem discussed in Sec. 5.5), and have to refer the interested reader to its detailed discussions available in the literature.⁶⁸


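On the other hand, the integral in Eq. (107) is different, but may be readily calculated, for the most important case of a degenerate Fermi gas, using the Sommerfeld expansion in the form (99), with $f(\varepsilon) = (8m\varepsilon^3)^{1/2}(\varepsilon - \mu)^2/T$, for which $f(\mu) = 0$ and $d^2f/d\varepsilon^2|_{\varepsilon=\mu} = 2(8m\mu^3)^{1/2}/T \approx 2(8m\varepsilon_F^3)^{1/2}/T$, so that

$$\kappa = g\tau\frac{4\pi}{(2\pi\hbar)^3}\frac{1}{3}\,\frac{\pi^2T^2}{6}\,\frac{2(8m\varepsilon_F^3)^{1/2}}{T} = \frac{\pi^2n\tau}{3m}T. \tag{6.109}$$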


66 An alternative explanation of the factor $(\varepsilon - \mu)$ in Eq. (104) is that, according to Eqs. (1.37) and (1.56), for a uniform system of $N$ particles this factor is just $(E - G)/N \equiv (TS - PV)/N$. The full differential of the numerator is $TdS + SdT - PdV - VdP$, so that in the absence of the mechanical work $d\mathcal{W} = -PdV$, and of changes of temperature and pressure, it is just $TdS \equiv dQ$ – see Eq. (1.19).

67 Named after Jean Charles Athanase Peltier who experimentally discovered, in 1834, the effect expressed by the 
first term in Eq. (105) — and hence by Eq. (112). 

68 See, for example, Sec. 15.7 in R. Pathria and P. Beale, Statistical Mechanics, 3rd ed., Elsevier, 2011. Note, however, that the range of validity of the Onsager relations is still debated – see, e.g., K.-T. Chen and P. Lee, Phys. Rev. B 79, 18 (2009).
Comparing the result with Eq. (32), we get the so-called Wiedemann-Franz law⁶⁹

$$\frac{\kappa}{\sigma} = \frac{\pi^2}{3}\frac{T}{q^2}. \tag{6.110}$$


This relation between the electric conductivity $\sigma$ and the thermal conductivity $\kappa$ is more general than our formal derivation might imply. Indeed, it may be shown that the Wiedemann-Franz law is also valid for an arbitrary anisotropy (i.e. an arbitrary Fermi surface shape) and, moreover, well beyond the relaxation-time approximation. (For example, it is also valid for the scattering integral (12) with an arbitrary angular dependence of the scattering rate, provided that the scattering is elastic.) Experiments show that the law is well obeyed by most metals, but only at relatively low temperatures, when the thermal conductance due to electrons is well above that due to lattice vibrations, i.e. phonons – see Sec. 2.6. Moreover, for a non-degenerate gas, Eq. (107) should be treated with the utmost care, in the context of the definition (105) of this coefficient $\kappa$. (Let me leave this issue for the reader's analysis.)
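The SI value of the Lorenz number $L = (\pi^2/3)(k_B/e)^2$, mentioned in the footnote on the Wiedemann-Franz law, follows directly from Eq. (110) with $|q| = e$. The short sketch below computes it from the exact SI values of $k_B$ and $e$ and compares it with the Al and Cu values listed in the note to Table 6.1; the function name is hypothetical.

```python
import math

K_B = 1.380649e-23           # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C


def lorenz_number():
    """L = (pi^2/3)(k_B/e)^2, the SI form of Eq. (110): kappa/sigma = L T."""
    return math.pi**2 / 3 * (K_B / E_CHARGE) ** 2


L_THEORY = lorenz_number()
# Lorenz numbers for Al and Cu quoted in the note to Table 6.1
for metal, l_exp in (("Al", 2.22e-8), ("Cu", 2.39e-8)):
    print(f"{metal}: L_exp / L_theory = {l_exp / L_THEORY:.3f}")
print(f"L_theory = {L_THEORY:.4e} W·Ω·K^-2")
```

The computed value is about $2.44\times10^{-8}$ W·Ω·K⁻², and both metals fall within roughly 10% of it.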


Now let us discuss the effects described by Eq. (105), starting from the less obvious, first term 
on its right-hand side. It describes the so-called Peltier effect, which may be measured in the loop 
geometry similar to that shown in Fig. 12, but now driven by an external voltage source — see Fig. 13. 


Fig. 6.13. The Peltier effect at $T$ = const.


The voltage drives a certain dc current $I = jA$ (where $A$ is the area of the conductor's cross-section), necessarily the same in the whole loop. However, according to Eq. (105), if materials 1 and 2 are different, the power $\mathcal{P} = j_hA$ of the associated heat flow is different in the two parts of the loop.⁷⁰ Indeed, if the whole system is kept at the same temperature ($\nabla T = 0$), the integration of that relation over the cross-sections of each part yields


69 It was named after Gustav Wiedemann and Rudolph Franz, who noticed the constancy of the ratio $\kappa/\sigma$ for various materials, at the same temperature, as early as 1853. The direct proportionality of the ratio to the absolute temperature was noticed by Ludwig Lorenz in 1872. Due to his contribution, the Wiedemann-Franz law is frequently represented, in the SI temperature units, as $\kappa/\sigma = LT$, where the constant $L = (\pi^2/3)(k_B/e)^2$, called the Lorenz number, is close to $2.45\times10^{-8}\ \text{W}\cdot\Omega\cdot\text{K}^{-2}$. Theoretically, Eq. (110) was derived in 1928 by A. Sommerfeld.

70 Let me emphasize that here we are discussing the heat transferred through a conductor, not the Joule heat generated in it by the current. (The latter effect is quadratic, rather than linear, in current, and hence is much smaller at $I \to 0$.)
$$\mathcal{P}_{1,2} = \int_A j_h\,d^2r = \Pi_{1,2}\int_A\sigma\left(-\frac{\nabla\mu'}{q}\right)d^2r = \Pi_{1,2}\int_A j\,d^2r = \Pi_{1,2}I, \tag{6.111}$$


where, at the second step, Eq. (41) for the electric current density has been used. This equality means 
that to sustain a constant temperature, the following power difference, 


$$\Delta\mathcal{P} = (\Pi_1 - \Pi_2)I, \tag{6.112}$$


has to be extracted from one junction of the two materials (in Fig. 13, shown on the top), and inserted 
into the counterpart junction. 
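Combining Eq. (112) with the relation $\Pi = ST$ (108) gives $\Delta\mathcal{P} = (S_1 - S_2)T\,I$, which is linear in current, whereas the bulk Joule heat grows as $I^2$. The sketch below compares the two for hypothetical parameters ($S_1 - S_2 = 70$ μV/K, $T = 300$ K, and a total loop resistance of 0.01 Ω, all assumed for illustration); the function names are illustrative.

```python
def peltier_heat(s1, s2, temp_k, current_a):
    """Eq. (112) combined with Pi = S T, Eq. (108): dP = (S1 - S2) T I, in W."""
    return (s1 - s2) * temp_k * current_a


def joule_heat(resistance_ohm, current_a):
    """Bulk Joule heating I^2 R (quadratic in I), in W."""
    return current_a**2 * resistance_ohm


S1, S2 = 20e-6, -50e-6   # hypothetical coefficients, S1 - S2 = 70 uV/K
R = 0.01                 # hypothetical total loop resistance, ohm
T = 300.0                # temperature, K

for current in (0.1, 1.0, 5.0):
    print(f"I = {current:4.1f} A: Peltier {peltier_heat(S1, S2, T, current) * 1e3:7.2f} mW, "
          f"Joule {joule_heat(R, current) * 1e3:7.2f} mW")
crossover = (S1 - S2) * T / R    # current at which the two powers are equal
```

For these made-up numbers the Peltier power dominates below about 2 A, which hints at why a practical thermoelectric cooler has an optimal operating current.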

If a constant temperature is not maintained, the former junction is heated (in excess of the bulk, Joule heating), while the latter one is cooled, thus implementing a thermoelectric heat pump/refrigerator. Such Peltier refrigerators, which require neither moving parts nor fluids, are very convenient for modest (by a few tens °C) cooling of relatively small components of various systems – from sensitive radiation detectors on mobile platforms (including spacecraft), all the way to cold drinks in vending machines. It is straightforward to use the above formulas to show that the practical efficiency of active materials used in such thermoelectric refrigerators may be characterized by the following dimensionless figure-of-merit:

$$ZT \equiv \frac{\sigma S^2}{\kappa}T. \tag{6.113}$$

For the best thermoelectric materials found so far, the values of $ZT$ at room temperature are in the range from 2 to 3, providing the $\text{COP}_{\text{cooling}}$, defined by Eq. (1.69), of the order of 0.5 – a few times lower than that of traditional, mechanical refrigerators. The search for composite materials (including those with nanoparticles) with higher values of $ZT$ is one of the most active fields of applied solid-state physics.⁷¹
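For the free-electron model of this section, substituting Eqs. (101) and (110) into Eq. (113) gives $ZT = (3\pi^2/4)(T/\varepsilon_F)^2$, independent of both $\tau$ and $\sigma$. The sketch below checks this closed form against the direct definition for an assumed copper-like $\varepsilon_F = 7$ eV and a hypothetical conductivity; it is an estimate within the simple RTA model only, and the function names are illustrative.

```python
import math

K_B_OVER_E = 8.617333262e-5   # k_B/e, V/K


def zt(sigma, seebeck, kappa, temp_k):
    """Eq. (113): ZT = sigma S^2 T / kappa (SI units)."""
    return sigma * seebeck**2 * temp_k / kappa


def zt_free_electron(temp_k, fermi_energy_ev):
    """Eqs. (101) and (110) substituted into (113): ZT = (3 pi^2/4)(T/eps_F)^2."""
    return 0.75 * math.pi**2 * (K_B_OVER_E * temp_k / fermi_energy_ev) ** 2


temp, e_f = 300.0, 7.0                 # assumed copper-like Fermi energy, eV
sigma = 6e7                            # hypothetical conductivity, S/m
seebeck = 0.5 * math.pi**2 * K_B_OVER_E * (K_B_OVER_E * temp / e_f)   # |S|, Eq. (101)
kappa = math.pi**2 / 3 * K_B_OVER_E**2 * sigma * temp                  # Eq. (110)

print(f"ZT (direct) = {zt(sigma, seebeck, kappa, temp):.3e}")
print(f"ZT (closed form) = {zt_free_electron(temp, e_f):.3e}")
```

The result, $ZT \sim 10^{-4}$, shows why ordinary metals are poor thermoelectric materials compared with the values of order unity quoted above.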


Finally, let us discuss the second term of Eq. (105), which, in the absence of $\nabla\mu'$ (and hence of the electric current), gives

$$\mathbf{j}_h = -\kappa\nabla T. \tag{6.114}$$


This equality should be familiar to the reader because it describes the very common effect of thermal conductivity. Indeed, this linear relation is much more general than the particular expression (107) for $\kappa$: for sufficiently small temperature gradients, it is valid for virtually any medium – for example, for insulators. (The left column in Table 6.1 gives typical values of $\kappa$ for the most common and/or representative materials.) Due to its universality and importance, Eq. (114) has deserved its own name – the Fourier law.⁷²


Acting absolutely similarly to the derivation of other continuity equations, such as Eqs. (5.117) for the classical probability and Eq. (49) for the electric charge,⁷³ let us consider the conservation of the aggregate variable corresponding to $\mathbf{j}_h$ – the internal energy $E$ within a time-independent volume $V$. According to the basic Eq. (1.18), in the absence of the medium's expansion ($dV = 0$ and hence $d\mathcal{W} = 0$), the

7! See, e.g., D. Rowe (ed.), Thermoelectrics Handbook: Macro to Nano, CRC Press, 2005. 

72 It was suggested (in 1822) by the same universal scientific genius J.-B. J. Fourier who not only developed such a key mathematical tool as the Fourier series but also discovered what is now called the greenhouse effect!

73 They are all similar to continuity equations for other quantities — e.g., the mass (see CM Sec. 8.3) and the 
quantum-mechanical probability (see QM Secs. 1.4 and 9.6). 
energy change⁷⁴ has only the thermal component, so its only cause may be the heat flow through its boundary surface $S$:

$$\frac{dE}{dt} = -\oint_S\mathbf{j}_h\cdot d^2\mathbf{r}. \tag{6.115}$$


In the simplest case of a temperature-independent heat capacity $C_V$, we may integrate Eq. (1.22) over temperature to write⁷⁵

$$E = C_VT = \int_V c_VT\,d^3r, \tag{6.116}$$


where $c_V$ is the volumic specific heat, i.e. the heat capacity per unit volume (see the right column in Table 6.1).


Table 6.1. Approximate values of two major thermal coefficients of some materials at 20°C.

Material                      κ (W·m⁻¹·K⁻¹)    c_V (J·K⁻¹·m⁻³)
Air (a)                       0.026            1.2×10³
Teflon ([C₂F₄]ₙ)              0.25             0.6×10⁶
Water (b)                     0.60             4.2×10⁶
Amorphous silicon dioxide     1.1–1.4          1.5×10⁶
Undoped silicon               150              1.6×10⁶
Aluminum (c)                  235              2.4×10⁶
Copper (c)                    400              3.4×10⁶
Diamond                       2,200            1.8×10⁶

(a) At ambient pressure.

(b) In fluids (gases and liquids), heat flow may be much enhanced by temperature-gradient-induced turbulent circulation – convection, which is highly dependent on the system's geometry. The given values correspond to conditions preventing the convection.

(c) In the context of the Wiedemann-Franz law (valid for metals only!), the values of κ for Al and Cu correspond to the Lorenz numbers, respectively, 2.22×10⁻⁸ W·Ω·K⁻² and 2.39×10⁻⁸ W·Ω·K⁻², in a pretty impressive comparison with the universal theoretical value of 2.45×10⁻⁸ W·Ω·K⁻² given by Eq. (110).


Now applying to the right-hand side of Eq. (115) the divergence theorem,⁷⁶ and taking into account that for a time-independent volume the full and partial derivatives over time are equivalent, we get

$$\int_V\left(c_V\frac{\partial T}{\partial t} + \nabla\cdot\mathbf{j}_h\right)d^3r = 0. \tag{6.117}$$


74 According to Eq. (1.25), in the case of negligible thermal expansion, it does not matter whether we speak about the internal energy $E$ or the enthalpy $H$.

75 If the dependence of $c_V$ on temperature may be ignored only within a limited temperature interval, Eqs. (116) and (118) may still be used within that interval, for temperature deviations from some reference value.

76 I hope the reader knows it by heart by now, but if not – see, e.g., MA Eq. (12.2).




This equality should hold for any time-independent volume $V$, which is possible only if the function under the integral equals zero at any point. Using Eq. (114), we get the following partial differential equation, called the heat conduction equation (or, rather inappropriately, the “heat equation”):

$$c_V(\mathbf{r})\frac{\partial T}{\partial t} - \nabla\cdot\left[\kappa(\mathbf{r})\nabla T\right] = 0, \tag{6.118}$$

where the spatial arguments of the coefficients $c_V$ and $\kappa$ are spelled out to emphasize that this equation is valid even for nonuniform media. (Note, however, that Eq. (114) and hence Eq. (118) are valid only if the medium is isotropic.)


In a uniform medium, the thermal conductivity $\kappa$ may be taken out from the external spatial differentiation, and the heat conduction equation becomes mathematically similar to the diffusion equation (5.116), and also to the drift-diffusion equation (50) in the absence of drift ($\nabla U = 0$):

$$\frac{\partial T}{\partial t} = D_T\nabla^2T, \quad \text{with } D_T \equiv \frac{\kappa}{c_V}. \tag{6.119}$$

This means, in particular, that the solutions of these equations discussed earlier in this course (such as Eqs. (5.112)-(5.113) for the evolution of the delta-functional initial perturbation) are valid for Eq. (119) as well, with the only replacement $D \to D_T$. This is why I will leave a few other examples of the solution of this equation for the reader's exercise.
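Since Eq. (119) is a diffusion equation with $D_T = \kappa/c_V$, the data of Table 6.1 immediately give characteristic heat-spreading times. The sketch below computes $D_T$ for a few table entries and the time for the r.m.s. spread $(2D_Tt)^{1/2}$ to reach 1 cm. As a self-check, it then integrates the 1D version of Eq. (119) with a standard explicit (FTCS) finite-difference scheme, chosen here for simplicity rather than taken from the text, and confirms $\langle x^2\rangle = 2D_Tt$ for a delta-like initial perturbation. Function names are hypothetical.

```python
# (kappa in W m^-1 K^-1, c_V in J K^-1 m^-3), copied from Table 6.1
TABLE_6_1 = {
    "Air": (0.026, 1.2e3),
    "Water": (0.60, 4.2e6),
    "Undoped silicon": (150.0, 1.6e6),
    "Copper": (400.0, 3.4e6),
}


def thermal_diffusivity(kappa, c_v):
    """D_T = kappa/c_V, the coefficient in Eq. (119), in m^2/s."""
    return kappa / c_v


def ftcs_variance(d_t, t_end, half_width=0.5, cells=201):
    """Explicit (FTCS) finite-difference solution of Eq. (119) in 1D.

    Starts from a one-cell approximation of a delta-function perturbation and
    returns (<x^2> at the final time, the final time actually reached).
    """
    dx = 2 * half_width / (cells - 1)
    dt = 0.25 * dx * dx / d_t          # stability of this scheme needs dt <= dx^2/(2 D_T)
    steps = round(t_end / dt)
    r = d_t * dt / dx**2
    u = [0.0] * cells
    u[cells // 2] = 1.0 / dx
    for _ in range(steps):
        # both boundary values stay at zero (the perturbation barely reaches them)
        u = [0.0] + [u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
                     for i in range(1, cells - 1)] + [0.0]
    xs = [(i - cells // 2) * dx for i in range(cells)]
    norm = sum(u)
    return sum(x * x * ui for x, ui in zip(xs, u)) / norm, steps * dt


for name, (kappa, c_v) in TABLE_6_1.items():
    d_t = thermal_diffusivity(kappa, c_v)
    # time for the r.m.s. spread (2 D_T t)^(1/2) to reach 1 cm
    print(f"{name:16s} D_T = {d_t:.2e} m^2/s, t(1 cm) ~ {1e-4 / (2 * d_t):.3g} s")

variance, t_reached = ftcs_variance(d_t=1.0, t_end=0.005)   # dimensionless check
print(f"<x^2> = {variance:.6f} vs 2 D_T t = {2 * t_reached:.6f}")
```

With the table values, copper spreads heat over a centimeter in about half a second, while water (with convection suppressed) needs several minutes.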


Let me finish this chapter (and this course as a whole) by emphasizing again that due to time/space restrictions I was able to barely scratch the surface of physical kinetics.⁷⁷


6.6. Exercise problems 


6.1. Use the Boltzmann equation in the relaxation-time approximation to derive the Drude formula for the complex ac conductivity $\sigma(\omega)$, and give a physical interpretation of the result's trend at high frequencies.


6.2. Apply the variable separation method⁷⁸ to Eq. (50) to calculate the time evolution of the particle density distribution in an unlimited uniform medium, in the absence of external forces, provided that at $t = 0$ the particles are released from their uniform distribution in a plane layer of thickness $2a$:

$$n(x, 0) = \begin{cases} n_0, & \text{for } -a < x < +a,\\ 0, & \text{otherwise}. \end{cases}$$


77 A much more detailed coverage of this important part of physics may be found, for example, in the textbook by L. Pitaevskii and E. Lifshitz, Physical Kinetics, Butterworth-Heinemann, 1981. A deeper discussion of the Boltzmann equation is given, e.g., in the monograph by S. Harris, An Introduction to the Theory of the Boltzmann Equation, Dover, 2011. For a discussion of applied aspects of kinetics see, e.g., T. Bergman et al., Fundamentals of Heat and Mass Transfer, 7th ed., Wiley, 2011.

78 A detailed introduction to this method (repeatedly used in this series) may be found, for example, in EM Sec. 2.5.
6.3. Solve the previous problem using an appropriate Green’s function for the 1D version of the
diffusion equation, and discuss the relative convenience of the results. 


6.4. Calculate the electric conductance of a narrow, uniform conducting link between two bulk 
conductors, in the low-voltage and low-temperature limit, neglecting the electron interaction and 
scattering inside the link. 


6.5. Calculate the effective capacitance (per unit area) of a broad plane sheet of a degenerate 2D 
electron gas, separated by distance d from a metallic ground plane. 


6.6. Give a quantitative description of the dopant atom ionization, which would be consistent 
with the conduction and valence band occupation statistics, using the same simple model of an n-doped 
semiconductor as in Sec. 4 (see Fig. 7a), and taking into account that the ground state of the dopant atom 
is typically doubly degenerate, due to two possible spin orientations of the bound electron. Use the 
results to verify Eq. (65), within the displayed limits of its validity. 


6.7. Generalize the solution of the previous problem to the case when the n-doping of a semiconductor
by n_D donor atoms per unit volume is complemented with its simultaneous p-doping by n_A acceptor atoms
per unit volume, whose energy ε_A − ε_V of activation, i.e. of accepting an additional electron and hence
becoming a negative ion, is much lower than the bandgap Δ (see the figure on the right).

[Figure: energy diagram showing the dopant levels ε_D and ε_A, the valence band edge ε_V, and the bandgap Δ.]


6.8. A nearly-ideal classical gas of N particles with mass m was in thermal equilibrium at
temperature T, in a closed container of volume V. At some moment, an orifice of a very small area A is
opened in one of the container's walls, allowing the particles to escape into the surrounding vacuum.⁷⁹ In
the limit of very low density n = N/V, use simple kinetic arguments to calculate the r.m.s. velocity of the
escaped particles during the time period when the total number of such particles is still much smaller
than N. Formulate the limits of validity of your results in terms of V, A, and the mean free path l.


Hint: Here and below, the term “nearly-ideal” means that l is so large that particle collisions do
not affect the basic statistical properties of the gas.


6.9. For the system analyzed in the previous problem, calculate the rate of particle flow through 
the orifice — the so-called effusion rate. Discuss the limits of validity of your result. 


6.10. Use simple kinetic arguments to estimate:


(i) the diffusion coefficient D,
(ii) the thermal conductivity κ, and
(iii) the shear viscosity η,


of a nearly-ideal classical gas with mean free path l. Compare the result for D with that calculated in
Sec. 3 from the Boltzmann-RTA equation.


79 In chemistry-related fields, this process is frequently called effusion. 
Hint: In fluid dynamics, the shear viscosity (frequently called simply "viscosity") may be defined
as the coefficient η in the following relation:⁸⁰

$$\frac{dF_{j'}}{dA_j} = \eta\,\frac{\partial v_{j'}}{\partial r_j},$$

where dF_j′ is the j′-th Cartesian component of the tangential force between two parts of a fluid, separated
by an imaginary interface normal to some direction n_j (with j ≠ j′, and hence n_j ⊥ n_j′), exerted over an
elementary area dA_j of this surface, and v(r) is the velocity of the fluid at the interface.


6.11. Use simple kinetic arguments to relate the mean free path l in a nearly-ideal classical gas
to the full cross-section σ of mutual scattering of its particles.⁸¹ Use the result to evaluate the thermal
conductivity and the viscosity coefficient estimates made in the previous problem, for the molecular
nitrogen, with the molecular mass m ≈ 4.7×10⁻²⁶ kg and the effective (“van der Waals”) diameter d_ef ≈
4.5×10⁻¹⁰ m, at ambient conditions, and compare them with experimental results.


6.12. Use the Boltzmann-RTA equation to calculate the thermal conductivity of a nearly-ideal 
classical gas, measured in conditions when the applied thermal gradient does not create a net particle 
flow. Compare the result with that following from the simple kinetic arguments (Problem 6.10), and 
discuss their relationship. 


6.13. Use the heat conduction equation (6.119) to calculate the time evolution of temperature in
the center of a uniform solid sphere of radius R, initially heated to a uniformly distributed temperature
T_ini, and at t = 0 placed into a heat bath that keeps its surface at temperature T_0.


6.14. Suggest a reasonable definition of the entropy production rate (per unit volume), and 
calculate this rate for stationary thermal conduction, assuming that it obeys the Fourier law, in a material 
with negligible thermal expansion. Give a physical interpretation of the result. Does the stationary 
temperature distribution in a sample correspond to the minimum of the total entropy production in it? 


6.15.⁸² Use the Boltzmann-RTA equation to calculate the shear viscosity of a nearly-ideal gas.
Spell out the result in the classical limit, and compare it with the estimate made in the solution of 
Problem 10. 


80 See, e.g., CM Eq. (8.56). Please note the difference between the shear viscosity coefficient η considered in this
problem and the drag coefficient η whose calculation was the task of Problem 3.2. Despite the similar (traditional)
notation, and belonging to the same realm (kinematic friction), these coefficients have different definitions and 
even different dimensionalities. 

81 I am sorry for using the same letter for the cross-section as for the electric Ohmic conductivity. (Both notations
are very traditional.) Let me hope this would not lead to confusion, because the conductivity is not discussed in 
this problem. 

82 This problem does not follow Problem 12 only for historic reasons. 
Appendix MA


Selected Mathematical Formulas 


that are used in this series, but not always remembered by students 
(and some instructors :-) 


Last corrections: 2021/08/20 


1. Constants 


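— Euclidean circle's length-to-diameter ratio:

$$\pi = 3.141\,592\,653\ldots; \tag{1.1}$$

— Natural logarithm base:

$$e \equiv \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n = 2.718\,281\,828\ldots; \tag{1.2a}$$

from that value, the logarithm base conversion factors are as follows (ξ > 0):

$$\frac{\ln\xi}{\log_{10}\xi} = \ln 10 \approx 2.303, \qquad \frac{\log_{10}\xi}{\ln\xi} = \frac{1}{\ln 10} \approx 0.434. \tag{1.2b}$$

— The Euler (or “Euler-Mascheroni”) constant:

$$\gamma \equiv \lim_{n\to\infty}\left(1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n} - \ln n\right) = 0.577\,215\,664\,9\ldots; \qquad e^{\gamma} \approx 1.781. \tag{1.3}$$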
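As a quick numerical sanity check of Eqs. (1.2a)-(1.3), one may evaluate the limits at a large but finite n. The Python sketch below is only an illustration: the cut-off values of n are arbitrary choices, and the subtraction of 1/(2N) uses the known asymptotic expansion of the harmonic sum, which is not part of the original text.

```python
import math

# Eq. (1.2a): (1 + 1/n)^n approaches e slowly; the error is roughly e/(2n)
n = 10**6
e_approx = (1 + 1 / n) ** n

# Eq. (1.2b): conversion factors between natural and decimal logarithms
ln10 = math.log(10)        # about 2.303
inv_ln10 = 1 / ln10        # about 0.434, i.e. log10(e)

# Eq. (1.3): harmonic sum minus ln N; subtracting 1/(2N), the leading
# asymptotic correction, makes the convergence much faster
N = 10**6
harmonic = math.fsum(1 / k for k in range(1, N + 1))
gamma_approx = harmonic - math.log(N) - 1 / (2 * N)

print(e_approx, math.e)
print(ln10, inv_ln10)
print(gamma_approx, math.exp(gamma_approx))
```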


2. Combinatorics, sums, and series
(i) Combinatorics


— The number of different permutations, i.e. ordered sequences of k elements selected from a set
of n distinct elements (n ≥ k), is

$${}^nP_k \equiv n\cdot(n-1)\cdot\ldots\cdot(n-k+1) = \frac{n!}{(n-k)!}; \tag{2.1a}$$


© K. Likharev 
in particular, the number of different permutations of all elements of the set (n = k) is

$${}^kP_k = k\cdot(k-1)\cdot\ldots\cdot 2\cdot 1 = k!. \tag{2.1b}$$


— The number of different combinations, i.e. unordered sequences of k elements from a set of n ≥
k distinct elements, is equal to the binomial coefficient

$${}^nC_k \equiv \binom{n}{k} \equiv \frac{{}^nP_k}{{}^kP_k} = \frac{n!}{k!\,(n-k)!}. \tag{2.2}$$


In an alternative, very popular “ball/box language”, ⁿC_k is the number of different ways to put in a box,
in an arbitrary order, k balls selected from n distinct balls.


— A generalization of the binomial coefficient notion is the multinomial coefficient,

$${}^nC_{k_1,k_2,\ldots,k_l} \equiv \frac{n!}{k_1!\,k_2!\ldots k_l!}, \qquad \text{with } n = \sum_{j=1}^{l} k_j, \tag{2.3}$$

which, in the standard mathematical language, is a number of different permutations in a multiset of l
distinct element types from an n-element set which contains k_j (j = 1, 2, ..., l) elements of each type. In the
less formal “ball/box language”, the coefficient (2.3) is the number of different ways to distribute n
distinct balls between l distinct boxes, each time keeping the number (k_j) of balls in the j-th box fixed, but
ignoring their order inside the box. The binomial coefficient ⁿC_k, given by Eq. (2.2), is a particular case
of the multinomial coefficient (2.3) for l = 2, counting the explicit box for the first one, and the
remaining space for the second box, so that if k_1 = k, then k_2 = n − k.


— One more important combinatorial quantity is the number M_k^(n) of different ways to place n
indistinguishable balls into k distinct boxes. It may be readily calculated from Eq. (2.2) as the number of
different ways to select (k − 1) partitions between the boxes in an imagined linear row of (k − 1 + n)
“objects” (balls in the boxes and partitions between them):

$$M_k^{(n)} = {}^{(k-1+n)}C_{k-1} = \frac{(k-1+n)!}{(k-1)!\,n!}. \tag{2.4}$$
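The counting formulas (2.1)-(2.4) are easy to confirm by brute-force enumeration for small n and k. The following Python sketch does so with the standard-library helpers math.perm, math.comb, and itertools; the helper names multinomial and balls_in_boxes, and the values n = 7, k = 3, are illustrative choices only.

```python
import math
from itertools import combinations_with_replacement, permutations

def multinomial(*ks):
    """Eq. (2.3): n!/(k_1! k_2! ... k_l!) with n = k_1 + ... + k_l."""
    result = math.factorial(sum(ks))
    for k in ks:
        result //= math.factorial(k)   # every partial quotient is an integer
    return result

def balls_in_boxes(n, k):
    """Eq. (2.4): ways to put n indistinguishable balls into k distinct boxes."""
    return math.comb(k - 1 + n, k - 1)

n, k = 7, 3
# Eqs. (2.1a) and (2.2) against brute-force enumeration
assert math.perm(n, k) == len(list(permutations(range(n), k)))
assert math.comb(n, k) == math.perm(n, k) // math.perm(k, k)
# Eq. (2.3) reduces to Eq. (2.2) for l = 2
assert multinomial(k, n - k) == math.comb(n, k)
# Eq. (2.4): each multiset of n box labels is one placement of n balls
assert balls_in_boxes(n, k) == len(list(combinations_with_replacement(range(k), n)))
print(math.perm(n, k), math.comb(n, k), multinomial(2, 2, 3), balls_in_boxes(n, k))
```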
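(ii) Sums and series
— Arithmetic progression:

$$r + 2r + \ldots + nr \equiv \sum_{k=1}^{n} kr = \frac{n(n+1)}{2}\,r; \tag{2.5a}$$

in particular, at r = 1 it is reduced to the sum of n first natural numbers:

$$1 + 2 + \ldots + n \equiv \sum_{k=1}^{n} k = \frac{n(n+1)}{2}. \tag{2.5b}$$

— Sums of squares and cubes of n first natural numbers:

$$1^2 + 2^2 + \ldots + n^2 \equiv \sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}; \tag{2.6a}$$

$$1^3 + 2^3 + \ldots + n^3 \equiv \sum_{k=1}^{n} k^3 = \frac{n^2(n+1)^2}{4}. \tag{2.6b}$$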
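— The Riemann zeta function:

$$\zeta(s) \equiv 1 + \frac{1}{2^s} + \frac{1}{3^s} + \ldots \equiv \sum_{k=1}^{\infty}\frac{1}{k^s}; \tag{2.7a}$$

the particular values frequently met in applications are

$$\zeta\left(\tfrac{3}{2}\right) \approx 2.612, \quad \zeta(2) = \frac{\pi^2}{6}, \quad \zeta\left(\tfrac{5}{2}\right) \approx 1.341, \quad \zeta(3) \approx 1.202, \quad \zeta(4) = \frac{\pi^4}{90}, \quad \zeta(5) \approx 1.037. \tag{2.7b}$$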
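The closed forms (2.5b)-(2.6b) and the zeta-function values (2.7b) can be checked numerically. In the Python sketch below, the function zeta is a hypothetical helper: it adds to a partial sum of Eq. (2.7a) an estimate of the omitted tail, ∫_N^∞ x^(−s) dx − N^(−s)/2, which is an Euler-Maclaurin-type correction assumed here to speed up convergence. The number of terms is an arbitrary choice.

```python
import math

n = 50
# Eqs. (2.5b), (2.6a), (2.6b): closed forms against direct summation
assert sum(range(1, n + 1)) == n * (n + 1) // 2
assert sum(k**2 for k in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
assert sum(k**3 for k in range(1, n + 1)) == (n * (n + 1) // 2) ** 2

def zeta(s, terms=100_000):
    """Eq. (2.7a) for real s > 1: partial sum plus an estimate of the omitted tail,
    sum_{k>N} k^-s ~ N^(1-s)/(s-1) - N^-s/2 (an Euler-Maclaurin-type correction)."""
    partial = math.fsum(k ** -s for k in range(1, terms + 1))
    tail = terms ** (1 - s) / (s - 1) - terms ** -s / 2
    return partial + tail

for s in (1.5, 2, 2.5, 3, 4, 5):
    print(s, round(zeta(s), 3))
```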


— Finite geometric progression (for real λ ≠ 1):

$$1 + \lambda + \lambda^2 + \ldots + \lambda^{n-1} \equiv \sum_{k=0}^{n-1}\lambda^k = \frac{1 - \lambda^n}{1 - \lambda}; \tag{2.8a}$$

in particular, if λ² < 1, the progression has a finite limit at n → ∞ (called the geometric series):

$$\lim_{n\to\infty}\sum_{k=0}^{n-1}\lambda^k = \sum_{k=0}^{\infty}\lambda^k = \frac{1}{1-\lambda}. \tag{2.8b}$$

— Binomial sum (also called the “binomial theorem”):

$$(1 + a)^n = \sum_{k=0}^{n} {}^nC_k\, a^k, \tag{2.9}$$

where ⁿC_k are the binomial coefficients given by Eq. (2.2).
— The Stirling formula:

$$\lim_{n\to\infty}\ln(n!) = n(\ln n - 1) + \frac{1}{2}\ln(2\pi n) + \frac{1}{12n} + \ldots; \tag{2.10}$$

for most applications in physics, the first term¹ is sufficient.
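To see how quickly the truncated Stirling series (2.10) approaches ln(n!), one can compare it with math.lgamma(n + 1), which returns ln Γ(n + 1) = ln(n!) without forming the factorial. This Python sketch is a minimal illustration; the sample values of n are arbitrary.

```python
import math

def stirling(n, terms=3):
    """Eq. (2.10) truncated after its first `terms` terms (1, 2, or 3)."""
    parts = (n * (math.log(n) - 1), 0.5 * math.log(2 * math.pi * n), 1 / (12 * n))
    return sum(parts[:terms])

for n in (10, 100, 1000):
    exact = math.lgamma(n + 1)   # ln(n!), computed without forming n! itself
    print(n, exact,
          abs(stirling(n, 1) - exact) / exact,   # relative error of the leading term
          abs(stirling(n, 3) - exact))           # absolute error of three terms
```

The relative error of the leading term alone drops below 0.1% already at n = 1000, which is why it usually suffices for the macroscopic numbers met in statistical physics.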


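— The Taylor (or “Taylor-Maclaurin”) series: for any infinitely differentiable function f(ξ):

$$\lim_{\tilde\xi\to 0} f(\xi + \tilde\xi) = f(\xi) + \frac{df}{d\xi}(\xi)\,\tilde\xi + \frac{1}{2!}\frac{d^2f}{d\xi^2}(\xi)\,\tilde\xi^2 + \ldots \equiv \sum_{k=0}^{\infty}\frac{1}{k!}\frac{d^kf}{d\xi^k}(\xi)\,\tilde\xi^k; \tag{2.11a}$$

note that for many functions this series converges only within a limited, sometimes small range of
deviations ξ̃. For a function of several arguments, f(ξ_1, ξ_2, ..., ξ_N), the first terms of the Taylor series are

$$\lim_{\tilde\xi_k\to 0} f(\xi_1 + \tilde\xi_1, \xi_2 + \tilde\xi_2, \ldots) = f(\xi_1, \xi_2, \ldots) + \sum_k \frac{\partial f}{\partial\xi_k}\tilde\xi_k + \frac{1}{2!}\sum_{k,k'}\frac{\partial^2 f}{\partial\xi_k\,\partial\xi_{k'}}\tilde\xi_k\tilde\xi_{k'} + \ldots \tag{2.11b}$$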


1 Actually, this leading term was conjectured by A. de Moivre in 1733, before J. Stirling's proof of the series.


— The Euler-Maclaurin formula, valid for any infinitely differentiable function f(ξ):

$$\sum_{k=1}^{n} f(k) = \int_0^n f(\xi)\,d\xi + \frac{1}{2}\left[f(n) - f(0)\right] + \frac{1}{6}\frac{1}{2!}\left[\frac{df}{d\xi}(n) - \frac{df}{d\xi}(0)\right] - \frac{1}{30}\frac{1}{4!}\left[\frac{d^3f}{d\xi^3}(n) - \frac{d^3f}{d\xi^3}(0)\right] + \frac{1}{42}\frac{1}{6!}\left[\frac{d^5f}{d\xi^5}(n) - \frac{d^5f}{d\xi^5}(0)\right] + \ldots; \tag{2.12a}$$

the coefficients participating in this formula are the so-called Bernoulli numbers:²

$$B_1 = \frac{1}{2}, \quad B_2 = \frac{1}{6}, \quad B_3 = 0, \quad B_4 = -\frac{1}{30}, \quad B_5 = 0, \quad B_6 = \frac{1}{42}, \quad B_7 = 0, \quad B_8 = -\frac{1}{30}, \ldots \tag{2.12b}$$
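The Bernoulli numbers of Eq. (2.12b) may be generated exactly with rational arithmetic. The Python sketch below assumes the convention B_1 = +1/2, which matches the form of Eq. (2.12a) as written above (recall that other handbooks use B_1 = −1/2; see footnote 2); it uses the recurrence Σ_{k=0}^{m} C(m+1, k) B_k = m + 1, which produces that convention. It then applies Eq. (2.12a) to the test function f(ξ) = exp(−ξ), an arbitrary choice whose derivatives are all elementary.

```python
from fractions import Fraction
from math import comb, exp, factorial

def bernoulli_numbers(m_max):
    """B_0 ... B_m_max from the recurrence sum_{k=0}^{m} C(m+1, k) B_k = m + 1,
    which yields the B_1 = +1/2 sign convention assumed here."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        s = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(1 - s / (m + 1))
    return B

B = bernoulli_numbers(8)
print([str(b) for b in B])  # ['1', '1/2', '1/6', '0', '-1/30', '0', '1/42', '0', '-1/30']

# Eq. (2.12a) for f(x) = exp(-x): every odd derivative equals -exp(-x),
# so each bracket [f^(2p-1)(n) - f^(2p-1)(0)] equals 1 - exp(-n).
n = 10
exact = sum(exp(-k) for k in range(1, n + 1))
approx = (1 - exp(-n)) + float(B[1]) * (exp(-n) - 1)    # integral + (1/2)[f(n) - f(0)]
for p in (1, 2, 3):                                      # the B_2, B_4, B_6 terms
    approx += float(B[2 * p]) / factorial(2 * p) * (1 - exp(-n))
print(exact, approx)
```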


3. Basic trigonometric functions

— Trigonometric functions of the sum and the difference of two arguments:³

$$\cos(a \pm b) = \cos a\cos b \mp \sin a\sin b, \tag{3.1a}$$

$$\sin(a \pm b) = \sin a\cos b \pm \cos a\sin b. \tag{3.1b}$$


— Sums of two functions of arbitrary arguments:

$$\cos a + \cos b = 2\cos\frac{a+b}{2}\cos\frac{b-a}{2}, \tag{3.2a}$$

$$\cos a - \cos b = 2\sin\frac{a+b}{2}\sin\frac{b-a}{2}, \tag{3.2b}$$

$$\sin a \pm \sin b = 2\sin\frac{a \pm b}{2}\cos\frac{a \mp b}{2}. \tag{3.2c}$$

— Trigonometric function products:

$$2\cos a\cos b = \cos(a+b) + \cos(a-b), \tag{3.3a}$$

$$2\sin a\cos b = \sin(a+b) + \sin(a-b), \tag{3.3b}$$

$$2\sin a\sin b = \cos(a-b) - \cos(a+b); \tag{3.3c}$$

for the particular case of equal arguments, b = a, these three formulas yield the following expressions
for the squares of trigonometric functions, and their product:

$$\cos^2 a = \frac{1}{2}(1 + \cos 2a), \qquad \sin a\cos a = \frac{1}{2}\sin 2a, \qquad \sin^2 a = \frac{1}{2}(1 - \cos 2a). \tag{3.3d}$$

— Cubes of trigonometric functions:

$$\cos^3 a = \frac{3}{4}\cos a + \frac{1}{4}\cos 3a, \qquad \sin^3 a = \frac{3}{4}\sin a - \frac{1}{4}\sin 3a. \tag{3.4}$$


— Trigonometric functions of a complex argument:

$$\sin(a + ib) = \sin a\cosh b + i\cos a\sinh b,$$

$$\cos(a + ib) = \cos a\cosh b - i\sin a\sinh b. \tag{3.5}$$


2 Note that definitions of B_k (or rather their signs and indices) vary even in the most popular handbooks.

3 I am confident that the reader is quite capable of deriving the relations (3.1) by representing the exponent in the
elementary relation e^{i(a+b)} = e^{ia}e^{ib} as a sum of its real and imaginary parts, then Eqs. (3.3) directly from Eqs. (3.1),
and then Eqs. (3.2) from Eqs. (3.3) by variable replacement; however, I am still providing these formulas to save
the reader's time. (Quite a few formulas below are included for the same reason.)


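— Sums of trigonometric functions of n equidistant arguments:

$$\sum_{k=1}^{n}\begin{Bmatrix}\sin k\xi\\ \cos k\xi\end{Bmatrix} = \begin{Bmatrix}\sin\\ \cos\end{Bmatrix}\left(\frac{n+1}{2}\,\xi\right)\sin\left(\frac{n}{2}\,\xi\right)\Big/\sin\left(\frac{\xi}{2}\right). \tag{3.6}$$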
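Identities such as Eqs. (3.5) and (3.6) are easy to spot-check numerically before relying on them. The Python sketch below does this at arbitrarily chosen arguments (n = 7, ξ = 0.37, and a + ib = 0.8 − 1.3i are illustrative values only), using the standard cmath module for the complex-argument functions.

```python
import cmath
import math

n, xi = 7, 0.37
# Eq. (3.6): direct sums versus the closed forms
s_sin = sum(math.sin(k * xi) for k in range(1, n + 1))
s_cos = sum(math.cos(k * xi) for k in range(1, n + 1))
factor = math.sin(n * xi / 2) / math.sin(xi / 2)
assert math.isclose(s_sin, math.sin((n + 1) * xi / 2) * factor)
assert math.isclose(s_cos, math.cos((n + 1) * xi / 2) * factor)

# Eq. (3.5): sine and cosine of a complex argument a + ib
a, b = 0.8, -1.3
assert cmath.isclose(cmath.sin(complex(a, b)),
                     math.sin(a) * math.cosh(b) + 1j * math.cos(a) * math.sinh(b))
assert cmath.isclose(cmath.cos(complex(a, b)),
                     math.cos(a) * math.cosh(b) - 1j * math.sin(a) * math.sinh(b))
print(s_sin, s_cos)
```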


4. General differentiation 


— Full differential of a product of two functions: 


d(fg) = (df)g + f(dg). (4.1) 


— Full differential of a function of several independent arguments, f(ξ_1, ξ_2, ..., ξ_N):

$$df = \sum_{k=1}^{N}\frac{\partial f}{\partial\xi_k}\,d\xi_k. \tag{4.2}$$

— Curvature of the Cartesian plot of a smooth function f(ξ):

$$\kappa \equiv \frac{1}{R} = \frac{d^2f/d\xi^2}{\left[1 + (df/d\xi)^2\right]^{3/2}}. \tag{4.3}$$

5. General integration
— Integration by parts:⁴

$$\int_{g(A)}^{g(B)} f\,dg = fg\Big|_A^B - \int_{f(A)}^{f(B)} g\,df. \tag{5.1}$$


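— Numerical (approximate) integration of 1D functions: the simplest trapezoidal rule,

$$\int_a^b f(\xi)\,d\xi \approx h\left[f\left(a + \frac{h}{2}\right) + f\left(a + \frac{3h}{2}\right) + \ldots + f\left(b - \frac{h}{2}\right)\right] \equiv h\sum_{n=1}^{N} f\left(a - \frac{h}{2} + nh\right), \qquad h \equiv \frac{b-a}{N}, \tag{5.2}$$

has a relatively low accuracy (error of the order of (h³/12)d²f/dξ² per step), so that the following
Simpson formula,

$$\int_a^b f(\xi)\,d\xi \approx \frac{h}{3}\left[f(a) + 4f(a+h) + 2f(a+2h) + \ldots + 4f(b-h) + f(b)\right], \qquad h \equiv \frac{b-a}{N}, \quad N \text{ even}, \tag{5.3}$$

whose error per step scales as (h⁵/180)d⁴f/dξ⁴, is used much more frequently.⁵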
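The different error scalings of Eqs. (5.2) and (5.3) show up clearly when the step h is halved: the total error of Eq. (5.2) should drop about 4 times, and that of Simpson's formula about 16 times. The Python sketch below implements Eq. (5.2) literally (it samples f at the points a + h/2, ..., b − h/2) and the composite Simpson rule with an even number of steps; the test integral ∫_0^π sin ξ dξ = 2 and the function names are arbitrary illustrative choices.

```python
import math

def rule_5_2(f, a, b, n):
    """Eq. (5.2): h times the sum of f at a + h/2, a + 3h/2, ..., b - h/2."""
    h = (b - a) / n
    return h * sum(f(a - h / 2 + k * h) for k in range(1, n + 1))

def simpson(f, a, b, n):
    """Eq. (5.3): composite Simpson rule; the number n of steps must be even."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + k * h) for k in range(1, n, 2))   # odd-numbered nodes
    s += 2 * sum(f(a + k * h) for k in range(2, n, 2))   # even interior nodes
    return s * h / 3

exact = 2.0   # integral of sin over [0, pi]
for n in (8, 16, 32):
    e1 = abs(rule_5_2(math.sin, 0.0, math.pi, n) - exact)
    e2 = abs(simpson(math.sin, 0.0, math.pi, n) - exact)
    print(n, e1, e2)
```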


4 This formula immediately follows from Eq. (4.1).
5 Higher-order formulas (e.g., the Bode rule), and other guidance including ready-for-use codes for computer
calculations may be found, for example, in the popular reference texts by W. H. Press et al., cited in Sec. 16
below. Besides that, some advanced codes are used as subroutines in the software packages listed in the same
section. In some cases, the Euler-Maclaurin formula (2.12) also may be useful for numerical integration.


6. A few 1D integrals⁶


(i) Indefinite integrals


— Integrals with (1 + ξ²)^{1/2}:

$$\int(1+\xi^2)^{1/2}\,d\xi = \frac{\xi}{2}(1+\xi^2)^{1/2} + \frac{1}{2}\ln\left[\xi + (1+\xi^2)^{1/2}\right], \tag{6.1}$$

$$\int\frac{d\xi}{(1+\xi^2)^{1/2}} = \ln\left[\xi + (1+\xi^2)^{1/2}\right], \tag{6.2a}$$

$$\int\frac{d\xi}{(1+\xi^2)^{3/2}} = \frac{\xi}{(1+\xi^2)^{1/2}}. \tag{6.2b}$$

— Miscellaneous indefinite integrals:

$$\int\frac{d\xi}{\xi(\xi^2 + 2a\xi - 1)^{1/2}} = \cos^{-1}\frac{1 - a\xi}{|\xi|(a^2+1)^{1/2}}, \tag{6.3a}$$

$$\int\xi\sin^2\xi\,d\xi = -\frac{2\xi\sin 2\xi + \cos 2\xi - 2\xi^2 - 1}{8}, \tag{6.3b}$$

$$\int\frac{d\xi}{a + b\cos\xi} = \frac{2}{(a^2 - b^2)^{1/2}}\tan^{-1}\left[\left(\frac{a-b}{a+b}\right)^{1/2}\tan\frac{\xi}{2}\right], \quad \text{for } a^2 > b^2, \tag{6.3c}$$

$$\int\frac{d\xi}{1+\xi^2} = \tan^{-1}\xi. \tag{6.3d}$$


(ii) Semi-definite integrals:
— Integrals with 1/(e^ξ ± 1):

$$\int_a^\infty\frac{d\xi}{e^\xi + 1} = \ln\left(1 + e^{-a}\right), \tag{6.4a}$$

$$\int_a^\infty\frac{d\xi}{e^\xi - 1} = \ln\frac{1}{1 - e^{-a}}, \quad \text{for } a > 0. \tag{6.4b}$$


(iii) Definite integrals
— Integrals with 1/(1 + ξ²):⁷

$$\int_0^\infty\frac{d\xi}{1+\xi^2} = \frac{\pi}{2}; \tag{6.5a}$$

$$\int_0^\infty\frac{d\xi}{(1+\xi^2)^{3/2}} = 1; \tag{6.5b}$$

more generally,

$$\int_0^\infty\frac{d\xi}{(1+\xi^2)^n} = \frac{\pi}{2}\,\frac{(2n-3)!!}{(2n-2)!!} \equiv \frac{\pi}{2}\,\frac{1\cdot3\cdot5\ldots(2n-3)}{2\cdot4\cdot6\ldots(2n-2)}, \quad \text{for } n = 2, 3, \ldots \tag{6.5c}$$


6 A powerful (and free :-) interactive online tool for working out indefinite 1D integrals is available at
http://integrals.wolfram.com/index.jsp.
7 Eq. (6.5a) follows immediately from Eq. (6.3d), and Eq. (6.5b) from Eq. (6.2b), a couple more examples of the
(intentional) redundancies in this list.


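— Integrals with (1 − ξ^{2n})^{±1/2}:

$$\int_0^1\frac{d\xi}{(1 - \xi^{2n})^{1/2}} = \frac{\pi^{1/2}}{2n}\,\frac{\Gamma\left(\frac{1}{2n}\right)}{\Gamma\left(\frac{n+1}{2n}\right)}, \tag{6.6a}$$

$$\int_0^1(1 - \xi^{2n})^{1/2}\,d\xi = \frac{\pi^{1/2}}{4n}\,\frac{\Gamma\left(\frac{1}{2n}\right)}{\Gamma\left(\frac{3n+1}{2n}\right)}, \tag{6.6b}$$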


where Γ(s) is the gamma function, which is most often defined (for Re s > 0) by the following integral:

$$\int_0^\infty\xi^{s-1}e^{-\xi}\,d\xi = \Gamma(s). \tag{6.7a}$$

The key property of this function is the recurrence relation, which is valid for any s ≠ 0, −1, −2, ...:

$$\Gamma(s+1) = s\,\Gamma(s). \tag{6.7b}$$

Since, according to Eq. (6.7a), Γ(1) = 1, Eq. (6.7b) for non-negative integers takes the form

$$\Gamma(n+1) = n!, \quad \text{for } n = 0, 1, 2, \ldots \tag{6.7c}$$

(where 0! ≡ 1). Because of this, for integer s = n + 1 ≥ 1, Eq. (6.7a) is reduced to

$$\int_0^\infty\xi^n e^{-\xi}\,d\xi = n!. \tag{6.7d}$$

Other frequently met values of the gamma function are those for positive semi-integer values:

$$\Gamma\left(\tfrac{1}{2}\right) = \pi^{1/2}, \quad \Gamma\left(\tfrac{3}{2}\right) = \tfrac{1}{2}\pi^{1/2}, \quad \Gamma\left(\tfrac{5}{2}\right) = \tfrac{3}{4}\pi^{1/2}, \quad \Gamma\left(\tfrac{7}{2}\right) = \tfrac{15}{8}\pi^{1/2}, \ldots \tag{6.7e}$$
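The gamma-function properties (6.7b), (6.7c), and (6.7e) can be checked directly with math.gamma, and Eq. (6.6b) can be compared with a brute-force midpoint-rule quadrature. In this Python sketch the functions rhs_6_6b and lhs_6_6b are hypothetical helpers, and the test arguments and the number of quadrature steps are arbitrary choices.

```python
import math

root_pi = math.sqrt(math.pi)

# Eqs. (6.7b) and (6.7c): recurrence relation and factorials
for s in (0.3, 1.7, 4.2):
    assert math.isclose(math.gamma(s + 1), s * math.gamma(s))
for n in range(10):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))

# Eq. (6.7e): semi-integer arguments
for arg, coeff in ((0.5, 1.0), (1.5, 0.5), (2.5, 0.75), (3.5, 1.875)):
    assert math.isclose(math.gamma(arg), coeff * root_pi)

def rhs_6_6b(n):
    """Right-hand side of Eq. (6.6b)."""
    return root_pi / (4 * n) * math.gamma(1 / (2 * n)) / math.gamma((3 * n + 1) / (2 * n))

def lhs_6_6b(n, steps=200_000):
    """Midpoint-rule estimate of the integral of (1 - x^(2n))^(1/2) over [0, 1]."""
    h = 1 / steps
    return h * math.fsum(math.sqrt(1 - ((k + 0.5) * h) ** (2 * n)) for k in range(steps))

print(rhs_6_6b(1), math.pi / 4)    # n = 1: a quarter of the unit-circle area
print(rhs_6_6b(2), lhs_6_6b(2))
```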


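— Integrals with 1/(e^ξ ± 1):

$$\int_0^\infty\frac{\xi^{s-1}\,d\xi}{e^\xi + 1} = \left(1 - 2^{1-s}\right)\Gamma(s)\,\zeta(s), \quad \text{for } s > 0, \tag{6.8a}$$

$$\int_0^\infty\frac{\xi^{s-1}\,d\xi}{e^\xi - 1} = \Gamma(s)\,\zeta(s), \quad \text{for } s > 1, \tag{6.8b}$$

where ζ(s) is the Riemann zeta function; see Eq. (2.7). Particular cases: for s = 2n,

$$\int_0^\infty\frac{\xi^{2n-1}\,d\xi}{e^\xi + 1} = \frac{\left(2^{2n-1} - 1\right)\pi^{2n}}{2n}\,|B_{2n}|, \tag{6.8c}$$

$$\int_0^\infty\frac{\xi^{2n-1}\,d\xi}{e^\xi - 1} = \frac{(2\pi)^{2n}}{4n}\,|B_{2n}|, \tag{6.8d}$$

where B_n are the Bernoulli numbers; see Eq. (2.12). For the particular case s = 1 (when Eq. (6.8a)
yields an indeterminate product, because ζ(s) diverges at s → 1 while the factor 1 − 2^{1−s} tends to zero),

$$\int_0^\infty\frac{d\xi}{e^\xi + 1} = \ln 2. \tag{6.8e}$$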
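Eqs. (6.8a), (6.8b), and (6.8e) can be confirmed by straightforward quadrature. The Python sketch below uses the midpoint rule (so that the integrand is never evaluated exactly at ξ = 0) and truncates the integrals at ξ = 60, where the exponential tail is negligible for the small s used; the cut-off, step count, and function names are illustrative assumptions. The test case s = 4 gives Γ(4)ζ(4) = π⁴/15 for the Bose-type integral.

```python
import math

def midpoint(f, a, b, steps=100_000):
    """Plain midpoint rule; it never evaluates f exactly at the endpoints."""
    h = (b - a) / steps
    return h * math.fsum(f(a + (k + 0.5) * h) for k in range(steps))

def bose(s, upper=60.0):
    """Left-hand side of Eq. (6.8b), truncated at x = upper."""
    return midpoint(lambda x: x ** (s - 1) / math.expm1(x), 0.0, upper)

def fermi(s, upper=60.0):
    """Left-hand side of Eq. (6.8a), truncated at x = upper."""
    return midpoint(lambda x: x ** (s - 1) / (math.exp(x) + 1), 0.0, upper)

zeta4 = math.pi ** 4 / 90                                # Eq. (2.7b)
print(bose(4), math.gamma(4) * zeta4)                    # Eq. (6.8b), s = 4
print(fermi(4), (1 - 2 ** -3) * math.gamma(4) * zeta4)   # Eq. (6.8a), s = 4
print(fermi(1), math.log(2))                             # Eq. (6.8e)
```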
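— Integrals with exp{−ξ²}:

$$\int_0^\infty\xi^s e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma\left(\frac{s+1}{2}\right), \quad \text{for } s > -1; \tag{6.9a}$$

for applications the most important particular values of s are 0 and 2:

$$\int_0^\infty e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma\left(\frac{1}{2}\right) = \frac{\pi^{1/2}}{2}, \tag{6.9b}$$

$$\int_0^\infty\xi^2 e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma\left(\frac{3}{2}\right) = \frac{\pi^{1/2}}{4}, \tag{6.9c}$$

though we will also run into the cases s = 4 and s = 6:

$$\int_0^\infty\xi^4 e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma\left(\frac{5}{2}\right) = \frac{3\pi^{1/2}}{8}, \qquad \int_0^\infty\xi^6 e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma\left(\frac{7}{2}\right) = \frac{15\pi^{1/2}}{16}; \tag{6.9d}$$

for odd integer values s = 2n + 1 (with n = 0, 1, 2, ...), Eq. (6.9a) takes a simpler form:

$$\int_0^\infty\xi^{2n+1}e^{-\xi^2}\,d\xi = \frac{1}{2}\Gamma(n+1) = \frac{n!}{2}. \tag{6.9e}$$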
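The Gaussian moments (6.9a)-(6.9e) can be reproduced numerically in a few lines. In the Python sketch below, the helper gauss_moment is a hypothetical name; it applies the midpoint rule on [0, 12], and the upper cut-off and step count are arbitrary choices for which the neglected tail and the discretization error are both far below the printed precision.

```python
import math

def gauss_moment(s, upper=12.0, steps=100_000):
    """Midpoint estimate of the integral of x^s exp(-x^2) over [0, upper];
    the neglected tail beyond x = 12 is astronomically small for the s used here."""
    h = upper / steps
    xs = ((k + 0.5) * h for k in range(steps))
    return h * math.fsum(x ** s * math.exp(-x * x) for x in xs)

for s in (0, 1, 2, 3, 4, 6):
    print(s, gauss_moment(s), 0.5 * math.gamma((s + 1) / 2))   # Eq. (6.9a)
```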
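— Integrals with cosine and sine functions:

$$\int_0^\infty\cos(\xi^2)\,d\xi = \int_0^\infty\sin(\xi^2)\,d\xi = \left(\frac{\pi}{8}\right)^{1/2}, \tag{6.10}$$

$$\int_0^\infty\frac{\cos\xi}{a^2 + \xi^2}\,d\xi = \frac{\pi}{2|a|}\,e^{-|a|}, \tag{6.11}$$

$$\int_0^\infty\left(\frac{\sin\xi}{\xi}\right)^2 d\xi = \frac{\pi}{2}. \tag{6.12}$$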
— Integrals with logarithms:

$$\int_0^1\ln\frac{a + (1-\xi^2)^{1/2}}{a - (1-\xi^2)^{1/2}}\,d\xi = \pi\left[a - (a^2 - 1)^{1/2}\right], \quad \text{for } a \ge 1, \tag{6.13}$$
1 1/2
(8) ei, (6.14) 
ene 
0 
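— Integral representations of the Bessel functions of integer order:

$$J_n(a) = \frac{1}{2\pi}\int_{-\pi}^{+\pi}e^{i(a\sin\xi - n\xi)}\,d\xi, \quad \text{so that} \quad e^{ia\sin\xi} = \sum_{n=-\infty}^{+\infty}J_n(a)\,e^{in\xi}; \tag{6.15a}$$

$$J_n(a) = \frac{1}{\pi i^n}\int_0^\pi e^{ia\cos\xi}\cos n\xi\,d\xi. \tag{6.15b}$$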
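Both integral representations (6.15a) and (6.15b) can be compared with the standard power series J_n(a) = Σ_m (−1)^m (a/2)^(2m+n)/[m!(m+n)!], valid for integer n ≥ 0. Because both integrands are smooth and periodic, the simple midpoint rule converges extremely fast here. In this Python sketch, the function names, the argument a = 2.5, and the step counts are illustrative choices.

```python
import cmath
import math

def bessel_series(n, a, terms=40):
    """Power series J_n(a) = sum_m (-1)^m (a/2)^(2m+n) / (m! (m+n)!), integer n >= 0."""
    return sum((-1) ** m * (a / 2) ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

def bessel_6_15a(n, a, steps=400):
    """Eq. (6.15a) by the midpoint rule over one full period."""
    h = 2 * math.pi / steps
    total = sum(cmath.exp(1j * (a * math.sin(x) - n * x))
                for x in (-math.pi + (k + 0.5) * h for k in range(steps)))
    return (total * h / (2 * math.pi)).real

def bessel_6_15b(n, a, steps=400):
    """Eq. (6.15b) by the midpoint rule on [0, pi]; the integrand is even and periodic."""
    h = math.pi / steps
    total = sum(cmath.exp(1j * a * math.cos(x)) * math.cos(n * x)
                for x in ((k + 0.5) * h for k in range(steps)))
    return (total * h / (math.pi * 1j ** n)).real

for n in range(4):
    print(n, bessel_series(n, 2.5), bessel_6_15a(n, 2.5), bessel_6_15b(n, 2.5))
```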


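7. 3D vector products
(i) Definitions:
— Scalar (“dot-”) product:

$$\mathbf{a}\cdot\mathbf{b} = \sum_{j=1}^{3}a_j b_j, \tag{7.1}$$

where a_j and b_j are vector components in any orthogonal coordinate system. In particular, the vector
squared (the same as its norm squared) is the following scalar:

$$a^2 \equiv \mathbf{a}\cdot\mathbf{a} = \sum_{j=1}^{3}a_j^2 \equiv |\mathbf{a}|^2. \tag{7.2}$$

— Vector (“cross-”) product:

$$\mathbf{a}\times\mathbf{b} \equiv \mathbf{n}_1(a_2b_3 - a_3b_2) + \mathbf{n}_2(a_3b_1 - a_1b_3) + \mathbf{n}_3(a_1b_2 - a_2b_1) = \begin{vmatrix}\mathbf{n}_1 & \mathbf{n}_2 & \mathbf{n}_3\\ a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\end{vmatrix}, \tag{7.3}$$

where {n_j} is the set of mutually perpendicular unit vectors⁸ along the corresponding coordinate system
axes.⁹ In particular, Eq. (7.3) yields

$$\mathbf{a}\times\mathbf{a} = 0. \tag{7.4}$$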
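(ii) Corollaries (readily verified by Cartesian components):
— Double vector product (the so-called bac minus cab rule):

$$\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = \mathbf{b}(\mathbf{a}\cdot\mathbf{c}) - \mathbf{c}(\mathbf{a}\cdot\mathbf{b}). \tag{7.5}$$

— Mixed scalar-vector product (the operand rotation rule):

$$\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = \mathbf{b}\cdot(\mathbf{c}\times\mathbf{a}) = \mathbf{c}\cdot(\mathbf{a}\times\mathbf{b}). \tag{7.6}$$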


8 Other popular notations for this vector set are {e_j} and {r̂_j}.


9 It is easy to use Eq. (7.3) to check that the direction of the product vector corresponds to the well-known “right- 
hand rule” and to the even more convenient corkscrew rule: if we rotate a corkscrew's handle from the first 
operand toward the second one, its axis moves in the direction of the product. 
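— Scalar product of vector products:

$$(\mathbf{a}\times\mathbf{b})\cdot(\mathbf{c}\times\mathbf{d}) = (\mathbf{a}\cdot\mathbf{c})(\mathbf{b}\cdot\mathbf{d}) - (\mathbf{a}\cdot\mathbf{d})(\mathbf{b}\cdot\mathbf{c}); \tag{7.7a}$$

in the particular case of two similar operands (say, a = c and b = d), the last formula is reduced to

$$(\mathbf{a}\times\mathbf{b})^2 = (ab)^2 - (\mathbf{a}\cdot\mathbf{b})^2. \tag{7.7b}$$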
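The corollaries (7.4)-(7.7a) are "readily verified by Cartesian components", and that verification can also be automated for random vectors. The plain-Python sketch below implements Eqs. (7.1) and (7.3) directly; the helper names and the random seed are arbitrary, and a floating-point tolerance replaces exact equality.

```python
import random

def dot(a, b):
    """Eq. (7.1) in Cartesian components."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Eq. (7.3); tuple indices 0, 1, 2 stand for the components 1, 2, 3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def combo(alpha, u, beta, v):
    """The linear combination alpha*u + beta*v of two 3-vectors."""
    return tuple(alpha * x + beta * y for x, y in zip(u, v))

def close(u, v, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(u, v))

rng = random.Random(1)   # arbitrary seed, for reproducibility
a, b, c, d = (tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(4))

assert close(cross(a, a), (0.0, 0.0, 0.0))                                # (7.4)
assert close(cross(a, cross(b, c)), combo(dot(a, c), b, -dot(a, b), c))   # (7.5)
assert abs(dot(a, cross(b, c)) - dot(b, cross(c, a))) < 1e-12             # (7.6)
assert abs(dot(a, cross(b, c)) - dot(c, cross(a, b))) < 1e-12
assert abs(dot(cross(a, b), cross(c, d))
           - (dot(a, c) * dot(b, d) - dot(a, d) * dot(b, c))) < 1e-12     # (7.7a)
print("identities (7.4)-(7.7a) hold for", a, b, c, d)
```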


8. Differentiation in 3D Cartesian coordinates


— Definition of the del (or “nabla”) vector-operator ∇:¹⁰

$$\nabla \equiv \sum_{j=1}^{3}\mathbf{n}_j\frac{\partial}{\partial r_j}, \tag{8.1}$$

where r_j is a set of linear and orthogonal (Cartesian) coordinates along directions n_j. In accordance with
this definition, the operator ∇ acting on a scalar function of coordinates, f(r),¹¹ gives its gradient, i.e. a
new vector:

$$\nabla f \equiv \sum_{j=1}^{3}\mathbf{n}_j\frac{\partial f}{\partial r_j} \equiv \operatorname{grad} f. \tag{8.2}$$


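— The scalar product of del by a vector function of coordinates (a vector field),

$$\mathbf{f}(\mathbf{r}) \equiv \sum_{j=1}^{3}\mathbf{n}_j f_j(\mathbf{r}), \tag{8.3}$$

formed by formally following Eq. (7.1), is a scalar function, the divergence of the initial function:

$$\nabla\cdot\mathbf{f} \equiv \sum_{j=1}^{3}\frac{\partial f_j}{\partial r_j} \equiv \operatorname{div}\mathbf{f}, \tag{8.4}$$

while the vector product of ∇ and f, formed in a formal accordance with Eq. (7.3), is a new vector, the
curl (in European tradition, called rotor and denoted rot) of f:

$$\nabla\times\mathbf{f} \equiv \begin{vmatrix}\mathbf{n}_1 & \mathbf{n}_2 & \mathbf{n}_3\\ \partial/\partial r_1 & \partial/\partial r_2 & \partial/\partial r_3\\ f_1 & f_2 & f_3\end{vmatrix} = \mathbf{n}_1\left(\frac{\partial f_3}{\partial r_2} - \frac{\partial f_2}{\partial r_3}\right) + \mathbf{n}_2\left(\frac{\partial f_1}{\partial r_3} - \frac{\partial f_3}{\partial r_1}\right) + \mathbf{n}_3\left(\frac{\partial f_2}{\partial r_1} - \frac{\partial f_1}{\partial r_2}\right) \equiv \operatorname{curl}\mathbf{f}. \tag{8.5}$$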


— One more frequently met “product” is (f·∇)g, where f and g are two arbitrary vector functions
of r. This product should be also understood in the sense implied by Eq. (7.1), i.e. as a vector whose j-th
Cartesian component is

$$\left[(\mathbf{f}\cdot\nabla)\mathbf{g}\right]_j = \sum_{j'=1}^{3}f_{j'}\frac{\partial g_j}{\partial r_{j'}}. \tag{8.6}$$


10 One can run into the following notation: ∇ ≡ ∂/∂r, which is convenient in some cases, but may be misleading in
quite a few others, so it will not be used in this series.
11 In this and the next four sections, all scalar and vector functions are assumed to be differentiable.
9. The Laplace operator ∇² ≡ ∇·∇


— Expression in Cartesian coordinates, in the formal accordance with Eq. (7.2):

$$\nabla^2 = \sum_{j=1}^{3}\frac{\partial^2}{\partial r_j^2}. \tag{9.1}$$

— According to its definition, the Laplace operator acting on a scalar function of coordinates
gives a new scalar function:

$$\nabla^2 f \equiv \nabla\cdot(\nabla f) = \sum_{j=1}^{3}\frac{\partial^2 f}{\partial r_j^2}. \tag{9.2}$$

— On the other hand, acting on a vector function (8.3), the operator ∇² returns another vector:

$$\nabla^2\mathbf{f} = \sum_{j=1}^{3}\mathbf{n}_j\nabla^2 f_j. \tag{9.3}$$

Note that Eqs. (9.1)-(9.3) are only valid in Cartesian (i.e. orthogonal and linear) coordinates, but
generally not in other orthogonal coordinates; see, e.g., Eqs. (10.3), (10.6), (10.9) and (10.12) below.


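10. Operators ∇ and ∇² in the most important systems of orthogonal coordinates¹²


(i) Cylindrical coordinates {ρ, φ, z}¹³ (see Fig. below) may be defined by their relations with the
Cartesian coordinates:

$$r_1 = \rho\cos\varphi, \qquad r_2 = \rho\sin\varphi, \qquad r_3 = z. \tag{10.1}$$

— Gradient of a scalar function:

$$\nabla f = \mathbf{n}_\rho\frac{\partial f}{\partial\rho} + \mathbf{n}_\varphi\frac{1}{\rho}\frac{\partial f}{\partial\varphi} + \mathbf{n}_z\frac{\partial f}{\partial z}. \tag{10.2}$$

— The Laplace operator of a scalar function:

$$\nabla^2 f = \frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial f}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2 f}{\partial\varphi^2} + \frac{\partial^2 f}{\partial z^2}. \tag{10.3}$$

— Divergence of a vector function of coordinates (f = n_ρ f_ρ + n_φ f_φ + n_z f_z):

$$\nabla\cdot\mathbf{f} = \frac{1}{\rho}\frac{\partial(\rho f_\rho)}{\partial\rho} + \frac{1}{\rho}\frac{\partial f_\varphi}{\partial\varphi} + \frac{\partial f_z}{\partial z}. \tag{10.4}$$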


12 Some other orthogonal curvilinear coordinate systems are discussed in EM Sec. 2.3. 
13 In the 2D geometry with fixed coordinate z, these coordinates are called polar. 
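— Curl of a vector function:

$$\nabla\times\mathbf{f} = \mathbf{n}_\rho\left(\frac{1}{\rho}\frac{\partial f_z}{\partial\varphi} - \frac{\partial f_\varphi}{\partial z}\right) + \mathbf{n}_\varphi\left(\frac{\partial f_\rho}{\partial z} - \frac{\partial f_z}{\partial\rho}\right) + \mathbf{n}_z\frac{1}{\rho}\left(\frac{\partial(\rho f_\varphi)}{\partial\rho} - \frac{\partial f_\rho}{\partial\varphi}\right). \tag{10.5}$$

— The Laplace operator of a vector function:

$$\nabla^2\mathbf{f} = \mathbf{n}_\rho\left(\nabla^2 f_\rho - \frac{f_\rho}{\rho^2} - \frac{2}{\rho^2}\frac{\partial f_\varphi}{\partial\varphi}\right) + \mathbf{n}_\varphi\left(\nabla^2 f_\varphi - \frac{f_\varphi}{\rho^2} + \frac{2}{\rho^2}\frac{\partial f_\rho}{\partial\varphi}\right) + \mathbf{n}_z\nabla^2 f_z. \tag{10.6}$$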


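(ii) Spherical coordinates {r, θ, φ} (see Fig. below) may be defined as:

$$r_1 = r\sin\theta\cos\varphi, \qquad r_2 = r\sin\theta\sin\varphi, \qquad r_3 = r\cos\theta. \tag{10.7}$$

— Gradient of a scalar function:

$$\nabla f = \mathbf{n}_r\frac{\partial f}{\partial r} + \mathbf{n}_\theta\frac{1}{r}\frac{\partial f}{\partial\theta} + \mathbf{n}_\varphi\frac{1}{r\sin\theta}\frac{\partial f}{\partial\varphi}. \tag{10.8}$$

— The Laplace operator of a scalar function:

$$\nabla^2 f = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial f}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial f}{\partial\theta}\right) + \frac{1}{(r\sin\theta)^2}\frac{\partial^2 f}{\partial\varphi^2}. \tag{10.9}$$

— Divergence of a vector function f = n_r f_r + n_θ f_θ + n_φ f_φ:

$$\nabla\cdot\mathbf{f} = \frac{1}{r^2}\frac{\partial(r^2 f_r)}{\partial r} + \frac{1}{r\sin\theta}\frac{\partial(f_\theta\sin\theta)}{\partial\theta} + \frac{1}{r\sin\theta}\frac{\partial f_\varphi}{\partial\varphi}. \tag{10.10}$$

— Curl of the similar vector function:

$$\nabla\times\mathbf{f} = \mathbf{n}_r\frac{1}{r\sin\theta}\left[\frac{\partial(f_\varphi\sin\theta)}{\partial\theta} - \frac{\partial f_\theta}{\partial\varphi}\right] + \mathbf{n}_\theta\frac{1}{r}\left[\frac{1}{\sin\theta}\frac{\partial f_r}{\partial\varphi} - \frac{\partial(rf_\varphi)}{\partial r}\right] + \mathbf{n}_\varphi\frac{1}{r}\left[\frac{\partial(rf_\theta)}{\partial r} - \frac{\partial f_r}{\partial\theta}\right]. \tag{10.11}$$

— The Laplace operator of a vector function:

$$\begin{aligned}\nabla^2\mathbf{f} = {} & \mathbf{n}_r\left[\nabla^2 f_r - \frac{2}{r^2}\left(f_r + \frac{1}{\sin\theta}\frac{\partial(f_\theta\sin\theta)}{\partial\theta} + \frac{1}{\sin\theta}\frac{\partial f_\varphi}{\partial\varphi}\right)\right] \\ & + \mathbf{n}_\theta\left[\nabla^2 f_\theta - \frac{f_\theta}{r^2\sin^2\theta} + \frac{2}{r^2}\frac{\partial f_r}{\partial\theta} - \frac{2\cos\theta}{r^2\sin^2\theta}\frac{\partial f_\varphi}{\partial\varphi}\right] \\ & + \mathbf{n}_\varphi\left[\nabla^2 f_\varphi - \frac{f_\varphi}{r^2\sin^2\theta} + \frac{2}{r^2\sin\theta}\frac{\partial f_r}{\partial\varphi} + \frac{2\cos\theta}{r^2\sin^2\theta}\frac{\partial f_\theta}{\partial\varphi}\right].\end{aligned} \tag{10.12}$$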
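A convenient way to catch errors in curvilinear formulas such as Eq. (10.9) is to apply them, via finite differences, to a function whose Cartesian Laplacian (9.2) is known exactly. The Python sketch below does that for the hypothetical test function f = x²y + z³ + xyz, for which ∇²f = 2y + 6z; the evaluation point and the step h are arbitrary choices, and the second-order central differences make the agreement good to roughly h².

```python
import math

def f_cart(x, y, z):
    return x * x * y + z ** 3 + x * y * z   # hypothetical test function

def laplacian_cart(x, y, z):
    return 2 * y + 6 * z                     # its exact Cartesian Laplacian, Eq. (9.2)

def f_sph(r, th, ph):
    # Eq. (10.7): Cartesian coordinates expressed via r, theta, phi
    return f_cart(r * math.sin(th) * math.cos(ph), r * math.sin(th) * math.sin(ph), r * math.cos(th))

def laplacian_sph(r, th, ph, h=1e-3):
    """Eq. (10.9) evaluated with second-order central differences."""
    f0 = f_sph(r, th, ph)
    radial = ((r + h / 2) ** 2 * (f_sph(r + h, th, ph) - f0)
              - (r - h / 2) ** 2 * (f0 - f_sph(r - h, th, ph))) / h ** 2 / r ** 2
    polar = (math.sin(th + h / 2) * (f_sph(r, th + h, ph) - f0)
             - math.sin(th - h / 2) * (f0 - f_sph(r, th - h, ph))) / h ** 2 / (r ** 2 * math.sin(th))
    azimuthal = (f_sph(r, th, ph + h) - 2 * f0 + f_sph(r, th, ph - h)) / h ** 2 / (r * math.sin(th)) ** 2
    return radial + polar + azimuthal

r, th, ph = 1.3, 0.7, 0.4
x, y, z = r * math.sin(th) * math.cos(ph), r * math.sin(th) * math.sin(ph), r * math.cos(th)
print(laplacian_sph(r, th, ph), laplacian_cart(x, y, z))
```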
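11. Products involving ∇
(i) Useful zeros:


— For any scalar function f(r),

$$\nabla\times(\nabla f) \equiv \operatorname{curl}(\operatorname{grad} f) = 0. \tag{11.1}$$

— For any vector function f(r),

$$\nabla\cdot(\nabla\times\mathbf{f}) \equiv \operatorname{div}(\operatorname{curl}\mathbf{f}) = 0. \tag{11.2}$$

(ii) The Laplace operator expressed via the curl of a curl:

$$\nabla^2\mathbf{f} = \nabla(\nabla\cdot\mathbf{f}) - \nabla\times(\nabla\times\mathbf{f}). \tag{11.3}$$

(iii) Spatial differentiation of a product of a scalar function by a vector function:


— The scalar 3D generalization of Eq. (4.1) is

$$\nabla\cdot(f\mathbf{g}) = (\nabla f)\cdot\mathbf{g} + f(\nabla\cdot\mathbf{g}). \tag{11.4a}$$

— Its vector generalization is similar:

$$\nabla\times(f\mathbf{g}) = (\nabla f)\times\mathbf{g} + f(\nabla\times\mathbf{g}). \tag{11.4b}$$

(iv) 3D spatial differentiation of products of two vector functions:

$$\nabla\times(\mathbf{f}\times\mathbf{g}) = \mathbf{f}(\nabla\cdot\mathbf{g}) - (\mathbf{f}\cdot\nabla)\mathbf{g} - (\nabla\cdot\mathbf{f})\mathbf{g} + (\mathbf{g}\cdot\nabla)\mathbf{f}, \tag{11.5}$$

$$\nabla(\mathbf{f}\cdot\mathbf{g}) = (\mathbf{f}\cdot\nabla)\mathbf{g} + (\mathbf{g}\cdot\nabla)\mathbf{f} + \mathbf{f}\times(\nabla\times\mathbf{g}) + \mathbf{g}\times(\nabla\times\mathbf{f}), \tag{11.6}$$

$$\nabla\cdot(\mathbf{f}\times\mathbf{g}) = \mathbf{g}\cdot(\nabla\times\mathbf{f}) - \mathbf{f}\cdot(\nabla\times\mathbf{g}). \tag{11.7}$$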


12. Integro-differential relations


(i) For an arbitrary surface S limited by closed contour C:
— The Stokes theorem, valid for any differentiable vector field f(r):

$$\int_S(\nabla\times\mathbf{f})\cdot d^2\mathbf{r} \equiv \int_S(\nabla\times\mathbf{f})_n\,d^2r = \oint_C\mathbf{f}\cdot d\mathbf{r} \equiv \oint_C f_\tau\,dr, \tag{12.1}$$

where d²r ≡ n d²r is the elementary area vector (normal to the surface), and dr is the elementary contour
length vector (tangential to the contour line).


(ii) For an arbitrary volume V limited by closed surface S:
— Divergence (or “Gauss”) theorem, valid for any differentiable vector field f(r):

$$\int_V(\nabla\cdot\mathbf{f})\,d^3r = \oint_S\mathbf{f}\cdot d^2\mathbf{r} \equiv \oint_S f_n\,d^2r. \tag{12.2}$$

— Green's theorem, valid for two differentiable scalar functions f(r) and g(r):

$$\int_V\left(f\nabla^2 g - g\nabla^2 f\right)d^3r = \oint_S\left(f\nabla g - g\nabla f\right)_n d^2r. \tag{12.3}$$

— An identity valid for any two scalar functions f and g, and a vector field j with ∇·j = 0 (all
differentiable):

$$\int_V\left[f(\mathbf{j}\cdot\nabla g) + g(\mathbf{j}\cdot\nabla f)\right]d^3r = \oint_S fg\,j_n\,d^2r. \tag{12.4}$$


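13. The Kronecker delta and Levi-Civita permutation symbols
— The Kronecker delta symbol (defined for integer indices):

$$\delta_{jj'} \equiv \begin{cases}1, & \text{if } j' = j,\\ 0, & \text{otherwise.}\end{cases} \tag{13.1}$$

— The Levi-Civita permutation symbol for three integer indices (each taking one of the values 1,
2, or 3):

$$\varepsilon_{jj'j''} \equiv \begin{cases}+1, & \text{if the indices follow in any “correct” (“even”) order: } 1\to2\to3\to1\to2\ldots,\\ -1, & \text{if the indices follow in any “incorrect” (“odd”) order: } 1\to3\to2\to1\to3\ldots,\\ 0, & \text{if any two indices coincide.}\end{cases} \tag{13.2}$$

— Relation between the products of the Levi-Civita and Kronecker symbols:

$$\varepsilon_{jj'j''}\,\varepsilon_{ll'l''} = \begin{vmatrix}\delta_{jl} & \delta_{jl'} & \delta_{jl''}\\ \delta_{j'l} & \delta_{j'l'} & \delta_{j'l''}\\ \delta_{j''l} & \delta_{j''l'} & \delta_{j''l''}\end{vmatrix}; \tag{13.3a}$$

the summation of three such relations written for three different values of j = l yields the so-called
contracted epsilon identity:

$$\sum_{j=1}^{3}\varepsilon_{jj'j''}\,\varepsilon_{jl'l''} = \delta_{j'l'}\,\delta_{j''l''} - \delta_{j'l''}\,\delta_{j''l'}. \tag{13.3b}$$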
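Since all indices take only the values 1, 2, 3, Eqs. (13.3a) and (13.3b) can be verified exhaustively by brute force. The Python sketch below does so; the helper names delta, eps, and det3 are illustrative, and the 3×3 determinant is expanded by cofactors along the first row.

```python
from itertools import product

R = (1, 2, 3)

def delta(j, k):
    """Kronecker delta, Eq. (13.1)."""
    return 1 if j == k else 0

def eps(j, k, l):
    """Levi-Civita symbol, Eq. (13.2), for indices taking the values 1, 2, 3."""
    if len({j, k, l}) < 3:
        return 0
    return 1 if (j, k, l) in {(1, 2, 3), (2, 3, 1), (3, 1, 2)} else -1

def det3(m):
    """Determinant of a 3x3 nested list, by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Eq. (13.3a): all 3^6 index combinations
for j, jp, jpp, l, lp, lpp in product(R, repeat=6):
    m = [[delta(p, q) for q in (l, lp, lpp)] for p in (j, jp, jpp)]
    assert eps(j, jp, jpp) * eps(l, lp, lpp) == det3(m)

# Eq. (13.3b): the contracted epsilon identity
for jp, jpp, lp, lpp in product(R, repeat=4):
    lhs = sum(eps(j, jp, jpp) * eps(j, lp, lpp) for j in R)
    rhs = delta(jp, lp) * delta(jpp, lpp) - delta(jp, lpp) * delta(jpp, lp)
    assert lhs == rhs
print("Eqs. (13.3a) and (13.3b) verified for all index values")
```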


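14. Dirac's delta function, sign function, and theta function
— Definition of 1D delta function (for real a < b):

$$\int_a^b f(\xi)\,\delta(\xi)\,d\xi = \begin{cases}f(0), & \text{if } a < 0 < b,\\ 0, & \text{otherwise,}\end{cases} \tag{14.1}$$

where f(ξ) is any function continuous near ξ = 0. In particular (if f(ξ) = 1 near ξ = 0), the definition
yields

$$\int_a^b\delta(\xi)\,d\xi = \begin{cases}1, & \text{if } a < 0 < b,\\ 0, & \text{otherwise.}\end{cases} \tag{14.2}$$

— Relation to the theta function θ(ξ) and the sign function sgn(ξ):

$$\delta(\xi) = \frac{d}{d\xi}\theta(\xi) = \frac{1}{2}\frac{d}{d\xi}\operatorname{sgn}(\xi), \tag{14.3a}$$

where

$$\theta(\xi) \equiv \frac{\operatorname{sgn}(\xi) + 1}{2} = \begin{cases}0, & \text{if } \xi < 0,\\ 1, & \text{if } \xi > 0,\end{cases} \qquad \operatorname{sgn}(\xi) \equiv \begin{cases}-1, & \text{if } \xi < 0,\\ +1, & \text{if } \xi > 0.\end{cases} \tag{14.3b}$$

— An important integral:¹⁴

$$\int_{-\infty}^{+\infty}e^{is\xi}\,ds = 2\pi\,\delta(\xi). \tag{14.4}$$

— 3D generalization: the delta function δ(r) of the radius-vector is defined as

$$\int_V f(\mathbf{r})\,\delta(\mathbf{r})\,d^3r = \begin{cases}f(0), & \text{if } 0 \in V,\\ 0, & \text{otherwise;}\end{cases} \tag{14.5}$$

it may be represented as a product of 1D delta functions of Cartesian coordinates:

$$\delta(\mathbf{r}) = \delta(r_1)\,\delta(r_2)\,\delta(r_3). \tag{14.6}$$


(The 2D generalization is similar.)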


15. The Cauchy theorem and integral 


Let a complex function f(z) be analytic within a part of the complex plane z, which is limited by
a closed contour C (traversed counterclockwise) and includes point z′. Then

$$\oint_C f(z)\,dz = 0, \tag{15.1}$$

$$\oint_C\frac{f(z)\,dz}{z - z'} = 2\pi i\,f(z'). \tag{15.2}$$


The first of these relations is usually called the Cauchy integral theorem (or the “Cauchy-Goursat
theorem”), and the second one, the Cauchy integral (or the “Cauchy integral formula”).
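Both Cauchy relations can be illustrated numerically by parametrizing a circular contour, z = z₀ + R e^{it}, so that dz = i(z − z₀)dt. The Python sketch below assumes the arbitrary test function f(z) = exp z and an arbitrary interior point z′; since the integrand is periodic in t, the midpoint rule in t converges exponentially fast.

```python
import cmath
import math

def contour_integral(g, center=0j, radius=1.0, steps=256):
    """Counterclockwise integral of g(z) dz over the circle |z - center| = radius,
    by the midpoint rule in the polar angle t (spectrally accurate for periodic integrands)."""
    h = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        t = (k + 0.5) * h
        w = radius * cmath.exp(1j * t)
        total += g(center + w) * 1j * w      # dz = i (z - center) dt
    return total * h

f = cmath.exp                 # an entire (everywhere analytic) test function
z_prime = 0.3 - 0.2j          # an arbitrary point inside the unit circle

print(abs(contour_integral(f)))                                                        # Eq. (15.1)
print(contour_integral(lambda z: f(z) / (z - z_prime)) / (2j * math.pi), f(z_prime))   # Eq. (15.2)
```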


16. References 


(i) Properties of some special functions are briefly discussed at the relevant points of the lecture notes
(in alphabetical order): 


— Airy functions: QM Sec. 2.4; 

— Bessel functions: EM Sec. 2.7; 

— Fresnel integrals: EM Sec. 8.6; 

— Hermite polynomials: QM Sec. 2.9; 


14 The coefficient in this relation may be readily recalled by considering its left-hand side as the Fourier-integral
representation of the function f(s) ≡ 1, and applying Eq. (14.1) to the reciprocal Fourier transform:

$$f(s) \equiv 1 = \frac{1}{2\pi}\int_{-\infty}^{+\infty}e^{-is\xi}\left[2\pi\,\delta(\xi)\right]d\xi.$$
— Laguerre polynomials (both simple and associated): QM Sec. 3.7;

— Legendre polynomials, associated Legendre functions: EM Sec. 2.8 and QM Sec. 3.6; 
— Spherical harmonics: QM Sec. 3.6; 

— Spherical Bessel functions: QM Secs. 3.6 and 3.8. 


(ii) For more formulas, and their discussions, I can recommend the following handbooks (in
alphabetical order):¹⁵


— M. Abramowitz and I. Stegun (eds.), Handbook of Mathematical Functions, Dover, 1965 (and
numerous later printings);¹⁶

—I. Gradshteyn and I. Ryzhik, Tables of Integrals, Series, and Products, oi ed., Acad. Press, 1980; 

— G. Korn and T. Korn, Mathematical Handbook for Scientists and Engineers, 2nd ed., Dover,
2000;

— A. Prudnikov et al., Integrals and Series, vols. 1 and 2, CRC Press, 1986. 


The popular textbook, 
— G. Arfken et al., Mathematical Methods for Physicists, 7th ed., Acad. Press, 2012,


may be also used as a formula manual. 


Many formulas are also available from the symbolic calculation parts of commercially available 
software packages listed in Sec. (iv) below. 


(iii) Probably the most popular collection of numerical calculation codes are the twin manuals 


— W. Press et al., Numerical Recipes in Fortran 77, 2nd ed., Cambridge U. Press, 1992;
— W. Press et al., Numerical Recipes [in C++ — KKL], 3rd ed., Cambridge U. Press, 2007.


These lecture notes include very brief introductions to numerical methods for solving differential
equations:


— ordinary differential equations: CM Sec. 5.7, and 
— partial differential equations: CM Sec. 8.5 and EM Sec. 2.11, 


which include references to the literature for further reading. 


(iv) The most popular software packages for numerical and symbolic calculations, all with plotting 
capabilities (in alphabetical order): 


— Maple (http://www.maplesoft.com/); 
— MathCAD (http://www.ptc.com/products/mathcad/); 


— Mathematica (http://www.wolfram.com/products/mathematica/index.html); 
— MATLAB (http://www.mathworks.com/products/matlab/). 


15 On a personal note, perhaps 90% of all formula needs throughout my research career were satisfied by a tiny, 
wonderfully compiled old book: H. Dwight, Tables of Integrals and Other Mathematical Data, 4th ed.,
Macmillan, 1961, whose used copies, rather amazingly, are still available on the Web. 
16 An updated version of this collection is now available online at http://dlmf.nist.gov/. 
Appendix CA


Selected Physical Constants


according to the 2018 International CODATA recommendation.¹


Last corrections: 2021/08/20


| Symbol | Constant | SI value and unit | Gaussian value and unit | Relative r.m.s. uncertainty |
|---|---|---|---|---|
| c | speed of light in vacuum | 2.997 924 58×10⁸ m/s | 2.997 924 58×10¹⁰ cm/s | 0 (defined value) |
| N_A | Avogadro constant | 6.022 140 76×10²³ 1/mol | 6.022 140 76×10²³ 1/mol | 0 (defined value) |
| h | Planck constant | 6.626 070 15×10⁻³⁴ J/Hz | 6.626 070 15×10⁻²⁷ erg/Hz | 0 (defined value) |
| k_B | Boltzmann constant | 1.380 649 000×10⁻²³ J/K | 1.380 649 000×10⁻¹⁶ erg/K | 0 (defined value) |
| e | elementary electric charge | 1.602 176 634×10⁻¹⁹ C | 4.803 204 713×10⁻¹⁰ statcoulomb | 0 (defined value) |
| ε₀ | electric constant | 8.854 187 8128×10⁻¹² F/m | n/a | ~1.5×10⁻¹⁰ |
| μ₀ | magnetic constant | 1.256 637 062 12×10⁻⁶ N/A² | n/a | ~1.5×10⁻¹⁰ |
| m_e | electron's rest mass | 0.910 938 370×10⁻³⁰ kg | 0.910 938 370×10⁻²⁷ g | 3×10⁻¹⁰ |
| m_p | proton's rest mass | 1.672 621 923×10⁻²⁷ kg | 1.672 621 923×10⁻²⁴ g | 3×10⁻¹⁰ |
| G | gravitation constant | 6.674 30×10⁻¹¹ m³/kg·s² | 6.674 30×10⁻⁸ cm³/g·s² | 2.2×10⁻⁵ |


1 See, e.g., http://physics.nist.gov/cuu/Constants/index.html. CODATA is an interdisciplinary
Committee on Data for Science and Technology of the International Council for Science (ICSU). Its
recommendations, renewed every four years, are widely accepted by the scientific community.




Comments: 


1. The fixed value of c transfers the legal definition of the second (as “the duration of
9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels
of the ground state of the cesium-133 atom”) to that of the meter. These values are back-compatible with
the legacy definitions of the meter (initially, as the 1/40,000,000th part of the Earth meridian length) and
the second (for a long time, as the 1/(24×60×60) = 1/86,400th part of the Earth rotation period), within
the experimental errors of those measures.


2. The exact value of the Avogadro number, prescribed by the last CODATA adjustment of
fundamental constants in 2018, fixes 1 kg in the atomic units of mass (u), defined as 1/12 of the ¹²C
atom's mass, excluding the legacy etalons of the kilogram from the primary metrology, even though
their masses are compatible with the new definition within the experimental accuracy.


3. The exact value of h, also prescribed by CODATA in 2018, together with the fixed value of 
the second, enables the fundamental definition of energy units (in the SI system, the Joule) in terms of 
time/frequency. 


4. The only role of the Boltzmann constant k_B is to express the kelvin (K) in energy units. If
temperature is used in these units (as is done, for example, in the SM part of this series), this constant is 
unnecessary. 


5. ε₀ and μ₀ are also not really the fundamental constants; their role is just to fix electric and
magnetic units in the SI system. Their product is exactly fixed as μ₀ε₀ = 1/c², and μ₀ virtually coincides
with the legacy value 4π×10⁻⁷ N/A². (Before the 2018 adjustment, that value was considered exact, but the
exact fixation of e in the new system of constants gives it an experimental uncertainty, if only a very
small one; see the table above.)


6. The dimensionless fine structure (“Sommerfeld's”) constant α is numerically the same in any
system of units:

$$\alpha \equiv \begin{cases}e^2/4\pi\varepsilon_0\hbar c, & \text{in SI units},\\ e^2/\hbar c, & \text{in Gaussian units}\end{cases} \approx 7.297\,352\,5\ldots\times10^{-3} \approx \frac{1}{137}.$$

The relative uncertainty of the first value is smaller than 10⁻¹⁰.² The accuracy of the second, mnemonic
value is better than 0.03%.


7. The listed proton's rest mass m_p is close to 1.007 u, while the neutron's rest mass is close to
1.009 u; their differences from 1 u reflect mostly the binding energy of these baryons in the ¹²C nucleus.


8. Note the relatively poor accuracy with which we know the Newtonian constant of gravitation 
— due to the extreme weakness of gravity on human scales of mass and distance. 


2 L. Morel et al., Nature 588, 61 (2020). 